Commit Graph

892207 Commits

Author SHA1 Message Date
Denis Drakhnia 2b0d079aa6 e2k: fix stat definition 2024-05-27 07:44:00 +03:00
Alibek Omarov c1ce54bceb Linux 5.4.193 with MCST patches (6.2) 2023-03-25 04:34:12 +03:00
Alibek Omarov aeb4689c20 Bundle lttng-modules-2.11.7 2023-03-25 04:23:17 +03:00
Tom Zanussi 34946aa335 Linux 5.4.193-rt74 REBASE
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
2023-03-25 04:21:38 +03:00
Tom Zanussi 9346d4234f eventfd: Fix stable-rt v5.4.182-rt71 conflict fixup issue
This fixes an issue in stable-rt release v5.4.182-rt71 where a hunk
from the context diff was inadvertently included in a conflict fixup
where it shouldn't have been.  Remove those lines that don't belong.

Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
2023-03-25 04:21:38 +03:00
Xie Yongji 3367ff1f04 aio: Fix incorrect usage of eventfd_signal_allowed()
[ Upstream commit 4b3749865374899e115aa8c48681709b086fe6d3 ]

We should defer eventfd_signal() to the workqueue when
eventfd_signal_allowed() returns false rather than true.

Fixes: b542e383d8c0 ("eventfd: Make signal recursion protection a task bit")
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Link: https://lore.kernel.org/r/20210913111928.98-1-xieyongji@bytedance.com
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
2023-03-25 04:21:38 +03:00
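A minimal sketch of the corrected pattern, assuming a hypothetical caller and deferred work item (this is not the verbatim fs/aio.c hunk): signal inline only when eventfd_signal_allowed() says it is safe, and punt to the workqueue otherwise.

	#include <linux/eventfd.h>
	#include <linux/workqueue.h>

	/* ctx and deferred_work stand in for the subsystem's own eventfd
	 * context and work item; both are illustrative. */
	static void complete_and_notify(struct eventfd_ctx *ctx,
					struct work_struct *deferred_work)
	{
		if (eventfd_signal_allowed())
			eventfd_signal(ctx, 1);		/* safe to signal inline */
		else
			schedule_work(deferred_work);	/* defer when recursion is possible */
	}
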
Thomas Gleixner e2ea925c8f eventfd: Make signal recursion protection a task bit
[ Upstream commit b542e383d8c005f06a131e2b40d5889b812f19c6 ]

The recursion protection for eventfd_signal() is based on a per CPU
variable and relies on the !RT semantics of spin_lock_irqsave() for
protecting this per CPU variable. On RT kernels spin_lock_irqsave() disables
neither preemption nor interrupts, which allows the spin-lock-held section
to be preempted. If the preempting task invokes eventfd_signal() as well,
then the recursion warning triggers.

Paolo suggested to protect the per CPU variable with a local lock, but
that's heavyweight and actually not necessary. The goal of this protection
is to prevent the task stack from overflowing, which can be achieved with a
per task recursion protection as well.

Replace the per CPU variable with a per task bit similar to other recursion
protection bits like task_struct::in_page_owner. This works on both !RT and
RT kernels and removes as a side effect the extra per CPU storage.

No functional change for !RT kernels.

Reported-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lore.kernel.org/r/87wnp9idso.ffs@tglx

Signed-off-by: Tom Zanussi <zanussi@kernel.org>

 Conflicts:
	fs/aio.c
	include/linux/sched.h
2023-03-25 04:21:38 +03:00
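A hedged sketch of the per-task guard this change introduces (simplified; the counter update and wakeup inside eventfd_signal() are elided):

	#include <linux/sched.h>
	#include <linux/eventfd.h>

	static inline bool eventfd_signal_allowed(void)
	{
		return !current->in_eventfd_signal;	/* per-task bit instead of per-CPU count */
	}

	__u64 eventfd_signal(struct eventfd_ctx *ctx, __u64 n)
	{
		unsigned long flags;

		if (WARN_ON_ONCE(current->in_eventfd_signal))
			return 0;			/* recursion on this task's stack */

		spin_lock_irqsave(&ctx->wqh.lock, flags);
		current->in_eventfd_signal = 1;
		/* ... update ctx->count and wake the waitqueue as before ... */
		current->in_eventfd_signal = 0;
		spin_unlock_irqrestore(&ctx->wqh.lock, flags);
		return n;
	}
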
Sebastian Andrzej Siewior 793390ed19 locking: Drop might_resched() from might_sleep_no_state_check()
[ Upstream 5.10 commit e88f48e796b2286b565ee95ca8c46f32e051cd8c ]

might_sleep_no_state_check() serves the same purpose as might_sleep()
except it is used before sleeping locks are acquired and therefore does
not check task_struct::state because the state is preserved.

That state is preserved in the locking slow path, so we must not schedule
at the beginning of the locking function because the state would be lost
before it is preserved.

Remove might_resched() from might_sleep_no_state_check() to avoid losing the
state before it is preserved.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
2023-03-25 04:21:37 +03:00
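A sketch of what the change amounts to (surrounding #ifdefs omitted; the macro lives in the RT tree's kernel headers):

	/* Before: debug check plus a voluntary reschedule point. */
	# define might_sleep_no_state_check() \
		do { ___might_sleep(__FILE__, __LINE__, 0); might_resched(); } while (0)

	/* After: only the debug check, so task_struct::state survives until the
	 * locking slow path has saved it. */
	# define might_sleep_no_state_check() \
		do { ___might_sleep(__FILE__, __LINE__, 0); } while (0)
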
Sebastian Andrzej Siewior ac3887f881 fscache: Use only one fscache_object_cong_wait.
[ Upstream commit 514342eb43a760575d6d9a366506a41ab7ec4888 ]

This is an update of the original patch, removing a put_cpu_var() which
was overlooked in the initial patch.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
2023-03-25 04:21:37 +03:00
Sebastian Andrzej Siewior 0dc66825a3 fscache: Use only one fscache_object_cong_wait.
[ Upstream commit 74920695ab51a6d180dcd6554193cc8427758360 ]

In the commit mentioned below, fscache was converted from slow-work to
workqueue. slow_work_enqueue() and slow_work_sleep_till_thread_needed()
did not use a per-CPU workqueue. They chose between two global waitqueues
depending on the SLOW_WORK_VERY_SLOW bit, which was not set, so it was
always one waitqueue.

I can't find out how it is ensured that a waiter on a certain CPU is woken
up by the other side. My guess is that the timeout in schedule_timeout()
ensures that it does not wait forever (or a random wake up).

fscache_object_sleep_till_congested() must be invoked from preemptible
context in order for schedule() to work. In this case this_cpu_ptr()
should complain with CONFIG_DEBUG_PREEMPT enabled, unless the thread is
bound to one CPU.

wake_up() wakes only one waiter and I'm not sure if it is guaranteed
that only one waiter exists.

Replace the per-CPU waitqueue with one global waitqueue.

Fixes: 8b8edefa2f ("fscache: convert object to use workqueue instead of slow-work")
Reported-by: Gregor Beck <gregor.beck@gmail.com>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
2023-03-25 04:21:37 +03:00
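A minimal sketch of the resulting declaration change (illustrative rather than the verbatim hunk):

	/* Before: one waitqueue per CPU, with no obvious cross-CPU wakeup guarantee. */
	static DEFINE_PER_CPU(wait_queue_head_t, fscache_object_cong_wait);

	/* After: a single global waitqueue shared by all waiters. */
	static DECLARE_WAIT_QUEUE_HEAD(fscache_object_cong_wait);
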
Sebastian Andrzej Siewior ce417fbc98 mm: Disable NUMA_BALANCING_DEFAULT_ENABLED and TRANSPARENT_HUGEPAGE on PREEMPT_RT
[ Upstream commit aae93144898af113331668f53f80cb83f5a07360 ]

TRANSPARENT_HUGEPAGE:
There are potential non-deterministic delays to an RT thread if a critical
memory region is not THP-aligned and a non-RT buffer is located in the same
hugepage-aligned region. It's also possible for an unrelated thread to migrate
pages belonging to an RT task incurring unexpected page faults due to memory
defragmentation even if khugepaged is disabled.

Regular HUGEPAGEs are not affected by this and can be used.

NUMA_BALANCING:
There is a non-deterministic delay when marking PTEs PROT_NONE to gather NUMA
fault samples, increased page faults on regions even if they are mlocked, and
non-deterministic delays when migrating pages.

[Mel Gorman worded 99% of the commit description].

Link: https://lore.kernel.org/all/20200304091159.GN3818@techsingularity.net/
Link: https://lore.kernel.org/all/20211026165100.ahz5bkx44lrrw5pt@linutronix.de/
Cc: stable-rt@vger.kernel.org
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Link: https://lore.kernel.org/r/20211028143327.hfbxjze7palrpfgp@linutronix.de
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
2023-03-25 04:21:37 +03:00
Sebastian Andrzej Siewior b9ba466b1b preempt: Move preempt_enable_no_resched() to the RT block
[ Upstream commit 1a45b3551ef852193c3d338888132c4925d0690d ]

preempt_enable_no_resched() should point to preempt_enable() on
PREEMPT_RT so nobody is playing any preempt tricks and enables
preemption without checking for the need-resched flag.

This was misplaced in v3.14.0-rt1 and remained unnoticed until now.

Point preempt_enable_no_resched() to preempt_enable() on RT.

Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
2023-03-25 04:21:37 +03:00
Sebastian Andrzej Siewior df6e402903 sched: Switch wait_task_inactive to HRTIMER_MODE_REL_HARD
[ Upstream commit 39609ed79d420e0b966e16a1d695733c2d3b9a7f ]

With PREEMPT_RT enabled all hrtimer callbacks will be invoked in
softirq mode unless they are explicitly marked as HRTIMER_MODE_HARD.
During boot kthread_bind() is used for the creation of per-CPU threads
and then hangs in wait_task_inactive() if the ksoftirqd is not
yet up and running.
The hang disappeared since commit
   26c7295be0 ("kthread: Do not preempt current task if it is going to call schedule()")

but enabling function tracing on boot reliably leads to the freeze-on-boot
behaviour again.
The timer in wait_task_inactive() cannot be used directly from a user
interface to create a mass wakeup of several tasks at the same time,
which would lead to long sections with disabled interrupts.
Therefore it is safe to make the timer HRTIMER_MODE_REL_HARD.

Switch the timer to HRTIMER_MODE_REL_HARD.

Cc: stable-rt@vger.kernel.org
Link: https://lkml.kernel.org/r/20210826170408.vm7rlj7odslshwch@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
2023-03-25 04:21:37 +03:00
Mike Galbraith 352375c53b mm, zsmalloc: Convert zsmalloc_handle.lock to spinlock_t
[ Upstream 5.10 commit f2d9006d27c9b12563b8e577951ff5021f3b36b2 ]

local_lock_t becoming a synonym of spinlock_t had consequences for the RT
mods to zsmalloc, which were taking a mutex while holding a local_lock,
inspiring a lockdep "BUG: Invalid wait context" gripe.

Converting zsmalloc_handle.lock to a spinlock_t restored lockdep silence.

Cc: stable-rt@vger.kernel.org
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
2023-03-25 04:21:37 +03:00
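A hedged sketch of the data-structure side of the change (the RT-only handle wrapper; the exact field layout is illustrative):

	#include <linux/spinlock.h>

	struct zsmalloc_handle {
		unsigned long addr;
		spinlock_t lock;	/* was a struct mutex in the earlier RT mod; a
					 * spinlock_t nests cleanly under local_lock_t */
	};
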
Gregor Beck a0f0e6701c fscache: fix initialisation of cookie hash table raw spinlocks
The original patch, 602660600bcd ("fscache: initialize cookie hash
table raw spinlocks"), subtracted 1 from the shift and so still left
some spinlocks uninitialized.  This fixes that.

[zanussi: Added changelog text]

Signed-off-by: Gregor Beck <gregor.beck@gmail.com>
Fixes: 602660600bcd ("fscache: initialize cookie hash table raw spinlocks")
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
2023-03-25 04:21:37 +03:00
Andrew Halaney 22562d5988 locking/rwsem-rt: Remove might_sleep() in __up_read()
[ Upstream commit b2ed0a4302faf2bb09e97529dd274233c082689b ]

There's no chance of sleeping here; the reader is giving up the
lock and possibly waking up the writer who is waiting on it.

Reported-by: Chunyu Hu <chuhu@redhat.com>
Signed-off-by: Andrew Halaney <ahalaney@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
2023-03-25 04:21:37 +03:00
Sebastian Andrzej Siewior f57378010e mm: slub: Don't resize the location tracking cache on PREEMPT_RT
[ Upstream commit 87bd0bf324f4c5468ea3d1de0482589f491f3145 ]

The location tracking cache has a size of a page and is resized if its
current size is too small.
This allocation happens with disabled interrupts and can't happen on
PREEMPT_RT.
Should one page be too small, then we have to allocate more at the
beginning. The only downside is that fewer callers will be visible.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
2023-03-25 04:21:37 +03:00
Sebastian Andrzej Siewior a43acd1e98 locking/rwsem-rt: Add __down_read_interruptible()
The stable tree backported a patch which adds __down_read_interruptible() for
the generic rwsem implementation.

Add RT's version of __down_read_interruptible().

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:37 +03:00
Zanxiong Qiu c2014f70a6 mm/swap: use local lock in deactivate_page()
get_cpu_var() calls preempt_disable(), while on an RT kernel
pagevec_lru_move_fn() takes a spinlock and might schedule the context
out, hence the scheduling bug. The issue was found on 5.4.70-rt40 and is
reproducible on 5.4.74-rt41.

32154a0abcc ("mm: Revert the DEFINE_PER_CPU_PAGEVEC implementation")
reverted the lock/unlock_swap_pvec functions; however, the
deactivate_page() part was missed at that time as it was newly
added in v5.4.

Link: https://lore.kernel.org/r/20201127135456.8145-1-zqiu2000@126.com
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Zanxiong Qiu <zqiu2000@126.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2023-03-25 04:21:37 +03:00
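A sketch of the fix in mm/swap.c in the 5.4-rt local-lock style (get_locked_var()/put_locked_var() and swapvec_lock follow the RT tree's conventions; treat the exact identifiers as illustrative):

	void deactivate_page(struct page *page)
	{
		if (PageLRU(page) && PageActive(page) && !PageUnevictable(page)) {
			/* Local lock instead of get_cpu_var(): keeps the pagevec stable
			 * without disabling preemption on RT. */
			struct pagevec *pvec = &get_locked_var(swapvec_lock,
							       lru_deactivate_pvecs);

			get_page(page);
			if (!pagevec_add(pvec, page) || PageCompound(page))
				pagevec_lru_move_fn(pvec, lru_deactivate_fn, NULL);
			put_locked_var(swapvec_lock, lru_deactivate_pvecs);
		}
	}
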
Sebastian Andrzej Siewior 18d5a2ded1 Revert "hrtimer: Allow raw wakeups during boot"
This change is no longer needed since commit
   26c7295be0 ("kthread: Do not preempt current task if it is going to call schedule()")

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2023-03-25 04:21:37 +03:00
Steven Rostedt (VMware) 1d895e6e82 Revert "net: Properly annotate the try-lock for the seqlock"
This reverts commit 3971227b5af04e6c34ef7b47b2ebe941727563a0.

Link: https://lore.kernel.org/r/20201116171958.2opbksmgbznrjxu2@linutronix.de

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2023-03-25 04:21:36 +03:00
Sebastian Andrzej Siewior 5564fa35b3 timers: Don't block on ->expiry_lock for TIMER_IRQSAFE
PREEMPT_RT does not spin and wait until a running timer completes its
callback but instead it blocks on a sleeping lock to prevent a deadlock.

This blocking cannot be done for workqueue's IRQSAFE timer which will
be canceled in an IRQ-off region. It has to happen in an IRQ-off region
because changing the PENDING bit and clearing the timer must not be
interrupted to avoid a busy-loop.

The callback invocation of IRQSAFE timer is not preempted on PREEMPT_RT
so there is no need to synchronize on timer_base::expiry_lock.

Don't acquire the timer_base::expiry_lock for TIMER_IRQSAFE flagged
timer.
Add a lockdep annotation to ensure that this function is always invoked
in preemptible context on PREEMPT_RT.

Reported-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2023-03-25 04:21:36 +03:00
Oleg Nesterov 6bd4eef1d8 ptrace: fix ptrace_unfreeze_traced() race with rt-lock
The patch "ptrace: fix ptrace vs tasklist_lock race" changed
ptrace_freeze_traced() to take task->saved_state into account, but
ptrace_unfreeze_traced() has the same problem and needs a similar fix:
it should check/update both ->state and ->saved_state.

Reported-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Fixes: "ptrace: fix ptrace vs tasklist_lock race"
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2023-03-25 04:21:36 +03:00
Sebastian Andrzej Siewior bc0a27b923 mm/memcontrol: Disable preemption in __mod_memcg_lruvec_state()
The callers expect disabled preemption/interrupts while invoking
__mod_memcg_lruvec_state(). This works in mainline because a lock of
some kind is acquired.

Use preempt_disable_rt() where per-CPU variables are accessed and a
stable pointer is expected. This is also done in __mod_zone_page_state()
for the same reason.

Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2023-03-25 04:21:36 +03:00
Sebastian Andrzej Siewior 13aea34e50 net: Properly annotate the try-lock for the seqlock
In patch
   ("net/Qdisc: use a seqlock instead seqcount")

the seqcount has been replaced with a seqlock to allow the reader to
boost the preempted writer.
The try_write_seqlock() acquired the lock with a try-lock but the
seqcount annotation was "lock".

Opencode write_seqcount_t_begin() and use the try-lock annotation for
lockdep.

Reported-by: Mike Galbraith <efault@gmx.de>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2023-03-25 04:21:36 +03:00
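A hedged sketch of the open-coded try-lock annotation in try_write_seqlock() (the seqlock_t field names and lockdep plumbing differ between trees; this is illustrative only):

	static inline bool try_write_seqlock(seqlock_t *sl)
	{
		if (!spin_trylock(&sl->lock))
			return false;
		/* Open-coded write_seqcount_begin(): same sequence bump, but the
		 * lockdep acquire is annotated as a try-lock (third argument = 1). */
		seqcount_acquire(&sl->seqcount.dep_map, 0, 1, _RET_IP_);
		raw_write_seqcount_begin(&sl->seqcount);
		return true;
	}
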
Sebastian Andrzej Siewior 36c2e4d093 rwsem: Provide down_read_non_owner() and up_read_non_owner() for -RT
The rwsem implementation on -RT allows multiple readers and there is no
owner tracking anymore.
We can provide down_read_non_owner() and up_read_non_owner() by skipping
the owner check bits which are only available in the !RT implementation.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2023-03-25 04:21:36 +03:00
Ahmed S. Darwish 3de29a2a9e net: phy: fixed_phy: Remove unused seqcount
Commit bf7afb29d5 ("phy: improve safety of fixed-phy MII register
reading") protected the fixed PHY status with a sequence counter.

Two years later, commit d2b977939b ("net: phy: fixed-phy: remove
fixed_phy_update_state()") removed the sequence counter's write side
critical section -- neutralizing its read side retry loop.

Remove the unused seqcount.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from v5.8-rc1 commit 79cbb6bc33)
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2023-03-25 04:21:36 +03:00
Sebastian Andrzej Siewior 6967cfe198 Bluetooth: Acquire sk_lock.slock without disabling interrupts
[ Upstream commit e6da0edc24 ]

There was a lockdep warning which led to commit
   fad003b6c8 ("Bluetooth: Fix inconsistent lock state with RFCOMM")

Lockdep noticed that `sk->sk_lock.slock' was acquired without disabling
the softirq while the lock was also used in softirq context.
Unfortunately the solution back then was to disable interrupts before
acquiring the lock, which did make lockdep happy.
It would have been enough to simply disable the softirq. Disabling
interrupts before acquiring a spinlock_t is not allowed on PREEMPT_RT
because these locks are converted to 'sleeping' spinlocks.

Use spin_lock_bh() in order to acquire the `sk_lock.slock'.

Cc: stable-rt@vger.kernel.org
Reported-by: Luis Claudio R. Goncalves <lclaudio@uudg.org>
Reported-by: kbuild test robot <lkp@intel.com> [missing unlock]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2023-03-25 04:21:36 +03:00
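A minimal sketch of the pattern applied at the affected call sites (surrounding code elided):

	/* Before: not allowed on PREEMPT_RT, where sk_lock.slock becomes a
	 * sleeping spinlock and must not be taken with interrupts disabled. */
	spin_lock_irqsave(&sk->sk_lock.slock, flags);
	/* ... critical section ... */
	spin_unlock_irqrestore(&sk->sk_lock.slock, flags);

	/* After: disabling bottom halves is enough to avoid the softirq-context
	 * deadlock that lockdep originally complained about. */
	spin_lock_bh(&sk->sk_lock.slock);
	/* ... critical section ... */
	spin_unlock_bh(&sk->sk_lock.slock);
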
Sebastian Andrzej Siewior 58d58190a1 workqueue: Sync with upstream
This is an all-in-one patch reverting the following commits:
  workqueue: Don't assume that the callback has interrupts disabled
  sched/swait: Add swait_event_lock_irq()
  workqueue: Use swait for wq_manager_wait
  workqueue: Convert the locks to raw type

and introducing the following commits from upstream:
  workqueue: Use rcuwait for wq_manager_wait
  workqueue: Convert the pool::lock and wq_mayday_lock to raw_spinlock_t

as a replacement.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2023-03-25 04:21:36 +03:00
Matt Fleming f958e52689 signal: Prevent double-free of user struct
The way user struct reference counting works changed significantly with,

  fda31c5029 ("signal: avoid double atomic counter increments for user accounting")

Now user structs are only freed once the last pending signal is
dequeued. Make sigqueue_free_current() follow this new convention to
avoid freeing the user struct multiple times and triggering this
warning:

 refcount_t: underflow; use-after-free.
 WARNING: CPU: 0 PID: 6794 at lib/refcount.c:288 refcount_dec_not_one+0x45/0x50
 Call Trace:
  refcount_dec_and_lock_irqsave+0x16/0x60
  free_uid+0x31/0xa0
  __dequeue_signal+0x17c/0x190
  dequeue_signal+0x5a/0x1b0
  do_sigtimedwait+0x208/0x250
  __x64_sys_rt_sigtimedwait+0x6f/0xd0
  do_syscall_64+0x72/0x200
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Reported-by: Daniel Wagner <wagi@monom.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2023-03-25 04:21:36 +03:00
Sebastian Andrzej Siewior 5dd3c8665f mm/zswap: Use local lock to protect per-CPU data
This is an incremental update of the zswap patch. Additional spots which
were lacking proper locking were identified during the rework of the
patch for upstream.
The complete patch description is available as commit
   79410590ae87e ("mm/zswap: Use local lock to protect per-CPU data")

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2023-03-25 04:21:36 +03:00
汪勇10269566 8ea11731c2 printk: Force a line break on pr_cont(" ")
Since the printk rework, pr_cont("\n") will not lead to a line break.
A new line will only be created if
- cpu != c->cpu_owner || !(flags & LOG_CONT)
- c->len + len > sizeof(c->buf)

Flush the buffer to enforce a new line on pr_cont().

[bigeasy: reword commit message ]

Signed-off-by: 汪勇10269566 <wang.yong12@zte.com.cn>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2023-03-25 04:21:36 +03:00
Kevin Hao 71174ad5ad mm: slub: Always flush the delayed empty slubs in flush_all()
After commit f0b231101c94 ("mm/SLUB: delay giving back empty slubs to
IRQ enabled regions"), when the free_slab() is invoked with the IRQ
disabled, the empty slubs are moved to a per-CPU list and will be
freed after IRQs are enabled later. But in the current code, there is
a check to see whether there really is a cpu slub on a specific CPU
before flushing the delayed empty slubs; this may cause a reference
to an already released kmem_cache in a scenario like below:
	cpu 0				cpu 1
  kmem_cache_destroy()
    flush_all()
                         --->IPI       flush_cpu_slab()
                                         flush_slab()
                                           deactivate_slab()
                                             discard_slab()
                                               free_slab()
                                             c->page = NULL;
      for_each_online_cpu(cpu)
        if (!has_cpu_slab(1, s))
          continue
        this skip to flush the delayed
        empty slub released by cpu1
    kmem_cache_free(kmem_cache, s)

                                       kmalloc()
                                         __slab_alloc()
                                            free_delayed()
                                            __free_slab()
                                            reference to released kmem_cache

Fixes: f0b231101c94 ("mm/SLUB: delay giving back empty slubs to IRQ enabled regions")
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2023-03-25 04:21:35 +03:00
Liwei Song 4ed63caeb8 mm: Don't warn about atomic memory allocations during suspend
The ACPI code allocates a larger amount of memory during resume. This
triggers a warning because the allocation happens with disabled
interrupts.
At this stage only one CPU is active so there should be no lock
contention. If SLUB needs to call into the buddy allocator for more
memory then it should not enable interrupts.

Limit the check to system state with more CPUs and scheduling and only
enable interrupts in SLUB at this stage.

Signed-off-by: Liwei Song <liwei.song@windriver.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
[bigeasy: commit description, allocate_slab() hunk]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:35 +03:00
Sebastian Andrzej Siewior 7f850a018e fs/dcache: Include swait.h header
Include the swait.h header so it compiles even if not all patches are
applied.

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2023-03-25 04:21:35 +03:00
John Ogness cbf31f0ca6 printk: console must not schedule for drivers
Even though the printk kthread is always preemptible, it is still not
allowed to call cond_resched() from within console drivers. The
task may become non-preemptible in the console driver call chain. For
example, vt_console_print() takes a spinlock and then can call into
fbcon_redraw(), which can conditionally invoke cond_resched():

|BUG: sleeping function called from invalid context at kernel/printk/printk.c:2322
|in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 177, name: printk
|CPU: 0 PID: 177 Comm: printk Not tainted 5.6.2-00011-ga536059557f1d9 #1
|Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
|Call Trace:
| dump_stack+0x66/0x8b
| ___might_sleep+0x102/0x120
| console_conditional_schedule+0x24/0x30
| fbcon_redraw+0x96/0x1c0
| fbcon_scroll+0x556/0xd70
| con_scroll+0x147/0x1e0
| lf+0x9e/0xb0
| vt_console_print+0x253/0x3d0
| printk_kthread_func+0x1d5/0x3b0

Disable cond_resched() for the call into the console drivers.

Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2023-03-25 04:21:35 +03:00
Thomas Gleixner 7c4ece8254 Add localversion for -RT release
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2023-03-25 04:21:35 +03:00
Clark Williams 894114838d sysfs: Add /sys/kernel/realtime entry
Add a /sys/kernel entry to indicate that the kernel is a
realtime kernel.

Clark says that he needs this for udev rules: udev needs to evaluate
whether it's a PREEMPT_RT kernel a few thousand times, and parsing uname
output is too slow or so.

Are there better solutions? Should it exist and return 0 on !-rt?

Signed-off-by: Clark Williams <williams@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
2023-03-25 04:21:35 +03:00
Ingo Molnar 77c67bcd8a genirq: Disable irqpoll on -rt
Creates long latencies for no value

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2023-03-25 04:21:35 +03:00
Thomas Gleixner f9d7455782 signals: Allow rt tasks to cache one sigqueue struct
To avoid allocations, allow RT tasks to cache one sigqueue struct in
the task struct.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2023-03-25 04:21:35 +03:00
Haris Okanovic fdeac550bc tpm_tis: fix stall after iowrite*()s
ioread8() operations to TPM MMIO addresses can stall the cpu when
immediately following a sequence of iowrite*()'s to the same region.

For example, cyclictest measures ~400us latency spikes when a non-RT
usermode application communicates with an SPI-based TPM chip (Intel Atom
E3940 system, PREEMPT_RT kernel). The spikes are caused by a
stalling ioread8() operation following a sequence of 30+ iowrite8()s to
the same address. I believe this happens because the write sequence is
buffered (in cpu or somewhere along the bus), and gets flushed on the
first LOAD instruction (ioread*()) that follows.

The enclosed change appears to fix this issue: read the TPM chip's
access register (status code) after every iowrite*() operation to
amortize the cost of flushing data to chip across multiple instructions.

Signed-off-by: Haris Okanovic <haris.okanovic@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:35 +03:00
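A hedged sketch of the approach (helper names modeled on the RT patch; treat them as illustrative):

	/* Read back the access register so the posted write is flushed to the
	 * chip right away instead of stalling a later ioread8(). */
	static inline void tpm_tis_flush(void __iomem *iobase)
	{
		ioread8(iobase + TPM_ACCESS(0));
	}

	static inline void tpm_tis_iowrite8(u8 b, void __iomem *iobase, u32 addr)
	{
		iowrite8(b, iobase + addr);
		tpm_tis_flush(iobase);
	}
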
Julia Cartwright d04a7ec191 squashfs: make use of local lock in multi_cpu decompressor
Currently, the squashfs multi_cpu decompressor makes use of
get_cpu_ptr()/put_cpu_ptr(), which unconditionally disable preemption
during decompression.

Because the workload is distributed across CPUs, all CPUs can observe a
very high wakeup latency, which has been seen to be as much as 8000us.

Convert this decompressor to make use of a local lock, which will allow
execution of the decompressor with preemption enabled, but also ensure
concurrent accesses to the percpu compressor data on the local CPU will
be serialized.

Cc: stable-rt@vger.kernel.org
Reported-by: Alexander Stein <alexander.stein@systec-electronic.com>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Julia Cartwright <julia@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:35 +03:00
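A sketch in the 5.4-rt local-lock style (get_locked_ptr()/put_locked_ptr() come from the RT tree's locallock.h; the exact identifiers are illustrative):

	static DEFINE_LOCAL_IRQ_LOCK(stream_lock);

	/* Decompression itself stays preemptible; the local lock only serialises
	 * access to this CPU's stream. */
	stream = get_locked_ptr(stream_lock, msblk->stream);
	/* ... run the decompressor with this CPU's stream ... */
	put_locked_ptr(stream_lock, stream);
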
Mike Galbraith bc7cc87312 drivers/zram: Don't disable preemption in zcomp_stream_get/put()
In v4.7, the driver switched to percpu compression streams, disabling
preemption via get/put_cpu_ptr(). Use a per-zcomp_strm lock here. We
also have to fix a lock order issue in zram_decompress_page() such
that zs_map_object() nests inside of zcomp_stream_put() as it does in
zram_bvec_write().

Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
[bigeasy: get_locked_var() -> per zcomp_strm lock]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:35 +03:00
Mike Galbraith c6cc729f9a drivers/block/zram: Replace bit spinlocks with rtmutex for -rt
They're nondeterministic, and lead to ___might_sleep() splats in -rt.
OTOH, they're a lot less wasteful than an rtmutex per page.

Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:35 +03:00
Mike Galbraith caa6dbd8c9 connector/cn_proc: Protect send_msg() with a local lock on RT
|BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:931
|in_atomic(): 1, irqs_disabled(): 0, pid: 31807, name: sleep
|Preemption disabled at:[<ffffffff8148019b>] proc_exit_connector+0xbb/0x140
|
|CPU: 4 PID: 31807 Comm: sleep Tainted: G        W   E   4.8.0-rt11-rt #106
|Call Trace:
| [<ffffffff813436cd>] dump_stack+0x65/0x88
| [<ffffffff8109c425>] ___might_sleep+0xf5/0x180
| [<ffffffff816406b0>] __rt_spin_lock+0x20/0x50
| [<ffffffff81640978>] rt_read_lock+0x28/0x30
| [<ffffffff8156e209>] netlink_broadcast_filtered+0x49/0x3f0
| [<ffffffff81522621>] ? __kmalloc_reserve.isra.33+0x31/0x90
| [<ffffffff8156e5cd>] netlink_broadcast+0x1d/0x20
| [<ffffffff8147f57a>] cn_netlink_send_mult+0x19a/0x1f0
| [<ffffffff8147f5eb>] cn_netlink_send+0x1b/0x20
| [<ffffffff814801d8>] proc_exit_connector+0xf8/0x140
| [<ffffffff81077f71>] do_exit+0x5d1/0xba0
| [<ffffffff810785cc>] do_group_exit+0x4c/0xc0
| [<ffffffff81078654>] SyS_exit_group+0x14/0x20
| [<ffffffff81640a72>] entry_SYSCALL_64_fastpath+0x1a/0xa4

Since ab8ed95108 ("connector: fix out-of-order cn_proc netlink message
delivery") which is v4.7-rc6.

Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:35 +03:00
Thomas Gleixner 8ca7ff188f mips: Disable highmem on RT
The current highmem handling on -RT is not compatible and needs fixups.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2023-03-25 04:21:34 +03:00
Sebastian Andrzej Siewior a0aa1749a6 POWERPC: Allow to enable RT
Allow RT to be selected.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:34 +03:00
Sebastian Andrzej Siewior 17dfb2be0b powerpc/stackprotector: work around stack-guard init from atomic
This is invoked from the secondary CPU in atomic context. On x86 we use
the TSC instead. On Power we XOR it against mftb(), so let's use the stack
address as the initial value.

Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:34 +03:00
Thomas Gleixner a45b2dd64c powerpc: Disable highmem on RT
The current highmem handling on -RT is not compatible and needs fixups.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2023-03-25 04:21:34 +03:00
Bogdan Purcareata d9914f69e3 powerpc/kvm: Disable in-kernel MPIC emulation for PREEMPT_RT
While converting the openpic emulation code to use a raw_spinlock_t enables
guests to run on RT, there's still a performance issue. For interrupts sent in
directed delivery mode with a multiple-CPU mask, the emulated openpic will loop
through all of the VCPUs, and for each VCPU it calls IRQ_check, which will loop
through all the pending interrupts for that VCPU. This is done while holding the
raw_lock, meaning that for all this time interrupts and preemption are
disabled on the host Linux. A malicious user app can max out both these numbers
and cause a DoS.

This temporary fix is sent for two reasons. First is so that users who want to
use the in-kernel MPIC emulation are aware of the potential latencies, thus
making sure that the hardware MPIC and their usage scenario does not involve
interrupts sent in directed delivery mode, and the number of possible pending
interrupts is kept small. Secondly, this should incentivize the development of a
proper openpic emulation that would be better suited for RT.

Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Bogdan Purcareata <bogdan.purcareata@freescale.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:34 +03:00