This fixes an issue in stable-rt release v5.4.182-rt71 where a hunk
from the context diff was inadvertently included in a conflict fixup.
Remove the lines that don't belong.
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
[ Upstream commit 4b3749865374899e115aa8c48681709b086fe6d3 ]
We should defer eventfd_signal() to the workqueue when
eventfd_signal_allowed() returns false rather than when it returns
true.
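A minimal sketch of the corrected flow, assuming an aio-style caller
and an illustrative deferred-work member (not the verbatim fs/aio.c
hunk):

  /* Signal directly only when the recursion guard permits it,
   * otherwise punt the eventfd_signal() to the workqueue. */
  if (iocb->ki_eventfd) {
          if (eventfd_signal_allowed())
                  eventfd_signal(iocb->ki_eventfd, 1);
          else
                  schedule_work(&iocb->ki_defer_work); /* hypothetical member */
  }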
Fixes: b542e383d8c0 ("eventfd: Make signal recursion protection a task bit")
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Link: https://lore.kernel.org/r/20210913111928.98-1-xieyongji@bytedance.com
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
[ Upstream commit b542e383d8c005f06a131e2b40d5889b812f19c6 ]
The recursion protection for eventfd_signal() is based on a per CPU
variable and relies on the !RT semantics of spin_lock_irqsave() for
protecting this per CPU variable. On RT kernels spin_lock_irqsave() neither
disables preemption nor interrupts which allows the spin lock held section
to be preempted. If the preempting task invokes eventfd_signal() as well,
then the recursion warning triggers.
Paolo suggested to protect the per CPU variable with a local lock, but
that's heavyweight and actually not necessary. The goal of this protection
is to prevent the task stack from overflowing, which can be achieved with a
per task recursion protection as well.
Replace the per CPU variable with a per task bit similar to other recursion
protection bits like task_struct::in_page_owner. This works on both !RT and
RT kernels and removes as a side effect the extra per CPU storage.
No functional change for !RT kernels.
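The resulting guard, sketched from the description above (body
abridged; the in_eventfd_signal task bit replaces the per-CPU counter):

  __u64 eventfd_signal(struct eventfd_ctx *ctx, __u64 n)
  {
          unsigned long flags;

          /* Per-task recursion protection instead of per-CPU state. */
          if (WARN_ON_ONCE(current->in_eventfd_signal))
                  return 0;

          spin_lock_irqsave(&ctx->wqh.lock, flags);
          current->in_eventfd_signal = 1;
          /* ... update ctx->count and wake up waiters ... */
          current->in_eventfd_signal = 0;
          spin_unlock_irqrestore(&ctx->wqh.lock, flags);

          return n;
  }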
Reported-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lore.kernel.org/r/87wnp9idso.ffs@tglx
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Conflicts:
fs/aio.c
include/linux/sched.h
[ Upstream 5.10 commit e88f48e796b2286b565ee95ca8c46f32e051cd8c ]
might_sleep_no_state_check() serves the same purpose as might_sleep()
except it is used before sleeping locks are acquired and therefore does
not check task_struct::state because the state is preserved.
That state is preserved in the locking slow path, so we must not
schedule at the beginning of the locking function: at that point the
state has not yet been saved and would be lost.
Remove might_resched() from might_sleep_no_state_check() to avoid losing the
state before it is preserved.
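The resulting macro, sketched per the description:

  /* might_resched() dropped: scheduling here would clobber the
   * not-yet-preserved task_struct::state. */
  # define might_sleep_no_state_check() \
          do { ___might_sleep(__FILE__, __LINE__, 0); } while (0)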
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
[ Upstream commit 514342eb43a760575d6d9a366506a41ab7ec4888 ]
This is an update of the original patch, removing a put_cpu_var()
which was overlooked in the initial patch.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
[ Upstream commit 74920695ab51a6d180dcd6554193cc8427758360 ]
In the commit mentioned below, fscache was converted from slow-work to
workqueue. slow_work_enqueue() and slow_work_sleep_till_thread_needed()
did not use a per-CPU workqueue. They chose between two global
waitqueues depending on the SLOW_WORK_VERY_SLOW bit, which was not set,
so it was always the same waitqueue.
I can't find out how it is ensured that a waiter on a certain CPU is
woken up by the other side. My guess is that the timeout in
schedule_timeout() ensures that it does not wait forever (or it relies
on a random wake up).
fscache_object_sleep_till_congested() must be invoked from preemptible
context in order for schedule() to work. In this case this_cpu_ptr()
should complain with CONFIG_DEBUG_PREEMPT enabled, unless the thread is
bound to one CPU.
wake_up() wakes only one waiter and I'm not sure it is guaranteed that
only one waiter exists.
Replace the per-CPU waitqueue with one global waitqueue.
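In code terms the change amounts to (sketch):

  /* One global waitqueue; waiters and wakers always meet on it. */
  static DECLARE_WAIT_QUEUE_HEAD(fscache_object_cong_wait);
  /* was: static DEFINE_PER_CPU(wait_queue_head_t, fscache_object_cong_wait); */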
Fixes: 8b8edefa2f ("fscache: convert object to use workqueue instead of slow-work")
Reported-by: Gregor Beck <gregor.beck@gmail.com>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
[ Upstream commit aae93144898af113331668f53f80cb83f5a07360 ]
TRANSPARENT_HUGEPAGE:
There are potential non-deterministic delays to an RT thread if a critical
memory region is not THP-aligned and a non-RT buffer is located in the same
hugepage-aligned region. It's also possible for an unrelated thread to migrate
pages belonging to an RT task incurring unexpected page faults due to memory
defragmentation even if khugepaged is disabled.
Regular HUGEPAGEs are not affected by this and can be used.
NUMA_BALANCING:
There is a non-deterministic delay when marking PTEs PROT_NONE to
gather NUMA fault samples, increased page faults in regions even if
they are mlocked, and non-deterministic delays when migrating pages.
[Mel Gorman worded 99% of the commit description].
Link: https://lore.kernel.org/all/20200304091159.GN3818@techsingularity.net/
Link: https://lore.kernel.org/all/20211026165100.ahz5bkx44lrrw5pt@linutronix.de/
Cc: stable-rt@vger.kernel.org
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Link: https://lore.kernel.org/r/20211028143327.hfbxjze7palrpfgp@linutronix.de
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
[ Upstream commit 1a45b3551ef852193c3d338888132c4925d0690d ]
preempt_enable_no_resched() should point to preempt_enable() on
PREEMPT_RT so that nobody plays preempt tricks and enables preemption
without checking for the need-resched flag.
This was misplaced in v3.14.0-rt1 and remained unnoticed until now.
Point preempt_enable_no_resched() to preempt_enable() on RT.
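A sketch of the corrected mapping in the preempt header:

  #ifdef CONFIG_PREEMPT_RT
  /* RT: no "no_resched" tricks, always check need-resched on enable. */
  # define preempt_enable_no_resched()    preempt_enable()
  #else
  # define preempt_enable_no_resched()    sched_preempt_enable_no_resched()
  #endif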
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
[ Upstream commit 39609ed79d420e0b966e16a1d695733c2d3b9a7f ]
With PREEMPT_RT enabled all hrtimers callbacks will be invoked in
softirq mode unless they are explicitly marked as HRTIMER_MODE_HARD.
During boot kthread_bind() is used for the creation of per-CPU threads
and then hangs in wait_task_inactive() if the ksoftirqd is not
yet up and running.
The hang disappeared with commit
26c7295be0 ("kthread: Do not preempt current task if it is going to call schedule()")
but enabling function trace on boot reliably leads to the
freeze-on-boot behaviour again.
The timer in wait_task_inactive() cannot be used directly from a user
interface to abuse it and create a mass wakeup of several tasks at the
same time, which would lead to long sections with disabled interrupts.
Therefore it is safe to make the timer HRTIMER_MODE_REL_HARD.
Switch the timer to HRTIMER_MODE_REL_HARD.
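The switch boils down to one line in wait_task_inactive() (sketch;
back-off value as in kernel/sched/core.c):

  ktime_t to = NSEC_PER_SEC / HZ;

  set_current_state(TASK_UNINTERRUPTIBLE);
  /* Expire in hard-interrupt context even on PREEMPT_RT, so the
   * wakeup does not depend on ksoftirqd being up. */
  schedule_hrtimeout(&to, HRTIMER_MODE_REL_HARD); /* was HRTIMER_MODE_REL */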
Cc: stable-rt@vger.kernel.org
Link: https://lkml.kernel.org/r/20210826170408.vm7rlj7odslshwch@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
[ Upstream 5.10 commit f2d9006d27c9b12563b8e577951ff5021f3b36b2 ]
local_lock_t becoming a synonym of spinlock_t had consequences for the RT
mods to zsmalloc, which were taking a mutex while holding a local_lock,
inspiring a lockdep "BUG: Invalid wait context" gripe.
Converting zsmalloc_handle.lock to a spinlock_t restored lockdep silence.
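The conversion, sketched (field layout abridged):

  struct zsmalloc_handle {
          unsigned long addr;
          spinlock_t lock;        /* was: struct mutex lock; */
  };

The matching mutex_lock()/mutex_unlock() calls become
spin_lock()/spin_unlock(), which is a valid wait context under a
local_lock now that local_lock_t is a synonym of spinlock_t on RT.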
Cc: stable-rt@vger.kernel.org
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
The original patch, 602660600bcd ("fscache: initialize cookie hash
table raw spinlocks"), subtracted 1 from the shift and so still left
some spinlocks uninitialized. This fixes that.
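The corrected loop, sketched with the usual ARRAY_SIZE() form against
the hash table in fs/fscache/cookie.c:

  /* Initialize every bucket; the buggy loop stopped short. */
  for (i = 0; i < ARRAY_SIZE(fscache_cookie_hash); i++)
          INIT_HLIST_BL_HEAD(&fscache_cookie_hash[i]);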
[zanussi: Added changelog text]
Signed-off-by: Gregor Beck <gregor.beck@gmail.com>
Fixes: 602660600bcd ("fscache: initialize cookie hash table raw spinlocks")
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
[ Upstream commit b2ed0a4302faf2bb09e97529dd274233c082689b ]
There's no chance of sleeping here; the reader is giving up the
lock and possibly waking up the writer who is waiting on it.
Reported-by: Chunyu Hu <chuhu@redhat.com>
Signed-off-by: Andrew Halaney <ahalaney@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
[ Upstream commit 87bd0bf324f4c5468ea3d1de0482589f491f3145 ]
The location tracking cache has a size of a page and is resized if its
current size is too small.
This allocation happens with disabled interrupts, which can't be done
on PREEMPT_RT.
Should one page be too small, then we have to allocate more at the
beginning. The only downside is that fewer callers will be visible.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
The stable tree backported a patch which adds __down_read_interruptible()
to the generic rwsem implementation.
Add RT's version of __down_read_interruptible().
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
get_cpu_var() calls preempt_disable(), while on an RT kernel
pagevec_lru_move_fn() will take a spinlock and might schedule the
context out, hence the scheduling-while-atomic bug. The issue was found
on 5.4.70-rt40 and is reproducible on 5.4.74-rt41.
32154a0abcc ("mm: Revert the DEFINE_PER_CPU_PAGEVEC implementation")
reverted the lock/unlock_swap_pvec functions; however, the
deactivate_page() part was missed at that time, as it was newly added
in v5.4.
Link: https://lore.kernel.org/r/20201127135456.8145-1-zqiu2000@126.com
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Zanxiong Qiu <zqiu2000@126.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
This change is no longer needed since commit
26c7295be0 ("kthread: Do not preempt current task if it is going to call schedule()")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
PREEMPT_RT does not spin and wait until a running timer completes its
callback; instead it blocks on a sleeping lock to prevent a deadlock.
This blocking cannot be done for workqueue's TIMER_IRQSAFE timer, which
is canceled in an IRQ-off region. The cancel has to happen in an
IRQ-off region because changing the PENDING bit and clearing the timer
must not be interrupted, to avoid a busy-loop.
The callback invocation of a TIMER_IRQSAFE timer is not preempted on
PREEMPT_RT, so there is no need to synchronize on
timer_base::expiry_lock.
Don't acquire the timer_base::expiry_lock for TIMER_IRQSAFE flagged
timers.
Add a lockdep annotation to ensure that this function is always invoked
in preemptible context on PREEMPT_RT.
Reported-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The patch "ptrace: fix ptrace vs tasklist_lock race" changed
ptrace_freeze_traced() to take task->saved_state into account, but
ptrace_unfreeze_traced() has the same problem and needs a similar fix:
it should check/update both ->state and ->saved_state.
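A sketch of the unfreeze side with the saved_state check added
(locking abridged; the real fix also holds pi_lock while poking at the
state fields):

  static void ptrace_unfreeze_traced(struct task_struct *task)
  {
          /* ... under siglock and pi_lock ... */
          if (task->state == __TASK_TRACED)
                  task->state = TASK_TRACED;
          else if (task->saved_state == __TASK_TRACED)
                  task->saved_state = TASK_TRACED; /* RT parked the state here */
          /* ... */
  }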
Reported-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Fixes: "ptrace: fix ptrace vs tasklist_lock race"
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The callers expect disabled preemption/interrupts while invoking
__mod_memcg_lruvec_state(). This works in mainline because a lock of
some kind is acquired.
Use preempt_disable_rt() where per-CPU variables are accessed and a
stable pointer is expected. This is also done in __mod_zone_page_state()
for the same reason.
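A sketch of the guarded update; preempt_disable_rt() is the RT-patch
helper that is a no-op on !RT kernels (signature abridged):

  void __mod_memcg_lruvec_state(struct lruvec *lruvec,
                                enum node_stat_item idx, int val)
  {
          preempt_disable_rt();   /* stable per-CPU pointer on RT */
          /* ... update the per-CPU lruvec statistics ... */
          preempt_enable_rt();
  }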
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
In patch
("net/Qdisc: use a seqlock instead seqcount")
the seqcount has been replaced with a seqlock to allow the reader to
boost the preempted writer.
try_write_seqlock() acquired the lock with a try-lock but the
seqcount annotation was "lock".
Open-code write_seqcount_t_begin() and use the try-lock annotation for
lockdep.
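A sketch of the open-coded helper with the try-lock annotation:

  static inline bool try_write_seqlock(seqlock_t *sl)
  {
          if (spin_trylock(&sl->lock)) {
                  /* open-coded write_seqcount_t_begin() with the lockdep
                   * "trylock" annotation (third argument 1) */
                  seqcount_acquire(&sl->seqcount.dep_map, 0, 1, _RET_IP_);
                  raw_write_seqcount_begin(&sl->seqcount);
                  return true;
          }
          return false;
  }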
Reported-by: Mike Galbraith <efault@gmx.de>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The rwsem implementation on -RT allows multiple readers and there is
no owner tracking anymore.
We can provide down_read_non_owner() and up_read_non_owner() by skipping
the owner check bits which are only available in the !RT implementation.
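A sketch of the RT variants; the owner bookkeeping done by the !RT
versions is simply skipped:

  void down_read_non_owner(struct rw_semaphore *sem)
  {
          might_sleep();
          __down_read(sem);       /* no owner tracking on -RT */
  }
  EXPORT_SYMBOL(down_read_non_owner);

  void up_read_non_owner(struct rw_semaphore *sem)
  {
          __up_read(sem);
  }
  EXPORT_SYMBOL(up_read_non_owner);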
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Commit bf7afb29d5 ("phy: improve safety of fixed-phy MII register
reading") protected the fixed PHY status with a sequence counter.
Two years later, commit d2b977939b ("net: phy: fixed-phy: remove
fixed_phy_update_state()") removed the sequence counter's write side
critical section -- neutralizing its read side retry loop.
Remove the unused seqcount.
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from v5.8-rc1 commit 79cbb6bc33)
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
[ Upstream commit e6da0edc24 ]
There was a lockdep warning which led to commit
fad003b6c8 ("Bluetooth: Fix inconsistent lock state with RFCOMM")
Lockdep noticed that `sk->sk_lock.slock' was acquired without disabling
the softirq while the lock was also used in softirq context.
Unfortunately the solution back then was to disable interrupts before
acquiring the lock, which did make lockdep happy.
It would have been enough to simply disable the softirq. Disabling
interrupts before acquiring a spinlock_t is not allowed on PREEMPT_RT
because these locks are converted to 'sleeping' spinlocks.
Use spin_lock_bh() in order to acquire the `sk_lock.slock'.
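A sketch of the resulting locking in the RFCOMM state-change path:

  /* was: local_irq_save(flags); bh_lock_sock(sk); */
  spin_lock_bh(&sk->sk_lock.slock);
  /* ... update sk->sk_state, sk->sk_err, wake the socket ... */
  spin_unlock_bh(&sk->sk_lock.slock);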
Cc: stable-rt@vger.kernel.org
Reported-by: Luis Claudio R. Goncalves <lclaudio@uudg.org>
Reported-by: kbuild test robot <lkp@intel.com> [missing unlock]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
This is an all-on-one patch reverting the following commits:
workqueue: Don't assume that the callback has interrupts disabled
sched/swait: Add swait_event_lock_irq()
workqueue: Use swait for wq_manager_wait
workqueue: Convert the locks to raw type
and introducing the following commits from upstream:
workqueue: Use rcuwait for wq_manager_wait
workqueue: Convert the pool::lock and wq_mayday_lock to raw_spinlock_t
as a replacement.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The way user struct reference counting works changed significantly with commit
fda31c5029 ("signal: avoid double atomic counter increments for user accounting")
Now user structs are only freed once the last pending signal is
dequeued. Make sigqueue_free_current() follow this new convention to
avoid freeing the user struct multiple times and triggering this
warning:
refcount_t: underflow; use-after-free.
WARNING: CPU: 0 PID: 6794 at lib/refcount.c:288 refcount_dec_not_one+0x45/0x50
Call Trace:
refcount_dec_and_lock_irqsave+0x16/0x60
free_uid+0x31/0xa0
__dequeue_signal+0x17c/0x190
dequeue_signal+0x5a/0x1b0
do_sigtimedwait+0x208/0x250
__x64_sys_rt_sigtimedwait+0x6f/0xd0
do_syscall_64+0x72/0x200
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Reported-by: Daniel Wagner <wagi@monom.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
This is an incremental update of the zswap patch. Additional spots
which were lacking proper locking were identified during the rework of
the patch for upstream.
The complete patch description is available as commit
79410590ae87e ("mm/zswap: Use local lock to protect per-CPU data")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Since the printk rework, pr_cont("\n") will not lead to a line break.
A new line will only be created if
- cpu != c->cpu_owner || !(flags & LOG_CONT)
- c->len + len > sizeof(c->buf)
Flush the buffer to enforce a new line on pr_cont().
[bigeasy: reword commit message ]
Signed-off-by: 汪勇10269566 <wang.yong12@zte.com.cn>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
After commit f0b231101c94 ("mm/SLUB: delay giving back empty slubs to
IRQ enabled regions"), when free_slab() is invoked with IRQs disabled,
the empty slubs are moved to a per-CPU list and freed later, after IRQs
are enabled. But in the current code there is a check whether a CPU
slab really exists on a specific CPU before flushing the delayed empty
slubs; this may cause a reference to an already released kmem_cache in
a scenario like the one below:
      cpu 0                            cpu 1
  kmem_cache_destroy()
    flush_all()
                     --->IPI       flush_cpu_slab()
                                     flush_slab()
                                       deactivate_slab()
                                         discard_slab()
                                           free_slab()
                                       c->page = NULL;
      for_each_online_cpu(cpu)
        if (!has_cpu_slab(1, s))
          continue
        this skips flushing the delayed
        empty slub released by cpu 1
    kmem_cache_free(kmem_cache, s)
                                   kmalloc()
                                     __slab_alloc()
                                       free_delayed()
                                         __free_slab()
                                           reference to released kmem_cache
Fixes: f0b231101c94 ("mm/SLUB: delay giving back empty slubs to IRQ enabled regions")
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The ACPI code allocates a larger amount of memory during resume. This
triggers a warning because the allocation happens with disabled
interrupts.
At this stage only one CPU is active so there should be no lock
contention. If SLUB needs to call into the buddy allocator for more
memory then it should not enable interrupts.
Limit the check to system states where more CPUs are active and
scheduling has started, and only enable interrupts in SLUB at this
stage.
Signed-off-by: Liwei Song <liwei.song@windriver.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
[bigeasy: commit description, allocate_slab() hunk]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Include the swait.h header so it compiles even if not all patches are
applied.
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Even though the printk kthread is always preemptible, it is still not
allowed to call cond_resched() from within console drivers. The
task may become non-preemptible in the console driver call chain. For
example, vt_console_print() takes a spinlock and then can call into
fbcon_redraw(), which can conditionally invoke cond_resched():
|BUG: sleeping function called from invalid context at kernel/printk/printk.c:2322
|in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 177, name: printk
|CPU: 0 PID: 177 Comm: printk Not tainted 5.6.2-00011-ga536059557f1d9 #1
|Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
|Call Trace:
| dump_stack+0x66/0x8b
| ___might_sleep+0x102/0x120
| console_conditional_schedule+0x24/0x30
| fbcon_redraw+0x96/0x1c0
| fbcon_scroll+0x556/0xd70
| con_scroll+0x147/0x1e0
| lf+0x9e/0xb0
| vt_console_print+0x253/0x3d0
| printk_kthread_func+0x1d5/0x3b0
Disable cond_resched() for the call into the console drivers.
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Add a /sys/kernel entry to indicate that the kernel is a
realtime kernel.
Clark says that he needs this for udev rules: udev needs to evaluate
whether it's a PREEMPT_RT kernel a few thousand times, and parsing
uname output is too slow or so.
Are there better solutions? Should it exist and return 0 on !-rt?
Signed-off-by: Clark Williams <williams@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
ioread8() operations to TPM MMIO addresses can stall the CPU when
immediately following a sequence of iowrite*()'s to the same region.
For example, cyclictest measures ~400us latency spikes when a non-RT
usermode application communicates with an SPI-based TPM chip (Intel Atom
E3940 system, PREEMPT_RT kernel). The spikes are caused by a
stalling ioread8() operation following a sequence of 30+ iowrite8()s to
the same address. I believe this happens because the write sequence is
buffered (in cpu or somewhere along the bus), and gets flushed on the
first LOAD instruction (ioread*()) that follows.
The enclosed change appears to fix this issue: read the TPM chip's
access register (status code) after every iowrite*() operation to
amortize the cost of flushing data to chip across multiple instructions.
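A sketch of the amortizing helpers modeled on that description
(TPM_ACCESS(0) being the locality-0 access register):

  static inline void tpm_tis_flush(void __iomem *iobase)
  {
          /* A cheap read forces the buffered writes out to the chip. */
          ioread8(iobase + TPM_ACCESS(0));
  }

  static inline void tpm_tis_iowrite8(u8 b, void __iomem *iobase, u32 addr)
  {
          iowrite8(b, iobase + addr);
          tpm_tis_flush(iobase);
  }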
Signed-off-by: Haris Okanovic <haris.okanovic@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Currently, the squashfs multi_cpu decompressor makes use of
get_cpu_ptr()/put_cpu_ptr(), which unconditionally disable preemption
during decompression.
Because the workload is distributed across CPUs, all CPUs can observe a
very high wakeup latency, which has been seen to be as much as 8000us.
Convert this decompressor to make use of a local lock, which will allow
execution of the decompressor with preemption enabled, but also ensure
concurrent accesses to the percpu compressor data on the local CPU will
be serialized.
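A sketch using the 5.4-rt local-lock primitives, assuming the
get_locked_ptr()/put_locked_ptr() helpers from locallock.h:

  static DEFINE_LOCAL_IRQ_LOCK(stream_lock);

  /* Preemption stays enabled; access to the local CPU's stream is
   * serialized by the local lock. */
  stream = get_locked_ptr(stream_lock, msblk->stream);
  res = msblk->decompressor->decompress(msblk, stream->stream,
                                        bh, b, offset, length, output);
  put_locked_ptr(stream_lock, stream);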
Cc: stable-rt@vger.kernel.org
Reported-by: Alexander Stein <alexander.stein@systec-electronic.com>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Julia Cartwright <julia@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
In v4.7, the driver switched to percpu compression streams, disabling
preemption via get/put_cpu_ptr(). Use a per-zcomp_strm lock here. We
also have to fix a lock order issue in zram_decompress_page() such
that zs_map_object() nests inside of zcomp_stream_put() as it does in
zram_bvec_write().
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
[bigeasy: get_locked_var() -> per zcomp_strm lock]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
They're nondeterministic, and lead to ___might_sleep() splats in -rt.
OTOH, they're a lot less wasteful than an rtmutex per page.
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
This is invoked from the secondary CPU in atomic context. On x86 we use
tsc instead. On Power we XOR it against mftb(), so let's use the stack
address as the initial value.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
While converting the openpic emulation code to use a raw_spinlock_t enables
guests to run on RT, there's still a performance issue. For interrupts sent in
directed delivery mode with a multiple-CPU mask, the emulated openpic
will loop through all of the VCPUs, and for each VCPU it calls
IRQ_check, which will loop through all the pending interrupts for that
VCPU. This is done while holding the raw_lock, meaning that in all this
time interrupts and preemption are disabled on the host Linux. A
malicious user app can max out both these numbers and cause a DoS.
This temporary fix is sent for two reasons. The first is so that users
who want to use the in-kernel MPIC emulation are aware of the potential
latencies, thus making sure that the hardware MPIC and their usage
scenario do not involve interrupts sent in directed delivery mode, and
that the number of possible pending interrupts is kept small. Secondly,
this should incentivize the development of a
proper openpic emulation that would be better suited for RT.
Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Bogdan Purcareata <bogdan.purcareata@freescale.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>