Commit Graph

892207 Commits

Author SHA1 Message Date
Thomas Gleixner 3d0991b2f3 mm: Enable SLUB for RT
Avoid the memory allocation in IRQ section

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy: factor out everything except the kcalloc() workaround]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:22 +03:00
Ingo Molnar 8dd0922414 mm/vmstat: Protect per cpu variables with preempt disable on RT
Disable preemption on -RT for the vmstat code. On vanilla the code runs in
IRQ-off regions while on -RT it does not. "preempt_disable" ensures that the
same resources are not updated in parallel due to preemption.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2023-03-25 04:21:22 +03:00
Thomas Gleixner 179fc64984 preempt: Provide preempt_*_(no)rt variants
RT needs a few preempt_disable/enable points which are not necessary
otherwise. Implement variants to avoid #ifdeffery.
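
A minimal sketch of what such variants can look like (illustrative only, not
the exact patch):

  #ifdef CONFIG_PREEMPT_RT
  # define preempt_disable_rt()		preempt_disable()
  # define preempt_enable_rt()		preempt_enable()
  # define preempt_disable_nort()	barrier()
  # define preempt_enable_nort()		barrier()
  #else
  # define preempt_disable_rt()		barrier()
  # define preempt_enable_rt()		barrier()
  # define preempt_disable_nort()	preempt_disable()
  # define preempt_enable_nort()		preempt_enable()
  #endif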

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2023-03-25 04:21:22 +03:00
Sebastian Andrzej Siewior cc47dc4fc7 mm/page_alloc: Use migrate_disable() in drain_local_pages_wq()
The drain_local_pages_wq() uses preempt_disable() to ensure that there
will be no CPU migration while drain_local_pages() is invoked which
might happen if the CPU is going down.
drain_local_pages() acquires a sleeping lock on RT which can not be
acquired with disabled preemption.

Use migrate_disable() instead of preempt_disable(): On RT it ensures
that the CPU won't go down and on !RT it is replaced with
preempt_disable().
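
The resulting shape is roughly the following sketch (the real function takes
additional arguments and handles more details):

  static void drain_local_pages_wq(struct work_struct *work)
  {
  	/*
  	 * Pin the task to this CPU without disabling preemption:
  	 * migrate_disable() still allows the sleeping locks taken by
  	 * drain_local_pages() on RT and maps to preempt_disable() on !RT.
  	 */
  	migrate_disable();
  	drain_local_pages(NULL);
  	migrate_enable();
  }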

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:22 +03:00
Luiz Capitulino 17b38f7b98 mm: perform lru_add_drain_all() remotely
lru_add_drain_all() works by scheduling lru_add_drain_cpu() to run
on all CPUs that have non-empty LRU pagevecs and then waiting for
the scheduled work to complete. However, workqueue threads may never
have the chance to run on a CPU that's running a SCHED_FIFO task.
This causes lru_add_drain_all() to block forever.

This commit solves this problem by changing lru_add_drain_all()
to drain the LRU pagevecs of remote CPUs. This is done by grabbing
swapvec_lock and calling lru_add_drain_cpu().

PS: This is based on an idea and initial implementation by
    Rik van Riel.
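
A rough sketch of the remote drain loop, omitting CPU hotplug protection and
the non-empty-pagevec checks; local_lock_on()/local_unlock_on() stand in for
whatever per-CPU lock helpers back swapvec_lock and are assumptions here:

  void lru_add_drain_all(void)
  {
  	int cpu;

  	for_each_online_cpu(cpu) {
  		/* take the remote CPU's pagevec lock instead of
  		 * scheduling work on that CPU */
  		local_lock_on(swapvec_lock, cpu);
  		lru_add_drain_cpu(cpu);
  		local_unlock_on(swapvec_lock, cpu);
  	}
  }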

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:22 +03:00
Ingo Molnar 3dae63e7b1 mm/swap: Convert to percpu locked
Replace global locks (get_cpu + local_irq_save) with "local_locks()".
Currently there is one for "rotate" and one for "swap".

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2023-03-25 04:21:22 +03:00
Ingo Molnar 4901761f49 mm: page_alloc: rt-friendly per-cpu pages
rt-friendly per-cpu pages: convert the irqs-off per-cpu locking
method into a preemptible, explicit-per-cpu-locks method.

Contains fixes from:
	 Peter Zijlstra <a.p.zijlstra@chello.nl>
	 Thomas Gleixner <tglx@linutronix.de>

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2023-03-25 04:21:22 +03:00
Thomas Gleixner 1bdce1f597 mm/SLUB: delay giving back empty slubs to IRQ enabled regions
__free_slab() is invoked with disabled interrupts which increases the
irq-off time while __free_pages() is doing the work.
Allow __free_slab() to be invoked with enabled interrupts and move
everything from interrupts-off invocations to a temporary per-CPU list
so it can be processed later.
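
A sketch of the deferral, assuming a per-CPU list structure along these lines:

  struct slub_free_list {
  	raw_spinlock_t		lock;
  	struct list_head	list;
  };
  static DEFINE_PER_CPU(struct slub_free_list, slub_free_list);

  /* called instead of __free_slab() while interrupts are off */
  static void defer_free_slab(struct page *page)
  {
  	struct slub_free_list *f = this_cpu_ptr(&slub_free_list);
  	unsigned long flags;

  	raw_spin_lock_irqsave(&f->lock, flags);
  	list_add(&page->lru, &f->list);
  	raw_spin_unlock_irqrestore(&f->lock, flags);
  	/* the list is drained later, with interrupts enabled */
  }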

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:22 +03:00
Thomas Gleixner 4dd9f4c8bc mm/SLxB: change list_lock to raw_spinlock_t
The list_lock is used with IRQs off on RT. Make it a raw_spinlock_t
otherwise the interrupts won't be disabled on -RT.  The locking rules remain
the same on !RT.
This patch changes it for SLAB and SLUB since both share the same header
file for the struct kmem_cache_node definition.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:22 +03:00
Peter Zijlstra a826a9cad9 Split IRQ-off and zone->lock while freeing pages from PCP list #2
Split the IRQ-off section for accessing the PCP list from the zone->lock
section while freeing pages.
Introduce isolate_pcp_pages(), which separates the pages from the PCP
list onto a temporary list and then frees the temporary list via
free_pcppages_bulk().
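
The split looks roughly like this sketch (signatures approximate):

  static void drain_pcp(struct zone *zone, struct per_cpu_pages *pcp)
  {
  	LIST_HEAD(dst);
  	unsigned long flags;
  	int count;

  	local_irq_save(flags);			/* protects the PCP list only */
  	count = pcp->count;
  	isolate_pcp_pages(count, pcp, &dst);	/* detach onto a private list */
  	local_irq_restore(flags);

  	/* zone->lock is taken in here, with interrupts enabled again */
  	free_pcppages_bulk(zone, count, &dst);
  }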

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:21 +03:00
Peter Zijlstra ea19afca31 Split IRQ-off and zone->lock while freeing pages from PCP list #1
Split the IRQ-off section for accessing the PCP list from the zone->lock
section while freeing pages.
Introduce isolate_pcp_pages(), which separates the pages from the PCP
list onto a temporary list and then frees the temporary list via
free_pcppages_bulk().

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:21 +03:00
Oleg Nesterov 896cf0cb5d signal/x86: Delay calling signals in atomic
On x86_64 we must disable preemption before we enable interrupts
for stack faults, int3 and debugging, because the current task is using
a per CPU debug stack defined by the IST. If we schedule out, another task
can come in and use the same stack and cause the stack to be corrupted
and crash the kernel on return.

When CONFIG_PREEMPT_RT is enabled, spin_locks become mutexes, and
one of these is the spin lock used in signal handling.

Some of the debug code (int3) causes do_trap() to send a signal.
This function takes a spin lock that has been converted to a mutex
and may sleep. If this happens, the above issue with the corrupted
stack becomes possible.

Instead of sending the signal right away, for PREEMPT_RT and x86_64,
the signal information is stored in the task's task_struct and
TIF_NOTIFY_RESUME is set. Then on exit of the trap, the signal resume
code will send the signal when preemption is enabled.
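
Sketched, the sending side of the deferral looks like this (the forced_info
field is added by the patch and shown here as an assumption):

  static int defer_signal_if_atomic(struct kernel_siginfo *info)
  {
  	if (!in_atomic())
  		return 0;		/* deliver normally */

  	/* The sighand lock is a sleeping lock on RT, so stash the
  	 * siginfo and let the notify-resume path deliver it later. */
  	current->forced_info = *info;
  	set_tsk_thread_flag(current, TIF_NOTIFY_RESUME);
  	return 1;
  }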

[ rostedt: Switched from #ifdef CONFIG_PREEMPT_RT to
  ARCH_RT_DELAYS_SIGNAL_SEND and added comments to the code. ]

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy: also needed on 32bit as per Yang Shi <yang.shi@linaro.org>]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:21 +03:00
Sebastian Andrzej Siewior 647eaa4930 softirq: Add preemptible softirq
Add preemptible softirq for RT's needs. By removing the softirq count
from the preempt counter, the softirq becomes preemptible. A per-CPU
lock ensures that there is no parallel softirq processing and that
per-CPU variables are not accessed in parallel by multiple threads.

local_bh_enable() will process all softirq work that has been raised in
its BH-disabled section once the BH counter gets to 0.
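
A very rough sketch of the per-CPU serialization (initialisation omitted;
the per-task softirq_disable_cnt counter is an assumption here):

  static DEFINE_PER_CPU(spinlock_t, softirq_lock);

  void rt_local_bh_disable(void)
  {
  	/* Only the 0 -> 1 transition takes the (sleeping) per-CPU lock;
  	 * nested sections just bump the counter. */
  	if (current->softirq_disable_cnt++ == 0) {
  		migrate_disable();
  		spin_lock(this_cpu_ptr(&softirq_lock));
  	}
  }

  void rt_local_bh_enable(void)
  {
  	if (--current->softirq_disable_cnt == 0) {
  		do_softirq();	/* run softirqs raised inside the section */
  		spin_unlock(this_cpu_ptr(&softirq_lock));
  		migrate_enable();
  	}
  }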

[+ rcu_read_lock() as part of local_bh_disable() by Scott Wood]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:21 +03:00
Sebastian Andrzej Siewior 8cf972c7b3 locallock: Include header for the `current' macro
Include the header for `current' macro so that
CONFIG_KERNEL_HEADER_TEST=y passes.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:21 +03:00
Thomas Gleixner 5b3302566a rt: Add local irq locks
Introduce locallock. For !RT this maps to preempt_disable()/
local_irq_disable() so there is not much that changes. For RT this will
map to a spinlock. This makes preemption possible and the locked "resource"
gets the lockdep annotation it wouldn't have otherwise. The locks are
recursive for owner == current. Also, all lock users call migrate_disable()
which ensures that the task is not migrated to another CPU while the lock
is held and the owner is preempted.
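
Using the later mainline local_lock API as an approximation, the usage
pattern looks like this sketch:

  #include <linux/local_lock.h>

  static DEFINE_PER_CPU(local_lock_t, res_lock) = INIT_LOCAL_LOCK(res_lock);
  static DEFINE_PER_CPU(int, res);

  static void touch_res(void)
  {
  	/* !RT: preempt_disable(); RT: per-CPU spinlock + migrate_disable() */
  	local_lock(&res_lock);
  	this_cpu_inc(res);
  	local_unlock(&res_lock);
  }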

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2023-03-25 04:21:21 +03:00
Sebastian Andrzej Siewior 266005de95 x86: Disable HAVE_ARCH_JUMP_LABEL
__text_poke() does:
|        local_irq_save(flags);
…
|        ptep = get_locked_pte(poking_mm, poking_addr, &ptl);

which does not work on -RT because the PTE-lock is a spinlock_t typed lock.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:21 +03:00
Sebastian Andrzej Siewior d03792e5cb efi: Allow efi=runtime
In case the command line option "efi=noruntime" is the default at build time,
the user can override it with `efi=runtime' and enable the runtime services again.

Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:21 +03:00
Sebastian Andrzej Siewior 2a59f5dff1 efi: Disable runtime services on RT
Based on measurements the EFI functions get_variable /
get_next_variable take up to 2us, which looks okay.
The functions get_time, set_time take around 10ms. Those 10ms are too
much. Even one ms would be too much.
Ard mentioned that SetVariable might even trigger larger latencies if
the firmware erases flash blocks on NOR.

The time functions are used by efi-rtc and can be triggered at
runtime (either via explicit read/write or NTP sync).

The variable write could be used by pstore.
These functions can be disabled without much of a loss. The poweroff /
reboot hooks may be provided by PSCI.

Disable EFI's runtime wrappers.

This was observed on "EFI v2.60 by SoftIron Overdrive 1000".

Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:21 +03:00
Sebastian Andrzej Siewior cb5b81f8f0 md: disable bcache
It uses anon semaphores
|drivers/md/bcache/request.c: In function ‘cached_dev_write_complete’:
|drivers/md/bcache/request.c:1007:2: error: implicit declaration of function ‘up_read_non_owner’ [-Werror=implicit-function-declaration]
|  up_read_non_owner(&dc->writeback_lock);
|  ^
|drivers/md/bcache/request.c: In function ‘request_write’:
|drivers/md/bcache/request.c:1033:2: error: implicit declaration of function ‘down_read_non_owner’ [-Werror=implicit-function-declaration]
|  down_read_non_owner(&dc->writeback_lock);
|  ^

either we get rid of those or we have to introduce them…

Link: http://lkml.kernel.org/r/20130820111602.3cea203c@gandalf.local.home
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:21 +03:00
Sebastian Andrzej Siewior 2c6cd632e3 net/core: disable NET_RX_BUSY_POLL on RT
napi_busy_loop() disables preemption and performs a NAPI poll. We can't acquire
sleeping locks with disabled preemption so we would have to work around this
and add explicit locking for synchronisation against ksoftirqd.
Without explicit synchronisation a low priority process would "own" the NAPI
state (by setting NAPIF_STATE_SCHED) and could be scheduled out (no
preempt_disable() and BH is preemptible on RT).
In case a network packet arrives, the interrupt handler would set
NAPIF_STATE_MISSED and the system would wait until the task owning the NAPI
state is scheduled in again.
Should a task with RT priority busy poll, it would consume the CPU instead of
allowing tasks with lower priority to run.

The NET_RX_BUSY_POLL is disabled by default (the system wide sysctls for
poll/read are set to zero) so disable NET_RX_BUSY_POLL on RT to avoid wrong
locking context on RT. Should this feature be considered useful on RT systems
then it could be enabled again with proper locking and synchronisation.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:21 +03:00
Thomas Gleixner 915de11543 sched: Disable CONFIG_RT_GROUP_SCHED on RT
Carsten reported problems when running:

  taskset 01 chrt -f 1 sleep 1

from within rc.local on a F15 machine. The task stays running and
never gets on the run queue because some of the run queues have
rt_throttled=1 which does not go away. Works fine from an ssh login
shell. Disabling CONFIG_RT_GROUP_SCHED solves that as well.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2023-03-25 04:21:21 +03:00
Sebastian Andrzej Siewior 94462480e5 rcu: make RCU_BOOST default on RT
Since it is no longer invoked from the softirq, people run into OOM more
often if the priority of the RCU thread is too low. Making boosting the
default on RT should help in those cases and it can be switched off if
someone knows better.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:21 +03:00
Ingo Molnar 0923af59b9 mm: Allow only SLUB on RT
Memory allocation disables interrupts as part of the allocation and freeing
process. For -RT it is important that this section remains short and does not
depend on the size of the request or the internal state of the memory allocator.
At the beginning the SLAB memory allocator was adopted for RT's needs and it
required substantial changes. Later, with the addition of the SLUB memory
allocator we adopted this one as well and the changes were smaller. More
importantly, due to the design of the SLUB allocator, it performs better and its
worst-case latency is smaller. In the end only SLUB remained supported.

Disable SLAB and SLOB on -RT. Only SLUB is adapted to -RT needs.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:20 +03:00
Thomas Gleixner e3bbca899d kconfig: Disable config options which are not RT compatible
Disable stuff which is known to have issues on RT

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2023-03-25 04:21:20 +03:00
Sebastian Andrzej Siewior 2ddbdd8512 fs/dcache: use swait_queue instead of waitqueue
__d_lookup_done() invokes wake_up_all() while holding a hlist_bl_lock()
which disables preemption. As a workaround convert it to swait.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:20 +03:00
Sebastian Andrzej Siewior e852fda5d8 fs/dcache: bring back explicit INIT_HLIST_BL_HEAD init
Commit 3d375d7859 ("mm: update callers to use HASH_ZERO flag") removed
INIT_HLIST_BL_HEAD and uses the ZERO flag instead for the init. However
on RT we also have a spinlock which needs an init call, so we can't use
that.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:20 +03:00
Clark Williams b0408fdcf9 fscache: initialize cookie hash table raw spinlocks
The fscache cookie mechanism uses a hash table of hlist_bl_head structures. The
PREEMPT_RT patchset adds a raw spinlock to this structure and so on PREEMPT_RT
the structures get used uninitialized, causing warnings about bad magic numbers
when spinlock debugging is turned on.

Use the init function for fscache cookies.

Signed-off-by: Clark Williams <williams@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:20 +03:00
Paul Gortmaker 902a0c0b45 list_bl: Make list head locking RT safe
As per changes in include/linux/jbd_common.h for avoiding the
bit_spin_locks on RT ("fs: jbd/jbd2: Make state lock and journal
head lock rt safe") we do the same thing here.

We use the non atomic __set_bit and __clear_bit inside the scope of
the lock to preserve the ability of the existing LIST_DEBUG code to
use the zero'th bit in the sanity checks.
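
Roughly, the RT variant pairs a raw spinlock with the list head and keeps the
debug bit inside the locked region. A sketch of the idea (the final patch
reportedly uses #define wrappers, see the note further below):

  struct hlist_bl_head {
  	struct hlist_bl_node *first;
  #ifdef CONFIG_PREEMPT_RT
  	raw_spinlock_t lock;
  #endif
  };

  static inline void hlist_bl_lock(struct hlist_bl_head *b)
  {
  #ifdef CONFIG_PREEMPT_RT
  	raw_spin_lock(&b->lock);
  	/* keep bit 0 usable for the LIST_DEBUG sanity checks */
  	__set_bit(0, (unsigned long *)b);
  #else
  	bit_spin_lock(0, (unsigned long *)b);
  #endif
  }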

As a bit spinlock, we had no lockdep visibility into the usage
of the list head locking.  Now, if we were to implement it as a
standard non-raw spinlock, we would see:

BUG: sleeping function called from invalid context at kernel/rtmutex.c:658
in_atomic(): 1, irqs_disabled(): 0, pid: 122, name: udevd
5 locks held by udevd/122:
 #0:  (&sb->s_type->i_mutex_key#7/1){+.+.+.}, at: [<ffffffff811967e8>] lock_rename+0xe8/0xf0
 #1:  (rename_lock){+.+...}, at: [<ffffffff811a277c>] d_move+0x2c/0x60
 #2:  (&dentry->d_lock){+.+...}, at: [<ffffffff811a0763>] dentry_lock_for_move+0xf3/0x130
 #3:  (&dentry->d_lock/2){+.+...}, at: [<ffffffff811a0734>] dentry_lock_for_move+0xc4/0x130
 #4:  (&dentry->d_lock/3){+.+...}, at: [<ffffffff811a0747>] dentry_lock_for_move+0xd7/0x130
Pid: 122, comm: udevd Not tainted 3.4.47-rt62 #7
Call Trace:
 [<ffffffff810b9624>] __might_sleep+0x134/0x1f0
 [<ffffffff817a24d4>] rt_spin_lock+0x24/0x60
 [<ffffffff811a0c4c>] __d_shrink+0x5c/0xa0
 [<ffffffff811a1b2d>] __d_drop+0x1d/0x40
 [<ffffffff811a24be>] __d_move+0x8e/0x320
 [<ffffffff811a278e>] d_move+0x3e/0x60
 [<ffffffff81199598>] vfs_rename+0x198/0x4c0
 [<ffffffff8119b093>] sys_renameat+0x213/0x240
 [<ffffffff817a2de5>] ? _raw_spin_unlock+0x35/0x60
 [<ffffffff8107781c>] ? do_page_fault+0x1ec/0x4b0
 [<ffffffff817a32ca>] ? retint_swapgs+0xe/0x13
 [<ffffffff813eb0e6>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff8119b0db>] sys_rename+0x1b/0x20
 [<ffffffff817a3b96>] system_call_fastpath+0x1a/0x1f

Since we are only taking the lock during short lived list operations,
let's assume for now that it being raw won't be a significant latency
concern.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
[julia@ni.com: Use #define instead of static inline to avoid false positive from
               lockdep]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:20 +03:00
Sebastian Andrzej Siewior ae0bba2146 fs/dcache: disable preemption on i_dir_seq's write side
i_dir_seq is an open-coded seqcount. Based on the code it looks like we
could have two writers in parallel despite the fact that the d_lock is
held. The problem is that during the write process on RT the preemption
is still enabled and if this process is interrupted by a reader with RT
priority then we lock up.
To avoid that lockup I am disabling preemption during the update.
The rename of i_dir_seq ensures that new write sides are caught in the
future.
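
Illustrative only (the real writer side is cmpxchg() based and the renamed
field name is an assumption), the shape of the change is:

  static unsigned start_dir_add(struct inode *dir)
  {
  	preempt_disable_rt();		/* no-op on !RT */
  	return dir->__i_dir_seq++;	/* odd value: update in progress */
  }

  static void end_dir_add(struct inode *dir, unsigned n)
  {
  	dir->__i_dir_seq = n + 2;	/* even again: update done */
  	preempt_enable_rt();
  }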

Cc: stable-rt@vger.kernel.org
Reported-by: Oleg.Karfich@wago.com
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:20 +03:00
Sebastian Andrzej Siewior 1bc5345a56 fs/nfs: turn rmdir_sem into a semaphore
The RW semaphore had a reader side which used the _non_owner version
because it most likely took the reader lock in one thread and released it
in another which would cause lockdep to complain if the "regular"
version was used.
On -RT we need the owner because the rw lock is turned into a rtmutex.
The semaphores, on the other hand, are "plain simple" and should work as
expected. We can't have multiple readers but on -RT we don't allow
multiple readers anyway so that is not a loss.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:20 +03:00
Sebastian Andrzej Siewior cac578e31b userfaultfd: Use a seqlock instead of seqcount
On RT write_seqcount_begin() disables preemption, which leads to a warning
in add_wait_queue() while the spinlock_t is acquired.
The waitqueue can't be converted to swait_queue because
userfaultfd_wake_function() is used as a custom wake function.

Use a seqlock instead of a seqcount to avoid the preempt_disable() section
during add_wait_queue().

Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:20 +03:00
Sebastian Andrzej Siewior c587de4dc8 net/Qdisc: use a seqlock instead of a seqcount
The seqcount disables preemption on -RT while it is held, which we can't
remove. Also we don't want the reader to spin for ages if the writer is
scheduled out. The seqlock on the other hand will serialize / sleep on
the lock while the writer is active.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:20 +03:00
Sebastian Andrzej Siewior 6e91cb9b19 NFSv4: replace seqcount_t with a seqlock_t
The raw_write_seqcount_begin() in nfs4_reclaim_open_state() causes a
preempt_disable() on -RT. The spin_lock()/spin_unlock() in that section does
not work.
The lockdep part was removed in commit
  abbec2da13 ("NFS: Use raw_write_seqcount_begin/end int nfs4_reclaim_open_state")
because lockdep complained.
The whole seqcount thing was introduced in commit
  c137afabe3 ("NFSv4: Allow the state manager to mark an open_owner as being recovered").
The recovery thread runs only once.
write_seqlock() does not work on !RT because it disables preemption and the
writer side is preemptible (it has to remain so despite the fact that it will
block readers).

Reported-by: kernel test robot <xiaolong.ye@intel.com>
Link: https://lkml.kernel.org/r/20161021164727.24485-1-bigeasy@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:20 +03:00
Thomas Gleixner a6ffaaf304 seqlock: Prevent rt starvation
If a low prio writer gets preempted while holding the seqlock write
locked, a high prio reader spins forever on RT.

To prevent this let the reader grab the spinlock, so it blocks and
eventually boosts the writer. This way the writer can proceed and
endless spinning is prevented.

For seqcount writers we disable preemption over the update code
path. Thanks to Al Viro for disentangling some VFS code to make that
possible.

Nicholas Mc Guire:
- spin_lock+unlock => spin_unlock_wait
- __write_seqcount_begin => __raw_write_seqcount_begin
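
A sketch of the RT reader path, assuming the pre-5.10 seqlock_t layout and
omitting memory barriers; as noted above, the later revision uses
spin_unlock_wait() rather than a lock/unlock pair:

  static inline unsigned read_seqbegin(seqlock_t *sl)
  {
  	unsigned ret;

  repeat:
  	ret = READ_ONCE(sl->seqcount.sequence);
  	if (unlikely(ret & 1)) {
  		/* writer active: block on its lock so PI boosts it */
  		spin_lock(&sl->lock);
  		spin_unlock(&sl->lock);
  		goto repeat;
  	}
  	return ret;
  }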

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2023-03-25 04:21:20 +03:00
Sebastian Andrzej Siewior e96faa4aee dma-buf: Use seqlock_t instead of disabling preemption
"dma reservation" disables preemption while acquiring the write access
for "seqcount".

Replace the seqcount with a seqlock_t which provides seqcount-like
semantics and a lock for the writer.

Link: https://lkml.kernel.org/r/f410b429-db86-f81c-7c67-f563fa808b62@free.fr
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:20 +03:00
Thomas Gleixner 88199aa670 signal: Revert ptrace preempt magic
Upstream commit '53da1d9456fe7f8 fix ptrace slowness' is nothing more
than a bandaid around the ptrace design trainwreck. It's not a
correctness issue, it's merely a cosmetic bandaid.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2023-03-25 04:21:19 +03:00
Thomas Gleixner 40c5c3896a timekeeping: Split jiffies seqlock
Replace the jiffies_lock seqlock with a simple seqcount and a raw spinlock so
it can be taken in atomic context on RT.
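
The split, sketched with the resulting pair of primitives (mirroring what the
code ends up looking like; details approximate):

  static seqcount_t jiffies_seq = SEQCNT_ZERO(jiffies_seq);
  static DEFINE_RAW_SPINLOCK(jiffies_lock);

  static void do_update_jiffies64(u64 delta)
  {
  	raw_spin_lock(&jiffies_lock);		/* writer lock, atomic-safe */
  	write_seqcount_begin(&jiffies_seq);
  	jiffies_64 += delta;
  	write_seqcount_end(&jiffies_seq);
  	raw_spin_unlock(&jiffies_lock);
  }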

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2023-03-25 04:21:19 +03:00
Sebastian Andrzej Siewior e6598a0bd6 mm: Warn on memory allocation in non-preemptible context on RT
The memory allocation via kmalloc(, GFP_ATOMIC) in atomic context
(disabled preemption or interrupts) is not allowed on RT because the
buddy allocator is using sleeping locks which can't be acquired in this
context.
Such an allocation may not trigger a warning in the buddy allocator
if it is always satisfied by the SLUB allocator.

Add a warning on RT if a memory allocation is attempted in a
non-preemptible region.
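
The check itself can be as small as this sketch (placed in the allocator
paths by the actual patch):

  static inline void rt_alloc_ctx_check(void)
  {
  	/* On RT the allocator slow paths take sleeping locks, so any
  	 * allocation from a non-preemptible region is a bug. */
  	if (IS_ENABLED(CONFIG_PREEMPT_RT))
  		WARN_ON_ONCE(!preemptible());
  }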

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
2023-03-25 04:21:19 +03:00
Rob Herring 5150ed530f of: Rework and simplify phandle cache to use a fixed size
The phandle cache was added to speed up of_find_node_by_phandle() by
avoiding walking the whole DT to find a matching phandle. The
implementation has several shortcomings:

  - The cache is designed to work on a linear set of phandle values.
    This is true for dtc generated DTs, but not for other cases such as
    Power.
  - The cache isn't enabled until of_core_init() and a typical system
    may see hundreds of calls to of_find_node_by_phandle() before that
    point.
  - The cache is freed and re-allocated when the number of phandles
    changes.
  - It takes a raw spinlock around a memory allocation which breaks on
    RT.

Change the implementation to a fixed size and use hash_32() as the
cache index. This greatly simplifies the implementation. It avoids
the need for any re-alloc of the cache and taking a reference on nodes
in the cache. We only have a single source of removing cache entries
which is of_detach_node().
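
A sketch of the fixed-size cache; of_find_node_by_phandle_slow() is a
hypothetical stand-in for the full-tree walk and locking is omitted:

  #define OF_PHANDLE_CACHE_BITS	7
  #define OF_PHANDLE_CACHE_SZ	(1 << OF_PHANDLE_CACHE_BITS)	/* 128 entries */

  static struct device_node *phandle_cache[OF_PHANDLE_CACHE_SZ];

  struct device_node *of_find_node_by_phandle(phandle handle)
  {
  	u32 idx = hash_32(handle, OF_PHANDLE_CACHE_BITS);
  	struct device_node *np = phandle_cache[idx];

  	if (np && np->phandle == handle)
  		return np;				/* cache hit */

  	np = of_find_node_by_phandle_slow(handle);	/* walk the tree */
  	if (np)
  		phandle_cache[idx] = np;	/* no reference taken, see above */
  	return np;
  }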

Using hash_32() removes any assumption on phandle values improving
the hit rate for non-linear phandle values. The effect on linear values
using hash_32() is about a 10% collision rate. The chances of thrashing on
colliding values seems to be low.

To compare performance, I used a RK3399 board which is a pretty typical
system. I found that just measuring boot time as done previously is
noisy and may be impacted by other things. Also bringing up secondary
cores causes some issues with measuring, so I booted with 'nr_cpus=1'.
With no caching, calls to of_find_node_by_phandle() take about 20124 us
for 1248 calls. There's an additional 288 calls before time keeping is
up. Using the average time per hit/miss with the cache, we can calculate
these calls to take 690 us (277 hit / 11 miss) with a 128 entry cache
and 13319 us with no cache or an uninitialized cache.

Comparing the 3 implementations the time spent in
of_find_node_by_phandle() is:

no cache:        20124 us (+ 13319 us)
128 entry cache:  5134 us (+ 690 us)
current cache:     819 us (+ 13319 us)

We could move the allocation of the cache earlier to improve the
current cache, but that just further complicates the situation as it
needs to be after slab is up, so we can't do it when unflattening (which
uses memblock).

Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Cc: Frank Rowand <frowand.list@gmail.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lkml.kernel.org/r/20191211232345.24810-1-robh@kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:19 +03:00
Sebastian Andrzej Siewior a66f88a0d5 tpm: remove tpm_dev_wq_lock
Added in commit

  9e1b74a63f ("tpm: add support for nonblocking operation")

but it was never actually used.

Cc: Philip Tricca <philip.b.tricca@intel.com>
Cc: Tadeusz Struk <tadeusz.struk@intel.com>
Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:19 +03:00
Sebastian Andrzej Siewior 34b227ff95 mm: workingset: replace IRQ-off check with a lockdep assert.
Commit

  68d48e6a2d ("mm: workingset: add vmstat counter for shadow nodes")

introduced an IRQ-off check to ensure that a lock is held which also
disabled interrupts. This does not work the same way on -RT because none
of the locks, that are held, disable interrupts.
Replace this check with a lockdep assert which ensures that the lock is
held.

Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:19 +03:00
Sebastian Andrzej Siewior 95faf05ec7 cgroup: Acquire cgroup_rstat_lock with enabled interrupts
There is no need to disable interrupts while cgroup_rstat_lock is
acquired. The lock is never used in-IRQ context so a simple spin_lock()
is enough for synchronisation purposes.

Acquire cgroup_rstat_lock without disabling interrupts and ensure that
cgroup_rstat_cpu_lock is acquired with disabled interrupts (this one is
acquired in-IRQ context).
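
The resulting locking pattern, sketched (per-CPU lock initialisation happens
at boot and is omitted):

  static DEFINE_SPINLOCK(cgroup_rstat_lock);
  static DEFINE_PER_CPU(raw_spinlock_t, cgroup_rstat_cpu_lock);

  static void flush_one_cpu(int cpu)
  {
  	raw_spinlock_t *cpu_lock = per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu);

  	spin_lock(&cgroup_rstat_lock);		/* never taken in IRQ context */
  	raw_spin_lock_irq(cpu_lock);		/* this one is, keep IRQs off */
  	/* ... flush this CPU's stats ... */
  	raw_spin_unlock_irq(cpu_lock);
  	spin_unlock(&cgroup_rstat_lock);
  }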

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:19 +03:00
Sebastian Andrzej Siewior 76880e1fa4 cgroup: Remove `may_sleep' from cgroup_rstat_flush_locked()
cgroup_rstat_flush_locked() is always invoked with `may_sleep' set to
true so that this case can be made default and the parameter removed.

Remove the `may_sleep' parameter.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:19 +03:00
Sebastian Andrzej Siewior bff81c5d86 cgroup: Consolidate users of cgroup_rstat_lock.
cgroup_rstat_flush_irqsafe() has no users, remove it.
cgroup_rstat_flush_hold() and cgroup_rstat_flush_release() are only used within
this file. Make them static.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:19 +03:00
Sebastian Andrzej Siewior 9cb4141c73 cgroup: Remove ->css_rstat_flush()
I was looking at the lifetime of the ->css_rstat_flush() callback to see if
cgroup_rstat_cpu_lock should remain a raw_spinlock_t. I didn't find any
users; it has been unused since it was introduced in commit
  8f53470bab ("cgroup: Add cgroup_subsys->css_rstat_flush()")

Remove the css_rstat_flush callback because it has no users.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:19 +03:00
Sebastian Andrzej Siewior baf1451e1b workqueue: Convert the locks to raw type
After all the workqueue and the timer rework, we can finally make the
worker_pool lock raw.
The lock is not held over an unbounded period of time/iterations.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:19 +03:00
Sebastian Andrzej Siewior a2c9e4039a workqueue: Use swait for wq_manager_wait
In order for the workqueue code to use raw_spinlock_t typed locking, no
spinlock_t typed lock may be acquired. A wait_queue_head uses
a spinlock_t lock for its list protection.

Use a swait based queue head to avoid raw_spinlock_t -> spinlock_t
locking.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:19 +03:00
Sebastian Andrzej Siewior a1468913de sched/swait: Add swait_event_lock_irq()
The swait_event_lock_irq() is inspired by wait_event_lock_irq(). This is
required by the workqueue code once it switches to swait.
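
Sketched after wait_event_lock_irq(), the helper can look like this (not the
exact macro):

  #define swait_event_lock_irq(wq, condition, lock)			\
  do {									\
  	DECLARE_SWAITQUEUE(__wait);					\
  	for (;;) {							\
  		prepare_to_swait_exclusive(&wq, &__wait,		\
  					   TASK_UNINTERRUPTIBLE);	\
  		if (condition)						\
  			break;						\
  		spin_unlock_irq(&lock);	/* drop the lock while sleeping */ \
  		schedule();						\
  		spin_lock_irq(&lock);					\
  	}								\
  	finish_swait(&wq, &__wait);					\
  } while (0)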

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:18 +03:00
Sebastian Andrzej Siewior b065527a55 workqueue: Don't assume that the callback has interrupts disabled
Due to the TIMER_IRQSAFE flag, the timer callback is invoked with
disabled interrupts. On -RT the callback is invoked in softirq context
with enabled interrupts. Since the interrupts are threaded, there are
no in_irq() users. The local_bh_disable() around the threaded
handler ensures that there is either a timer or a threaded handler
active on the CPU.

Disable interrupts before __queue_work() is invoked from the timer
callback.
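
Roughly, the callback ends up doing this (sketch):

  void delayed_work_timer_fn(struct timer_list *t)
  {
  	struct delayed_work *dwork = from_timer(dwork, t, timer);
  	unsigned long flags;

  	/* __queue_work() relies on interrupts being disabled, which the
  	 * softirq-context callback on RT no longer guarantees. */
  	local_irq_save(flags);
  	__queue_work(dwork->cpu, dwork->wq, &dwork->work);
  	local_irq_restore(flags);
  }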

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:18 +03:00
Sebastian Andrzej Siewior 9e86cc66fa Use CONFIG_PREEMPTION
This is an all-in-one patch of the current `PREEMPTION' branch.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:18 +03:00