linux/kernel
Mel Gorman 65d8fc777f futex: Remove requirement for lock_page() in get_futex_key()
When dealing with key handling for shared futexes, we can drastically reduce
the usage/need of the page lock. 1) For anonymous pages, the associated futex
object is the mm_struct which does not require the page lock. 2) For inode
based, keys, we can check under RCU read lock if the page mapping is still
valid and take reference to the inode. This just leaves one rare race that
requires the page lock in the slow path when examining the swapcache.

Additionally realtime users currently have a problem with the page lock being
contended for unbounded periods of time during futex operations.

Task A
     get_futex_key()
     lock_page()
    ---> preempted

Now any other task trying to lock that page will have to wait until
task A gets scheduled back in, which is an unbound time.

With this patch, we pretty much have a lockless futex_get_key().

Experiments show that this patch can boost/speedup the hashing of shared
futexes with the perf futex benchmarks (which is good for measuring such
change) by up to 45% when there are high (> 100) thread counts on a 60 core
Westmere. Lower counts are pretty much in the noise range or less than 10%,
but mid range can be seen at over 30% overall throughput (hash ops/sec).
This makes anon-mem shared futexes much closer to its private counterpart.

Signed-off-by: Mel Gorman <mgorman@suse.de>
[ Ported on top of thp refcount rework, changelog, comments, fixes. ]
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Chris Mason <clm@fb.com>
Cc: Darren Hart <dvhart@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: dave@stgolabs.net
Link: http://lkml.kernel.org/r/1455045314-8305-3-git-send-email-dave@stgolabs.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-17 10:42:17 +01:00
..
bpf bpf: fix branch offset adjustment on backjumps after patching ctx expansion 2016-02-10 16:56:47 -05:00
configs
debug module: use a structure to encapsulate layout. 2015-12-04 22:46:25 +01:00
events Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-01-31 15:38:27 -08:00
gcov gcov: use within_module() helper. 2015-12-04 22:46:25 +01:00
irq Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-01-31 14:48:58 -08:00
livepatch livepatch: Cleanup module page permission changes 2015-12-04 22:51:07 +01:00
locking kernel/locking/lockdep.c: convert hash tables to hlists 2016-02-11 18:35:48 -08:00
power PM: APM_EMULATION does not depend on PM 2016-01-27 23:20:14 +01:00
printk kernel: printk: specify alignment for struct printk_log 2016-01-20 17:09:18 -08:00
rcu Merge branches 'doc.2015.12.05a', 'exp.2015.12.07a', 'fixes.2015.12.07a', 'list.2015.12.04b' and 'torture.2015.12.05a' into HEAD 2015-12-07 17:02:54 -08:00
sched Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-01-31 15:44:04 -08:00
time Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-01-31 15:49:06 -08:00
trace A cleanup to the stack tracer broke stack tracing on s390. 2016-02-03 09:31:34 -08:00
.gitignore certs: add .gitignore to stop git nagging about x509_certificate_list 2015-10-21 15:18:35 +01:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
Makefile sys_membarrier(): system-wide memory barrier (generic, x86) 2015-09-11 15:21:34 -07:00
acct.c
async.c async: export current_is_async() 2015-11-19 17:51:48 +01:00
audit.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2016-01-17 19:13:15 -08:00
audit.h security: Make inode argument of inode_getsecid non-const 2015-12-24 11:09:39 -05:00
audit_fsnotify.c wrappers for ->i_mutex access 2016-01-22 18:04:28 -05:00
audit_tree.c audit: audit_tree_match can be boolean 2015-11-04 08:23:51 -05:00
audit_watch.c wrappers for ->i_mutex access 2016-01-22 18:04:28 -05:00
auditfilter.c audit: fix comment block whitespace 2015-11-04 08:23:51 -05:00
auditsc.c security: Make inode argument of inode_getsecid non-const 2015-12-24 11:09:39 -05:00
backtracetest.c
bounds.c
capability.c
cgroup.c cgroup: make sure a parent css isn't freed before its children 2016-01-22 10:42:58 -05:00
cgroup_freezer.c cgroup: kill cgrp_ss_priv[CGROUP_CANFORK_COUNT] and friends 2015-12-03 10:24:08 -05:00
cgroup_pids.c cgroup_pids: fix a typo. 2015-12-14 14:54:37 -05:00
compat.c
configs.c
context_tracking.c context_tracking: Switch to new static_branch API 2015-11-24 09:56:43 +01:00
cpu.c kernel/cpu.c: make set_cpu_* static inlines 2016-01-20 17:09:18 -08:00
cpu_pm.c kernel/cpu_pm: fix cpu_cluster_pm_exit comment 2015-09-03 02:42:20 +02:00
cpuset.c cpuset: make mm migration asynchronous 2016-01-22 10:22:46 -05:00
crash_dump.c
cred.c kmemcg: account certain kmem allocations to memcg 2016-01-14 16:00:49 -08:00
delayacct.c kmemcg: account certain kmem allocations to memcg 2016-01-14 16:00:49 -08:00
dma.c
elfcore.c
exec_domain.c
exit.c exit: remove unneeded declaration of exit_mm() 2016-01-20 17:09:18 -08:00
extable.c kernel/extable.c: remove duplicated include 2015-09-10 13:29:01 -07:00
fork.c mm: rework virtual memory accounting 2016-01-14 16:00:49 -08:00
freezer.c
futex.c futex: Remove requirement for lock_page() in get_futex_key() 2016-02-17 10:42:17 +01:00
futex_compat.c ptrace: use fsuid, fsgid, effective creds for fs access checks 2016-01-20 17:09:18 -08:00
groups.c
hung_task.c
irq_work.c treewide: Remove old email address 2015-11-23 09:44:58 +01:00
jump_label.c treewide: Remove old email address 2015-11-23 09:44:58 +01:00
kallsyms.c
kcmp.c ptrace: use fsuid, fsgid, effective creds for fs access checks 2016-01-20 17:09:18 -08:00
kexec.c kexec: set KEXEC_TYPE_CRASH before sanity_check_segment_list() 2016-01-20 17:09:18 -08:00
kexec_core.c kernel/kexec_core.c: use list_for_each_entry_safe in kimage_free_page_list 2016-01-20 17:09:18 -08:00
kexec_file.c kexec: move some memembers and definitions within the scope of CONFIG_KEXEC_FILE 2016-01-20 17:09:18 -08:00
kexec_internal.h kexec: move some memembers and definitions within the scope of CONFIG_KEXEC_FILE 2016-01-20 17:09:18 -08:00
kmod.c kmod: don't run async usermode helper as a child of kworker thread 2015-10-23 17:55:10 +09:00
kprobes.c
ksysfs.c rcu: Remove TINY_RCU bloat from pointless boot parameters 2015-12-07 16:59:37 -08:00
kthread.c kernel/kthread.c:kthread_create_on_node(): clarify documentation 2015-09-04 16:54:41 -07:00
latencytop.c
membarrier.c sys_membarrier(): system-wide memory barrier (generic, x86) 2015-09-11 15:21:34 -07:00
memremap.c mm: fix pfn_t vs highmem 2016-02-11 18:35:48 -08:00
module-internal.h
module.c modules: fix longstanding /proc/kallsyms vs module insertion race. 2016-02-03 16:58:15 +10:30
module_signing.c KEYS: Merge the type-specific data with the payload data 2015-10-21 15:18:36 +01:00
notifier.c Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-09-01 08:40:25 -07:00
nsproxy.c
padata.c
panic.c printk: do cond_resched() between lines while outputting to consoles 2016-01-16 11:17:25 -08:00
params.c Nothing exciting, minor tweaks and cleanups. 2015-11-09 15:53:39 -08:00
pid.c Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-01-31 15:44:04 -08:00
pid_namespace.c
profile.c mm: rename alloc_pages_exact_node() to __alloc_pages_node() 2015-09-08 15:35:28 -07:00
ptrace.c ptrace: use fsuid, fsgid, effective creds for fs access checks 2016-01-20 17:09:18 -08:00
range.c
reboot.c kexec: split kexec_load syscall from kexec core code 2015-09-10 13:29:01 -07:00
relay.c wrappers for ->i_mutex access 2016-01-22 18:04:28 -05:00
resource.c restrict /dev/mem to idle io memory ranges 2016-01-09 06:30:49 -08:00
seccomp.c seccomp: always propagate NO_NEW_PRIVS on tsync 2016-01-27 07:38:25 -08:00
signal.c signals: avoid random wakeups in sigsuspend() 2016-02-05 18:10:40 -08:00
smp.c mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd 2015-11-06 17:50:42 -08:00
smpboot.c stop_machine: Kill smp_hotplug_thread->pre_unpark, introduce stop_machine_unpark() 2015-10-20 10:23:55 +02:00
smpboot.h
softirq.c
stacktrace.c
stop_machine.c kernel/stop_machine.c: remove CONFIG_SMP dependencies 2016-01-16 11:17:24 -08:00
sys.c prctl: take mmap sem for writing to protect against others 2016-01-20 17:09:18 -08:00
sys_ni.c vfs: add copy_file_range syscall and vfs helper 2015-12-01 14:00:53 -05:00
sysctl.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-01-22 10:24:03 -08:00
sysctl_binary.c
task_work.c task_work: remove fifo ordering guarantee 2015-09-05 13:46:58 -07:00
taskstats.c
test_kprobes.c
torture.c torture: Consolidate cond_resched_rcu_qs() into stutter_wait() 2015-10-06 11:25:01 -07:00
tracepoint.c tracepoint: Give priority to probes of tracepoints 2015-10-25 21:33:54 -04:00
tsacct.c
uid16.c
up.c
user-return-notifier.c
user.c
user_namespace.c kernel/*: switch to memdup_user_nul() 2016-01-04 10:27:55 -05:00
utsname.c
utsname_sysctl.c
watchdog.c Merge branch 'for-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq 2016-01-11 18:53:13 -08:00
workqueue.c workqueue: handle NUMA_NO_NODE for unbound pool_workqueue lookup 2016-02-10 12:13:05 -05:00
workqueue_internal.h