linux/arch
Davidlohr Bueso efb5fea230 mm: per-thread vma caching
commit 615d6e8756 upstream.

This patch is a continuation of efforts trying to optimize find_vma(),
avoiding potentially expensive rbtree walks to locate a vma upon faults.
The original approach (https://lkml.org/lkml/2013/11/1/410), where the
largest vma was also cached, ended up being too specific and random,
thus further comparison with other approaches were needed.  There are
two things to consider when dealing with this, the cache hit rate and
the latency of find_vma().  Improving the hit-rate does not necessarily
translate in finding the vma any faster, as the overhead of any fancy
caching schemes can be too high to consider.

We currently cache the last used vma for the whole address space, which
provides a nice optimization, reducing the total cycles in find_vma() by
up to 250%, for workloads with good locality.  On the other hand, this
simple scheme is pretty much useless for workloads with poor locality.
Analyzing ebizzy runs shows that, no matter how many threads are
running, the mmap_cache hit rate is less than 2%, and in many situations
below 1%.

The proposed approach is to replace this scheme with a small per-thread
cache, maximizing hit rates at a very low maintenance cost.
Invalidations are performed by simply bumping up a 32-bit sequence
number.  The only expensive operation is in the rare case of a seq
number overflow, where all caches that share the same address space are
flushed.  Upon a miss, the proposed replacement policy is based on the
page number that contains the virtual address in question.  Concretely,
the following results are seen on an 80 core, 8 socket x86-64 box:

1) System bootup: Most programs are single threaded, so the per-thread
   scheme does improve ~50% hit rate by just adding a few more slots to
   the cache.

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 50.61%   | 19.90            |
| patched        | 73.45%   | 13.58            |
+----------------+----------+------------------+

2) Kernel build: This one is already pretty good with the current
   approach as we're dealing with good locality.

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 75.28%   | 11.03            |
| patched        | 88.09%   | 9.31             |
+----------------+----------+------------------+

3) Oracle 11g Data Mining (4k pages): Similar to the kernel build workload.

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 70.66%   | 17.14            |
| patched        | 91.15%   | 12.57            |
+----------------+----------+------------------+

4) Ebizzy: There's a fair amount of variation from run to run, but this
   approach always shows nearly perfect hit rates, while baseline is just
   about non-existent.  The amounts of cycles can fluctuate between
   anywhere from ~60 to ~116 for the baseline scheme, but this approach
   reduces it considerably.  For instance, with 80 threads:

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 1.06%    | 91.54            |
| patched        | 99.97%   | 14.18            |
+----------------+----------+------------------+

[akpm@linux-foundation.org: fix nommu build, per Davidlohr]
[akpm@linux-foundation.org: document vmacache_valid() logic]
[akpm@linux-foundation.org: attempt to untangle header files]
[akpm@linux-foundation.org: add vmacache_find() BUG_ON]
[hughd@google.com: add vmacache_valid_mm() (from Oleg)]
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: adjust and enhance comments]
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Michel Lespinasse <walken@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Tested-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-09 12:21:29 -07:00
..
alpha alpha: fix broken network checksum 2014-01-31 09:21:55 -08:00
arc ARC: Implement ptrace(PTRACE_GET_THREAD_AREA) 2014-07-28 08:06:04 -07:00
arm ARM: DRA7: Add support for soc_is_dra74x() and soc_is_dra72x() variants 2014-10-05 14:52:24 -07:00
arm64 arm64: ptrace: fix compat hardware watchpoint reporting 2014-10-05 14:52:11 -07:00
avr32 avr32: add generic vga.h to Kbuild 2014-02-17 11:24:48 +01:00
blackfin Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media 2014-01-31 09:31:14 -08:00
c6x Build fix for c6x 2014-03-07 09:52:46 -08:00
cris cris: convert ffs from an object-like macro to a function-like macro 2014-03-10 17:26:21 -07:00
frv Merge branch 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild 2014-01-30 16:58:05 -08:00
hexagon Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-01-25 11:17:34 -08:00
ia64 hugetlb: restrict hugepage_migration_support() to x86_64 2014-06-30 20:11:53 -07:00
m32r Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-01-25 11:17:34 -08:00
m68k m68k: Skip futex_atomic_cmpxchg_inatomic() test 2014-04-14 06:50:05 -07:00
metag parisc,metag: Do not hardcode maximum userspace stack size 2014-07-17 16:21:03 -07:00
microblaze microblaze: Fix a typo when disabling stack protection 2014-02-10 07:44:11 +01:00
mips MIPS: mcount: Adjust stack pointer for static trace in MIPS32 2014-10-05 14:52:16 -07:00
mn10300 Merge branch 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild 2014-01-30 16:58:05 -08:00
openrisc OpenRISC updates for 3.14 2014-01-30 17:08:41 -08:00
parisc parisc: Only use -mfast-indirect-calls option for 32-bit kernel builds 2014-10-05 14:52:21 -07:00
powerpc powerpc: Add smp_mb()s to arch_spin_unlock_wait() 2014-10-05 14:52:21 -07:00
s390 KVM: s390/mm: try a cow on read only pages for key ops 2014-10-05 14:52:17 -07:00
score Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media 2014-01-31 09:31:14 -08:00
sh hugetlb: restrict hugepage_migration_support() to x86_64 2014-06-30 20:11:53 -07:00
sparc arch/sparc/math-emu/math_32.c: drop stray break operator 2014-08-14 09:38:26 +08:00
tile hugetlb: restrict hugepage_migration_support() to x86_64 2014-06-30 20:11:53 -07:00
um Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml 2014-01-26 11:06:16 -08:00
unicore32 mm: per-thread vma caching 2014-10-09 12:21:29 -07:00
x86 perf/x86/intel: Use rdmsrl_safe() when initializing RAPL PMU 2014-10-05 14:52:24 -07:00
xtensa xtensa: fix a6 and a7 handling in fast_syscall_xtensa 2014-10-05 14:52:13 -07:00
.gitignore
Kconfig stackprotector: Introduce CONFIG_CC_STACKPROTECTOR_STRONG 2013-12-20 09:38:40 +01:00