linux/mm
Christoph Lameter 8a5ec0ba42 Lockless (and preemptless) fastpaths for slub
Use the this_cpu_cmpxchg_double functionality to implement a lockless
allocation algorithm on arches that support fast this_cpu_ops.

Each of the per cpu pointers is paired with a transaction id that ensures
that updates of the per cpu information can only occur in sequence on
a certain cpu.

A transaction id is a "long" integer composed of an event number and the
cpu number. The event number is incremented for every change to the per
cpu state. The cmpxchg instruction can therefore verify, for each update,
that nothing interfered, that we are updating the per cpu structure of the
processor from which we picked up the information, and that we are still
running on that processor when we perform the update.
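
As a rough sketch (not the actual slub.c definitions; the field width
TID_CPU_BITS and the helper names below are assumptions made purely for
illustration), a transaction id could be laid out like this:

/*
 * Illustrative transaction id layout: the low bits hold the cpu number,
 * the remaining bits hold an event counter that is bumped on every change
 * to the per cpu state.
 */
#define TID_CPU_BITS	16
#define TID_STEP	(1UL << TID_CPU_BITS)

static inline unsigned long init_tid(int cpu)
{
	return cpu;			/* event counter starts at zero */
}

static inline unsigned long next_tid(unsigned long tid)
{
	return tid + TID_STEP;		/* advance the event counter, keep the cpu */
}

static inline unsigned int tid_to_cpu(unsigned long tid)
{
	return tid & (TID_STEP - 1);
}

static inline unsigned long tid_to_event(unsigned long tid)
{
	return tid >> TID_CPU_BITS;
}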

This results in a significant decrease in fastpath overhead. It also makes
it easy to adopt the fastpath for realtime kernels, since the path is
lockless and does not require the per cpu area to remain current over the
whole critical section. It only matters that the per cpu area is current
at the beginning of the critical section and at the end.

So there is no need even to disable preemption.
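
With that, the allocation fastpath becomes a read / verify-and-swap loop.
The following is a simplified sketch in the spirit of the slub fastpath,
not the code added by this patch: kmem_cache, kmem_cache_cpu,
get_freepointer() and __slab_alloc() stand in for the real slub internals,
and node matching, debugging hooks and statistics are left out.

/*
 * Simplified, illustrative allocation fastpath. The cmpxchg_double pairs
 * the freelist pointer with the tid: it only succeeds if neither changed
 * since they were read, i.e. no other allocation or free touched this
 * cpu's state and we are still on the cpu whose data we read.
 */
static void *slab_alloc_fastpath(struct kmem_cache *s, gfp_t gfpflags)
{
	struct kmem_cache_cpu *c;
	unsigned long tid;
	void *object;

redo:
	c = __this_cpu_ptr(s->cpu_slab);
	tid = c->tid;
	barrier();				/* read tid before the freelist */
	object = c->freelist;

	if (unlikely(!object))
		return __slab_alloc(s, gfpflags);	/* take the (locked) slowpath */

	if (unlikely(!this_cpu_cmpxchg_double(
			s->cpu_slab->freelist, s->cpu_slab->tid,
			object, tid,
			get_freepointer(s, object), next_tid(tid))))
		goto redo;			/* interference or migration: just retry */

	return object;
}

If the cmpxchg_double fails, nothing was modified, so the retry simply
re-reads the (possibly new) per cpu data and tries again.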

Test results show that the fastpath cycle count is reduced by up to ~40%
(alloc/free test goes from ~140 cycles down to ~80). The slowpath for kfree
adds a few cycles.

Sadly this does nothing for the slowpath, which is where the main
performance issues in slub are, but the best-case performance rises
significantly. (For the slowpath, see the more complex slub patches that
require cmpxchg_double.)

Kmalloc: alloc/free test

Before:

10000 times kmalloc(8)/kfree -> 134 cycles
10000 times kmalloc(16)/kfree -> 152 cycles
10000 times kmalloc(32)/kfree -> 144 cycles
10000 times kmalloc(64)/kfree -> 142 cycles
10000 times kmalloc(128)/kfree -> 142 cycles
10000 times kmalloc(256)/kfree -> 132 cycles
10000 times kmalloc(512)/kfree -> 132 cycles
10000 times kmalloc(1024)/kfree -> 135 cycles
10000 times kmalloc(2048)/kfree -> 135 cycles
10000 times kmalloc(4096)/kfree -> 135 cycles
10000 times kmalloc(8192)/kfree -> 144 cycles
10000 times kmalloc(16384)/kfree -> 754 cycles

After:

10000 times kmalloc(8)/kfree -> 78 cycles
10000 times kmalloc(16)/kfree -> 78 cycles
10000 times kmalloc(32)/kfree -> 82 cycles
10000 times kmalloc(64)/kfree -> 88 cycles
10000 times kmalloc(128)/kfree -> 79 cycles
10000 times kmalloc(256)/kfree -> 79 cycles
10000 times kmalloc(512)/kfree -> 85 cycles
10000 times kmalloc(1024)/kfree -> 82 cycles
10000 times kmalloc(2048)/kfree -> 82 cycles
10000 times kmalloc(4096)/kfree -> 85 cycles
10000 times kmalloc(8192)/kfree -> 82 cycles
10000 times kmalloc(16384)/kfree -> 706 cycles

Kmalloc: Repeatedly allocate then free test

Before:

10000 times kmalloc(8) -> 211 cycles kfree -> 113 cycles
10000 times kmalloc(16) -> 174 cycles kfree -> 115 cycles
10000 times kmalloc(32) -> 235 cycles kfree -> 129 cycles
10000 times kmalloc(64) -> 222 cycles kfree -> 120 cycles
10000 times kmalloc(128) -> 343 cycles kfree -> 139 cycles
10000 times kmalloc(256) -> 827 cycles kfree -> 147 cycles
10000 times kmalloc(512) -> 1048 cycles kfree -> 272 cycles
10000 times kmalloc(1024) -> 2043 cycles kfree -> 528 cycles
10000 times kmalloc(2048) -> 4002 cycles kfree -> 571 cycles
10000 times kmalloc(4096) -> 7740 cycles kfree -> 628 cycles
10000 times kmalloc(8192) -> 8062 cycles kfree -> 850 cycles
10000 times kmalloc(16384) -> 8895 cycles kfree -> 1249 cycles

After:

10000 times kmalloc(8) -> 190 cycles kfree -> 129 cycles
10000 times kmalloc(16) -> 76 cycles kfree -> 123 cycles
10000 times kmalloc(32) -> 126 cycles kfree -> 124 cycles
10000 times kmalloc(64) -> 181 cycles kfree -> 128 cycles
10000 times kmalloc(128) -> 310 cycles kfree -> 140 cycles
10000 times kmalloc(256) -> 809 cycles kfree -> 165 cycles
10000 times kmalloc(512) -> 1005 cycles kfree -> 269 cycles
10000 times kmalloc(1024) -> 1999 cycles kfree -> 527 cycles
10000 times kmalloc(2048) -> 3967 cycles kfree -> 570 cycles
10000 times kmalloc(4096) -> 7658 cycles kfree -> 637 cycles
10000 times kmalloc(8192) -> 8111 cycles kfree -> 859 cycles
10000 times kmalloc(16384) -> 8791 cycles kfree -> 1173 cycles

Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
2011-03-11 17:42:49 +02:00
Kconfig mm: compaction: don't depend on HUGETLB_PAGE 2011-01-26 10:50:02 +10:00
Kconfig.debug trivial: improve help text for mm debug config options 2009-09-21 15:14:57 +02:00
Makefile thp: transparent hugepage core 2011-01-13 17:32:42 -08:00
backing-dev.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 2010-10-26 17:58:44 -07:00
bootmem.c x86, memblock: Replace e820_/_early string with memblock_ 2010-08-27 11:13:47 -07:00
bounce.c bounce: call flush_dcache_page() after bounce_copy_vec() 2010-09-09 18:57:25 -07:00
compaction.c mm: compaction: prevent division-by-zero during user-requested compaction 2011-01-20 17:02:05 -08:00
debug-pagealloc.c generic debug pagealloc 2009-04-01 08:59:13 -07:00
dmapool.c mm/dmapool.c: use TASK_UNINTERRUPTIBLE in dma_pool_alloc() 2011-01-13 17:32:48 -08:00
fadvise.c readahead: introduce FMODE_RANDOM for POSIX_FADV_RANDOM 2010-03-06 11:26:25 -08:00
failslab.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
filemap.c mm: remove likely() from grab_cache_page_write_begin() 2011-01-13 17:32:36 -08:00
filemap_xip.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
fremap.c Avoid pgoff overflow in remap_file_pages 2010-09-25 09:34:58 -07:00
highmem.c mm,x86: fix kmap_atomic_push vs ioremap_32.c 2010-10-27 18:03:05 -07:00
huge_memory.c mm: use correct numa policy node for transparent hugepages 2011-03-04 17:53:39 -08:00
hugetlb.c hugetlb: fix handling of parse errors in sysfs 2011-01-13 17:32:49 -08:00
hwpoison-inject.c HWPOISON, hugetlb: support hwpoison injection for hugepage 2010-08-11 09:23:11 +02:00
init-mm.c mm: provide init_mm mm_context initializer 2010-08-09 20:44:54 -07:00
internal.h Revert "mm: batch activate_page() to reduce lock contention" 2011-01-17 14:42:19 -08:00
kmemcheck.c kmemcheck: Fix build errors due to missing slab.h 2010-03-30 22:02:32 +09:00
kmemleak-test.c kmemleak: remove memset by using kzalloc 2011-01-27 18:31:51 +00:00
kmemleak.c kmemleak: Allow kmemleak metadata allocations to fail 2011-01-27 18:32:06 +00:00
ksm.c ksm: drain pagevecs to lru 2011-01-13 17:32:49 -08:00
maccess.c MN10300: Save frame pointer in thread_info struct rather than global var 2010-10-27 17:29:01 +01:00
madvise.c thp: khugepaged: make khugepaged aware about madvise 2011-01-13 17:32:47 -08:00
memblock.c memblock: don't adjust size in memblock_find_base() 2011-02-11 16:12:20 -08:00
memcontrol.c memcg: fix event counting breakage from recent THP update 2011-02-02 16:03:19 -08:00
memory-failure.c thp: fix unsuitable behavior for hwpoisoned tail page 2011-02-02 16:03:19 -08:00
memory.c mm: prevent concurrent unmap_mapping_range() on the same inode 2011-02-23 19:52:52 -08:00
memory_hotplug.c Merge branch 'slub/hotplug' into slab/urgent 2011-01-15 13:28:17 +02:00
mempolicy.c mm: use correct numa policy node for transparent hugepages 2011-03-04 17:53:39 -08:00
mempool.c mm: remove broken 'kzalloc' mempool 2009-09-22 07:17:35 -07:00
migrate.c mm: grab rcu read lock in move_pages() 2011-02-25 15:07:36 -08:00
mincore.c thp: mincore transparent hugepage support 2011-01-13 17:32:44 -08:00
mlock.c mlock: operate on any regions with protection != PROT_NONE 2011-02-02 10:20:50 +11:00
mm_init.c mm: mminit_loglevel cannot be __meminitdata anymore 2008-08-20 15:40:30 -07:00
mmap.c brk: fix min_brk lower bound computation for COMPAT_BRK 2011-01-13 17:32:48 -08:00
mmu_context.c exit: fix oops in sync_mm_rss 2010-03-24 16:31:21 -07:00
mmu_notifier.c thp: mmu_notifier_test_young 2011-01-13 17:32:46 -08:00
mmzone.c mm: page allocator: adjust the per-cpu counter threshold when memory is low 2011-01-13 17:32:31 -08:00
mprotect.c thp: mprotect: transparent huge page support 2011-01-13 17:32:44 -08:00
mremap.c mm: fix possible cause of a page_mapped BUG 2011-02-23 21:55:06 -08:00
msync.c sanitize vfs_fsync calling conventions 2010-05-21 18:31:21 -04:00
nommu.c mlock: do not hold mmap_sem for extended periods of time 2011-01-13 17:32:36 -08:00
oom_kill.c oom: kill all threads sharing oom killed task's mm 2010-10-26 16:52:05 -07:00
page-writeback.c writeback: avoid unnecessary determine_dirtyable_memory call 2011-01-13 17:32:38 -08:00
page_alloc.c mm: fix dubious code in __count_immobile_pages() 2011-02-25 15:07:37 -08:00
page_cgroup.c kmemleak: Annotate false positive in init_section_page_cgroup() 2010-07-19 11:54:14 +01:00
page_io.c block: unify flags for struct bio and struct request 2010-08-07 18:20:39 +02:00
page_isolation.c mm: page_isolation: codeclean fix comment and rm unneeded val init 2010-10-26 16:52:11 -07:00
pagewalk.c thp: split_huge_page_mm/vma 2011-01-13 17:32:41 -08:00
percpu-km.c percpu: clear memory allocated with the km allocator 2010-10-02 10:28:42 +03:00
percpu-vm.c mm: remove gfp mask from pcpu_get_vm_areas 2011-01-13 17:32:34 -08:00
percpu.c Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2011-01-13 10:05:56 -08:00
pgtable-generic.c mm/pgtable-generic.c: fix CONFIG_SWAP=n build 2011-01-26 10:49:58 +10:00
prio_tree.c spelling fixes: mm/ 2007-10-20 01:27:18 +02:00
quicklist.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
readahead.c readahead.c: fix comment 2010-05-25 08:07:00 -07:00
rmap.c memcg: create extensible page stat update routines 2011-01-13 17:32:50 -08:00
shmem.c fs: icache RCU free inodes 2011-01-07 17:50:26 +11:00
slab.c mm/slab.c: make local symbols static 2011-01-15 13:28:36 +02:00
slob.c kernel: kmem_ptr_validate considered harmful 2011-01-07 17:50:16 +11:00
slub.c Lockless (and preemptless) fastpaths for slub 2011-03-11 17:42:49 +02:00
sparse-vmemmap.c tree-wide: fix comment/printk typos 2010-11-01 15:38:34 -04:00
sparse.c thp: remove PG_buddy 2011-01-13 17:32:43 -08:00
swap.c Revert "mm: simplify code of swap.c" 2011-01-17 14:42:34 -08:00
swap_state.c thp: split_huge_page paging 2011-01-13 17:32:41 -08:00
swapfile.c mm: fix refcounting in swapon 2011-02-24 08:55:01 -08:00
thrash.c mm: pass mm to grab_swap_token 2009-06-23 12:50:05 -07:00
truncate.c memcg: more mem_cgroup_uncharge() batching 2011-02-25 15:07:37 -08:00
util.c kernel: kmem_ptr_validate considered harmful 2011-01-07 17:50:16 +11:00
vmalloc.c Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6 2011-01-13 20:15:35 -08:00
vmscan.c mm: vmscan: stop reclaim/compaction earlier due to insufficient progress if !__GFP_REPEAT 2011-02-25 15:07:36 -08:00
vmstat.c thp: transparent hugepage vmstat 2011-01-13 17:32:43 -08:00