linux/lib
Peter Zijlstra f0f1d32f93 llist: Remove cpu_relax() usage in cmpxchg loops
Initial benchmarks show they're a net loss:

 $ for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor ; do echo performance > $i; done
 $ echo 4096 32000 64 128 > /proc/sys/kernel/sem
 $ ./sembench -t 2048 -w 1900 -o 0

Pre:

 run time 30 seconds 778936 worker burns per second
 run time 30 seconds 912190 worker burns per second
 run time 30 seconds 817506 worker burns per second
 run time 30 seconds 830870 worker burns per second
 run time 30 seconds 845056 worker burns per second

Post:

 run time 30 seconds 905920 worker burns per second
 run time 30 seconds 849046 worker burns per second
 run time 30 seconds 886286 worker burns per second
 run time 30 seconds 822320 worker burns per second
 run time 30 seconds 900283 worker burns per second

So about 4% faster. (!)
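
The "about 4%" figure can be reproduced from the ten samples above. The
sketch below is only arithmetic over those numbers; the function names
are hypothetical and not part of sembench or the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n samples (hypothetical helper, illustration only). */
static double sketch_mean(const unsigned long *v, size_t n)
{
	double sum = 0.0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++)
		sum += (double)v[i];
	return sum / (double)n;
}

/* Relative change of the mean "Post" rate over the mean "Pre" rate, in percent. */
static double sketch_speedup_percent(void)
{
	/* Worker burns per second, copied from the runs quoted above. */
	static const unsigned long pre[]  = { 778936, 912190, 817506, 830870, 845056 };
	static const unsigned long post[] = { 905920, 849046, 886286, 822320, 900283 };
	double m_pre  = sketch_mean(pre,  sizeof(pre)  / sizeof(pre[0]));
	double m_post = sketch_mean(post, sizeof(post) / sizeof(post[0]));

	return (m_post - m_pre) / m_pre * 100.0;
}
```

The means come out near 836912 (Pre) and 872771 (Post), which is
consistent with the roughly 4% quoted.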

cpu_relax() stalls the pipeline, therefore, when used in a tight loop
it has the following benefits:

 - allows SMT siblings to have a go;
 - reduces pressure on the CPU interconnect.

However, cmpxchg loops are unfair and thus have unbounded completion
time, therefore we should avoid getting in such heavily contended
situations where the above benefits make any difference.

A typical cmpxchg loop should not go round more than a handful of
times at worst, therefore adding extra delays just slows things down.
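
As an illustration of the loop shape being discussed, here is a minimal
user-space sketch using C11 atomics. It is an analogue, not the code in
lib/llist.c: the sketch_* names are hypothetical, and the memory orders
are an assumption chosen for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical user-space analogue of a lock-less singly linked list. */
struct sketch_node {
	struct sketch_node *next;
};

struct sketch_head {
	_Atomic(struct sketch_node *) first;
};

/* Push one node. The retry path deliberately has no pause or delay. */
static void sketch_push(struct sketch_head *head, struct sketch_node *node)
{
	struct sketch_node *old =
		atomic_load_explicit(&head->first, memory_order_relaxed);

	do {
		node->next = old;
		/* On failure the CAS stores the current head into 'old'. */
	} while (!atomic_compare_exchange_weak_explicit(&head->first, &old, node,
							memory_order_release,
							memory_order_relaxed));
}

/* Detach the whole list in one atomic exchange and return its first node. */
static struct sketch_node *sketch_del_all(struct sketch_head *head)
{
	return atomic_exchange_explicit(&head->first,
					(struct sketch_node *)NULL,
					memory_order_acquire);
}

/* Single-threaded usage: push three nodes, detach, count them. */
static int sketch_demo(void)
{
	struct sketch_head head;
	struct sketch_node n[3];
	int count = 0;

	atomic_init(&head.first, NULL);
	for (int i = 0; i < 3; i++)
		sketch_push(&head, &n[i]);

	struct sketch_node *p = sketch_del_all(&head);
	assert(p == &n[2]);	/* last pushed node comes out first */
	for (; p; p = p->next)
		count++;
	return count;
}
```

When the loop succeeds on the first or second attempt, which is the
case argued for above, a delay in the retry path would only be pure
overhead.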

Since the llist primitives are new, there aren't any bad users yet,
and we should avoid growing them. Heavily contended sites should
generally be better off using the ticket locks for serialization since
they provide bounded completion times (fifo-fair over the cpus).
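
For comparison, the fifo-fair behaviour of a ticket lock can be sketched
in a few lines of C11. This is an illustration under assumptions, not
the kernel's arch-specific implementation; all names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical ticket lock: waiters are served strictly in ticket order. */
struct sketch_ticket_lock {
	atomic_uint next;	/* next ticket to hand out */
	atomic_uint owner;	/* ticket currently being served */
};

static void sketch_ticket_acquire(struct sketch_ticket_lock *l)
{
	/* Taking a ticket is one unconditional atomic add, never a retry loop. */
	unsigned int me =
		atomic_fetch_add_explicit(&l->next, 1, memory_order_relaxed);

	/* Wait for our turn; real implementations typically pause while spinning. */
	while (atomic_load_explicit(&l->owner, memory_order_acquire) != me)
		;
}

static void sketch_ticket_release(struct sketch_ticket_lock *l)
{
	/* Hand the lock to the holder of the next ticket. */
	atomic_fetch_add_explicit(&l->owner, 1, memory_order_release);
}

/* Single-threaded usage: two lock/unlock cycles, returns tickets served. */
static unsigned int sketch_ticket_demo(void)
{
	struct sketch_ticket_lock l;

	atomic_init(&l.next, 0);
	atomic_init(&l.owner, 0);
	for (int i = 0; i < 2; i++) {
		sketch_ticket_acquire(&l);
		assert(atomic_load(&l.next) == atomic_load(&l.owner) + 1);
		sketch_ticket_release(&l);
	}
	return atomic_load(&l.owner);
}
```

Each waiter makes exactly one atomic update to enter the queue, so no
CPU can be overtaken indefinitely the way it can in a cmpxchg retry loop.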

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1315836358.26517.43.camel@twins
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-10-04 12:44:03 +02:00
lzo
raid6
reed_solomon
xz XZ: Fix incorrect XZ_BUF_ERROR 2011-09-21 13:39:59 -07:00
zlib_deflate
zlib_inflate
.gitignore
argv_split.c
atomic64_test.c atomic: use <linux/atomic.h> 2011-07-26 16:49:47 -07:00
atomic64.c atomic: use <linux/atomic.h> 2011-07-26 16:49:47 -07:00
audit.c
average.c
bcd.c
bch.c
bitmap.c Merge branch 'apei' into apei-release 2011-08-03 11:30:42 -04:00
bitrev.c
bsearch.c
btree.c
bug.c
bust_spinlocks.c
check_signature.c
checksum.c lib/checksum.c: optimize do_csum a bit 2011-07-07 04:52:24 -07:00
cmdline.c
cordic.c
cpu_rmap.c
cpu-notifier-error-inject.c
cpumask.c cpumask: alloc_cpumask_var() use NUMA_NO_NODE 2011-07-26 16:49:44 -07:00
crc7.c
crc8.c
crc16.c
crc32.c atomic: use <linux/atomic.h> 2011-07-26 16:49:47 -07:00
crc32defs.h
crc-ccitt.c
crc-itu-t.c
crc-t10dif.c
ctype.c
debug_locks.c
debugobjects.c
dec_and_lock.c atomic: use <linux/atomic.h> 2011-07-26 16:49:47 -07:00
decompress_bunzip2.c
decompress_inflate.c
decompress_unlzma.c
decompress_unlzo.c
decompress_unxz.c
decompress.c
devres.c devres: fix possible use after free 2011-07-25 20:57:14 -07:00
div64.c
dma-debug.c
dump_stack.c
dynamic_debug.c
extable.c
fault-inject.c fault-injection: add ability to export fault_attr in arbitrary directory 2011-08-03 14:25:20 -10:00
find_last_bit.c
find_next_bit.c
flex_array.c
gcd.c
gen_crc32table.c
genalloc.c lib, Make gen_pool memory allocator lockless 2011-08-03 11:15:57 -04:00
halfmd4.c
hexdump.c
hweight.c
idr.c ida: simplified functions for id allocation 2011-08-03 14:25:20 -10:00
inflate.c
int_sqrt.c
iomap_copy.c
iomap.c iomap: make IOPORT/PCI mapping functions conditional 2011-07-22 18:46:26 +02:00
iommu-helper.c
ioremap.c
irq_regs.c
is_single_threaded.c
kasprintf.c
Kconfig llist: Make some llist functions inline 2011-10-04 11:30:53 +02:00
Kconfig.debug Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2011-07-22 16:45:02 -07:00
Kconfig.kgdb
Kconfig.kmemcheck
klist.c
kobject_uevent.c
kobject.c
kref.c
kstrtox.c lib: make _tolower() public 2011-07-25 20:57:16 -07:00
lcm.c lib/lcm.c: quiet sparse noise 2011-07-25 20:57:15 -07:00
libcrc32c.c
list_debug.c
list_sort.c
llist.c llist: Remove cpu_relax() usage in cmpxchg loops 2011-10-04 12:44:03 +02:00
locking-selftest-hardirq.h
locking-selftest-mutex.h
locking-selftest-rlock-hardirq.h
locking-selftest-rlock-softirq.h
locking-selftest-rlock.h
locking-selftest-rsem.h
locking-selftest-softirq.h
locking-selftest-spin-hardirq.h
locking-selftest-spin-softirq.h
locking-selftest-spin.h
locking-selftest-wlock-hardirq.h
locking-selftest-wlock-softirq.h
locking-selftest-wlock.h
locking-selftest-wsem.h
locking-selftest.c
lru_cache.c
Makefile llist: Make some llist functions inline 2011-10-04 11:30:53 +02:00
md5.c crypto: Move md5_transform to lib/md5.c 2011-08-06 18:32:45 -07:00
nlattr.c
parser.c
percpu_counter.c
plist.c plist: Remove the need to supply locks to plist heads 2011-07-08 14:02:53 +02:00
prio_heap.c
prio_tree.c
proportions.c
radix-tree.c tmpfs radix_tree: locate_item to speed up swapoff 2011-08-03 14:25:24 -10:00
random32.c
ratelimit.c
rational.c
rbtree.c
reciprocal_div.c
rwsem-spinlock.c
rwsem.c
scatterlist.c
sha1.c lib/sha1.c: quiet sparse noise about symbol not declared 2011-09-13 16:09:41 -07:00
show_mem.c
smp_processor_id.c
sort.c
spinlock_debug.c
string_helpers.c
string.c
swiotlb.c
syscall.c
test-kstrtox.c
textsearch.c
timerqueue.c
ts_bm.c
ts_fsm.c
ts_kmp.c
uuid.c
vsprintf.c Merge 'akpm' patch series 2011-07-25 21:00:19 -07:00