linux/lib
Nick Piggin 643b52b9c0 radix-tree: fix small lockless radix-tree bug
We shrink a radix tree when its root node has only one child, in the left
most slot.  The child becomes the new root node.  To perform this
operation in a manner compatible with concurrent lockless lookups, we
atomically switch the root pointer from the parent to its child.

However a concurrent lockless lookup may now have loaded a pointer to the
parent (and is presently deciding what to do next).  For this reason, we
also have to keep the parent node in a valid state after shrinking the
tree, until the next RCU grace period -- otherwise this lookup with the
parent pointer may not do the right thing.  Notably, we need to keep the
child in the left most slot there in case that is requested by the lookup.

This is all pretty standard RCU stuff.  It is worth repeating because in
my eagerness to obey the radix tree node constructor scheme, I had broken
it by zeroing the radix tree node before the grace period.

What could happen is that a lookup can load the parent pointer, then
decide it wants to follow the left most child slot, only to find the slot
contained NULL due to the concurrent shrinker having zeroed the parent
node before waiting for a grace period.  The lookup would return a false
negative as a result.

Fix it by doing that clearing in the RCU callback.  I would normally want
to rip out the constructor entirely, but radix tree nodes are one of those
places where they make sense (only few cachelines will be touched soon
after allocation).

This was never actually found in any lockless pagecache testing or by the
test harness, but by seeing the odd problem with my scalable vmap rewrite.
 I have not tickled the test harness into reproducing it yet, but I'll
keep working at it.

Fortunately, it is not a problem anywhere lockless pagecache is used in
mainline kernels (pagecache probe is not a guarantee, and brd does not
have concurrent lookups and deletes).

Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-06-12 18:05:41 -07:00
..
lzo lzo: fix typo in decompressor 2008-04-10 15:34:05 -07:00
reed_solomon lib: Remove unnecessary inclusions of asm/semaphore.h 2008-04-18 22:17:17 -04:00
zlib_deflate lib/: Spelling fixes 2008-02-03 17:48:52 +02:00
zlib_inflate
.gitignore
argv_split.c LIB: Replace inappropriate include of <linux/bug.h> 2007-10-20 00:26:10 +02:00
audit.c
bitmap.c cpumask: remove bitmap_scnprintf_len and cpumask_scnprintf_len 2008-05-13 08:02:25 -07:00
bitrev.c lib: export bitrev16 2008-06-06 11:29:10 -07:00
bug.c
bust_spinlocks.c
check_signature.c
cmdline.c
cpumask.c
crc7.c
crc16.c
crc32.c lib/: Spelling fixes 2008-02-03 17:48:52 +02:00
crc32defs.h
crc-ccitt.c
crc-itu-t.c
ctype.c
debug_locks.c
debugobjects.c infrastructure to debug (dynamic) objects 2008-04-30 08:29:53 -07:00
dec_and_lock.c
devres.c [POWERPC] devres: Add devm_ioremap_prot() 2008-05-05 16:47:14 +10:00
div64.c add an inlined version of iter_div_u64_rem 2008-06-12 10:47:58 +02:00
dump_stack.c
extable.c lib/extable.c: remove an expensive integer divide in search_extable() 2008-02-06 10:41:08 -08:00
fault-inject.c libfs: allow error return from simple attributes 2008-02-08 09:22:34 -08:00
find_next_bit.c bitops: remove "optimizations" 2008-04-29 08:11:16 -07:00
gen_crc32table.c
genalloc.c
halfmd4.c
hexdump.c lib: create common ascii hex array 2008-05-14 19:11:14 -07:00
hweight.c remove asm/bitops.h includes 2007-10-19 11:53:41 -07:00
idr.c idr: fix idr_remove() 2008-05-01 08:04:00 -07:00
inflate.c lib/inflate.c: handle failed malloc() 2008-04-29 08:06:02 -07:00
int_sqrt.c
iomap_copy.c
iomap.c iomap: fix 64 bits resources on 32 bits 2008-04-29 08:06:02 -07:00
iommu-helper.c iommu: export iommu_is_span_boundary helper function 2008-03-04 16:35:17 -08:00
ioremap.c
irq_regs.c
kasprintf.c
Kconfig x86, bitops: select the generic bitmap search functions 2008-04-26 19:21:17 +02:00
Kconfig.debug debugobjects: add timer specific object debugging code 2008-04-30 08:29:53 -07:00
Kconfig.kgdb kgdb: kconfig fix xconfig/menuconfig element 2008-05-05 07:13:21 -05:00
kernel_lock.c BKL: revert back to the old spinlock implementation 2008-05-10 20:58:02 -07:00
klist.c klist: fix coding style errors in klist.h and klist.c 2008-04-30 16:52:58 -07:00
kobject_uevent.c lib: replace remaining __FUNCTION__ occurrences 2008-04-30 08:29:54 -07:00
kobject.c kobject: do not copy vargs, just pass them around 2008-04-30 16:52:48 -07:00
kref.c kref: add kref_set() 2008-01-24 20:40:05 -08:00
libcrc32c.c [LIB] crc32c: Keep intermediate crc state in cpu order 2007-11-08 21:34:09 +08:00
list_debug.c
lmb.c lmb: Fix compile warning 2008-05-18 23:35:43 -05:00
locking-selftest-hardirq.h
locking-selftest-mutex.h
locking-selftest-rlock-hardirq.h
locking-selftest-rlock-softirq.h
locking-selftest-rlock.h
locking-selftest-rsem.h
locking-selftest-softirq.h
locking-selftest-spin-hardirq.h
locking-selftest-spin-softirq.h
locking-selftest-spin.h
locking-selftest-wlock-hardirq.h
locking-selftest-wlock-softirq.h
locking-selftest-wlock.h
locking-selftest-wsem.h
locking-selftest.c
Makefile infrastructure to debug (dynamic) objects 2008-04-30 08:29:53 -07:00
parser.c add match_strlcpy() us it to make v9fs make uname and remotename parsing more robust 2008-05-14 19:23:25 -05:00
percpu_counter.c mm: bdi: export BDI attributes in sysfs 2008-04-30 08:29:49 -07:00
plist.c
prio_heap.c Fix cpusets update_cpumask 2007-10-19 11:53:41 -07:00
prio_tree.c
proportions.c mm: bdi: allow setting a maximum for the bdi dirty limit 2008-04-30 08:29:50 -07:00
radix-tree.c radix-tree: fix small lockless radix-tree bug 2008-06-12 18:05:41 -07:00
random32.c [NET]: srandom32 fixes for networking v2 2008-04-03 14:07:02 -07:00
ratelimit.c isolate ratelimit from printk.c for other use 2008-04-29 08:06:06 -07:00
rbtree.c
reciprocal_div.c
rwsem-spinlock.c lib: remove fastcall from lib/* 2008-02-08 09:22:31 -08:00
rwsem.c x86: fix UML and -regparm=3 2008-01-30 13:33:00 +01:00
scatterlist.c [SCSI] block: add sg buffer copy helper functions 2008-04-07 12:15:45 -05:00
sha1.c
smp_processor_id.c debug_smp_processor_id() fixlets 2008-02-06 10:41:09 -08:00
sort.c
spinlock_debug.c Use helpers to obtain task pid in printks 2007-10-19 11:53:43 -07:00
string.c Add a new sysfs_streq() string comparison function 2008-05-01 08:03:59 -07:00
swiotlb.c dma/ia64: update ia64 machvecs, swiotlb.c 2008-04-29 08:06:12 -07:00
textsearch.c [TEXTSEARCH]: Do not allow zero length patterns in the textsearch infrastructure 2007-12-01 00:03:52 +11:00
ts_bm.c
ts_fsm.c
ts_kmp.c
vsprintf.c lib/vsprintf.c: fix bug omitting minus sign of numbers (module_param) 2008-02-23 17:12:14 -08:00