linux/lib
Nick Piggin 643b52b9c0 radix-tree: fix small lockless radix-tree bug
We shrink a radix tree when its root node has only one child, in the
leftmost slot.  The child becomes the new root node.  To perform this
operation in a manner compatible with concurrent lockless lookups, we
atomically switch the root pointer from the parent to its child.

However, a concurrent lockless lookup may now have loaded a pointer to the
parent (and is presently deciding what to do next).  For this reason, we
also have to keep the parent node in a valid state after shrinking the
tree, until the next RCU grace period -- otherwise this lookup with the
parent pointer may not do the right thing.  Notably, we need to keep the
child in the leftmost slot there in case that is requested by the lookup.

This is all pretty standard RCU stuff.  It is worth repeating because in
my eagerness to obey the radix tree node constructor scheme, I had broken
it by zeroing the radix tree node before the grace period.

What could happen is that a lookup loads the parent pointer, then
decides it wants to follow the leftmost child slot, only to find that the
slot contains NULL because the concurrent shrinker zeroed the parent
node before waiting for a grace period.  The lookup would return a false
negative as a result.

Fix it by doing that clearing in the RCU callback.  I would normally want
to rip out the constructor entirely, but radix tree nodes are one of those
places where they make sense (only a few cachelines will be touched soon
after allocation).
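
The race and the fix described above can be sketched in userspace C. This is a single-threaded illustration under stated assumptions, not the code in radix-tree.c: the types `struct node` and `struct tree`, the fan-out `SLOTS`, and the functions `shrink()`, `grace_period_elapsed()` and `stale_lookup_finds_child()` are all hypothetical names made up for the example. The `pending` field stands in for a callback that the kernel would queue with call_rcu(), and the plain store to `root` stands in for rcu_assign_pointer().

```c
#include <stddef.h>
#include <string.h>

#define SLOTS 4  /* hypothetical fan-out, chosen small for illustration */

struct node {
	unsigned int count;
	void *slots[SLOTS];
};

struct tree {
	struct node *root;
	struct node *pending;  /* stand-in for a queued RCU callback */
};

/* Return a node to its "constructed" (all-zero) state. */
static void node_reset(struct node *n)
{
	memset(n, 0, sizeof(*n));
}

/*
 * Replace the root with its only child (slot 0).
 * clear_early != 0 models the bug: the parent is zeroed immediately.
 * clear_early == 0 models the fix: the parent is left intact and only
 * cleared once the grace period has elapsed.
 */
static void shrink(struct tree *t, int clear_early)
{
	struct node *parent = t->root;

	t->root = parent->slots[0];  /* kernel: rcu_assign_pointer() */
	if (clear_early)
		node_reset(parent);
	t->pending = parent;
}

/* Model the end of the grace period: run the deferred clearing. */
static void grace_period_elapsed(struct tree *t)
{
	if (t->pending) {
		node_reset(t->pending);
		t->pending = NULL;
	}
}

/*
 * A reader loads the old root, a shrink happens, and the reader then
 * follows slot 0 of the node it loaded earlier.
 * Returns 1 if the reader still finds the child, 0 on a false negative.
 */
int stale_lookup_finds_child(int clear_early)
{
	static int item;
	struct node child = { .count = 1, .slots = { &item } };
	struct node parent = { .count = 1, .slots = { &child } };
	struct tree t = { .root = &parent, .pending = NULL };
	struct node *seen;
	void *result;

	seen = t.root;                 /* reader loads the parent pointer */
	shrink(&t, clear_early);       /* concurrent shrinker runs */
	result = seen->slots[0];       /* reader follows the leftmost slot */
	grace_period_elapsed(&t);      /* reader is done; clearing is safe */

	return result == (void *)&child;
}
```

With the deferred clearing (`clear_early` of 0) the stale reader still reaches the child; with the early clearing (`clear_early` of 1) it reads NULL, which is the false negative described above.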

This was never actually found in any lockless pagecache testing or by the
test harness, but by seeing the odd problem with my scalable vmap rewrite.
I have not tickled the test harness into reproducing it yet, but I'll
keep working at it.

Fortunately, it is not a problem anywhere lockless pagecache is used in
mainline kernels (pagecache probe is not a guarantee, and brd does not
have concurrent lookups and deletes).

Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-06-12 18:05:41 -07:00
lzo
reed_solomon
zlib_deflate
zlib_inflate
.gitignore
argv_split.c
audit.c
bitmap.c cpumask: remove bitmap_scnprintf_len and cpumask_scnprintf_len 2008-05-13 08:02:25 -07:00
bitrev.c lib: export bitrev16 2008-06-06 11:29:10 -07:00
bug.c
bust_spinlocks.c
check_signature.c
cmdline.c
cpumask.c
crc7.c
crc16.c
crc32.c
crc32defs.h
crc-ccitt.c
crc-itu-t.c
ctype.c
debug_locks.c
debugobjects.c infrastructure to debug (dynamic) objects 2008-04-30 08:29:53 -07:00
dec_and_lock.c
devres.c [POWERPC] devres: Add devm_ioremap_prot() 2008-05-05 16:47:14 +10:00
div64.c add an inlined version of iter_div_u64_rem 2008-06-12 10:47:58 +02:00
dump_stack.c
extable.c
fault-inject.c
find_next_bit.c bitops: remove "optimizations" 2008-04-29 08:11:16 -07:00
gen_crc32table.c
genalloc.c
halfmd4.c
hexdump.c lib: create common ascii hex array 2008-05-14 19:11:14 -07:00
hweight.c
idr.c idr: fix idr_remove() 2008-05-01 08:04:00 -07:00
inflate.c lib/inflate.c: handle failed malloc() 2008-04-29 08:06:02 -07:00
int_sqrt.c
iomap_copy.c
iomap.c iomap: fix 64 bits resources on 32 bits 2008-04-29 08:06:02 -07:00
iommu-helper.c
ioremap.c
irq_regs.c
kasprintf.c
Kconfig x86, bitops: select the generic bitmap search functions 2008-04-26 19:21:17 +02:00
Kconfig.debug debugobjects: add timer specific object debugging code 2008-04-30 08:29:53 -07:00
Kconfig.kgdb kgdb: kconfig fix xconfig/menuconfig element 2008-05-05 07:13:21 -05:00
kernel_lock.c BKL: revert back to the old spinlock implementation 2008-05-10 20:58:02 -07:00
klist.c klist: fix coding style errors in klist.h and klist.c 2008-04-30 16:52:58 -07:00
kobject_uevent.c lib: replace remaining __FUNCTION__ occurrences 2008-04-30 08:29:54 -07:00
kobject.c kobject: do not copy vargs, just pass them around 2008-04-30 16:52:48 -07:00
kref.c
libcrc32c.c
list_debug.c
lmb.c lmb: Fix compile warning 2008-05-18 23:35:43 -05:00
locking-selftest-hardirq.h
locking-selftest-mutex.h
locking-selftest-rlock-hardirq.h
locking-selftest-rlock-softirq.h
locking-selftest-rlock.h
locking-selftest-rsem.h
locking-selftest-softirq.h
locking-selftest-spin-hardirq.h
locking-selftest-spin-softirq.h
locking-selftest-spin.h
locking-selftest-wlock-hardirq.h
locking-selftest-wlock-softirq.h
locking-selftest-wlock.h
locking-selftest-wsem.h
locking-selftest.c
Makefile infrastructure to debug (dynamic) objects 2008-04-30 08:29:53 -07:00
parser.c add match_strlcpy() us it to make v9fs make uname and remotename parsing more robust 2008-05-14 19:23:25 -05:00
percpu_counter.c mm: bdi: export BDI attributes in sysfs 2008-04-30 08:29:49 -07:00
plist.c
prio_heap.c
prio_tree.c
proportions.c mm: bdi: allow setting a maximum for the bdi dirty limit 2008-04-30 08:29:50 -07:00
radix-tree.c radix-tree: fix small lockless radix-tree bug 2008-06-12 18:05:41 -07:00
random32.c
ratelimit.c isolate ratelimit from printk.c for other use 2008-04-29 08:06:06 -07:00
rbtree.c
reciprocal_div.c
rwsem-spinlock.c
rwsem.c
scatterlist.c
sha1.c
smp_processor_id.c
sort.c
spinlock_debug.c
string.c Add a new sysfs_streq() string comparison function 2008-05-01 08:03:59 -07:00
swiotlb.c dma/ia64: update ia64 machvecs, swiotlb.c 2008-04-29 08:06:12 -07:00
textsearch.c
ts_bm.c
ts_fsm.c
ts_kmp.c
vsprintf.c