linux/lib
Nick Piggin e2848a0efe radix-tree: avoid atomic allocations for preloaded insertions
Most pagecache (and some other) radix tree insertions have the great
opportunity to preallocate a few nodes with relaxed gfp flags.  But the
preallocation is squandered when it comes time to allocate a node, we
default to first attempting a GFP_ATOMIC allocation -- that doesn't
normally fail, but it can eat into atomic memory reserves that we don't
need to be using.

Another upshot of this is that it removes the sometimes highly contended
zone->lock from underneath tree_lock.  Pagecache insertions are always
performed with a radix tree preload, and after this change, such a
situation will never fall back to kmem_cache_alloc within
radix_tree_node_alloc.

David Miller reports seeing this allocation fail on a highly threaded
sparc64 system:

[527319.459981] dd: page allocation failure. order:0, mode:0x20
[527319.460403] Call Trace:
[527319.460568]  [00000000004b71e0] __slab_alloc+0x1b0/0x6a8
[527319.460636]  [00000000004b7bbc] kmem_cache_alloc+0x4c/0xa8
[527319.460698]  [000000000055309c] radix_tree_node_alloc+0x20/0x90
[527319.460763]  [0000000000553238] radix_tree_insert+0x12c/0x260
[527319.460830]  [0000000000495cd0] add_to_page_cache+0x38/0xb0
[527319.460893]  [00000000004e4794] mpage_readpages+0x6c/0x134
[527319.460955]  [000000000049c7fc] __do_page_cache_readahead+0x170/0x280
[527319.461028]  [000000000049cc88] ondemand_readahead+0x208/0x214
[527319.461094]  [0000000000496018] do_generic_mapping_read+0xe8/0x428
[527319.461152]  [0000000000497948] generic_file_aio_read+0x108/0x170
[527319.461217]  [00000000004badac] do_sync_read+0x88/0xd0
[527319.461292]  [00000000004bb5cc] vfs_read+0x78/0x10c
[527319.461361]  [00000000004bb920] sys_read+0x34/0x60
[527319.461424]  [0000000000406294] linux_sparc_syscall32+0x3c/0x40

The calltrace is significant: __do_page_cache_readahead allocates a number
of pages with GFP_KERNEL, and hence it should have reclaimed sufficient
memory to satisfy GFP_ATOMIC allocations.  However after the list of pages
goes to mpage_readpages, there can be significant intervals (including disk
IO) before all the pages are inserted into the radix-tree.  So the reserves
can easily be depleted at that point.  The patch is confirmed to fix the
problem.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 09:44:17 -08:00
..
lzo
reed_solomon
zlib_deflate lib/: Spelling fixes 2008-02-03 17:48:52 +02:00
zlib_inflate
.gitignore
argv_split.c
audit.c
bitmap.c
bitrev.c
bug.c
bust_spinlocks.c
check_signature.c
cmdline.c
cpumask.c
crc7.c
crc16.c
crc32.c lib/: Spelling fixes 2008-02-03 17:48:52 +02:00
crc32defs.h
crc-ccitt.c
crc-itu-t.c
ctype.c
debug_locks.c
dec_and_lock.c
devres.c
div64.c
dump_stack.c
extable.c
fault-inject.c
find_next_bit.c ext4: Add ext4_find_next_bit() 2008-01-28 23:58:27 -05:00
gen_crc32table.c
genalloc.c
halfmd4.c
hexdump.c
hweight.c
idr.c
inflate.c
int_sqrt.c
iomap_copy.c
iomap.c
iommu-helper.c iommu sg: add IOMMU helper functions for the free area management 2008-02-05 09:44:11 -08:00
ioremap.c
irq_regs.c
kasprintf.c
Kconfig
Kconfig.debug kbuild: Spelling/grammar fixes for config DEBUG_SECTION_MISMATCH 2008-02-03 08:58:07 +01:00
kernel_lock.c sched: remove the !PREEMPT_BKL code 2008-01-25 21:08:33 +01:00
klist.c
kobject_uevent.c Kobject: fix coding style issues in kobject c files 2008-01-24 21:59:04 -08:00
kobject.c kobject: kerneldoc comment fix 2008-02-02 15:14:48 -08:00
kref.c
libcrc32c.c
list_debug.c
locking-selftest-hardirq.h
locking-selftest-mutex.h
locking-selftest-rlock-hardirq.h
locking-selftest-rlock-softirq.h
locking-selftest-rlock.h
locking-selftest-rsem.h
locking-selftest-softirq.h
locking-selftest-spin-hardirq.h
locking-selftest-spin-softirq.h
locking-selftest-spin.h
locking-selftest-wlock-hardirq.h
locking-selftest-wlock-softirq.h
locking-selftest-wlock.h
locking-selftest-wsem.h
locking-selftest.c
Makefile iommu sg: add IOMMU helper functions for the free area management 2008-02-05 09:44:11 -08:00
parser.c
pcounter.c [LIB] pcounter : unline too big functions 2008-01-28 15:00:35 -08:00
percpu_counter.c
plist.c
prio_heap.c
prio_tree.c
proportions.c
radix-tree.c radix-tree: avoid atomic allocations for preloaded insertions 2008-02-05 09:44:17 -08:00
random32.c
rbtree.c
reciprocal_div.c
rwsem-spinlock.c
rwsem.c x86: fix UML and -regparm=3 2008-01-30 13:33:00 +01:00
scatterlist.c SG: work with the SCSI fixed maximum allocations. 2008-01-28 10:54:49 +01:00
semaphore-sleepers.c
sha1.c
smp_processor_id.c
sort.c
spinlock_debug.c
string.c
swiotlb.c iommu sg merging: swiotlb: respect the segment boundary limits 2008-02-05 09:44:12 -08:00
textsearch.c
ts_bm.c
ts_fsm.c
ts_kmp.c
vsprintf.c