linux/lib
Nick Piggin e2848a0efe radix-tree: avoid atomic allocations for preloaded insertions
Most pagecache (and some other) radix tree insertions have the great
opportunity to preallocate a few nodes with relaxed gfp flags.  But the
preallocation is squandered when it comes time to allocate a node, we
default to first attempting a GFP_ATOMIC allocation -- that doesn't
normally fail, but it can eat into atomic memory reserves that we don't
need to be using.

Another upshot of this is that it removes the sometimes highly contended
zone->lock from underneath tree_lock.  Pagecache insertions are always
performed with a radix tree preload, and after this change, such a
situation will never fall back to kmem_cache_alloc within
radix_tree_node_alloc.

David Miller reports seeing this allocation fail on a highly threaded
sparc64 system:

[527319.459981] dd: page allocation failure. order:0, mode:0x20
[527319.460403] Call Trace:
[527319.460568]  [00000000004b71e0] __slab_alloc+0x1b0/0x6a8
[527319.460636]  [00000000004b7bbc] kmem_cache_alloc+0x4c/0xa8
[527319.460698]  [000000000055309c] radix_tree_node_alloc+0x20/0x90
[527319.460763]  [0000000000553238] radix_tree_insert+0x12c/0x260
[527319.460830]  [0000000000495cd0] add_to_page_cache+0x38/0xb0
[527319.460893]  [00000000004e4794] mpage_readpages+0x6c/0x134
[527319.460955]  [000000000049c7fc] __do_page_cache_readahead+0x170/0x280
[527319.461028]  [000000000049cc88] ondemand_readahead+0x208/0x214
[527319.461094]  [0000000000496018] do_generic_mapping_read+0xe8/0x428
[527319.461152]  [0000000000497948] generic_file_aio_read+0x108/0x170
[527319.461217]  [00000000004badac] do_sync_read+0x88/0xd0
[527319.461292]  [00000000004bb5cc] vfs_read+0x78/0x10c
[527319.461361]  [00000000004bb920] sys_read+0x34/0x60
[527319.461424]  [0000000000406294] linux_sparc_syscall32+0x3c/0x40

The calltrace is significant: __do_page_cache_readahead allocates a number
of pages with GFP_KERNEL, and hence it should have reclaimed sufficient
memory to satisfy GFP_ATOMIC allocations.  However after the list of pages
goes to mpage_readpages, there can be significant intervals (including disk
IO) before all the pages are inserted into the radix-tree.  So the reserves
can easily be depleted at that point.  The patch is confirmed to fix the
problem.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 09:44:17 -08:00
..
lzo lzo: add some missing casts 2007-07-31 15:39:37 -07:00
reed_solomon [MTD] [NAND] Replace -1 with -EBADMSG in nand error correction code 2007-10-20 22:30:54 +01:00
zlib_deflate lib/: Spelling fixes 2008-02-03 17:48:52 +02:00
zlib_inflate [ZLIB]: Fix external builds of zlib_inflate code. 2007-10-11 22:17:20 -07:00
.gitignore
Kconfig Introduce CONFIG_CHECK_SIGNATURE 2007-08-22 19:52:45 -07:00
Kconfig.debug kbuild: Spelling/grammar fixes for config DEBUG_SECTION_MISMATCH 2008-02-03 08:58:07 +01:00
Makefile iommu sg: add IOMMU helper functions for the free area management 2008-02-05 09:44:11 -08:00
argv_split.c LIB: Replace inappropriate include of <linux/bug.h> 2007-10-20 00:26:10 +02:00
audit.c [PATCH] audit signal recipients 2007-05-11 05:38:25 -04:00
bitmap.c Fix bitmap_scnlistprintf for empty masks 2007-11-05 15:12:32 -08:00
bitrev.c [PATCH] add MODULE_* attributes to bit reversal library 2006-12-10 10:07:52 -08:00
bug.c generic bug: use show_regs() instead of dump_stack() 2007-07-16 09:05:51 -07:00
bust_spinlocks.c handle recursive calls to bust_spinlocks() 2007-10-17 08:42:56 -07:00
check_signature.c uninline check_signature() 2007-07-16 09:05:50 -07:00
cmdline.c [PATCH] Numerous fixes to kernel-doc info in source files. 2007-02-11 10:51:32 -08:00
cpumask.c Safer nr_node_ids and nr_node_ids determination and initial values 2007-05-07 12:12:51 -07:00
crc-ccitt.c
crc-itu-t.c CRC ITU-T V.41 2007-05-10 18:24:13 +02:00
crc7.c CRC7 support 2007-07-17 10:23:04 -07:00
crc16.c
crc32.c lib/: Spelling fixes 2008-02-03 17:48:52 +02:00
crc32defs.h
ctype.c
debug_locks.c
dec_and_lock.c
devres.c iomap: implement pcim_iounmap_regions() 2007-04-28 14:15:58 -04:00
div64.c [S390]: Fix build on 31-bit. 2007-04-25 22:28:53 -07:00
dump_stack.c
extable.c
fault-inject.c fault_inject: silence a warning 2007-07-24 12:24:59 -07:00
find_next_bit.c ext4: Add ext4_find_next_bit() 2008-01-28 23:58:27 -05:00
gen_crc32table.c
genalloc.c Slab allocators: Replace explicit zeroing with __GFP_ZERO 2007-07-17 10:23:02 -07:00
halfmd4.c
hexdump.c hexdump: don't print bytes with bit 7 set 2007-11-29 09:24:53 -08:00
hweight.c remove asm/bitops.h includes 2007-10-19 11:53:41 -07:00
idr.c Slab API: remove useless ctor parameter and reorder parameters 2007-10-17 08:42:45 -07:00
inflate.c [PATCH] x86-64: deflate inflate_dynamic too 2007-05-02 19:27:15 +02:00
int_sqrt.c
iomap.c lib/iomap.c:bad_io_access(): print 0x hex prefix 2007-10-17 08:42:57 -07:00
iomap_copy.c
iommu-helper.c iommu sg: add IOMMU helper functions for the free area management 2008-02-05 09:44:11 -08:00
ioremap.c lib/ioremap.c should #include <linux/io.h> 2007-10-17 08:42:50 -07:00
irq_regs.c
kasprintf.c lib: move kasprintf to a separate file 2007-07-31 15:39:39 -07:00
kernel_lock.c sched: remove the !PREEMPT_BKL code 2008-01-25 21:08:33 +01:00
klist.c
kobject.c kobject: kerneldoc comment fix 2008-02-02 15:14:48 -08:00
kobject_uevent.c Kobject: fix coding style issues in kobject c files 2008-01-24 21:59:04 -08:00
kref.c kref: add kref_set() 2008-01-24 20:40:05 -08:00
libcrc32c.c [LIB] crc32c: Keep intermediate crc state in cpu order 2007-11-08 21:34:09 +08:00
list_debug.c [PATCH] More list debugging context 2006-12-07 08:39:35 -08:00
locking-selftest-hardirq.h
locking-selftest-mutex.h
locking-selftest-rlock-hardirq.h
locking-selftest-rlock-softirq.h
locking-selftest-rlock.h
locking-selftest-rsem.h
locking-selftest-softirq.h
locking-selftest-spin-hardirq.h
locking-selftest-spin-softirq.h
locking-selftest-spin.h
locking-selftest-wlock-hardirq.h
locking-selftest-wlock-softirq.h
locking-selftest-wlock.h
locking-selftest-wsem.h
locking-selftest.c [PATCH] lockdep: show more details about self-test failures 2006-12-07 08:39:43 -08:00
parser.c [AFS]: Make the match_*() functions take const options. 2007-05-03 03:10:39 -07:00
pcounter.c [LIB] pcounter : unline too big functions 2008-01-28 15:00:35 -08:00
percpu_counter.c Add irq protection in the percpu-counters cpu-hotplug-callback path 2007-10-19 11:53:44 -07:00
plist.c
prio_heap.c Fix cpusets update_cpumask 2007-10-19 11:53:41 -07:00
prio_tree.c
proportions.c lib: proportion: fix underflow in prop_norm_percpu() 2007-12-23 12:54:37 -08:00
radix-tree.c radix-tree: avoid atomic allocations for preloaded insertions 2008-02-05 09:44:17 -08:00
random32.c [PATCH] severing module.h->sched.h 2006-12-04 02:00:22 -05:00
rbtree.c
reciprocal_div.c [PATCH] SLAB: use a multiply instead of a divide in obj_to_index() 2006-12-13 09:05:49 -08:00
rwsem-spinlock.c Lockdep: add lockdep_set_class_and_subclass() and lockdep_set_subclass() 2006-10-11 01:45:14 -04:00
rwsem.c x86: fix UML and -regparm=3 2008-01-30 13:33:00 +01:00
scatterlist.c SG: work with the SCSI fixed maximum allocations. 2008-01-28 10:54:49 +01:00
semaphore-sleepers.c
sha1.c [PATCH] Numerous fixes to kernel-doc info in source files. 2007-02-11 10:51:32 -08:00
smp_processor_id.c
sort.c lib/sort.c optimization 2007-10-17 08:42:52 -07:00
spinlock_debug.c Use helpers to obtain task pid in printks 2007-10-19 11:53:43 -07:00
string.c [STRING]: Move strcasecmp/strncasecmp to lib/string.c 2007-04-26 01:54:39 -07:00
swiotlb.c iommu sg merging: swiotlb: respect the segment boundary limits 2008-02-05 09:44:12 -08:00
textsearch.c [TEXTSEARCH]: Do not allow zero length patterns in the textsearch infrastructure 2007-12-01 00:03:52 +11:00
ts_bm.c
ts_fsm.c
ts_kmp.c
vsprintf.c lib: move kasprintf to a separate file 2007-07-31 15:39:39 -07:00