Hugh Dickins e2bdb933ab radix_tree: take radix_tree_path off stack
Down, down in the deepest depths of GFP_NOIO page reclaim, we have
shrink_page_list() calling __remove_mapping() calling
__delete_from_swap_cache() or __delete_from_page_cache().

You would not expect those to need much stack, but in fact they call
radix_tree_delete(): which declares a 192-byte radix_tree_path array on
its stack (to record the node,offsets it visits when descending, in case
it needs to ascend to update them).  And if any tag is still set [1],
that calls radix_tree_tag_clear(), which declares a further such
192-byte radix_tree_path array on the stack.  (At least we have
interrupts disabled here, so won't then be pushing registers too.)
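
The 192-byte figure can be checked with a little arithmetic.  What
follows is only a sketch, assuming the usual 64-bit configuration (64
index bits, a map shift of 6) and a path entry of one pointer plus one
int, padded to 16 bytes.  The helper names are illustrative, not kernel
API.

```c
#include <assert.h>
#include <stddef.h>

/* Round-up division, in the spirit of the kernel's DIV_ROUND_UP(). */
static size_t div_round_up(size_t n, size_t d)
{
	return (n + d - 1) / d;
}

/*
 * Bytes of stack taken by one path array: one entry per possible level
 * of the tree, plus one extra entry (assumed here to mirror a
 * "max path + 1" sizing of the on-stack array).
 */
static size_t path_array_bytes(size_t index_bits, size_t map_shift,
			       size_t entry_bytes)
{
	size_t max_path = div_round_up(index_bits, map_shift);

	return (max_path + 1) * entry_bytes;
}
```

Under those assumptions path_array_bytes(64, 6, 16) is 12 entries of 16
bytes, that is 192 bytes, and the delete plus tag-clear pair costs twice
that.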

That was probably a good choice when most users were 32-bit (array of
half the size), and adding fields to radix_tree_node would have bloated
it unnecessarily.  But nowadays many are 64-bit, and each
radix_tree_node contains a struct rcu_head, which is only used when
freeing; whereas the radix_tree_path info is only used for updating the
tree (deleting, clearing tags or setting tags if tagged) when a lock
must be held, of no interest when accessing the tree locklessly.

So add a parent pointer to the radix_tree_node, in union with the
rcu_head, and remove all uses of the radix_tree_path.  There would be
space in that union to save the offset when descending as before (we can
argue that a lock must already be held to exclude other users), but
recalculating it when ascending is both easy (a constant shift and a
constant mask) and uncommon, so it seems better just to do that.
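
To illustrate the idea, here is a minimal userspace sketch, not the
patch itself: a toy node whose parent pointer shares storage with a
stand-in for rcu_head, and a tag-clear walk that ascends via parent and
recomputes each offset from the index with a shift and a mask.  The
names toy_node, toy_rcu_head and toy_tag_clear are hypothetical, a
single tag bitmap stands in for the per-tag arrays, and the caller is
assumed to hold whatever lock protects the tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAP_SHIFT 6
#define MAP_SIZE  (1UL << MAP_SHIFT)
#define MAP_MASK  (MAP_SIZE - 1)

/* Stand-in for the kernel's struct rcu_head. */
struct toy_rcu_head {
	struct toy_rcu_head *next;
	void (*func)(struct toy_rcu_head *);
};

struct toy_node {
	union {
		struct toy_node *parent;	/* used while the node is in the tree */
		struct toy_rcu_head rcu_head;	/* used only when freeing */
	};
	uint64_t tag;				/* one tag bit per slot */
};

/*
 * Clear the tag for index, starting at the leaf-level node that covers
 * it.  No path array is needed: the offset at each level is recomputed
 * from the index, and the walk stops as soon as a node still has some
 * other slot tagged, since its ancestors must then keep their bits.
 */
static void toy_tag_clear(struct toy_node *node, uint64_t index)
{
	unsigned int shift = 0;		/* leaf level */

	while (node) {
		unsigned int offset = (index >> shift) & MAP_MASK;

		node->tag &= ~((uint64_t)1 << offset);
		if (node->tag)
			break;
		node = node->parent;
		shift += MAP_SHIFT;
	}
}
```

Since the two union members are never live at the same time in this
sketch (parent while linked, rcu_head once the node is being freed), the
node does not grow.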

Two little optimizations: no need to decrement height when descending,
adjusting shift is enough; and once radix_tree_tag_if_tagged() has set
tag on a node and its ancestors, it need not ascend from that node
again.
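
The first of these can be sketched as follows: a toy lookup that derives
the starting shift from the height once and then only adjusts shift on
the way down.  This is an illustration under assumptions, not the kernel
code; walk_node and toy_lookup are hypothetical names, the root is
assumed non-NULL and height at least 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAP_SHIFT 6
#define MAP_SIZE  (1UL << MAP_SHIFT)
#define MAP_MASK  (MAP_SIZE - 1)

struct walk_node {
	void *slots[MAP_SIZE];	/* child nodes, or items at the leaf level */
};

/*
 * Look up index in a tree of the given height.  The loop ends when
 * shift reaches zero (leaf level) or an empty slot is found; no
 * separate height counter is decremented.
 */
static void *toy_lookup(struct walk_node *root, unsigned int height,
			uint64_t index)
{
	unsigned int shift = (height - 1) * MAP_SHIFT;
	void *entry = root;

	for (;;) {
		struct walk_node *node = entry;

		entry = node->slots[(index >> shift) & MAP_MASK];
		if (!entry || shift == 0)
			return entry;
		shift -= MAP_SHIFT;
	}
}
```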

perf on the radix tree test harness reports radix_tree_insert() as 2%
slower (now having to set parent), but radix_tree_delete() 24% faster.
Surely that's an exaggeration from rtth's artificially low map shift 3,
but forcing it back to 6 still rates radix_tree_delete() 8% faster.

[1] Can a pagecache tag (dirty, writeback or towrite) actually still be
set at the time of radix_tree_delete()? Perhaps not if the filesystem is
well-behaved.  But although I've not tracked any stack overflow down to
this cause, I have observed a curious case in which a dirty tag is set
and left set on tmpfs: page migration's migrate_page_copy() happens to
use __set_page_dirty_nobuffers() to set PageDirty on the newpage, and
that sets PAGECACHE_TAG_DIRTY as a side-effect - harmless to a
filesystem which doesn't use tags, except for this stack depth issue.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Nai Xia <nai.xia@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12 20:13:12 -08:00