linux/fs/xfs
David Chinner 59a33f9f77 [XFS] Ensure a btree insert returns a valid cursor.
When writing into preallocated regions there is a case where XFS can oops
or hang doing the unwritten extent conversion on I/O completion. It turns
out that the problem is related to the btree cursor being invalid.

When we do an insert into the tree, we may need to split blocks in the
tree. When we only split at the leaf level (i.e. level 0), everything
works just fine. However, if we have a multi-level split in the btree,
the cursor passed to the insert function is no longer valid once the
insert is complete.

The leaf level split is handled correctly because all the operations at
level 0 are done using the original cursor, hence it is updated correctly.
However, when we need to update the next level up the tree, we don't use
that cursor - we use a cloned cursor that points to the index in the next
level up where we need to do the insert.

Hence if we need to split a second level, the changes to the tree are
reflected in the cloned cursor and not the original cursor. This
clone-and-move-up-a-level-on-split behaviour recurses all the way to the
top of the tree.

The complexity here is that these cloned cursors do not point to the
original index that was inserted - they point to the newly allocated block
(the right block) and the original cursor pointer to that level may still
point to the left block. Hence, without deep examination of the cloned
cursor and buffers, we cannot update the original cursor with the new path
from the cloned cursor.

In these cases the original cursor could be pointing to the wrong block(s)
and hence a subsequent modification to the tree using that cursor will
lead to corruption of the tree.
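
The stale-path effect described above can be shown with a toy model.
This is only a sketch, not the XFS code: struct sketch_cur,
sketch_split() and sketch_insert() are hypothetical names, the cursor
is reduced to one block number per level, and we assume for simplicity
that the inserted entry always ends up in the new right block.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAXLEVELS 4

/* Hypothetical toy cursor: one block number per level, leaf at level 0. */
struct sketch_cur {
	int nlevels;
	int block[SKETCH_MAXLEVELS];
};

/* Split the block at 'level' through cursor 'c'; 'c' now points at the new block. */
static void sketch_split(struct sketch_cur *c, int level, int new_block)
{
	c->block[level] = new_block;
}

/*
 * Insert that splits levels 0..split_levels-1. Level 0 is split through
 * the original cursor; higher levels are split only through a clone.
 * Returns the clone, which holds the path that is correct after the insert.
 */
static struct sketch_cur sketch_insert(struct sketch_cur *orig,
				       int split_levels, int first_new_block)
{
	struct sketch_cur clone = *orig;

	assert(split_levels >= 0 && split_levels <= orig->nlevels);
	for (int level = 0; level < split_levels; level++) {
		int new_block = first_new_block + level;

		if (level == 0)
			sketch_split(orig, 0, new_block); /* original is updated */
		sketch_split(&clone, level, new_block);   /* clone sees every split */
	}
	return clone;
}

/* True if both cursors describe the same path through the tree. */
static bool sketch_same_path(const struct sketch_cur *a,
			     const struct sketch_cur *b)
{
	if (a->nlevels != b->nlevels)
		return false;
	for (int level = 0; level < a->nlevels; level++)
		if (a->block[level] != b->block[level])
			return false;
	return true;
}
```

In this model a leaf-only split (split_levels == 1) leaves the original
cursor matching the returned path, while a two-level split leaves the
original still naming the old block at level 1.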

The crash case occurs when the tree changes height - we insert a new level
in the tree, and the cursor does not have a buffer in its path for that
level. Hence any attempt to walk back up the cursor to the root block will
result in a null pointer dereference.
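
As a rough illustration of the crash case, the check below tests whether
a cursor has a buffer for every level of a tree of a given height. It is
a sketch under simplifying assumptions: struct path_cur, struct path_buf
and path_cur_covers_tree() are hypothetical, and every level (including
the root) is modelled as having a buffer, which is not how an
inode-rooted tree behaves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PATH_MAXLEVELS 8

/* Hypothetical stand-in for a buffer holding one btree block. */
struct path_buf {
	int blkno;
};

/* Hypothetical toy cursor: one buffer pointer per level, leaf at level 0. */
struct path_cur {
	int nlevels;                          /* height the cursor was built for */
	struct path_buf *bufs[PATH_MAXLEVELS];
};

/*
 * Return true only if the cursor holds a buffer for every level of a
 * tree that is now 'tree_height' levels tall. A walk towards the root
 * that skips this kind of check would dereference a NULL bufs[] entry
 * once the tree has grown past the height the cursor was built for.
 */
static bool path_cur_covers_tree(const struct path_cur *cur, int tree_height)
{
	assert(tree_height >= 0 && tree_height <= PATH_MAXLEVELS);
	for (int level = 0; level < tree_height; level++)
		if (cur->bufs[level] == NULL)
			return false;
	return true;
}
```

A cursor built for a two-level tree passes the check for height 2 and
fails it for height 3, which is the situation after a new level has been
inserted underneath it.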

To make matters even more complex, the BMAP BT is rooted in an inode, so
we can have a change of height in the btree *without a root split*. That
is, if the root block in the inode is full when we split a leaf node, we
cannot fit the pointer to the new block in the root, so we allocate a new
block, migrate all the ptrs out of the inode into the new block and point
the inode root block at the newly allocated block. This changes the height
of the tree without a root split having occurred and hence invalidates the
path in the original cursor.
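
The height change without a root split can be sketched as follows. This
is a toy model only: struct inode_root, struct child_block,
root_add_ptr() and the capacities are hypothetical, keys are ignored,
and we assume the pending pointer is placed in the newly allocated block
along with the migrated ones.

```c
#include <assert.h>
#include <stdbool.h>

#define ROOT_CAP  3   /* hypothetical: pointers that fit in the inode root */
#define BLOCK_CAP 8   /* hypothetical: pointers that fit in a disk block   */

/* Hypothetical on-disk block that receives the pointers from the root. */
struct child_block {
	int nptrs;
	int ptrs[BLOCK_CAP];
};

/* Hypothetical root block that lives inside the inode. */
struct inode_root {
	int height;
	int nptrs;
	int ptrs[ROOT_CAP];
};

/*
 * Add 'new_ptr' to the root. If the root is full, migrate all of its
 * pointers into 'spill', point the root at 'spill_blkno' and grow the
 * tree by one level. Returns true if the height changed, which is the
 * case that invalidates the path held in an existing cursor.
 */
static bool root_add_ptr(struct inode_root *root, int new_ptr,
			 struct child_block *spill, int spill_blkno)
{
	assert(root->nptrs >= 0 && root->nptrs <= ROOT_CAP);
	if (root->nptrs < ROOT_CAP) {
		root->ptrs[root->nptrs++] = new_ptr;
		return false;
	}

	spill->nptrs = 0;
	for (int i = 0; i < root->nptrs; i++)
		spill->ptrs[spill->nptrs++] = root->ptrs[i];
	spill->ptrs[spill->nptrs++] = new_ptr;

	root->ptrs[0] = spill_blkno;
	root->nptrs = 1;
	root->height++;
	return true;
}
```

Note that the root block itself is never split here: it is emptied into
a new child and keeps a single pointer, yet the tree is one level taller
afterwards.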

The patch below prevents xfs_bmbt_insert() from returning with an invalid
cursor by detecting the cases that invalidate the original cursor and
refreshing it by doing a lookup into the btree for the original index we were
inserting at.
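
The detect-and-re-lookup idea can be sketched in miniature. This is not
the patch itself: the "tree" is a sorted array, and struct idx_tree,
struct idx_cur, idx_lookup_eq() and idx_revalidate_after_insert() are
hypothetical names chosen for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the btree: a sorted array of keys. */
struct idx_tree {
	const int *keys;
	size_t nkeys;
};

/* Hypothetical toy cursor: a position plus a validity flag. */
struct idx_cur {
	size_t pos;
	bool valid;
};

/* Binary search for 'key'; on return the cursor points at its slot. */
static bool idx_lookup_eq(const struct idx_tree *t, int key,
			  struct idx_cur *cur)
{
	size_t lo = 0, hi = t->nkeys;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (t->keys[mid] < key)
			lo = mid + 1;
		else
			hi = mid;
	}
	cur->pos = lo;
	cur->valid = true;
	return lo < t->nkeys && t->keys[lo] == key;
}

/*
 * After an insert: if more than the leaf level split, or the tree
 * changed height, treat the cursor as invalid and rebuild it with a
 * fresh lookup of the key that was inserted. Otherwise leave it alone.
 * Returns true if the cursor is usable on return.
 */
static bool idx_revalidate_after_insert(const struct idx_tree *t,
					int inserted_key, int levels_split,
					bool height_changed,
					struct idx_cur *cur)
{
	assert(t != NULL && cur != NULL);
	if (levels_split <= 1 && !height_changed)
		return cur->valid;

	cur->valid = false;
	return idx_lookup_eq(t, inserted_key, cur);
}
```

The extra lookup is only paid in the uncommon multi-level split and
height change cases; a leaf-only split returns without touching the
cursor.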

Note that the INOBT, AGFBNO and AGFCNT btree implementations also have
this bug, but the cursor is currently always destroyed or revalidated
after an insert for those trees. Hence this patch only addresses the problem
in the BMBT code.

SGI-PV: 979339
SGI-Modid: xfs-linux-melb:xfs-kern:30701a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-04-18 11:42:21 +10:00
linux-2.6 [XFS] cleanup vnode use in xfs_iops.c 2008-04-18 11:41:14 +10:00
quota [XFS] remove shouting-indirection macros from xfs_sb.h 2008-04-10 16:24:45 +10:00
support [XFS] Use power-of-2 sized buffers to reduce overhead 2008-04-18 11:40:04 +10:00
Kconfig
Makefile [XFS] Added quota targets and removed dmapi directory 2008-02-18 13:06:17 +11:00
xfs.h [XFS] clean up vnode/inode tracing 2008-02-07 16:42:19 +11:00
xfs_acl.c [XFS] use generic_permission 2008-02-07 18:22:38 +11:00
xfs_acl.h [XFS] use generic_permission 2008-02-07 18:22:38 +11:00
xfs_ag.h [XFS] Unwrap pagb_lock. 2008-02-07 16:46:39 +11:00
xfs_alloc.c xfs: convert beX_add to beX_add_cpu (new common API) 2008-02-13 16:21:19 -08:00
xfs_alloc.h
xfs_alloc_btree.c xfs: convert beX_add to beX_add_cpu (new common API) 2008-02-13 16:21:19 -08:00
xfs_alloc_btree.h
xfs_arch.h xfs: convert beX_add to beX_add_cpu (new common API) 2008-02-13 16:21:19 -08:00
xfs_attr.c [XFS] Fix up sparse warnings. 2008-02-07 18:14:38 +11:00
xfs_attr.h [XFS] kill struct bhv_vnode 2007-10-16 11:40:24 +10:00
xfs_attr_leaf.c [XFS] remove shouting-indirection macros from xfs_sb.h 2008-04-10 16:24:45 +10:00
xfs_attr_leaf.h
xfs_attr_sf.h
xfs_bit.c [XFS] Undo bit ops cleanup mod due to regression on 32-bit powermac 2008-02-26 17:05:44 +11:00
xfs_bit.h [XFS] Undo bit ops cleanup mod due to regression on 32-bit powermac 2008-02-26 17:05:44 +11:00
xfs_bmap.c [XFS] cleanup vnode use in xfs_bmap.c 2008-04-18 11:41:25 +10:00
xfs_bmap.h [XFS] Fix up sparse warnings. 2008-02-07 18:14:38 +11:00
xfs_bmap_btree.c [XFS] Ensure a btree insert returns a valid cursor. 2008-04-18 11:42:21 +10:00
xfs_bmap_btree.h [XFS] remove shouting-indirection macros from xfs_sb.h 2008-04-10 16:24:45 +10:00
xfs_btree.c
xfs_btree.h [XFS] Fix up sparse warnings. 2008-02-07 18:14:38 +11:00
xfs_buf_item.c [XFS] Unwrap AIL_LOCK 2008-02-07 16:44:23 +11:00
xfs_buf_item.h [XFS] Fix up sparse warnings. 2008-02-07 18:14:38 +11:00
xfs_clnt.h [XFS] If you mount an XFS filesystem with no mount options at all, then 2008-02-28 20:37:56 -08:00
xfs_da_btree.c xfs: convert beX_add to beX_add_cpu (new common API) 2008-02-13 16:21:19 -08:00
xfs_da_btree.h [XFS] Fix up sparse warnings. 2008-02-07 18:14:38 +11:00
xfs_dfrag.c [XFS] stop re-checking permissions in xfs_swapext 2008-02-07 18:22:24 +11:00
xfs_dfrag.h
xfs_dinode.h [XFS] Remove CFORK macros and use code directly in IFORK and DFORK macros. 2008-02-07 18:19:24 +11:00
xfs_dir2.c [XFS] remove shouting-indirection macros from xfs_sb.h 2008-04-10 16:24:45 +10:00
xfs_dir2.h [XFS] decontaminate vnode operations from behavior details 2007-10-15 16:54:29 +10:00
xfs_dir2_block.c xfs: convert beX_add to beX_add_cpu (new common API) 2008-02-13 16:21:19 -08:00
xfs_dir2_block.h [XFS] use filldir internally 2007-10-15 16:49:49 +10:00
xfs_dir2_data.c xfs: convert beX_add to beX_add_cpu (new common API) 2008-02-13 16:21:19 -08:00
xfs_dir2_data.h
xfs_dir2_leaf.c xfs: convert beX_add to beX_add_cpu (new common API) 2008-02-13 16:21:19 -08:00
xfs_dir2_leaf.h [XFS] use filldir internally 2007-10-15 16:49:49 +10:00
xfs_dir2_node.c xfs: convert beX_add to beX_add_cpu (new common API) 2008-02-13 16:21:19 -08:00
xfs_dir2_node.h
xfs_dir2_sf.c [XFS] Put the correct offset in dirent d_off 2007-12-18 17:16:23 +11:00
xfs_dir2_sf.h [XFS] use filldir internally 2007-10-15 16:49:49 +10:00
xfs_dir2_trace.c
xfs_dir2_trace.h
xfs_dmapi.h [XFS] kill the vfs_flags member in struct bhv_vfs 2007-10-16 11:45:57 +10:00
xfs_dmops.c [XFS] fixups after behavior removal merge into mainline git 2007-10-19 17:14:45 +10:00
xfs_error.c [XFS] lose xfs_hex_dump in favor of print_hex_dump 2008-02-07 18:13:05 +11:00
xfs_error.h [XFS] lose xfs_hex_dump in favor of print_hex_dump 2008-02-07 18:13:05 +11:00
xfs_extfree_item.c [XFS] Unwrap AIL_LOCK 2008-02-07 16:44:23 +11:00
xfs_extfree_item.h
xfs_filestream.c [XFS] Fix up sparse warnings. 2008-02-07 18:14:38 +11:00
xfs_filestream.h
xfs_fs.h [XFS] fix 32-bit compat ioctls for GETXFLAGS, SETXFLAGS, GETVERSION 2008-02-07 18:13:17 +11:00
xfs_fsops.c [XFS] remove shouting-indirection macros from xfs_sb.h 2008-04-10 16:24:45 +10:00
xfs_fsops.h
xfs_ialloc.c [XFS] Account for inode cluster alignment in all allocations 2008-04-18 11:42:09 +10:00
xfs_ialloc.h
xfs_ialloc_btree.c xfs: convert beX_add to beX_add_cpu (new common API) 2008-02-13 16:21:19 -08:00
xfs_ialloc_btree.h [XFS] kill XFS_INOBT_IS_FREE_DISK 2008-02-07 18:12:41 +11:00
xfs_iget.c [XFS] Remove the xfs_icluster structure 2008-04-18 11:37:41 +10:00
xfs_imap.h
xfs_inode.c [XFS] Use xfs_inode_clean() in more places 2008-04-18 11:37:51 +10:00
xfs_inode.h [XFS] Remove the xfs_icluster structure 2008-04-18 11:37:41 +10:00
xfs_inode_item.c [XFS] remove shouting-indirection macros from xfs_sb.h 2008-04-10 16:24:45 +10:00
xfs_inode_item.h [XFS] Use xfs_inode_clean() in more places 2008-04-18 11:37:51 +10:00
xfs_inum.h
xfs_iomap.c [XFS] optimize XFS_IS_REALTIME_INODE w/o realtime config 2008-02-07 18:16:43 +11:00
xfs_iomap.h [XFS] kill unnessecary ioops indirection 2008-02-07 16:44:14 +11:00
xfs_itable.c [XFS] Don't block pdflush when writing back inodes 2008-04-18 11:37:32 +10:00
xfs_itable.h
xfs_log.c [XFS] Use atomics for iclog reference counting 2008-04-18 11:38:10 +10:00
xfs_log.h [XFS] xlog_rec_header/xlog_rec_ext_header endianess annotations 2008-02-07 18:11:47 +11:00
xfs_log_priv.h [XFS] Use atomics for iclog reference counting 2008-04-18 11:38:10 +10:00
xfs_log_recover.c [XFS] Don't block pdflush when writing back inodes 2008-04-18 11:37:32 +10:00
xfs_log_recover.h
xfs_mount.c [XFS] Remove superflous xfs_readsb call in xfs_mountfs. 2008-04-18 11:41:46 +10:00
xfs_mount.h [XFS] Replace custom AIL linked-list code with struct list_head 2008-04-18 11:41:57 +10:00
xfs_mru_cache.c [XFS] Fix up sparse warnings. 2008-02-07 18:14:38 +11:00
xfs_mru_cache.h
xfs_qmops.c [XFS] Unwrap XFS_SB_LOCK. 2008-02-07 16:47:15 +11:00
xfs_quota.h [XFS] remove dependency of the quota module on behaviors 2007-10-16 11:43:26 +10:00
xfs_refcache.h
xfs_rename.c [XFS] cleanup vnode use in xfs_symlink and xfs_rename 2008-04-18 11:40:45 +10:00
xfs_rtalloc.c [XFS] Undo bit ops cleanup mod due to regression on 32-bit powermac 2008-02-26 17:05:44 +11:00
xfs_rtalloc.h [XFS] optimize XFS_IS_REALTIME_INODE w/o realtime config 2008-02-07 18:16:43 +11:00
xfs_rw.c [XFS] decontaminate vfs operations from behavior details 2007-10-16 11:43:55 +10:00
xfs_rw.h [XFS] optimize XFS_IS_REALTIME_INODE w/o realtime config 2008-02-07 18:16:43 +11:00
xfs_sb.h [XFS] Ensure "both" features2 slots are consistent 2008-04-10 16:25:26 +10:00
xfs_trans.c xfs: convert beX_add to beX_add_cpu (new common API) 2008-02-13 16:21:19 -08:00
xfs_trans.h [XFS] Replace custom AIL linked-list code with struct list_head 2008-04-18 11:41:57 +10:00
xfs_trans_ail.c [XFS] Replace custom AIL linked-list code with struct list_head 2008-04-18 11:41:57 +10:00
xfs_trans_buf.c [XFS] Don't block pdflush when writing back inodes 2008-04-18 11:37:32 +10:00
xfs_trans_extfree.c [XFS] Radix tree based inode caching 2007-10-15 16:50:50 +10:00
xfs_trans_inode.c
xfs_trans_item.c [XFS] Fix up sparse warnings. 2008-02-07 18:14:38 +11:00
xfs_trans_priv.h [XFS] Move AIL pushing into it's own thread 2008-02-07 18:22:51 +11:00
xfs_trans_space.h
xfs_types.h [XFS] use filldir internally 2007-10-15 16:49:49 +10:00
xfs_utils.c [XFS] kill xfs_get_dir_entry 2008-04-18 11:39:14 +10:00
xfs_utils.h [XFS] kill xfs_get_dir_entry 2008-04-18 11:39:14 +10:00
xfs_vfsops.c [XFS] cleanup vnode use in dmapi calls 2008-04-18 11:40:15 +10:00
xfs_vfsops.h [XFS] kill xfs_root 2008-02-07 18:24:00 +11:00
xfs_vnodeops.c [XFS] cleanup vnode use in xfs_lookup 2008-04-18 11:40:55 +10:00
xfs_vnodeops.h [XFS] cleanup vnode use in xfs_lookup 2008-04-18 11:40:55 +10:00