Implement locking around struct ocfs2_refcount_tree. This protects
all read/write operations on refcount trees. ocfs2_refcount_tree
has its own lock and its own caching_info, protecting buffers among
multiple nodes.
Callers must call ocfs2_lock_refcount_tree() before operating on
the tree and unlock it afterwards.
ocfs2_refcount_trees are referenced by the block number of the
refcount tree root block, so we create an rb-tree on the ocfs2_super
to look them up.
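
As a rough illustration of looking trees up by root block number, the
sketch below keys a search tree on that number. It is a userspace
approximation only: a plain unbalanced binary search tree stands in for
the kernel rb-tree, and every demo_* name is hypothetical rather than
taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct ocfs2_refcount_tree. */
struct demo_refcount_tree {
	uint64_t root_blkno;                /* block number of the root block */
	struct demo_refcount_tree *left;
	struct demo_refcount_tree *right;
};

/* Hypothetical stand-in for the lookup structure hung off ocfs2_super. */
struct demo_super {
	struct demo_refcount_tree *trees;
};

/*
 * Insert new_tree keyed by root_blkno. If a tree with the same key is
 * already present, return that one and leave the index unchanged.
 */
static struct demo_refcount_tree *
demo_insert_tree(struct demo_super *sb, struct demo_refcount_tree *new_tree)
{
	struct demo_refcount_tree **link = &sb->trees;

	assert(new_tree != NULL);
	while (*link) {
		if (new_tree->root_blkno < (*link)->root_blkno)
			link = &(*link)->left;
		else if (new_tree->root_blkno > (*link)->root_blkno)
			link = &(*link)->right;
		else
			return *link;
	}
	new_tree->left = NULL;
	new_tree->right = NULL;
	*link = new_tree;
	return new_tree;
}

/* Find the tree whose root block is root_blkno, or NULL if none. */
static struct demo_refcount_tree *
demo_find_tree(const struct demo_super *sb, uint64_t root_blkno)
{
	struct demo_refcount_tree *node = sb->trees;

	while (node) {
		if (root_blkno < node->root_blkno)
			node = node->left;
		else if (root_blkno > node->root_blkno)
			node = node->right;
		else
			return node;
	}
	return NULL;
}
```

Returning the existing node on a duplicate key lets two callers that
race to set up the same tree end up sharing one object.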
Signed-off-by: Tao Ma <tao.ma@oracle.com>
The refcount tree should use its own caching info so that when
we downconvert the refcount tree lock, we can drop all the
cached buffer heads.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
In meta downconvert, we need to checkpoint the metadata in an inode.
The refcount tree needs the same, so abstract the process out.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
With this commit, extent tree operations are divorced from inodes and
rely on ocfs2_caching_info. Phew!
Signed-off-by: Joel Becker <joel.becker@oracle.com>
We only allow unwritten extents on data, so the toplevel
ocfs2_mark_extent_written() can use an inode all it wants. But the
subfunction isn't even using the inode argument.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
ocfs2_insert_extent() wants to insert a record into the extent map if
it's an inode data extent. But since many btrees can call that
function, let's make it an op on ocfs2_extent_tree. Other tree types
can leave it empty.
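
A minimal sketch of what an optional per-tree op could look like,
assuming an ops table whose entry may be NULL. The demo_* types, the
map_entries counter, and the xattr example are illustrative
assumptions, not the actual definitions in alloc.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct demo_extent_tree;

/* Hypothetical, trimmed-down extent record. */
struct demo_extent_rec {
	uint32_t cpos;
	uint32_t clusters;
};

struct demo_et_ops {
	/* Optional: tree types without an extent map leave this NULL. */
	void (*eo_extent_map_insert)(struct demo_extent_tree *et,
				     const struct demo_extent_rec *rec);
};

struct demo_extent_tree {
	const struct demo_et_ops *et_ops;
	void *et_object;                    /* owner of the tree */
};

/* Generic wrapper: call the op only if this tree type provides one. */
static void demo_et_extent_map_insert(struct demo_extent_tree *et,
				      const struct demo_extent_rec *rec)
{
	assert(et != NULL && et->et_ops != NULL);
	if (et->et_ops->eo_extent_map_insert)
		et->et_ops->eo_extent_map_insert(et, rec);
}

/* Hypothetical inode that merely counts extent map insertions. */
struct demo_inode {
	unsigned int map_entries;
};

static void demo_dinode_extent_map_insert(struct demo_extent_tree *et,
					  const struct demo_extent_rec *rec)
{
	struct demo_inode *inode = et->et_object;

	(void)rec;                          /* a real map would store the record */
	inode->map_entries++;
}

/* Inode data trees fill the op in; other trees leave it empty. */
static const struct demo_et_ops demo_dinode_et_ops = {
	.eo_extent_map_insert = demo_dinode_extent_map_insert,
};

static const struct demo_et_ops demo_xattr_et_ops = {
	.eo_extent_map_insert = NULL,
};
```

With this shape the generic insert path calls the wrapper
unconditionally and never has to ask what kind of tree it was handed.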
Signed-off-by: Joel Becker <joel.becker@oracle.com>
ocfs2_remove_extent() wants to truncate the extent map if it's
truncating an inode data extent. But since many btrees can call that
function, let's make it an op on ocfs2_extent_tree. Other tree types
can leave it empty.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
ocfs2_grow_branch() is not really using the inode other than to pass it
to the subfunctions ocfs2_shift_tree_depth(), ocfs2_find_branch_target(),
and ocfs2_add_branch(). The first two weren't using it either, so they
drop the argument. ocfs2_add_branch() only passed it to
ocfs2_adjust_rightmost_branch(), which drops the inode argument and uses
the ocfs2_extent_tree as well.
ocfs2_append_rec_to_path() can take an ocfs2_extent_tree instead of
the inode. The function ocfs2_adjust_rightmost_records() goes along for
the ride.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
It already gets ocfs2_extent_tree, so we can just use that. This chains
to the same modification for ocfs2_remove_rightmost_path() and
ocfs2_rotate_rightmost_leaf_left().
Signed-off-by: Joel Becker <joel.becker@oracle.com>
It already has struct ocfs2_extent_tree, which has the caching info. So
we don't need to pass it struct inode.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
It already has struct ocfs2_extent_tree, which has the caching info. So
we don't need to pass it struct inode.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Get rid of the inode argument. Use extent_tree instead. This means a
few more functions have to pass an extent_tree around.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Pass the ocfs2_extent_list down through ocfs2_rotate_tree_right() and
get rid of struct inode in ocfs2_rotate_subtree_root_right().
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Pass struct ocfs2_extent_tree into ocfs2_create_new_meta_bhs(). It no
longer needs struct inode or ocfs2_super.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
ocfs2_find_path() and ocfs2_find_leaf() walk our btrees, reading extent
blocks. They need struct ocfs2_caching_info for that, but not struct
inode.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Extent blocks belong to btrees on more than just inodes, so we want to
pass the ocfs2_caching_info structure directly to
ocfs2_read_extent_block(). A number of places in alloc.c can now drop
struct inode from their argument list.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
What do we cache? Metadata blocks. What are most of our non-inode metadata
blocks? Extent blocks for our btrees. struct ocfs2_extent_tree is the
main structure for managing those. So let's store the associated
ocfs2_caching_info there.
This means that ocfs2_et_root_journal_access() doesn't need struct inode
anymore, and any place that has an et can refer to et->et_ci instead of
INODE_CACHE(inode).
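
The idea can be sketched as follows, assuming a simplified extent tree
that records its cache once at init time. All demo_* names, the
DEMO_INODE_CACHE macro, and the ci_num_cached counter are hypothetical
stand-ins for the real structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical metadata cache that only counts cached blocks. */
struct demo_caching_info {
	unsigned int ci_num_cached;
};

/* Hypothetical inode info embedding its metadata cache. */
struct demo_inode_info {
	struct demo_caching_info ip_metadata_cache;
};

/* Stand-in for INODE_CACHE(inode). */
#define DEMO_INODE_CACHE(inode) (&(inode)->ip_metadata_cache)

struct demo_extent_tree {
	struct demo_caching_info *et_ci;    /* cache for this tree's blocks */
};

/* The inode is needed only here, to find the cache once. */
static void demo_init_dinode_extent_tree(struct demo_extent_tree *et,
					 struct demo_inode_info *inode)
{
	assert(et != NULL && inode != NULL);
	et->et_ci = DEMO_INODE_CACHE(inode);
}

/* Block-level helper: knows about the cache, not about inodes. */
static void demo_cache_block(struct demo_caching_info *ci)
{
	ci->ci_num_cached++;
}

/* Tree-level helper: reaches the cache through et->et_ci. */
static void demo_tree_touch_block(struct demo_extent_tree *et)
{
	demo_cache_block(et->et_ci);
}
```

After initialization nothing below the tree layer has to carry a struct
inode just to reach the cache.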
Signed-off-by: Joel Becker <joel.becker@oracle.com>
The next step in divorcing metadata I/O management from struct inode is
to pass struct ocfs2_caching_info to the journal functions. Thus the
journal locks a metadata cache with the cache io_lock function. It also
can compare ci_last_trans and ci_created_trans directly.
This is a large patch because of all the places we change
ocfs2_journal_access..(handle, inode, ...) to
ocfs2_journal_access..(handle, INODE_CACHE(inode), ...).
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Similar to ip_last_trans, ip_created_trans tracks the creation of a journal
managed inode. This specifically tracks what transaction created the
inode. This is so the code can know if the inode has ever been written
to disk.
This behavior is desirable for any journal managed object. We move it
to struct ocfs2_caching_info as ci_created_trans so that any object
using ocfs2_caching_info can rely on this behavior.
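
One way to picture the check is the sketch below. It is a
simplification under stated assumptions: the journal exposes a
monotonically increasing j_trans_id that moves on once a transaction
commits, counter wraparound is ignored, and the demo_* names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical journal: id of the currently running transaction. */
struct demo_journal {
	unsigned long j_trans_id;
};

/* Hypothetical cache object: id of the transaction that created it. */
struct demo_caching_info {
	unsigned long ci_created_trans;
};

/* Record that the object was created in the running transaction. */
static void demo_ci_set_new(struct demo_caching_info *ci,
			    const struct demo_journal *journal)
{
	assert(ci != NULL && journal != NULL);
	ci->ci_created_trans = journal->j_trans_id;
}

/*
 * The object counts as new (never written to disk) while the journal
 * has not moved past the transaction that created it.
 */
static int demo_ci_is_new(const struct demo_caching_info *ci,
			  const struct demo_journal *journal)
{
	return journal->j_trans_id <= ci->ci_created_trans;
}
```

Because the check reads only the caching info and the journal, any
object that embeds the caching info can use it, not just inodes.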
Signed-off-by: Joel Becker <joel.becker@oracle.com>
We have the read side of metadata caching isolated to struct
ocfs2_caching_info; now we need the write side. This means the journal
functions. The journal only does a couple of things with struct inode.
This change moves the ip_last_trans field onto struct
ocfs2_caching_info as ci_last_trans. This field tells the journal
whether a pending journal flush is required.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
We are really passing the inode into the ocfs2_read/write_blocks()
functions to get at the metadata cache. This commit passes the cache
directly into the metadata block functions, divorcing them from the
inode.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
We don't really want to cart around too many new fields on the
ocfs2_caching_info structure. So let's wrap all our access of the
parent object in a set of operations. One pointer on caching_info, and
more flexibility to boot.
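
A sketch of the single-pointer design, assuming a small operations
table that recovers the parent from the embedded caching info. The op
names, the demo_* types, and the integer flag standing in for a real
lock are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct demo_caching_info;

/* Hypothetical operations for reaching the object that owns a cache. */
struct demo_caching_operations {
	uint64_t (*co_owner)(struct demo_caching_info *ci);
	void (*co_io_lock)(struct demo_caching_info *ci);
	void (*co_io_unlock)(struct demo_caching_info *ci);
};

/* One pointer on the caching info, regardless of the parent type. */
struct demo_caching_info {
	const struct demo_caching_operations *ci_ops;
};

/* Hypothetical inode-like parent embedding the cache. */
struct demo_inode_info {
	uint64_t ip_blkno;
	int ip_io_locked;                   /* flag standing in for a real lock */
	struct demo_caching_info ip_metadata_cache;
};

/* Recover the parent structure from a pointer to its embedded member. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static struct demo_inode_info *demo_cache_to_inode(struct demo_caching_info *ci)
{
	return demo_container_of(ci, struct demo_inode_info, ip_metadata_cache);
}

static uint64_t demo_inode_cache_owner(struct demo_caching_info *ci)
{
	return demo_cache_to_inode(ci)->ip_blkno;
}

static void demo_inode_cache_io_lock(struct demo_caching_info *ci)
{
	demo_cache_to_inode(ci)->ip_io_locked = 1;
}

static void demo_inode_cache_io_unlock(struct demo_caching_info *ci)
{
	demo_cache_to_inode(ci)->ip_io_locked = 0;
}

static const struct demo_caching_operations demo_inode_caching_ops = {
	.co_owner = demo_inode_cache_owner,
	.co_io_lock = demo_inode_cache_io_lock,
	.co_io_unlock = demo_inode_cache_io_unlock,
};

/* Generic accessor: callers never learn what the parent object is. */
static uint64_t demo_metadata_cache_owner(struct demo_caching_info *ci)
{
	assert(ci != NULL && ci->ci_ops != NULL);
	return ci->ci_ops->co_owner(ci);
}
```

A non-inode parent would supply its own table, and the generic cache
code would stay unchanged.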
Signed-off-by: Joel Becker <joel.becker@oracle.com>
We want to use the ocfs2_caching_info structure in places that are not
inodes. To do that, it can no longer rely on referencing the inode
directly.
This patch moves the flags to ocfs2_caching_info->ci_flags, stores
pointers to the parent's locks on the ocfs2_caching_info, and renames
the constants and flags to reflect its independent state.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Bug introduced by mainline commit e7432675f8
The bug causes ocfs2_write_begin_nolock() to oops when len=0.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Cc: stable@kernel.org
Signed-off-by: Joel Becker <joel.becker@oracle.com>
In commit a5a0a63092, when ocfs2_attch_dentry_lock fails, we call an
extra iput and reset dentry->d_fsdata to NULL. This resolves a bug, but
the fix is incomplete: the dentry is still there. When we want to use
it again, ocfs2_dentry_revalidate doesn't catch it and returns true.
That makes a later ocfs2_dentry_lock panic.
One report of this is http://oss.oracle.com/bugzilla/show_bug.cgi?id=1162.
The resolution is to add a check for dentry->d_fsdata in the
revalidate process and return false if dentry->d_fsdata is NULL,
so that ocfs2_lookup is called again.
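
The shape of the added check can be sketched like this. It is a toy
userspace model: struct demo_dentry and demo_dentry_revalidate() are
hypothetical, and the real revalidate function performs other checks
that are omitted here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down dentry carrying only the field of interest. */
struct demo_dentry {
	void *d_fsdata;                     /* per-dentry lock data, NULL if attach failed */
};

/*
 * Return 1 if the dentry may be reused, 0 if it must be looked up again.
 * A dentry whose lock was never attached is treated as invalid.
 */
static int demo_dentry_revalidate(const struct demo_dentry *dentry)
{
	assert(dentry != NULL);
	if (dentry->d_fsdata == NULL)
		return 0;
	return 1;
}
```

Returning 0 pushes the caller back through lookup, which gets another
chance to attach the lock.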
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
In case a downconvert is queued, and a flock receives a signal,
BUG_ON(lockres->l_action != OCFS2_AST_INVALID) is triggered
because a lock cancel triggers a dlmunlock while an AST is
scheduled.
To avoid this, allow an LKM_CANCEL to pass through, and let it
wait on __dlm_wait_on_lockres().
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.de>
Acked-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>