Commit Graph

185 Commits

Author SHA1 Message Date
Bob Peterson 5e2f7d617b GFS2: Make sure rindex is uptodate before starting transactions
This patch removes the call from gfs2_blk2rgrd to function
gfs2_rindex_update and replaces it with individual calls.
The former way turned out to be too problematic.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-04-05 10:20:10 +01:00
Steven Whitehouse 66fc061bda GFS2: FITRIM ioctl support
The FITRIM ioctl provides an alternative way to send discard requests to
the underlying device. Using the discard mount option results in every
freed block generating a discard request to the block device. This can
be slow, since many block devices can only process discard requests of
larger sizes, and also such operations can be time consuming.

Rather than using the discard mount option, FITRIM allows a sweep of the
filesystem on an occasional basis, and also to optionally avoid sending
down discard requests for smaller regions.

In GFS2 FITRIM will work at resource group granularity. There is a flag
for each resource group which keeps track of which resource groups have
been trimmed. This flag is reset whenever a deallocation occurs in the
resource group, and set whenever a successful FITRIM of that resource
group has taken place. This helps to reduce repeated discard requests
for the same block ranges, again improving performance.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-02-28 17:10:21 +00:00
Steven Whitehouse a365fbf354 GFS2: Read resource groups on mount
This makes mount take slightly longer, but at the same time, the first
write to the filesystem will be faster too. It also means that if there
is a problem in the resource index, then we can refuse to mount rather
than having to try and report that when the first write occurs.

In addition, to avoid recursive locking, we hvae to take account of
instances when the rindex glock may already be held when we are
trying to update the rbtree of resource groups.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-02-28 09:52:39 +00:00
Bob Peterson 718b97bd6b GFS2: Read in rindex if necessary during unlink
This patch fixes a problem whereby you were unable to delete
files until other file system operations were done (such as
statfs, touch, writes, etc.) that caused the rindex to be
read in.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-02-28 09:48:02 +00:00
Steven Whitehouse 66ad863b41 GFS2: Fix nlink setting on inode creation
Since the nlink count will be 0, we need to use set_nlink rather
than inc_nlink in order to avoid triggering the inc_nlink warning
which was added recently.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-01-11 12:35:05 +00:00
Linus Torvalds 1619ed8f60 Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw:
  GFS2: local functions should be static
  GFS2: We only need one ACL getting function
  GFS2: Fix multi-block allocation
  GFS2: decouple quota allocations from block allocations
  GFS2: split function rgblk_search
  GFS2: Fix up "off by one" in the previous patch
  GFS2: move toward a generic multi-block allocator
  GFS2: O_(D)SYNC support for fallocate
  GFS2: remove vestigial al_alloced
  GFS2: combine gfs2_alloc_block and gfs2_alloc_di
  GFS2: Add non-try locks back to get_local_rgrp
  GFS2: f_ra is always valid in dir readahead function
  GFS2: Fix very unlikley memory leak in ACL xattr code
  GFS2: More automated code analysis fixes
  GFS2: Add readahead to sequential directory traversal
  GFS2: Fix up REQ flags
2012-01-08 13:07:54 -08:00
Al Viro 175a4eb7ea fs: propagate umode_t, misc bits
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:55:10 -05:00
Al Viro 1a67aafb5f switch ->mknod() to umode_t
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:54:54 -05:00
Al Viro 4acdaf27eb switch ->create() to umode_t
vfs_create() ignores everything outside of 16bit subset of its
mode argument; switching it to umode_t is obviously equivalent
and it's the only caller of the method

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:54:53 -05:00
Al Viro 18bb1db3e7 switch vfs_mkdir() and ->mkdir() to umode_t
vfs_mkdir() gets int, but immediately drops everything that might not
fit into umode_t and that's the only caller of ->mkdir()...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:54:53 -05:00
H Hartley Sweeten 46cc1e5fce GFS2: local functions should be static
Quiets the sparse noise:

warning: symbol 'gfs2_initxattrs' was not declared. Should it be static?

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-12-06 09:46:41 +00:00
Steven Whitehouse 6a8099ed56 GFS2: Fix multi-block allocation
Clean up gfs2_alloc_blocks so that it takes the full extent length
rather than just the number of non-inode blocks as an argument. That
will only make a difference in the inode allocation case for now.

Also, this fixes the extent length handling around gfs2_alloc_extent() so
that multi block allocations will work again.

The rd_last_alloc block is set to the final block in the allocated
extent (as per the update to i_goal, but referenced to a different
start point).

This also removes the dinode argument to rgblk_search() which is no
longer used.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-11-22 12:18:51 +00:00
Bob Peterson 564e12b115 GFS2: decouple quota allocations from block allocations
This patch separates the code pertaining to allocations into two
parts: quota-related information and block reservations.
This patch also moves all the block reservation structure allocations to
function gfs2_inplace_reserve to simplify the code, and moves
the frees to function gfs2_inplace_release.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-11-22 10:25:21 +00:00
Bob Peterson 6e87ed0fc9 GFS2: move toward a generic multi-block allocator
This patch is a revision of the one I previously posted.
I tried to integrate all the suggestions Steve gave.
The purpose of the patch is to change function gfs2_alloc_block
(allocate either a dinode block or an extent of data blocks)
to a more generic gfs2_alloc_blocks function that can
allocate both a dinode _and_ an extent of data blocks in the
same call. This will ultimately help us create a multi-block
reservation scheme to reduce file fragmentation.

This patch moves more toward a generic multi-block allocator that
takes a pointer to the number of data blocks to allocate, plus whether
or not to allocate a dinode. In theory, it could be called to allocate
(1) a single dinode block, (2) a group of one or more data blocks, or
(3) a dinode plus several data blocks.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-11-21 10:04:09 +00:00
Bob Peterson 3c5d785acf GFS2: combine gfs2_alloc_block and gfs2_alloc_di
GFS2 functions gfs2_alloc_block and gfs2_alloc_di do basically
the same things, with a few exceptions. This patch combines
the two functions into a slightly more generic gfs2_alloc_block.
Having one centralized block allocation function will reduce
code redundancy and make it easier to implement multi-block
reservations to reduce file fragmentation in the future.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-11-15 15:25:03 +00:00
Steven Whitehouse 87654896ca GFS2: More automated code analysis fixes
A potentially uninitialised variable, some unreachable code,
and the main part of this, fixing the error path in the
unlink function.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-11-08 14:04:20 +00:00
Linus Torvalds f793f29611 Merge http://sucs.org/~rohan/git/gfs2-3.0-nmw
* http://sucs.org/~rohan/git/gfs2-3.0-nmw: (24 commits)
  GFS2: Move readahead of metadata during deallocation into its own function
  GFS2: Remove two unused variables
  GFS2: Misc fixes
  GFS2: rewrite fallocate code to write blocks directly
  GFS2: speed up delete/unlink performance for large files
  GFS2: Fix off-by-one in gfs2_blk2rgrpd
  GFS2: Clean up ->page_mkwrite
  GFS2: Correctly set goal block after allocation
  GFS2: Fix AIL flush issue during fsync
  GFS2: Use cached rgrp in gfs2_rlist_add()
  GFS2: Call do_strip() directly from recursive_scan()
  GFS2: Remove obsolete assert
  GFS2: Cache the most recently used resource group in the inode
  GFS2: Make resource groups "append only" during life of fs
  GFS2: Use rbtree for resource groups and clean up bitmap buffer ref count scheme
  GFS2: Fix lseek after SEEK_DATA, SEEK_HOLE have been added
  GFS2: Clean up gfs2_create
  GFS2: Use ->dirty_inode()
  GFS2: Fix bug trap and journaled data fsync
  GFS2: Fix inode allocation error path
  ...
2011-10-28 10:44:50 -07:00
Steven Whitehouse 54335b1fca GFS2: Cache the most recently used resource group in the inode
This means that after the initial allocation for any inode, the
last used resource group is cached in the inode for future use.
This drastically reduces the number of lookups of resource
groups in the common case, and this the contention on that
data structure.

The allocation algorithm is the same as previously, except that we
always check to see if the goal block is within the cached rgrp
first before going to the rbtree to look one up.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-10-21 12:39:34 +01:00
Steven Whitehouse 8339ee543e GFS2: Make resource groups "append only" during life of fs
Since we have ruled out supporting online filesystem shrink,
it is possible to make the resource group list append only
during the life of a super block. This gives several benefits:

Firstly, we only need to read new rindex elements as they are added
rather than needing to reread the whole rindex file each time one
element is added.

Secondly, the rindex glock can be held for much shorter periods of
time, and is completely removed from the fast path for allocations.
The lock is taken in shared mode only when updating the resource
groups when the first allocation occurs, and after a grow has
taken place.

Thirdly, this results in a reduction in code size, and everything
gets a lot simpler to understand in this area.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-10-21 12:39:33 +01:00
Steven Whitehouse 9a63edd12b GFS2: Clean up gfs2_create
If we pass through knowledge of whether the creation is intended to be
exclusive or not, then we can deal with that in gfs2_create_inode
and remove one set of locking. Also this removes the loop in
gfs2_create and simplifies the code a bit.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-10-21 12:39:28 +01:00
Steven Whitehouse ab9bbda020 GFS2: Use ->dirty_inode()
The aim of this patch is to use the newly enhanced ->dirty_inode()
super block operation to deal with atime updates, rather than
piggy backing that code into ->write_inode() as is currently
done.

The net result is a simplification of the code in various places
and a reduction of the number of gfs2_dinode_out() calls since
this is now implied by ->dirty_inode().

Some of the mark_inode_dirty() calls have been moved under glocks
in order to take advantage of then being able to avoid locking in
->dirty_inode() when we already have suitable locks.

One consequence is that generic_write_end() now correctly deals
with file size updates, so that we do not need a separate check
for that afterwards. This also, indirectly, means that fdatasync
should work correctly on GFS2 - the current code always syncs the
metadata whether it needs to or not.

Has survived testing with postmark (with and without atime) and
also fsx.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-10-21 12:39:26 +01:00
Steven Whitehouse 40ac218f52 GFS2: Fix inode allocation error path
If we have got far enough through the inode allocation code
path that an inode has already been allocated, then we must
call iput to dispose of it, if an error occurs during a
later part of the process. This will always be the final iput
since there will be no other references to the inode.

Unlike when the inode has been unlinked, its block state will
be GFS2_BLKST_INODE rather than GFS2_BLKST_UNLINKED so we need
to skip the test in ->evict_inode() for this one case in order
to ensure that it will be deallocated correctly. This patch adds
a new flag in order to ensure that this will happen correctly.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-10-21 12:39:23 +01:00
James Morris 5a2f3a02ae Merge branch 'next-evm' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/ima-2.6 into next
Conflicts:
	fs/attr.c

Resolve conflict manually.

Signed-off-by: James Morris <jmorris@namei.org>
2011-08-09 10:31:03 +10:00
Christoph Hellwig 4e34e719e4 fs: take the ACL checks to common code
Replace the ->check_acl method with a ->get_acl method that simply reads an
ACL from disk after having a cache miss.  This means we can replace the ACL
checking boilerplate code with a single implementation in namei.c.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-25 14:30:23 -04:00
Al Viro 6c673ab393 simplify gfs2_lookup()
d_splice_alias() will DTRT when given NULL or ERR_PTR

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 20:48:02 -04:00
Al Viro 10556cb21a ->permission() sanitizing: don't pass flags to ->permission()
not used by the instances anymore.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 01:43:24 -04:00
Al Viro 2830ba7f34 ->permission() sanitizing: don't pass flags to generic_permission()
redundant; all callers get it duplicated in mask & MAY_NOT_BLOCK and none of
them removes that bit.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 01:43:22 -04:00
Al Viro 178ea73521 kill check_acl callback of generic_permission()
its value depends only on inode and does not change; we might as
well store it in ->i_op->check_acl and be done with that.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 01:43:16 -04:00
Mimi Zohar 9d8f13ba3f security: new security_inode_init_security API adds function callback
This patch changes the security_inode_init_security API by adding a
filesystem specific callback to write security extended attributes.
This change is in preparation for supporting the initialization of
multiple LSM xattrs and the EVM xattr.  Initially the callback function
walks an array of xattrs, writing each xattr separately, but could be
optimized to write multiple xattrs at once.

For existing security_inode_init_security() calls, which have not yet
been converted to use the new callback function, such as those in
reiserfs and ocfs2, this patch defines security_old_inode_init_security().

Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
2011-07-18 12:29:38 -04:00
Steven Whitehouse f2741d9898 GFS2: Move all locking inside the inode creation function
Now that there are no longer any exceptions to the normal inode
creation code path, we can move the parts of the locking code
which were duplicated in mkdir/mknod/create/symlink into the
inode create function.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-13 12:11:17 +01:00
Steven Whitehouse 160b4026dc GFS2: Clean up symlink creation
This moves the symlink specific parts of inode creation
into the function where we initialise the rest of the
dinode. As a result we have one less place where we need
to look up the inode's buffer.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-13 10:34:59 +01:00
Steven Whitehouse e2d0a13bba GFS2: Clean up mkdir
This moves the initialisation of the directory into the inode
creation functions to avoid having to duplicate the lookup
of the inode's buffer.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-13 09:55:55 +01:00
Steven Whitehouse 2ab9cd1c63 GFS2: Rename ops_inode.c to inode.c
This is the final part of the ops_inode.c/inode.c reordering. We
are left with a single file called inode.c which now contains
all the inode operations, as expected.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-10 13:12:49 +01:00
Steven Whitehouse 64ea540258 GFS2: Inode.c is empty now, remove it
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-10 13:09:53 +01:00
Steven Whitehouse 9eed04cd99 GFS2: Move final part of inode.c into super.c
Now inode.c is empty.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-09 16:45:38 +01:00
Steven Whitehouse 194c011fc4 GFS2: Move most of the remaining inode.c into ops_inode.c
This is in preparation to remove inode.c and rename ops_inode.c
to inode.c. Also most of the functions which were left in inode.c
relate to the creation and lookup of inodes. I'm intending to work
on consolidating some of that code, and its easier when its all in
one place.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-09 16:45:14 +01:00
Steven Whitehouse d4b2cf1b05 GFS2: Move gfs2_refresh_inode() and friends into glops.c
Eventually there will only be a single caller of this code, so lets
move it where it can be made static at some future date.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-09 16:44:49 +01:00
Steven Whitehouse 94fb763b1a GFS2: Remove gfs2_dinode_print() function
This function was intended for debugging purposes, but it is not very
useful. If we want to know what is on disk then all we need is a
block number and gfs2_edit can give us much better information about
what is there. Otherwise, if we are interested in what is stored in
the in-core inode, it doesn't help us out there either.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-09 16:44:29 +01:00
Steven Whitehouse 3d6ecb7d16 GFS2: When adding a new dir entry, inc link count if it is a subdir
This adds an increment of the link count when we add a new directory
entry, if that entry is itself a directory. This means that we no
longer need separate code to perform this operation.

Now that both adding and removing directory entries automatically
update the parent directory's link count if required, that makes
the code shorter and simpler than before.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-09 16:43:53 +01:00
Steven Whitehouse d192a8e5c6 GFS2: Double check link count under glock
To avoid any possible races relating to the link count, we need to
recheck it under the inode's glock in all cases where it matters.
Also to ensure we never get any nasty surprises, this patch also
ensures that once the link count has hit zero it can never be
elevated by rereading in data from disk.

The only place we cannot provide a proper solution is in rename
in the case where we are removing a target inode and we discover
that the target inode has been already unlinked on another node.
The race window is very small, and we return EAGAIN in this case
to indicate what has happened. The proper solution would be to move
the lookup parts of rename from the vfs into library calls which
the fs could call directly, but that is potentially a very big job
and this fix should cover most cases for now.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-05 12:35:40 +01:00
Steven Whitehouse 4667a0ec32 GFS2: Make writeback more responsive to system conditions
This patch adds writeback_control to writing back the AIL
list. This means that we can then take advantage of the
information we get in ->write_inode() in order to set off
some pre-emptive writeback.

In addition, the AIL code is cleaned up a bit to make it
a bit simpler to understand.

There is still more which can usefully be done in this area,
but this is a good start at least.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-04-20 09:01:37 +01:00
Steven Whitehouse f42ab08529 GFS2: Optimise glock lru and end of life inodes
The GLF_LRU flag introduced in the previous patch can be
used to check if a glock is on the lru list when a new
holder is queued and if so remove it, without having first
to get the lru_lock.

The main purpose of this patch however is to optimise the
glocks left over when an inode at end of life is being
evicted. Previously such glocks were left with the GLF_LFLUSH
flag set, so that when reclaimed, each one required a log flush.
This patch resets the GLF_LFLUSH flag when there is nothing
left to flush thus preventing later log flushes as glocks are
reused or demoted.

In order to do this, we need to keep track of the number of
revokes which are outstanding, and also to clear the GLF_LFLUSH
bit after a log commit when only revokes have been processed.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-04-20 09:01:17 +01:00
Bob Peterson 44ad37d69b GFS2: filesystem hang caused by incorrect lock order
This patch fixes a deadlock in GFS2 where two processes are trying
to reclaim an unlinked dinode:
One holds the inode glock and calls gfs2_lookup_by_inum trying to look
up the inode, which it can't, due to I_FREEING.  The other has set
I_FREEING from vfs and is at the beginning of gfs2_delete_inode
waiting for the glock, which is held by the first.  The solution is to
add a new non_block parameter to the gfs2_iget function that causes it
to return -ENOENT if the inode is being freed.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-04-18 15:23:50 +01:00
James Morris fe3fa43039 Merge branch 'master' of git://git.infradead.org/users/eparis/selinux into next 2011-03-08 11:38:10 +11:00
Eric Paris 2a7dba391e fs/vfs/security: pass last path component to LSM on inode creation
SELinux would like to implement a new labeling behavior of newly created
inodes.  We currently label new inodes based on the parent and the creating
process.  This new behavior would also take into account the name of the
new object when deciding the new label.  This is not the (supposed) full path,
just the last component of the path.

This is very useful because creating /etc/shadow is different than creating
/etc/passwd but the kernel hooks are unable to differentiate these
operations.  We currently require that userspace realize it is doing some
difficult operation like that and than userspace jumps through SELinux hoops
to get things set up correctly.  This patch does not implement new
behavior, that is obviously contained in a seperate SELinux patch, but it
does pass the needed name down to the correct LSM hook.  If no such name
exists it is fine to pass NULL.

Signed-off-by: Eric Paris <eparis@redhat.com>
2011-02-01 11:12:29 -05:00
Steven Whitehouse 24d9765fc1 GFS2: Fix error path in gfs2_lookup_by_inum()
In the (impossible, except if there is fs corruption) error path
in gfs2_lookup_by_inum() if the call to gfs2_inode_refresh()
fails, it was leaving the function by calling iput() rather
than iget_failed(). This would cause future lookups of the same
inode to block forever.

This patch fixes the problem by moving the call to gfs2_inode_refresh()
into gfs2_inode_lookup() where iget_failed() is part of the error path
already. Also this cleans up some unreachable code and makes
gfs2_set_iop() static.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-01-18 14:49:08 +00:00
Linus Torvalds b4a45f5fe8 Merge branch 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin
* 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin: (57 commits)
  fs: scale mntget/mntput
  fs: rename vfsmount counter helpers
  fs: implement faster dentry memcmp
  fs: prefetch inode data in dcache lookup
  fs: improve scalability of pseudo filesystems
  fs: dcache per-inode inode alias locking
  fs: dcache per-bucket dcache hash locking
  bit_spinlock: add required includes
  kernel: add bl_list
  xfs: provide simple rcu-walk ACL implementation
  btrfs: provide simple rcu-walk ACL implementation
  ext2,3,4: provide simple rcu-walk ACL implementation
  fs: provide simple rcu-walk generic_check_acl implementation
  fs: provide rcu-walk aware permission i_ops
  fs: rcu-walk aware d_revalidate method
  fs: cache optimise dentry and inode for rcu-walk
  fs: dcache reduce branches in lookup path
  fs: dcache remove d_mounted
  fs: fs_struct use seqlock
  fs: rcu-walk for path lookup
  ...
2011-01-07 08:56:33 -08:00
Nick Piggin b74c79e993 fs: provide rcu-walk aware permission i_ops
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:29 +11:00
Steven Whitehouse 9e55cd5372 GFS2: Remove unreachable calls to vmtruncate
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-11-30 10:22:48 +00:00
Steven Whitehouse 044b9414c7 GFS2: Fix inode deallocation race
This area of the code has always been a bit delicate due to the
subtleties of lock ordering. The problem is that for "normal"
alloc/dealloc, we always grab the inode locks first and the rgrp lock
later.

In order to ensure no races in looking up the unlinked, but still
allocated inodes, we need to hold the rgrp lock when we do the lookup,
which means that we can't take the inode glock.

The solution is to borrow the technique already used by NFS to solve
what is essentially the same problem (given an inode number, look up
the inode carefully, checking that it really is in the expected
state).

We cannot do that directly from the allocation code (lock ordering
again) so we give the job to the pre-existing delete workqueue and
carry on with the allocation as normal.

If we find there is no space, we do a journal flush (required anyway
if space from a deallocation is to be released) which should block
against the pending deallocations, so we should always get the space
back.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-11-15 12:44:42 +00:00