Commit Graph

1524 Commits

Author SHA1 Message Date
David Howells 17d2c0a0c4 NFS: Fix RCU issues in the NFSv4 delegation code
Fix a number of RCU issues in the NFSv4 delegation code.

 (1) delegation->cred doesn't need to be RCU protected as it's essentially an
     invariant refcounted structure.

     By the time we get to nfs_free_delegation(), the delegation is being
     released, so no one else should be attempting to use the saved
     credentials, and they can be cleared.

     However, since the list of delegations could still be under traversal at
     this point by such as nfs_client_return_marked_delegations(), the cred
     should be released in nfs_do_free_delegation() rather than in
     nfs_free_delegation().  Simply using rcu_assign_pointer() to clear it is
     insufficient as that doesn't stop the cred from being destroyed, and nor
     does calling put_rpccred() after call_rcu(), given that the latter is
     asynchronous.

 (2) nfs_detach_delegation_locked() and nfs_inode_set_delegation() should use
     rcu_derefence_protected() because they can only be called if
     nfs_client::cl_lock is held, and that guards against anyone changing
     nfsi->delegation under it.  Furthermore, the barrier imposed by
     rcu_dereference() is superfluous, given that the spin_lock() is also a
     barrier.

 (3) nfs_detach_delegation_locked() is now passed a pointer to the nfs_client
     struct so that it can issue lockdep advice based on clp->cl_lock for (2).

 (4) nfs_inode_return_delegation_noreclaim() and nfs_inode_return_delegation()
     should use rcu_access_pointer() outside the spinlocked region as they
     merely examine the pointer and don't follow it, thus rendering unnecessary
     the need to impose a partial ordering over the one item of interest.

     These result in an RCU warning like the following:

[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
fs/nfs/delegation.c:332 invoked rcu_dereference_check() without protection!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 0
2 locks held by mount.nfs4/2281:
 #0:  (&type->s_umount_key#34){+.+...}, at: [<ffffffff810b25b4>] deactivate_super+0x60/0x80
 #1:  (iprune_sem){+.+...}, at: [<ffffffff810c332a>] invalidate_inodes+0x39/0x13a

stack backtrace:
Pid: 2281, comm: mount.nfs4 Not tainted 2.6.34-rc1-cachefs #110
Call Trace:
 [<ffffffff8105149f>] lockdep_rcu_dereference+0xaa/0xb2
 [<ffffffffa00b4591>] nfs_inode_return_delegation_noreclaim+0x5b/0xa0 [nfs]
 [<ffffffffa0095d63>] nfs4_clear_inode+0x11/0x1e [nfs]
 [<ffffffff810c2d92>] clear_inode+0x9e/0xf8
 [<ffffffff810c3028>] dispose_list+0x67/0x10e
 [<ffffffff810c340d>] invalidate_inodes+0x11c/0x13a
 [<ffffffff810b1dc1>] generic_shutdown_super+0x42/0xf4
 [<ffffffff810b1ebe>] kill_anon_super+0x11/0x4f
 [<ffffffffa009893c>] nfs4_kill_super+0x3f/0x72 [nfs]
 [<ffffffff810b25bc>] deactivate_super+0x68/0x80
 [<ffffffff810c6744>] mntput_no_expire+0xbb/0xf8
 [<ffffffff810c681b>] release_mounts+0x9a/0xb0
 [<ffffffff810c689b>] put_mnt_ns+0x6a/0x79
 [<ffffffffa00983a1>] nfs_follow_remote_path+0x5a/0x146 [nfs]
 [<ffffffffa0098334>] ? nfs_do_root_mount+0x82/0x95 [nfs]
 [<ffffffffa00985a9>] nfs4_try_mount+0x75/0xaf [nfs]
 [<ffffffffa0098874>] nfs4_get_sb+0x291/0x31a [nfs]
 [<ffffffff810b2059>] vfs_kern_mount+0xb8/0x177
 [<ffffffff810b2176>] do_kern_mount+0x48/0xe8
 [<ffffffff810c810b>] do_mount+0x782/0x7f9
 [<ffffffff810c8205>] sys_mount+0x83/0xbe
 [<ffffffff81001eeb>] system_call_fastpath+0x16/0x1b

Also on:

fs/nfs/delegation.c:215 invoked rcu_dereference_check() without protection!
 [<ffffffff8105149f>] lockdep_rcu_dereference+0xaa/0xb2
 [<ffffffffa00b4223>] nfs_inode_set_delegation+0xfe/0x219 [nfs]
 [<ffffffffa00a9c6f>] nfs4_opendata_to_nfs4_state+0x2c2/0x30d [nfs]
 [<ffffffffa00aa15d>] nfs4_do_open+0x2a6/0x3a6 [nfs]
 ...

And:

fs/nfs/delegation.c:40 invoked rcu_dereference_check() without protection!
 [<ffffffff8105149f>] lockdep_rcu_dereference+0xaa/0xb2
 [<ffffffffa00b3bef>] nfs_free_delegation+0x3d/0x6e [nfs]
 [<ffffffffa00b3e71>] nfs_do_return_delegation+0x26/0x30 [nfs]
 [<ffffffffa00b406a>] __nfs_inode_return_delegation+0x1ef/0x1fe [nfs]
 [<ffffffffa00b448a>] nfs_client_return_marked_delegations+0xc9/0x124 [nfs]
 ...

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-01 12:37:18 -04:00
Trond Myklebust 8f649c3762 NFSv4: Fix the locking in nfs_inode_reclaim_delegation()
Ensure that we correctly rcu-dereference the delegation itself, and that we
protect against removal while we're changing the contents.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2010-05-01 12:36:18 -04:00
Linus Torvalds 27fb8d7b1f Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  nfs: fix memory leak in nfs_get_sb with CONFIG_NFS_V4
  nfs: fix some issues in nfs41_proc_reclaim_complete()
  NFS: Ensure that nfs_wb_page() waits for Pg_writeback to clear
  NFS: Fix an unstable write data integrity race
  nfs: testing for null instead of ERR_PTR()
  NFS: rsize and wsize settings ignored on v4 mounts
  NFSv4: Don't attempt an atomic open if the file is a mountpoint
  SUNRPC: Fix a bug in rpcauth_prune_expired
2010-04-29 10:23:44 -07:00
Al Viro d9e80b7de9 nfs d_revalidate() is too trigger-happy with d_drop()
If dentry found stale happens to be a root of disconnected tree, we
can't d_drop() it; its d_hash is actually part of s_anon and d_drop()
would simply hide it from shrink_dcache_for_umount(), leading to
all sorts of fun, including busy inodes on umount and oopsen after
that.

Bug had been there since at least 2006 (commit c636eb already has it),
so it's definitely -stable fodder.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-04-28 20:40:03 -07:00
Xiaotian Feng 9699eda6bc nfs: fix memory leak in nfs_get_sb with CONFIG_NFS_V4
With CONFIG_NFS_V4 and data version 4, nfs_get_sb will allocate memory for
export_path in nfs4_validate_text_mount_data, so we need to free it then.
This is addressed in following kmemleak report:

unreferenced object 0xffff88016bf48a50 (size 16):
  comm "mount.nfs", pid 22567, jiffies 4651574704 (age 175471.200s)
  hex dump (first 16 bytes):
    2f 6f 70 74 2f 77 6f 72 6b 00 6b 6b 6b 6b 6b a5  /opt/work.kkkkk.
  backtrace:
    [<ffffffff814b34f9>] kmemleak_alloc+0x60/0xa7
    [<ffffffff81102c76>] kmemleak_alloc_recursive.clone.5+0x1b/0x1d
    [<ffffffff811046b3>] __kmalloc_track_caller+0x18f/0x1b7
    [<ffffffff810e1b08>] kstrndup+0x37/0x54
    [<ffffffffa0336971>] nfs_parse_devname+0x152/0x204 [nfs]
    [<ffffffffa0336af3>] nfs4_validate_text_mount_data+0xd0/0xdc [nfs]
    [<ffffffffa0338deb>] nfs_get_sb+0x325/0x736 [nfs]
    [<ffffffff81113671>] vfs_kern_mount+0xbd/0x17c
    [<ffffffff81113798>] do_kern_mount+0x4d/0xed
    [<ffffffff81129a87>] do_mount+0x787/0x7fe
    [<ffffffff81129b86>] sys_mount+0x88/0xc2
    [<ffffffff81009b42>] system_call_fastpath+0x16/0x1b

Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Benny Halevy <bhalevy@panasas.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-04-28 13:46:28 -04:00
Dan Carpenter acf82b85a7 nfs: fix some issues in nfs41_proc_reclaim_complete()
The original code passed an ERR_PTR() to rpc_put_task() and instead of
returning zero on success it returned -ENOMEM.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-04-28 13:45:12 -04:00
Trond Myklebust ba8b06e67e NFS: Ensure that nfs_wb_page() waits for Pg_writeback to clear
Neil Brown reports that he is seeing the BUG_ON(ret == 0) trigger in
nfs_page_async_flush. According to the trace in
     https://bugzilla.novell.com/show_bug.cgi?id=599628
the problem appears to be due to nfs_wb_page() not waiting for the
PG_writeback flag to clear.

There is a ditto problem in nfs_wb_page_cancel()

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-04-27 18:33:54 -04:00
Trond Myklebust 71d0a6112a NFS: Fix an unstable write data integrity race
Commit 2c61be0a94 (NFS: Ensure that the WRITE
and COMMIT RPC calls are always uninterruptible) exposed a race on file
close. In order to ensure correct close-to-open behaviour, we want to wait
for all outstanding background commit operations to complete.

This patch adds an inode flag that indicates if a commit operation is under
way, and provides a mechanism to allow ->write_inode() to wait for its
completion if this is a data integrity flush.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-04-22 15:35:57 -04:00
Dan Carpenter cdd29ecfcb nfs: testing for null instead of ERR_PTR()
nfs_path() returns an ERR_PTR(), it doesn't return null.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-04-22 15:35:56 -04:00
Chuck Lever 356e76b855 NFS: rsize and wsize settings ignored on v4 mounts
NFSv4 mounts ignore the rsize and wsize mount options, and always use
the default transfer size for both.  This seems to be because all
NFSv4 mounts are now cloned, and the cloning logic doesn't copy the
rsize and wsize settings from the parent nfs_server.

I tested Fedora's 2.6.32.11-99 and it seems to have this problem as
well, so I'm guessing that .33, .32, and perhaps older kernels have
this issue as well.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Stable <stable@kernel.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-04-22 15:35:56 -04:00
Trond Myklebust 1f063d2cdf NFSv4: Don't attempt an atomic open if the file is a mountpoint
Fix https://bugzilla.kernel.org/show_bug.cgi?id=15789

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-04-22 15:35:55 -04:00
Trond Myklebust 0df5dd4aae NFSv4: fix delegated locking
Arnaud Giersch reports that NFSv4 locking is broken when we hold a
delegation since commit 8e469ebd6d (NFSv4:
Don't allow posix locking against servers that don't support it).

According to Arnaud, the lock succeeds the first time he opens the file
(since we cannot do a delegated open) but then fails after we start using
delegated opens.

The following patch fixes it by ensuring that locking behaviour is
governed by a per-filesystem capability flag that is initially set, but
gets cleared if the server ever returns an OPEN without the
NFS4_OPEN_RESULT_LOCKTYPE_POSIX flag being set.

Reported-by: Arnaud Giersch <arnaud.giersch@iut-bm.univ-fcomte.fr>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
2010-04-12 07:55:15 -04:00
Trond Myklebust 2c61be0a94 NFS: Ensure that the WRITE and COMMIT RPC calls are always uninterruptible
We always want to ensure that WRITE and COMMIT completes, whether or not
the user presses ^C. Do this by making the call asynchronous, and allowing
the user to do an interruptible wait for rpc_task completion.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-04-09 19:54:50 -04:00
Trond Myklebust a6305ddb08 NFS: Fix a race with the new commit code
This patch fixes a race which occurs due to the fact that we release the
PG_writeback flag while still holding the nfs_page locked.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-04-09 19:08:17 -04:00
Trond Myklebust b80c3cb628 NFS: Ensure that writeback_single_inode() calls write_inode() when syncing
Since writeback_single_inode() checks the inode->i_state flags _before_ it
flushes out the data, we need to ensure that the I_DIRTY_DATASYNC flag is
already set. Otherwise we risk not seeing a call to write_inode(), which
again means that we break fsync() et al...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-04-09 19:08:17 -04:00
Trond Myklebust 1544fa0f7a NFS: Fix the mode calculation in nfs_find_open_context
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-04-09 19:08:16 -04:00
Trond Myklebust 80e60639f1 NFSv4: Fall back to ordinary lookup if nfs4_atomic_open() returns EISDIR
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
2010-04-09 19:08:16 -04:00
Al Viro 04287f975e Have nfs ->d_revalidate() report errors properly
If nfs atomic open implementation ends up doing open request from
->d_revalidate() codepath and gets an error from server, return that error
to caller explicitly and don't bother with lookup_instantiate_filp() at all.
->d_revalidate() can return an error itself just fine...

See
	http://bugzilla.kernel.org/show_bug.cgi?id=15674
	http://marc.info/?l=linux-kernel&m=126988782722711&w=2

for original report.

Reported-by: Daniel J Blueman <daniel.blueman@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-04-07 16:10:16 -07:00
Tejun Heo 5a0e3ad6af include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability.  As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and try to put the new include such that its order conforms
  to its surrounding.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others.  This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   wildly available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-30 22:02:32 +09:00
Jeff Layton 556ae3bb32 NFS: don't try to decode GETATTR if DELEGRETURN returned error
The reply parsing code attempts to decode the GETATTR response even if
the DELEGRETURN portion of the compound returned an error. The GETATTR
response won't actually exist if that's the case and we're asking the
parser to read past the end of the response.

This bug is fairly benign. The parser catches this without reading past
the end of the response and decode_getfattr returns -EIO. Earlier
kernels however had decode_op_hdr using the READ_BUF macro, and this
bug would make this printk pop any time the client got an error from
a delegreturn:

kernel: decode_op_hdr: reply buffer overflowed in line XXXX

More recent kernels seem to have replaced this printk with a dprintk.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-22 05:34:13 -04:00
Trond Myklebust d812e57582 NFS: Prevent another deadlock in nfs_release_page()
We should not attempt to free the page if __GFP_FS is not set. Otherwise we
can deadlock as per

  http://bugzilla.kernel.org/show_bug.cgi?id=15578

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
2010-03-19 13:55:17 -04:00
NeilBrown cfbc0683af NFS: ensure bdi_unregister is called on mount failure.
bdi_unregister is called by nfs_put_super which is only called by
generic_shutdown_super if ->s_root is not NULL.  So if we error out
in a circumstance where we called nfs_bdi_register (i.e. server !=
NULL) but have not set s_root, then we need to call bdi_unregister
explicitly in nfs_get_sb and various other *_get_sb() functions.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-15 15:37:45 -04:00
Trond Myklebust bb6fbc4548 NFS: Avoid a deadlock in nfs_release_page
J.R. Okajima reports the following deadlock:

INFO: task kswapd0:305 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kswapd0       D 0000000000000001     0   305      2 0x00000000
 ffff88001f21d4f0 0000000000000046 ffff88001fdea680 ffff88001f21c000
 ffff88001f21dfd8 ffff88001f21c000 ffff88001f21dfd8 ffff88001f21dfd8
 ffff88001fdea040 0000000000014c00 0000000000000001 ffff88001fdea040
Call Trace:
 [<ffffffff8146155d>] io_schedule+0x4d/0x70
 [<ffffffff810d2be5>] sync_page+0x65/0xa0
 [<ffffffff81461b12>] __wait_on_bit_lock+0x52/0xb0
 [<ffffffff810d2b80>] ? sync_page+0x0/0xa0
 [<ffffffff810d2b64>] __lock_page+0x64/0x70
 [<ffffffff81070ce0>] ? wake_bit_function+0x0/0x40
 [<ffffffff810df1d4>] truncate_inode_pages_range+0x344/0x4a0
 [<ffffffff810df340>] truncate_inode_pages+0x10/0x20
 [<ffffffff8112cbfe>] generic_delete_inode+0x15e/0x190
 [<ffffffff8112cc8d>] generic_drop_inode+0x5d/0x80
 [<ffffffff8112bb88>] iput+0x78/0x80
 [<ffffffff811bc908>] nfs_dentry_iput+0x38/0x50
 [<ffffffff811285f4>] dentry_iput+0x84/0x110
 [<ffffffff811286ae>] d_kill+0x2e/0x60
 [<ffffffff8112912a>] dput+0x7a/0x170
 [<ffffffff8111e925>] path_put+0x15/0x40
 [<ffffffff811c3a44>] __put_nfs_open_context+0xa4/0xb0
 [<ffffffff811cb5d0>] ? nfs_free_request+0x0/0x50
 [<ffffffff811c3b0b>] put_nfs_open_context+0xb/0x10
 [<ffffffff811cb5f9>] nfs_free_request+0x29/0x50
 [<ffffffff81234b7e>] kref_put+0x8e/0xe0
 [<ffffffff811cb594>] nfs_release_request+0x14/0x20
 [<ffffffff811cf769>] nfs_find_and_lock_request+0x89/0xa0
 [<ffffffff811d1180>] nfs_wb_page+0x80/0x110
 [<ffffffff811c0770>] nfs_release_page+0x70/0x90
 [<ffffffff810d18ee>] try_to_release_page+0x5e/0x80
 [<ffffffff810e1178>] shrink_page_list+0x638/0x860
 [<ffffffff810e19de>] shrink_zone+0x63e/0xc40

We can fix this by making the call to put_nfs_open_context() happen when we
actually remove the write request from the inode (which is done by the
nfsiod thread in this case).

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
2010-03-11 09:19:35 -05:00
Trond Myklebust b4d2314bb8 NFSv4: Don't ignore the NFS_INO_REVAL_FORCED flag in nfs_revalidate_inode()
If the NFS_INO_REVAL_FORCED flag is set, that means that we don't yet have
an up to date attribute cache. Even if we hold a delegation, we must
put a GETATTR on the wire.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
2010-03-10 15:21:44 -05:00
Steve Dickson 49697ee792 nfs4: Make the v4 callback service hidden
To avoid hangs in the svc_unregister(), on version 4 mounts
(and unmounts), when rpcbind is not running, make the nfs4 callback
program an 'hidden' service by setting the 'vs_hidden' flag in the
nfs4_callback_version structure.

Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-08 14:56:42 -05:00
Dan Carpenter 7dd08a570d nfs: fix unlikely memory leak
I'll admit that it's unlikely for the first allocation to fail and
the second one to succeed.  I won't be offended if you ignore this
patch.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-08 14:10:00 -05:00
Linus Torvalds 05c5cb31ec Merge branch 'for-2.6.34' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.34' of git://linux-nfs.org/~bfields/linux: (22 commits)
  nfsd4: fix minor memory leak
  svcrpc: treat uid's as unsigned
  nfsd: ensure sockets are closed on error
  Revert "sunrpc: move the close processing after do recvfrom method"
  Revert "sunrpc: fix peername failed on closed listener"
  sunrpc: remove unnecessary svc_xprt_put
  NFSD: NFSv4 callback client should use RPC_TASK_SOFTCONN
  xfs_export_operations.commit_metadata
  commit_metadata export operation replacing nfsd_sync_dir
  lockd: don't clear sm_monitored on nsm_reboot_lookup
  lockd: release reference to nsm_handle in nlm_host_rebooted
  nfsd: Use vfs_fsync_range() in nfsd_commit
  NFSD: Create PF_INET6 listener in write_ports
  SUNRPC: NFS kernel APIs shouldn't return ENOENT for "transport not found"
  SUNRPC: Bury "#ifdef IPV6" in svc_create_xprt()
  NFSD: Support AF_INET6 in svc_addsock() function
  SUNRPC: Use rpc_pton() in ip_map_parse()
  nfsd: 4.1 has an rfc number
  nfsd41: Create the recovery entry for the NFSv4.1 client
  nfsd: use vfs_fsync for non-directories
  ...
2010-03-06 11:31:38 -08:00
Trond Myklebust 3fa04ecd72 Merge branch 'writeback-for-2.6.34' into nfs-for-2.6.34 2010-03-05 15:46:18 -05:00
Trond Myklebust 1cda707d52 NFS: Remove requirement for inode->i_mutex from nfs_invalidate_mapping
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-05 15:44:56 -05:00
Trond Myklebust 5cf95214cc NFS: Clean up nfs_sync_mapping
Remove the redundant call to filemap_write_and_wait().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-05 15:44:56 -05:00
Trond Myklebust 7f2f12d963 NFS: Simplify nfs_wb_page()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-05 15:44:55 -05:00
Trond Myklebust acdc53b214 NFS: Replace __nfs_write_mapping with sync_inode()
Now that we have correct COMMIT semantics in writeback_single_inode, we can
reduce and simplify nfs_wb_all(). Also replace nfs_wb_nocommit() with a
call to filemap_write_and_wait(), which doesn't need to hold the
inode->i_mutex.

With that done, we can eliminate nfs_write_mapping() altogether.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-05 15:44:55 -05:00
Trond Myklebust c988950eb6 NFS: Simplify nfs_wb_page_cancel()
In all cases we should be able to just remove the request and call
cancel_dirty_page().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-05 15:44:55 -05:00
Trond Myklebust 2928db1ffe NFS: Ensure inode is always marked I_DIRTY_DATASYNC, if it has unstable pages
Since nfs_scan_list() doesn't wait for locked pages, we have a race in
which it is possible to end up with an inode that needs to send a COMMIT,
but which does not have the I_DIRTY_DATASYNC flag set.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-05 15:44:54 -05:00
Trond Myklebust 5bad5abec4 NFS: Run COMMIT as an asynchronous RPC call when wbc->for_background is set
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Wu Fengguang <fengguang.wu@intel.com>
2010-03-05 15:44:54 -05:00
Trond Myklebust 420e3646bb NFS: Reduce the number of unnecessary COMMIT calls
If the caller is doing a non-blocking flush, and there are still writebacks
pending on the wire, we can usually defer the COMMIT call until those
writes are done.

Also ensure that we honour the wbc->nonblocking flag.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-05 15:44:54 -05:00
Trond Myklebust ff778d02bf NFS: Add a count of the number of unstable writes carried by an inode
In order to know when we should do opportunistic commits of the unstable
writes, when the VM is doing a background flush, we add a field to count
the number of unstable writes.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-05 15:44:54 -05:00
Trond Myklebust 8fc795f703 NFS: Cleanup - move nfs_write_inode() into fs/nfs/write.c
The sole purpose of nfs_write_inode is to commit unstable writes, so
move it into fs/nfs/write.c, and make nfs_commit_inode static.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-05 15:44:53 -05:00
Christoph Hellwig a9185b41a4 pass writeback_control to ->write_inode
This gives the filesystem more information about the writeback that
is happening.  Trond requested this for the NFS unstable write handling,
and other filesystems might benefit from this too by beeing able to
distinguish between the different callers in more detail.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-03-05 13:25:52 -05:00
Christoph Hellwig 26821ed40b make sure data is on disk before calling ->write_inode
Similar to the fsync issue fixed a while ago in commit
2daea67e96 we need to write for data to
actually hit the disk before writing out the metadata to guarantee
data integrity for filesystems that modify the inode in the data I/O
completion path.  Currently XFS and NFS handle this manually, and AFS
has a write_inode method that does nothing but waiting for data, while
others are possibly missing out on this.

Fortunately this change has a lot less impact than the fsync change
as none of the write_inode methods starts data writeout of any form
by itself.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-03-05 13:25:10 -05:00
J. Bruce Fields 4ea41e2de5 Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs into for-2.6.34-incoming
Resolve merge conflict in fs/xfs/linux-2.6/xfs_export.c.
2010-03-04 12:04:51 -05:00
Linus Torvalds 0f2cc4ecd8 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (52 commits)
  init: Open /dev/console from rootfs
  mqueue: fix typo "failues" -> "failures"
  mqueue: only set error codes if they are really necessary
  mqueue: simplify do_open() error handling
  mqueue: apply mathematics distributivity on mq_bytes calculation
  mqueue: remove unneeded info->messages initialization
  mqueue: fix mq_open() file descriptor leak on user-space processes
  fix race in d_splice_alias()
  set S_DEAD on unlink() and non-directory rename() victims
  vfs: add NOFOLLOW flag to umount(2)
  get rid of ->mnt_parent in tomoyo/realpath
  hppfs can use existing proc_mnt, no need for do_kern_mount() in there
  Mirror MS_KERNMOUNT in ->mnt_flags
  get rid of useless vfsmount_lock use in put_mnt_ns()
  Take vfsmount_lock to fs/internal.h
  get rid of insanity with namespace roots in tomoyo
  take check for new events in namespace (guts of mounts_poll()) to namespace.c
  Don't mess with generic_permission() under ->d_lock in hpfs
  sanitize const/signedness for udf
  nilfs: sanitize const/signedness in dealing with ->d_name.name
  ...

Fix up fairly trivial (famous last words...) conflicts in
drivers/infiniband/core/uverbs_main.c and security/tomoyo/realpath.c
2010-03-04 08:15:33 -08:00
Al Viro f694869709 a couple of mntget+dget -> path_get in nfs4proc
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-03-03 14:07:56 -05:00
Al Viro 6eae7974d0 Switch alloc_nfs_open_context() to struct path
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-03-03 14:07:56 -05:00
Linus Torvalds 0a135ba14d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
  percpu: add __percpu sparse annotations to what's left
  percpu: add __percpu sparse annotations to fs
  percpu: add __percpu sparse annotations to core kernel subsystems
  local_t: Remove leftover local.h
  this_cpu: Remove pageset_notifier
  this_cpu: Page allocator conversion
  percpu, x86: Generic inc / dec percpu instructions
  local_t: Move local.h include to ringbuffer.c and ring_buffer_benchmark.c
  module: Use this_cpu_xx to dynamically allocate counters
  local_t: Remove cpu_local_xx macros
  percpu: refactor the code in pcpu_[de]populate_chunk()
  percpu: remove compile warnings caused by __verify_pcpu_ptr()
  percpu: make accessors check for percpu pointer in sparse
  percpu: add __percpu for sparse.
  percpu: make access macros universal
  percpu: remove per_cpu__ prefix.
2010-03-03 07:34:18 -08:00
Andy Adamson 180b62a3d8 nfs41 fix NFS4ERR_CLID_INUSE for exchange id
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-02 13:45:33 -05:00
Trond Myklebust ebed9203b6 NFS: Fix an allocation-under-spinlock bug
sunrpc_cache_update() will always call detail->update() from inside the
detail->hash_lock, so it cannot allocate memory.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
2010-03-02 13:06:22 -05:00
Trond Myklebust 0f79fd6f5c NFSv4.1: Various fixes to the sequence flag error handling
Ensure that we change the EXCHANGE_ID verifier (i.e. clp->cl_boot_time)
when we want to reset all state. This is mainly needed when the server
tells us that it is revoking our open or lock stateids.

Handle revoking of recallable state by expiring the delegations.

Handle callback path issues by expiring the delegations and then resetting
the session.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-02 13:06:21 -05:00
Alexandros Batsakis 0851de0617 nfs4: renewd renew operations should take/put a client reference
renewd sends RENEW requests to the NFS server in order to renew state.
As the request is asynchronous, renewd should take a reference to the
nfs_client to prevent concurrent umounts from freeing the client

Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-02 13:00:03 -05:00
Alexandros Batsakis 7135840fc7 nfs41: renewd sequence operations should take/put client reference
renewd sends SEQUENCE requests to the NFS server in order to renew state.
As the request is asynchronous, renewd should take a reference to the
nfs_client to prevent concurrent umounts from freeing the session/client

Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-02 12:54:30 -05:00