Commit Graph

1866 Commits

Author SHA1 Message Date
Arnd Bergmann 6de5bd128d BKL: introduce CONFIG_BKL.
With all the patches we have queued in the BKL removal tree, only a
few dozen modules are left that actually rely on the BKL, and even
there are lots of low-hanging fruit. We need to decide what to do
about them, this patch illustrates one of the options:

Every user of the BKL is marked as 'depends on BKL' in Kconfig,
and the CONFIG_BKL becomes a user-visible option. If it gets
disabled, no BKL using module can be built any more and the BKL
code itself is compiled out.

The one exception is file locking, which is practically always
enabled and does a 'select BKL' instead. This effectively forces
CONFIG_BKL to be enabled until we have solved the fs/lockd
mess and can apply the patch that removes the BKL from fs/locks.c.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2010-10-21 15:44:13 +02:00
Trond Myklebust 6eaa61496f NFSv4: Don't call nfs4_reclaim_complete() on receiving NFS4ERR_STALE_CLIENTID
If the server sends us an NFS4ERR_STALE_CLIENTID while the state management
thread is busy reclaiming state, we do want to treat all state that wasn't
reclaimed before the STALE_CLIENTID as if a network partition occurred (see
the edge conditions described in RFC3530 and RFC5661).
What we do not want to do is to send an nfs4_reclaim_complete(), since we
haven't yet even started reclaiming state after the server rebooted.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
2010-10-19 19:42:53 -04:00
Trond Myklebust ae1007d37e NFSv4: Don't call nfs4_state_mark_reclaim_reboot() from error handlers
In the case of a server reboot, the state recovery thread starts by calling
nfs4_state_end_reclaim_reboot() in order to avoid edge conditions when
the server reboots while the client is in the middle of recovery.

However, if the client has already marked the nfs4_state as requiring
reboot recovery, then the above behaviour will cause the recovery thread to
treat the open as if it was part of such an edge condition: the open will
be recovered as if it was part of a lease expiration (and all the locks
will be lost).
Fix is to remove the call to nfs4_state_mark_reclaim_reboot from
nfs4_async_handle_error(), and nfs4_handle_exception(). Instead we leave it
to the recovery thread to do this for us.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
2010-10-19 19:42:33 -04:00
Trond Myklebust b0ed9dbc24 NFSv4: Fix open recovery
NFSv4 open recovery is currently broken: since we do not clear the
state->flags states before attempting recovery, we end up with the
'can_open_cached()' function triggering. This again leads to no OPEN call
being put on the wire.

Reported-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
2010-10-19 19:41:55 -04:00
Trond Myklebust bc4866b6e0 NFS: Don't SIGBUS if nfs_vm_page_mkwrite races with a cache invalidation
In the case where we lock the page, and then find out that the page has
been thrown out of the page cache, we should just return VM_FAULT_NOPAGE.
This is what block_page_mkwrite() does in these situations.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
2010-10-19 19:37:54 -04:00
Bryan Schumaker 955a857e06 NFS: new idmapper
This patch creates a new idmapper system that uses the request-key function to
place a call into userspace to map user and group ids to names.  The old
idmapper was single threaded, which prevented more than one request from running
at a single time.  This means that a user would have to wait for an upcall to
finish before accessing a cached result.

The upcall result is stored on a keyring of type id_resolver.  See the file
Documentation/filesystems/nfs/idmapper.txt for instructions.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
[Trond: fix up the return value of nfs_idmap_lookup_name and clean up code]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-07 18:48:49 -04:00
Arnd Bergmann b89f432133 fs/locks.c: prepare for BKL removal
This prepares the removal of the big kernel lock from the
file locking code. We still use the BKL as long as fs/lockd
uses it and ceph might sleep, but we can flip the definition
to a private spinlock as soon as that's done.
All users outside of fs/lockd get converted to use
lock_flocks() instead of lock_kernel() where appropriate.

Based on an earlier patch to use a spinlock from Matthew
Wilcox, who has attempted this a few times before, the
earliest patch from over 10 years ago turned it into
a semaphore, which ended up being slower than the BKL
and was subsequently reverted.

Someone should do some serious performance testing when
this becomes a spinlock, since this has caused problems
before. Using a spinlock should be at least as good
as the BKL in theory, but who knows...

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Matthew Wilcox <willy@linux.intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Miklos Szeredi <mszeredi@suse.cz>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Sage Weil <sage@newdream.net>
Cc: linux-kernel@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
2010-10-05 11:02:04 +02:00
Pavel Emelyanov c653ce3f0a sunrpc: Add net to rpc_create_args
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-10-01 17:18:56 -04:00
Pavel Emelyanov fc5d00b04a sunrpc: Add net argument to svc_create_xprt
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-10-01 17:18:54 -04:00
Trond Myklebust aa510da5bf NFS: We must use list_for_each_entry_safe in nfs_access_cache_shrinker
We may end up removing the current entry from nfs_access_lru_list.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-09-29 15:16:25 -04:00
Jeff Layton a00dd6c03d NFS: don't use FLUSH_SYNC on WB_SYNC_NONE COMMIT calls (try #2)
WB_SYNC_NONE is supposed to mean "don't wait on anything". That should
also include not waiting for COMMIT calls to complete.

WB_SYNC_NONE is also implied when wbc->nonblocking and
wbc->for_background are set, so we can replace those checks in
nfs_commit_unstable_pages with a check for WB_SYNC_NONE.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-09-29 14:42:30 -04:00
Trond Myklebust 5c78f58e2d NFS: Really fix put_nfs_open_context()
In nfs_open_revalidate(), if the open_context() call returns an inode that
is not the same as dentry->d_inode, then we will call
put_nfs_open_context() with a valid dentry->d_inode, but without the
context being part of the nfsi->open_files list.

In this case too, we want to just skip the list removal, but we do want to
call the ->close_context() callback in order to close the NFSv4 state.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
2010-09-29 14:41:36 -04:00
Benny Halevy dfb4f30983 NFSv4.1: keep seq_res.sr_slot as pointer rather than an index
Having to explicitly initialize sr_slotid to NFS4_MAX_SLOT_TABLE
resulted in numerous bugs.  Keeping the current slot as a pointer
to the slot table is more straight forward and robust as it's
implicitly set up to NULL wherever the seq_res member is initialized
to zeroes.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-09-24 09:17:01 -04:00
Suresh Jayaraman 7c563cc9f3 nfs: show "local_lock" mount option in /proc/mounts
Display the status of 'local_lock' mount option in /proc/mounts.


Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-09-23 14:26:48 -04:00
Benny Halevy ef84303ebc NFS: handle inode==NULL in __put_nfs_open_context
inode may be NULL when put_nfs_open_context is called from nfs_atomic_lookup
before d_add_unique(dentry, inode)

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-09-23 12:22:09 -04:00
Suresh Jayaraman 5eebde2322 nfs: introduce mount option '-olocal_lock' to make locks local
NFS clients since 2.6.12 support flock locks by emulating fcntl byte-range
locks. Due to this, some windows applications which seem to use both flock
(share mode lock mapped as flock by Samba) and fcntl locks sequentially on
the same file, can't lock as they falsely assume the file is already locked.
The problem was reported on a setup with windows clients accessing excel files
on a Samba exported share which is originally a NFS mount from a NetApp filer.

Older NFS clients (< 2.6.12) did not see this problem as flock locks were
considered local. To support legacy flock behavior, this patch adds a mount
option "-olocal_lock=" which can take the following values:

   'none'  		- Neither flock locks nor POSIX locks are local
   'flock' 		- flock locks are local
   'posix' 		- fcntl/POSIX locks are local
   'all'		- Both flock locks and POSIX locks are local

Testing:

   - This patch was tested by using -olocal_lock option with different values
     and the NLM calls were noted from the network packet captured.

     'none'  - NLM calls were seen during both flock() and fcntl(), flock lock
   	       was granted, fcntl was denied
     'flock' - no NLM calls for flock(), NLM call was seen for fcntl(),
   	       granted
     'posix' - NLM call was seen for flock() - granted, no NLM call for fcntl()
     'all'   - no NLM calls were seen during both flock() and fcntl()

   - No bugs were seen during NFSv4 locking/unlocking in general and NFSv4
     reboot recovery.

Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-09-23 08:55:58 -04:00
Chuck Lever b4687da7fc SUNRPC: Refactor logic to NUL-terminate strings in pages
Clean up: Introduce a helper to '\0'-terminate XDR strings
that are placed in a page in the page cache.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-09-21 16:55:48 -04:00
Chuck Lever d141d97437 NFS: Fix NFSv3 debugging messages in fs/nfs/nfs3proc.c
Clean up.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-09-21 16:55:47 -04:00
Trond Myklebust 609588928f NFS: Convert nfsiod to use alloc_workqueue()
create_singlethread_workqueue() is deprecated.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-09-21 16:55:31 -04:00
Trond Myklebust d688e11007 NFSv4.1: Fix the slotid initialisation in nfs_async_rename()
This fixes an Oopsable condition that was introduced by commit
d3d4152a5d (nfs: make sillyrename an async
operation)

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-09-21 16:52:40 -04:00
Trond Myklebust f7732d6573 NFS: Fix a use-after-free case in nfs_async_rename()
The call to nfs_async_rename_release() after rpc_run_task() is incorrect.
The rpc_run_task() is always guaranteed to call the ->rpc_release() method.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-09-21 16:52:40 -04:00
J. Bruce Fields c88739b373 Merge remote branch 'trond/bugfixes' into for-2.6.37
Without some client-side fixes, server testing is currently difficult.
2010-09-19 23:48:32 -04:00
Jeff Layton d3d4152a5d nfs: make sillyrename an async operation
A synchronous rename can be interrupted by a SIGKILL. If that happens
during a sillyrename operation, it's possible for the rename call to
be sent to the server, but the task exits before processing the
reply. If this happens, the sillyrenamed file won't get cleaned up
during nfs_dentry_iput and the server is left with a dangling .nfs* file
hanging around.

Fix this problem by turning sillyrename into an asynchronous operation
and have the task doing the sillyrename just wait on the reply. If the
task is killed before the sillyrename completes, it'll still proceed
to completion.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-09-17 17:31:57 -04:00
Jeff Layton 779c51795b nfs: move nfs_sillyrename to unlink.c
...since that's where most of the sillyrenaming code lives. A comment
block is added to the beginning as well to clarify how sillyrenaming
works. Also, make nfs_async_unlink static as nfs_sillyrename is the only
caller.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-09-17 17:31:30 -04:00
Jeff Layton e8582a8b96 nfs: standardize the rename response container
Right now, v3 and v4 have their own variants. Create a standard struct
that will work for v3 and v4. v2 doesn't get anything but a simple error
and so isn't affected by this.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-09-17 17:31:06 -04:00
Jeff Layton 920769f031 nfs: standardize the rename args container
Each NFS version has its own version of the rename args container.
Standardize them on a common one that's identical to the one NFSv4
uses.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-09-17 17:30:25 -04:00
Trond Myklebust 2b484297e4 NFS: Add an 'open_context' element to struct nfs_rpc_ops
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-09-17 10:56:51 -04:00
Trond Myklebust c0204fd2b8 NFS: Clean up nfs4_proc_create()
Remove all remaining references to the struct nameidata from the low level
NFS layers. Again pass down a partially initialised struct nfs_open_context
when we want to do atomic open+create.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-09-17 10:56:51 -04:00
Trond Myklebust 535918f141 NFSv4: Further cleanups for nfs4_open_revalidate()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-09-17 10:56:51 -04:00
Trond Myklebust b8d4caddd8 NFSv4: Clean up nfs4_open_revalidate
Remove references to 'struct nameidata' from the low-level open_revalidate
code, and replace them with a struct nfs_open_context which will be
correctly initialised upon success.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-09-17 10:56:51 -04:00
Trond Myklebust f46e0bd34e NFSv4: Further minor cleanups for nfs4_atomic_open()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-09-17 10:56:50 -04:00
Trond Myklebust cd9a1c0e5a NFSv4: Clean up nfs4_atomic_open
Start moving the 'struct nameidata' dependent code out of the lower level
NFS code in preparation for the removal of open intents.

Instead of the struct nameidata, we pass down a partially initialised
struct nfs_open_context that will be fully initialised by the atomic open
upon success.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-09-17 10:56:50 -04:00
Chuck Lever 306a075362 NFS: Allow NFSROOT debugging messages to be enabled dynamically
As a convenience, introduce a kernel command line option to enable
NFSROOT debugging messages.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-09-17 10:54:37 -04:00
Chuck Lever 8d23210378 NFS: Clean up nfsroot.c
Clean up: now that mount option parsing for nfsroot is handled
in fs/nfs/super.c, remove code in fs/nfs/nfsroot.c that is no
longer used.  This includes code that constructs the legacy
nfs_mount_data structure, and code that does a MNT call to the
server.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-09-17 10:54:37 -04:00
Chuck Lever 56463e50d1 NFS: Use super.c for NFSROOT mount option parsing
Replace duplicate code in NFSROOT for mounting an NFS server on '/'
with logic that uses the existing mainline text-based logic in the NFS
client.

Add documenting comments where appropriate.

Note that this means NFSROOT mounts now use the same default settings
as v2/v3 mounts done via mount(2) from user space.

  vers=3,tcp,rsize=<negotiated default>,wsize=<negotiated default>

As before, however, no version/protocol negotiation with the server is
done.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-09-17 10:54:37 -04:00
Chuck Lever 60ac03685b NFS: Clean up NFSROOT command line parsing
Clean up: To reduce confusion, rename nfs_root_name as nfs_root_parms,
as this buffer contains more than just the name of the remote server.

Introduce documenting comments around nfs_root_setup().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-09-17 10:54:37 -04:00
Chuck Lever ed58b2917b NFS: Remove \t from mount debugging message
During boot, a random character is displayed instead of a tab.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-09-17 10:54:37 -04:00
Trond Myklebust 827e345702 SUNRPC: Fix the NFSv4 and RPCSEC_GSS Kconfig dependencies
The NFSv4 client's callback server calls svc_gss_principal(), which
is defined in the auth_rpcgss.ko

The NFSv4 server has the same dependency, and in addition calls
svcauth_gss_flavor(), gss_mech_get_by_pseudoflavor(),
gss_pseudoflavor_to_service() and gss_mech_put() from the same module.

The module auth_rpcgss itself has no dependencies aside from sunrpc,
so we only need to select RPCSEC_GSS.

Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-09-12 19:57:50 -04:00
Menyhart Zoltan fbf3fdd244 statfs() gives ESTALE error
Hi,

An NFS client executes a statfs("file", &buff) call.
"file" exists / existed, the client has read / written it,
but it has already closed it.

user_path(pathname, &path) looks up "file" successfully in the
directory-cache  and restarts the aging timer of the directory-entry.
Even if "file" has already been removed from the server, because the
lookupcache=positive option I use, keeps the entries valid for a while.

nfs_statfs() returns ESTALE if "file" has already been removed from the
server.

If the user application repeats the statfs("file", &buff) call, we
are stuck: "file" remains young forever in the directory-cache.

Signed-off-by: Zoltan Menyhart  <Zoltan.Menyhart@bull.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
2010-09-12 19:55:26 -04:00
Trond Myklebust b20d37ca95 NFS: Fix a typo in nfs_sockaddr_match_ipaddr6
Reported-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
2010-09-12 19:55:26 -04:00
Fabio Olive Leite b1bde04c6d Remove incorrect do_vfs_lock message
The do_vfs_lock function on fs/nfs/file.c is only called if NLM is
not being used, via the -onolock mount option. Therefore it cannot
really be "out of sync with lock manager" when the local locking
function called returns an error, as there will be no corresponding
call to the NLM. For details, simply check the if/else on do_setlk
and do_unlk on fs/nfs/file.c.

Signed-Off-By: Fabio Olive Leite <fleite@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-09-12 19:55:25 -04:00
NeilBrown c5b29f885a sunrpc: use seconds since boot in expiry cache
This protects us from confusion when the wallclock time changes.

We convert to and from wallclock when  setting or reading expiry
times.

Also use seconds since boot for last_clost time.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-09-07 19:21:20 -04:00
Linus Torvalds 763008c435 Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  NFS: Fix an Oops in the NFSv4 atomic open code
  NFS: Fix the selection of security flavours in Kconfig
  NFS: fix the return value of nfs_file_fsync()
  rpcrdma: Fix SQ size calculation when memreg is FRMR
  xprtrdma: Do not truncate iova_start values in frmr registrations.
  nfs: Remove redundant NULL check upon kfree()
  nfs: Add "lookupcache" to displayed mount options
  NFS: allow close-to-open cache semantics to apply to root of NFS filesystem
  SUNRPC: fix NFS client over TCP hangs due to packet loss (Bug 16494)
2010-08-18 15:45:23 -07:00
Trond Myklebust 0a377cff94 NFS: Fix an Oops in the NFSv4 atomic open code
Adam Lackorzynski reports:

with 2.6.35.2 I'm getting this reproducible Oops:

[  110.825396] BUG: unable to handle kernel NULL pointer dereference at
(null)
[  110.828638] IP: [<ffffffff811247b7>] encode_attrs+0x1a/0x2a4
[  110.828638] PGD be89f067 PUD bf18f067 PMD 0
[  110.828638] Oops: 0000 [#1] SMP
[  110.828638] last sysfs file: /sys/class/net/lo/operstate
[  110.828638] CPU 2
[  110.828638] Modules linked in: rtc_cmos rtc_core rtc_lib amd64_edac_mod
i2c_amd756 edac_core i2c_core dm_mirror dm_region_hash dm_log dm_snapshot
sg sr_mod usb_storage ohci_hcd mptspi tg3 mptscsih mptbase usbcore nls_base
[last unloaded: scsi_wait_scan]
[  110.828638]
[  110.828638] Pid: 11264, comm: setchecksum Not tainted 2.6.35.2 #1
[  110.828638] RIP: 0010:[<ffffffff811247b7>]  [<ffffffff811247b7>]
encode_attrs+0x1a/0x2a4
[  110.828638] RSP: 0000:ffff88003bf5b878  EFLAGS: 00010296
[  110.828638] RAX: ffff8800bddb48a8 RBX: ffff88003bf5bb18 RCX:
0000000000000000
[  110.828638] RDX: ffff8800be258800 RSI: 0000000000000000 RDI:
ffff88003bf5b9f8
[  110.828638] RBP: 0000000000000000 R08: ffff8800bddb48a8 R09:
0000000000000004
[  110.828638] R10: 0000000000000003 R11: ffff8800be779000 R12:
ffff8800be258800
[  110.828638] R13: ffff88003bf5b9f8 R14: ffff88003bf5bb20 R15:
ffff8800be258800
[  110.828638] FS:  0000000000000000(0000) GS:ffff880041e00000(0063)
knlGS:00000000556bd6b0
[  110.828638] CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
[  110.828638] CR2: 0000000000000000 CR3: 00000000be8ef000 CR4:
00000000000006e0
[  110.828638] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[  110.828638] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000400
[  110.828638] Process setchecksum (pid: 11264, threadinfo
ffff88003bf5a000, task ffff88003f232210)
[  110.828638] Stack:
[  110.828638]  0000000000000000 ffff8800bfbcf920 0000000000000000
0000000000000ffe
[  110.828638] <0> 0000000000000000 0000000000000000 0000000000000000
0000000000000000
[  110.828638] <0> 0000000000000000 0000000000000000 0000000000000000
0000000000000000
[  110.828638] Call Trace:
[  110.828638]  [<ffffffff81124c1f>] ? nfs4_xdr_enc_setattr+0x90/0xb4
[  110.828638]  [<ffffffff81371161>] ? call_transmit+0x1c3/0x24a
[  110.828638]  [<ffffffff813774d9>] ? __rpc_execute+0x78/0x22a
[  110.828638]  [<ffffffff81371a91>] ? rpc_run_task+0x21/0x2b
[  110.828638]  [<ffffffff81371b7e>] ? rpc_call_sync+0x3d/0x5d
[  110.828638]  [<ffffffff8111e284>] ? _nfs4_do_setattr+0x11b/0x147
[  110.828638]  [<ffffffff81109466>] ? nfs_init_locked+0x0/0x32
[  110.828638]  [<ffffffff810ac521>] ? ifind+0x4e/0x90
[  110.828638]  [<ffffffff8111e2fb>] ? nfs4_do_setattr+0x4b/0x6e
[  110.828638]  [<ffffffff8111e634>] ? nfs4_do_open+0x291/0x3a6
[  110.828638]  [<ffffffff8111ed81>] ? nfs4_open_revalidate+0x63/0x14a
[  110.828638]  [<ffffffff811056c4>] ? nfs_open_revalidate+0xd7/0x161
[  110.828638]  [<ffffffff810a2de4>] ? do_lookup+0x1a4/0x201
[  110.828638]  [<ffffffff810a4733>] ? link_path_walk+0x6a/0x9d5
[  110.828638]  [<ffffffff810a42b6>] ? do_last+0x17b/0x58e
[  110.828638]  [<ffffffff810a5fbe>] ? do_filp_open+0x1bd/0x56e
[  110.828638]  [<ffffffff811cd5e0>] ? _atomic_dec_and_lock+0x30/0x48
[  110.828638]  [<ffffffff810a9b1b>] ? dput+0x37/0x152
[  110.828638]  [<ffffffff810ae063>] ? alloc_fd+0x69/0x10a
[  110.828638]  [<ffffffff81099f39>] ? do_sys_open+0x56/0x100
[  110.828638]  [<ffffffff81027a22>] ? ia32_sysret+0x0/0x5
[  110.828638] Code: 83 f1 01 e8 f5 ca ff ff 48 83 c4 50 5b 5d 41 5c c3 41
57 41 56 41 55 49 89 fd 41 54 49 89 d4 55 48 89 f5 53 48 81 ec 18 01 00 00
<8b> 06 89 c2 83 e2 08 83 fa 01 19 db 83 e3 f8 83 c3 18 a8 01 8d
[  110.828638] RIP  [<ffffffff811247b7>] encode_attrs+0x1a/0x2a4
[  110.828638]  RSP <ffff88003bf5b878>
[  110.828638] CR2: 0000000000000000
[  112.840396] ---[ end trace 95282e83fd77358f ]---

We need to ensure that the O_EXCL flag is turned off if the user doesn't
set O_CREAT.

Cc: stable@kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-08-18 09:25:42 -04:00
Trond Myklebust df486a2590 NFS: Fix the selection of security flavours in Kconfig
Randy Dunlap reports:

ERROR: "svc_gss_principal" [fs/nfs/nfs.ko] undefined!


because in fs/nfs/Kconfig, NFS_V4 selects RPCSEC_GSS_KRB5
and/or in fs/nfsd/Kconfig, NFSD_V4 selects RPCSEC_GSS_KRB5.

RPCSEC_GSS_KRB5 does 5 selects, but none of these is enforced/followed
by the fs/nfs[d]/Kconfig configs:

	select SUNRPC_GSS
	select CRYPTO
	select CRYPTO_MD5
	select CRYPTO_DES
	select CRYPTO_CBC

Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: J. Bruce Fields <bfields@fieldses.org>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-08-17 17:42:45 -04:00
Linus Torvalds 2897c684d1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
  [NFS] Set CONFIG_KEYS when CONFIG_NFS_USE_KERNEL_DNS is set
  AFS: Implement an autocell mount capability [ver #2]
  DNS: If the DNS server returns an error, allow that to be cached [ver #2]
  NFS: Use kernel DNS resolver [ver #2]
  cifs: update README to include details about 'fsc' option
2010-08-13 10:37:30 -07:00
Steve French 3f43231230 [NFS] Set CONFIG_KEYS when CONFIG_NFS_USE_KERNEL_DNS is set
Previous patch relied on DNS_RESOLVER setting CONFIG_KEYS
but needs to be selected in NFS config when using the new
DNS resolver

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
CC: David Howells <dhowells@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-08-12 18:16:45 +00:00
Bryan Schumaker c2e8139c9f NFS: Use kernel DNS resolver [ver #2]
Use the kernel DNS resolver to translate hostnames to IP addresses.  Create a
new config option to choose between the legacy DNS resolver and the new
resolver.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-08-11 17:11:28 +00:00
J. R. Okajima 0702099bd8 NFS: fix the return value of nfs_file_fsync()
By the commit af7fa16 2010-08-03 NFS: Fix up the fsync code
close(2) became returning the non-zero value even if it went well.
nfs_file_fsync() should return 0 when "status" is positive.

Signed-off-by: J. R. Okajima <hooanon05@yahoo.co.jp>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-08-11 13:10:16 -04:00
Davidlohr Bueso 5d7ca35a18 nfs: Remove redundant NULL check upon kfree()
Signed-off-by: Davidlohr Bueso <dave@gnu.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-08-11 12:42:15 -04:00
Rusty Russell 9bbb9e5a33 param: use ops in struct kernel_param, rather than get and set fns directly
This is more kernel-ish, saves some space, and also allows us to
expand the ops without breaking all the callers who are happy for the
new members to be NULL.

The few places which defined their own param types are changed to the
new scheme (more which crept in recently fixed in following patches).

Since we're touching them anyway, we change get() and set() to take a
const struct kernel_param (which they really are).  This causes some
harmless warnings until we fix them (in following patches).

To reduce churn, module_param_call creates the ops struct so the callers
don't have to change (and casts the functions to reduce warnings).
The modern version which takes an ops struct is called module_param_cb.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Takashi Iwai <tiwai@suse.de>
Tested-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ville Syrjala <syrjala@sci.fi>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Alessandro Rubini <rubini@ipvvis.unipv.it>
Cc: Michal Januszewski <spock@gentoo.org>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: linux-kernel@vger.kernel.org
Cc: linux-input@vger.kernel.org
Cc: linux-fbdev-devel@lists.sourceforge.net
Cc: linux-nfs@vger.kernel.org
Cc: netdev@vger.kernel.org
2010-08-11 23:04:13 +09:30
Patrick J. LoPresti 9b00c64318 nfs: Add "lookupcache" to displayed mount options
Running "cat /proc/mounts" fails to display the "lookupcache" option.
This oversight cost me a bunch of wasted time recently.

The following simple patch fixes it.

CC: stable <stable@kernel.org>
Signed-off-by: Patrick LoPresti <lopresti@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-08-10 17:28:01 -04:00
Linus Torvalds 5f248c9c25 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (96 commits)
  no need for list_for_each_entry_safe()/resetting with superblock list
  Fix sget() race with failing mount
  vfs: don't hold s_umount over close_bdev_exclusive() call
  sysv: do not mark superblock dirty on remount
  sysv: do not mark superblock dirty on mount
  btrfs: remove junk sb_dirt change
  BFS: clean up the superblock usage
  AFFS: wait for sb synchronization when needed
  AFFS: clean up dirty flag usage
  cifs: truncate fallout
  mbcache: fix shrinker function return value
  mbcache: Remove unused features
  add f_flags to struct statfs(64)
  pass a struct path to vfs_statfs
  update VFS documentation for method changes.
  All filesystems that need invalidate_inode_buffers() are doing that explicitly
  convert remaining ->clear_inode() to ->evict_inode()
  Make ->drop_inode() just return whether inode needs to be dropped
  fs/inode.c:clear_inode() is gone
  fs/inode.c:evict() doesn't care about delete vs. non-delete paths now
  ...

Fix up trivial conflicts in fs/nilfs2/super.c
2010-08-10 11:26:52 -07:00
Neil Brown f5a73672d1 NFS: allow close-to-open cache semantics to apply to root of NFS filesystem
To obey NFS cache semantics, the client must verify the cached
attributes when a file is opened.  In most cases this is done by a call to
d_validate as one of the last steps in path_walk.

However for the root of a filesystem, d_validate is only ever called
on the mounted-on filesystem (except when the path ends '.' or '..').
So NFS has no chance to validate the attributes.

So, in nfs_opendir, we revalidate the attributes if the opened
directory is the mountpoint.  This may cause double-validation for "."
and ".." lookups, but that is better than missing regular /path/name
lookups completely.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-08-10 10:20:05 -04:00
Al Viro b57922d97f convert remaining ->clear_inode() to ->evict_inode()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:48:37 -04:00
Linus Torvalds 5df6b8e65a Merge branch 'nfs-for-2.6.36' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'nfs-for-2.6.36' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (42 commits)
  NFS: NFSv4.1 is no longer a "developer only" feature
  NFS: NFS_V4 is no longer an EXPERIMENTAL feature
  NFS: Fix /proc/mount for legacy binary interface
  NFS: Fix the locking in nfs4_callback_getattr
  SUNRPC: Defer deleting the security context until gss_do_free_ctx()
  SUNRPC: prevent task_cleanup running on freed xprt
  SUNRPC: Reduce asynchronous RPC task stack usage
  SUNRPC: Move the bound cred to struct rpc_rqst
  SUNRPC: Clean up of rpc_bindcred()
  SUNRPC: Move remaining RPC client related task initialisation into clnt.c
  SUNRPC: Ensure that rpc_exit() always wakes up a sleeping task
  SUNRPC: Make the credential cache hashtable size configurable
  SUNRPC: Store the hashtable size in struct rpc_cred_cache
  NFS: Ensure the AUTH_UNIX credcache is allocated dynamically
  NFS: Fix the NFS users of rpc_restart_call()
  SUNRPC: The function rpc_restart_call() should return success/failure
  NFSv4: Get rid of the bogus RPC_ASSASSINATED(task) checks
  NFSv4: Clean up the process of renewing the NFSv4 lease
  NFSv4.1: Handle NFS4ERR_DELAY on SEQUENCE correctly
  NFS: nfs_rename() should not have to flush out writebacks
  ...
2010-08-07 13:19:36 -07:00
Trond Myklebust 3dce9a5c3a NFS: NFSv4.1 is no longer a "developer only" feature
Mark it as 'experimental' instead, since in practice, NFSv4.1 should now be
relatively stable.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-08-06 13:41:41 -04:00
Trond Myklebust b3edc2bc19 NFS: NFS_V4 is no longer an EXPERIMENTAL feature
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-08-06 13:41:40 -04:00
Bryan Schumaker d5eff1a341 NFS: Fix /proc/mount for legacy binary interface
Add a flag so we know if we mounted the NFS server using the legacy
binary interface.  If we used the legacy interface, then we should not
show the mountd options.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-08-06 13:41:39 -04:00
Trond Myklebust 761fe93cdf NFS: Fix the locking in nfs4_callback_getattr
The delegation is protected by RCU now, so we need to replace the
nfsi->rwsem protection with an rcu protected section.

Reported-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-08-06 13:41:39 -04:00
Trond Myklebust a17c2153d2 SUNRPC: Move the bound cred to struct rpc_rqst
This will allow us to save the original generic cred in rpc_message, so
that if we migrate from one server to another, we can generate a new bound
cred without having to punt back to the NFS layer.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-08-04 08:54:09 -04:00
Trond Myklebust d05dd4e98f NFS: Fix the NFS users of rpc_restart_call()
Fix up those functions that depend on knowing whether or not
rpc_restart_call is successful or not.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-08-03 22:06:44 -04:00
Trond Myklebust a6f03393ec NFSv4: Get rid of the bogus RPC_ASSASSINATED(task) checks
There is no real reason to have RPC_ASSASSINATED() checks in the NFS code.
As far as it is concerned, this is just an RPC error...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-08-03 22:06:43 -04:00
Trond Myklebust 452e93523d NFSv4: Clean up the process of renewing the NFSv4 lease
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-08-03 22:06:42 -04:00
Trond Myklebust 14516c3a30 NFSv4.1: Handle NFS4ERR_DELAY on SEQUENCE correctly
In RFC5661, an NFS4ERR_DELAY error on a SEQUENCE operation has the special
meaning that the server is not finished processing the request. In this
case we want to just retry the request without touching the slot.

Also fix a bug whereby we would fail to update the sequence id if the
server returned any error other than NFS_OK/NFS4ERR_DELAY.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-08-03 22:06:42 -04:00
Trond Myklebust 0a8ebba943 NFS: nfs_rename() should not have to flush out writebacks
We don't really support nfs servers that invalidate the file handle after a
rename, so precautions such as flushing out dirty data before renaming the
file are superfluous.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-08-03 22:06:41 -04:00
Trond Myklebust 1b924e5f87 NFS: Clean up the callers of nfs_wb_all()
There is no need to flush out writes before calling nfs_wb_all().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-08-03 22:06:40 -04:00
Trond Myklebust af7fa16506 NFS: Fix up the fsync code
Christoph points out that the VFS will always flush out data before calling
nfs_fsync(), so we can dispense with a full call to nfs_wb_all(), and
replace that with a simpler call to nfs_commit_inode().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-08-03 22:06:07 -04:00
Eric Paris 9cfcac810e vfs: re-introduce MAY_CHDIR
Currently MAY_ACCESS means that filesystems must check the permissions
right then and not rely on cached results or the results of future
operations on the object.  This can be because of a call to sys_access() or
because of a call to chdir() which needs to check search without relying on
any future operations inside that dir.  I plan to use MAY_ACCESS for other
purposes in the security system, so I split the MAY_ACCESS and the
MAY_CHDIR cases.

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by:  Stephen D. Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
2010-08-02 15:35:06 +10:00
Trond Myklebust 77a63f3d1e NFS: Fix a typo in include/linux/nfs_fs.h
nfs_commit_inode() needs to be defined irrespectively of whether or not
we are supporting NFSv3 and NFSv4.

Allow the compiler to optimise away code in the NFSv2-only case by
converting it into an inlined stub function.

Reported-and-tested-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-01 15:10:01 -07:00
Trond Myklebust cfb506e1d3 NFS: Ensure that writepage respects the nonblock flag
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-07-30 15:38:56 -04:00
Trond Myklebust b608b283a9 NFS: kswapd must not block in nfs_release_page
See https://bugzilla.kernel.org/show_bug.cgi?id=16056

If other processes are blocked waiting for kswapd to free up some memory so
that they can make progress, then we cannot allow kswapd to block on those
processes.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
2010-07-30 15:38:42 -04:00
Dan Carpenter 674b222292 nfs: include space for the NUL in root path
In root_nfs_name() it does the following:

        if (strlen(buf) + strlen(cp) > NFS_MAXPATHLEN) {
                printk(KERN_ERR "Root-NFS: Pathname for remote directory too long.\n");
                return -1;
        }
        sprintf(nfs_export_path, buf, cp);

In the original code if (strlen(buf) + strlen(cp) == NFS_MAXPATHLEN)
then the sprintf() would lead to an overflow.  Generally the rest of the
code assumes that the path can have NFS_MAXPATHLEN (1024) characters and
a NUL terminator so the fix is to add space to the nfs_export_path[]
buffer.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-07-30 15:33:39 -04:00
Trond Myklebust 77041ed9b4 NFSv4: Ensure the lockowners are labelled using the fl_owner and/or fl_pid
flock locks want to be labelled using the process pid, while posix locks
want to be labelled using the fl_owner.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-07-30 14:46:10 -04:00
Trond Myklebust d3c7b7ccc1 NFSv4: Add support for the RELEASE_LOCKOWNER operation
This is needed by NFSv4.0 servers in order to keep the number of locking
stateids at a manageable level.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-07-30 14:46:10 -04:00
Trond Myklebust daccbded7f NFSv4: Clean up for lockowner XDR encoding
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-07-30 14:46:09 -04:00
Trond Myklebust f11ac8db5d NFSv4: Ensure that we track the NFSv4 lock state in read/write requests.
This patch fixes bugzilla entry 14501:
  https://bugzilla.kernel.org/show_bug.cgi?id=14501

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-07-30 14:41:56 -04:00
Dave Chinner 7f8275d0d6 mm: add context argument to shrinker callback
The current shrinker implementation requires the registered callback
to have global state to work from. This makes it difficult to shrink
caches that are not global (e.g. per-filesystem caches). Pass the shrinker
structure to the callback so that users can embed the shrinker structure
in the context the shrinker needs to operate on and get back to it in the
callback via container_of().

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2010-07-19 14:56:17 +10:00
Trond Myklebust 1f0e890dba NFSv4: Clean up struct nfs4_state_owner
The 'so_delegations' list appears to be unused.

Also eliminate so_client. If we already have so_server, we can get to the
nfs_client structure.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-06-24 15:11:43 -04:00
Trond Myklebust 1055d76d91 NFSv4.1: There is no need to init the session more than once...
Set up a flag to ensure that is indeed the case.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-06-22 13:24:03 -04:00
Trond Myklebust fe74ba3a8d NFSv41: Cleanup for nfs4_alloc_session.
There is no reason to change the nfs_client state every time we allocate a
new session. Move that line into nfs4_init_client_minor_version.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-06-22 13:24:03 -04:00
Trond Myklebust d77d76ffb6 NFSv41: Clean up exclusive create
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-06-22 13:24:03 -04:00
Trond Myklebust a443234535 NFSv41: Deprecate nfs_client->cl_minorversion
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-06-22 13:24:02 -04:00
Trond Myklebust e047a10c12 NFSv41: Fix nfs_async_inode_return_delegation() ugliness
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-06-22 13:24:02 -04:00
Trond Myklebust c48f4f3541 NFSv41: Convert the various reboot recovery ops etc to minor version ops
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-06-22 13:24:02 -04:00
Trond Myklebust 97dc135947 NFSv41: Clean up the NFSv4.1 minor version specific operations
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-06-22 13:24:02 -04:00
Trond Myklebust a2118c33aa NFSv41: Don't store session state in the nfs_client->cl_state
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-06-22 13:24:02 -04:00
Trond Myklebust df8964554a NFSv41: Further cleanup for nfs4_sequence_done
Instead of testing if the nfs_client has a session, we should be testing if
the struct nfs4_sequence_res was set up with one.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-06-22 13:24:02 -04:00
Trond Myklebust 035168ab39 NFSv4.1: Make nfs4_setup_sequence take a nfs_server argument
In anticipation of the day when we have per-filesystem sessions, and also
in order to allow the session to change in the event of a filesystem
migration event.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-06-22 13:24:02 -04:00
Trond Myklebust 71ac6da994 NFSv4.1: Merge the nfs41_proc_async_sequence() and nfs4_proc_sequence()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-06-22 13:24:01 -04:00
Trond Myklebust aa5190d0ed NFSv4: Kill nfs4_async_handle_error() abuses by NFSv4.1
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-06-22 13:24:01 -04:00
Trond Myklebust d185a334c7 NFSv4.1: Simplify nfs41_sequence_done()
Nobody uses the rpc_status parameter.

It is not obvious why we need the struct nfs_client argument either, when
we already have that information in the session.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-06-22 13:24:01 -04:00
Trond Myklebust 2a6e26cdb8 NFSv4.1: Clean up nfs4_setup_sequence
Firstly, there is little point in first zeroing out the entire struct
nfs4_sequence_res, and then initialising all fields save one. Just
initialise the last field to zero...

Secondly, nfs41_setup_sequence() has only 2 possible return values: 0, or
-EAGAIN, so there is no 'terminate rpc task' case.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-06-22 13:24:01 -04:00
Trond Myklebust d5f8d3fe72 NFSv41: Fix a memory leak in nfs41_proc_async_sequence()
If the call to rpc_call_async() fails, then the arguments will not be
freed, since there will be no call to nfs41_sequence_call_done

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-06-22 13:24:01 -04:00
Trond Myklebust d3f6baaa34 NFSv4: Fix an embarassing typo in encode_attrs()
Apparently, we have never been able to set the atime correctly from the
NFSv4 client.

Reported-by: 小倉一夫 <ka-ogura@bd6.so-net.ne.jp>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
2010-06-22 13:22:54 -04:00
Trond Myklebust 0be8189f2c NFSv4: Ensure that /proc/self/mountinfo displays the minor version number
Currently, we do not display the minor version mount parameter in the
/proc mount info.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
2010-06-22 13:22:53 -04:00
Trond Myklebust 44950b67a6 NFSv4.1: Ensure that we initialise the session when following a referral
Put the code that is common to both the referral and ordinary mount cases
into a common helper routine.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-06-22 13:22:53 -04:00
Andy Adamson f799bdb355 nfs4 use mandatory attribute file type in nfs4_get_root
S_ISDIR(fsinfo.fattr->mode) checks the file type rather than the mode bits,
so we should be checking for the NFS_ATTR_FATTR_TYPE fattr property.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
2010-06-22 13:17:43 -04:00
Christoph Hellwig 7ea8085910 drop unused dentry argument to ->fsync
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-27 22:05:02 -04:00
Trond Myklebust 0522f6aded NFS: Fix another nfs_wb_page() deadlock
J.R. Okajima reports that the call to sync_inode() in nfs_wb_page() can
deadlock with other writeback flush calls. It boils down to the fact
that we cannot ever call writeback_single_inode() while holding a page
lock (even if we do set nr_to_write to zero) since another process may
already be waiting in the call to do_writepages(), and so will deny us
the I_SYNC lock.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-26 08:43:53 -04:00
Trond Myklebust c5efa5fc91 NFS: Ensure that we mark the inode as dirty if we exit early from commit
If we exit from nfs_commit_inode() without ensuring that the COMMIT rpc
call has been completed, we must re-mark the inode as dirty. Otherwise,
future calls to sync_inode() with the WB_SYNC_ALL flag set will fail to
ensure that the data is on the disk.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-26 08:43:52 -04:00
Trond Myklebust 59844a9bd7 NFS: Fix a lock imbalance typo in nfs_access_cache_shrinker
Commit 9c7e7e2337 (NFS: Don't call iput() in
nfs_access_cache_shrinker) unintentionally removed the spin unlock for the
inode->i_lock.

Reported-by: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-26 08:43:51 -04:00
Alexey Dobriyan 4be929be34 kernel-wide: replace USHORT_MAX, SHORT_MAX and SHORT_MIN with USHRT_MAX, SHRT_MAX and SHRT_MIN
- C99 knows about USHRT_MAX/SHRT_MAX/SHRT_MIN, not
  USHORT_MAX/SHORT_MAX/SHORT_MIN.

- Make SHRT_MIN of type s16, not int, for consistency.

[akpm@linux-foundation.org: fix drivers/dma/timb_dma.c]
[akpm@linux-foundation.org: fix security/keys/keyring.c]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25 08:07:02 -07:00
Trond Myklebust 9c7e7e2337 NFS: Don't call iput() in nfs_access_cache_shrinker
iput() can potentially attempt to allocate memory, so we should avoid
calling it in a memory shrinker. Instead, rely on the fact that iput() will
call nfs_access_zap_cache().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:36 -04:00
Trond Myklebust 1a81bb8a1f NFS: Clean up nfs_access_zap_cache()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:35 -04:00
Trond Myklebust 61d5eb2985 NFS: Don't run nfs_access_cache_shrinker() when the mask is GFP_NOFS
Both iput() and put_rpccred() might allocate memory under certain
circumstances, so make sure that we don't recurse and deadlock...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:35 -04:00
Trond Myklebust 93870d76fe NFS: Read requests can use GFP_KERNEL.
There is no danger of deadlock should the allocation trigger page
writeback.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:34 -04:00
Trond Myklebust 18eb884282 NFS: Clean up nfs_create_request()
There is no point in looping if we're out of memory.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:34 -04:00
Trond Myklebust 8535b2be51 NFSv4: Don't use GFP_KERNEL allocations in state recovery
We do not want to have the state recovery thread kick off and wait for a
memory reclaim, since that may deadlock when the writebacks end up
waiting for the state recovery thread to complete.

The safe thing is therefore to use GFP_NOFS in all open, close,
delegation return, lock, etc. operations that may be called by the
state recovery thread.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:33 -04:00
Chuck Lever 9bc4e3ca46 NFS: Calldata for nfs4_renew_done()
I'm about to change task->tk_start from a jiffies value to a ktime_t
value in order to make RPC RTT reporting more precise.

Recently (commit dc96aef9) nfs4_renew_done() started to reference
task->tk_start so that a jiffies value no longer had to be passed
from nfs4_proc_async_renew().  This allowed the calldata to point to
an nfs_client instead.

Changing task->tk_start to a ktime_t value makes it effectively
useless for renew timestamps, so we need to restore the pre-dc96aef9
logic that provided a jiffies "start" timestamp to nfs4_renew_done().

Both an nfs_client pointer and a timestamp need to be passed to
nfs4_renew_done(), so create a new nfs_renewdata structure that
contains both, resembling what is already done for delegreturn,
lock, and unlock.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:32 -04:00
Chuck Lever dfe52c0419 NFS: Squelch compiler warning in nfs_add_server_stats()
Clean up:

fs/nfs/iostat.h: In function ‘nfs_add_server_stats’:
fs/nfs/iostat.h:41: warning: comparison between signed and unsigned integer expressions
fs/nfs/iostat.h:41: warning: comparison between signed and unsigned integer expressions
fs/nfs/iostat.h:41: warning: comparison between signed and unsigned integer expressions
fs/nfs/iostat.h:41: warning: comparison between signed and unsigned integer expressions

Commit fce22848 replaced the open-coded per-cpu logic in several
functions in fs/nfs/iostat.h with a single invocation of
this_cpu_ptr().  This macro assumes its second argument is signed,
not unsigned.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:31 -04:00
Chuck Lever a6d5ff64ba NFS: Clean up fscache_uniq mount option
Clean up: fscache_uniq takes a string, so it should be included
with the other string mount option definitions, by convention.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:31 -04:00
Chuck Lever 0f15c53d5b NFS: Squelch compiler warning
Seen with -Wextra:

/home/cel/linux/fs/nfs/fscache.c: In function ‘__nfs_readpages_from_fscache’:
/home/cel/linux/fs/nfs/fscache.c:479: warning: comparison between signed and unsigned integer expressions

The comparison implicitly converts "int" to "unsigned", making it
safe.  But there's no need for the implicit type conversions here, and
the dfprintk() already uses a "%u" formatter for "npages."  Better to
reduce confusion.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:31 -04:00
Trond Myklebust bb8b27e504 NFSv4: Clean up the NFSv4 setclientid operation
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:30 -04:00
Trond Myklebust d7cf8dd012 NFSv4: Allow attribute caching with 'noac' mounts if client holds a delegation
If the server has given us a delegation on a file, we _know_ that we can
cache the attribute information even when the user has specified 'noac'.

Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:30 -04:00
Trond Myklebust fd86dfd263 NFSv4: Fix up the documentation for nfs_do_refmount
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:29 -04:00
Trond Myklebust 1b4c6065b9 NFS: Replace nfsroot on-stack filehandle
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:28 -04:00
Trond Myklebust b157b06ca2 NFS: Cleanup file handle allocations in fs/nfs/super.c
Use the new helper functions instead of open coding.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:28 -04:00
Trond Myklebust ce587e07ba NFS: Prevent the mount code from looping forever on broken exports
Keep a global count of how many referrals that the current task has
traversed on a path lookup. Return ELOOP if the count exceeds
MAX_NESTED_LINKS.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:28 -04:00
Trond Myklebust 6e94d62993 NFS: Reduce stack footprint of nfs3_proc_getacl() and nfs3_proc_setacl()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:28 -04:00
Trond Myklebust ca7e9a0df2 NFS: Reduce stack footprint of nfs_statfs()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:27 -04:00
Trond Myklebust 987f8dfc98 NFS: Reduce stack footprint of nfs_setattr()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:27 -04:00
Trond Myklebust 0ab64e0e14 NFS: Reduce stack footprint of nfs4_proc_create()
Move the O_EXCL open handling into _nfs4_do_open() where it belongs. Doing
so also allows us to reuse the struct fattr from the opendata.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:27 -04:00
Trond Myklebust 23a306120f NFS: Reduce the stack footprint of nfs_proc_symlink()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:27 -04:00
Trond Myklebust eb872f0c8e NFS: Reduce the stack footprint of nfs_proc_create
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:26 -04:00
Trond Myklebust 39967ddf19 NFS: Reduce the stack footprint of nfs_rmdir
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:26 -04:00
Trond Myklebust d346890bea NFS: Reduce stack footprint of nfs_proc_remove()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:26 -04:00
Trond Myklebust 3b14d6542d NFS: Reduce stack footprint of nfs3_proc_readlink()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:25 -04:00
Trond Myklebust 136f2627c9 NFS: Reduce the stack footprint of nfs_link()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:25 -04:00
Trond Myklebust aa49b4cf7d NFS: Reduce stack footprint of nfs_readdir()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:25 -04:00
Trond Myklebust 011fff7239 NFS: Reduce stack footprint of nfs3_proc_rename() and nfs4_proc_rename()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:25 -04:00
Trond Myklebust a3cba2aad9 NFS: Reduce stack footprint of nfs_revalidate_inode()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:24 -04:00
Trond Myklebust c407d41a16 NFSv4: Reduce stack footprint of nfs4_proc_access() and nfs3_proc_access()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:24 -04:00
Trond Myklebust 4f727296d2 NFSv4: Reduce the stack footprint of nfs4_remote_referral_get_sb
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:23 -04:00
Trond Myklebust 8bac9db9cf NFSv4: Reduce stack footprint of nfs4_get_root()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:23 -04:00
Trond Myklebust 04ffdbe2e6 NFS: Reduce the stack footprint of nfs_follow_remote_path()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:23 -04:00
Trond Myklebust e1fb4d05d5 NFS: Reduce the stack footprint of nfs_lookup
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:23 -04:00
Trond Myklebust 364d015e52 NFSv4: Reduce the stack footprint of try_location()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:22 -04:00
Trond Myklebust fbca779a8d NFS: Reduce the stack footprint of nfs_create_server
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:22 -04:00
Trond Myklebust a4d7f16806 NFS: Reduce the stack footprint of nfs_follow_mountpoint()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:22 -04:00
Trond Myklebust 815409d22d NFSv4: Eliminate nfs4_path_walk()
All we really want is the ability to retrieve the root file handle. We no
longer need the ability to walk down the path, since that is now done in
nfs_follow_remote_path().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:21 -04:00
Trond Myklebust 2d36bfde85 NFS: Add helper functions for allocating filehandles and fattr structs
NFS Filehandles and struct fattr are really too large to be allocated on
the stack. This patch adds in a couple of helper functions to allocate them
dynamically instead.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:21 -04:00
David Howells 17d2c0a0c4 NFS: Fix RCU issues in the NFSv4 delegation code
Fix a number of RCU issues in the NFSv4 delegation code.

 (1) delegation->cred doesn't need to be RCU protected as it's essentially an
     invariant refcounted structure.

     By the time we get to nfs_free_delegation(), the delegation is being
     released, so no one else should be attempting to use the saved
     credentials, and they can be cleared.

     However, since the list of delegations could still be under traversal at
     this point by such as nfs_client_return_marked_delegations(), the cred
     should be released in nfs_do_free_delegation() rather than in
     nfs_free_delegation().  Simply using rcu_assign_pointer() to clear it is
     insufficient as that doesn't stop the cred from being destroyed, and nor
     does calling put_rpccred() after call_rcu(), given that the latter is
     asynchronous.

 (2) nfs_detach_delegation_locked() and nfs_inode_set_delegation() should use
     rcu_derefence_protected() because they can only be called if
     nfs_client::cl_lock is held, and that guards against anyone changing
     nfsi->delegation under it.  Furthermore, the barrier imposed by
     rcu_dereference() is superfluous, given that the spin_lock() is also a
     barrier.

 (3) nfs_detach_delegation_locked() is now passed a pointer to the nfs_client
     struct so that it can issue lockdep advice based on clp->cl_lock for (2).

 (4) nfs_inode_return_delegation_noreclaim() and nfs_inode_return_delegation()
     should use rcu_access_pointer() outside the spinlocked region as they
     merely examine the pointer and don't follow it, thus rendering unnecessary
     the need to impose a partial ordering over the one item of interest.

     These result in an RCU warning like the following:

[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
fs/nfs/delegation.c:332 invoked rcu_dereference_check() without protection!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 0
2 locks held by mount.nfs4/2281:
 #0:  (&type->s_umount_key#34){+.+...}, at: [<ffffffff810b25b4>] deactivate_super+0x60/0x80
 #1:  (iprune_sem){+.+...}, at: [<ffffffff810c332a>] invalidate_inodes+0x39/0x13a

stack backtrace:
Pid: 2281, comm: mount.nfs4 Not tainted 2.6.34-rc1-cachefs #110
Call Trace:
 [<ffffffff8105149f>] lockdep_rcu_dereference+0xaa/0xb2
 [<ffffffffa00b4591>] nfs_inode_return_delegation_noreclaim+0x5b/0xa0 [nfs]
 [<ffffffffa0095d63>] nfs4_clear_inode+0x11/0x1e [nfs]
 [<ffffffff810c2d92>] clear_inode+0x9e/0xf8
 [<ffffffff810c3028>] dispose_list+0x67/0x10e
 [<ffffffff810c340d>] invalidate_inodes+0x11c/0x13a
 [<ffffffff810b1dc1>] generic_shutdown_super+0x42/0xf4
 [<ffffffff810b1ebe>] kill_anon_super+0x11/0x4f
 [<ffffffffa009893c>] nfs4_kill_super+0x3f/0x72 [nfs]
 [<ffffffff810b25bc>] deactivate_super+0x68/0x80
 [<ffffffff810c6744>] mntput_no_expire+0xbb/0xf8
 [<ffffffff810c681b>] release_mounts+0x9a/0xb0
 [<ffffffff810c689b>] put_mnt_ns+0x6a/0x79
 [<ffffffffa00983a1>] nfs_follow_remote_path+0x5a/0x146 [nfs]
 [<ffffffffa0098334>] ? nfs_do_root_mount+0x82/0x95 [nfs]
 [<ffffffffa00985a9>] nfs4_try_mount+0x75/0xaf [nfs]
 [<ffffffffa0098874>] nfs4_get_sb+0x291/0x31a [nfs]
 [<ffffffff810b2059>] vfs_kern_mount+0xb8/0x177
 [<ffffffff810b2176>] do_kern_mount+0x48/0xe8
 [<ffffffff810c810b>] do_mount+0x782/0x7f9
 [<ffffffff810c8205>] sys_mount+0x83/0xbe
 [<ffffffff81001eeb>] system_call_fastpath+0x16/0x1b

Also on:

fs/nfs/delegation.c:215 invoked rcu_dereference_check() without protection!
 [<ffffffff8105149f>] lockdep_rcu_dereference+0xaa/0xb2
 [<ffffffffa00b4223>] nfs_inode_set_delegation+0xfe/0x219 [nfs]
 [<ffffffffa00a9c6f>] nfs4_opendata_to_nfs4_state+0x2c2/0x30d [nfs]
 [<ffffffffa00aa15d>] nfs4_do_open+0x2a6/0x3a6 [nfs]
 ...

And:

fs/nfs/delegation.c:40 invoked rcu_dereference_check() without protection!
 [<ffffffff8105149f>] lockdep_rcu_dereference+0xaa/0xb2
 [<ffffffffa00b3bef>] nfs_free_delegation+0x3d/0x6e [nfs]
 [<ffffffffa00b3e71>] nfs_do_return_delegation+0x26/0x30 [nfs]
 [<ffffffffa00b406a>] __nfs_inode_return_delegation+0x1ef/0x1fe [nfs]
 [<ffffffffa00b448a>] nfs_client_return_marked_delegations+0xc9/0x124 [nfs]
 ...

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-01 12:37:18 -04:00
Trond Myklebust 8f649c3762 NFSv4: Fix the locking in nfs_inode_reclaim_delegation()
Ensure that we correctly rcu-dereference the delegation itself, and that we
protect against removal while we're changing the contents.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2010-05-01 12:36:18 -04:00
Linus Torvalds 27fb8d7b1f Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  nfs: fix memory leak in nfs_get_sb with CONFIG_NFS_V4
  nfs: fix some issues in nfs41_proc_reclaim_complete()
  NFS: Ensure that nfs_wb_page() waits for Pg_writeback to clear
  NFS: Fix an unstable write data integrity race
  nfs: testing for null instead of ERR_PTR()
  NFS: rsize and wsize settings ignored on v4 mounts
  NFSv4: Don't attempt an atomic open if the file is a mountpoint
  SUNRPC: Fix a bug in rpcauth_prune_expired
2010-04-29 10:23:44 -07:00
Al Viro d9e80b7de9 nfs d_revalidate() is too trigger-happy with d_drop()
If dentry found stale happens to be a root of disconnected tree, we
can't d_drop() it; its d_hash is actually part of s_anon and d_drop()
would simply hide it from shrink_dcache_for_umount(), leading to
all sorts of fun, including busy inodes on umount and oopsen after
that.

Bug had been there since at least 2006 (commit c636eb already has it),
so it's definitely -stable fodder.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-04-28 20:40:03 -07:00
Xiaotian Feng 9699eda6bc nfs: fix memory leak in nfs_get_sb with CONFIG_NFS_V4
With CONFIG_NFS_V4 and data version 4, nfs_get_sb will allocate memory for
export_path in nfs4_validate_text_mount_data, so we need to free it then.
This is addressed in following kmemleak report:

unreferenced object 0xffff88016bf48a50 (size 16):
  comm "mount.nfs", pid 22567, jiffies 4651574704 (age 175471.200s)
  hex dump (first 16 bytes):
    2f 6f 70 74 2f 77 6f 72 6b 00 6b 6b 6b 6b 6b a5  /opt/work.kkkkk.
  backtrace:
    [<ffffffff814b34f9>] kmemleak_alloc+0x60/0xa7
    [<ffffffff81102c76>] kmemleak_alloc_recursive.clone.5+0x1b/0x1d
    [<ffffffff811046b3>] __kmalloc_track_caller+0x18f/0x1b7
    [<ffffffff810e1b08>] kstrndup+0x37/0x54
    [<ffffffffa0336971>] nfs_parse_devname+0x152/0x204 [nfs]
    [<ffffffffa0336af3>] nfs4_validate_text_mount_data+0xd0/0xdc [nfs]
    [<ffffffffa0338deb>] nfs_get_sb+0x325/0x736 [nfs]
    [<ffffffff81113671>] vfs_kern_mount+0xbd/0x17c
    [<ffffffff81113798>] do_kern_mount+0x4d/0xed
    [<ffffffff81129a87>] do_mount+0x787/0x7fe
    [<ffffffff81129b86>] sys_mount+0x88/0xc2
    [<ffffffff81009b42>] system_call_fastpath+0x16/0x1b

Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Benny Halevy <bhalevy@panasas.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-04-28 13:46:28 -04:00
Dan Carpenter acf82b85a7 nfs: fix some issues in nfs41_proc_reclaim_complete()
The original code passed an ERR_PTR() to rpc_put_task() and instead of
returning zero on success it returned -ENOMEM.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-04-28 13:45:12 -04:00
Trond Myklebust ba8b06e67e NFS: Ensure that nfs_wb_page() waits for Pg_writeback to clear
Neil Brown reports that he is seeing the BUG_ON(ret == 0) trigger in
nfs_page_async_flush. According to the trace in
     https://bugzilla.novell.com/show_bug.cgi?id=599628
the problem appears to be due to nfs_wb_page() not waiting for the
PG_writeback flag to clear.

There is a ditto problem in nfs_wb_page_cancel()

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-04-27 18:33:54 -04:00
Trond Myklebust 71d0a6112a NFS: Fix an unstable write data integrity race
Commit 2c61be0a94 (NFS: Ensure that the WRITE
and COMMIT RPC calls are always uninterruptible) exposed a race on file
close. In order to ensure correct close-to-open behaviour, we want to wait
for all outstanding background commit operations to complete.

This patch adds an inode flag that indicates if a commit operation is under
way, and provides a mechanism to allow ->write_inode() to wait for its
completion if this is a data integrity flush.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-04-22 15:35:57 -04:00
Dan Carpenter cdd29ecfcb nfs: testing for null instead of ERR_PTR()
nfs_path() returns an ERR_PTR(), it doesn't return null.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-04-22 15:35:56 -04:00
Chuck Lever 356e76b855 NFS: rsize and wsize settings ignored on v4 mounts
NFSv4 mounts ignore the rsize and wsize mount options, and always use
the default transfer size for both.  This seems to be because all
NFSv4 mounts are now cloned, and the cloning logic doesn't copy the
rsize and wsize settings from the parent nfs_server.

I tested Fedora's 2.6.32.11-99 and it seems to have this problem as
well, so I'm guessing that .33, .32, and perhaps older kernels have
this issue as well.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Stable <stable@kernel.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-04-22 15:35:56 -04:00
Trond Myklebust 1f063d2cdf NFSv4: Don't attempt an atomic open if the file is a mountpoint
Fix https://bugzilla.kernel.org/show_bug.cgi?id=15789

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-04-22 15:35:55 -04:00
Trond Myklebust 0df5dd4aae NFSv4: fix delegated locking
Arnaud Giersch reports that NFSv4 locking is broken when we hold a
delegation since commit 8e469ebd6d (NFSv4:
Don't allow posix locking against servers that don't support it).

According to Arnaud, the lock succeeds the first time he opens the file
(since we cannot do a delegated open) but then fails after we start using
delegated opens.

The following patch fixes it by ensuring that locking behaviour is
governed by a per-filesystem capability flag that is initially set, but
gets cleared if the server ever returns an OPEN without the
NFS4_OPEN_RESULT_LOCKTYPE_POSIX flag being set.

Reported-by: Arnaud Giersch <arnaud.giersch@iut-bm.univ-fcomte.fr>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
2010-04-12 07:55:15 -04:00
Trond Myklebust 2c61be0a94 NFS: Ensure that the WRITE and COMMIT RPC calls are always uninterruptible
We always want to ensure that WRITE and COMMIT completes, whether or not
the user presses ^C. Do this by making the call asynchronous, and allowing
the user to do an interruptible wait for rpc_task completion.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-04-09 19:54:50 -04:00
Trond Myklebust a6305ddb08 NFS: Fix a race with the new commit code
This patch fixes a race which occurs due to the fact that we release the
PG_writeback flag while still holding the nfs_page locked.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-04-09 19:08:17 -04:00
Trond Myklebust b80c3cb628 NFS: Ensure that writeback_single_inode() calls write_inode() when syncing
Since writeback_single_inode() checks the inode->i_state flags _before_ it
flushes out the data, we need to ensure that the I_DIRTY_DATASYNC flag is
already set. Otherwise we risk not seeing a call to write_inode(), which
again means that we break fsync() et al...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-04-09 19:08:17 -04:00
Trond Myklebust 1544fa0f7a NFS: Fix the mode calculation in nfs_find_open_context
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-04-09 19:08:16 -04:00
Trond Myklebust 80e60639f1 NFSv4: Fall back to ordinary lookup if nfs4_atomic_open() returns EISDIR
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
2010-04-09 19:08:16 -04:00
Al Viro 04287f975e Have nfs ->d_revalidate() report errors properly
If nfs atomic open implementation ends up doing open request from
->d_revalidate() codepath and gets an error from server, return that error
to caller explicitly and don't bother with lookup_instantiate_filp() at all.
->d_revalidate() can return an error itself just fine...

See
	http://bugzilla.kernel.org/show_bug.cgi?id=15674
	http://marc.info/?l=linux-kernel&m=126988782722711&w=2

for original report.

Reported-by: Daniel J Blueman <daniel.blueman@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-04-07 16:10:16 -07:00
Tejun Heo 5a0e3ad6af include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability.  As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and try to put the new include such that its order conforms
  to its surrounding.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others.  This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   wildly available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-30 22:02:32 +09:00
Jeff Layton 556ae3bb32 NFS: don't try to decode GETATTR if DELEGRETURN returned error
The reply parsing code attempts to decode the GETATTR response even if
the DELEGRETURN portion of the compound returned an error. The GETATTR
response won't actually exist if that's the case and we're asking the
parser to read past the end of the response.

This bug is fairly benign. The parser catches this without reading past
the end of the response and decode_getfattr returns -EIO. Earlier
kernels however had decode_op_hdr using the READ_BUF macro, and this
bug would make this printk pop any time the client got an error from
a delegreturn:

kernel: decode_op_hdr: reply buffer overflowed in line XXXX

More recent kernels seem to have replaced this printk with a dprintk.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-22 05:34:13 -04:00
Trond Myklebust d812e57582 NFS: Prevent another deadlock in nfs_release_page()
We should not attempt to free the page if __GFP_FS is not set. Otherwise we
can deadlock as per

  http://bugzilla.kernel.org/show_bug.cgi?id=15578

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
2010-03-19 13:55:17 -04:00
NeilBrown cfbc0683af NFS: ensure bdi_unregister is called on mount failure.
bdi_unregister is called by nfs_put_super which is only called by
generic_shutdown_super if ->s_root is not NULL.  So if we error out
in a circumstance where we called nfs_bdi_register (i.e. server !=
NULL) but have not set s_root, then we need to call bdi_unregister
explicitly in nfs_get_sb and various other *_get_sb() functions.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-15 15:37:45 -04:00
Trond Myklebust bb6fbc4548 NFS: Avoid a deadlock in nfs_release_page
J.R. Okajima reports the following deadlock:

INFO: task kswapd0:305 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kswapd0       D 0000000000000001     0   305      2 0x00000000
 ffff88001f21d4f0 0000000000000046 ffff88001fdea680 ffff88001f21c000
 ffff88001f21dfd8 ffff88001f21c000 ffff88001f21dfd8 ffff88001f21dfd8
 ffff88001fdea040 0000000000014c00 0000000000000001 ffff88001fdea040
Call Trace:
 [<ffffffff8146155d>] io_schedule+0x4d/0x70
 [<ffffffff810d2be5>] sync_page+0x65/0xa0
 [<ffffffff81461b12>] __wait_on_bit_lock+0x52/0xb0
 [<ffffffff810d2b80>] ? sync_page+0x0/0xa0
 [<ffffffff810d2b64>] __lock_page+0x64/0x70
 [<ffffffff81070ce0>] ? wake_bit_function+0x0/0x40
 [<ffffffff810df1d4>] truncate_inode_pages_range+0x344/0x4a0
 [<ffffffff810df340>] truncate_inode_pages+0x10/0x20
 [<ffffffff8112cbfe>] generic_delete_inode+0x15e/0x190
 [<ffffffff8112cc8d>] generic_drop_inode+0x5d/0x80
 [<ffffffff8112bb88>] iput+0x78/0x80
 [<ffffffff811bc908>] nfs_dentry_iput+0x38/0x50
 [<ffffffff811285f4>] dentry_iput+0x84/0x110
 [<ffffffff811286ae>] d_kill+0x2e/0x60
 [<ffffffff8112912a>] dput+0x7a/0x170
 [<ffffffff8111e925>] path_put+0x15/0x40
 [<ffffffff811c3a44>] __put_nfs_open_context+0xa4/0xb0
 [<ffffffff811cb5d0>] ? nfs_free_request+0x0/0x50
 [<ffffffff811c3b0b>] put_nfs_open_context+0xb/0x10
 [<ffffffff811cb5f9>] nfs_free_request+0x29/0x50
 [<ffffffff81234b7e>] kref_put+0x8e/0xe0
 [<ffffffff811cb594>] nfs_release_request+0x14/0x20
 [<ffffffff811cf769>] nfs_find_and_lock_request+0x89/0xa0
 [<ffffffff811d1180>] nfs_wb_page+0x80/0x110
 [<ffffffff811c0770>] nfs_release_page+0x70/0x90
 [<ffffffff810d18ee>] try_to_release_page+0x5e/0x80
 [<ffffffff810e1178>] shrink_page_list+0x638/0x860
 [<ffffffff810e19de>] shrink_zone+0x63e/0xc40

We can fix this by making the call to put_nfs_open_context() happen when we
actually remove the write request from the inode (which is done by the
nfsiod thread in this case).

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
2010-03-11 09:19:35 -05:00
Trond Myklebust b4d2314bb8 NFSv4: Don't ignore the NFS_INO_REVAL_FORCED flag in nfs_revalidate_inode()
If the NFS_INO_REVAL_FORCED flag is set, that means that we don't yet have
an up to date attribute cache. Even if we hold a delegation, we must
put a GETATTR on the wire.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
2010-03-10 15:21:44 -05:00
Steve Dickson 49697ee792 nfs4: Make the v4 callback service hidden
To avoid hangs in the svc_unregister(), on version 4 mounts
(and unmounts), when rpcbind is not running, make the nfs4 callback
program an 'hidden' service by setting the 'vs_hidden' flag in the
nfs4_callback_version structure.

Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-08 14:56:42 -05:00
Dan Carpenter 7dd08a570d nfs: fix unlikely memory leak
I'll admit that it's unlikely for the first allocation to fail and
the second one to succeed.  I won't be offended if you ignore this
patch.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-08 14:10:00 -05:00
Linus Torvalds 05c5cb31ec Merge branch 'for-2.6.34' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.34' of git://linux-nfs.org/~bfields/linux: (22 commits)
  nfsd4: fix minor memory leak
  svcrpc: treat uid's as unsigned
  nfsd: ensure sockets are closed on error
  Revert "sunrpc: move the close processing after do recvfrom method"
  Revert "sunrpc: fix peername failed on closed listener"
  sunrpc: remove unnecessary svc_xprt_put
  NFSD: NFSv4 callback client should use RPC_TASK_SOFTCONN
  xfs_export_operations.commit_metadata
  commit_metadata export operation replacing nfsd_sync_dir
  lockd: don't clear sm_monitored on nsm_reboot_lookup
  lockd: release reference to nsm_handle in nlm_host_rebooted
  nfsd: Use vfs_fsync_range() in nfsd_commit
  NFSD: Create PF_INET6 listener in write_ports
  SUNRPC: NFS kernel APIs shouldn't return ENOENT for "transport not found"
  SUNRPC: Bury "#ifdef IPV6" in svc_create_xprt()
  NFSD: Support AF_INET6 in svc_addsock() function
  SUNRPC: Use rpc_pton() in ip_map_parse()
  nfsd: 4.1 has an rfc number
  nfsd41: Create the recovery entry for the NFSv4.1 client
  nfsd: use vfs_fsync for non-directories
  ...
2010-03-06 11:31:38 -08:00
Trond Myklebust 3fa04ecd72 Merge branch 'writeback-for-2.6.34' into nfs-for-2.6.34 2010-03-05 15:46:18 -05:00
Trond Myklebust 1cda707d52 NFS: Remove requirement for inode->i_mutex from nfs_invalidate_mapping
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-05 15:44:56 -05:00
Trond Myklebust 5cf95214cc NFS: Clean up nfs_sync_mapping
Remove the redundant call to filemap_write_and_wait().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-05 15:44:56 -05:00
Trond Myklebust 7f2f12d963 NFS: Simplify nfs_wb_page()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-05 15:44:55 -05:00
Trond Myklebust acdc53b214 NFS: Replace __nfs_write_mapping with sync_inode()
Now that we have correct COMMIT semantics in writeback_single_inode, we can
reduce and simplify nfs_wb_all(). Also replace nfs_wb_nocommit() with a
call to filemap_write_and_wait(), which doesn't need to hold the
inode->i_mutex.

With that done, we can eliminate nfs_write_mapping() altogether.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-05 15:44:55 -05:00
Trond Myklebust c988950eb6 NFS: Simplify nfs_wb_page_cancel()
In all cases we should be able to just remove the request and call
cancel_dirty_page().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-05 15:44:55 -05:00
Trond Myklebust 2928db1ffe NFS: Ensure inode is always marked I_DIRTY_DATASYNC, if it has unstable pages
Since nfs_scan_list() doesn't wait for locked pages, we have a race in
which it is possible to end up with an inode that needs to send a COMMIT,
but which does not have the I_DIRTY_DATASYNC flag set.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-05 15:44:54 -05:00
Trond Myklebust 5bad5abec4 NFS: Run COMMIT as an asynchronous RPC call when wbc->for_background is set
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Wu Fengguang <fengguang.wu@intel.com>
2010-03-05 15:44:54 -05:00
Trond Myklebust 420e3646bb NFS: Reduce the number of unnecessary COMMIT calls
If the caller is doing a non-blocking flush, and there are still writebacks
pending on the wire, we can usually defer the COMMIT call until those
writes are done.

Also ensure that we honour the wbc->nonblocking flag.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-05 15:44:54 -05:00
Trond Myklebust ff778d02bf NFS: Add a count of the number of unstable writes carried by an inode
In order to know when we should do opportunistic commits of the unstable
writes, when the VM is doing a background flush, we add a field to count
the number of unstable writes.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-05 15:44:54 -05:00
Trond Myklebust 8fc795f703 NFS: Cleanup - move nfs_write_inode() into fs/nfs/write.c
The sole purpose of nfs_write_inode is to commit unstable writes, so
move it into fs/nfs/write.c, and make nfs_commit_inode static.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-05 15:44:53 -05:00
Christoph Hellwig a9185b41a4 pass writeback_control to ->write_inode
This gives the filesystem more information about the writeback that
is happening.  Trond requested this for the NFS unstable write handling,
and other filesystems might benefit from this too by beeing able to
distinguish between the different callers in more detail.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-03-05 13:25:52 -05:00
Christoph Hellwig 26821ed40b make sure data is on disk before calling ->write_inode
Similar to the fsync issue fixed a while ago in commit
2daea67e96 we need to write for data to
actually hit the disk before writing out the metadata to guarantee
data integrity for filesystems that modify the inode in the data I/O
completion path.  Currently XFS and NFS handle this manually, and AFS
has a write_inode method that does nothing but waiting for data, while
others are possibly missing out on this.

Fortunately this change has a lot less impact than the fsync change
as none of the write_inode methods starts data writeout of any form
by itself.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-03-05 13:25:10 -05:00
J. Bruce Fields 4ea41e2de5 Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs into for-2.6.34-incoming
Resolve merge conflict in fs/xfs/linux-2.6/xfs_export.c.
2010-03-04 12:04:51 -05:00
Linus Torvalds 0f2cc4ecd8 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (52 commits)
  init: Open /dev/console from rootfs
  mqueue: fix typo "failues" -> "failures"
  mqueue: only set error codes if they are really necessary
  mqueue: simplify do_open() error handling
  mqueue: apply mathematics distributivity on mq_bytes calculation
  mqueue: remove unneeded info->messages initialization
  mqueue: fix mq_open() file descriptor leak on user-space processes
  fix race in d_splice_alias()
  set S_DEAD on unlink() and non-directory rename() victims
  vfs: add NOFOLLOW flag to umount(2)
  get rid of ->mnt_parent in tomoyo/realpath
  hppfs can use existing proc_mnt, no need for do_kern_mount() in there
  Mirror MS_KERNMOUNT in ->mnt_flags
  get rid of useless vfsmount_lock use in put_mnt_ns()
  Take vfsmount_lock to fs/internal.h
  get rid of insanity with namespace roots in tomoyo
  take check for new events in namespace (guts of mounts_poll()) to namespace.c
  Don't mess with generic_permission() under ->d_lock in hpfs
  sanitize const/signedness for udf
  nilfs: sanitize const/signedness in dealing with ->d_name.name
  ...

Fix up fairly trivial (famous last words...) conflicts in
drivers/infiniband/core/uverbs_main.c and security/tomoyo/realpath.c
2010-03-04 08:15:33 -08:00
Al Viro f694869709 a couple of mntget+dget -> path_get in nfs4proc
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-03-03 14:07:56 -05:00
Al Viro 6eae7974d0 Switch alloc_nfs_open_context() to struct path
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-03-03 14:07:56 -05:00
Linus Torvalds 0a135ba14d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
  percpu: add __percpu sparse annotations to what's left
  percpu: add __percpu sparse annotations to fs
  percpu: add __percpu sparse annotations to core kernel subsystems
  local_t: Remove leftover local.h
  this_cpu: Remove pageset_notifier
  this_cpu: Page allocator conversion
  percpu, x86: Generic inc / dec percpu instructions
  local_t: Move local.h include to ringbuffer.c and ring_buffer_benchmark.c
  module: Use this_cpu_xx to dynamically allocate counters
  local_t: Remove cpu_local_xx macros
  percpu: refactor the code in pcpu_[de]populate_chunk()
  percpu: remove compile warnings caused by __verify_pcpu_ptr()
  percpu: make accessors check for percpu pointer in sparse
  percpu: add __percpu for sparse.
  percpu: make access macros universal
  percpu: remove per_cpu__ prefix.
2010-03-03 07:34:18 -08:00
Andy Adamson 180b62a3d8 nfs41 fix NFS4ERR_CLID_INUSE for exchange id
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-02 13:45:33 -05:00
Trond Myklebust ebed9203b6 NFS: Fix an allocation-under-spinlock bug
sunrpc_cache_update() will always call detail->update() from inside the
detail->hash_lock, so it cannot allocate memory.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
2010-03-02 13:06:22 -05:00
Trond Myklebust 0f79fd6f5c NFSv4.1: Various fixes to the sequence flag error handling
Ensure that we change the EXCHANGE_ID verifier (i.e. clp->cl_boot_time)
when we want to reset all state. This is mainly needed when the server
tells us that it is revoking our open or lock stateids.

Handle revoking of recallable state by expiring the delegations.

Handle callback path issues by expiring the delegations and then resetting
the session.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-02 13:06:21 -05:00
Alexandros Batsakis 0851de0617 nfs4: renewd renew operations should take/put a client reference
renewd sends RENEW requests to the NFS server in order to renew state.
As the request is asynchronous, renewd should take a reference to the
nfs_client to prevent concurrent umounts from freeing the client

Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-02 13:00:03 -05:00
Alexandros Batsakis 7135840fc7 nfs41: renewd sequence operations should take/put client reference
renewd sends SEQUENCE requests to the NFS server in order to renew state.
As the request is asynchronous, renewd should take a reference to the
nfs_client to prevent concurrent umounts from freeing the session/client

Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-02 12:54:30 -05:00
Alexandros Batsakis dc96aef96a nfs: prevent backlogging of renewd requests
If the renewd send queue gets backlogged (e.g., if the server goes down),
we will keep filling the queue with periodic RENEW/SEQUENCE requests.

This patch schedules a new renewd request if and only if the previous one
returns (either success or failure)

Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
[Trond.Myklebust@netapp.com: moved nfs4_schedule_state_renewal() into
separate nfs4_renew_release() and nfs41_sequence_release() callbacks
to ensure correct behaviour on call setup failure]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-02 12:44:07 -05:00
Alexandros Batsakis 888ef2e3f8 nfs: kill renewd before clearing client minor version
renewd should be synchronously killed before we destroy the session in
nfs4_clear_minor_version

Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
[Trond.Myklebust@netapp.com: clean up to remove 'unused function
warning when !CONFIG_NFS_V4]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-02 12:16:12 -05:00
Christian Kujau 4912002fff Remove EXPERIMENTAL from NFS_FSCACHE
There's currently an open Ubuntu bug[0], with the intent to compile NFS_FSCACHE
(and possibly AFS_FSCACHE, 9P_FSCACHE) into the standard Ubuntu kernel.
However, since *_FSCACHE still depends on EXPERIMENTAL, this won't happen.

As Arjan van de Ven pointed out[1], the EXPERIMENTAL flag doesn't mean that
much any more, I propose the following patch to fs/nfs/Kconfig.  I'd do the
same for fs/9p/Kconfig and fs/afs/Kconfig, but as I did not test 9p or AFS, I
feel it would not be appropriate for me to remove the flag.

[0] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/440522/comments/5
[1] http://lkml.org/lkml/2010/1/23/145

Signed-off-by: Christian Kujau <lists@nerdbynature.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-02-26 17:22:35 -08:00
Tejun Heo 003cb608a2 percpu: add __percpu sparse annotations to fs
Add __percpu sparse annotations to fs.

These annotations are to make sparse consider percpu variables to be
in a different address space and warn if accessed without going
through percpu accessors.  This patch doesn't affect normal builds.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Alex Elder <aelder@sgi.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
2010-02-17 11:17:38 +09:00
Chuck Lever 65d269538a NFS: Too many GETATTR and ACCESS calls after direct I/O
The cached read and write paths initialize fattr->time_start in their
setup procedures.  The value of fattr->time_start is propagated to
read_cache_jiffies by nfs_update_inode().  Subsequent calls to
nfs_attribute_timeout() will then use a good time stamp when
computing the attribute cache timeout, and squelch unneeded GETATTR
calls.

Since the direct I/O paths erroneously leave the inode's
fattr->time_start field set to zero, read_cache_jiffies for that inode
is set to zero after any direct read or write operation.  This
triggers an otw GETATTR or ACCESS call to update the file's attribute
and access caches properly, even when the NFS READ or WRITE replies
have usable post-op attributes.

Make sure the direct read and write setup code performs the same fattr
initialization as the cached I/O paths to prevent unnecessary GETATTR
calls.

This was likely introduced by commit 0e574af1 in 2.6.15, which appears
to add new nfs_fattr_init() call sites in the cached read and write
paths, but not in the equivalent places in fs/nfs/direct.c.  A
subsequent commit in the same series, 33801147, introduces the
fattr->time_start field.

Interestingly, the direct write reschedule path already has a call to
nfs_fattr_init() in the right place.

Reported-by: Quentin Barnes <qbarnes@yahoo-inc.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: stable@kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-02-15 19:53:43 -08:00
Chuck Lever f895c53f8a NFS: Make close(2) asynchronous when closing NFS O_DIRECT files
For NFSv2 and v3:

O_DIRECT writes are always synchronous, and aren't cached, so nothing
should be flushed when closing an NFS O_DIRECT file descriptor.  Thus
there are no write errors to report on close(2).

In addition, there's no cached data to verify on the next open(2),
so we don't need clean GETATTR results at close time to compare with.

Thus, there's no need for the nfs_revalidate_inode() call when closing
an NFS O_DIRECT file.  This reduces the number of synchronous
on-the-wire requests for a simple open-write-close of an NFS O_DIRECT
file by roughly 20%.

For NFSv4:

Call nfs4_do_close() with wait set to zero when closing an NFS
O_DIRECT file.  The CLOSE will go on the wire, but the application
won't wait for it to complete.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10 08:31:05 -05:00
Chuck Lever 7e381172cf NFS: Improve NFS iostat byte count accuracy for writes
The bytes counted by the performance counters for NFS writes should
reflect write and sync errors.  If the write(2) system call reports
an error, the bytes should not be counted.  And, if the write is
short, the actual number of bytes that was written should be counted,
not the number of bytes that was requested.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10 08:31:04 -05:00
Chuck Lever aa2f1ef10e NFS: Account for NFS bytes read via the splice API
Bytes read via the splice API should be accounted for in the NFS
performance statistics.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10 08:31:03 -05:00
Chuck Lever 4184dcf2db NFS: Fix byte accounting for generic NFS reads
Currently, the NFS I/O counters count the number of bytes requested
by applications, rather than the number of bytes actually read by the
system calls.

The number of bytes requested for reads is actually not that useful,
because the value is usually a buffer size for reads.  That is, that
requested number is usually a maximum, and frequently doesn't reflect
the actual number of bytes read.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10 08:31:03 -05:00
Chuck Lever c2459dc462 NFS: Proper accounting for NFS VFS calls
Nit: The VFSOPEN and VFSFLUSH counters are function call counters.
Count every call to these routines.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10 08:31:02 -05:00
Andy Adamson 9733f0d928 nfs41: cleanup callback code to use __be32 type
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10 08:31:01 -05:00
Andy Adamson 41f54a5548 nfs41: clear NFS4CLNT_RECALL_SLOT bit on session reset
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10 08:31:00 -05:00
Andy Adamson bae0ac0ee1 nfs41: fix nfs4_callback_recallslot
Return NFS4_OK if target high slotid equals enforced high slotid.
Fix nfs_client reference leak.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10 08:31:00 -05:00
Andy Adamson 104aeba484 nfs41: resize slot table in reset
When session is reset, client can renegotiate slot table size.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10 08:30:59 -05:00
Andy Adamson b9efa1b27e nfs41: implement cb_recall_slot
Drain the fore channel and reset the max_slots to the new value.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10 08:30:59 -05:00
Andy Adamson 4911096f1a nfs41: back channel drc minimal implementation
For now the back channel ca_maxresponsesize_cached is 0 and there is no
backchannel DRC. Return NFS4ERR_REP_TOO_BIG_TO_CACHE when a cb_sequence
cachethis is true.  When it is false, return NFS4ERR_RETRY_UNCACHED_REP as the
next operation error.

Remember the replay error accross compound operation processing.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10 08:30:58 -05:00
Andy Adamson b2f28bd783 nfs41: prepare for back channel drc
Make all cb_sequence arguments available to verify_seqid which will make
replay decisions.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10 08:30:58 -05:00
Andy Adamson e95e60daee nfs41: remove uneeded checks in callback processing
All callback operations have arguments to decode and require processing.
The preprocess_nfs4X_op functions catch unsupported or illegal ops so
decode_args and process_op pointers are always non NULL.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10 08:30:57 -05:00
Andy Adamson b92b301900 nfs41: directly encode back channel error
Skip all other processing when error is encountered.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10 08:30:56 -05:00
Andy Adamson 31d2b4356b nfs41: fix wrong error on callback header xdr overflow
Set NFS4ERR_RESOURCE as CB_COMPOUND status and do not return an op on
decode_op_hdr or encode_op_hdr buffer overflow.

NFS4ERR_RESOURCE is correct for v4.0. Will fix the return for v4.1 along with
all the other NFS4ERR_RESOURCE errors in a later patch.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10 08:30:56 -05:00
Mike Sager 72ce2b3c06 nfs41: Process callback's referring call list
If a CB_SEQUENCE referring call triple matches a slot table entry, the
client is still waiting for a response to the original request.  In this
case, return NFS4ERR_DELAY as the response to the callback.

Signed-off-by: Mike Sager <sager@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10 08:30:55 -05:00
Mike Sager a7989c3e47 nfs41: Check slot table for referring calls
Traverse a list of referring calls and look for a session/slot/seq number
match.

Signed-off-by: Mike Sager <sager@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10 08:30:55 -05:00
Mike Sager 8e0d46e138 nfs41: Adjust max cache response size value
For the CREATE_SESSION attribute ca_maxresponsesize_cached, calculate
the value based on the rpc reply header size plus the maximum nfs compound
reply size.

Signed-off-by: Mike Sager <sager@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10 08:30:54 -05:00
Jeff Layton 97cefcc6d0 nfs: handle NFSv2 -EKEYEXPIRED returns from RPC layer appropriately
Add a wrapper around rpc_call_sync that handles -EKEYEXPIRED errors from
the RPC layer as it would an -EJUKEBOX error if NFSv2 had such a thing.
Also, add a handler for that error for async calls that makes it
resubmit the RPC on -EKEYEXPIRED.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10 08:30:52 -05:00
Jeff Layton b68d69b8c6 nfs: handle NFSv3 -EKEYEXPIRED errors as we would -EJUKEBOX
We're using -EKEYEXPIRED to indicate that a krb5 credcache contains an
expired ticket and that we should have the NFS layer retry the RPC call
instead of returning an error back to the caller. Handle this as we
would an -EJUKEBOX error return.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10 08:30:51 -05:00
Jeff Layton 2c6434888c nfs4: handle -EKEYEXPIRED errors from RPC layer
If a KRB5 TGT ticket expires, we don't want to return an error
immediatel. If someone has a long running job and just forgets to run
"kinit" in time then this will make it fail.

Instead, we want to treat this situation as we would NFS4ERR_DELAY and
retry the upcall after delaying a bit with an exponential backoff.

This patch just makes any place that would handle NFS4ERR_DELAY also
handle -EKEYEXPIRED the same way. In the future, we may want to be more
sophisticated however and handle hard vs. soft mounts differently, or
specify some upper limit on how long we'll wait for a new TGT to be
acquired.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10 08:30:50 -05:00
Trond Myklebust fdcb45777a NFS: Fix the mapping of the NFSERR_SERVERFAULT error
It was recently pointed out that the NFSERR_SERVERFAULT error, which is
designed to inform the user of a serious internal error on the server, was
being mapped to an error value that is internal to the kernel.

This patch maps it to the error EREMOTEIO, which is exported to userland
through errno.h.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
2010-02-09 14:29:29 -05:00
Trond Myklebust 7549ad5f9b NFS: Remove a redundant check for PageFsCache in nfs_migrate_page()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: David Howells <dhowells@redhat.com>
2010-02-09 14:29:21 -05:00
Trond Myklebust 2c1740098c NFS: Fix a bug in nfs_fscache_release_page()
Not having an fscache cookie is perfectly valid if the user didn't mount
with the fscache option.

This patch fixes http://bugzilla.kernel.org/show_bug.cgi?id=15234

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: David Howells <dhowells@redhat.com>
Cc: stable@kernel.org
2010-02-09 14:29:10 -05:00
Trond Myklebust 9b4b351346 NFS: Don't clobber the attribute type in nfs_update_inode()
If the NFS_ATTR_FATTR_TYPE field isn't set in fattr->valid, then we should
not set the S_IFMT part of inode->i_mode.

Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-03 08:27:35 -05:00
Trond Myklebust 387c149b54 NFS: Fix a umount race
Ensure that we unregister the bdi before kill_anon_super() calls
ida_remove() on our device name.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
2010-02-03 08:27:35 -05:00
Trond Myklebust 9f557cd807 NFS: Fix an Oops when truncating a file
The VM/VFS does not allow mapping->a_ops->invalidatepage() to fail.
Unfortunately, nfs_wb_page_cancel() may fail if a fatal signal occurs.
Since the NFS code assumes that the page stays mapped for as long as the
writeback is active, we can end up Oopsing (among other things).

The only safe fix here is to convert nfs_wait_on_request(), so as to make
it uninterruptible (as is already the case with wait_on_page_writeback()).


Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
2010-02-03 08:27:22 -05:00
Chuck Lever d6783b2b6c SUNRPC: Bury "#ifdef IPV6" in svc_create_xprt()
Clean up:  Bruce observed we have more or less common logic in each of
svc_create_xprt()'s callers:  the check to create an IPv6 RPC listener
socket only if CONFIG_IPV6 is set.  I'm about to add another case
that does just the same.

If we move the ifdefs into __svc_xpo_create(), then svc_create_xprt()
call sites can get rid of the "#ifdef" ugliness, and can use the same
logic with or without IPv6 support available in the kernel.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-01-26 17:56:43 -05:00
Trond Myklebust a2c0b9e291 NFS: Ensure that we handle NFS4ERR_STALE_STATEID correctly
Even if the server is crazy, we should be able to mark the stateid as being
bad, to ensure it gets recovered.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
2010-01-26 15:42:47 -05:00
Trond Myklebust 03391693a9 NFSv4.1: Don't call nfs4_schedule_state_recovery() unnecessarily
Currently, nfs4_handle_exception() will call it twice if called with an
error of -NFS4ERR_STALE_CLIENTID, -NFS4ERR_STALE_STATEID or
-NFS4ERR_EXPIRED.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
2010-01-26 15:42:38 -05:00
Trond Myklebust 8e469ebd6d NFSv4: Don't allow posix locking against servers that don't support it
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
2010-01-26 15:42:30 -05:00
Trond Myklebust 2bee72a6aa NFSv4: Ensure that the NFSv4 locking can recover from stateid errors
In most cases, we just want to mark the lock_stateid sequence id as being
uninitialised.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
2010-01-26 15:42:21 -05:00
David Howells b0706ca415 NFS: Avoid warnings when CONFIG_NFS_V4=n
Avoid the following warnings when CONFIG_NFS_V4=n:

	fs/nfs/sysctl.c:19: warning: unused variable `nfs_set_port_max'
	fs/nfs/sysctl.c:18: warning: unused variable `nfs_set_port_min'

by making those variables contingent on NFSv4 being configured.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
2010-01-26 15:42:11 -05:00
H Hartley Sweeten 0aa05887af NFS: Make nfs_commitdata_release static
The symbol nfs_commitdata_release is only used locally
in this file. Make it static to prevent the following sparse warning:

warning: symbol 'nfs_commitdata_release' was not declared. Should it be static?

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
2010-01-26 15:42:03 -05:00
Trond Myklebust 82be934a59 NFS: Try to commit unstable writes in nfs_release_page()
If someone calls nfs_release_page(), we presumably already know that the
page is clean, however it may be holding an unstable write.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
2010-01-26 15:41:53 -05:00
Trond Myklebust c9edda7140 NFS: Fix a reference leak in nfs_wb_cancel_page()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
2010-01-26 15:41:34 -05:00
OGAWA Hirofumi 56335936de nfs: fix oops in nfs_rename()
Recent change is missing to update "rehash".  With that change, it will
become the cause of adding dentry to hash twice.

This explains the reason of Oops (dereference the freed dentry in
__d_lookup()) on my machine.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Reported-by: Marvin <marvin24@gmx.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-01-06 18:48:26 -05:00
Linus Torvalds a2770d86b3 Revert "fix mismerge with Trond's stuff (create_mnt_ns() export is gone now)"
This reverts commit e9496ff46a. Quoth Al:

 "it's dependent on a lot of other stuff not currently in mainline
  and badly broken with current fs/namespace.c.  Sorry, badly
  out-of-order cherry-pick from old queue.

  PS: there's a large pending series reworking the refcounting and
  lifetime rules for vfsmounts that will, among other things, allow to
  rip a subtree away _without_ dissolving connections in it, to be
  garbage-collected when all active references are gone.  It's
  considerably saner wrt "is the subtree busy" logics, but it's nowhere
  near being ready for merge at the moment; this changeset is one of the
  things becoming possible with that sucker, but it certainly shouldn't
  have been picked during this cycle.  My apologies..."

Noticed-by: Eric Paris <eparis@redhat.com>
Requested-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-17 12:51:05 -08:00
Linus Torvalds bac5e54c29 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (38 commits)
  direct I/O fallback sync simplification
  ocfs: stop using do_sync_mapping_range
  cleanup blockdev_direct_IO locking
  make generic_acl slightly more generic
  sanitize xattr handler prototypes
  libfs: move EXPORT_SYMBOL for d_alloc_name
  vfs: force reval of target when following LAST_BIND symlinks (try #7)
  ima: limit imbalance msg
  Untangling ima mess, part 3: kill dead code in ima
  Untangling ima mess, part 2: deal with counters
  Untangling ima mess, part 1: alloc_file()
  O_TRUNC open shouldn't fail after file truncation
  ima: call ima_inode_free ima_inode_free
  IMA: clean up the IMA counts updating code
  ima: only insert at inode creation time
  ima: valid return code from ima_inode_alloc
  fs: move get_empty_filp() deffinition to internal.h
  Sanitize exec_permission_lite()
  Kill cached_lookup() and real_lookup()
  Kill path_lookup_open()
  ...

Trivial conflicts in fs/direct-io.c
2009-12-16 12:04:02 -08:00
Linus Torvalds e4bdda1bc3 Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  NFSv4: Fix a regression in the NFSv4 state manager
  NFSv4: Release the sequence id before restarting a CLOSE rpc call
  nfs41: fix session fore channel negotiation
  nfs41: do not zero seqid portion of stateid on close
  nfs: run state manager in privileged mode
  nfs: make recovery state manager operations privileged
  nfs: enforce FIFO ordering of operations trying to acquire slot
  rpc: add a new priority in RPC task
  nfs: remove rpc_task argument from nfs4_find_slot
  rpc: add rpc_queue_empty function
  nfs: change nfs4_do_setlk params to identify recovery type
  nfs: do not do a LOOKUP after open
  nfs: minor cleanup of session draining
2009-12-16 10:47:44 -08:00
Linus Torvalds 37c24b37fb Merge branch 'for-2.6.33' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.33' of git://linux-nfs.org/~bfields/linux: (42 commits)
  nfsd: remove pointless paths in file headers
  nfsd: move most of nfsfh.h to fs/nfsd
  nfsd: remove unused field rq_reffh
  nfsd: enable V4ROOT exports
  nfsd: make V4ROOT exports read-only
  nfsd: restrict filehandles accepted in V4ROOT case
  nfsd: allow exports of symlinks
  nfsd: filter readdir results in V4ROOT case
  nfsd: filter lookup results in V4ROOT case
  nfsd4: don't continue "under" mounts in V4ROOT case
  nfsd: introduce export flag for v4 pseudoroot
  nfsd: let "insecure" flag vary by pseudoflavor
  nfsd: new interface to advertise export features
  nfsd: Move private headers to source directory
  vfs: nfsctl.c un-used nfsd #includes
  lockd: Remove un-used nfsd headers #includes
  s390: remove un-used nfsd #includes
  sparc: remove un-used nfsd #includes
  parsic: remove un-used nfsd #includes
  compat.c: Remove dependence on nfsd private headers
  ...
2009-12-16 10:43:34 -08:00
Al Viro e9496ff46a fix mismerge with Trond's stuff (create_mnt_ns() export is gone now)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-12-16 12:16:44 -05:00
Trond Myklebust 380454126f NFSv4: Fix a regression in the NFSv4 state manager
Commit 5601a00d67 (nfs: run state manager
in privileged mode) introduces a regression in the NFSv4 code when
compiled with CONFIG_NFS_V4_1. The calls to nfs4_end_drain_session()
from the main loop in nfs4_state_manager() Oops due to the lack of an
NFSv4.1 session when running NFSv4.0.

The fix is to move those two calls back into nfs41_init_clientid() and
nfs4_reset_session().

The calls to nfs4_end_drain_session() that remain inside
nfs4_state_manager() are safe, since the NFSv4.0 code will never set the
NFS4CLNT_SESSION_DRAINING bit.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-12-15 17:36:57 -05:00
Trond Myklebust 72211dbe72 NFSv4: Release the sequence id before restarting a CLOSE rpc call
If the CLOSE or OPEN_DOWNGRADE call triggers a state recovery, and has
to be resent, then we must release the seqid. Otherwise the open
recovery will wait for the close to finish, which causes a deadlock.

This is mainly a NFSv4.1 problem, although it can theoretically happen
with NFSv4.0 too, in a OPEN_DOWNGRADE situation.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-12-15 14:47:36 -05:00
Andy Adamson 68bf05efb7 nfs41: fix session fore channel negotiation
If the rsize or wsize is not set on the mount command, negotiate the highest
supported rsize and wsize in session creation.

Fixes a bug where the client negotiated nfs41_maxwrite_overhead as
ca_maxrequestsize and nfs41_maxread_overhead as ca_maxresponsesize resulting
in NFS4ERR_REQ_TOO_BIG errors on writes.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-12-15 13:58:42 -05:00
Andy Adamson a5523b84c4 nfs41: do not zero seqid portion of stateid on close
Remove code left over from a previous minorversion draft.
which specified zeroing seqid portions of stateid's.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-12-15 13:58:36 -05:00
Alexandros Batsakis 5601a00d67 nfs: run state manager in privileged mode
Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-12-15 13:58:23 -05:00
Alexandros Batsakis b257957e50 nfs: make recovery state manager operations privileged
Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-12-15 13:58:07 -05:00
Alexandros Batsakis 689cf5c15b nfs: enforce FIFO ordering of operations trying to acquire slot
Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-12-15 13:55:18 -05:00
Alexandros Batsakis 40ead580ae nfs: remove rpc_task argument from nfs4_find_slot
Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-12-15 13:51:22 -05:00
Alexandros Batsakis afe6c27ccb nfs: change nfs4_do_setlk params to identify recovery type
Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-12-15 13:50:32 -05:00
Alexandros Batsakis 0f7e720694 nfs: do not do a LOOKUP after open
Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-12-15 13:50:21 -05:00
Alexandros Batsakis 3bfb0fc591 nfs: minor cleanup of session draining
Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-12-15 13:50:01 -05:00