POSIX requires that ctime and mtime, as reported by the stat(2) call,
reflect the activity of the most recent write(2). To that end, nfs_getattr()
flushes pending dirty writes to a file before doing a GETATTR to allow the
NFS server to set the file's size, ctime, and mtime properly.
However, nfs_getattr() can be starved when a constant stream of application
writes to a file prevents nfs_wb_nocommit() from completing. This usually
results in hangs of programs doing a stat against an NFS file that is being
written. "ls -l" is a common victim of this behavior.
To prevent starvation, hold the file's i_mutex in nfs_getattr() to
freeze applications writes temporarily so the client can more quickly obtain
clean values for a file's size, mtime, and ctime.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The nfs_wcc_update_inode() function omits logic to convert the type of
the NFS on-the-wire value of a file's size (__u64) to the type of file
size value stored in struct inode (loff_t, which is signed).
Everywhere else in the NFS client I checked already correctly converts the
file size type.
This effects only very large files.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The two arguments of rpc_depopulate() that pass in inode numbers should use
the same type as inode->i_ino: unsigned long.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The return type of xdr_skb_read_actor functions is size_t. This fixes a
nit I unwittingly overlooked in commit dd456471.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Minor: Replace an empty if statement with a debugging dprintk.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Thomas Talpey <Thomas.Talpey@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Nit: rl_nchunks is an unsigned integer, so pass it into
rpcrdma_count_chunks() via an unsigned integer argument. This eliminates
a harmless mixed sign comparison in rpcrdma_count_chunks()
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Thomas Talpey <Thomas.Talpey@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Keep the type of the buffer position the same during iovec conversion to
reduce the likelihood of unexpected results from comparisons and length
computations.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Thomas Talpey <Thomas.Talpey@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Replace use of rpc_call_setup() with rpc_init_task(), and in cases where we
need to initialise task->tk_action, with rpc_call_start().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Move the common code for setting up the nfs_write_data and nfs_read_data
structures into fs/nfs/read.c, fs/nfs/write.c and fs/nfs/direct.c.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
We want the default scheduling priority (priority == 0) to remain
RPC_PRIORITY_NORMAL.
Also ensure that the priority wait queue scheduling is per process id
instead of sometimes being per thread, and sometimes being per inode.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The sunrpc client exports are not meant to be part of any official kernel
API: they can change at the drop of a hat. Mark them as internal functions
using EXPORT_SYMBOL_GPL.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Add kerneldoc comments for the rpc_pipefs.c functions that are exported.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
If we've disconnected from the server, rather than the other way round,
then it makes little sense to wait 3 seconds before reconnecting.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
xprt_disconnect() should really only be called when the transport shutdown
is completed, and it is time to wake up any pending tasks. Rename it to
xprt_disconnect_done() in order to reflect the semantical change.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Move the calls to xprt_disconnect() over to xprt_force_disconnect() in
order to enable the transport layer to manage the state of the
XPRT_CONNECTED flag.
Ditto in xs_tcp_read_fraghdr().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The transport layer should do that itself whenever appropriate.
Note that the RDMA transport already assumes that it needs to call
xprt_disconnect in xprt_rdma_close().
For TCP sockets, we want to call xprt_disconnect() only after the
connection has been closed by both ends.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
By using shutdown() rather than close() we allow the RPC client to wait
for the TCP close handshake to complete before we start trying to reconnect
using the same port.
We use shutdown(SHUT_WR) only instead of shutting down both directions,
however we wait until the server has closed the connection on its side.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Add an xprt->state bit to enable the TCP ->state_change() method to signal
whether or not the TCP connection is in the process of closing down.
This will to be used by the reconnection logic in a separate patch.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Currently the TCP rebinding logic assumes that if we're not using a
reserved port, then we don't need to reconnect on the same port if a
disconnection event occurs. This breaks most RPC duplicate reply cache
implementations.
Also take into account the fact that xprt_min_resvport and
xprt_max_resvport may change while we're reconnecting, since the user may
change them at any time via the sysctls. Ensure that we check the port
boundaries every time we loop in xs_bind4/xs_bind6. Also ensure that if the
boundaries change, we only scan the ports a maximum of 2 times.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
When scheduling the autoclose RPC call, we want to ensure that we don't
race against the test_bit() call in xprt_clear_locked().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Added an active/deactive mechanism to the nfs_server structure
allowing async operations to hold off umount until the
operations are done.
Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Reduce the time spent locking the rpc_sequence structure by queuing the
nfs_seqid only when we are ready to take the lock (when calling
nfs_wait_on_sequence).
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The current model locks the page twice for no good reason. Optimise by
inlining the parts of nfs_write_begin()/nfs_write_end() that we care about.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
If the server returns an ENOENT error, we still need to do a d_delete() in
order to ensure that the dentry is deleted.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
In nfs_do_call_unlink() we check that we haven't raced, and that lookup()
hasn't created an aliased dentry to our sillydeleted dentry. If somebody
has deleted the file on the server and the lookup() resulted in a negative
dentry, then ignore...
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Ensure that readdir revalidates its data cache after blocking on
sillyrename.
Also fix a typo in nfs_do_call_unlink(): swap the ^= for an |=. The result
is the same, since we've already checked that the flag is unset, but it
makes the code more readable.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The new e1000e driver is apparently not yet suitable for general use, so
mark it experimental, and re-instate all the PCI-Express device IDs in
the old and stable e1000 driver so that people (namely me) can continue
to use a driver that actually works.
Auke & co have been appraised of the situation.
Cc: Auke Kok <auke-jan.h.kok@intel.com>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A bug report on nfsd that states that since it was switched to use
splice instead of sendfile, the atime was no longer being updated
on the input file. do_generic_mapping_read() does this when accessing
the file, make splice do it for the direct splice handler.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>