The nfs4_cb_recall struct is used only in nfs4_delegation, so its
pointer to the containing delegation is unnecessary--we could just use
container_of().
But there's no real reason to have this a separate struct at all--just
move these fields to nfs4_delegation.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
I want to use the name for a struct that actually does represent a
single callback.
(Actually, I've never been sure it helps to a separate struct for the
callback information. Some day maybe those fields could just be dumped
into struct nfs4_client. I don't know.)
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
We don't really need a synchronous rpc, and moving to an asynchronous
rpc allows us to do without this extra kthread.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The code is a little simpler, and it should be easier to avoid races, if
we just do all rpc client creation/destruction from nfsd or laundromat
threads and do only the rpc calls themselves asynchronously. The rpc
creation doesn't involve any significant waiting (it doesn't call the
client, for example), so there's no reason not to do this.
Also don't bother destroying the client on failure of the rpc null
probe. We may want to retry the probe later anyway.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
We tried to do something overly complicated with the callback rpc
timeouts here. And they're wrong--the result is that by the time a
single callback times out, it's already too late to tell the client
(using the cb_path_down return to RENEW) that the callback is down.
Use a much shorter, simpler timeout.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This setclientid_confirm case should allow the client to change
callbacks, but it currently has a dummy implementation that just turns
off callbacks completely. That dummy implementation isn't completely
correct either, though:
- There's no need to remove any client recovery directory in
this case.
- New clientid confirm verifiers should be generated (and
returned) in setclientid; there's no need to generate a new
one here.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Stephen Rothwell said:
"Today's linux-next build (powerpc ppc64_defconfig) produced this new
warning:
fs/nfsd/nfs4state.c: In function 'EXPIRED_STATEID':
fs/nfsd/nfs4state.c:2757: warning: comparison of distinct pointer types lacks a cast
Caused by commit 78155ed75f ("nfsd4:
distinguish expired from stale stateids")."
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
ext4 supports a real NFSv4 change attribute, which is bumped whenever
the ctime would be updated, including times when two updates arrive
within a jiffy of each other. (Note that although ext4 has space for
nanosecond-precision ctime, the real resolution is lower: it actually
uses jiffies as the time-source.) This ensures clients will invalidate
their caches when they need to.
There is some fear that keeping the i_version up-to-date could have
performance drawbacks, so for now it's turned on only by a mount option.
We hope to do something better eventually.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Theodore Tso <tytso@mit.edu>
We don't need comments to tell us these macros are ugly. And we're long
past trying to share any of this code with the BSD's.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Clean up: For consistency, handle output buffer size checking in a
other nfsctl functions the same way it's done for write_versions().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
While it's not likely today that there are enough NFS versions to
overflow the output buffer in write_versions(), we should be more
careful about detecting the end of the buffer.
The number of NFS versions will only increase as NFSv4 minor versions
are added.
Note that this API doesn't behave the same as portlist. Here we
attempt to display as many versions as will fit in the buffer, and do
not provide any indication that an overflow would have occurred. I
don't have any good rationale for that.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
While it's not likely a pathname will be longer than
SIMPLE_TRANSACTION_SIZE, we should be more careful about just
plopping it into the output buffer without bounds checking.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Adjust the synopsis of svc_sock_names() to pass in the size of the
output buffer. Add a documenting comment.
This is a cosmetic change for now. A subsequent patch will make sure
the buffer length is passed to one_sock_name(), where the length will
actually be useful.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Adjust the synopsis of svc_addsock() to pass in the size of the output
buffer. Add a documenting comment.
This is a cosmetic change for now. A subsequent patch will make sure
the buffer length is passed to one_sock_name(), where the length will
actually be useful.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The svc_xprt_names() function can overflow its buffer if it's so near
the end of the passed in buffer that the "name too long" string still
doesn't fit. Of course, it could never tell if it was near the end
of the passed in buffer, since its only caller passes in zero as the
buffer length.
Let's make this API a little safer.
Change svc_xprt_names() so it *always* checks for a buffer overflow,
and change its only caller to pass in the correct buffer length.
If svc_xprt_names() does overflow its buffer, it now fails with an
ENAMETOOLONG errno, instead of trying to write a message at the end
of the buffer. I don't like this much, but I can't figure out a clean
way that's always safe to return some of the names, *and* an
indication that the buffer was not long enough.
The displayed error when doing a 'cat /proc/fs/nfsd/portlist' is
"File name too long".
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Clean up.
A couple of years ago, a series of commits, finishing with commit
5680c446, swapped the order of the lockd_up() and svc_addsock() calls
in __write_ports(). At that time lockd_up() needed to know the
transport protocol of the passed-in socket to start a listener on the
same transport protocol.
These days, lockd_up() doesn't take a protocol argument; it always
starts both a UDP and TCP listener. It's now more straightforward to
try the lockd_up() first, then do a lockd_down() if the svc_addsock()
fails.
Careful review of this code shows that the svc_sock_names() call is
used only to close the just-opened socket in case lockd_up() fails.
So it is no longer needed if lockd_up() is done first.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Clean up: Refactor transport name listing out of __write_ports() to
make it easier to understand and maintain.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
User space must call listen(3) on SOCK_STREAM sockets passed into
/proc/fs/nfsd/portlist, otherwise that listener is ignored. Document
this.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Clean up: Refactor the socket creation logic out of __write_ports() to
make it easier to understand and maintain.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Clean up: Refactor the socket closing logic out of __write_ports() to
make it easier to understand and maintain.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Clean up: Refactor transport addition out of __write_ports() to make
it easier to understand and maintain.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Clean up: Refactor transport removal out of __write_ports() to make it
easier to understand and maintain.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
If we encode the time of client creation into the stateid instead of the
time of server boot, then we can determine whether that stateid is from
a previous instance of the a server, or from a client that has expired,
and return an appropriate error to the client.
Signed-off-by: Bian Naimeng <biannm@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
For every lock request lockd creates a new file_lock object
in nlmsvc_setgrantargs() by copying the passed in file_lock with
locks_copy_lock(). A filesystem can attach it's own lock_operations
vector to the file_lock. It has to be cleaned up at the end of the
file_lock's life. However, lockd doesn't do it today, yet it
asserts in nlmclnt_release_lockargs() that the per-filesystem
state is clean.
This patch fixes it by exporting locks_release_private() and adding
it to nlmsvc_freegrantargs(), to be symmetrical to creating a
file_lock in nlmsvc_setgrantargs().
Signed-off-by: Felix Blyakher <felixb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
Btrfs: fix btrfs fallocate oops and deadlock
Btrfs: use the right node in reada_for_balance
Btrfs: fix oops on page->mapping->host during writepage
Btrfs: add a priority queue to the async thread helpers
Btrfs: use WRITE_SYNC for synchronous writes
This fixes the following BUG:
# mount -o size=MM -t hugetlbfs none /huge
hugetlbfs: Bad value 'MM' for mount option 'size=MM'
------------[ cut here ]------------
kernel BUG at fs/super.c:996!
Due to
BUG_ON(!mnt->mnt_sb);
in vfs_kern_mount().
Also, remove unused #include <linux/quotaops.h>
Cc: William Irwin <wli@holomorphy.com>
Cc: <stable@kernel.org>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Btrfs fallocate was incorrectly starting a transaction with a lock held
on the extent_io tree for the file, which could deadlock. Strictly
speaking it was using join_transaction which would be safe, but it is better
to move the transaction outside of the lock.
When preallocated extents are overwritten, btrfs_mark_buffer_dirty was
being called on an unlocked buffer. This was triggering an assertion and
oops because the lock is supposed to be held.
The bug was calling btrfs_mark_buffer_dirty on a leaf after btrfs_del_item had
been run. btrfs_del_item takes care of dirtying things, so the solution is a
to skip the btrfs_mark_buffer_dirty call in this case.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes:
GFS2: Fix page_mkwrite() return code
GFS2: Clear dirty bit at end of inode glock sync
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
reiserfs: fix j_last_flush_trans_id type
fs: Mark get_filesystem_list() as __init function.
kill vfs_stat_fd / vfs_lstat_fd
Separate out common fstatat code into vfs_fstatat
ecryptfs: use memdup_user()
ncpfs: use memdup_user()
xfs: use memdup_user()
sysfs: use memdup_user()
btrfs: use memdup_user()
xattr: use memdup_user()
autofs4: use memchr() in invalid_string()
Documentation/filesystems: remove out of date reference to BKL being held
Fix i_mutex vs. readdir handling in nfsd
fs/compat_ioctl: fix build when !BLOCK
Fix autofs_expire()
No need for crossing to mountpoint in audit_tag_tree()
Safer nfsd_cross_mnt()
Touch all affected namespaces on propagation of mount
Fix AUTOFS_DEV_IOCTL_REQUESTER_CMD
Commit ae46141ff0 (NFSv3: Fix posix ACL code)
introduces a bug in the calculation of the XDR header iovec. In the case
where we are inlining the acls, we need to adjust the length of the iovec
req->rq_svec, in addition to adjusting the total buffer length.
Tested-by: Leonardo Chiquitto <leonardo.lists@gmail.com>
Tested-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
"int get_filesystem_list(char * buf)" is called by only
"static void __init get_fs_names(char *page)".
We can mark get_filesystem_list() as "__init".
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
There's really no reason to keep vfs_stat_fd and vfs_lstat_fd with
Oleg's vfs_fstatat. Use vfs_fstatat for the few cases having the
directory fd, and switch all others to vfs_stat / vfs_lstat.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This is a version incorporating Christoph's suggestion.
Separate out common *fstatat functionality into a single function
instead of duplicating it all over the code.
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Remove open-coded memdup_user().
Note this changes some GFP_NOFS to GFP_KERNEL, since copy_from_user() may
cause pagefault, it's pointless to pass GFP_NOFS to kmalloc().
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Commit 14f7dd63 ("Copy XFS readdir hack into nfsd code") introduced a
bug to generic code which had been extant for a long time in the XFS
version -- it started to call through into lookup_one_len() and hence
into the file systems' ->lookup() methods without i_mutex held on the
directory.
This patch fixes it by locking the directory's i_mutex again before
calling the filldir functions. The original deadlocks which commit
14f7dd63 was designed to avoid are still avoided, because they were due
to fs-internal locking, not i_mutex.
While we're at it, fix the return type of nfsd_buffered_readdir() which
should be a __be32 not an int -- it's an NFS errno, not a Linux errno.
And return nfserrno(-ENOMEM) when allocation fails, not just -ENOMEM.
Sparse would have caught that, if it wasn't so busy bitching about
__cold__.
Commit 05f4f678 ("nfsd4: don't do lookup within readdir in recovery
code") introduced a similar problem with calling lookup_one_len()
without i_mutex, which this patch also addresses. To fix that, it was
necessary to fix the called functions so that they expect i_mutex to be
held; that part was done by J. Bruce Fields.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Umm-I-can-live-with-that-by: Al Viro <viro@zeniv.linux.org.uk>
Reported-by: J. R. Okajima <hooanon05@yahoo.co.jp>
Tested-by: J. Bruce Fields <bfields@citi.umich.edu>
LKML-Reference: <8036.1237474444@jrobl>
Cc: stable@kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
In file included from fs/compat_ioctl.c:61:
include/linux/loop.h:59: error: field 'lo_bio_list' has incomplete type
Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
mnt should remain the same for all iterations through the list;
as it is, if we have a busy mount, mnt follows into it and isn't
restored for the next iteration.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
AFAICS, we have a subtle bug there: if we have crossed mountpoint
*and* it got mount --move'd away, we'll be holding only one
reference to fs containing dentry - exp->ex_path.mnt. IOW, we
ought to dput() before exp_put().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
We shouldn't just touch the namespace of current process
Caught-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>