We only ever traverse the hash chains in the forward direction, so a
double pointer list head isn't really necessary.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This changes session destruction to be similar to client destruction in
that attempts to destroy a session while in use (which should be rare
corner cases) result in DELAY. This simplifies things somewhat and
helps meet a coming 4.2 requirement.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
When a setclientid_confirm or create_session confirms a client after a
client reboot, it also destroys any previous state held by that client.
The shutdown of that previous state must be careful not to free the
client out from under threads processing other requests that refer to
the client.
This is a particular problem in the NFSv4.1 case when we hold a
reference to a session (hence a client) throughout compound processing.
The server attempts to handle this by unhashing the client at the time
it's destroyed, then delaying the final free to the end. But this still
leaves some races in the current code.
I believe it's simpler just to fail the attempt to destroy the client by
returning NFS4ERR_DELAY. This is a case that should never happen
anyway.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The locking here is very fiddly, and there's no reason for us to be
setting cstate->session, since this is the only op in the compound.
Let's just take the state lock and drop the reference counting.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
destroy_session uses the session and client without continuously holding
any reference or locks.
Put the whole thing under the state lock for now.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
I'm not sure what the check for clientid expiry was meant to do here.
The check for a matching session is redundant given the previous check
for state: a client without state is, in particular, a client without
sessions.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
E.g. printk's that just report the return value from an op are
uninteresting as we already do that in the main proc_compound loop.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This should never happen.
(Note: the comparable case in setclientid_confirm *can* happen, since
updating a client record can result in both confirmed and unconfirmed
records with the same clientid.)
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
NFS4_OO_PURGE_CLOSE is not handled properly. To avoid memory leak, nfs4
stateid which is pointed by oo_last_closed_stid is freed in nfsd4_close(),
but NFS4_OO_PURGE_CLOSE isn't cleared meanwhile. So the stateid released in
THIS close procedure may be freed immediately in the coming encoding function.
Sorry that Signed-off-by was forgotten in last version.
Signed-off-by: Yanchuan Nian <ycnian@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Once we drop the lock here there's nothing keeping the client around:
the only lock still held is the xpt_lock on this socket, but this socket
no longer has any connection with the client so there's no way for other
code to know we're still using the client.
The solution is simple: all nfsd4_probe_callback does is set a few
variables and queue some work, so there's no reason we can't just keep
it under the lock.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
idr_get_new*() and friends are about to be deprecated. Convert to the
new idr_alloc() interface.
Only compile-tested.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: J. Bruce Fields <bfields@redhat.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
get_new_stid() is no longer used since commit 3abdb60712 ("nfsd4:
simplify idr allocation"). Remove it.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull nfsd changes from J Bruce Fields:
"Miscellaneous bugfixes, plus:
- An overhaul of the DRC cache by Jeff Layton. The main effect is
just to make it larger. This decreases the chances of intermittent
errors especially in the UDP case. But we'll need to watch for any
reports of performance regressions.
- Containerized nfsd: with some limitations, we now support
per-container nfs-service, thanks to extensive work from Stanislav
Kinsbursky over the last year."
Some notes about conflicts, since there were *two* non-data semantic
conflicts here:
- idr_remove_all() had been added by a memory leak fix, but has since
become deprecated since idr_destroy() does it for us now.
- xs_local_connect() had been added by this branch to make AF_LOCAL
connections be synchronous, but in the meantime Trond had changed the
calling convention in order to avoid a RCU dereference.
There were a couple of more obvious actual source-level conflicts due to
the hlist traversal changes and one just due to code changes next to
each other, but those were trivial.
* 'for-3.9' of git://linux-nfs.org/~bfields/linux: (49 commits)
SUNRPC: make AF_LOCAL connect synchronous
nfsd: fix compiler warning about ambiguous types in nfsd_cache_csum
svcrpc: fix rpc server shutdown races
svcrpc: make svc_age_temp_xprts enqueue under sv_lock
lockd: nlmclnt_reclaim(): avoid stack overflow
nfsd: enable NFSv4 state in containers
nfsd: disable usermode helper client tracker in container
nfsd: use proper net while reading "exports" file
nfsd: containerize NFSd filesystem
nfsd: fix comments on nfsd_cache_lookup
SUNRPC: move cache_detail->cache_request callback call to cache_read()
SUNRPC: remove "cache_request" argument in sunrpc_cache_pipe_upcall() function
SUNRPC: rework cache upcall logic
SUNRPC: introduce cache_detail->cache_request callback
NFS: simplify and clean cache library
NFS: use SUNRPC cache creation and destruction helper for DNS cache
nfsd4: free_stid can be static
nfsd: keep a checksum of the first 256 bytes of request
sunrpc: trim off trailing checksum before returning decrypted or integrity authenticated buffer
sunrpc: fix comment in struct xdr_buf definition
...
Pull user namespace and namespace infrastructure changes from Eric W Biederman:
"This set of changes starts with a few small enhnacements to the user
namespace. reboot support, allowing more arbitrary mappings, and
support for mounting devpts, ramfs, tmpfs, and mqueuefs as just the
user namespace root.
I do my best to document that if you care about limiting your
unprivileged users that when you have the user namespace support
enabled you will need to enable memory control groups.
There is a minor bug fix to prevent overflowing the stack if someone
creates way too many user namespaces.
The bulk of the changes are a continuation of the kuid/kgid push down
work through the filesystems. These changes make using uids and gids
typesafe which ensures that these filesystems are safe to use when
multiple user namespaces are in use. The filesystems converted for
3.9 are ceph, 9p, afs, ocfs2, gfs2, ncpfs, nfs, nfsd, and cifs. The
changes for these filesystems were a little more involved so I split
the changes into smaller hopefully obviously correct changes.
XFS is the only filesystem that remains. I was hoping I could get
that in this release so that user namespace support would be enabled
with an allyesconfig or an allmodconfig but it looks like the xfs
changes need another couple of days before it they are ready."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (93 commits)
cifs: Enable building with user namespaces enabled.
cifs: Convert struct cifs_ses to use a kuid_t and a kgid_t
cifs: Convert struct cifs_sb_info to use kuids and kgids
cifs: Modify struct smb_vol to use kuids and kgids
cifs: Convert struct cifsFileInfo to use a kuid
cifs: Convert struct cifs_fattr to use kuid and kgids
cifs: Convert struct tcon_link to use a kuid.
cifs: Modify struct cifs_unix_set_info_args to hold a kuid_t and a kgid_t
cifs: Convert from a kuid before printing current_fsuid
cifs: Use kuids and kgids SID to uid/gid mapping
cifs: Pass GLOBAL_ROOT_UID and GLOBAL_ROOT_GID to keyring_alloc
cifs: Use BUILD_BUG_ON to validate uids and gids are the same size
cifs: Override unmappable incoming uids and gids
nfsd: Enable building with user namespaces enabled.
nfsd: Properly compare and initialize kuids and kgids
nfsd: Store ex_anon_uid and ex_anon_gid as kuids and kgids
nfsd: Modify nfsd4_cb_sec to use kuids and kgids
nfsd: Handle kuids and kgids in the nfs4acl to posix_acl conversion
nfsd: Convert nfsxdr to use kuids and kgids
nfsd: Convert nfs3xdr to use kuids and kgids
...
The three variables are calculated from nr_free_buffer_pages so change
their types to unsigned long in case of overflow.
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, NFSd is ready to operate in network namespace based containers.
So let's drop check for "init_net" and make it able to fly.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Use uid_eq(uid, GLOBAL_ROOT_UID) instead of !uid.
Use gid_eq(gid, GLOBAL_ROOT_GID) instead of !gid.
Use uid_eq(uid, INVALID_UID) instead of uid == -1
Use gid_eq(uid, INVALID_GID) instead of gid == -1
Use uid = GLOBAL_ROOT_UID instead of uid = 0;
Use gid = GLOBAL_ROOT_GID instead of gid = 0;
Use !uid_eq(uid1, uid2) instead of uid1 != uid2.
Use !gid_eq(gid1, gid2) instead of gid1 != gid2.
Use uid_eq(uid1, uid2) instead of uid1 == uid2.
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
These routines are used by server and client code, so having them in a
separate header would be best.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
We don't really need to preallocate at all; just allocate and initialize
everything at once, but leave the sc_type field initially 0 to prevent
finding the stateid till it's fully initialized.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
When free nfs-client, it must free the ->cl_stateids.
Cc: stable@kernel.org
Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
If CONFIG_LOCKDEP is disabled, then there would be a warning like this:
CC [M] fs/nfsd/nfs4state.o
fs/nfsd/nfs4state.c: In function ‘free_client’:
fs/nfsd/nfs4state.c:1051:19: warning: unused variable ‘nn’ [-Wunused-variable]
So, let's add "maybe_unused" tag to this variable.
Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
In the procedure of CREATE_SESSION, the state is locked after
alloc_conn_from_crses(). If the allocation fails, the function
goes to "out_free_session", and then "out" where there is an
unlock function.
Signed-off-by: Yanchuan Nian <ycnian@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
In alloc_session(), numslots is the correct slot number used by the session.
But the slot number passed to nfsd4_put_drc_mem() is the one from nfs client.
Signed-off-by: Yanchuan Nian <ycnian@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Note the stateid is hashed early on in init_stid(), but isn't currently
being unhashed on error paths.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
To ensure ordering of read data with any following operations, turn off
zero copy if the read is not the final operation in the compound.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
I honestly have no idea where I got 129 from, but it's a much bigger
value than the actual buffer size (INET6_ADDRSTRLEN).
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This patch makes main step in NFSd containerisation.
There could be different approaches to how to make NFSd able to handle
incoming RPC request from different network namespaces. The two main
options are:
1) Share NFSd kthreads betwween all network namespaces.
2) Create separated pool of threads for each namespace.
While first approach looks more flexible, second one is simpler and
non-racy. This patch implements the second option.
To make it possible to allocate separate pools of threads, we have to
make it possible to allocate separate NFSd service structures per net.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Fix nfsd4_lockt and release_lockowner to lookup the referenced client,
so that it can renew it, or correctly return "expired", as appropriate.
Also share some code while we're here.
Reported-by: Frank Filz <ffilzlnx@us.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Write the client's ip address to any state file and all appropriate
state for that client will be forgotten.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
I also log basic information that I can figure out about the type of
state (such as number of locks for each client IP address). This can be
useful for checking that state was actually dropped and later for
checking if the client was able to recover.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The eventual goal is to forget state based on ip address, so it makes
sense to call this function in a for-each-client loop until the correct
amount of state is forgotten. I also use this patch as an opportunity
to rename the forget function from "func()" to "forget()".
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Once I have a client, I can easily use its delegation list rather than
searching the file hash table for delegations to remove.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Using "forget_n_state()" forces me to implement the code needed to
forget a specific client's openowners.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
I use the new "forget_n_state()" function to iterate through each client
first when searching for locks. This may slow down forgetting locks a
little bit, but it implements most of the code needed to forget a
specified client's locks.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
I added in a generic for-each loop that takes a pass over the client_lru
list for the current net namespace and calls some function. The next few
patches will update other operations to use this function as well. A value
of 0 still means "forget everything that is found".
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Each function touches state in some way, so getting the lock earlier
can help simplify code.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
There were only a small number of functions in this file and since they
all affect stored state I think it makes sense to put them in state.h
instead. I also dropped most static inline declarations since there are
no callers when fault injection is not enabled.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Grace time is a part of NFSv4 state engine, which is constructed per network
namespace.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Lease time is a part of NFSv4 state engine, which is constructed per network
namespace.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Split NFSv4 state init and shutdown into two different calls: per-net one and
generic one.
Per-net cwinit/shutdown pair have to be called for any namespace, generic pair
- only once on NSFd kthreads start and shutdown respectively.
Refresh of diff-nfsd-call-state-init-twice
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This patch renames nfs4_state_start_net() into nfs4_state_create_net(), where
get_net() now performed.
Also it introduces new nfs4_state_start_net(), which is now responsible for
state creation and initializing all per-net data and which is now called from
nfs4_state_start().
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This patch renames __nfs4_state_shutdown_net() into nfs4_state_shutdown_net(),
__nfs4_state_shutdown() into nfs4_state_shutdown_net() and moves all network
related shutdown operations to nfs4_state_shutdown_net().
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
NFSv4 delegations are stored in global list. But they are nfs4_client
dependent, which is network namespace aware already.
State shutdown and laundromat are done per network namespace as well.
So, delegations unhash have to be done in network namespace context.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This lock protects the client lru list and session hash table, which are
allocated per network namespace already.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Protection of __nfs4_state_shutdown() with nfs4_lock_state() looks redundant.
This function is called by the last NFSd thread on it's exit and state lock
protects actually two functions (del_recall_lru is protected by recall_lock):
1) nfsd4_client_tracking_exit
2) __nfs4_state_shutdown_net
"nfsd4_client_tracking_exit" doesn't require state lock protection, because it's
state can be modified only by tracker callbacks.
Here a re they:
1) create: is called only from nfsd4_proc_compound.
2) remove: is called from either nfsd4_proc_compound or nfs4_laundromat.
3) check: is called only from nfsd4_proc_compound.
4) grace_done; called only from nfs4_laundromat.
nfsd4_proc_compound is called onll by NFSd kthread, which is exiting right
now.
nfs4_laundromat is called by laundry_wq. But laundromat_work was canceled
already.
"__nfs4_state_shutdown_net" also doesn't require state lock protection,
because all NFSd kthreads are dead, and no race can happen with NFSd start,
because "nfsd_up" flag is still set.
Moreover, all Nfsd shutdown is protected with global nfsd_mutex.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Linus has pointed out that indiscriminate use of BUG's can make it
harder to diagnose bugs because they can bring a machine down, often
before we manage to get any useful debugging information to the logs.
(Consider, for example, a BUG() that fires in a workqueue, or while
holding a spinlock).
Most of these BUG's won't do much more than kill an nfsd thread, but it
would still probably be safer to get out the warning without dying.
There's still more of this to do in nfsd/.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>