Commit Graph

1895 Commits

Author SHA1 Message Date
Kees Cook 3fb790ee3a net/sunrpc: remove depends on CONFIG_EXPERIMENTAL
The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
while now and is almost always enabled by default. As agreed during the
Linux kernel summit, remove it from any "depends on" lines in Kconfigs.

CC: Trond Myklebust <Trond.Myklebust@netapp.com>
CC: "J. Bruce Fields" <bfields@fieldses.org>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-01-11 11:40:02 -08:00
Randy Dunlap 7144bca681 nfs: fix sunrpc/clnt.c kernel-doc warnings
Fix new kernel-doc warnings in clnt.c:

  Warning(net/sunrpc/clnt.c:561): No description found for parameter 'flavor'
  Warning(net/sunrpc/clnt.c:561): Excess function parameter 'auth' description in 'rpc_clone_client_set_auth'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: linux-nfs@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-10 14:35:23 -08:00
Trond Myklebust 87ed50036b SUNRPC: Ensure we release the socket write lock if the rpc_task exits early
If the rpc_task exits while holding the socket write lock before it has
allocated an rpc slot, then the usual mechanism for releasing the write
lock in xprt_release() is defeated.

The problem occurs if the call to xprt_lock_write() initially fails, so
that the rpc_task is put on the xprt->sending wait queue. If the task
exits after being assigned the lock by __xprt_lock_write_func, but
before it has retried the call to xprt_lock_and_alloc_slot(), then
it calls xprt_release() while holding the write lock, but will
immediately exit due to the test for task->tk_rqstp != NULL.

Reported-by: Chris Perl <chris.perl@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org [>= 3.1]
2013-01-08 14:30:43 -05:00
Trond Myklebust 360e1a5349 SUNRPC: Partial revert of commit 168e4b39d1
Partially revert commit (SUNRPC: add WARN_ON_ONCE for potential deadlock).
The looping behaviour has been tracked down to a knownn issue with
workqueues, and a workaround has now been implemented.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Weston Andros Adamson <dros@netapp.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Bruce Fields <bfields@fieldses.org>
Cc: stable@vger.kernel.org [>= 3.7]
2013-01-04 12:59:30 -05:00
Trond Myklebust c6567ed140 SUNRPC: Ensure that we free the rpc_task after cleanups are done
This patch ensures that we free the rpc_task after the cleanup callbacks
are done in order to avoid a deadlock problem that can be triggered if
the callback needs to wait for another workqueue item to complete.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Weston Andros Adamson <dros@netapp.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Bruce Fields <bfields@fieldses.org>
Cc: stable@vger.kernel.org
2013-01-04 12:53:59 -05:00
Linus Torvalds 982197277c Merge branch 'for-3.8' of git://linux-nfs.org/~bfields/linux
Pull nfsd update from Bruce Fields:
 "Included this time:

   - more nfsd containerization work from Stanislav Kinsbursky: we're
     not quite there yet, but should be by 3.9.

   - NFSv4.1 progress: implementation of basic backchannel security
     negotiation and the mandatory BACKCHANNEL_CTL operation.  See

       http://wiki.linux-nfs.org/wiki/index.php/Server_4.0_and_4.1_issues

     for remaining TODO's

   - Fixes for some bugs that could be triggered by unusual compounds.
     Our xdr code wasn't designed with v4 compounds in mind, and it
     shows.  A more thorough rewrite is still a todo.

   - If you've ever seen "RPC: multiple fragments per record not
     supported" logged while using some sort of odd userland NFS client,
     that should now be fixed.

   - Further work from Jeff Layton on our mechanism for storing
     information about NFSv4 clients across reboots.

   - Further work from Bryan Schumaker on his fault-injection mechanism
     (which allows us to discard selective NFSv4 state, to excercise
     rarely-taken recovery code paths in the client.)

   - The usual mix of miscellaneous bugs and cleanup.

  Thanks to everyone who tested or contributed this cycle."

* 'for-3.8' of git://linux-nfs.org/~bfields/linux: (111 commits)
  nfsd4: don't leave freed stateid hashed
  nfsd4: free_stateid can use the current stateid
  nfsd4: cleanup: replace rq_resused count by rq_next_page pointer
  nfsd: warn on odd reply state in nfsd_vfs_read
  nfsd4: fix oops on unusual readlike compound
  nfsd4: disable zero-copy on non-final read ops
  svcrpc: fix some printks
  NFSD: Correct the size calculation in fault_inject_write
  NFSD: Pass correct buffer size to rpc_ntop
  nfsd: pass proper net to nfsd_destroy() from NFSd kthreads
  nfsd: simplify service shutdown
  nfsd: replace boolean nfsd_up flag by users counter
  nfsd: simplify NFSv4 state init and shutdown
  nfsd: introduce helpers for generic resources init and shutdown
  nfsd: make NFSd service structure allocated per net
  nfsd: make NFSd service boot time per-net
  nfsd: per-net NFSd up flag introduced
  nfsd: move per-net startup code to separated function
  nfsd: pass net to __write_ports() and down
  nfsd: pass net to nfsd_set_nrthreads()
  ...
2012-12-20 14:04:11 -08:00
J. Bruce Fields afc59400d6 nfsd4: cleanup: replace rq_resused count by rq_next_page pointer
It may be a matter of personal taste, but I find this makes the code
clearer.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-17 22:00:16 -05:00
J. Bruce Fields 3a28e33111 svcrpc: fix some printks
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-17 16:02:40 -05:00
Stanislav Kinsbursky cd6c596858 SUNRPC: continue run over clients list on PipeFS event instead of break
There are SUNRPC clients, which program doesn't have pipe_dir_name. These
clients can be skipped on PipeFS events, because nothing have to be created or
destroyed. But instead of breaking in case of such a client was found, search
for suitable client over clients list have to be continued. Otherwise some
clients could not be covered by PipeFS event handler.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: stable@vger.kernel.org [>= v3.4]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-17 12:19:16 -05:00
Trond Myklebust 1efc28780b SUNRPC: variable 'svsk' is unused in function bc_send_request
Silence a compile time warning.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-15 17:05:57 -05:00
Trond Myklebust 4a20a988f7 SUNRPC: Handle ECONNREFUSED in xs_local_setup_socket
Silence the unnecessary warning "unhandled error (111) connecting to..."
and convert it to a dprintk for debugging purposes.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-15 17:02:29 -05:00
Andy Adamson eb96d5c97b SUNRPC handle EKEYEXPIRED in call_refreshresult
Currently, when an RPCSEC_GSS context has expired or is non-existent
and the users (Kerberos) credentials have also expired or are non-existent,
the client receives the -EKEYEXPIRED error and tries to refresh the context
forever.  If an application is performing I/O, or other work against the share,
the application hangs, and the user is not prompted to refresh/establish their
credentials. This can result in a denial of service for other users.

Users are expected to manage their Kerberos credential lifetimes to mitigate
this issue.

Move the -EKEYEXPIRED handling into the RPC layer. Try tk_cred_retry number
of times to refresh the gss_context, and then return -EACCES to the application.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-12 15:36:02 -05:00
Andy Adamson 620038f6d2 SUNRPC set gss gc_expiry to full lifetime
Only use the default GSSD_MIN_TIMEOUT if the gss downcall timeout is zero.
Store the full lifetime in gc_expiry (not 3/4 of the lifetime) as subsequent
patches will use the gc_expiry to determine buffered WRITE behavior in the
face of expired or soon to be expired gss credentials.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-12 15:35:59 -05:00
Trond Myklebust 7ce0171d4f Merge branch 'bugfixes' into nfs-for-next 2012-12-11 09:16:26 -05:00
Stanislav Kinsbursky 756933ee8a SUNRPC: remove redundant "linux/nsproxy.h" includes
This is a cleanup patch.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-10 16:25:30 -05:00
Trond Myklebust c05eecf636 SUNRPC: Don't allow low priority tasks to pre-empt higher priority ones
Currently, the priority queues attempt to be 'fair' to lower priority
tasks by scheduling them after a certain number of higher priority tasks
have run. The problem is that both the transport send queue and
the NFSv4.1 session slot queue have strong ordering requirements.

This patch therefore removes the fairness code in favour of strong
ordering of task priorities.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-06 00:30:53 +01:00
Trond Myklebust 1e1093c7fd NFSv4.1: Don't mess with task priorities in nfs41_setup_sequence
We want to preserve the rpc_task priority for things like writebacks,
that may have differing levels of urgency.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-06 00:30:51 +01:00
J. Bruce Fields 836fbadb96 svcrpc: support multiple-fragment rpc's
Over TCP, RPC's are preceded by a single 4-byte field telling you how
long the rpc is (in bytes).  The spec also allows you to send an RPC in
multiple such records (the high bit of the length field is used to tell
you whether this is the final record).

We've survived for years without supporting this because in practice the
clients we care about don't use it.  But the userland rpc libraries do,
and every now and then an experimental client will run into this.  (Most
recently I noticed it while trying to write a pynfs check.)  And we're
really on the wrong side of the spec here--let's fix this.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-04 07:49:24 -05:00
J. Bruce Fields 8af345f58a svcrpc: track rpc data length separately from sk_tcplen
Keep a separate field, sk_datalen, that tracks only the data contained
in a fragment, not including the fragment header.

For now, this is always just max(0, sk_tcplen - 4), but after we allow
multiple fragments sk_datalen will accumulate the total rpc data size
while sk_tcplen only tracks progress receiving the current fragment.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-04 07:49:14 -05:00
J. Bruce Fields 6a72ae2e23 svcrpc: fix off-by-4 error in "incomplete TCP record" dprintk
The full reclen doesn't include the fragment header, but sk_tcplen does.
Fix this to make it an apples-to-apples comparison.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-04 07:49:06 -05:00
J. Bruce Fields ad46ccf094 svcrpc: delay minimum-rpc-size check till later
Soon we want to support multiple fragments, in which case it may be
legal for a single fragment to be smaller than 8 bytes, so we'll want to
delay this check till we've reached the last fragment.

Also fix an outdated comment.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-04 07:47:58 -05:00
J. Bruce Fields cc248d4b1d svcrpc: don't byte-swap sk_reclen in place
Byte-swapping in place is always a little dubious.

Let's instead define this field to always be big-endian, and do the
swapping on demand where we need it.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-04 07:47:23 -05:00
Trond Myklebust 642fe4d00d SUNRPC: Fix validity issues with rpc_pipefs sb->s_fs_info
rpc_kill_sb() must defer calling put_net() until after the notifier
has been called, since most (all?) of the notifier callbacks assume
that sb->s_fs_info points to a valid net namespace. It also must not
call put_net() if the call to rpc_fill_super was unsuccessful.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=48421

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: stable@vger.kernel.org [>= v3.4]
2012-11-08 14:53:28 -05:00
J. Bruce Fields 7032a3dd92 svcrpc: demote some printks to a dprintk
In general I'd rather random bad behavior on the network won't trigger a
printk.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-07 19:31:32 -05:00
Trond Myklebust f994c43d19 SUNRPC: Clean up rpc_bind_new_program
We can and should use the rpc_create_args and __rpc_clone_client()
to change the program and version number on the resulting rpc_client.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-04 14:43:43 -05:00
Weston Andros Adamson 50d2bdb197 SUNRPC: remove BUG_ON from rpc_call_sync
Use WARN_ON_ONCE instead of calling BUG_ON and return -EINVAL when
RPC_TASK_ASYNC flag is passed to rpc_call_sync.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-04 14:43:43 -05:00
Weston Andros Adamson 0a0c2a57bc SUNRPC: remove BUG_ON in rpc_release_task
Replace BUG_ON() with WARN_ON_ONCE().

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-04 14:43:43 -05:00
Weston Andros Adamson 0104729807 SUNRPC: remove BUG_ON in svc_delete_xprt
Replace BUG_ON() with WARN_ON_ONCE().

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-04 14:43:42 -05:00
Weston Andros Adamson 2bd4eef87b SUNRPC: remove BUG_ONs checking RPC_IS_QUEUED
Replace two BUG_ON() calls with WARN_ON_ONCE() and early returns.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-04 14:43:42 -05:00
Weston Andros Adamson f50ad42837 SUNRPC: remove BUG_ON from __rpc_sleep_on_priority
Replace BUG_ON() with WARN_ON_ONCE().

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-04 14:43:42 -05:00
Weston Andros Adamson 0af39507f6 SUNRPC: remove BUG_ON in svc_register
Instead of calling BUG_ON(), do a WARN_ON_ONCE() and return -EINVAL.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-04 14:43:42 -05:00
Weston Andros Adamson 332e008a44 SUNRPC: remove BUG_ON from encode_rpcb_string
Replace BUG_ON() with WARN_ON_ONCE() and truncate the encoded string if
len > max.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-04 14:43:42 -05:00
Weston Andros Adamson b8a13d039c SUNRPC: remove BUG_ON from bc_malloc
Replace BUG_ON() with WARN_ON_ONCE() and NULL return - the caller will handle
this like a memory allocation failure.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-04 14:43:42 -05:00
Weston Andros Adamson 18e624ad03 SUNRPC: remove BUG_ON in xdr_shrink_bufhead
Replace bounds checking BUG_ON() with a WARN_ON_ONCE() and resetting
the requested len to the max.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-04 14:43:42 -05:00
Weston Andros Adamson b25cd058f2 SUNRPC: remove BUG_ONs checking RPCSVC_MAXPAGES
Replace two bounds checking BUG_ON() calls with WARN_ON_ONCE() and resetting
the requested size to RPCSVC_MAXPAGES.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-04 14:43:41 -05:00
Weston Andros Adamson ff1fdb9b80 SUNRPC: remove BUG_ON in svc_xprt_received
Replace BUG_ON() with a WARN_ON_ONCE() and early return.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-04 14:43:41 -05:00
Weston Andros Adamson 1b7a181907 SUNRPC: remove BUG_ONs from *_reclassify_socket*
Replace multiple BUG_ON() calls with WARN_ON_ONCE() and early return when
sanity checking socket ownership (lock). The bind call will fail if the
socket was unsuccessfully reclassified.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-04 14:43:41 -05:00
Weston Andros Adamson 1bd58aaff4 SUNRPC: remove BUG_ON from svc_pool_map_set_cpumask
Replace BUG_ON() with a WARN() and early return.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-04 14:43:41 -05:00
Weston Andros Adamson f30dfbba16 SUNRPC: remove two BUG_ON asserts
Replace two BUG_ON() calls checking the RPC_BC_PA_IN_USE flag with
WARN_ON_ONCE().

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-04 14:43:41 -05:00
Weston Andros Adamson 749386e906 SUNRPC: remove BUG_ON in rpc_put_sb_net
Replace BUG_ON() with WARN_ON() - the condition is definitely a misuse
of the API, but shouldn't cause a crash.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-04 14:43:41 -05:00
Weston Andros Adamson 0db74d9a2d SUNRPC: remove BUG_ON calls from cache_read
Replace BUG_ON() with WARN_ON_ONCE() in two parts of cache_read().

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-04 14:43:41 -05:00
Weston Andros Adamson 4c9c52e479 SUNRPC: remove BUG_ON from bc_send
Replace BUG_ON() with WARN_ON_ONCE(). The error condition is a simple
ref counting sanity check and the following code will not free anything
until final put.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-04 14:43:41 -05:00
Weston Andros Adamson c4ded8d977 SUNRPC: remove BUG_ON from xprt_destroy_backchannel
If max_reqs is 0, do nothing besides the usual dprintks.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-04 14:43:40 -05:00
Weston Andros Adamson e454a7a83d SUNRPC: remove BUG_ON from rpc_sleep_on*
Replace BUG_ON() with WARN_ON_ONCE() and clean up after inactive task.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-04 14:43:40 -05:00
Weston Andros Adamson 8b827e1f1e SUNRPC: remove BUG_ON from call_bc_transmit
Remove redundant BUG_ON().

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-04 14:43:40 -05:00
Weston Andros Adamson 1facf4c4a4 SUNRPC: remove BUG_ON from call_bc_transmit
Replace BUG_ON() with WARN_ON_ONCE().

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-04 14:43:40 -05:00
Weston Andros Adamson 576e613d21 SUNRPC: remove BUG_ON from call_transmit
Remove unneeded BUG_ON()

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-04 14:43:40 -05:00
Weston Andros Adamson 9a6478f6cc SUNRPC: remove BUG_ON from rpc_run_bc_task
Replace BUG_ON() with WARN_ON_ONCE() - rpc_run_bc_task calls rpc_init_task()
then increments the tk_count, so this is a simple sanity check that
if hit once would hit every time this code path is executed.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-04 14:43:40 -05:00
Weston Andros Adamson 922eeac30d SUNRPC: remove BUG_ON in __rpc_clnt_handle_event
Print a KERN_INFO message before rpc_d_lookup_sb returns NULL, like
other error paths in that function.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-04 14:43:40 -05:00
Weston Andros Adamson 168e4b39d1 SUNRPC: add WARN_ON_ONCE for potential deadlock
rpc_shutdown_client should never be called from a workqueue context.
If it is, it could deadlock looping forever trying to kill tasks that are
assigned to the same kworker thread (and will never run rpc_exit_task).

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-04 14:43:38 -05:00
Weston Andros Adamson d24bab93e4 SUNRPC: return proper errno from backchannel_rqst
The one and only caller (in fs/nfs/nfs4client.c) uses the result
as an errno and would have interpreted an error as EPERM.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-01 11:50:53 -04:00
Trond Myklebust f878b657ce SUNRPC: Get rid of the xs_error_report socket callback
Chris Perl reports that we're seeing races between the wakeup call in
xs_error_report and the connect attempts. Basically, Chris has shown
that in certain circumstances, the call to xs_error_report causes the
rpc_task that is responsible for reconnecting to wake up early, thus
triggering a disconnect and retry.

Since the sk->sk_error_report() calls in the socket layer are always
followed by a tcp_done() in the cases where we care about waking up
the rpc_tasks, just let the state_change callbacks take responsibility
for those wake ups.

Reported-by: Chris Perl <chris.perl@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
Tested-by: Chris Perl <chris.perl@gmail.com>
2012-10-24 10:46:15 -04:00
Trond Myklebust 4bc1e68ed6 SUNRPC: Prevent races in xs_abort_connection()
The call to xprt_disconnect_done() that is triggered by a successful
connection reset will trigger another automatic wakeup of all tasks
on the xprt->pending rpc_wait_queue. In particular it will cause an
early wake up of the task that called xprt_connect().

All we really want to do here is clear all the socket-specific state
flags, so we split that functionality out of xs_sock_mark_closed()
into a helper that can be called by xs_abort_connection()

Reported-by: Chris Perl <chris.perl@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
Tested-by: Chris Perl <chris.perl@gmail.com>
2012-10-24 10:46:08 -04:00
Trond Myklebust b9d2bb2ee5 Revert "SUNRPC: Ensure we close the socket on EPIPE errors too..."
This reverts commit 55420c24a0.
Now that we clear the connected flag when entering TCP_CLOSE_WAIT,
the deadlock described in this commit is no longer possible.
Instead, the resulting call to xs_tcp_shutdown() can interfere
with pending reconnection attempts.

Reported-by: Chris Perl <chris.perl@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
Tested-by: Chris Perl <chris.perl@gmail.com>
2012-10-24 10:45:39 -04:00
Trond Myklebust d0bea455dd SUNRPC: Clear the connect flag when socket state is TCP_CLOSE_WAIT
This is needed to ensure that we call xprt_connect() upon the next
call to call_connect().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
Tested-by: Chris Perl <chris.perl@gmail.com>
2012-10-24 10:44:49 -04:00
Sasha Levin 212ba90696 SUNRPC: Prevent kernel stack corruption on long values of flush
The buffer size in read_flush() is too small for the longest possible values
for it. This can lead to a kernel stack corruption:

[   43.047329] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff833e64b4
[   43.047329]
[   43.049030] Pid: 6015, comm: trinity-child18 Tainted: G        W    3.5.0-rc7-next-20120716-sasha #221
[   43.050038] Call Trace:
[   43.050435]  [<ffffffff836c60c2>] panic+0xcd/0x1f4
[   43.050931]  [<ffffffff833e64b4>] ? read_flush.isra.7+0xe4/0x100
[   43.051602]  [<ffffffff810e94e6>] __stack_chk_fail+0x16/0x20
[   43.052206]  [<ffffffff833e64b4>] read_flush.isra.7+0xe4/0x100
[   43.052951]  [<ffffffff833e6500>] ? read_flush_pipefs+0x30/0x30
[   43.053594]  [<ffffffff833e652c>] read_flush_procfs+0x2c/0x30
[   43.053596]  [<ffffffff812b9a8c>] proc_reg_read+0x9c/0xd0
[   43.053596]  [<ffffffff812b99f0>] ? proc_reg_write+0xd0/0xd0
[   43.053596]  [<ffffffff81250d5b>] do_loop_readv_writev+0x4b/0x90
[   43.053596]  [<ffffffff81250fd6>] do_readv_writev+0xf6/0x1d0
[   43.053596]  [<ffffffff812510ee>] vfs_readv+0x3e/0x60
[   43.053596]  [<ffffffff812511b8>] sys_readv+0x48/0xb0
[   43.053596]  [<ffffffff8378167d>] system_call_fastpath+0x1a/0x1f

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-10-17 14:59:10 -04:00
Linus Torvalds bd81ccea85 Merge branch 'for-3.7' of git://linux-nfs.org/~bfields/linux
Pull nfsd update from J Bruce Fields:
 "Another relatively quiet cycle.  There was some progress on my
  remaining 4.1 todo's, but a couple of them were just of the form
  "check that we do X correctly", so didn't have much affect on the
  code.

  Other than that, a bunch of cleanup and some bugfixes (including an
  annoying NFSv4.0 state leak and a busy-loop in the server that could
  cause it to peg the CPU without making progress)."

* 'for-3.7' of git://linux-nfs.org/~bfields/linux: (46 commits)
  UAPI: (Scripted) Disintegrate include/linux/sunrpc
  UAPI: (Scripted) Disintegrate include/linux/nfsd
  nfsd4: don't allow reclaims of expired clients
  nfsd4: remove redundant callback probe
  nfsd4: expire old client earlier
  nfsd4: separate session allocation and initialization
  nfsd4: clean up session allocation
  nfsd4: minor free_session cleanup
  nfsd4: new_conn_from_crses should only allocate
  nfsd4: separate connection allocation and initialization
  nfsd4: reject bad forechannel attrs earlier
  nfsd4: enforce per-client sessions/no-sessions distinction
  nfsd4: set cl_minorversion at create time
  nfsd4: don't pin clientids to pseudoflavors
  nfsd4: fix bind_conn_to_session xdr comment
  nfsd4: cast readlink() bug argument
  NFSD: pass null terminated buf to kstrtouint()
  nfsd: remove duplicate init in nfsd4_cb_recall
  nfsd4: eliminate redundant nfs4_free_stateid
  fs/nfsd/nfs4idmap.c: adjust inconsistent IS_ERR and PTR_ERR
  ...
2012-10-13 10:53:54 +09:00
J. Bruce Fields a9ca4043d0 Merge Trond's bugfixes
Merge branch 'bugfixes' of git://linux-nfs.org/~trondmy/nfs-2.6 into
for-3.7-incoming.  Mainly needed for Bryan's "SUNRPC: Set alloc_slot for
backchannel tcp ops", without which the 4.1 server oopses.
2012-10-11 12:41:05 -04:00
Linus Torvalds df632d3ce7 NFS client updates for Linux 3.7
Features include:
 
 - Remove CONFIG_EXPERIMENTAL dependency from NFSv4.1
   Aside from the issues discussed at the LKS, distros are shipping
   NFSv4.1 with all the trimmings.
 - Fix fdatasync()/fsync() for the corner case of a server reboot.
 - NFSv4 OPEN access fix: finally distinguish correctly between
   open-for-read and open-for-execute permissions in all situations.
 - Ensure that the TCP socket is closed when we're in CLOSE_WAIT
 - More idmapper bugfixes
 - Lots of pNFS bugfixes and cleanups to remove unnecessary state and
   make the code easier to read.
 - In cases where a pNFS read or write fails, allow the client to
   resume trying layoutgets after two minutes of read/write-through-mds.
 - More net namespace fixes to the NFSv4 callback code.
 - More net namespace fixes to the NFSv3 locking code.
 - More NFSv4 migration preparatory patches.
   Including patches to detect network trunking in both NFSv4 and NFSv4.1
 - pNFS block updates to optimise LAYOUTGET calls.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJQdMvBAAoJEGcL54qWCgDyV84P/0XvcEXj6kdMv9EiWfRczo7r
 iAwAIhiEmG1agtZa6v+Gso2MYRQbkGyJi0LKIwzGqNUi0BLQGQCoV93kB0ITVpiN
 g7poDTnPyoItW1oJCtC48/Mx0G5C1yrHSwFAJrXmtzDF1mwd/BIQReafYp6x+/TU
 Mvwm7au3Y2ySRBEDmY4zyBERHXGt//JmsZ9Ays6jewQg5ZOyjDQKoeHVYaaeJoF0
 A0tQGcBSNdySagI5dt4SlkuO7AClhzVHlilep2dsBu/TLS0F2pEdHXvM2W0koZmM
 uazaIpzd2F7TfokTYExgsyKsqpkzpDf1kebN4Y1+Ioi7Yy30dQrX6lNaUNcOmOJQ
 xx694HDHV90KdRBVSFhOIHMTBRcls68hBcWib3MXWHTKX6HVgnFMwhwxGH0MRezf
 3rmXoqn+CO1j5WeQmA3BqdVbHSZHi913TKEwE/qoW4pmOFhv5I2flXWQS/Rwvdng
 2xDCe6TlvhMS92IpyvNEIicXLRSm+DUAmoAfSqqlifZIAEM5R29e/wCAsmVprO3B
 LPHyUoIMO6SZ1PL6Rk20+6qQfvCK7U/ChULsUL/zb7R88Pc3sFE2BeAvZVATsvH3
 +FJWTz43fwUBoMhPsn8xSBLn/fq6az5C19syz6Fpu3DZ4X0EwyVWifiFk6HgcxZD
 J8ajEl+dNZeFE8rkwykX
 =uBk7
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.7-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Features include:

   - Remove CONFIG_EXPERIMENTAL dependency from NFSv4.1
     Aside from the issues discussed at the LKS, distros are shipping
     NFSv4.1 with all the trimmings.
   - Fix fdatasync()/fsync() for the corner case of a server reboot.
   - NFSv4 OPEN access fix: finally distinguish correctly between
     open-for-read and open-for-execute permissions in all situations.
   - Ensure that the TCP socket is closed when we're in CLOSE_WAIT
   - More idmapper bugfixes
   - Lots of pNFS bugfixes and cleanups to remove unnecessary state and
     make the code easier to read.
   - In cases where a pNFS read or write fails, allow the client to
     resume trying layoutgets after two minutes of read/write-
     through-mds.
   - More net namespace fixes to the NFSv4 callback code.
   - More net namespace fixes to the NFSv3 locking code.
   - More NFSv4 migration preparatory patches.
     Including patches to detect network trunking in both NFSv4 and
     NFSv4.1
   - pNFS block updates to optimise LAYOUTGET calls."

* tag 'nfs-for-3.7-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (113 commits)
  pnfsblock: cleanup nfs4_blkdev_get
  NFS41: send real read size in layoutget
  NFS41: send real write size in layoutget
  NFS: track direct IO left bytes
  NFSv4.1: Cleanup ugliness in pnfs_layoutgets_blocked()
  NFSv4.1: Ensure that the layout sequence id stays 'close' to the current
  NFSv4.1: Deal with seqid wraparound in the pNFS return-on-close code
  NFSv4 set open access operation call flag in nfs4_init_opendata_res
  NFSv4.1: Remove the dependency on CONFIG_EXPERIMENTAL
  NFSv4 reduce attribute requests for open reclaim
  NFSv4: nfs4_open_done first must check that GETATTR decoded a file type
  NFSv4.1: Deal with wraparound when updating the layout "barrier" seqid
  NFSv4.1: Deal with wraparound issues when updating the layout stateid
  NFSv4.1: Always set the layout stateid if this is the first layoutget
  NFSv4.1: Fix another refcount issue in pnfs_find_alloc_layout
  NFSv4: don't put ACCESS in OPEN compound if O_EXCL
  NFSv4: don't check MAY_WRITE access bit in OPEN
  NFS: Set key construction data for the legacy upcall
  NFSv4.1: don't do two EXCHANGE_IDs on mount
  NFS: nfs41_walk_client_list(): re-lock before iterating
  ...
2012-10-10 23:52:35 +09:00
J. Bruce Fields f474af7051 UAPI Disintegration 2012-10-09
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIVAwUAUHPmWxOxKuMESys7AQKN4w//XDwALfbf0MXIw+gwyRiUtJe9mGexvI6X
 1R4FWU9a3ImzEZP4cWnmPGT2wmC/x007DcIvx8cyvbdlSuqtR2i/DC+HbWabiLRn
 nJS7Eer1BJvLv5dn6NmXMEz7yB4Z46+frcmBs3WQeR0sqBMDm+rjQzCqECznO8Jc
 VtCbox+VR2DuWcM++YECTblYEH3Z+doDXUN2eBaD8L9x3klPbPXD7OcRyOnry8w+
 ynmUTKKyH4+hpxDakYrObPIg+vFCxb4QRck1mlgA4wbvb3eqjhM0oOCYJ8GvmILA
 vdFYztWCjkiuOl5djtXBlsClX8SAMOBYlRed+R1GvjNCSR+WCWrFJJ2F8qoQ1w87
 9ts2/8qrozS8luTB475SkT2uLdJkIUKX89Oh+dWeE8YkbPnRPj5lNAdtNY5QSyDq
 VaRpIo+YfmZygyvHJQlAXBuZ0mvzcPzArfcPgSVTD3B7xTEGVu/45V7SnQX5os/V
 v39ySPXMdGOIdvK51gw7OtZl64uqrEKu39PyYDX/GUADflp/CHD0J7PJrQePbsH9
 AQolVZDIxTfKqYQnUdL8+C8Zc24RowEzz3c2+aO89MSzwGqev3q8sXRVbW/Iqryg
 p+V3nHe+ipKcga5tOBlPr9KDtDd7j3xN2yaIwf5/QyO1OHBpjAZP1gjSVDcUcwpi
 svYy4kPn3PA=
 =etoL
 -----END PGP SIGNATURE-----

nfs: disintegrate UAPI for nfs

This is to complete part of the Userspace API (UAPI) disintegration for which
the preparatory patches were pulled recently.  After these patches, userspace
headers will be segregated into:

        include/uapi/linux/.../foo.h

for the userspace interface stuff, and:

        include/linux/.../foo.h

for the strictly kernel internal stuff.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-10-09 18:35:22 -04:00
Linus Torvalds 033d9959ed Merge branch 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue changes from Tejun Heo:
 "This is workqueue updates for v3.7-rc1.  A lot of activities this
  round including considerable API and behavior cleanups.

   * delayed_work combines a timer and a work item.  The handling of the
     timer part has always been a bit clunky leading to confusing
     cancelation API with weird corner-case behaviors.  delayed_work is
     updated to use new IRQ safe timer and cancelation now works as
     expected.

   * Another deficiency of delayed_work was lack of the counterpart of
     mod_timer() which led to cancel+queue combinations or open-coded
     timer+work usages.  mod_delayed_work[_on]() are added.

     These two delayed_work changes make delayed_work provide interface
     and behave like timer which is executed with process context.

   * A work item could be executed concurrently on multiple CPUs, which
     is rather unintuitive and made flush_work() behavior confusing and
     half-broken under certain circumstances.  This problem doesn't
     exist for non-reentrant workqueues.  While non-reentrancy check
     isn't free, the overhead is incurred only when a work item bounces
     across different CPUs and even in simulated pathological scenario
     the overhead isn't too high.

     All workqueues are made non-reentrant.  This removes the
     distinction between flush_[delayed_]work() and
     flush_[delayed_]_work_sync().  The former is now as strong as the
     latter and the specified work item is guaranteed to have finished
     execution of any previous queueing on return.

   * In addition to the various bug fixes, Lai redid and simplified CPU
     hotplug handling significantly.

   * Joonsoo introduced system_highpri_wq and used it during CPU
     hotplug.

  There are two merge commits - one to pull in IRQ safe timer from
  tip/timers/core and the other to pull in CPU hotplug fixes from
  wq/for-3.6-fixes as Lai's hotplug restructuring depended on them."

Fixed a number of trivial conflicts, but the more interesting conflicts
were silent ones where the deprecated interfaces had been used by new
code in the merge window, and thus didn't cause any real data conflicts.

Tejun pointed out a few of them, I fixed a couple more.

* 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (46 commits)
  workqueue: remove spurious WARN_ON_ONCE(in_irq()) from try_to_grab_pending()
  workqueue: use cwq_set_max_active() helper for workqueue_set_max_active()
  workqueue: introduce cwq_set_max_active() helper for thaw_workqueues()
  workqueue: remove @delayed from cwq_dec_nr_in_flight()
  workqueue: fix possible stall on try_to_grab_pending() of a delayed work item
  workqueue: use hotcpu_notifier() for workqueue_cpu_down_callback()
  workqueue: use __cpuinit instead of __devinit for cpu callbacks
  workqueue: rename manager_mutex to assoc_mutex
  workqueue: WORKER_REBIND is no longer necessary for idle rebinding
  workqueue: WORKER_REBIND is no longer necessary for busy rebinding
  workqueue: reimplement idle worker rebinding
  workqueue: deprecate __cancel_delayed_work()
  workqueue: reimplement cancel_delayed_work() using try_to_grab_pending()
  workqueue: use mod_delayed_work() instead of __cancel + queue
  workqueue: use irqsafe timer for delayed_work
  workqueue: clean up delayed_work initializers and add missing one
  workqueue: make deferrable delayed_work initializer names consistent
  workqueue: cosmetic whitespace updates for macro definitions
  workqueue: deprecate system_nrt[_freezable]_wq
  workqueue: deprecate flush[_delayed]_work_sync()
  ...
2012-10-02 09:54:49 -07:00
Chuck Lever ba9b584c1d SUNRPC: Introduce rpc_clone_client_set_auth()
An ULP is supposed to be able to replace a GSS rpc_auth object with
another GSS rpc_auth object using rpcauth_create().  However,
rpcauth_create() in 3.5 reliably fails with -EEXIST in this case.
This is because when gss_create() attempts to create the upcall pipes,
sometimes they are already there.  For example if a pipe FS mount
event occurs, or a previous GSS flavor was in use for this rpc_clnt.

It turns out that's not the only problem here.  While working on a
fix for the above problem, we noticed that replacing an rpc_clnt's
rpc_auth is not safe, since dereferencing the cl_auth field is not
protected in any way.

So we're deprecating the ability of rpcauth_create() to switch an
rpc_clnt's security flavor during normal operation.  Instead, let's
add a fresh API that clones an rpc_clnt and gives the clone a new
flavor before it's used.

This makes immediate use of the new __rpc_clone_client() helper.

This can be used in a similar fashion to rpcauth_create() when a
client is hunting for the correct security flavor.  Instead of
replacing an rpc_clnt's security flavor in a loop, the ULP replaces
the whole rpc_clnt.

To fix the -EEXIST problem, any ULP logic that relies on replacing
an rpc_clnt's rpc_auth with rpcauth_create() must be changed to use
this API instead.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-01 15:33:33 -07:00
Chuck Lever 1b63a75180 SUNRPC: Refactor rpc_clone_client()
rpc_clone_client() does most of the same tasks as rpc_new_client(),
so there is an opportunity for code re-use.  Create a generic helper
that makes it easy to clone an RPC client while replacing any of the
clnt's parameters.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-01 15:32:07 -07:00
Chuck Lever 632f0d0503 SUNRPC: Use __func__ in dprintk() in auth_gss.c
Clean up: Some function names have changed, but debugging messages
were never updated.  Automate the construction of the function name
in debugging messages.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-01 15:32:02 -07:00
Chuck Lever d8af9bc16c SUNRPC: Clean up dprintk messages in rpc_pipe.c
Clean up: The blank space in front of the message must be spaces.
Tabs show up on the console as a graphical character.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-01 15:31:57 -07:00
Trond Myklebust 9b96ce7197 SUNRPC: Limit the rpciod workqueue concurrency
We shouldn't need more than 1 worker thread per cpu, since rpciod
is designed to run without sleeping in most cases.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 20:24:16 -04:00
Trond Myklebust d19751e7b9 SUNRPC: Get rid of the redundant xprt->shutdown bit field
It is only set after everyone has dereferenced the transport,
and serves no useful purpose: setting it is racy, so all the
socket code, etc still needs to be able to cope with the cases
where they miss reading it.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 16:03:05 -04:00
Trond Myklebust a11a2bf4de SUNRPC: Optimise away unnecessary data moves in xdr_align_pages
We only have to call xdr_shrink_pagelen() if the remaining RPC
message does not fit in the page buffer length that we supplied
to xdr_align_pages().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 15:58:42 -04:00
Trond Myklebust 8a9a8b8332 SUNRPC: Fix the return value of xdr_align_pages()
The callers of xdr_align_pages() expect it to return the number of bytes
of actual XDR data remaining in the pages.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-26 12:43:10 -04:00
Bryan Schumaker 84e28a307e SUNRPC: Set alloc_slot for backchannel tcp ops
f39c1bfb5a (SUNRPC: Fix a UDP transport
regression) introduced the "alloc_slot" function for xprt operations,
but never created one for the backchannel operations.  This patch fixes
a null pointer dereference when mounting NFS over v4.1.

Call Trace:
 [<ffffffffa0207957>] ? xprt_reserve+0x47/0x50 [sunrpc]
 [<ffffffffa02023a4>] call_reserve+0x34/0x60 [sunrpc]
 [<ffffffffa020e280>] __rpc_execute+0x90/0x400 [sunrpc]
 [<ffffffffa020e61a>] rpc_async_schedule+0x2a/0x40 [sunrpc]
 [<ffffffff81073589>] process_one_work+0x139/0x500
 [<ffffffff81070e70>] ? alloc_worker+0x70/0x70
 [<ffffffffa020e5f0>] ? __rpc_execute+0x400/0x400 [sunrpc]
 [<ffffffff81073d1e>] worker_thread+0x15e/0x460
 [<ffffffff8145c839>] ? preempt_schedule+0x49/0x70
 [<ffffffff81073bc0>] ? rescuer_thread+0x230/0x230
 [<ffffffff81079603>] kthread+0x93/0xa0
 [<ffffffff81465d04>] kernel_thread_helper+0x4/0x10
 [<ffffffff81079570>] ? kthread_freezable_should_stop+0x70/0x70
 [<ffffffff81465d00>] ? gs_change+0x13/0x13

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-25 10:33:59 -04:00
Trond Myklebust a519fc7a70 SUNRPC: Ensure that the TCP socket is closed when in CLOSE_WAIT
Instead of doing a shutdown() call, we need to do an actual close().
Ditto if/when the server is sending us junk RPC headers.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: Simon Kirby <sim@hostway.ca>
Cc: stable@vger.kernel.org
2012-09-19 18:16:10 -04:00
Linus Torvalds 22b4e63ebe NFS client bugfixes for Linux 3.6
- Final (hopefully) fix for the range checking code in NFSv4 getacl. This
   should fix the Oopses being seen when the acl size is close to PAGE_SIZE.
 - Fix a regression with the legacy binary mount code
 - Fix a regression in the readdir cookieverf initialisation
 - Fix an RPC over UDP regression
 - Ensure that we report all errors in the NFSv4 open code
 - Ensure that fsync() reports all relevant synchronisation errors.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJQUN+KAAoJEGcL54qWCgDyHGcQAKj7MYVDIjhdmsVGGNWXUCnf
 X0LVg/ajh+vjusK+hmquzcJikZqgce5IU5DW4vcFr1X8BgP+R51UVvU0KksByD5H
 ourV2JVCztAQzQ4WWOsZAGqN0tooJUjyEjl4lEiDsQCF4Nk1HWbuCHeYuX74OToZ
 jrgedj0EZ6zb7TOizvbgU/7lI+FKu3Hlw6+u27M9phtSuefJdYSHZHYVMOX81qPh
 k0zgZ4tuLIaDuBB84iCrPwNt9icnevq6cIc+AGluI6xhDw+foPvUaUR+OUI420IZ
 tunNzP2So+nNoyjEiyMVENaCdEyA75XAmmGHTUUdBiVOsMV4HF/TqvTtSsjk2mN1
 FbZVvtjD6srjsQaKdVmqMIZBdhY9LSMLIQVqb4H2rYP6Mwq06WTuyCxf5YhzFfoy
 2tai7JuqBkTAWfKB8ESWywV6Qk/MkUWRAOBO6ksS66gAwpcFDj6nfeAdwaEmoYKc
 uzLUIRZaclPMZf661cs1fWeFV5XOnCL7je4owgTRGs7MHooWHPcC3273fEJqnhFz
 5MkC7nfmUiGcdO1v0mfYTEtMj9Pp9icBoZcVTGn4eZIHzvhhZOx//8LhyBfS+jll
 bKjaLZ1rErvIqwnSGcB7PK2yBYY9P6ZaxWjOrAAncZmiOxfhN0hvCo54jNOr/VZ+
 atsDEAuqSTeK7ouBqyO4
 =e5yE
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.6-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:

 - Final (hopefully) fix for the range checking code in NFSv4 getacl.
   This should fix the Oopses being seen when the acl size is close to
   PAGE_SIZE.
 - Fix a regression with the legacy binary mount code
 - Fix a regression in the readdir cookieverf initialisation
 - Fix an RPC over UDP regression
 - Ensure that we report all errors in the NFSv4 open code
 - Ensure that fsync() reports all relevant synchronisation errors.

* tag 'nfs-for-3.6-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFS: fsync() must exit with an error if page writeback failed
  SUNRPC: Fix a UDP transport regression
  NFS: return error from decode_getfh in decode open
  NFSv4: Fix buffer overflow checking in __nfs4_get_acl_uncached
  NFSv4: Fix range checking in __nfs4_get_acl_uncached and __nfs4_proc_set_acl
  NFS: Fix a problem with the legacy binary mount code
  NFS: Fix the initialisation of the readdir 'cookieverf' array
2012-09-13 09:04:13 +08:00
J. Bruce Fields eccf50c129 nfsd: remove unused listener-removal interfaces
You can use nfsd/portlist to give nfsd additional sockets to listen on.
In theory you can also remove listening sockets this way.  But nobody's
ever done that as far as I can tell.

Also this was partially broken in 2.6.25, by
a217813f90 "knfsd: Support adding
transports by writing portlist file".

(Note that we decide whether to take the "delfd" case by checking for a
digit--but what's actually expected in that case is something made by
svc_one_sock_name(), which won't begin with a digit.)

So, let's just rip out this stuff.

Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-09-10 10:55:19 -04:00
Trond Myklebust f39c1bfb5a SUNRPC: Fix a UDP transport regression
Commit 43cedbf0e8 (SUNRPC: Ensure that
we grab the XPRT_LOCK before calling xprt_alloc_slot) is causing
hangs in the case of NFS over UDP mounts.

Since neither the UDP or the RDMA transport mechanism use dynamic slot
allocation, we can skip grabbing the socket lock for those transports.
Add a new rpc_xprt_op to allow switching between the TCP and UDP/RDMA
case.

Note that the NFSv4.1 back channel assigns the slot directly
through rpc_run_bc_task, so we can ignore that case.

Reported-by: Dick Streefland <dick.streefland@altium.nl>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org [>= 3.1]
2012-09-07 11:43:49 -04:00
J. Bruce Fields 65b2e6656b svcrpc: split up svc_handle_xprt
Move initialization of newly accepted socket into a helper.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-08-21 17:42:02 -04:00
J. Bruce Fields 6797fa5a01 svcrpc: break up svc_recv
Matter of taste, I suppose, but svc_recv breaks up naturally into:

	allocate pages and setup arg
	dequeue (wait for, if necessary) next socket
	do something with that socket

And I find it easier to read when it doesn't go on for pages and pages.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-08-21 17:42:01 -04:00
J. Bruce Fields 6741019c82 svcrpc: make svc_xprt_received static
Note this isn't used outside svc_xprt.c.

May as well move it so we don't need a declaration while we're here.

Also remove an outdated comment.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-08-21 17:42:01 -04:00
J. Bruce Fields 9f9d2ebe69 svcrpc: make xpo_recvfrom return only >=0
The only errors returned from xpo_recvfrom have been -EAGAIN and
-EAFNOSUPPORT.  The latter was removed by a previous patch.  That leaves
only -EAGAIN, which is treated just like 0 by the caller (svc_recv).

So, just ditch -EAGAIN and return 0 instead.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-08-21 17:41:07 -04:00
J. Bruce Fields af6d572134 svcrpc: don't bother checking bad svc_addr_len result
None of the callers should see an unsupported address family (only one
of them even bothers to check for that case), so just check for the
buggy case in svc_addr_len and don't bother elsewhere.

Acked-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-08-21 17:40:10 -04:00
J. Bruce Fields f23abfdb94 svcrpc: minor udp code cleanup
Order the code in a more boring way.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-08-21 17:07:50 -04:00
J. Bruce Fields 39b5530137 svcrpc: share some setup of listening sockets
There's some duplicate code here.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-08-21 17:07:48 -04:00
J. Bruce Fields c334196694 svcrpc: make svc_create_xprt enqueue on clearing XPT_BUSY
Whenever we clear XPT_BUSY we should call svc_xprt_enqueue().  Without
that we may fail to notice any events (such as new connections) that
arrived while XPT_BUSY was set.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-08-21 17:07:36 -04:00
Tejun Heo 203b42f731 workqueue: make deferrable delayed_work initializer names consistent
Initalizers for deferrable delayed_work are confused.

* __DEFERRED_WORK_INITIALIZER()
* DECLARE_DEFERRED_WORK()
* INIT_DELAYED_WORK_DEFERRABLE()

Rename them to

* __DEFERRABLE_WORK_INITIALIZER()
* DECLARE_DEFERRABLE_WORK()
* INIT_DEFERRABLE_WORK()

This patch doesn't cause any functional changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
2012-08-21 13:18:23 -07:00
J. Bruce Fields a8e10078a8 svcrpc: clean up control flow
Mainly, use the kernel standard

	err = -ERROR;
	if (something_bad)
		goto out;
	normal case;

rather than

	if (something_bad)
		err = -ERROR
	else {
		normal case;
	}

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-08-21 14:08:41 -04:00
J. Bruce Fields 72c3537607 svcrpc: standardize svc_setup_socket return convention
Use the kernel-standard ptr-or-error return convention instead of
passing a pointer to the error.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-08-21 14:08:40 -04:00
J. Bruce Fields 719f8bcc88 svcrpc: fix xpt_list traversal locking on shutdown
Server threads are not running at this point, but svc_age_temp_xprts
still may be, so we need this locking.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-08-21 14:08:40 -04:00
J. Bruce Fields d10f27a750 svcrpc: fix svc_xprt_enqueue/svc_recv busy-looping
The rpc server tries to ensure that there will be room to send a reply
before it receives a request.

It does this by tracking, in xpt_reserved, an upper bound on the total
size of the replies that is has already committed to for the socket.

Currently it is adding in the estimate for a new reply *before* it
checks whether there is space available.  If it finds that there is not
space, it then subtracts the estimate back out.

This may lead the subsequent svc_xprt_enqueue to decide that there is
space after all.

The results is a svc_recv() that will repeatedly return -EAGAIN, causing
server threads to loop without doing any actual work.

Cc: stable@vger.kernel.org
Reported-by: Michael Tokarev <mjt@tls.msk.ru>
Tested-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-08-20 18:39:19 -04:00
J. Bruce Fields f06f00a24d svcrpc: sends on closed socket should stop immediately
svc_tcp_sendto sets XPT_CLOSE if we fail to transmit the entire reply.
However, the XPT_CLOSE won't be acted on immediately.  Meanwhile other
threads could send further replies before the socket is really shut
down.  This can manifest as data corruption: for example, if a truncated
read reply is followed by another rpc reply, that second reply will look
to the client like further read data.

Symptoms were data corruption preceded by svc_tcp_sendto logging
something like

	kernel: rpc-srv/tcp: nfsd: sent only 963696 when sending 1048708 bytes - shutting down socket

Cc: stable@vger.kernel.org
Reported-by: Malahal Naineni <malahal@us.ibm.com>
Tested-by: Malahal Naineni <malahal@us.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-08-20 18:38:59 -04:00
J. Bruce Fields be1e44441a svcrpc: fix BUG() in svc_tcp_clear_pages
Examination of svc_tcp_clear_pages shows that it assumes sk_tcplen is
consistent with sk_pages[] (in particular, sk_pages[n] can't be NULL if
sk_tcplen would lead us to expect n pages of data).

svc_tcp_restore_pages zeroes out sk_pages[] while leaving sk_tcplen.
This is OK, since both functions are serialized by XPT_BUSY.  However,
that means the inconsistency must be repaired before dropping XPT_BUSY.

Therefore we should be ensuring that svc_tcp_save_pages repairs the
problem before exiting svc_tcp_recv_record on error.

Symptoms were a BUG() in svc_tcp_clear_pages.

Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-08-20 18:38:44 -04:00
Linus Torvalds ac694dbdbc Merge branch 'akpm' (Andrew's patch-bomb)
Merge Andrew's second set of patches:
 - MM
 - a few random fixes
 - a couple of RTC leftovers

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (120 commits)
  rtc/rtc-88pm80x: remove unneed devm_kfree
  rtc/rtc-88pm80x: assign ret only when rtc_register_driver fails
  mm: hugetlbfs: close race during teardown of hugetlbfs shared page tables
  tmpfs: distribute interleave better across nodes
  mm: remove redundant initialization
  mm: warn if pg_data_t isn't initialized with zero
  mips: zero out pg_data_t when it's allocated
  memcg: gix memory accounting scalability in shrink_page_list
  mm/sparse: remove index_init_lock
  mm/sparse: more checks on mem_section number
  mm/sparse: optimize sparse_index_alloc
  memcg: add mem_cgroup_from_css() helper
  memcg: further prevent OOM with too many dirty pages
  memcg: prevent OOM with too many dirty pages
  mm: mmu_notifier: fix freed page still mapped in secondary MMU
  mm: memcg: only check anon swapin page charges for swap cache
  mm: memcg: only check swap cache pages for repeated charging
  mm: memcg: split swapin charge function into private and public part
  mm: memcg: remove needless !mm fixup to init_mm when charging
  mm: memcg: remove unneeded shmem charge type
  ...
2012-07-31 19:25:39 -07:00
Linus Torvalds 6dbb35b0a7 NFS client updates for Linux 3.6
Features include:
 - Patches from Bryan to allow splitting of the NFSv2/v3/v4 code into
   separate modules.
 - Fix Oopses in the NFSv4 idmapper
 - Fix a deadlock whereby rpciod tries to allocate a new socket and
   ends up recursing into the NFS code due to memory reclaim.
 - Increase the number of permitted callback connections.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJQGF+vAAoJEGcL54qWCgDy6ncP/jKVl/Yjz6i1b/8QtC0EmYFb
 XNwbjZicNupVM98XPLm+sfdWxvoGiHSMbwG8t/hx5Z6CteM18adeGNO1KqP9xQyg
 scPvFmqj5VJNHsHztcHIFHFYdpsuJqRKcW3TA2l8AkNOm6uBcnXArRosYN6LrXik
 hI/jWcyv8wZuooSQrLf463JK37t/s6LuUQo6jKP4582sbANHvPdLkHFCYg3yt1Sk
 2IIo2CLLs2cJUswGJnsHT76Jxfvh21NdGtvydjVjpr4H0/LY5GykBc23AAL9TWj6
 KlegDO902hLzEE93xazF59hQZrPuwBi3quEMJ3X0mjdNQWb4G96s/t2hmwpTjWyc
 mjpRsuaGlCXDxcrI5dnU52hqKa0Ju1ipbLpghxSVfilRy998xvGp5Yc6ADU+pYnt
 uP09lamCyjFEa+gDSqGt7lv4dFmdPJjWZdkXLPv0ah5sXVMYAT8NRLUfiE1+hLLu
 zKaM84PHSEvykFeJ2WvjDTgF3L55y3x6L3LN4UkKmfO5nqQf7csPcquU1NfHh25S
 jjKGq2SKGoYG1l6FwK3hemup5cnFUEX78E2QeLrmaHg92fJDGrz587i6MPc5jrSe
 sOSX/CpuCJd4VhodIo8T00me0GZSHEU7RP2KEc518ZvrPmz13avXeLeG9nYhjuxM
 cV9Ex8zh2Y4aiUXCy3uA
 =KSix
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.6-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull second wave of NFS client updates from Trond Myklebust:

 - Patches from Bryan to allow splitting of the NFSv2/v3/v4 code into
   separate modules.

 - Fix Oopses in the NFSv4 idmapper

 - Fix a deadlock whereby rpciod tries to allocate a new socket and ends
   up recursing into the NFS code due to memory reclaim.

 - Increase the number of permitted callback connections.

* tag 'nfs-for-3.6-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  nfs: explicitly reject LOCK_MAND flock() requests
  nfs: increase number of permitted callback connections.
  SUNRPC: return negative value in case rpcbind client creation error
  NFS: Convert v4 into a module
  NFS: Convert v3 into a module
  NFS: Convert v2 into a module
  NFS: Keep module parameters in the generic NFS client
  NFS: Split out remaining NFS v4 inode functions
  NFS: Pass super operations and xattr handlers in the nfs_subversion
  NFS: Only initialize the ACL client in the v3 case
  NFS: Create a try_mount rpc op
  NFS: Remove the NFS v4 xdev mount function
  NFS: Add version registering framework
  NFS: Fix a number of bugs in the idmapper
  nfs: skip commit in releasepage if we're freeing memory for fs-related reasons
  sunrpc: clarify comments on rpc_make_runnable
  pnfsblock: bail out partial page IO
2012-07-31 18:45:44 -07:00
Mel Gorman a564b8f039 nfs: enable swap on NFS
Implement the new swapfile a_ops for NFS and hook up ->direct_IO.  This
will set the NFS socket to SOCK_MEMALLOC and run socket reconnect under
PF_MEMALLOC as well as reset SOCK_MEMALLOC before engaging the protocol
->connect() method.

PF_MEMALLOC should allow the allocation of struct socket and related
objects and the early (re)setting of SOCK_MEMALLOC should allow us to
receive the packets required for the TCP connection buildup.

[jlayton@redhat.com: Restore PF_MEMALLOC task flags in all cases]
[dfeng@redhat.com: Fix handling of multiple swap files]
[a.p.zijlstra@chello.nl: Original patch]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Eric Paris <eparis@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Neil Brown <neilb@suse.de>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Xiaotian Feng <dfeng@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-31 18:42:48 -07:00
Linus Torvalds 08843b79fb Merge branch 'nfsd-next' of git://linux-nfs.org/~bfields/linux
Pull nfsd changes from J. Bruce Fields:
 "This has been an unusually quiet cycle--mostly bugfixes and cleanup.
  The one large piece is Stanislav's work to containerize the server's
  grace period--but that in itself is just one more step in a
  not-yet-complete project to allow fully containerized nfs service.

  There are a number of outstanding delegation, container, v4 state, and
  gss patches that aren't quite ready yet; 3.7 may be wilder."

* 'nfsd-next' of git://linux-nfs.org/~bfields/linux: (35 commits)
  NFSd: make boot_time variable per network namespace
  NFSd: make grace end flag per network namespace
  Lockd: move grace period management from lockd() to per-net functions
  LockD: pass actual network namespace to grace period management functions
  LockD: manage grace list per network namespace
  SUNRPC: service request network namespace helper introduced
  NFSd: make nfsd4_manager allocated per network namespace context.
  LockD: make lockd manager allocated per network namespace
  LockD: manage grace period per network namespace
  Lockd: add more debug to host shutdown functions
  Lockd: host complaining function introduced
  LockD: manage used host count per networks namespace
  LockD: manage garbage collection timeout per networks namespace
  LockD: make garbage collector network namespace aware.
  LockD: mark host per network namespace on garbage collect
  nfsd4: fix missing fault_inject.h include
  locks: move lease-specific code out of locks_delete_lock
  locks: prevent side-effects of locks_release_private before file_lock is initialized
  NFSd: set nfsd_serv to NULL after service destruction
  NFSd: introduce nfsd_destroy() helper
  ...
2012-07-31 14:42:28 -07:00
Linus Torvalds 1fad1e9a74 NFS client updates for Linux 3.6
Features include:
 - More preparatory patches for modularising NFSv2/v3/v4.
   Split out the various NFSv2/v3/v4-specific code into separate
   files
 - More preparation for the NFSv4 migration code
 - Ensure that OPEN(O_CREATE) observes the pNFS mds threshold parameters
 - pNFS fast failover when the data servers are down
 - Various cleanups and debugging patches
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJQFwYRAAoJEGcL54qWCgDy6YAQAIZuUPFCQbuzWev4L/XA0uhg
 0xjr9YBmg57UOs8e7FhS0azJip+D/kmF2Rx5gCMX0QO/IwaEVoms4BS8zjFOr3yL
 cPIyY5ErF+WJ25f3Mz8k0wOHTzrOvMdq4/7TQf5hztsrGYFid78sSb3O9ZO9bgb4
 QxqhgA0Z4S2vIqeObJAF9BT5UOT/cXfVKV0iTd52ZQVA+ZhqJ2SDlNFsoxnVOweV
 qzKOi0FZCPsjH+Z4a/LLdieHG5AYRPhOScBdG7gTFAdkysI8DDtSNCmJ68dMHYRw
 OHtyt/X4k9TImfwauV3Eww1sw3NkDswX+dv3GOSDTlMvVBrm0WXWj2zoHZbOB+0Q
 ETI9Zpolvd9pADtyfu9c+kCYIR+NdB4lGKd15iZfdEy/CZ0CtbLAbn5Szo2YcwNR
 iVjswR0jsbklD+mZjlMeZoG4JX2d9ZRqLs20POz7QQBmdIQ8jb9/SwVj9zGvX0w0
 NyVUHTFlGaNwkpo7oeRoWi1xjZWE6OzfoncV/46DOJWb08OKZ4fR4ugMYJWGi7Xx
 I4nQrIiHtInqL+sF5Y3wlZ48dufLuIhAcHh7klxgud4GRj7t8j+a99pTrRmpHvvC
 4K2Yu69VYcIA7N2RsSLohIllGPFmMwpdar7yERdD0So8/fz/zf7wn0dFFYY11+bL
 YNVVGhvNxccMU5EU9YR4
 =Lc59
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.6-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Features include:
   - More preparatory patches for modularising NFSv2/v3/v4.  Split out
     the various NFSv2/v3/v4-specific code into separate files
   - More preparation for the NFSv4 migration code
   - Ensure that OPEN(O_CREATE) observes the pNFS mds threshold
     parameters
   - pNFS fast failover when the data servers are down
   - Various cleanups and debugging patches"

* tag 'nfs-for-3.6-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (67 commits)
  nfs: fix fl_type tests in NFSv4 code
  NFS: fix pnfs regression with directio writes
  NFS: fix pnfs regression with directio reads
  sunrpc: clnt: Add missing braces
  nfs: fix stub return type warnings
  NFS: exit_nfs_v4() shouldn't be an __exit function
  SUNRPC: Add a missing spin_unlock to gss_mech_list_pseudoflavors
  NFS: Split out NFS v4 client functions
  NFS: Split out the NFS v4 filesystem types
  NFS: Create a single nfs_clone_super() function
  NFS: Split out NFS v4 server creating code
  NFS: Initialize the NFS v4 client from init_nfs_v4()
  NFS: Move the v4 getroot code to nfs4getroot.c
  NFS: Split out NFS v4 file operations
  NFS: Initialize v4 sysctls from nfs_init_v4()
  NFS: Create an init_nfs_v4() function
  NFS: Split out NFS v4 inode operations
  NFS: Split out NFS v3 inode operations
  NFS: Split out NFS v2 inode operations
  NFS: Clean up nfs4_proc_setclientid() and friends
  ...
2012-07-30 19:16:57 -07:00
Stanislav Kinsbursky caea33da89 SUNRPC: return negative value in case rpcbind client creation error
Without this patch kernel will panic on LockD start, because lockd_up() checks
lockd_up_net() result for negative value.
From my pow it's better to return negative value from rpcbind routines instead
of replacing all such checks like in lockd_up().

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org [>= 3.0]
2012-07-30 20:39:05 -04:00
Jeff Layton 5cf02d09b5 nfs: skip commit in releasepage if we're freeing memory for fs-related reasons
We've had some reports of a deadlock where rpciod ends up with a stack
trace like this:

    PID: 2507   TASK: ffff88103691ab40  CPU: 14  COMMAND: "rpciod/14"
     #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9
     #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs]
     #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f
     #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8
     #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs]
     #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs]
     #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670
     #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271
     #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638
     #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f
    #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e
    #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f
    #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad
    #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942
    #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a
    #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9
    #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b
    #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808
    #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c
    #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6
    #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7
    #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc]
    #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc]
    #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0
    #24 [ffff8810343bfee8] kthread at ffffffff8108dd96
    #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca

rpciod is trying to allocate memory for a new socket to talk to the
server. The VM ends up calling ->releasepage to get more memory, and it
tries to do a blocking commit. That commit can't succeed however without
a connected socket, so we deadlock.

Fix this by setting PF_FSTRANS on the workqueue task prior to doing the
socket allocation, and having nfs_release_page check for that flag when
deciding whether to do a commit call. Also, set PF_FSTRANS
unconditionally in rpc_async_schedule since that function can also do
allocations sometimes.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2012-07-30 18:55:59 -04:00
Jeff Layton 506026c3ec sunrpc: clarify comments on rpc_make_runnable
rpc_make_runnable is not generally called with the queue lock held, unless
it's waking up a task that has been sitting on a waitqueue. This is safe
when the task has not entered the FSM yet, but the comments don't really
spell this out.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-30 18:55:08 -04:00
Joe Perches cac5d07e3c sunrpc: clnt: Add missing braces
Add a missing set of braces that commit 4e0038b6b2
("SUNRPC: Move clnt->cl_server into struct rpc_xprt")
forgot.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org [>= 3.4]
2012-07-30 17:58:08 -04:00
Eric Dumazet ddbe503203 ipv6: add ipv6_addr_hash() helper
Introduce ipv6_addr_hash() helper doing a XOR on all bits
of an IPv6 address, with an optimized x86_64 version.

Use it in flow dissector, as suggested by Andrew McGregor,
to reduce hash collision probabilities in fq_codel (and other
users of flow dissector)

Use it in ip6_tunnel.c and use more bit shuffling, as suggested
by David Laight, as existing hash was ignoring most of them.

Use it in sunrpc and use more bit shuffling, using hash_32().

Use it in net/ipv6/addrconf.c, using hash_32() as well.

As a cleanup, use it in net/ipv4/tcp_metrics.c

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrew McGregor <andrewmcgr@gmail.com>
Cc: Dave Taht <dave.taht@gmail.com>
Cc: Tom Herbert <therbert@google.com>
Cc: David Laight <David.Laight@ACULAB.COM>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-18 11:28:46 -07:00
Trond Myklebust 013448c59b SUNRPC: Add a missing spin_unlock to gss_mech_list_pseudoflavors
The patch "SUNRPC: Add rpcauth_list_flavors()" introduces a new error
path in gss_mech_list_pseudoflavors, but fails to release the spin lock.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-17 17:02:57 -04:00