Commit Graph

1555 Commits

Author SHA1 Message Date
Chuck Lever 7402ab19cd SUNRPC: Use AF_LOCAL for rpcbind upcalls
As libtirpc does in user space, have our registration API try using an
AF_LOCAL transport first when registering and unregistering.

This means we don't chew up privileged ports, and our registration is
bound to an "owner" (the effective uid of the process on the sending
end of the transport).  Only that "owner" may unregister the service.

The kernel could probe rpcbind via an rpcbind query to determine
whether rpcbind has an AF_LOCAL service. For simplicity, we use the
same technique that libtirpc uses: simply fail over to network
loopback if creating an AF_LOCAL transport to the well-known rpcbind
service socket fails.

This means we open-code the pathname of the rpcbind socket in the
kernel.  For now we have to do that anyway because the kernel's
RPC over AF_LOCAL implementation does not support autobind.  That may
be undesirable in the long term.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-05-27 17:42:47 -04:00
Chuck Lever da09eb9303 SUNRPC: Clean up use of curly braces in switch cases
Clean up.  Preferred style is not to use curly braces around
switch cases.  I'm about to add another case that needs a third
type cast.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-05-27 17:42:47 -04:00
Chuck Lever 61677eeec2 SUNRPC: Rename xs_encode_tcp_fragment_header()
Clean up: Use a more generic name for xs_encode_tcp_fragment_header();
it's appropriate to use for all stream transport types.  We're about
to add new stream transport.

Also, move it to a place where it is more easily shared amongst the
various send_request methods.  And finally, replace the "htonl" macro
invocation with its modern equivalent.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-05-27 17:42:47 -04:00
Trond Myklebust fe19a96b10 SUNRPC: Deal with the lack of a SYN_SENT sk->sk_state_change callback...
The TCP connection state code depends on the state_change() callback
being called when the SYN_SENT state is set. However the networking layer
doesn't actually call us back in that case.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
2011-05-27 17:42:00 -04:00
Linus Torvalds 4c171acc20 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  RDMA/cma: Save PID of ID's owner
  RDMA/cma: Add support for netlink statistics export
  RDMA/cma: Pass QP type into rdma_create_id()
  RDMA: Update exported headers list
  RDMA/cma: Export enum cma_state in <rdma/rdma_cm.h>
  RDMA/nes: Add a check for strict_strtoul()
  RDMA/cxgb3: Don't post zero-byte read if endpoint is going away
  RDMA/cxgb4: Use completion objects for event blocking
  IB/srp: Fix integer -> pointer cast warnings
  IB: Add devnode methods to cm_class and umad_class
  IB/mad: Return EPROTONOSUPPORT when an RDMA device lacks the QP required
  IB/uverbs: Add devnode method to set path/mode
  RDMA/ucma: Add .nodename/.mode to tell userspace where to create device node
  RDMA: Add netlink infrastructure
  RDMA: Add error handling to ib_core_init()
2011-05-26 12:13:57 -07:00
Sean Hefty b26f9b9949 RDMA/cma: Pass QP type into rdma_create_id()
The RDMA CM currently infers the QP type from the port space selected
by the user.  In the future (eg with RDMA_PS_IB or XRC), there may not
be a 1-1 correspondence between port space and QP type.  For netlink
export of RDMA CM state, we want to export the QP type to userspace,
so it is cleaner to explicitly associate a QP type to an ID.

Modify rdma_create_id() to allow the user to specify the QP type, and
use it to make our selections of datagram versus connected mode.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-05-25 13:46:23 -07:00
Ying Han 1495f230fa vmscan: change shrinker API by passing shrink_control struct
Change each shrinker's API by consolidating the existing parameters into
shrink_control struct.  This will simplify any further features added w/o
touching each file of shrinker.

[akpm@linux-foundation.org: fix build]
[akpm@linux-foundation.org: fix warning]
[kosaki.motohiro@jp.fujitsu.com: fix up new shrinker API]
[akpm@linux-foundation.org: fix xfs warning]
[akpm@linux-foundation.org: update gfs2]
Signed-off-by: Ying Han <yinghan@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25 08:39:26 -07:00
Linus Torvalds 57d19e80f4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
  b43: fix comment typo reqest -> request
  Haavard Skinnemoen has left Atmel
  cris: typo in mach-fs Makefile
  Kconfig: fix copy/paste-ism for dell-wmi-aio driver
  doc: timers-howto: fix a typo ("unsgined")
  perf: Only include annotate.h once in tools/perf/util/ui/browsers/annotate.c
  md, raid5: Fix spelling error in comment ('Ofcourse' --> 'Of course').
  treewide: fix a few typos in comments
  regulator: change debug statement be consistent with the style of the rest
  Revert "arm: mach-u300/gpio: Fix mem_region resource size miscalculations"
  audit: acquire creds selectively to reduce atomic op overhead
  rtlwifi: don't touch with treewide double semicolon removal
  treewide: cleanup continuations and remove logging message whitespace
  ath9k_hw: don't touch with treewide double semicolon removal
  include/linux/leds-regulator.h: fix syntax in example code
  tty: fix typo in descripton of tty_termios_encode_baud_rate
  xtensa: remove obsolete BKL kernel option from defconfig
  m68k: fix comment typo 'occcured'
  arch:Kconfig.locks Remove unused config option.
  treewide: remove extra semicolons
  ...
2011-05-23 09:12:26 -07:00
Justin P. Mattock 70f23fd66b treewide: fix a few typos in comments
- kenrel -> kernel
- whetehr -> whether
- ttt -> tt
- sss -> ss

Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-05-10 10:16:21 +02:00
Jiri Kosina 07f9479a40 Merge branch 'master' into for-next
Fast-forwarded to current state of Linus' tree as there are patches to be
applied for files that didn't exist on the old branch.
2011-04-26 10:22:59 +02:00
Trond Myklebust 7494d00c7b SUNRPC: Allow RPC calls to return ETIMEDOUT instead of EIO
On occasion, it is useful for the NFS layer to distinguish between
soft timeouts and other EIO errors due to (say) encoding errors,
or authentication errors.

The following patch ensures that the default behaviour of the RPC
layer remains to return EIO on soft timeouts (until we have
audited all the callers).

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-04-24 14:28:45 -04:00
Bryan Schumaker 468f86134e NFSv4.1: Don't update sequence number if rpc_task is not sent
If we fail to contact the gss upcall program, then no message will
be sent to the server.  The client still updated the sequence number,
however, and this lead to NFS4ERR_SEQ_MISMATCH for the next several
RPC calls.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-04-18 17:05:48 -04:00
Trond Myklebust e3b2854faa SUNRPC: Fix the SUNRPC Kerberos V RPCSEC_GSS module dependencies
Since kernel 2.6.35, the SUNRPC Kerberos support has had an implicit
dependency on a number of additional crypto modules. The following
patch makes that dependency explicit.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-04-15 18:28:18 -04:00
Bryan Schumaker d1a8016a2d NFS: Fix infinite loop in gss_create_upcall()
There can be an infinite loop if gss_create_upcall() is called without
the userspace program running.  To prevent this, we return -EACCES if
we notice that pipe_version hasn't changed (indicating that the pipe
has not been opened).

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-04-13 15:12:22 -04:00
Justin P. Mattock 6eab04a876 treewide: remove extra semicolons
Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-04-10 17:01:05 +02:00
J. Bruce Fields 8985ef0b8a svcrpc: complete svsk processing on cb receive failure
Currently when there's some failure to receive a callback (because we
couldn't find a matching xid, for example), we exit svc_recv with
sk_tcplen still set but without any pages saved with the socket.  This
will cause a crash later in svc_tcp_restore_pages.

Instead, make sure we reset that tcp information whether the callback
received failed or succeeded.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-04-10 10:47:46 -04:00
Linus Torvalds 94c8a984ae Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  NFS: Change initial mount authflavor only when server returns NFS4ERR_WRONGSEC
  NFS: Fix a signed vs. unsigned secinfo bug
  Revert "net/sunrpc: Use static const char arrays"
2011-04-08 11:47:35 -07:00
Olga Kornievskaia 9660439861 svcrpc: take advantage of tcp autotuning
Allow the NFSv4 server to make use of TCP autotuning behaviour, which
was previously disabled by setting the sk_userlocks variable.

Set the receive buffers to be big enough to receive the whole RPC
request, and set this for the listening socket, not the accept socket.

Remove the code that readjusts the receive/send buffer sizes for the
accepted socket. Previously this code was used to influence the TCP
window management behaviour, which is no longer needed when autotuning
is enabled.

This can improve IO bandwidth on networks with high bandwidth-delay
products, where a large tcp window is required.  It also simplifies
performance tuning, since getting adequate tcp buffers previously
required increasing the number of nfsd threads.

Signed-off-by: Olga Kornievskaia <aglo@citi.umich.edu>
Cc: Jim Rees <rees@umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2011-04-07 14:36:40 -04:00
J. Bruce Fields 31d68ef65c SUNRPC: Don't wait for full record to receive tcp data
Ensure that we immediately read and buffer data from the incoming TCP
stream so that we grow the receive window quickly, and don't deadlock on
large READ or WRITE requests.

Also do some minor exit cleanup.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-04-07 14:36:40 -04:00
Trond Myklebust 586c52cc61 svcrpc: copy cb reply instead of pages
It's much simpler just to copy the cb reply data than to play tricks
with pages.  Callback replies will typically be very small (at least
until we implement cb_getattr, in which case files with very long ACLs
could pose a problem), so there's no loss in efficiency.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-04-07 14:36:40 -04:00
J. Bruce Fields cc6c2127f2 svcrpc: close connection if client sends short packet
If the client sents a record too short to contain even the beginning of
the rpc header, then just close the connection.

The current code drops the record data and continues.  I don't see the
point.  It's a hopeless situation and simpler just to cut off the
connection completely.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-04-07 14:36:40 -04:00
J. Bruce Fields 48e6555c7b svcrpc: note network-order types in svc_process_calldir
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-04-07 14:36:40 -04:00
Trond Myklebust 5ee78d483c SUNRPC: svc_tcp_recvfrom cleanup
Minor cleanup in preparation for later patches.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-04-07 14:36:39 -04:00
Trond Myklebust 0601f79392 SUNRPC: requeue tcp socket less frequently
Don't requeue the socket in some cases where we know it's unnecessary.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-04-07 14:36:39 -04:00
Trond Myklebust 0867659fa3 Revert "net/sunrpc: Use static const char arrays"
This reverts commit 411b5e0561.

Olga Kornievskaia reports:

Problem: linux client mounting linux server using rc4-hmac-md5
enctype. gssd fails with create a context after receiving a reply from
the server.

Diagnose: putting printout statements in the server kernel and
kerberos libraries revealed that client and server derived different
integrity keys.

Server kernel code was at fault due the the commit

[aglo@skydive linux-pnfs]$ git show 411b5e0561

Trond: The problem is that since it relies on virt_to_page(), you cannot
call sg_set_buf() for data in the const section.

Reported-by: Olga Kornievskaia <aglo@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org	[2.6.36+]
2011-04-06 11:18:17 -07:00
Lucas De Marchi 25985edced Fix common misspellings
Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2011-03-31 11:26:23 -03:00
OGAWA Hirofumi a271c5a0de NFS: Ensure that rpc_release_resources_task() can be called twice.
BUG: atomic_dec_and_test(): -1: atomic counter underflow at:
Pid: 2827, comm: mount.nfs Not tainted 2.6.38 #1
Call Trace:
 [<ffffffffa02223a0>] ? put_rpccred+0x44/0x14e [sunrpc]
 [<ffffffffa021bbe9>] ? rpc_ping+0x4e/0x58 [sunrpc]
 [<ffffffffa021c4a5>] ? rpc_create+0x481/0x4fc [sunrpc]
 [<ffffffffa022298a>] ? rpcauth_lookup_credcache+0xab/0x22d [sunrpc]
 [<ffffffffa028be8c>] ? nfs_create_rpc_client+0xa6/0xeb [nfs]
 [<ffffffffa028c660>] ? nfs4_set_client+0xc2/0x1f9 [nfs]
 [<ffffffffa028cd3c>] ? nfs4_create_server+0xf2/0x2a6 [nfs]
 [<ffffffffa0295d07>] ? nfs4_remote_mount+0x4e/0x14a [nfs]
 [<ffffffff810dd570>] ? vfs_kern_mount+0x6e/0x133
 [<ffffffffa029605a>] ? nfs_do_root_mount+0x76/0x95 [nfs]
 [<ffffffffa029643d>] ? nfs4_try_mount+0x56/0xaf [nfs]
 [<ffffffffa0297434>] ? nfs_get_sb+0x435/0x73c [nfs]
 [<ffffffff810dd59b>] ? vfs_kern_mount+0x99/0x133
 [<ffffffff810dd693>] ? do_kern_mount+0x48/0xd8
 [<ffffffff810f5b75>] ? do_mount+0x6da/0x741
 [<ffffffff810f5c5f>] ? sys_mount+0x83/0xc0
 [<ffffffff8100293b>] ? system_call_fastpath+0x16/0x1b

Well, so, I think this is real bug of nfs codes somewhere. With some
review, the code

rpc_call_sync()
    rpc_run_task
        rpc_execute()
            __rpc_execute()
                rpc_release_task()
                    rpc_release_resources_task()
                        put_rpccred()                <= release cred
    rpc_put_task
        rpc_do_put_task()
            rpc_release_resources_task()
                put_rpccred()                        <= release cred again

seems to be release cred unintendedly.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-27 17:55:36 +02:00
Linus Torvalds 40471856f2 Merge branch 'nfs-for-2.6.39' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'nfs-for-2.6.39' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (28 commits)
  Cleanup XDR parsing for LAYOUTGET, GETDEVICEINFO
  NFSv4.1 convert layoutcommit sync to boolean
  NFSv4.1 pnfs_layoutcommit_inode fixes
  NFS: Determine initial mount security
  NFS: use secinfo when crossing mountpoints
  NFS: Add secinfo procedure
  NFS: lookup supports alternate client
  NFS: convert call_sync() to a function
  NFSv4.1 remove temp code that prevented ds commits
  NFSv4.1: layoutcommit
  NFSv4.1: filelayout driver specific code for COMMIT
  NFSv4.1: remove GETATTR from ds commits
  NFSv4.1: add generic layer hooks for pnfs COMMIT
  NFSv4.1: alloc and free commit_buckets
  NFSv4.1: shift filelayout_free_lseg
  NFSv4.1: pull out code from nfs_commit_release
  NFSv4.1: pull error handling out of nfs_commit_list
  NFSv4.1: add callback to nfs4_commit_done
  NFSv4.1: rearrange nfs_commit_rpcsetup
  NFSv4.1: don't send COMMIT to ds for data sync writes
  ...
2011-03-25 10:03:28 -07:00
Trond Myklebust 0acd220192 Merge branch 'nfs-for-2.6.39' into nfs-for-next 2011-03-24 17:03:14 -04:00
Bryan Schumaker 8f70e95f9f NFS: Determine initial mount security
When sec=<something> is not presented as a mount option,
we should attempt to determine what security flavor the
server is using.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-24 13:52:42 -04:00
Bryan Schumaker 7ebb931598 NFS: use secinfo when crossing mountpoints
A submount may use different security than the parent
mount does.  We should figure out what sec flavor the
submount uses at mount time.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-24 13:52:42 -04:00
Linus Torvalds dc87c55120 Merge branch 'for-2.6.39' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.39' of git://linux-nfs.org/~bfields/linux:
  SUNRPC: Remove resource leak in svc_rdma_send_error()
  nfsd: wrong index used in inner loop
  nfsd4: fix comment and remove unused nfsd4_file fields
  nfs41: make sure nfs server return right ca_maxresponsesize_cached
  nfsd: fix compile error
  svcrpc: fix bad argument in unix_domain_find
  nfsd4: fix struct file leak
  nfsd4: minor nfs4state.c reshuffling
  svcrpc: fix rare race on unix_domain creation
  nfsd41: modify the members value of nfsd4_op_flags
  nfsd: add proc file listing kernel's gss_krb5 enctypes
  gss:krb5 only include enctype numbers in gm_upcall_enctypes
  NFSD, VFS: Remove dead code in nfsd_rename()
  nfsd: kill unused macro definition
  locks: use assign_type()
2011-03-24 08:20:39 -07:00
Trond Myklebust 246408dcd5 SUNRPC: Never reuse the socket port after an xs_close()
If we call xs_close(), we're in one of two situations:
 - Autoclose, which means we don't expect to resend a request
 - bind+connect failed, which probably means the port is in use

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
2011-03-22 18:42:33 -04:00
Linus Torvalds 179198373c Merge branch 'nfs-for-2.6.39' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'nfs-for-2.6.39' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (54 commits)
  RPC: killing RPC tasks races fixed
  xprt: remove redundant check
  SUNRPC: Convert struct rpc_xprt to use atomic_t counters
  SUNRPC: Ensure we always run the tk_callback before tk_action
  sunrpc: fix printk format warning
  xprt: remove redundant null check
  nfs: BKL is no longer needed, so remove the include
  NFS: Fix a warning in fs/nfs/idmap.c
  Cleanup: Factor out some cut-and-paste code.
  cleanup: save 60 lines/100 bytes by combining two mostly duplicate functions.
  NFS: account direct-io into task io accounting
  gss:krb5 only include enctype numbers in gm_upcall_enctypes
  RPCRDMA: Fix FRMR registration/invalidate handling.
  RPCRDMA: Fix to XDR page base interpretation in marshalling logic.
  NFSv4: Send unmapped uid/gids to the server when using auth_sys
  NFSv4: Propagate the error NFS4ERR_BADOWNER to nfs4_do_setattr
  NFSv4: cleanup idmapper functions to take an nfs_server argument
  NFSv4: Send unmapped uid/gids to the server if the idmapper fails
  NFSv4: If the server sends us a numeric uid/gid then accept it
  NFSv4.1: reject zero layout with zeroed stripe unit
  ...
2011-03-17 17:40:00 -07:00
Jesper Juhl 4be34b9d69 SUNRPC: Remove resource leak in svc_rdma_send_error()
We leak the memory allocated to 'ctxt' when we return after
'ib_dma_mapping_error()' returns !=0.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-03-17 14:38:03 -04:00
Stanislav Kinsbursky 8e26de238f RPC: killing RPC tasks races fixed
RPC task RPC_TASK_QUEUED bit is set must be checked before trying to wake up
task rpc_killall_tasks() because task->tk_waitqueue can not be set (equal to
NULL).
Also, as Trond Myklebust mentioned, such approach (instead of checking
tk_waitqueue to NULL) allows us to "optimise away the call to
rpc_wake_up_queued_task() altogether for those
tasks that aren't queued".

Here is an example of dereferencing of tk_waitqueue equal to NULL:

CPU 0               	CPU 1				CPU 2
--------------------	---------------------	--------------------------
nfs4_run_open_task
rpc_run_task
rpc_execute
rpc_set_active
rpc_make_runnable
(waiting)
			rpc_async_schedule
			nfs4_open_prepare
			nfs_wait_on_sequence
						nfs_umount_begin
						rpc_killall_tasks
						rpc_wake_up_task
						rpc_wake_up_queued_task
						spin_lock(tk_waitqueue == NULL)
						BUG()
			rpc_sleep_on
			spin_lock(&q->lock)
			__rpc_sleep_on
			task->tk_waitqueue = q

Signed-off-by: Stanislav Kinsbursky <skinsbursky@openvz.org>
Cc: stable@kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-17 12:39:00 -04:00
j223yang@asset.uwaterloo.ca ba3c578de2 xprt: remove redundant check
remove redundant check.

Signed-off-by: Jinqiu Yang <crindy646@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-17 12:39:00 -04:00
Trond Myklebust a8de240a90 SUNRPC: Convert struct rpc_xprt to use atomic_t counters
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-17 12:38:59 -04:00
Trond Myklebust e020c6800c SUNRPC: Ensure we always run the tk_callback before tk_action
This fixes a race in which the task->tk_callback() puts the rpc_task
to sleep, setting a new callback. Under certain circumstances, the current
code may end up executing the task->tk_action before it gets round to the
callback.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
2011-03-17 12:38:41 -04:00
Linus Torvalds 7a6362800c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1480 commits)
  bonding: enable netpoll without checking link status
  xfrm: Refcount destination entry on xfrm_lookup
  net: introduce rx_handler results and logic around that
  bonding: get rid of IFF_SLAVE_INACTIVE netdev->priv_flag
  bonding: wrap slave state work
  net: get rid of multiple bond-related netdevice->priv_flags
  bonding: register slave pointer for rx_handler
  be2net: Bump up the version number
  be2net: Copyright notice change. Update to Emulex instead of ServerEngines
  e1000e: fix kconfig for crc32 dependency
  netfilter ebtables: fix xt_AUDIT to work with ebtables
  xen network backend driver
  bonding: Improve syslog message at device creation time
  bonding: Call netif_carrier_off after register_netdevice
  bonding: Incorrect TX queue offset
  net_sched: fix ip_tos2prio
  xfrm: fix __xfrm_route_forward()
  be2net: Fix UDP packet detected status in RX compl
  Phonet: fix aligned-mode pipe socket buffer header reserve
  netxen: support for GbE port settings
  ...

Fix up conflicts in drivers/staging/brcm80211/brcmsmac/wl_mac80211.c
with the staging updates.
2011-03-16 16:29:25 -07:00
Linus Torvalds bd2895eead Merge branch 'for-2.6.39' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
* 'for-2.6.39' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: fix build failure introduced by s/freezeable/freezable/
  workqueue: add system_freezeable_wq
  rds/ib: use system_wq instead of rds_ib_fmr_wq
  net/9p: replace p9_poll_task with a work
  net/9p: use system_wq instead of p9_mux_wq
  xfs: convert to alloc_workqueue()
  reiserfs: make commit_wq use the default concurrency level
  ocfs2: use system_wq instead of ocfs2_quota_wq
  ext4: convert to alloc_workqueue()
  scsi/scsi_tgt_lib: scsi_tgtd isn't used in memory reclaim path
  scsi/be2iscsi,qla2xxx: convert to alloc_workqueue()
  misc/iwmc3200top: use system_wq instead of dedicated workqueues
  i2o: use alloc_workqueue() instead of create_workqueue()
  acpi: kacpi*_wq don't need WQ_MEM_RECLAIM
  fs/aio: aio_wq isn't used in memory reclaim path
  input/tps6507x-ts: use system_wq instead of dedicated workqueue
  cpufreq: use system_wq instead of dedicated workqueues
  wireless/ipw2x00: use system_wq instead of dedicated workqueues
  arm/omap: use system_wq in mailbox
  workqueue: use WQ_MEM_RECLAIM instead of WQ_RESCUER
2011-03-16 08:20:19 -07:00
Randy Dunlap 986d4abbdd sunrpc: fix printk format warning
Fix printk format build warning:

net/sunrpc/xprtrdma/verbs.c:1463: warning: format '%llx' expects type 'long long unsigned int', but argument 3 has type 'dma_addr_t'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-15 20:17:32 -04:00
j223yang@asset.uwaterloo.ca 4d4a76f330 xprt: remove redundant null check
'req' is dereferenced before checked for NULL.
The patch simply removes the check.

Signed-off-by: Jinqiu Yang<crindy646@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-15 20:16:14 -04:00
Kevin Coffman f8628220bb gss:krb5 only include enctype numbers in gm_upcall_enctypes
Make the value in gm_upcall_enctypes just the enctype values.
This allows the values to be used more easily elsewhere.

Signed-off-by: Kevin Coffman <kwc@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:39:27 -05:00
Tom Tucker 5c635e09ce RPCRDMA: Fix FRMR registration/invalidate handling.
When the rpc_memreg_strategy is 5, FRMR are used to map RPC data.
This mode uses an FRMR to map the RPC data, then invalidates
(i.e. unregisers) the data in xprt_rdma_free. These FRMR are used
across connections on the same mount, i.e. if the connection goes
away on an idle timeout and reconnects later, the FRMR are not
destroyed and recreated.

This creates a problem for transport errors because the WR that
invalidate an FRMR may be flushed (i.e. fail) leaving the
FRMR valid. When the FRMR is later used to map an RPC it will fail,
tearing down the transport and starting over. Over time, more and
more of the FRMR pool end up in the wrong state resulting in
seemingly random disconnects.

This fix keeps track of the FRMR state explicitly by setting it's
state based on the successful completion of a reg/inv WR. If the FRMR
is ever used and found to be in the wrong state, an invalidate WR
is prepended, re-syncing the FRMR state and avoiding the connection loss.

Signed-off-by: Tom Tucker <tom@ogc.us>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:39:27 -05:00
Tom Tucker bd7ea31b9e RPCRDMA: Fix to XDR page base interpretation in marshalling logic.
The RPCRDMA marshalling logic assumed that xdr->page_base was an
offset into the first page of xdr->page_list. It is in fact an
offset into the xdr->page_list itself, that is, it selects the
first page in the page_list and the offset into that page.

The symptom depended in part on the rpc_memreg_strategy, if it was
FRMR, or some other one-shot mapping mode, the connection would get
torn down on a base and bounds error. When the badly marshalled RPC
was retransmitted it would reconnect, get the error, and tear down the
connection again in a loop forever. This resulted in a hung-mount. For
the other modes, it would result in silent data corruption. This bug is
most easily reproduced by writing more data than the filesystem
has space for.

This fix corrects the page_base assumption and otherwise simplifies
the iov mapping logic.

Signed-off-by: Tom Tucker <tom@ogc.us>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:39:27 -05:00
Andy Adamson cbdabc7f8b NFSv4.1: filelayout async error handler
Use our own async error handler.
Mark the layout as failed and retry i/o through the MDS on specified errors.

Update the mds_offset in nfs_readpage_retry so that a failed short-read retry
to a DS gets correctly resent through the MDS.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:38:43 -05:00
Fred Isaman eabf5baaaa RPC: clarify rpc_run_task error handling
rpc_run_task can only fail if it is not passed in a preallocated task.
However, that is not at all clear with the current code.  So
remove several impossible to occur failure checks.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:38:40 -05:00
Fred Isaman cee6a5372f RPC: remove check for impossible condition in rpc_make_runnable
queue_work() only returns 0 or 1, never a negative value.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:38:40 -05:00
Ben Hutchings 4cea288aaf sunrpc: Propagate errors from xs_bind() through xs_create_sock()
xs_create_sock() is supposed to return a pointer or an ERR_PTR-encoded
error, but it currently returns 0 if xs_bind() fails.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Cc: stable@kernel.org [v2.6.37]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-10 15:04:58 -05:00
Jesper Juhl a5e5026810 SUNRPC: Remove resource leak in svc_rdma_send_error()
We leak the memory allocated to 'ctxt' when we return after
'ib_dma_mapping_error()' returns !=0.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-10 15:04:54 -05:00
Trond Myklebust bf294b41ce SUNRPC: Close a race in __rpc_wait_for_completion_task()
Although they run as rpciod background tasks, under normal operation
(i.e. no SIGKILL), functions like nfs_sillyrename(), nfs4_proc_unlck()
and nfs4_do_close() want to be fully synchronous. This means that when we
exit, we want all references to the rpc_task to be gone, and we want
any dentry references etc. held by that task to be released.

For this reason these functions call __rpc_wait_for_completion_task(),
followed by rpc_put_task() in the expectation that the latter will be
releasing the last reference to the rpc_task, and thus ensuring that the
callback_ops->rpc_release() has been called synchronously.

This patch fixes a race which exists due to the fact that
rpciod calls rpc_complete_task() (in order to wake up the callers of
__rpc_wait_for_completion_task()) and then subsequently calls
rpc_put_task() without ensuring that these two steps are done atomically.

In order to avoid adding new spin locks, the patch uses the existing
waitqueue spin lock to order the rpc_task reference count releases between
the waiting process and rpciod.
The common case where nobody is waiting for completion is optimised for by
checking if the RPC_TASK_ASYNC flag is cleared and/or if the rpc_task
reference count is 1: in those cases we drop trying to grab the spin lock,
and immediately free up the rpc_task.

Those few processes that need to put the rpc_task from inside an
asynchronous context and that do not care about ordering are given a new
helper: rpc_put_task_async().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-10 15:04:52 -05:00
J. Bruce Fields 352b5d13c0 svcrpc: fix bad argument in unix_domain_find
"After merging the nfsd tree, today's linux-next build (powerpc
ppc64_defconfig) produced this warning:

net/sunrpc/svcauth_unix.c: In function 'unix_domain_find':
net/sunrpc/svcauth_unix.c:58: warning: passing argument 1 of
+'svcauth_unix_domain_release' from incompatible pointer type
net/sunrpc/svcauth_unix.c:41: note: expected 'struct auth_domain *' but
argument
+is of type 'struct unix_domain *'

Introduced by commit 8b3e07ac90 ("svcrpc: fix rare race on unix_domain
creation")."

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-03-09 22:40:30 -05:00
J. Bruce Fields 8b3e07ac90 svcrpc: fix rare race on unix_domain creation
Note that "new" here is not yet fully initialized; auth_domain_put
should be called only on auth_domains that have actually been added to
the hash.

Before this fix, two attempts to add the same domain at once could
cause the hlist_del in auth_domain_put to fail.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-03-08 11:51:29 -05:00
Kevin Coffman 540c8cb6a5 gss:krb5 only include enctype numbers in gm_upcall_enctypes
Make the value in gm_upcall_enctypes just the enctype values.
This allows the values to be used more easily elsewhere.

Signed-off-by: Kevin Coffman <kwc@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-03-07 12:06:48 -05:00
Eric Dumazet eaefd1105b net: add __rcu annotations to sk_wq and wq
Add proper RCU annotations/verbs to sk_wq and wq members

Fix __sctp_write_space() sk_sleep() abuse (and sock->wq access)

Fix sunrpc sk_sleep() abuse too

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-22 10:19:31 -08:00
Tejun Heo 43d133c18b Merge branch 'master' into for-2.6.39 2011-02-21 09:43:56 +01:00
Andy Adamson 778be232a2 NFS do not find client in NFSv4 pg_authenticate
The information required to find the nfs_client cooresponding to the incoming
back channel request is contained in the NFS layer. Perform minimal checking
in the RPC layer pg_authenticate method, and push more detailed checking into
the NFS layer where the nfs_client can be found.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-25 15:26:51 -05:00
Tejun Heo ada609ee2a workqueue: use WQ_MEM_RECLAIM instead of WQ_RESCUER
WQ_RESCUER is now an internal flag and should only be used in the
workqueue implementation proper.  Use WQ_MEM_RECLAIM instead.

This doesn't introduce any functional difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: dm-devel@redhat.com
Cc: Neil Brown <neilb@suse.de>
2011-01-25 14:35:54 +01:00
Linus Torvalds 18bce371ae Merge branch 'for-2.6.38' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.38' of git://linux-nfs.org/~bfields/linux: (62 commits)
  nfsd4: fix callback restarting
  nfsd: break lease on unlink, link, and rename
  nfsd4: break lease on nfsd setattr
  nfsd: don't support msnfs export option
  nfsd4: initialize cb_per_client
  nfsd4: allow restarting callbacks
  nfsd4: simplify nfsd4_cb_prepare
  nfsd4: give out delegations more quickly in 4.1 case
  nfsd4: add helper function to run callbacks
  nfsd4: make sure sequence flags are set after destroy_session
  nfsd4: re-probe callback on connection loss
  nfsd4: set sequence flag when backchannel is down
  nfsd4: keep finer-grained callback status
  rpc: allow xprt_class->setup to return a preexisting xprt
  rpc: keep backchannel xprt as long as server connection
  rpc: move sk_bc_xprt to svc_xprt
  nfsd4: allow backchannel recovery
  nfsd4: support BIND_CONN_TO_SESSION
  nfsd4: modify session list under cl_lock
  Documentation: fl_mylease no longer exists
  ...

Fix up conflicts in fs/nfsd/vfs.c with the vfs-scale work.  The
vfs-scale work touched some msnfs cases, and this merge removes support
for that entirely, so the conflict was trivial to resolve.
2011-01-14 13:17:26 -08:00
Linus Torvalds b9d919a4ac Merge branch 'nfs-for-2.6.38' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'nfs-for-2.6.38' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (89 commits)
  NFS fix the setting of exchange id flag
  NFS: Don't use vm_map_ram() in readdir
  NFSv4: Ensure continued open and lockowner name uniqueness
  NFS: Move cl_delegations to the nfs_server struct
  NFS: Introduce nfs_detach_delegations()
  NFS: Move cl_state_owners and related fields to the nfs_server struct
  NFS: Allow walking nfs_client.cl_superblocks list outside client.c
  pnfs: layout roc code
  pnfs: update nfs4_callback_recallany to handle layouts
  pnfs: add CB_LAYOUTRECALL handling
  pnfs: CB_LAYOUTRECALL xdr code
  pnfs: change lo refcounting to atomic_t
  pnfs: check that partial LAYOUTGET return is ignored
  pnfs: add layout to client list before sending rpc
  pnfs: serialize LAYOUTGET(openstateid)
  pnfs: layoutget rpc code cleanup
  pnfs: change how lsegs are removed from layout list
  pnfs: change layout state seqlock to a spinlock
  pnfs: add prefix to struct pnfs_layout_hdr fields
  pnfs: add prefix to struct pnfs_layout_segment fields
  ...
2011-01-11 15:11:56 -08:00
J. Bruce Fields f0418aa4b1 rpc: allow xprt_class->setup to return a preexisting xprt
This allows us to reuse the xprt associated with a server connection if
one has already been set up.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-01-11 15:04:10 -05:00
J. Bruce Fields 99de8ea962 rpc: keep backchannel xprt as long as server connection
Multiple backchannels can share the same tcp connection; from rfc 5661 section
2.10.3.1:

	A connection's association with a session is not exclusive.  A
	connection associated with the channel(s) of one session may be
	simultaneously associated with the channel(s) of other sessions
	including sessions associated with other client IDs.

However, multiple backchannels share a connection, they must all share
the same xid stream (hence the same rpc_xprt); the only way we have to
match replies with calls at the rpc layer is using the xid.

So, keep the rpc_xprt around as long as the connection lasts, in case
we're asked to use the connection as a backchannel again.

Requests to create new backchannel clients over a given server
connection should results in creating new clients that reuse the
existing rpc_xprt.

But to start, just reject attempts to associate multiple rpc_xprt's with
the same underlying bc_xprt.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-01-11 15:04:10 -05:00
J. Bruce Fields d75faea330 rpc: move sk_bc_xprt to svc_xprt
This seems obviously transport-level information even if it's currently
used only by the server socket code.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-01-11 15:04:10 -05:00
Trond Myklebust 68c404b18f Merge branch 'bugfixes' into nfs-for-2.6.38
Conflicts:
	fs/nfs/nfs2xdr.c
	fs/nfs/nfs3xdr.c
	fs/nfs/nfs4xdr.c
2011-01-10 14:48:02 -05:00
Trond Myklebust 6650239a4b NFS: Don't use vm_map_ram() in readdir
vm_map_ram() is not available on NOMMU platforms, and causes trouble
on incoherrent architectures such as ARM when we access the page data
through both the direct and the virtual mapping.

The alternative is to use the direct mapping to access page data
for the case when we are not crossing a page boundary, but to copy
the data into a linear scratch buffer when we are accessing data
that spans page boundaries.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: stable@kernel.org  [2.6.37]
2011-01-10 14:45:01 -05:00
Linus Torvalds 23d69b09b7 Merge branch 'for-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
* 'for-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (33 commits)
  usb: don't use flush_scheduled_work()
  speedtch: don't abuse struct delayed_work
  media/video: don't use flush_scheduled_work()
  media/video: explicitly flush request_module work
  ioc4: use static work_struct for ioc4_load_modules()
  init: don't call flush_scheduled_work() from do_initcalls()
  s390: don't use flush_scheduled_work()
  rtc: don't use flush_scheduled_work()
  mmc: update workqueue usages
  mfd: update workqueue usages
  dvb: don't use flush_scheduled_work()
  leds-wm8350: don't use flush_scheduled_work()
  mISDN: don't use flush_scheduled_work()
  macintosh/ams: don't use flush_scheduled_work()
  vmwgfx: don't use flush_scheduled_work()
  tpm: don't use flush_scheduled_work()
  sonypi: don't use flush_scheduled_work()
  hvsi: don't use flush_scheduled_work()
  xen: don't use flush_scheduled_work()
  gdrom: don't use flush_scheduled_work()
  ...

Fixed up trivial conflict in drivers/media/video/bt8xx/bttv-input.c
as per Tejun.
2011-01-07 16:58:04 -08:00
Linus Torvalds b4a45f5fe8 Merge branch 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin
* 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin: (57 commits)
  fs: scale mntget/mntput
  fs: rename vfsmount counter helpers
  fs: implement faster dentry memcmp
  fs: prefetch inode data in dcache lookup
  fs: improve scalability of pseudo filesystems
  fs: dcache per-inode inode alias locking
  fs: dcache per-bucket dcache hash locking
  bit_spinlock: add required includes
  kernel: add bl_list
  xfs: provide simple rcu-walk ACL implementation
  btrfs: provide simple rcu-walk ACL implementation
  ext2,3,4: provide simple rcu-walk ACL implementation
  fs: provide simple rcu-walk generic_check_acl implementation
  fs: provide rcu-walk aware permission i_ops
  fs: rcu-walk aware d_revalidate method
  fs: cache optimise dentry and inode for rcu-walk
  fs: dcache reduce branches in lookup path
  fs: dcache remove d_mounted
  fs: fs_struct use seqlock
  fs: rcu-walk for path lookup
  ...
2011-01-07 08:56:33 -08:00
Nick Piggin fb045adb99 fs: dcache reduce branches in lookup path
Reduce some branches and memory accesses in dcache lookup by adding dentry
flags to indicate common d_ops are set, rather than having to check them.
This saves a pointer memory access (dentry->d_op) in common path lookup
situations, and saves another pointer load and branch in cases where we
have d_op but not the particular operation.

Patched with:

git grep -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/\([^\t ]*\)->d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&\1, \2);/' -i

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:28 +11:00
Nick Piggin fa0d7e3de6 fs: icache RCU free inodes
RCU free the struct inode. This will allow:

- Subsequent store-free path walking patch. The inode must be consulted for
  permissions when walking, so an RCU inode reference is a must.
- sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
  to take i_lock no longer need to take sb_inode_list_lock to walk the list in
  the first place. This will simplify and optimize locking.
- Could remove some nested trylock loops in dcache code
- Could potentially simplify things a bit in VM land. Do not need to take the
  page lock to follow page->mapping.

The downsides of this is the performance cost of using RCU. In a simple
creat/unlink microbenchmark, performance drops by about 10% due to inability to
reuse cache-hot slab objects. As iterations increase and RCU freeing starts
kicking over, this increases to about 20%.

In cases where inode lifetimes are longer (ie. many inodes may be allocated
during the average life span of a single inode), a lot of this cache reuse is
not applicable, so the regression caused by this patch is smaller.

The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
however this adds some complexity to list walking and store-free path walking,
so I prefer to implement this at a later date, if it is shown to be a win in
real situations. I haven't found a regression in any non-micro benchmark so I
doubt it will be a problem.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:26 +11:00
Nick Piggin fe15ce446b fs: change d_delete semantics
Change d_delete from a dentry deletion notification to a dentry caching
advise, more like ->drop_inode. Require it to be constant and idempotent,
and not take d_lock. This is how all existing filesystems use the callback
anyway.

This makes fine grained dentry locking of dput and dentry lru scanning
much simpler.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:18 +11:00
Andy Adamson 4a19de0f4b NFS rename client back channel transport field
Differentiate from server backchannel

Signed-off-by: Andy Adamson <andros@netapp.com>
Acked-by: Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06 14:46:25 -05:00
Andy Adamson 2c2618c6f2 NFS associate sessionid with callback connection
The sessions based callback service is started prior to the CREATE_SESSION call
so that it can handle CB_NULL requests which can be sent before the
CREATE_SESSION call returns and the session ID is known.

Set the callback sessionid after a sucessful CREATE_SESSION.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06 14:46:24 -05:00
Andy Adamson 16b2d1e1d1 SUNRPC register and unregister the back channel transport
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06 14:46:23 -05:00
Andy Adamson 1f11a034cd SUNRPC new transport for the NFSv4.1 shared back channel
Move the current sock create and destroy routines into the new transport ops.
Back channel socket will be destroyed by the svc_closs_all call in svc_destroy.

Added check: only TCP supported on shared back channel.

Signed-off-by: Andy Adamson <andros@netapp.com>
Acked-by: Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06 14:46:23 -05:00
Andy Adamson 71e161a6a9 SUNRPC fix bc_send print
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06 14:46:23 -05:00
Andy Adamson 4b5b3ba16b SUNRPC move svc_drop to caller of svc_process_common
The NFSv4.1 shared back channel does not need to call svc_drop because the
callback service never outlives the single connection it services, and it
reuses it's buffers and keeps the trasport.

Signed-off-by: Andy Adamson <andros@netapp.com>
Acked-by: Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06 14:46:23 -05:00
J. Bruce Fields fdef7aa5d4 svcrpc: ensure cache_check caller sees updated entry
Supposes cache_check runs simultaneously with an update on a different
CPU:

	cache_check			task doing update
	^^^^^^^^^^^			^^^^^^^^^^^^^^^^^

	1. test for CACHE_VALID		1'. set entry->data
	   & !CACHE_NEGATIVE

	2. use entry->data		2'. set CACHE_VALID

If the two memory writes performed in step 1' and 2' appear misordered
with respect to the reads in step 1 and 2, then the caller could get
stale data at step 2 even though it saw CACHE_VALID set on the cache
entry.

Add memory barriers to prevent this.

Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-01-04 16:49:25 -05:00
J. Bruce Fields 6bab93f87e svcrpc: take lock on turning entry NEGATIVE in cache_check
We attempt to turn a cache entry negative in place.  But that entry may
already have been filled in by some other task since we last checked
whether it was valid, so we could be modifying an already-valid entry.
If nothing else there's a likely leak in such a case when the entry is
eventually put() and contents are not freed because it has
CACHE_NEGATIVE set.

So, take the cache_lock just as sunrpc_cache_update() does.

Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-01-04 16:49:24 -05:00
J. Bruce Fields 9e701c6109 svcrpc: simpler request dropping
Currently we use -EAGAIN returns to determine when to drop a deferred
request.  On its own, that is error-prone, as it makes us treat -EAGAIN
returns from other functions specially to prevent inadvertent dropping.

So, use a flag on the request instead.

Returning an error on request deferral is still required, to prevent
further processing, but we no longer need worry that an error return on
its own could result in a drop.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-01-04 16:49:22 -05:00
J. Bruce Fields d76d1815f3 svcrpc: avoid double reply caused by deferral race
Commit d29068c431 "sunrpc: Simplify cache_defer_req and related
functions." asserted that cache_check() could determine success or
failure of cache_defer_req() by checking the CACHE_PENDING bit.

This isn't quite right.

We need to know whether cache_defer_req() created a deferred request,
in which case sending an rpc reply has become the responsibility of the
deferred request, and it is important that we not send our own reply,
resulting in two different replies to the same request.

And the CACHE_PENDING bit doesn't tell us that; we could have
succesfully created a deferred request at the same time as another
thread cleared the CACHE_PENDING bit.

So, partially revert that commit, to ensure that cache_check() returns
-EAGAIN if and only if a deferred request has been created.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: NeilBrown <neilb@suse.de>
2011-01-04 16:49:21 -05:00
J. Bruce Fields bdd5f05d91 SUNRPC: Remove more code when NFSD_DEPRECATED is not configured
Signed-off-by: NeilBrown <neilb@suse.de>
[bfields@redhat.com: moved svcauth_unix_purge outside ifdef's.]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-01-04 16:48:02 -05:00
J. Bruce Fields 31f7aa65f5 svcrpc: modifying valid sunrpc cache entries is racy
Once a sunrpc cache entry is VALID, we should be replacing it (and
allowing any concurrent users to destroy it on last put) instead of
trying to update it in place.

Otherwise someone referencing the ip_map we're modifying here could try
to use the m_client just as we're putting the last reference.

The bug should only be seen by users of the legacy nfsd interfaces.

(Thanks to Neil for suggestion to use sunrpc_invalidate.)

Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-01-04 16:47:29 -05:00
Trond Myklebust beb0f0a9fb kernel panic when mount NFSv4
On Tue, 2010-12-14 at 16:58 +0800, Mi Jinlong wrote:
> Hi,
>
> When testing NFSv4 at RHEL6 with kernel 2.6.32, I got a kernel panic
> at NFS client's __rpc_create_common function.
>
> The panic place is:
>   rpc_mkpipe
>     __rpc_lookup_create()          <=== find pipefile *idmap*
>     __rpc_mkpipe()                 <=== pipefile is *idmap*
>       __rpc_create_common()
>        ******  BUG_ON(!d_unhashed(dentry)); ******    *panic*
>
> It means that the dentry's d_flags have be set DCACHE_UNHASHED,
> but it should not be set here.
>
> Is someone known this bug? or give me some idea?
>
> A reproduce program is append, but it can't reproduce the bug every time.
> the export is: "/nfsroot       *(rw,no_root_squash,fsid=0,insecure)"
>
> And the panic message is append.
>
> ============================================================================
> #!/bin/sh
>
> LOOPTOTAL=768
> LOOPCOUNT=0
> ret=0
>
> while [ $LOOPCOUNT -ne $LOOPTOTAL ]
> do
> 	((LOOPCOUNT += 1))
> 	service nfs restart
> 	/usr/sbin/rpc.idmapd
> 	mount -t nfs4 127.0.0.1:/ /mnt|| return 1;
> 	ls -l /var/lib/nfs/rpc_pipefs/nfs/*/
> 	umount /mnt
> 	echo $LOOPCOUNT
> done
>
> ===============================================================================
> Code: af 60 01 00 00 89 fa 89 f0 e8 64 cf 89 f0 e8 5c 7c 64 cf 31 c0 8b 5c 24 10 8b
> 74 24 14 8b 7c 24 18 8b 6c 24 1c 83 c4 20 c3 <0f> 0b eb fc 8b 46 28 c7 44 24 08 20
> de ee f0 c7 44 24 04 56 ea
> EIP:[<f0ee92ea>] __rpc_create_common+0x8a/0xc0 [sunrpc] SS:ESP 0068:eccb5d28
> ---[ end trace 8f5606cd08928ed2]---
> Kernel panic - not syncing: Fatal exception
> Pid:7131, comm: mount.nfs4 Tainted: G     D   -------------------2.6.32 #1
> Call Trace:
>  [<c080ad18>] ? panic+0x42/0xed
>  [<c080e42c>] ? oops_end+0xbc/0xd0
>  [<c040b090>] ? do_invalid_op+0x0/0x90
>  [<c040b10f>] ? do_invalid_op+0x7f/0x90
>  [<f0ee92ea>] ? __rpc_create_common+0x8a/0xc0[sunrpc]
>  [<f0edc433>] ? rpc_free_task+0x33/0x70[sunrpc]
>  [<f0ed6508>] ? prc_call_sync+0x48/0x60[sunrpc]
>  [<f0ed656e>] ? rpc_ping+0x4e/0x60[sunrpc]
>  [<f0ed6eaf>] ? rpc_create+0x38f/0x4f0[sunrpc]
>  [<c080d80b>] ? error_code+0x73/0x78
>  [<f0ee92ea>] ? __rpc_create_common+0x8a/0xc0[sunrpc]
>  [<c0532bda>] ? d_lookup+0x2a/0x40
>  [<f0ee94b1>] ? rpc_mkpipe+0x111/0x1b0[sunrpc]
>  [<f10a59f4>] ? nfs_create_rpc_client+0xb4/0xf0[nfs]
>  [<f10d6c6d>] ? nfs_fscache_get_client_cookie+0x1d/0x50[nfs]
>  [<f10d3fcb>] ? nfs_idmap_new+0x7b/0x140[nfs]
>  [<c05e76aa>] ? strlcpy+0x3a/0x60
>  [<f10a60ca>] ? nfs4_set_client+0xea/0x2b0[nfs]
>  [<f10a6d0c>] ? nfs4_create_server+0xac/0x1b0[nfs]
>  [<c04f1400>] ? krealloc+0x40/0x50
>  [<f10b0e8b>] ? nfs4_remote_get_sb+0x6b/0x250[nfs]
>  [<c04f14ec>] ? kstrdup+0x3c/0x60
>  [<c0520739>] ? vfs_kern_mount+0x69/0x170
>  [<f10b1a3c>] ? nfs_do_root_mount+0x6c/0xa0[nfs]
>  [<f10b1b47>] ? nfs4_try_mount+0x37/0xa0[nfs]
>  [<f10afe6d>] ? nfs4_validate_text_mount_data+-x7d/0xf0[nfs]
>  [<f10b1c42>] ? nfs4_get_sb+0x92/0x2f0
>  [<c0520739>] ? vfs_kern_mount+0x69/0x170
>  [<c05366d2>] ? get_fs_type+0x32/0xb0
>  [<c052089f>] ? do_kern_mount+0x3f/0xe0
>  [<c053954f>] ? do_mount+0x2ef/0x740
>  [<c0537740>] ? copy_mount_options+0xb0/0x120
>  [<c0539a0e>] ? sys_mount+0x6e/0xa0

Hi,

Does the following patch fix the problem?

Cheers
  Trond

--------------------------
SUNRPC: Fix a BUG in __rpc_create_common

From: Trond Myklebust <Trond.Myklebust@netapp.com>

Mi Jinlong reports:

When testing NFSv4 at RHEL6 with kernel 2.6.32, I got a kernel panic
at NFS client's __rpc_create_common function.

The panic place is:
  rpc_mkpipe
      __rpc_lookup_create()          <=== find pipefile *idmap*
      __rpc_mkpipe()                 <=== pipefile is *idmap*
        __rpc_create_common()
         ******  BUG_ON(!d_unhashed(dentry)); ****** *panic*

The test is wrong: we can find ourselves with a hashed negative dentry here
if the idmapper tried to look up the file before we got round to creating
it.

Just replace the BUG_ON() with a d_drop(dentry).

Reported-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-04 13:10:38 -05:00
David S. Miller 17f7f4d9fc Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	net/ipv4/fib_frontend.c
2010-12-26 22:37:05 -08:00
Joe Perches b3bcedadf2 net/sunrpc/clnt.c: Convert sprintf_symbol to %ps
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-12-21 11:51:26 -05:00
Joe Perches f3c0ceea83 net/sunrpc/auth_gss/gss_krb5_crypto.c: Use normal negative error value return
And remove unnecessary double semicolon too.

No effect to code, as test is != 0.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-12-17 15:48:22 -05:00
Shan Wei 66c941f4aa net: sunrpc: kill unused macros
These macros never be used for several years.

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-12-17 15:48:21 -05:00
NeilBrown 3942302ea9 sunrpc: svc_sock_names should hold ref to socket being closed.
Currently svc_sock_names calls svc_close_xprt on a svc_sock to
which it does not own a reference.
As soon as svc_close_xprt sets XPT_CLOSE, the socket could be
freed by a separate thread (though this is a very unlikely race).

It is safer to hold a reference while calling svc_close_xprt.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-12-17 15:48:19 -05:00
NeilBrown 7c96aef759 sunrpc: remove xpt_pool
The xpt_pool field is only used for reporting BUGs.
And it isn't used correctly.

In particular, when it is cleared in svc_xprt_received before
XPT_BUSY is cleared, there is no guarantee that either the
compiler or the CPU might not re-order to two assignments, just
setting xpt_pool to NULL after XPT_BUSY is cleared.

If a different cpu were running svc_xprt_enqueue at this moment,
it might see XPT_BUSY clear and then xpt_pool non-NULL, and
so BUG.

This could be fixed by calling
  smp_mb__before_clear_bit()
before the clear_bit.  However as xpt_pool isn't really used,
it seems safest to simply remove xpt_pool.

Another alternate would be to change the clear_bit to
clear_bit_unlock, and the test_and_set_bit to test_and_set_bit_lock.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-12-17 15:48:18 -05:00
J. Bruce Fields ec66ee3797 Merge commit 'v2.6.37-rc6' into for-2.6.38 2010-12-17 13:29:07 -05:00
Chuck Lever bf2695516d SUNRPC: New xdr_streams XDR decoder API
Now that all client-side XDR decoder routines use xdr_streams, there
should be no need to support the legacy calling sequence [rpc_rqst *,
__be32 *, RPC res *] anywhere.  We can construct an xdr_stream in the
generic RPC code, instead of in each decoder function.

This is a refactoring change.  It should not cause different behavior.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-12-16 12:37:25 -05:00
Chuck Lever 9f06c719f4 SUNRPC: New xdr_streams XDR encoder API
Now that all client-side XDR encoder routines use xdr_streams, there
should be no need to support the legacy calling sequence [rpc_rqst *,
__be32 *, RPC arg *] anywhere.  We can construct an xdr_stream in the
generic RPC code, instead of in each encoder function.

Also, all the client-side encoder functions return 0 now, making a
return value superfluous.  Take this opportunity to convert them to
return void instead.

This is a refactoring change.  It should not cause different behavior.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-12-16 12:37:25 -05:00
Chuck Lever 1ac7c23e4a SUNRPC: Determine value of "nrprocs" automatically
Clean up.

Just fixed a panic where the nrprocs field in a different upper layer
client was set by hand incorrectly.  Use the compiler-generated method
used by the other upper layer protocols.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-12-16 12:37:25 -05:00
Chuck Lever 4129ccf303 SUNRPC: Avoid return code checking in rpcbind XDR encoder functions
Clean up.

The trend in the other XDR encoder functions is to BUG() when encoding
problems occur, since a problem here is always due to a local coding
error.  Then, instead of a status, zero is unconditionally returned.

Update the rpcbind XDR encoders to behave this way.

To finish the update, use the new-style be32_to_cpup() and
cpu_to_be32() macros, and compute the buffer sizes using raw integers
instead of sizeof().  This matches the conventions used in other XDR
functions.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-12-16 12:37:25 -05:00
Tejun Heo afe2c511fb workqueue: convert cancel_rearming_delayed_work[queue]() users to cancel_delayed_work_sync()
cancel_rearming_delayed_work[queue]() has been superceded by
cancel_delayed_work_sync() quite some time ago.  Convert all the
in-kernel users.  The conversions are completely equivalent and
trivial.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: "David S. Miller" <davem@davemloft.net>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Cc: Jeff Garzik <jgarzik@pobox.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: netdev@vger.kernel.org
Cc: Anton Vorontsov <cbou@mail.ru>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Alex Elder <aelder@sgi.com>
Cc: xfs-masters@oss.sgi.com
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: netfilter-devel@vger.kernel.org
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: linux-nfs@vger.kernel.org
2010-12-15 10:56:11 +01:00
NeilBrown ed2849d3ec sunrpc: prevent use-after-free on clearing XPT_BUSY
When an xprt is created, it has a refcount of 1, and XPT_BUSY is set.
The refcount is *not* owned by the thread that created the xprt
(as is clear from the fact that creators never put the reference).
Rather, it is owned by the absence of XPT_DEAD.  Once XPT_DEAD is set,
(And XPT_BUSY is clear) that initial reference is dropped and the xprt
can be freed.

So when a creator clears XPT_BUSY it is dropping its only reference and
so must not touch the xprt again.

However svc_recv, after calling ->xpo_accept (and so getting an XPT_BUSY
reference on a new xprt), calls svc_xprt_recieved.  This clears
XPT_BUSY and then svc_xprt_enqueue - this last without owning a reference.
This is dangerous and has been seen to leave svc_xprt_enqueue working
with an xprt containing garbage.

So we need to hold an extra counted reference over that call to
svc_xprt_received.

For safety, any time we clear XPT_BUSY and then use the xprt again, we
first get a reference, and the put it again afterwards.

Note that svc_close_all does not need this extra protection as there are
no threads running, and the final free can only be called asynchronously
from such a thread.

Signed-off-by: NeilBrown <neilb@suse.de>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-12-07 20:39:55 -05:00
Trond Myklebust 5fc43978a7 SUNRPC: Fix an infinite loop in call_refresh/call_refreshresult
If the rpcauth_refreshcred() call returns an error other than
EACCES, ENOMEM or ETIMEDOUT, we currently end up looping forever
between call_refresh and call_refreshresult.

The correct thing to do here is to exit on all errors except
EAGAIN and ETIMEDOUT, for which case we retry 3 times, then
return EACCES.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-11-22 13:22:39 -05:00
Tracey Dent fdb26195f4 Net: sunrpc: auth_gss: Makefile: Remove deprecated kbuild goal definitions
Changed Makefile to use <modules>-y instead of <modules>-objs
because -objs is deprecated and not mentioned in
Documentation/kbuild/makefiles.txt.

Signed-off-by: Tracey Dent <tdent48227@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-22 08:16:16 -08:00
J. Bruce Fields 9c335c0b8d svcrpc: fix wspace-checking race
We call svc_xprt_enqueue() after something happens which we think may
require handling from a server thread.  To avoid such events being lost,
svc_xprt_enqueue() must guarantee that there will be a svc_serv() call
from a server thread following any such event.  It does that by either
waking up a server thread itself, or checking that XPT_BUSY is set (in
which case somebody else is doing it).

But the check of XPT_BUSY could occur just as someone finishes
processing some other event, and just before they clear XPT_BUSY.

Therefore it's important not to clear XPT_BUSY without subsequently
doing another svc_export_enqueue() to check whether the xprt should be
requeued.

The xpo_wspace() check in svc_xprt_enqueue() breaks this rule, allowing
an event to be missed in situations like:

	data arrives
	call svc_tcp_data_ready():
	call svc_xprt_enqueue():
	set BUSY
	find no write space
				svc_reserve():
				free up write space
				call svc_enqueue():
				test BUSY
	clear BUSY

So, instead, check wspace in the same places that the state flags are
checked: before taking BUSY, and in svc_receive().

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-11-19 18:35:12 -05:00