Commit Graph

3038 Commits

Author SHA1 Message Date
Linus Torvalds 8e7757d83d NFS client updates for Linux 4.14
Hightlights include:
 
 Stable bugfixes:
 - Fix mirror allocation in the writeback code to avoid a use after free
 - Fix the O_DSYNC writes to use the correct byte range
 - Fix 2 use after free issues in the I/O code
 
 Features:
 - Writeback fixes to split up the inode->i_lock in order to reduce contention
 - RPC client receive fixes to reduce the amount of time the
   xprt->transport_lock is held when receiving data from a socket into am
   XDR buffer.
 - Ditto fixes to reduce contention between call side users of the rdma
   rb_lock, and its use in rpcrdma_reply_handler.
 - Re-arrange rdma stats to reduce false cacheline sharing.
 - Various rdma cleanups and optimisations.
 - Refactor the NFSv4.1 exchange id code and clean up the code.
 - Const-ify all instances of struct rpc_xprt_ops
 
 Bugfixes:
 - Fix the NFSv2 'sec=' mount option.
 - NFSv4.1: don't use machine credentials for CLOSE when using 'sec=sys'
 - Fix the NFSv3 GRANT callback when the port changes on the server.
 - Fix livelock issues with COMMIT
 - NFSv4: Use correct inode in _nfs4_opendata_to_nfs4_state() when doing
   and NFSv4.1 open by filehandle.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJZtbvIAAoJEGcL54qWCgDy/boP/jRuVk6B2VyhWnJkOgdQzIN3
 Q8PIR0oxkywH2MI7c9/G2k5b/HD9BK2iQrXzIoPxRuPrckKLwzqYclzG8PR4Niyg
 D3CCzrvGcEXZrv/nHQ+HDMD0ZuUyXFqhrYeyQwNSJ9p/oP0gaxnYwteennfJVa99
 mv6+LdoY+lzVYJI1gmMHVF2zOhN+rTe7xUVnjYnsVCpwMvL+u992oZl3qQJRFG6b
 HlXOy7h5JRFyue61P20PSgh9D1JUWWYD/V0EG+7cIvByAg5KxhvVgjqSsTTT7FXe
 Omn4fTv1MFzk8er9qYFRjpM2IoIdAejFMqX3/PxQVr2qOFNmHYrq+WsdWNQEr/Wu
 WREJu5Ac1Hboe2/scA+DtuVPFePPPyrolhwk533aNWrdDywg01e0XqBEDKR/atJd
 u5lvW20UfLQuCFLOpaxDpq2ngQSOg6t96N36tsydG0SAVpiydOPMLqkQi7Nb3aoB
 79xGpmtnijP5T6jnOI2/nexM08OMTI0BhMbXJC5v1+lnxIJKcKdnGlTM4UJyxUMq
 /3dFI4IQZLfkMEjIvZFoi+nKWx3DYhiUhkKhbBYwtB4P4q8Z2qKTPHFxORz9griZ
 Pa+8BPuDuodIWuDD97q1Dnw2NWjQim8Rx/ce4c8FHGzwMJLPkcVqk+guGsub5IdO
 7qF7Vvv02gJ48TAqTBDf
 =1Ssl
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-4.14-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Hightlights include:

  Stable bugfixes:
   - Fix mirror allocation in the writeback code to avoid a use after
     free
   - Fix the O_DSYNC writes to use the correct byte range
   - Fix 2 use after free issues in the I/O code

  Features:
   - Writeback fixes to split up the inode->i_lock in order to reduce
     contention
   - RPC client receive fixes to reduce the amount of time the
     xprt->transport_lock is held when receiving data from a socket into
     am XDR buffer.
   - Ditto fixes to reduce contention between call side users of the
     rdma rb_lock, and its use in rpcrdma_reply_handler.
   - Re-arrange rdma stats to reduce false cacheline sharing.
   - Various rdma cleanups and optimisations.
   - Refactor the NFSv4.1 exchange id code and clean up the code.
   - Const-ify all instances of struct rpc_xprt_ops

  Bugfixes:
   - Fix the NFSv2 'sec=' mount option.
   - NFSv4.1: don't use machine credentials for CLOSE when using
     'sec=sys'
   - Fix the NFSv3 GRANT callback when the port changes on the server.
   - Fix livelock issues with COMMIT
   - NFSv4: Use correct inode in _nfs4_opendata_to_nfs4_state() when
     doing and NFSv4.1 open by filehandle"

* tag 'nfs-for-4.14-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (69 commits)
  NFS: Count the bytes of skipped subrequests in nfs_lock_and_join_requests()
  NFS: Don't hold the group lock when calling nfs_release_request()
  NFS: Remove pnfs_generic_transfer_commit_list()
  NFS: nfs_lock_and_join_requests and nfs_scan_commit_list can deadlock
  NFS: Fix 2 use after free issues in the I/O code
  NFS: Sync the correct byte range during synchronous writes
  lockd: Delete an error message for a failed memory allocation in reclaimer()
  NFS: remove jiffies field from access cache
  NFS: flush data when locking a file to ensure cache coherence for mmap.
  SUNRPC: remove some dead code.
  NFS: don't expect errors from mempool_alloc().
  xprtrdma: Use xprt_pin_rqst in rpcrdma_reply_handler
  xprtrdma: Re-arrange struct rx_stats
  NFS: Fix NFSv2 security settings
  NFSv4.1: don't use machine credentials for CLOSE when using 'sec=sys'
  SUNRPC: ECONNREFUSED should cause a rebind.
  NFS: Remove unused parameter gfp_flags from nfs_pageio_init()
  NFSv4: Fix up mirror allocation
  SUNRPC: Add a separate spinlock to protect the RPC request receive list
  SUNRPC: Cleanup xs_tcp_read_common()
  ...
2017-09-11 22:01:44 -07:00
NeilBrown f1ecbc21eb SUNRPC: remove some dead code.
RPC_TASK_NO_RETRANS_TIMEOUT is set when cl_noretranstimeo
is set, which happens when  RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT is set,
which happens when NFS_CS_NO_RETRANS_TIMEOUT is set.

This flag means "don't resend on a timeout, only resend if the
connection gets broken for some reason".

cl_discrtry is set when RPC_CLNT_CREATE_DISCRTRY is set, which
happens when NFS_CS_DISCRTRY is set.

This flag means "always disconnect before resending".

NFS_CS_NO_RETRANS_TIMEOUT and NFS_CS_DISCRTRY are both only set
in nfs4_init_client(), and it always sets both.

So we will never have a situation where only one of the flags is set.
So this code, which tests if timeout retransmits are allowed, and
disconnection is required, will never run.

So it makes sense to remove this code as it cannot be tested and
could confuse people reading the code (like me).

(alternately we could leave it there with a comment saying
 it is never actually used).

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-09-06 12:31:15 -04:00
Chuck Lever 9590d083c1 xprtrdma: Use xprt_pin_rqst in rpcrdma_reply_handler
Adopt the use of xprt_pin_rqst to eliminate contention between
Call-side users of rb_lock and the use of rb_lock in
rpcrdma_reply_handler.

This replaces the mechanism introduced in 431af645cf ("xprtrdma:
Fix client lock-up after application signal fires").

Use recv_lock to quickly find the completing rqst, pin it, then
drop the lock. At that point invalidation and pull-up of the Reply
XDR can be done. Both are often expensive operations.

Finally, take recv_lock again to signal completion to the RPC
layer. It also protects adjustment of "cwnd".

This greatly reduces the amount of time a lock is held by the
reply handler. Comparing lock_stat results shows a marked decrease
in contention on rb_lock and recv_lock.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
[trond.myklebust@primarydata.com: Remove call to rpcrdma_buffer_put() from
   the "out_norqst:" path in rpcrdma_reply_handler.]
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-09-05 18:27:07 -04:00
Trond Myklebust f9773b22a2 NFS-over-RDMA client updates for Linux 4.14
Bugfixes and cleanups:
 - Constify rpc_xprt_ops
 - Harden RPC call encoding and decoding
 - Clean up rpc call decoding to use xdr_streams
 - Remove unused variables from various structures
 - Refactor code to remove imul instructions
 - Rearrange rx_stats structure for better cacheline sharing
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAlmgfA4ACgkQ18tUv7Cl
 QOsbXBAAnNaCWwerMGi7IbPcvA8aIQLcaruVUVuI2HIUdwb0At3EBakLJr5vFong
 IbUPEegi2F7Dm8gwwQ8Ntb0gqGER1mHr0Bd4tcls+cNxwKNpRad/cv8ZjN4AMVpz
 Kf1ZQOSDoRyJxwnAaRTYsU302tkWQFHrBjpCXpvgI3uoQ7kJwC1sZpXH6qN+r9E3
 hFlkzZJ6gkZE3Rx3XsQqjl+TFZ3amd9Yl1AjzND622oLItmcJiRoptCVz8jYEFBJ
 uYvg22jbZWIrI66pPXnX+TuDfkbA6nFuSqJma0VLZAyTGKtRzJpaExvSJuuMqLm1
 ZuWgWXIO3Kvvyx4gTvRFq06TAlunjOHlxb+39Yr41w2LLcDitvTmv2t/o8+BcVCp
 fkaziwZIqkfXoE4+3SGRC0s+R5obtgjAiTlAPTwno9p8T7jC+x43fdPF9l5jgAs+
 0jtl1d+whQK0yGITq7zwbLimLxxz12f8S9JH6U4umkL/A458ApRVuUQfoCHzl4wk
 ZPG1DGZjPBClM3R//XfUargfs/uM2FO6u0Z4+mxxdyJAHrdExczDC6OE9lLG9hnR
 KQEa7PVDjQZssNHOY0Nu3QaTpBoVxmN6xiDMTtXdf+ltd2m/ja18lER3tB9IwpXD
 +RqIJ8aFat3oP76tZ8CNJ7LiRORzmqDTcfjWkpCDPK259OK7FFU=
 =fdZG
 -----END PGP SIGNATURE-----

Merge tag 'nfs-rdma-for-4.14-1' of git://git.linux-nfs.org/projects/anna/linux-nfs into linux-next

NFS-over-RDMA client updates for Linux 4.14

Bugfixes and cleanups:
- Constify rpc_xprt_ops
- Harden RPC call encoding and decoding
- Clean up rpc call decoding to use xdr_streams
- Remove unused variables from various structures
- Refactor code to remove imul instructions
- Rearrange rx_stats structure for better cacheline sharing
2017-09-05 15:16:04 -04:00
Chuck Lever 26fb2254dd svcrdma: Estimate Send Queue depth properly
The rdma_rw API adjusts max_send_wr upwards during the
rdma_create_qp() call. If the ULP actually wants to take advantage
of these extra resources, it must increase the size of its send
completion queue (created before rdma_create_qp is called) and
increase its send queue accounting limit.

Use the new rdma_rw_mr_factor API to figure out the correct value
to use for the Send Queue and Send Completion Queue depths.

And, ensure that the chosen Send Queue depth for a newly created
transport does not overrun the QP WR limit of the underlying device.

Lastly, there's no longer a need to carry the Send Queue depth in
struct svcxprt_rdma, since the value is used only in the
svc_rdma_accept() path.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-09-05 15:15:31 -04:00
Chuck Lever 5a25bfd28c svcrdma: Limit RQ depth
Ensure that the chosen Receive Queue depth for a newly created
transport does not overrun the QP WR limit of the underlying device.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-09-05 15:15:30 -04:00
Chuck Lever 193bcb7b37 svcrdma: Populate tail iovec when receiving
So that NFS WRITE payloads can eventually be placed directly into a
file's page cache, enable the RPC-over-RDMA transport to present
these payloads in the xdr_buf's page list, while placing trailing
content (such as a GETATTR operation) in the xdr_buf's tail.

After this change, the RPC-over-RDMA's "copy tail" hack, added by
commit a97c331f9a ("svcrdma: Handle additional inline content"),
is no longer needed and can be removed.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-09-05 15:15:29 -04:00
J. Bruce Fields 0828170f3d merge nfsd 4.13 bugfixes into nfsd for-4.14 branch 2017-09-05 15:11:47 -04:00
Chuck Lever 7075a867ce svcrdma: Clean up svc_rdma_build_read_chunk()
Dan Carpenter <dan.carpenter@oracle.com> observed that the while()
loop in svc_rdma_build_read_chunk() does not document the assumption
that the loop interior is always executed at least once.

Defensive: the function now returns -EINVAL if this assumption
fails.

Suggested-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-08-24 22:13:50 -04:00
Chuck Lever afea5657c2 sunrpc: Const-ify struct sv_serv_ops
Close an attack vector by moving the arrays of per-server methods to
read-only memory.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-08-24 22:13:50 -04:00
Chuck Lever 2412e92760 sunrpc: Const-ify instances of struct svc_xprt_ops
Close an attack vector by moving the arrays of server-side transport
methods to read-only memory.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-08-24 22:13:50 -04:00
Vadim Lomovtsev eebe53e87f net: sunrpc: svcsock: fix NULL-pointer exception
While running nfs/connectathon tests kernel NULL-pointer exception
has been observed due to races in svcsock.c.

Race is appear when kernel accepts connection by kernel_accept
(which creates new socket) and start queuing ingress packets
to new socket. This happens in ksoftirq context which could run
concurrently on a different core while new socket setup is not done yet.

The fix is to re-order socket user data init sequence and add
write/read barrier calls to be sure that we got proper values
for callback pointers before actually calling them.

Test results: nfs/connectathon reports '0' failed tests for about 200+ iterations.

Crash log:
---<-snip->---
[ 6708.638984] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[ 6708.647093] pgd = ffff0000094e0000
[ 6708.650497] [00000000] *pgd=0000010ffff90003, *pud=0000010ffff90003, *pmd=0000010ffff80003, *pte=0000000000000000
[ 6708.660761] Internal error: Oops: 86000005 [#1] SMP
[ 6708.665630] Modules linked in: nfsv3 nfnetlink_queue nfnetlink_log nfnetlink rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache overlay xt_CONNSECMARK xt_SECMARK xt_conntrack iptable_security ip_tables ah4 xfrm4_mode_transport sctp tun binfmt_misc ext4 jbd2 mbcache loop tcp_diag udp_diag inet_diag rpcrdma ib_isert iscsi_target_mod ib_iser rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_ucm ib_uverbs ib_umad ib_cm ib_core nls_koi8_u nls_cp932 ts_kmp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack vfat fat ghash_ce sha2_ce sha1_ce cavium_rng_vf i2c_thunderx sg thunderx_edac i2c_smbus edac_core cavium_rng nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c nicvf nicpf ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
[ 6708.736446]  ttm drm i2c_core thunder_bgx thunder_xcv mdio_thunder mdio_cavium dm_mirror dm_region_hash dm_log dm_mod [last unloaded: stap_3c300909c5b3f46dcacd49aab3334af_87021]
[ 6708.752275] CPU: 84 PID: 0 Comm: swapper/84 Tainted: G        W  OE   4.11.0-4.el7.aarch64 #1
[ 6708.760787] Hardware name: www.cavium.com CRB-2S/CRB-2S, BIOS 0.3 Mar 13 2017
[ 6708.767910] task: ffff810006842e80 task.stack: ffff81000689c000
[ 6708.773822] PC is at 0x0
[ 6708.776739] LR is at svc_data_ready+0x38/0x88 [sunrpc]
[ 6708.781866] pc : [<0000000000000000>] lr : [<ffff0000029d7378>] pstate: 60000145
[ 6708.789248] sp : ffff810ffbad3900
[ 6708.792551] x29: ffff810ffbad3900 x28: ffff000008c73d58
[ 6708.797853] x27: 0000000000000000 x26: ffff81000bbe1e00
[ 6708.803156] x25: 0000000000000020 x24: ffff800f7410bf28
[ 6708.808458] x23: ffff000008c63000 x22: ffff000008c63000
[ 6708.813760] x21: ffff800f7410bf28 x20: ffff81000bbe1e00
[ 6708.819063] x19: ffff810012412400 x18: 00000000d82a9df2
[ 6708.824365] x17: 0000000000000000 x16: 0000000000000000
[ 6708.829667] x15: 0000000000000000 x14: 0000000000000001
[ 6708.834969] x13: 0000000000000000 x12: 722e736f622e676e
[ 6708.840271] x11: 00000000f814dd99 x10: 0000000000000000
[ 6708.845573] x9 : 7374687225000000 x8 : 0000000000000000
[ 6708.850875] x7 : 0000000000000000 x6 : 0000000000000000
[ 6708.856177] x5 : 0000000000000028 x4 : 0000000000000000
[ 6708.861479] x3 : 0000000000000000 x2 : 00000000e5000000
[ 6708.866781] x1 : 0000000000000000 x0 : ffff81000bbe1e00
[ 6708.872084]
[ 6708.873565] Process swapper/84 (pid: 0, stack limit = 0xffff81000689c000)
[ 6708.880341] Stack: (0xffff810ffbad3900 to 0xffff8100068a0000)
[ 6708.886075] Call trace:
[ 6708.888513] Exception stack(0xffff810ffbad3710 to 0xffff810ffbad3840)
[ 6708.894942] 3700:                                   ffff810012412400 0001000000000000
[ 6708.902759] 3720: ffff810ffbad3900 0000000000000000 0000000060000145 ffff800f79300000
[ 6708.910577] 3740: ffff000009274d00 00000000000003ea 0000000000000015 ffff000008c63000
[ 6708.918395] 3760: ffff810ffbad3830 ffff800f79300000 000000000000004d 0000000000000000
[ 6708.926212] 3780: ffff810ffbad3890 ffff0000080f88dc ffff800f79300000 000000000000004d
[ 6708.934030] 37a0: ffff800f7930093c ffff000008c63000 0000000000000000 0000000000000140
[ 6708.941848] 37c0: ffff000008c2c000 0000000000040b00 ffff81000bbe1e00 0000000000000000
[ 6708.949665] 37e0: 00000000e5000000 0000000000000000 0000000000000000 0000000000000028
[ 6708.957483] 3800: 0000000000000000 0000000000000000 0000000000000000 7374687225000000
[ 6708.965300] 3820: 0000000000000000 00000000f814dd99 722e736f622e676e 0000000000000000
[ 6708.973117] [<          (null)>]           (null)
[ 6708.977824] [<ffff0000086f9fa4>] tcp_data_queue+0x754/0xc5c
[ 6708.983386] [<ffff0000086fa64c>] tcp_rcv_established+0x1a0/0x67c
[ 6708.989384] [<ffff000008704120>] tcp_v4_do_rcv+0x15c/0x22c
[ 6708.994858] [<ffff000008707418>] tcp_v4_rcv+0xaf0/0xb58
[ 6709.000077] [<ffff0000086df784>] ip_local_deliver_finish+0x10c/0x254
[ 6709.006419] [<ffff0000086dfea4>] ip_local_deliver+0xf0/0xfc
[ 6709.011980] [<ffff0000086dfad4>] ip_rcv_finish+0x208/0x3a4
[ 6709.017454] [<ffff0000086e018c>] ip_rcv+0x2dc/0x3c8
[ 6709.022328] [<ffff000008692fc8>] __netif_receive_skb_core+0x2f8/0xa0c
[ 6709.028758] [<ffff000008696068>] __netif_receive_skb+0x38/0x84
[ 6709.034580] [<ffff00000869611c>] netif_receive_skb_internal+0x68/0xdc
[ 6709.041010] [<ffff000008696bc0>] napi_gro_receive+0xcc/0x1a8
[ 6709.046690] [<ffff0000014b0fc4>] nicvf_cq_intr_handler+0x59c/0x730 [nicvf]
[ 6709.053559] [<ffff0000014b1380>] nicvf_poll+0x38/0xb8 [nicvf]
[ 6709.059295] [<ffff000008697a6c>] net_rx_action+0x2f8/0x464
[ 6709.064771] [<ffff000008081824>] __do_softirq+0x11c/0x308
[ 6709.070164] [<ffff0000080d14e4>] irq_exit+0x12c/0x174
[ 6709.075206] [<ffff00000813101c>] __handle_domain_irq+0x78/0xc4
[ 6709.081027] [<ffff000008081608>] gic_handle_irq+0x94/0x190
[ 6709.086501] Exception stack(0xffff81000689fdf0 to 0xffff81000689ff20)
[ 6709.092929] fde0:                                   0000810ff2ec0000 ffff000008c10000
[ 6709.100747] fe00: ffff000008c70ef4 0000000000000001 0000000000000000 ffff810ffbad9b18
[ 6709.108565] fe20: ffff810ffbad9c70 ffff8100169d3800 ffff810006843ab0 ffff81000689fe80
[ 6709.116382] fe40: 0000000000000bd0 0000ffffdf979cd0 183f5913da192500 0000ffff8a254ce4
[ 6709.124200] fe60: 0000ffff8a254b78 0000aaab10339808 0000000000000000 0000ffff8a0c2a50
[ 6709.132018] fe80: 0000ffffdf979b10 ffff000008d6d450 ffff000008c10000 ffff000008d6d000
[ 6709.139836] fea0: 0000000000000054 ffff000008cd3dbc 0000000000000000 0000000000000000
[ 6709.147653] fec0: 0000000000000000 0000000000000000 0000000000000000 ffff81000689ff20
[ 6709.155471] fee0: ffff000008085240 ffff81000689ff20 ffff000008085244 0000000060000145
[ 6709.163289] ff00: ffff81000689ff10 ffff00000813f1e4 ffffffffffffffff ffff00000813f238
[ 6709.171107] [<ffff000008082eb4>] el1_irq+0xb4/0x140
[ 6709.175976] [<ffff000008085244>] arch_cpu_idle+0x44/0x11c
[ 6709.181368] [<ffff0000087bf3b8>] default_idle_call+0x20/0x30
[ 6709.187020] [<ffff000008116d50>] do_idle+0x158/0x1e4
[ 6709.191973] [<ffff000008116ff4>] cpu_startup_entry+0x2c/0x30
[ 6709.197624] [<ffff00000808e7cc>] secondary_start_kernel+0x13c/0x160
[ 6709.203878] [<0000000001bc71c4>] 0x1bc71c4
[ 6709.207967] Code: bad PC value
[ 6709.211061] SMP: stopping secondary CPUs
[ 6709.218830] Starting crashdump kernel...
[ 6709.222749] Bye!
---<-snip>---

Signed-off-by: Vadim Lomovtsev <vlomovts@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-08-24 18:11:28 -04:00
Chuck Lever 67af6f652f xprtrdma: Re-arrange struct rx_stats
To reduce false cacheline sharing, separate counters that are likely
to be accessed in the Call path from those accessed in the Reply
path.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-08-22 16:19:32 -04:00
Trond Myklebust 7af7a5963c Merge branch 'bugfixes' 2017-08-20 13:04:12 -04:00
NeilBrown fd01b25979 SUNRPC: ECONNREFUSED should cause a rebind.
If you
 - mount and NFSv3 filesystem
 - do some file locking which requires the server
   to make a GRANT call back
 - unmount
 - mount again and do the same locking

then the second attempt at locking suffers a 30 second delay.
Unmounting and remounting causes lockd to stop and restart,
which causes it to bind to a new port.
The server still thinks the old port is valid and gets ECONNREFUSED
when trying to contact it.
ECONNREFUSED should be seen as a hard error that is not worth
retrying.  Rebinding is the only reasonable response.

This patch forces a rebind if that makes sense.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-08-20 12:39:28 -04:00
Trond Myklebust ce7c252a8c SUNRPC: Add a separate spinlock to protect the RPC request receive list
This further reduces contention with the transport_lock, and allows us
to convert to using a non-bh-safe spinlock, since the list is now never
accessed from a bh context.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-08-18 14:45:04 -04:00
Trond Myklebust 040249dfbe SUNRPC: Cleanup xs_tcp_read_common()
Simplify the code to avoid a full copy of the struct xdr_skb_reader.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-08-16 15:10:17 -04:00
Trond Myklebust 8d6f97d698 SUNRPC: Don't loop forever in xs_tcp_data_receive()
Ensure that we don't hog the workqueue thread by requeuing the job
every 64 loops.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-08-16 15:10:16 -04:00
Trond Myklebust c89091c88d SUNRPC: Don't hold the transport lock when receiving backchannel data
The backchannel request has no associated task, so it is going nowhere
until we call xprt_complete_bc_request().

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-08-16 15:10:16 -04:00
Trond Myklebust 729749bb8d SUNRPC: Don't hold the transport lock across socket copy operations
Instead add a mechanism to ensure that the request doesn't disappear
from underneath us while copying from the socket. We do this by
preventing xprt_release() from freeing the XDR buffers until the
flag RPC_TASK_MSG_RECV has been cleared from the request.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
2017-08-16 15:10:15 -04:00
Chuck Lever 6748b0caf8 xprtrdma: Remove imul instructions from chunk list encoders
Re-arrange the pointer arithmetic in the chunk list encoders to
eliminate several more integer multiplication instructions during
Transport Header encoding.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-08-15 14:01:50 -04:00
Chuck Lever 28d9d56f4c xprtrdma: Remove imul instructions from rpcrdma_convert_iovs()
Re-arrange the pointer arithmetic in rpcrdma_convert_iovs() to
eliminate several integer multiplication instructions during
Transport Header encoding.

Also, array overflow does not occur outside development
environments, so replace overflow checking with one spot check
at the end. This reduces the number of conditional branches in
the common case.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-08-15 13:37:38 -04:00
Chuck Lever 7ec910e78d xprtrdma: Clean up rpcrdma_bc_marshal_reply()
Same changes as in rpcrdma_marshal_req(). This removes
C-structure style encoding from the backchannel.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-08-11 13:20:08 -04:00
Chuck Lever 39f4cd9e99 xprtrdma: Harden chunk list encoding against send buffer overflow
While marshaling chunk lists which are variable-length XDR objects,
check for XDR buffer overflow at every step. Measurements show no
significant changes in CPU utilization.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-08-11 13:20:08 -04:00
Chuck Lever 7a80f3f0dd xprtrdma: Set up an xdr_stream in rpcrdma_marshal_req()
Initialize an xdr_stream at the top of rpcrdma_marshal_req(), and
use it to encode the fixed transport header fields. This xdr_stream
will be used to encode the chunk lists in a subsequent patch.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-08-11 13:20:08 -04:00
Chuck Lever f4a2805e7d xprtrdma: Remove rpclen from rpcrdma_marshal_req
Clean up: Remove a variable whose result is no longer used.
Commit 655fec6987 ("xprtrdma: Use gathered Send for large inline
messages") should have removed it.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-08-11 13:20:08 -04:00
Chuck Lever 09e60641fc xprtrdma: Clean up rpcrdma_marshal_req() synopsis
Clean up: The caller already has rpcrdma_xprt, so pass that directly
instead. And provide a documenting comment for this critical
function.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-08-11 13:20:08 -04:00
Chuck Lever c1bcb68e39 xprtrdma: Clean up XDR decoding in rpcrdma_update_granted_credits()
Clean up: Replace C-structure based XDR decoding for consistency
with other areas.

struct rpcrdma_rep is rearranged slightly so that the relevant fields
are in cache when the Receive completion handler is invoked.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-08-08 10:52:01 -04:00
Chuck Lever e2a6719041 xprtrdma: Remove rpcrdma_rep::rr_len
This field is no longer used outside the Receive completion handler.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-08-08 10:52:01 -04:00
Chuck Lever fdf503e302 xprtrdma: Remove opcode check in Receive completion handler
Clean up: The opcode check is no longer necessary, because since
commit 2fa8f88d88 ("xprtrdma: Use new CQ API for RPC-over-RDMA
client send CQs"), this completion handler is invoked only for
RECV work requests.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-08-08 10:52:00 -04:00
Chuck Lever 264b0cdbcb xprtrdma: Replace rpcrdma_count_chunks()
Clean up chunk list decoding by using the xdr_stream set up in
rpcrdma_reply_handler. This hardens decoding by checking for buffer
overflow at every step while unmarshaling variable-length XDR
objects.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-08-08 10:52:00 -04:00
Chuck Lever 07ff2dd510 xprtrdma: Refactor rpcrdma_reply_handler()
Refactor the reply handler's transport header decoding logic to make
it easier to understand and update.

Convert some of the handler to use xdr_streams, which will enable
stricter validation of input data and enable the eventual addition
of support for new combinations of chunks, such as "Write + Reply"
or "PZRC + normal Read".

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-08-08 10:52:00 -04:00
Chuck Lever 41c8f70f5a xprtrdma: Harden backchannel call decoding
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-08-08 10:52:00 -04:00
Chuck Lever 96f8778f70 xprtrdma: Add xdr_init_decode to rpcrdma_reply_handler()
Transport header decoding deals with untrusted input data, therefore
decoding this header needs to be hardened.

Adopt the same infrastructure that is used when XDR decoding NFS
replies. This is slightly more CPU-intensive than the replaced code,
but we're not adding new atomics, locking, or context switches. The
cost is manageable.

Start by initializing an xdr_stream in rpcrdma_reply_handler().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-08-08 10:52:00 -04:00
Chuck Lever d31ae25481 sunrpc: Const-ify all instances of struct rpc_xprt_ops
After transport instance creation, these function pointers never
change. Mark them as constant to prevent their use as an attack
vector for code injections.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-08-01 16:10:35 -04:00
Linus Torvalds 505d5c1119 NFS client bugfixes for 4.13
Stable bugfixes:
 - Fix error reporting regression
 
 Bugfixes:
 - Fix setting filelayout ds address race
 - Fix subtle access bug when using ACLs
 - Fix setting mnt3_counts array size
 - Fix a couple of pNFS commit races
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAllyVYYACgkQ18tUv7Cl
 QOtTwBAA8ek0Ba5wIfwlQe4MvIX1t6v7q7Otrwdombhuw1a6nW410hFwu2SG8kGd
 fQWr1xnqRsMKwu83UuImtlYYvMa271TZgXxPNWx7KJpi/zocnigJRg5sVpkcRqza
 AjjPc245pHCooQoaTAlm5WrO9zDm0s7lRGuTB1cvPRsElnFVWT0jwhTASIDOU1Zy
 9K/hElAnXp/dZv/ydjSePTPPsVQPJWLbJPlSm6vAIQPyeXWUeCgAym0yf2FVvtsd
 AQeozaq9xq6tofVPZhsYWeBKswjTHs3FxL8vhDOF9gF3QMPm43mwsfrKVidWo4vW
 0UJnRZRCFgG0WIxxhA7l1Z9MovAsXlbWmFufgCa4Ev/bC5WuUT4ZEkjBGJw2vXD4
 0/lxkhD41PBhCl/LIod9OT6iJ8koifl50JUC4N67D2illFy9a7Btzx3EPxfDz9uG
 6jEek9x6B1xf7AC4HhxByN/E8gKX08N4Q/afxTFuAwrzKRKqI4Me3qbCyU86bp+T
 wiAxgVPVnmb/VBVLU68i7titdsA6U8ZO12FFqu9QOr9wHMXxa6108h2Nia9jVTdk
 EZhanXa6tJThQQ/QZicuR5hTtoM4BuikaaJSHZ4ODbgrAjsMAyqy4qsf4tf3LLAo
 tEDu5sJDmHJhhdtqzuz+OtDn5iJ2Nga6a4/fMwt9YU2ZQBm7X/8=
 =ZcQv
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-4.13-2' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client bugfixes from Anna Schumaker:
 "Stable bugfix:
   - Fix error reporting regression

  Bugfixes:
   - Fix setting filelayout ds address race
   - Fix subtle access bug when using ACLs
   - Fix setting mnt3_counts array size
   - Fix a couple of pNFS commit races"

* tag 'nfs-for-4.13-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  NFS/filelayout: Fix racy setting of fl->dsaddr in filelayout_check_deviceid()
  NFS: Be more careful about mapping file permissions
  NFS: Store the raw NFS access mask in the inode's access cache
  NFSv3: Convert nfs3_proc_access() to use nfs_access_set_mask()
  NFS: Refactor NFS access to kernel access mask calculation
  net/sunrpc/xprt_sock: fix regression in connection error reporting.
  nfs: count correct array for mnt3_counts array size
  Revert commit 722f0b8911 ("pNFS: Don't send COMMITs to the DSes if...")
  pNFS/flexfiles: Handle expired layout segments in ff_layout_initiate_commit()
  NFS: Fix another COMMIT race in pNFS
  NFS: Fix a COMMIT race in pNFS
  mount: copy the port field into the cloned nfs_server structure.
  NFS: Don't run wake_up_bit() when nobody is waiting...
  nfs: add export operations
2017-07-21 16:26:01 -07:00
NeilBrown 3ffbc1d655 net/sunrpc/xprt_sock: fix regression in connection error reporting.
Commit 3d4762639d ("tcp: remove poll() flakes when receiving
RST") in v4.12 changed the order in which ->sk_state_change()
and ->sk_error_report() are called when a socket is shut
down - sk_state_change() is now called first.

This causes xs_tcp_state_change() -> xs_sock_mark_closed() ->
xprt_disconnect_done() to wake all pending tasked with -EAGAIN.
When the ->sk_error_report() callback arrives, it is too late to
pass the error on, and it is lost.

As easy way to demonstrate the problem caused is to try to start
rpc.nfsd while rcpbind isn't running.
nfsd will attempt a tcp connection to rpcbind.  A ECONNREFUSED
error is returned, but sunrpc code loses the error and keeps
retrying.  If it saw the ECONNREFUSED, it would abort.

To fix this, handle the sk->sk_err in the TCP_CLOSE branch of
xs_tcp_state_change().

Fixes: 3d4762639d ("tcp: remove poll() flakes when receiving RST")
Cc: stable@vger.kernel.org (v4.12)
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-07-21 08:49:58 -04:00
Linus Torvalds b86faee6d1 NFS client updates for Linux 4.13
Stable bugfixes:
 - Fix -EACCESS on commit to DS handling
 - Fix initialization of nfs_page_array->npages
 - Only invalidate dentries that are actually invalid
 
 Features:
 - Enable NFSoRDMA transparent state migration
 - Add support for lookup-by-filehandle
 - Add support for nfs re-exporting
 
 Other bugfixes and cleanups:
 - Christoph cleaned up the way we declare NFS operations
 - Clean up various internal structures
 - Various cleanups to commits
 - Various improvements to error handling
 - Set the dt_type of . and .. entries in NFS v4
 - Make slot allocation more reliable
 - Fix fscache stat printing
 - Fix uninitialized variable warnings
 - Fix potential list overrun in nfs_atomic_open()
 - Fix a race in NFSoRDMA RPC reply handler
 - Fix return size for nfs42_proc_copy()
 - Fix against MAC forgery timing attacks
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAlln4jEACgkQ18tUv7Cl
 QOv2ZxAAwbQN9Dtx4rOZmPe0Xszua23sNN0ja891PodkCjIiZrRelZhLIBAf1rfP
 uSR+jTD8EsBHGt3bzTXg2DHz+o8cGDZuH+uuZX+wRWJPQcKA2pC7zElqnse8nmn5
 4Z1UUdzf42vE4NZ/G1ucqpEiAmOqGJ3s7pCRLLXPvOSSQXqOhiomNDAcGxX05FIv
 Ly4Kr6RIfg/O4oNOZBuuL/tZHodeyOj1vbyjt/4bDQ5MEXlUQfcjJZEsz/2EcNh6
 rAgbquxr1pGCD072pPBwYNH2vLGbgNN41KDDMGI0clp+8p6EhV6BOlgcEoGtZM86
 c0yro2oBOB2vPCv9nGr6JgTOHPKG6ksJ7vWVXrtQEjBGP82AbFfAawLgqZ6Ae8dP
 Sqpx55j4xdm4nyNglCuhq5PlPAogARq/eibR+RbY973Lhzr5bZb3XqlairCkNNEv
 4RbTlxbWjhgrKJ56jVf+KpUDJAVG5viKMD7YDx/bOfLtvPwALbozD7ONrunz5v43
 PgQEvWvVtnQAKp27pqHemTsLFhU6M6eGUEctRnAfB/0ogWZh1X8QXgulpDlqG3kb
 g12kr5hfA0pSfcB0aGXVzJNnHKfW3IY3WBWtxq4xaMY22YkHtuB+78+9/yk3jCAi
 dvimjT2Ko9fE9MnltJ/hC5BU+T+xUxg+1vfwWnKMvMH8SIqjyu4=
 =OpLj
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-4.13-1' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client updates from Anna Schumaker:
 "Stable bugfixes:
   - Fix -EACCESS on commit to DS handling
   - Fix initialization of nfs_page_array->npages
   - Only invalidate dentries that are actually invalid

  Features:
   - Enable NFSoRDMA transparent state migration
   - Add support for lookup-by-filehandle
   - Add support for nfs re-exporting

  Other bugfixes and cleanups:
   - Christoph cleaned up the way we declare NFS operations
   - Clean up various internal structures
   - Various cleanups to commits
   - Various improvements to error handling
   - Set the dt_type of . and .. entries in NFS v4
   - Make slot allocation more reliable
   - Fix fscache stat printing
   - Fix uninitialized variable warnings
   - Fix potential list overrun in nfs_atomic_open()
   - Fix a race in NFSoRDMA RPC reply handler
   - Fix return size for nfs42_proc_copy()
   - Fix against MAC forgery timing attacks"

* tag 'nfs-for-4.13-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (68 commits)
  NFS: Don't run wake_up_bit() when nobody is waiting...
  nfs: add export operations
  nfs4: add NFSv4 LOOKUPP handlers
  nfs: add a nfs_ilookup helper
  nfs: replace d_add with d_splice_alias in atomic_open
  sunrpc: use constant time memory comparison for mac
  NFSv4.2 fix size storage for nfs42_proc_copy
  xprtrdma: Fix documenting comments in frwr_ops.c
  xprtrdma: Replace PAGE_MASK with offset_in_page()
  xprtrdma: FMR does not need list_del_init()
  xprtrdma: Demote "connect" log messages
  NFSv4.1: Use seqid returned by EXCHANGE_ID after state migration
  NFSv4.1: Handle EXCHGID4_FLAG_CONFIRMED_R during NFSv4.1 migration
  xprtrdma: Don't defer MR recovery if ro_map fails
  xprtrdma: Fix FRWR invalidation error recovery
  xprtrdma: Fix client lock-up after application signal fires
  xprtrdma: Rename rpcrdma_req::rl_free
  xprtrdma: Pass only the list of registered MRs to ro_unmap_sync
  xprtrdma: Pre-mark remotely invalidated MRs
  xprtrdma: On invalidation failure, remove MWs from rl_registered
  ...
2017-07-13 14:35:37 -07:00
Linus Torvalds 6240300597 Chuck's RDMA update overhauls the "call receive" side of the
RPC-over-RDMA transport to use the new rdma_rw API.
 
 Christoph cleaned the way nfs operations are declared, removing a bunch
 of function-pointer casts and declaring the operation vectors as const.
 
 Christoph's changes touch both client and server, and both client and
 server pulls this time around should be based on the same commits from
 Christoph.
 
 (Note: Anna and I initially didn't coordinate this well and we realized
 our pull requests were going to leave you with Christoph's 33 patches
 duplicated between our two trees.  We decided a last-minute rebase was
 the lesser of two evils, so her pull request will show that last-minute
 rebase.  Yell if that was the wrong choice, and we'll know better for
 next time....)
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJZZ80JAAoJECebzXlCjuG+PiMP/jmw4IbzY4qt/X8aldVTMPZ8
 TkEXuZSrc7FbmroqAR0XN/qJjzENKUcrnlYm7HKVe6iItTZUvJuVThtHQVGzZUZD
 wP2VRzgkky59aDs9cphfTPGKPKL1MtoC3qQdFmKd/8ZhBDHIq89A2pQJwl7PI4rA
 IHzvLmZtTKL+xWoypqZQxepONhEY2ZPrffGWL+5OVF/dPmWfJ6m/M6jRTb7zV/YD
 PZyRqWQ8UY/HwZTwRrxZDCCxUsmRUPZz195iFjM8wvBl7auWNetC22gyyITlvfzf
 1m0zJqw3qn09+v2xnAWs/ZVxypg6rsEiIcL2mf0JC/tQh+iIzabc4e/TwDEWqSq+
 ocQrvXJuZCjsrMqg4oaIuDFogaZCsGR5wxDAEyfYDS/8fMdiKq8xJzT7v31/2U37
 Bsr1hvgAmD4eZWaTrJg11V5RnTzDgns+EtNfISR8t4/k+wehDfyzav8A+j72sqvR
 JT+7iUEd0QcBwo+MCC7AOnLLsIX45QUjZKKrvZNAC1fmr8RyAF1zo5HHO+NNjLuP
 J2PUG2GbNxsQkm/JAFKDvyklLpEXZc6uyYAcEefirxYbh1x0GfuetzqtH58DtrQL
 /1e80MRG9Qgq5S8PvYyvp1bIQPDRaQ188chEvzZy+3QeNXydq2LzDh0bjlM+4A9I
 DZhP2pNGLh0ImaPtX0q+
 =mR/a
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-4.13' of git://linux-nfs.org/~bfields/linux

Pull nfsd updates from Bruce Fields:
 "Chuck's RDMA update overhauls the "call receive" side of the
  RPC-over-RDMA transport to use the new rdma_rw API.

  Christoph cleaned the way nfs operations are declared, removing a
  bunch of function-pointer casts and declaring the operation vectors as
  const.

  Christoph's changes touch both client and server, and both client and
  server pulls this time around should be based on the same commits from
  Christoph"

* tag 'nfsd-4.13' of git://linux-nfs.org/~bfields/linux: (53 commits)
  svcrdma: fix an incorrect check on -E2BIG and -EINVAL
  nfsd4: factor ctime into change attribute
  svcrdma: Remove svc_rdma_chunk_ctxt::cc_dir field
  svcrdma: use offset_in_page() macro
  svcrdma: Clean up after converting svc_rdma_recvfrom to rdma_rw API
  svcrdma: Clean-up svc_rdma_unmap_dma
  svcrdma: Remove frmr cache
  svcrdma: Remove unused Read completion handlers
  svcrdma: Properly compute .len and .buflen for received RPC Calls
  svcrdma: Use generic RDMA R/W API in RPC Call path
  svcrdma: Add recvfrom helpers to svc_rdma_rw.c
  sunrpc: Allocate up to RPCSVC_MAXPAGES per svc_rqst
  svcrdma: Don't account for Receive queue "starvation"
  svcrdma: Improve Reply chunk sanity checking
  svcrdma: Improve Write chunk sanity checking
  svcrdma: Improve Read chunk sanity checking
  svcrdma: Remove svc_rdma_marshal.c
  svcrdma: Avoid Send Queue overflow
  svcrdma: Squelch disconnection messages
  sunrpc: Disable splice for krb5i
  ...
2017-07-13 13:56:24 -07:00
Jason A. Donenfeld 15a8b93fd5 sunrpc: use constant time memory comparison for mac
Otherwise, we enable a MAC forgery via timing attack.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Jeff Layton <jlayton@poochiereds.net>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: Anna Schumaker <anna.schumaker@netapp.com>
Cc: linux-nfs@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-07-13 16:00:14 -04:00
Chuck Lever 6afafa7799 xprtrdma: Fix documenting comments in frwr_ops.c
Clean up.

FASTREG and LOCAL_INV WRs are typically not signaled. localinv_wake
is used for the last LOCAL_INV WR in a chain, which is always
signaled. The documenting comments should reflect that.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-07-13 16:00:13 -04:00
Chuck Lever d933cc3201 xprtrdma: Replace PAGE_MASK with offset_in_page()
Clean up.

Reported by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-07-13 16:00:13 -04:00
Chuck Lever e2f6ef0915 xprtrdma: FMR does not need list_del_init()
Clean up.

Commit 38f1932e60 ("xprtrdma: Remove FMRs from the unmap list
after unmapping") utilized list_del_init() to try to prevent some
list corruption. The corruption was actually caused by the reply
handler racing with a signal. Now that MR invalidation is properly
serialized, list_del_init() can safely be replaced.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-07-13 16:00:13 -04:00
Chuck Lever 173b8f49b3 xprtrdma: Demote "connect" log messages
Some have complained about the log messages generated when xprtrdma
opens or closes a connection to a server. When an NFS mount is
mostly idle these can appear every few minutes as the client idles
out the connection and reconnects.

Connection and disconnection is a normal part of operation, and not
exceptional, so change these to dprintk's for now. At some point
all of these will be converted to tracepoints, but that's for
another day.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-07-13 16:00:12 -04:00
Chuck Lever 1f541895da xprtrdma: Don't defer MR recovery if ro_map fails
Deferred MR recovery does a DMA-unmapping of the MW. However, ro_map
invokes rpcrdma_defer_mr_recovery in some error cases where the MW
has not even been DMA-mapped yet.

Avoid a DMA-unmapping error replacing rpcrdma_defer_mr_recovery.

Also note that if ib_dma_map_sg is asked to map 0 nents, it will
return 0. So the extra "if (i == 0)" check is no longer needed.

Fixes: 42fe28f607 ("xprtrdma: Do not leak an MW during a DMA ...")
Fixes: 505bbe64dd ("xprtrdma: Refactor MR recovery work queues")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-07-13 16:00:11 -04:00
Chuck Lever 8d75483a23 xprtrdma: Fix FRWR invalidation error recovery
When ib_post_send() fails, all LOCAL_INV WRs past @bad_wr have to be
examined, and the MRs reset by hand.

I'm not sure how the existing code can work by comparing R_keys.
Restructure the logic so that instead it walks the chain of WRs,
starting from the first bad one.

Make sure to wait for completion if at least one WR was actually
posted. Otherwise, if the ib_post_send fails, we can end up
DMA-unmapping the MR while LOCAL_INV operations are in flight.

Commit 7a89f9c626 ("xprtrdma: Honor ->send_request API contract")
added the rdma_disconnect() call site. The disconnect actually
causes more problems than it solves, and SQ overruns happen only as
a result of software bugs. So remove it.

Fixes: d7a21c1bed ("xprtrdma: Reset MRs in frwr_op_unmap_sync()")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-07-13 16:00:11 -04:00
Chuck Lever 431af645cf xprtrdma: Fix client lock-up after application signal fires
After a signal, the RPC client aborts synchronous RPCs running on
behalf of the signaled application.

The server is still executing those RPCs, and will write the results
back into the client's memory when it's done. By the time the server
writes the results, that memory is likely being used for other
purposes. Therefore xprtrdma has to immediately invalidate all
memory regions used by those aborted RPCs to prevent the server's
writes from clobbering that re-used memory.

With FMR memory registration, invalidation takes a relatively long
time. In fact, the invalidation is often still running when the
server tries to write the results into the memory regions that are
being invalidated.

This sets up a race between two processes:

1.  After the signal, xprt_rdma_free calls ro_unmap_safe.
2.  While ro_unmap_safe is still running, the server replies and
    rpcrdma_reply_handler runs, calling ro_unmap_sync.

Both processes invoke ib_unmap_fmr on the same FMR.

The mlx4 driver allows two ib_unmap_fmr calls on the same FMR at
the same time, but HCAs generally don't tolerate this. Sometimes
this can result in a system crash.

If the HCA happens to survive, rpcrdma_reply_handler continues. It
removes the rpc_rqst from rq_list and releases the transport_lock.
This enables xprt_rdma_free to run in another process, and the
rpc_rqst is released while rpcrdma_reply_handler is still waiting
for the ib_unmap_fmr call to finish.

But further down in rpcrdma_reply_handler, the transport_lock is
taken again, and "rqst" is dereferenced. If "rqst" has already been
released, this triggers a general protection fault. Since bottom-
halves are disabled, the system locks up.

Address both issues by reversing the order of the xprt_lookup_rqst
call and the ro_unmap_sync call. Introduce a separate lookup
mechanism for rpcrdma_req's to enable calling ro_unmap_sync before
xprt_lookup_rqst. Now the handler takes the transport_lock once
and holds it for the XID lookup and RPC completion.

BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=305
Fixes: 68791649a7 ('xprtrdma: Invalidate in the RPC reply ... ')
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-07-13 16:00:11 -04:00
Chuck Lever a80d66c9e0 xprtrdma: Rename rpcrdma_req::rl_free
Clean up: I'm about to use the rl_free field for purposes other than
a free list. So use a more generic name.

This is a refactoring change only.

BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=305
Fixes: 68791649a7 ('xprtrdma: Invalidate in the RPC reply ... ')
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-07-13 16:00:10 -04:00
Chuck Lever 451d26e151 xprtrdma: Pass only the list of registered MRs to ro_unmap_sync
There are rare cases where an rpcrdma_req can be re-used (via
rpcrdma_buffer_put) while the RPC reply handler is still running.
This is due to a signal firing at just the wrong instant.

Since commit 9d6b040978 ("xprtrdma: Place registered MWs on a
per-req list"), rpcrdma_mws are self-contained; ie., they fully
describe an MR and scatterlist, and no part of that information is
stored in struct rpcrdma_req.

As part of closing the above race window, pass only the req's list
of registered MRs to ro_unmap_sync, rather than the rpcrdma_req
itself.

Some extra transport header sanity checking is removed. Since the
client depends on its own recollection of what memory had been
registered, there doesn't seem to be a way to abuse this change.

And, the check was not terribly effective. If the client had sent
Read chunks, the "list_empty" test is negative in both of the
removed cases, which are actually looking for Write or Reply
chunks.

BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=305
Fixes: 68791649a7 ('xprtrdma: Invalidate in the RPC reply ... ')
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-07-13 16:00:10 -04:00
Chuck Lever 4b196dc6fe xprtrdma: Pre-mark remotely invalidated MRs
There are rare cases where an rpcrdma_req and its matched
rpcrdma_rep can be re-used, via rpcrdma_buffer_put, while the RPC
reply handler is still using that req. This is typically due to a
signal firing at just the wrong instant.

As part of closing this race window, avoid using the wrong
rpcrdma_rep to detect remotely invalidated MRs. Mark MRs as
invalidated while we are sure the rep is still OK to use.

BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=305
Fixes: 68791649a7 ('xprtrdma: Invalidate in the RPC reply ... ')
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-07-13 16:00:10 -04:00