Commit Graph

921 Commits

Author SHA1 Message Date
Linus Torvalds 2eb7f910c1 Main set of InfiniBand/RDMA updates for 3.18 merge window:
- Large set of iSER initiator improvements
  - Hardware driver fixes for cxgb4, mlx5 and ocrdma
  - Small fixes to core midlayer
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCAAGBQJUQEczAAoJEENa44ZhAt0h3n8P/RqklU+JJiF1eWRgvdf3fOPC
 WDzOzdKUHvv3Lm5qSv8V6q23oYzf2QC/vjuZJgyM5156vj/qSf3iw1ueZTwYSQ3v
 B6bV7/ptSpBlxRx/sI9/ks5yqT869jww7QAO+wtvzuq7JDxQr+t4Yw1j3WOM8DGd
 F/rBFWJgLCD3zFSeJVY+AgZwIeDpNvBO0/QVnchs9iPUY0jSBvhDLsWegGhs92Uv
 wfeiV36f8hPVnbYVMV+xA2t9NkBV21r1sUK1l+CfPDgL/unDoXXqriuqb401l4cj
 zR4/Xwro9WzOC0gey2a5KkyX9wQUW9+Y4TnHRLnJ5shO/yzqxmc+/FMksxnaoWtI
 koF5LqyfXxNhq0VZBpoy+astY4vv4h34WlyBC2lxDCJBEw8VYzO+Wg9QJAzWOxlq
 JXtY9l9zRxfgTLe78xjl2n9LEeOysbYJemp3YFpZVh7a4JvUK4L9Kh9mXKZu8tqt
 zd7YniNNJSdDfF5+Gx1kSK4kE1r/89f04ED6hg/eIf/IqhNYJ/vD5joQuJ4RqQNx
 5G0wdCfKMy9cCwbx1/eCRJOuP+dbGR73UskgXc41s5/VtXCdq8JHvBcWw5CgxQ5I
 AYGUrCuu/ZsQSMNM1i7Pocz8m4uLlnpXF7Qv62ULt/NQJQHj5Xfdvb9M3BK0urIf
 +aBpf+LiZahaUEfo5eYt
 =+ABw
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull infiniband/RDMA updates from Roland Dreier:
 - large set of iSER initiator improvements
 - hardware driver fixes for cxgb4, mlx5 and ocrdma
 - small fixes to core midlayer

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (47 commits)
  RDMA/cxgb4: Fix ntuple calculation for ipv6 and remove duplicate line
  RDMA/cxgb4: Add missing neigh_release in find_route
  RDMA/cxgb4: Take IPv6 into account for best_mtu and set_emss
  RDMA/cxgb4: Make c4iw_wr_log_size_order static
  IB/core: Fix XRC race condition in ib_uverbs_open_qp
  IB/core: Clear AH attr variable to prevent garbage data
  RDMA/ocrdma: Save the bit environment, spare unncessary parenthesis
  RDMA/ocrdma: The kernel has a perfectly good BIT() macro - use it
  RDMA/ocrdma: Don't memset() buffers we just allocated with kzalloc()
  RDMA/ocrdma: Remove a unused-label warning
  RDMA/ocrdma: Convert kernel VA to PA for mmap in user
  RDMA/ocrdma: Get vlan tag from ib_qp_attrs
  RDMA/ocrdma: Add default GID at index 0
  IB/mlx5, iser, isert: Add Signature API additions
  Target/iser: Centralize ib_sig_domain setting
  IB/iser: Centralize ib_sig_domain settings
  IB/mlx5: Use extended internal signature layout
  IB/iser: Set IP_CSUM as default guard type
  IB/iser: Remove redundant assignment
  IB/mlx5: Use enumerations for PI copy mask
  ...
2014-10-19 12:29:23 -07:00
Rasmus Villemoes b60459f080 ib_srpt: replace strnicmp with strncasecmp
The kernel used to contain two functions for length-delimited,
case-insensitive string comparison, strnicmp with correct semantics and
a slightly buggy strncasecmp.  The latter is the POSIX name, so strnicmp
was renamed to strncasecmp, and strnicmp made into a wrapper for the new
strncasecmp to avoid breaking existing users.

To allow the compat wrapper strnicmp to be removed at some point in the
future, and to avoid the extra indirection cost, do
s/strnicmp/strncasecmp/g.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Roland Dreier <roland@kernel.org>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-14 02:18:25 +02:00
Sagi Grimberg 78eda2bb65 IB/mlx5, iser, isert: Add Signature API additions
Expose more signature setting parameters. We modify the signature API
to allow usage of some new execution parameters relevant to data
integrity feature.

This patch modifies ib_sig_domain structure by:

- Deprecate DIF type in signature API (operation will
  be determined by the parameters alone, no DIF type awareness)
- Add APPTAG check bitmask (for input domain)
- Add REFTAG remap (increment) flag for each domain
- Add APPTAG/REFTAG escape options for each domain

The mlx5 driver is modified to follow the new parameters in HW
signature setup.

At the moment the callers (iser/isert) hard-code new parameters (by
DIF type). In the future, callers will retrieve them from the scsi
command structure.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09 00:10:53 -07:00
Sagi Grimberg 3d73cf1a2a Target/iser: Centralize ib_sig_domain setting
Later there will be more parameters to set, so we want to do it in a
centralized place.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09 00:10:53 -07:00
Sagi Grimberg 92792c0a19 IB/iser: Centralize ib_sig_domain settings
Later there will be more parameters to set, so we want to do it in a
centralized place.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09 00:10:53 -07:00
Sagi Grimberg f043032ef1 IB/iser: Set IP_CSUM as default guard type
In the future this will be a per-command parameter so we can lose it,
but in the mean time IP_CSUM is a lot lighter for SW layers to
compute, set it as default.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09 00:10:53 -07:00
Sagi Grimberg 6f5f8a016e IB/iser: Remove redundant assignment
We clear the struct before - no need to do 0 assignment.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09 00:10:53 -07:00
Or Gerlitz b261aeafe1 IB/iser: Bump version, add maintainer
Update the driver version and add Sagi Grimberg as maintainer

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09 00:06:46 -07:00
Sagi Grimberg dc05ac36f7 IB/iser: Fix/add kernel-doc style description in iscsi_iser.c
This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09 00:06:46 -07:00
Sagi Grimberg cd88621a9e IB/iser: Add/Fix kernel doc style descriptions in iscsi_iser.h
- iser_hdr
- iser_data_buf
- iser_mem_reg
- iser_regd_buf
- iser_tx_desc
- iser_rx_desc
- iser_device
- iser_pi_context
- iser_conn
- ib_conn
- iser_comp
- iscsi_iser_task
- iser_global

While we're at it, change nit alignments in this file

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09 00:06:46 -07:00
Sagi Grimberg e9d49b82f1 IB/iser: Nit - add space after __func__ in iser logging
Change logging: "iser:XXXX" to "iser: XXXX"

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09 00:06:46 -07:00
Ariel Nahum bba0a3c9d7 IB/iser: Change iscsi_conn_stop log level to info
Match to the debug level of all functions in connect/disconnect flows.

Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09 00:06:42 -07:00
Sagi Grimberg 6df5a128f0 IB/iser: Suppress scsi command send completions
Singal completion of every 32 scsi commands and suppress all the rest.
We don't do anything upon getting the completion so no need to "just
consume" it.  Cleanup of scsi command is done in cleanup_task callback.

Still keep dataout and control send completions as we may need to
cleanup there. This helps reducing the amount of interrupts/completions
in the IO path.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09 00:06:07 -07:00
Sagi Grimberg 6e6fe2fb1d IB/iser: Optimize completion polling
Poll in batch of 16. Since we don't want it on the stack, keep under
iser completion context (iser_comp).

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09 00:06:07 -07:00
Sagi Grimberg ff3dd52d26 IB/iser: Use beacon to indicate all completions were consumed
Avoid post_send counting (atomic) in the IO path just to keep track of
how many completions we need to consume.  Use a beacon post to indicate
that all prior posts completed.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09 00:06:07 -07:00
Sagi Grimberg 6aabfa76f5 IB/iser: Use single CQ for RX and TX
This will solve a possible condition where we might miss TX completion
(flush error) during session teardown.  Since we are using a single
CQ, we don't need to actively drain the TX CQ, instead just wait for
flush_completion (when counters reach zero) and remove iser_poll_for_flush_errors().

This patch might introduce a minor performance regression on its own,
but the next patches will enhance performance using a single CQ for RX
and TX.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09 00:06:07 -07:00
Sagi Grimberg 183cfa434e IB/iser: Use internal polling budget to avoid possible live-lock
We need a way to guarentee that we don't stay in soft-IRQ context for
too long.  We might starve other pending CQ tasklets or worse lock
against application trying to issue IO on the running CPU.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09 00:06:06 -07:00
Sagi Grimberg bf17554035 IB/iser: Centralize iser completion contexts
Introduce iser_comp which centralizes all iser completion related
items and is referenced by iser_device and each ib_conn.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09 00:06:06 -07:00
Ariel Nahum aea8f4df6d IB/iser: Use iser_warn instead of BUG_ON in iser_conn_release
In case iscsid was violently killed (SIGKILL) during its error
recovery stage, we may never get a connection teardown sequence for
some of the old connections.  No harm done, but when we try to unload
the module we will need to cleanup all these connections.  So we
actually may end-up here - so it's not a BUG_ON(), just give a relaxed
warning that this happened and continue with normal unload.  BUG_ON()
will cause segfault on module_exit and we don't want that.

Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09 00:06:06 -07:00
Sagi Grimberg 8c204e69ce IB/iser: Signal iSCSI layer that transport is broken in error completions
Previously we notified iscsi layer about the connection layer when
we consumed all of our flush errors. This was racy as there
was no guarentee that iscsi_conn wasn't terminated by then (which ends
up in an invalid memory access). In case we got a non FLUSH error
completion, we are guarenteed that iscsi_conn is still alive. We should
notify iSCSI layer with iscsi_conn_failure to initiate error handling.

While we are at it, add a nice kernel-doc style documentation.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09 00:06:06 -07:00
Sagi Grimberg 3a940daf6f IB/iser: Protect tasks cleanup in case IB device was already released
Bailout in case a task cleanup (iscsi_iser_cleanup_task) is called
after the IB device was removed (DEVICE_REMOVAL CM event).  We also
call iscsi_conn_stop with a lock taken to prevent DEVICE_REMOVAL and
tasks cleanup from racing.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09 00:06:06 -07:00
Ariel Nahum ec370e2b63 IB/iser: Unbind at conn_stop stage
Previously we didn't need to unbind the iser_conn and iscsi_conn since
we always relied on iscsi daemon to teardown the connection and never
let it finish before we cleanup all that is needed in iser.  This is
not the case anymore (for DEVICE_REMOVAL event).  So avoid any possible
chance we cause iscsi_conn dereference after iscsi_conn was freed.

We also call iser_conn_terminate (safe to call multiple times) just
for the corner case of iscsi daemon stopping an old connection before
invoking endpoint removal (might happen if it was violently killed).

Notice we are unbinding under a lock - which is required.

Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09 00:06:06 -07:00
Sagi Grimberg c107a6c0cf IB/iser: Don't bound release_work completions timeouts
We no longer rely on iscsi connection teardown sequence, so no need to
give a grace period and continue cleanup if it expired. Have
iser_conn_release wait for full completion before freeing iser_conn.

ib_completion:
	Guaranteed to come when:
	    - Got DISCONNECTED/ADDR_CHANGE event or
	    - iSCSI called ep_disconnect/conn_stop
	Guaranteed to finish when:
	    - Got TIMEWAIT_EXIT/DEVICE_REMOVAL event
	    - All Flush errors are consumed
	    - IB related resources are destroyed

stop_completion:
	Guaranteed to come when:
	    - iSCSI calls conn_stop
	Guaranteed to finish when:
	    - All inflight tasks were cleaned up

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09 00:06:06 -07:00
Sagi Grimberg c47a3c9ed5 IB/iser: Fix DEVICE REMOVAL handling in the absence of iscsi daemon
iscsi daemon is in user-space, thus we can't rely on it to be invoked
at connection teardown (if not running or does not receive CPU time).

This patch addresses the issue by re-structuring iSER connection
teardown logic and CM events handling.

The CM events will dictate the RDMA resources destruction (ib_conn)
and iser_conn is kept around as long as iscsi_conn is left around
allowing iscsi/iser callbacks to continue after RDMA transport was
destroyed.

This patch introduces a separation in logic when handling CM events:

- DISCONNECTED_HANDLER, ADDR_CHANGED
  This events indicate the start of teardown process.
  Actions:
  1. Terminate the connection: rdma_disconnect (send DREQ/DREP)
  2. Notify iSCSI of connection failure
  3. Change state to TERMINATING
  4. Poll for all flush errors to be consumed

- TIMEWAIT_EXIT, DEVICE_REMOVAL
  These events indicate the final stage of termination process and
  we can free RDMA related resources.
  Actions:
  1. Call disconnected handler (we are not guaranteed that DISCONNECTED
     event was invoked in the past)
  2. Cleanup RDMA related resources
  3. For DEVICE_REMOVAL return non-zero rc from cma_handler to
     implicitly destroy the cm_id (Can't rely on user-space, make sure
     we have forward progress)

We replace flush_completion (indicate all flushes were consumed) with
ib_completion (rdma resources were cleaned up).

The iser_conn_release_work will wait for teardown completions:

- conn_stop was completed (tasks were cleaned-up) - stop_completion
- RDMA resources were destroyed - ib_completion

And then will continue to free iser connection representation (iser_conn).

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09 00:06:06 -07:00
Sagi Grimberg 96f15198c1 IB/iser: Extend iser_free_ib_conn_res()
Put all connection IB related resources release in this routine.  One
exception is the cm_id which cannot be destroyed as the routine is
protected by the state mutex.  Also move its position to avoid forward
declaration.  While at it fix qp NULL assignment.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09 00:06:06 -07:00
Roi Dayan 6bb0279f95 IB/iser: Remove unused variables and dead code
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09 00:06:06 -07:00
Sagi Grimberg a4ee3539f6 IB/iser: Re-introduce ib_conn
Structure that describes the RDMA relates connection objects.  Static
member of iser_conn.

This patch does not change any functionality

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09 00:06:06 -07:00
Sagi Grimberg 5716af6e52 IB/iser: Rename ib_conn -> iser_conn
Two reasons why we choose to do this:

1. No point today calling struct iser_conn by another name ib_conn
2. In the next patches we will restructure iser control plane representation
   - struct iser_conn: connection logical representation
   - struct ib_conn: connection RDMA layout representation

This patch does not change any functionality.

Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09 00:06:06 -07:00
Eric Dumazet 0287587884 net: better IFF_XMIT_DST_RELEASE support
Testing xmit_more support with netperf and connected UDP sockets,
I found strange dst refcount false sharing.

Current handling of IFF_XMIT_DST_RELEASE is not optimal.

Dropping dst in validate_xmit_skb() is certainly too late in case
packet was queued by cpu X but dequeued by cpu Y

The logical point to take care of drop/force is in __dev_queue_xmit()
before even taking qdisc lock.

As Julian Anastasov pointed out, need for skb_dst() might come from some
packet schedulers or classifiers.

This patch adds new helper to cleanly express needs of various drivers
or qdiscs/classifiers.

Drivers that need skb_dst() in their ndo_start_xmit() should call
following helper in their setup instead of the prior :

	dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
->
	netif_keep_dst(dev);

Instead of using a single bit, we use two bits, one being
eventually rebuilt in bonding/team drivers.

The other one, is permanent and blocks IFF_XMIT_DST_RELEASE being
rebuilt in bonding/team. Eventually, we could add something
smarter later.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-07 13:22:11 -04:00
Linus Torvalds 452b6361c4 Last late set of InfiniBand/RDMA fixes for 3.17:
- Fixes for the new memory region re-registration support
  - iSER initiator error path fixes
  - Grab bag of small fixes for the qib and ocrdma hardware drivers
  - Larger set of fixes for mlx4, especially in RoCE mode
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCAAGBQJUIexdAAoJEENa44ZhAt0hP10QAJztxlS2a8U3JCJzthwSYxlI
 ohT9487iLk1uEcj4Z3i7w2ERRUzXaHbRTktNHFjwfRb8x2qMUgT2PfD6/30sQ250
 nJAk3FRFNipxKkJSfmcc3+O4r91i4F+CaN8DGypaBDHcupeD2drKocl/Iu5MIvkG
 e5CzLlS7i/xrWKmgYP4bIqqFZsqQ+2rJrYBDybuLZSaZNd0PTDE3yCDihfOcsxjn
 TeOCVbm5895fPRtxzeCGHy8bXbYYN9vItuhtHC+sntYtbhNJhjpmP+1yD6M2SoZR
 34sGd7AA1j1H6ATmanzeW2aALkFYPIuGihDbbnRQlDG1v09lEPfP2GtfLxoQ9Ibo
 nfe2rsthzV6Qh2xcXjn6KicgV7bb6aSUXEK24zKx7O3MkOvHkOC/JIIrd9dFe+uj
 R7pUd3XlAk8SBhTQ4gLub06Dl7ynzSRArwcdMTHp30LvtnjJZoQR67WGGrsdwlIW
 MV43105i7iLCcdaSd0ihKnR6OFlSh13Z0wpu+B386bwxkHxjFJXkVHxOJir/iAk9
 cW4RXbA/ic7nwIjes4GbMNDOvdJO2tDcg9KGSgiDY3kC5GksPqfxXYVDlMB2rFoE
 PhfQ8TOcbZYTmlcKLMpMIFXP484VPhWQJeYWPOf9KGS6aW5QRNPsPCmAvaoSXWLs
 GVSlvjbE6O7MgonqG1Jh
 =Kpm1
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull infiniband/rdma fixes from Roland Dreier:
 "Last late set of InfiniBand/RDMA fixes for 3.17:

   - fixes for the new memory region re-registration support
   - iSER initiator error path fixes
   - grab bag of small fixes for the qib and ocrdma hardware drivers
   - larger set of fixes for mlx4, especially in RoCE mode"

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (26 commits)
  IB/mlx4: Fix VF mac handling in RoCE
  IB/mlx4: Do not allow APM under RoCE
  IB/mlx4: Don't update QP1 in native mode
  IB/mlx4: Avoid accessing netdevice when building RoCE qp1 header
  mlx4: Fix mlx4 reg/unreg mac to work properly with 0-mac addresses
  IB/core: When marshaling uverbs path, clear unused fields
  IB/mlx4: Avoid executing gid task when device is being removed
  IB/mlx4: Fix lockdep splat for the iboe lock
  IB/mlx4: Get upper dev addresses as RoCE GIDs when port comes up
  IB/mlx4: Reorder steps in RoCE GID table initialization
  IB/mlx4: Don't duplicate the default RoCE GID
  IB/mlx4: Avoid null pointer dereference in mlx4_ib_scan_netdevs()
  IB/iser: Bump version to 1.4.1
  IB/iser: Allow bind only when connection state is UP
  IB/iser: Fix RX/TX CQ resource leak on error flow
  RDMA/ocrdma: Use right macro in query AH
  RDMA/ocrdma: Resolve L2 address when creating user AH
  mlx4: Correct error flows in rereg_mr
  IB/qib: Correct reference counting in debugfs qp_stats
  IPoIB: Remove unnecessary port query
  ...
2014-09-23 16:47:34 -07:00
Linus Torvalds 98f75b8291 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) If the user gives us a msg_namelen of 0, don't try to interpret
    anything pointed to by msg_name.  From Ani Sinha.

 2) Fix some bnx2i/bnx2fc randconfig compilation errors.

    The gist of the issue is that we firstly have drivers that span both
    SCSI and networking.  And at the top of that chain of dependencies
    we have things like SCSI_FC_ATTRS and SCSI_NETLINK which are
    selected.

    But since select is a sledgehammer and ignores dependencies,
    everything to select's SCSI_FC_ATTRS and/or SCSI_NETLINK has to also
    explicitly select their dependencies and so on and so forth.

    Generally speaking 'select' is supposed to only be used for child
    nodes, those which have no dependencies of their own.  And this
    whole chain of dependencies in the scsi layer violates that rather
    strongly.

    So just make SCSI_NETLINK depend upon it's dependencies, and so on
    and so forth for the things selecting it (either directly or
    indirectly).

    From Anish Bhatt and Randy Dunlap.

 3) Fix generation of blackhole routes in IPSEC, from Steffen Klassert.

 4) Actually notice netdev feature changes in rtl_open() code, from
    Hayes Wang.

 5) Fix divide by zero in bond enslaving, from Nikolay Aleksandrov.

 6) Missing memory barrier in sunvnet driver, from David Stevens.

 7) Don't leave anycast addresses around when ipv6 interface is
    destroyed, from Sabrina Dubroca.

 8) Don't call efx_{arch}_filter_sync_rx_mode before addr_list_lock is
    initialized in SFC driver, from Edward Cree.

 9) Fix missing DMA error checking in 3c59x, from Neal Horman.

10) Openvswitch doesn't emit OVS_FLOW_CMD_NEW notifications accidently,
    fix from Samuel Gauthier.

11) pch_gbe needs to select NET_PTP_CLASSIFY otherwise we can get a
    build error.

12) Fix macvlan regression wherein we stopped emitting
    broadcast/multicast frames over software devices.  From Nicolas
    Dichtel.

13) Fix infiniband bug due to unintended overflow of skb->cb[], from
    Eric Dumazet.  And add an assertion so this doesn't happen again.

14) dm9000_parse_dt() should return error pointers, not NULL.  From
    Tobias Klauser.

15) IP tunneling code uses this_cpu_ptr() in preemptible contexts, fix
    from Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (87 commits)
  net: bcmgenet: call bcmgenet_dma_teardown in bcmgenet_fini_dma
  net: bcmgenet: fix TX reclaim accounting for fragments
  ipv4: do not use this_cpu_ptr() in preemptible context
  dm9000: Return an ERR_PTR() in all error conditions of dm9000_parse_dt()
  r8169: fix an if condition
  r8152: disable ALDPS
  ipoib: validate struct ipoib_cb size
  net: sched: shrink struct qdisc_skb_cb to 28 bytes
  tg3: Work around HW/FW limitations with vlan encapsulated frames
  macvlan: allow to enqueue broadcast pkt on virtual device
  pch_gbe: 'select' NET_PTP_CLASSIFY.
  scsi: Use 'depends' with LIBFC instead of 'select'.
  openvswitch: restore OVS_FLOW_CMD_NEW notifications
  genetlink: add function genl_has_listeners()
  lib: rhashtable: remove second linux/log2.h inclusion
  net: allow macvlans to move to net namespace
  3c59x: Fix bad offset spec in skb_frag_dma_map
  3c59x: Add dma error checking and recovery
  sparc: bpf_jit: fix support for ldx/stx mem and SKF_AD_VLAN_TAG
  can: at91_can: add missing prepare and unprepare of the clock
  ...
2014-09-22 18:23:33 -07:00
Eric Dumazet b49fe36208 ipoib: validate struct ipoib_cb size
To catch future errors sooner.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-22 14:29:55 -04:00
Roland Dreier 3bdad2d13f Merge branches 'core', 'ipoib', 'iser', 'mlx4', 'ocrdma' and 'qib' into for-next 2014-09-22 10:05:40 -07:00
Or Gerlitz 61aabb3c91 IB/iser: Bump version to 1.4.1
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-09-22 09:46:42 -07:00
Sagi Grimberg 91eb1df39a IB/iser: Allow bind only when connection state is UP
We need to fail the bind operation if the iser connection state != UP
(started teardown) and this should be done under the state lock.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-09-22 09:46:42 -07:00
Roi Dayan c33b15f00b IB/iser: Fix RX/TX CQ resource leak on error flow
When failing to allocate TX CQ we already allocated RX CQ, so we need to make
sure we release it. Also, when failing to register notification to the RX CQ
we currently leak both RX and TX CQs of the current index, fix that too.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-09-22 09:46:42 -07:00
Alex Estrin 68f9d83c77 IPoIB: Remove unnecessary port query
There are two queries for port attributes one after another. A second
call is not needed since port_attr structure already holds the data.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-09-19 10:17:40 -07:00
Sagi Grimberg 1a92e17e39 Target/iser: Fix initiator_depth and responder_resources
The iser target is the RDMA requester and the iser initiator is the
RDMA responder. In order to determine the max inflight RDMA READ requests
to set on the QP (initiator_depth), it should take the min between the
initiator published initiator_depth and the max inflight rdma read
requests its local HCA support (max_qp_init_rd_atom).

The target will never handle incoming RDMA READ requests so no need to
set responder_resources.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-09-15 14:02:55 -07:00
Sagi Grimberg 38a8316b5d Target/iser: Avoid calling rdma_disconnect twice
rdma_disconnect may be called in 2 code flows:
- isert_wait_conn: disconnect initiated be the target
- disconnected_handler: disconnect invoked by the initiator

In case isert_conn->disconnect is true then rdma_disconnect
was called in disconnected handler, no need to call it again
from isert_wait_conn.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-09-15 14:02:30 -07:00
Sagi Grimberg 0fc4ea701f Target/iser: Don't put isert_conn inside disconnected handler
disconnected_handler is invoked on several CM events (such
as DISCONNECTED, DEVICE_REMOVAL, TIMEWAIT_EXIT...). Since
multiple  events can occur while before isert_free_conn is
invoked, we might put all isert_conn references and free
the connection too early.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: <stable@vger.kernel.org> # v3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-09-15 14:02:16 -07:00
Sagi Grimberg c2f88b17a1 Target/iser: Get isert_conn reference once got to connected_handler
In case the connection didn't reach connected state, disconnected
handler will never be invoked thus the second kref_put on
isert_conn will be missing.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: <stable@vger.kernel.org> # v3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-09-15 14:02:02 -07:00
Linus Torvalds e3b1fd56f1 Main set of InfiniBand/RDMA updates for 3.17 merge window:
- MR reregistration support
  - MAD support for RMPP in userspace
  - iSER and SRP initiator updates
  - ocrdma hardware driver updates
  - other fixes...
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCAAGBQJT7N2GAAoJEENa44ZhAt0hUiIQAKBqYIjpB3QY6Z/B19mxDxku
 I81B3OkirumbAaCLoLckvq4gnwQ+BAD+YUmXgP08TCIgrABZAYYw+WvMEY9WNyQB
 x3Pv+BzX+wKKNaQkSnB9JVdku+BSI76eW0YYrIX0F0x1o0Jq9JpSkia91KmRvqmX
 YSFy2R7BjEZ4lfo/uscydHT26Q6EdT3od4iv48K8qq5rKdjtyYNgD/75DLCN599Z
 uI1f98e6Tl+7nHWaioQB61zlYPkNPLAnZtMrY2j4tarTwYwX1KhF3eV7z39L1l81
 nhMIXr+qBXtYuZw3I9rKqw3VVCCqB9e6E8FIA5K/d2jWqO+0TqIMUYOuZXuCezWg
 o+uGgwbDOweBqwrKRmiR2M0lk2I1Z16jBxYuaUBbLImG0/NPtuUB22t6RbPAojTa
 EjDkb9XBA7uFUMrYYnou+HxEzmJUYkin6wgGxtklYEKUqvh8G9ccGt6httWrCSrV
 mpjwJv+S4LdFM49wdP993lYthpCoZ42yxEzZ7zJ/KTt17/Wb5F4RtHIROGVFkHdT
 mo8RUz5DGDznfNQgn0m/jC3woFnMNpLOduI+CivhrwrWCwMpSUxIjqWu3263pIJ7
 +H0kNOKDAbp6+F27j+AVznlyXnaEtYyM8EZnysG1Hkz24gCSWBYo5Ep8eSH8LgY4
 9VPc75KVG6uddx5mhg5h
 =wOm0
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull infiniband/rdma updates from Roland Dreier:
 "Main set of InfiniBand/RDMA updates for 3.17 merge window:

   - MR reregistration support
   - MAD support for RMPP in userspace
   - iSER and SRP initiator updates
   - ocrdma hardware driver updates
   - other fixes..."

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (52 commits)
  IB/srp: Fix return value check in srp_init_module()
  RDMA/ocrdma: report asic-id in query device
  RDMA/ocrdma: Update sli data structure for endianness
  RDMA/ocrdma: Obtain SL from device structure
  RDMA/uapi: Include socket.h in rdma_user_cm.h
  IB/srpt: Handle GID change events
  IB/mlx5: Use ARRAY_SIZE instead of sizeof/sizeof[0]
  IB/mlx4: Use ARRAY_SIZE instead of sizeof/sizeof[0]
  RDMA/amso1100: Check for integer overflow in c2_alloc_cq_buf()
  IPoIB: Remove unnecessary test for NULL before debugfs_remove()
  IB/mad: Add user space RMPP support
  IB/mad: add new ioctl to ABI to support new registration options
  IB/mad: Add dev_notice messages for various umad/mad registration failures
  IB/mad: Update module to [pr|dev]_* style print messages
  IB/ipoib: Avoid multicast join attempts with invalid P_key
  IB/umad: Update module to [pr|dev]_* style print messages
  IB/ipoib: Avoid flushing the workqueue from worker context
  IB/ipoib: Use P_Key change event instead of P_Key polling mechanism
  IB/ipath: Add P_Key change event support
  mlx4_core: Add support for secure-host and SMP firewall
  ...
2014-08-14 11:09:05 -06:00
Roland Dreier d087f6ad72 Merge branches 'core', 'cxgb4', 'ipoib', 'iser', 'iwcm', 'mad', 'misc', 'mlx4', 'mlx5', 'ocrdma' and 'srp' into for-next 2014-08-14 08:58:04 -07:00
Wei Yongjun da05be290f IB/srp: Fix return value check in srp_init_module()
In case of error, the function create_workqueue() returns NULL pointer
not ERR_PTR().  The IS_ERR() test in the return value check should be
replaced with NULL test.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-08-14 08:57:54 -07:00
Doug Ledford 2aa1cf64aa IB/srpt: Handle GID change events
GID change events need a refresh just like LID change events and several
others.  Handle this the same as the others.

Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-08-12 22:01:55 -07:00
Fabian Frederick e42fa2092c IPoIB: Remove unnecessary test for NULL before debugfs_remove()
Fix checkpatch warning:

    WARNING: debugfs_remove(NULL) is safe this check is probably not required

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-08-12 21:59:54 -07:00
Ira Weiny 0f29b46d49 IB/mad: add new ioctl to ABI to support new registration options
Registrations options are specified through flags.  Definitions of flags will
be in subsequent patches.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-08-10 20:36:00 -07:00
Alex Estrin dd57c9308a IB/ipoib: Avoid multicast join attempts with invalid P_key
Currently, the parent interface keeps sending broadcast group join
requests even if p_key index 0 is invalid, which is possible/common in
virtualized environments where a VF has been probed to VM but the
actual P_key configuration has not yet been assigned by the management
software. This creates unnecessary noise on the fabric and in the
kernel logs:

    ib0: multicast join failed for ff12:401b:8000:0000:0000:0000:ffff:ffff, status -22

The original code run the multicast task regardless of the actual
P_key value, which can be avoided. The fix is to re-init resources and
bring interface up only if P_key index 0 is valid either when starting
up or on PKEY_CHANGE event.

Fixes: c290414169 ("IPoIB: Fix pkey change flow for virtualization environments")
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-08-10 19:53:56 -07:00
Erez Shitrit 4eae374845 IB/ipoib: Avoid flushing the workqueue from worker context
The error flow of ipoib_ib_dev_open() invokes ipoib_ib_dev_stop() with
workqueue flushing enabled, which deadlocks if the open procedure
itself was called by a worker thread.

Fix this by adding a flush enabled flag to ipoib_ib_dev_open() and set
it accordingly from the locations where such a call is made.

The call trace was the following:

 [<ffffffff81095bc4>] ? flush_workqueue+0x54/0x80
 [<ffffffffa056c657>] ? ipoib_ib_dev_stop+0x447/0x650 [ib_ipoib]
 [<ffffffffa056cc34>] ? ipoib_ib_dev_open+0x284/0x430 [ib_ipoib]
 [<ffffffffa05674a8>] ? ipoib_open+0x78/0x1d0 [ib_ipoib]
 [<ffffffffa05697b8>] ? ipoib_pkey_open+0x38/0x40 [ib_ipoib]
 [<ffffffffa056cf3c>] ? __ipoib_ib_dev_flush+0x15c/0x2c0 [ib_ipoib]
 [<ffffffffa056ce56>] ? __ipoib_ib_dev_flush+0x76/0x2c0 [ib_ipoib]
 [<ffffffffa056d0a0>] ? ipoib_ib_dev_flush_heavy+0x0/0x20 [ib_ipoib]
 [<ffffffffa056d0ba>] ? ipoib_ib_dev_flush_heavy+0x1a/0x20 [ib_ipoib]
 [<ffffffff81094d20>] ? worker_thread+0x170/0x2a0
 [<ffffffff8109b2a0>] ? autoremove_wake_function+0x0/0x40

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Acked-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-08-05 07:47:33 -07:00
Erez Shitrit db84f88037 IB/ipoib: Use P_Key change event instead of P_Key polling mechanism
The current code use a dedicated polling logic to determine when the P_Key
assigned to the ipoib device is present in HCA port table and act accordingly.

Move to use the code which acts upon getting PKEY_CHANGE event to handle this
task and remove the P_Key polling logic/thread as they add extra complexity
which isn't needed.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Acked-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-08-05 07:47:33 -07:00