linux

Commit Graph

Author	SHA1	Message	Date
Roland Dreier	a96e4e2ffe	IB/uverbs: New macro to set pointers to NULL if length is 0 in INIT_UDATA() Trying to have a ternary operator to choose between NULL (or 0) and the real pointer value in invocations leads to an impossible choice between a sparse error about a literal 0 used as a NULL pointer, and a gcc warning about "pointer/integer type mismatch in conditional expression." Rather than clutter the source with more casts, move the ternary operator into a new INIT_UDATA_BUF_OR_NULL() macro, which makes it easier to use and simplifies its callers. Reported-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-12-20 10:53:44 -08:00
Yann Droneaud	309243ec14	IB/core: const'ify inbuf in struct ib_udata Userspace input buffer is not modified by kernel, so it can be 'const'. This is also a prerequisite to remove the implicit cast from INIT_UDATA(). Link: http://marc.info/?i=cover.1386798254.git.ydroneaud@opteya.com> Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-12-16 10:38:28 -08:00
Steve Wise	6b59ba609b	RDMA/iwcm: Don't touch cm_id after deref in rem_ref rem_ref() calls iwcm_deref_id(), which will wake up any blockers on cm_id_priv->destroy_comp if the refcnt hits 0. That will unblock someone in iw_destroy_cm_id() which will free the cmid. If that happens before rem_ref() calls test_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags), then the test_bit() will touch freed memory. The fix is to read the bit first, then deref. We should never be in iw_destroy_cm_id() with IWCM_F_CALLBACK_DESTROY set, and there is a BUG_ON() to make sure of that. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-12-15 16:47:47 -08:00
Linus Torvalds	1ea406c0e0	Main batch of InfiniBand/RDMA changes for 3.13: - Re-enable flow steering verbs with new improved userspace ABI - Fixes for slow connection due to GID lookup scalability - IPoIB fixes - Many fixes to HW drivers including mlx4, mlx5, ocrdma and qib - Further improvements to SRP error handling - Add new transport type for Cisco usNIC -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.14 (GNU/Linux) iQIcBAABCAAGBQJSil7BAAoJEENa44ZhAt0hbtgP/A+AmUalbOX6ZKzuOFxsrtY2 r55CX9b1JBeFM/Zhn2o6y+81lpCjkckJSggESMe4izNgocGw0nW4vYGN4SBynatj y8sR9OSn+G3ihuENrzG41MJUGEa5WbcNMy4boN+Oa+qyTlV/WjLR7Fv4WbikK7Wm o8FNlXiiDhMoGfHHG5J0MD0EQsnxuLDk2XP+ciu4tLtTs+wBka+gFK8WnMvztle3 gTeMNna5ilvCS2fdBxteuPA3KeDnJE9AgJSMJ2a4Rh+DR8uTgWYQ6n7amjmOc546 yhAKkoBkxPE10+Yj82WOPhCFxSeWcuSwJvpgv5dTVZ1XqUUcC1V3TEcZDHmyyHQ7 uPXgS1A+erBW3OYPBjZqtKvnHObscV12fL+rId3vIhcAQIbFroci08ZwPidEYRkn fvwlEKcrIsBIpRXEyjlFCxsiiDnfq1wC1VayMR3jrIK0P6idf1SXf/geiRp9+RGT wKUc0j51jvEx29qc65xuhEP9FQV9pCMxyd+FEE0d0KkjMz5hsIkjmcUcBbgF0CGg GEyDPlgRLv+vmWDGpT8XraaV/0CJOEQDIgB4WSN87/AZ4UoNt7spW2xqsLsp1toy 5e0100tpWUleTPLe/Wig5GtBdagQ2jAUK1+186CP93pFPtkwc4/7X3hyp7qPIPTz VDvT9DEy6zjSMCLpMcdo =xxC+ -----END PGP SIGNATURE----- Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull infiniband/rdma updates from Roland Dreier: - Re-enable flow steering verbs with new improved userspace ABI - Fixes for slow connection due to GID lookup scalability - IPoIB fixes - Many fixes to HW drivers including mlx4, mlx5, ocrdma and qib - Further improvements to SRP error handling - Add new transport type for Cisco usNIC * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (66 commits) IB/core: Re-enable create_flow/destroy_flow uverbs IB/core: extended command: an improved infrastructure for uverbs commands IB/core: Remove ib_uverbs_flow_spec structure from userspace IB/core: Use a common header for uverbs flow_specs IB/core: Make uverbs flow structure use names like verbs ones IB/core: Rename 'flow' structs to match other uverbs structs IB/core: clarify overflow/underflow checks on ib_create/destroy_flow IB/ucma: Convert use of typedef ctl_table to struct ctl_table IB/cm: Convert to using idr_alloc_cyclic() IB/mlx5: Fix page shift in create CQ for userspace IB/mlx4: Fix device max capabilities check IB/mlx5: Fix list_del of empty list IB/mlx5: Remove dead code IB/core: Encorce MR access rights rules on kernel consumers IB/mlx4: Fix endless loop in resize CQ RDMA/cma: Remove unused argument and minor dead code RDMA/ucma: Discard events for IDs not yet claimed by user space IB/core: Add Cisco usNIC rdma node and transport types RDMA/nes: Remove self-assignment from nes_query_qp() IB/srp: Report receive errors correctly ...	2013-11-18 15:36:04 -08:00
Roland Dreier	b4fdf52b3f	Merge branches 'cma', 'cxgb4', 'flowsteer', 'ipoib', 'misc', 'mlx4', 'mlx5', 'nes', 'ocrdma', 'qib' and 'srp' into for-next	2013-11-17 08:22:19 -08:00
Matan Barak	69ad5da41b	IB/core: Re-enable create_flow/destroy_flow uverbs This commit reverts commit `7afbddfae9` ("IB/core: Temporarily disable create_flow/destroy_flow uverbs"). Since the uverbs extensions functionality was experimental for v3.12, this patch re-enables the support for them and flow-steering for v3.13. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-17 08:22:09 -08:00
Yann Droneaud	f21519b23c	IB/core: extended command: an improved infrastructure for uverbs commands Commit `400dbc9658` ("IB/core: Infrastructure for extensible uverbs commands") added an infrastructure for extensible uverbs commands while later commit `436f2ad05a` ("IB/core: Export ib_create/destroy_flow through uverbs") exported ib_create_flow()/ib_destroy_flow() functions using this new infrastructure. According to the commit `400dbc9658`, the purpose of this infrastructure is to support passing around provider (eg. hardware) specific buffers when userspace issue commands to the kernel, so that it would be possible to extend uverbs (eg. core) buffers independently from the provider buffers. But the new kernel command function prototypes were not modified to take advantage of this extension. This issue was exposed by Roland Dreier in a previous review[1]. So the following patch is an attempt to a revised extensible command infrastructure. This improved extensible command infrastructure distinguish between core (eg. legacy)'s command/response buffers from provider (eg. hardware)'s command/response buffers: each extended command implementing function is given a struct ib_udata to hold core (eg. uverbs) input and output buffers, and another struct ib_udata to hold the hw (eg. provider) input and output buffers. Having those buffers identified separately make it easier to increase one buffer to support extension without having to add some code to guess the exact size of each command/response parts: This should make the extended functions more reliable. Additionally, instead of relying on command identifier being greater than IB_USER_VERBS_CMD_THRESHOLD, the proposed infrastructure rely on unused bits in command field: on the 32 bits provided by command field, only 6 bits are really needed to encode the identifier of commands currently supported by the kernel. (Even using only 6 bits leaves room for about 23 new commands). So this patch makes use of some high order bits in command field to store flags, leaving enough room for more command identifiers than one will ever need (eg. 256). The new flags are used to specify if the command should be processed as an extended one or a legacy one. While designing the new command format, care was taken to make usage of flags itself extensible. Using high order bits of the commands field ensure that newer libibverbs on older kernel will properly fail when trying to call extended commands. On the other hand, older libibverbs on newer kernel will never be able to issue calls to extended commands. The extended command header includes the optional response pointer so that output buffer length and output buffer pointer are located together in the command, allowing proper parameters checking. This should make implementing functions easier and safer. Additionally the extended header ensure 64bits alignment, while making all sizes multiple of 8 bytes, extending the maximum buffer size: legacy extended Maximum command buffer: 256KBytes 1024KBytes (512KBytes + 512KBytes) Maximum response buffer: 256KBytes 1024KBytes (512KBytes + 512KBytes) For the purpose of doing proper buffer size accounting, the headers size are no more taken in account in "in_words". One of the odds of the current extensible infrastructure, reading twice the "legacy" command header, is fixed by removing the "legacy" command header from the extended command header: they are processed as two different parts of the command: memory is read once and information are not duplicated: it's making clear that's an extended command scheme and not a different command scheme. The proposed scheme will format input (command) and output (response) buffers this way: - command: legacy header + extended header + command data (core + hw): +----------------------------------------+ \| flags \| 00 00 \| command \| \| in_words \| out_words \| +----------------------------------------+ \| response \| \| response \| \| provider_in_words \| provider_out_words \| \| padding \| +----------------------------------------+ \| \| . <uverbs input> . . (in_words * 8) . \| \| +----------------------------------------+ \| \| . <provider input> . . (provider_in_words * 8) . \| \| +----------------------------------------+ - response, if present: +----------------------------------------+ \| \| . <uverbs output space> . . (out_words * 8) . \| \| +----------------------------------------+ \| \| . <provider output space> . . (provider_out_words * 8) . \| \| +----------------------------------------+ The overall design is to ensure that the extensible infrastructure is itself extensible while begin more reliable with more input and bound checking. Note: The unused field in the extended header would be perfect candidate to hold the command "comp_mask" (eg. bit field used to handle compatibility). This was suggested by Roland Dreier in a previous review[2]. But "comp_mask" field is likely to be present in the uverb input and/or provider input, likewise for the response, as noted by Matan Barak[3], so it doesn't make sense to put "comp_mask" in the header. [1]: http://marc.info/?i=CAL1RGDWxmM17W2o_era24A-TTDeKyoL6u3NRu_=t_dhV_ZA9MA@mail.gmail.com [2]: http://marc.info/?i=CAL1RGDXJtrc849M6_XNZT5xO1+ybKtLWGq6yg6LhoSsKpsmkYA@mail.gmail.com [3]: http://marc.info/?i=525C1149.6000701@mellanox.com Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Link: http://marc.info/?i=cover.1383773832.git.ydroneaud@opteya.com [ Convert "ret ? ret : 0" to the equivalent "ret". - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-17 08:22:09 -08:00
Yann Droneaud	2490f20be4	IB/core: Remove ib_uverbs_flow_spec structure from userspace The structure holding any types of flow_spec is of no use to userspace. It would be wrong for userspace to do: struct ib_uverbs_flow_spec flow_spec; flow_spec.type = IB_FLOW_SPEC_TCP; flow_spec.size = sizeof(flow_spec); Instead, userspace should use the dedicated flow_spec structure for - Ethernet : struct ib_uverbs_flow_spec_eth, - IPv4 : struct ib_uverbs_flow_spec_ipv4, - TCP/UDP : struct ib_uverbs_flow_spec_tcp_udp. In other words, struct ib_uverbs_flow_spec is a "virtual" data structure that can only be use by the kernel as an alias to the other. Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Link: http://marc.info/?i=cover.1383773832.git.ydroneaud@opteya.com Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-17 08:22:08 -08:00
Yann Droneaud	b68c956021	IB/core: Make uverbs flow structure use names like verbs ones This patch adds "flow" prefix to most of data structure added as part of commit `436f2ad05a` ("IB/core: Export ib_create/destroy_flow through uverbs") to keep those names in sync with the data structures added in commit `319a441d13` ("IB/core: Add receive flow steering support"). It's just a matter of translating 'ib_flow' to 'ib_uverbs_flow'. Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Link: http://marc.info/?i=cover.1383773832.git.ydroneaud@opteya.com Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-17 08:22:08 -08:00
Yann Droneaud	d82693dad0	IB/core: Rename 'flow' structs to match other uverbs structs Commit `436f2ad05a` ("IB/core: Export ib_create/destroy_flow through uverbs") added public data structures to support receive flow steering. The new structs are not following the 'uverbs' pattern: they're lacking the common prefix 'ib_uverbs'. This patch replaces ib_kern prefix by ib_uverbs. Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Link: http://marc.info/?i=cover.1383773832.git.ydroneaud@opteya.com Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-17 08:22:08 -08:00
Matan Barak	f884827438	IB/core: clarify overflow/underflow checks on ib_create/destroy_flow This patch fixes the following issues: 1. Unneeded checks were removed 2. Removed the fixed size out of flow_attr.size, thus simplifying the checks. 3. Remove a 32bit hole on 64bit systems with strict alignment in struct ib_kern_flow_att by adding a reserved field. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-17 08:22:07 -08:00
Joe Perches	f3a5e3e37e	IB/ucma: Convert use of typedef ctl_table to struct ctl_table This typedef is unnecessary and should just be removed. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-16 10:32:24 -08:00
Zhao Hongjiang	ab626d1a68	IB/cm: Convert to using idr_alloc_cyclic() Commit `3e6628c4b3` ("idr: introduce idr_alloc_cyclic()") adds a new idr_alloc_cyclic() routine and converts several of these users to it. This is just a missed one - add it. Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-16 10:30:42 -08:00
Eli Cohen	1c636f8016	IB/core: Encorce MR access rights rules on kernel consumers Enforce the rule that when requesting remote write or atomic permissions, local write must be indicated as well. See IB spec 11.2.8.2. Spotted by: Hagay Abramovsky <hagaya@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-15 10:25:32 -08:00
Michal Nazarewicz	352b905635	RDMA/cma: Remove unused argument and minor dead code The dev variable is never assigned after being initialised. Signed-off-by: Michal Nazarewicz <mina86@mina86.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-11 10:46:54 -08:00
Sean Hefty	c6b21824c9	RDMA/ucma: Discard events for IDs not yet claimed by user space Problem reported by Avneesh Pant <avneesh.pant@oracle.com>: It looks like we are triggering a bug in RDMA CM/UCM interaction. The bug specifically hits when we have an incoming connection request and the connecting process dies BEFORE the passive end of the connection can process the request i.e. it does not call rdma_get_cm_event() to retrieve the initial connection event. We were able to triage this further and have some additional information now. In the example below when P1 dies after issuing a connect request as the CM id is being destroyed all outstanding connects (to P2) are sent a reject message. We see this reject message being received on the passive end and the appropriate CM ID created for the initial connection message being retrieved in cm_match_req(). The problem is in the ucma_event_handler() code when this reject message is delivered to it and the initial connect message itself HAS NOT been delivered to the client. In fact the client has not even called rdma_cm_get_event() at this stage so we haven't allocated a new ctx in ucma_get_event() and updated the new connection CM_ID to point to the new UCMA context. This results in the reject message not being dropped in ucma_event_handler() for the new connection request as the (if (!ctx->uid)) block is skipped since the ctx it refers to is the listen CM id context which does have a valid UID associated with it (I believe the new CMID for the connection initially uses the listen CMID -> context when it is created in cma_new_conn_id). Thus the assumption that new events for a connection can get dropped in ucma_event_handler() is incorrect IF the initial connect request has not been retrieved in the first case. We end up getting a CM Reject event on the listen CM ID and our upper layer code asserts (in fact this event does not even have the listen_id set as that only gets set up librdmacm for connect requests). The solution is to verify that the cm_id being reported in the event is the same as the cm_id referenced by the ucma context. A mismatch indicates that the ucma context corresponds to the listen. This fix was validated by using a modified version of librdmacm that was able to verify the problem and see that the reject message was indeed dropped after this patch was applied. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-11 10:45:06 -08:00
$Upinder Malhi $umalhi$$ Upinder Malhi $umalhi$	180771a370	IB/core: Add Cisco usNIC rdma node and transport types This patch adds new rdma node and new rdma transport, and supporting code used by Cisco's low latency driver called usNIC. usNIC uses its own transport, distinct from IB and iWARP. Signed-off-by: Upinder Malhi <umalhi@cisco.com> Signed-off-by: Jeff Squyres <jsquyres@cisco.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-09 02:36:25 -08:00
Mathias Krause	5476781bb9	IB/netlink: Remove superfluous RDMA_NL_GET_OP() masking 'op' is the already RDMA_NL_GET_OP() masked 'type'. No need to mask it again. Signed-off-by: Mathias Krause <minipli@googlemail.com> Reviewed-by: Yann Droneaud <ydroneaud@opteya.com> Acked-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:42:54 -08:00
Latchesar Ionkov	6b7d103c1b	IB/core: Pass imm_data from ib_uverbs_send_wr to ib_send_wr correctly Currently, we don't copy the immediate data from the userspace struct to the kernel one when UD messages are being sent. This patch makes sure that the immediate data is set correctly. Signed-off-by: Latchesar Ionkov <lucho@ionkov.net> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:42:54 -08:00
Doug Ledford	be9130cc92	IB/cma: Check for GID on listening device first As a simple optimization that should speed up the vast majority of connect attemps on IB devices, when we are searching for the GID of an incoming connection in the cached GID lists of devices, search the device that received the incoming connection request first. If we don't find it there, then move on to other devices. This reduces the time to perform 10,000 connections considerably. Prior to this patch, a bad run of cmtime would look like this: connect : 12399.26 12351.10 8609.00 1239.93 With this patch, it looks more like this: connect : 5864.86 5799.80 8876.00 586.49 Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:42:24 -08:00
Doug Ledford	29f27e8477	IB/cma: Use cached gids The cma_acquire_dev function was changed by commit `3c86aa70bf` ("RDMA/cm: Add RDMA CM support for IBoE devices") to use find_gid_port() because multiport devices might have either IB or IBoE formatted gids. The old function assumed that all ports on the same device used the same GID format. However, when it was changed to use find_gid_port(), we inadvertently lost usage of the GID cache. This turned out to be a very costly change. In our testing, each iteration through each index of the GID table takes roughly 35us. When you have multiple devices in a system, and the GID you are looking for is on one of the later devices, the code loops through all of the GID indexes on all of the early devices before it finally succeeds on the target device. This pathological search behavior combined with 35us per GID table index retrieval results in results such as the following from the cmtime application that's part of the latest librdmacm git repo: ib1: step total ms max ms min us us / conn create id : 29.42 0.04 1.00 2.94 bind addr : 186705.66 19.00 18556.00 18670.57 resolve addr : 41.93 9.68 619.00 4.19 resolve route: 486.93 0.48 101.00 48.69 create qp : 4021.95 6.18 330.00 402.20 connect : 68350.39 68588.17 24632.00 6835.04 disconnect : 1460.43 252.65-1862269.00 146.04 destroy : 41.16 0.04 2.00 4.12 ib0: step total ms max ms min us us / conn create id : 28.61 0.68 1.00 2.86 bind addr : 2178.86 2.95 201.00 217.89 resolve addr : 51.26 16.85 845.00 5.13 resolve route: 620.08 0.43 92.00 62.01 create qp : 3344.40 6.36 273.00 334.44 connect : 6435.99 6368.53 7844.00 643.60 disconnect : 5095.38 321.90 757.00 509.54 destroy : 37.13 0.02 2.00 3.71 Clearly, both the bind address and connect operations suffer a huge penalty for being anything other than the default GID on the first port in the system. After applying this patch, the numbers now look like this: ib1: step total ms max ms min us us / conn create id : 30.15 0.03 1.00 3.01 bind addr : 80.27 0.04 7.00 8.03 resolve addr : 43.02 13.53 589.00 4.30 resolve route: 482.90 0.45 100.00 48.29 create qp : 3986.55 5.80 330.00 398.66 connect : 7141.53 7051.29 5005.00 714.15 disconnect : 5038.85 193.63 918.00 503.88 destroy : 37.02 0.04 2.00 3.70 ib0: step total ms max ms min us us / conn create id : 34.27 0.05 1.00 3.43 bind addr : 26.45 0.04 1.00 2.64 resolve addr : 38.25 10.54 760.00 3.82 resolve route: 604.79 0.43 97.00 60.48 create qp : 3314.95 6.34 273.00 331.49 connect : 12399.26 12351.10 8609.00 1239.93 disconnect : 5096.76 270.72 1015.00 509.68 destroy : 37.10 0.03 2.00 3.71 It's worth noting that we still suffer a bit of a penalty on connect to the wrong device, but the penalty is much less than it used to be. Follow on patches deal with this penalty. Many thanks to Neil Horman for helping to track the source of slow function that allowed us to track down the fact that the original patch I mentioned above backed out cache usage and identify just how much that impacted the system. Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:42:24 -08:00
Eyal Perry	eb072c4b8d	RDMA/cma: Set IBoE SL (user-priority) by egress map when using vlans On top of commit `366cddb40` "IB/rdma_cm: TOS <=> UP mapping for IBoE", add support for case vlan egress map is used. When the IBoE session is being set over a vlan, inherit the socket priority to vlan priority mapping which was configured for the vlan device egress map. Signed-off-by: Eyal Perry <eyalpe@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-11-07 19:09:44 -05:00
David S. Miller	c3fa32b976	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/usb/qmi_wwan.c include/net/dst.h Trivial merge conflicts, both were overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-23 16:49:34 -04:00
Yann Droneaud	7afbddfae9	IB/core: Temporarily disable create_flow/destroy_flow uverbs The create_flow/destroy_flow uverbs and the associated extensions to the user-kernel verbs ABI are under review and are too experimental to freeze at this point. So userspace is not exposed to experimental features and an uinstable ABI, temporarily disable this for v3.12 (with a Kconfig option behind staging to reenable it if desired). The feature will be enabled after proper cleanup for v3.13. Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Link: http://marc.info/?i=cover.1381351016.git.ydroneaud@opteya.com Link: http://marc.info/?i=cover.1381177342.git.ydroneaud@opteya.com [ Add a Kconfig option to reenable these verbs. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-10-21 09:44:17 -07:00
Eric W. Biederman	0bbf87d852	net ipv4: Convert ipv4.ip_local_port_range to be per netns v3 - Move sysctl_local_ports from a global variable into struct netns_ipv4. - Modify inet_get_local_port_range to take a struct net, and update all of the callers. - Move the initialization of sysctl_local_ports into sysctl_net_ipv4.c:ipv4_sysctl_init_net from inet_connection_sock.c v2: - Ensure indentation used tabs - Fixed ip.h so it applies cleanly to todays net-next v3: - Compile fixes of strange callers of inet_get_local_port_range. This patch now successfully passes an allmodconfig build. Removed manual inlining of inet_get_local_port_range in ipv4_local_port_range Originally-by: Samya <samya@twitter.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-30 21:59:38 -07:00
Linus Torvalds	7c049d0869	Main batch of InfiniBand/RDMA changes for 3.12 merge window: - Large ocrdma HW driver update: add "fast register" work requests, fixes, cleanups - Add receive flow steering support for raw QPs - Fix IPoIB neighbour race that leads to crash - iSER updates including support for using "fast register" memory registration - IPv6 support for iWARP - XRC transport fixes -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.14 (GNU/Linux) iQIcBAABCAAGBQJSJ2drAAoJEENa44ZhAt0hpaUQAJ6EdNFh2bon9c/uHz1lw58A DIvVniUwWGCRgpbsv/IxZDBVX3G5IKSW6U0Y3/JxMXeIF/cGOUaJZgCeKPiYNoNA yxiEGI0ffvpWGAJUHSE2ARJKvgdjrpqGo5UzsiinEhJx3uczeZBcooosQTWDXxya /qSWH3rARkic9abuaizkuVzdWEjWLNjyh2bWnqJ4HNZplE8RfSK4bk8QtvEmpn9Z dNBKFFujysfHHGflLuSkFAkP1NdqlZeQ4/2uzD23p3YofbJwrJoJNFxkCfx3UUml fjPptlhU6kZMCwSXsD24tAV8Exr/CCgmxriFIN/Xqhu4gvBELScRPPqPqFquhtXI pP5hbG9/9P5C8BLgABe6IAvlU1lUyraR67bA78nYeKm2JpATE6T23Nx7nUgMgzRS Ee3ZvZGZvk8BZ7zyQh8D7Aig1XOtzqA9D5nLUyEJ2aRPSnXJxyi0S3Fbn0vmOMb7 8CF+99KECG8fb4sjNAjX5Zm48+7WJd+WCd3t89oUzCbJKKpa1Chctph8VH7dEfke JkEjOJhuAGqau4bIMZ2nZkISoTfSD7vbjPAxMPASS7fOdR7fe6AqDn9/LMAhWjpw 4Fb25ZW7rYf9+6jqA4MgMRFCTkXhR25FsqUUL9h8NaWai4F1dfVoZHbRvGIZah5n 5HQupXGquwES4y6pvHjc =rabl -----END PGP SIGNATURE----- Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull main batch of InfiniBand/RDMA changes from Roland Dreier: - Large ocrdma HW driver update: add "fast register" work requests, fixes, cleanups - Add receive flow steering support for raw QPs - Fix IPoIB neighbour race that leads to crash - iSER updates including support for using "fast register" memory registration - IPv6 support for iWARP - XRC transport fixes * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (54 commits) RDMA/ocrdma: Fix compiler warning about int/pointer size mismatch IB/iser: Fix redundant pointer check in dealloc flow IB/iser: Fix possible memory leak in iser_create_frwr_pool() IB/qib: Move COUNTER_MASK definition within qib_mad.h header guards RDMA/ocrdma: Fix passing wrong opcode to modify_srq RDMA/ocrdma: Fill PVID in UMC case RDMA/ocrdma: Add ABI versioning support RDMA/ocrdma: Consider multiple SGES in case of DPP RDMA/ocrdma: Fix for displaying proper link speed RDMA/ocrdma: Increase STAG array size RDMA/ocrdma: Dont use PD 0 for userpace CQ DB RDMA/ocrdma: FRMA code cleanup RDMA/ocrdma: For ERX2 irrespective of Qid, num_posted offset is 24 RDMA/ocrdma: Fix to work with even a single MSI-X vector RDMA/ocrdma: Remove the MTU check based on Ethernet MTU RDMA/ocrdma: Add support for fast register work requests (FRWR) RDMA/ocrdma: Create IRD queue fix IB/core: Better checking of userspace values for receive flow steering IB/mlx4: Add receive flow steering support IB/core: Export ib_create/destroy_flow through uverbs ...	2013-09-05 09:39:27 -07:00
Linus Torvalds	27703bb4a6	PTR_RET() is a weird name, and led to some confusing usage. We ended up with PTR_ERR_OR_ZERO(), and replacing or fixing all the usages. This has been sitting in linux-next for a whole cycle. Thanks, Rusty. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJSJo+1AAoJENkgDmzRrbjxIC4QALJK95o8AUXuwUkl+2fmFkUt hh2/PJ1vDYgk4Xt0J6hyoK7XMa0H1RkbBrROuDdsBnorMFpEsGcgdkUZte9ufoAS 97Bg+7N0KPbTB/S8vOwtW1vbERTJIVPN2uf6h1Wqm9Xc2puCh3HbMMr1AWMGu0WQ NqY5+Zz8zecy1UOrMhEP6H1CjeQcL1w1DO6YM5ydeqlKNzAz+JMfDXriLPDwiE7+ XFPDF/O3Vtd2ckA7L70Lio7hfHwxV5U4WwFVfiwls98XB4jcZqDKIoh1r8z4SRgR +0Rae2DN3BaOabGMr//5XdrzQVpwJTh5m2w8BAOHJvCJ9HR7Sq29UIN4u+TowZBy L2xYo4dvFxkympwu5zEd3c7vHYWKIaqmSq5PIjr4gF/uIo2OeOTrpPIK782ZEYb7 e+qUgOEM05V9AmQZCrSZeP9u474Sj8ow3sCtWxfdRtwNfoEIcUXsNNJd/zDHlVtW cEtXqc2xXIpcuUJQWlSaGp8fmRQjVZPzrLKYLM2m39ZcOOJbf5rzQAYS7hHPosIa SK+YVux/+Zzi+Xo/vXq1OlM/SruCr5S7JOgCxLowoQ88vupgXME6uPyC8EO+QQ50 GsrHes5ZNLbk0uVsfcexIyojkUnyvDmmnDpv+1zdC6RgZLJQn8OXp5yNhHhnhrFT BiHX6YFWtDDqRlVv8Q0F =LeaW -----END PGP SIGNATURE----- Merge tag 'PTR_RET-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull PTR_RET() removal patches from Rusty Russell: "PTR_RET() is a weird name, and led to some confusing usage. We ended up with PTR_ERR_OR_ZERO(), and replacing or fixing all the usages. This has been sitting in linux-next for a whole cycle" [ There are still some PTR_RET users scattered about, with some of them possibly being new, but most of them existing in Rusty's tree too. We have that #define PTR_RET(p) PTR_ERR_OR_ZERO(p) thing in <linux/err.h>, so they continue to work for now - Linus ] * tag 'PTR_RET-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: GFS2: Replace PTR_RET with PTR_ERR_OR_ZERO Btrfs: volume: Replace PTR_RET with PTR_ERR_OR_ZERO drm/cma: Replace PTR_RET with PTR_ERR_OR_ZERO sh_veu: Replace PTR_RET with PTR_ERR_OR_ZERO dma-buf: Replace PTR_RET with PTR_ERR_OR_ZERO drivers/rtc: Replace PTR_RET with PTR_ERR_OR_ZERO mm/oom_kill: remove weird use of ERR_PTR()/PTR_ERR(). staging/zcache: don't use PTR_RET(). remoteproc: don't use PTR_RET(). pinctrl: don't use PTR_RET(). acpi: Replace weird use of PTR_RET. s390: Replace weird use of PTR_RET. PTR_RET is now PTR_ERR_OR_ZERO(): Replace most. PTR_RET is now PTR_ERR_OR_ZERO	2013-09-04 17:31:11 -07:00
Roland Dreier	82af24ac6f	Merge branches 'cxgb4', 'flowsteer', 'ipoib', 'iser', 'mlx4', 'ocrdma' and 'qib' into for-next	2013-09-03 09:01:08 -07:00
Matan Barak	22878dbc91	IB/core: Better checking of userspace values for receive flow steering - Don't allow unsupported comp_mask values, user should check ibv_query_device to know which features are supported. - Add a check in ib_uverbs_create_flow() to verify the size passed from the user space. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-09-02 11:12:48 -07:00
Hadar Hen Zion	436f2ad05a	IB/core: Export ib_create/destroy_flow through uverbs Implement ib_uverbs_create_flow() and ib_uverbs_destroy_flow() to support flow steering for user space applications. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-28 09:53:14 -07:00
Igor Ivanov	400dbc9658	IB/core: Infrastructure for extensible uverbs commands Add infrastructure to support extended uverbs capabilities in a forward/backward manner. Uverbs command opcodes which are based on the verbs extensions approach should be greater or equal to IB_USER_VERBS_CMD_THRESHOLD. They have new header format and processed a bit differently. Whenever a specific IB_USER_VERBS_CMD_XXX is extended, which practically means it needs to have additional arguments, we will be able to add them without creating a completely new IB_USER_VERBS_CMD_YYY command or bumping the uverbs ABI version. This patch for itself doesn't provide the whole scheme which is also dependent on adding a comp_mask field to each extended uverbs command struct. The new header framework allows for future extension of the CMD arguments (ib_uverbs_cmd_hdr.in_words, ib_uverbs_cmd_hdr.out_words) for an existing new command (that is a command that supports the new uverbs command header format suggested in this patch) w/o bumping ABI version and with maintaining backward and formward compatibility to new and old libibverbs versions. In the uverbs command we are passing both uverbs arguments and the provider arguments. We split the ib_uverbs_cmd_hdr.in_words to ib_uverbs_cmd_hdr.in_words which will now carry only uverbs input argument struct size and ib_uverbs_cmd_hdr.provider_in_words that will carry the provider input argument size. Same goes for the response (the uverbs CMD output argument). For example take the create_cq call and the mlx4_ib provider: The uverbs layer gets libibverb's struct ibv_create_cq (named struct ib_uverbs_create_cq in the kernel), mlx4_ib gets libmlx4's struct mlx4_create_cq (which includes struct ibv_create_cq and is named struct mlx4_ib_create_cq in the kernel) and in_words = sizeof(mlx4_create_cq)/4 . Thus ib_uverbs_cmd_hdr.in_words carry both uverbs plus mlx4_ib input argument sizes, where uverbs assumes it knows the size of its input argument - struct ibv_create_cq. Now, if we wish to add a variable to struct ibv_create_cq, we can add a comp_mask field to the struct which is basically bit field indicating which fields exists in the struct (as done for the libibverbs API extension), but we need a way to tell what is the total size of the struct and not assume the struct size is predefined (since we may get different struct sizes from different user libibverbs versions). So we know at which point the provider input argument (struct mlx4_create_cq) begins. Same goes for extending the provider struct mlx4_create_cq. Thus we split the ib_uverbs_cmd_hdr.in_words to ib_uverbs_cmd_hdr.in_words which will now carry only uverbs input argument struct size and ib_uverbs_cmd_hdr.provider_in_words that will carry the provider (mlx4_ib) input argument size. Signed-off-by: Igor Ivanov <Igor.Ivanov@itseez.com> Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-28 09:52:03 -07:00
Hadar Hen Zion	319a441d13	IB/core: Add receive flow steering support The RDMA stack allows for applications to create IB_QPT_RAW_PACKET QPs, which receive plain Ethernet packets, specifically packets that don't carry any QPN to be matched by the receiving side. Applications using these QPs must be provided with a method to program some steering rule with the HW so packets arriving at the local port can be routed to them. This patch adds ib_create_flow(), which allow providing a flow specification for a QP. When there's a match between the specification and a received packet, the packet is forwarded to that QP, in a the same way one uses ib_attach_multicast() for IB UD multicast handling. Flow specifications are provided as instances of struct ib_flow_spec_yyy, which describe L2, L3 and L4 headers. Currently specs for Ethernet, IPv4, TCP and UDP are defined. Flow specs are made of values and masks. The input to ib_create_flow() is a struct ib_flow_attr, which contains a few mandatory control elements and optional flow specs. struct ib_flow_attr { enum ib_flow_attr_type type; u16 size; u16 priority; u32 flags; u8 num_of_specs; u8 port; /* Following are the optional layers according to user request * struct ib_flow_spec_yyy * struct ib_flow_spec_zzz */ }; As these specs are eventually coming from user space, they are defined and used in a way which allows adding new spec types without kernel/user ABI change, just with a little API enhancement which defines the newly added spec. The flow spec structures are defined with TLV (Type-Length-Value) entries, which allows calling ib_create_flow() with a list of variable length of optional specs. For the actual processing of ib_flow_attr the driver uses the number of specs and the size mandatory fields along with the TLV nature of the specs. Steering rules processing order is according to the domain over which the rule is set and the rule priority. All rules set by user space applicatations fall into the IB_FLOW_DOMAIN_USER domain, other domains could be used by future IPoIB RFS and Ethetool flow-steering interface implementation. Lower numerical value for the priority field means higher priority. The returned value from ib_create_flow() is a struct ib_flow, which contains a database pointer (handle) provided by the HW driver to be used when calling ib_destroy_flow(). Applications that offload TCP/IP traffic can also be written over IB UD QPs. The ib_create_flow() / ib_destroy_flow() API is designed to support UD QPs too. A HW driver can set IB_DEVICE_MANAGED_FLOW_STEERING to denote support for flow steering. The ib_flow_attr enum type supports usage of flow steering for promiscuous and sniffer purposes: IB_FLOW_ATTR_NORMAL - "regular" rule, steering according to rule specification IB_FLOW_ATTR_ALL_DEFAULT - default unicast and multicast rule, receive all Ethernet traffic which isn't steered to any QP IB_FLOW_ATTR_MC_DEFAULT - same as IB_FLOW_ATTR_ALL_DEFAULT but only for multicast IB_FLOW_ATTR_SNIFFER - sniffer rule, receive all port traffic ALL_DEFAULT and MC_DEFAULT rules options are valid only for Ethernet link type. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-28 09:51:52 -07:00
Yishai Hadas	846be90d81	IB/core: Fixes to XRC reference counting in uverbs Added reference counting mechanism for XRC target QPs between ib_uqp_object and its ib_uxrcd_object. This prevents closing an XRC domain that is still attached to a QP. In addition, add missing code in ib_uverbs_destroy_srq() to handle ib_uxrcd_object reference counting correctly when destroying an xsrq. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-13 11:21:32 -07:00
Yishai Hadas	73c40c616a	IB/core: Add locking around event dispatching on XRC target QPs Fix a potential race when event occurrs on a target XRC QP and in the middle of reporting that on its shared qps, one of them is destroyed by user space application. Also add note for kernel consumers in ib_verbs.h that they must not destroy the QP from within the handler. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-13 11:21:32 -07:00
Steve Wise	24d44a391f	RDMA/cma: Add IPv6 support for iWARP Modify the type of local_addr and remote_addr fields in struct iw_cm_id from struct sockaddr_in to struct sockaddr_storage to hold IPv6 and IPv4 addresses uniformly. Change the references of local_addr and remote_addr in cxgb4, cxgb3, nes and amso drivers to match this. However to be able to actully run traffic over IPv6, low-level drivers have to add code to support this. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> [ Fix unused variable warnings when INFINIBAND_NES_DEBUG not set. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-12 12:32:31 -07:00
Roland Dreier	569935db80	Merge branches 'cma', 'cxgb3', 'cxgb4', 'ipoib', 'misc', 'mlx4', 'mlx5', 'nes', 'ocrdma' and 'qib' into for-next	2013-07-31 14:24:06 -07:00
Jack Morgenstein	ef5ed4166f	IB/core: Create QP1 using the pkey index which contains the default pkey Currently, QP1 is created using pkey index 0. This patch simply looks for the index containing the default pkey, rather than hard-coding pkey index 0. This change will have no effect in native mode, since QP0 and QP1 are created before the SM configures the port, so pkey table will still be the default table defined by the IB Spec, in C10-123: "If non-volatile storage is not used to hold P_Key Table contents, then if a PM (Partition Manager) is not present, and prior to PM initialization of the P_Key Table, the P_Key Table must act as if it contains a single valid entry, at P_Key_ix = 0, containing the default partition key. All other entries in the P_Key Table must be invalid." Thus, in the native mode case, the driver will find the default pkey at index 0 (so it will be no different than the hard-coding). However, in SR-IOV mode, for VFs, the pkey table may be paravirtualized, so that the VF's pkey index zero may not necessarily be mapped to the real pkey index 0. For VFs, therefore, it is important to find the virtual index which maps to the real default pkey. This commit does the following for QP1 creation: 1. Find the pkey index containing the default pkey, and use that index if found. ib_find_pkey() returns the index of the limited-membership default pkey (0x7FFF) if the full-member default pkey is not in the table. 2. If neither form of the default pkey is found, use pkey index 0 (previous behavior). Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-07-31 14:15:17 -07:00
Sean Hefty	5eb695c177	RDMA/cma: Only call cma_save_ib_info() for CM REQs Calling cma_save_ib_info() for CM SIDR REQs results in a crash accessing an invalid path record pointer. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-07-31 00:50:44 -07:00
Sean Hefty	e511d1ae16	RDMA/cma: Fix accessing invalid private data for UD If a application is using AF_IB with a UD QP, but does not provide any private data, we will end up accessing invalid memory. Check for this case and handle it appropriately. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-07-31 00:50:40 -07:00
Paul Bolle	8fb488d740	RDMA/cma: Fix gcc warning Building cma.o triggers this gcc warning: drivers/infiniband/core/cma.c: In function ‘rdma_resolve_addr’: drivers/infiniband/core/cma.c:465:23: warning: ‘port’ may be used uninitialized in this function [-Wmaybe-uninitialized] drivers/infiniband/core/cma.c:426:5: note: ‘port’ was declared here This is a false positive, as "port" will always be initialized if we're at "found". But if we assign to "id_priv->id.port_num" directly, we can drop "port". That will, obviously, silence gcc. Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-07-30 16:11:22 -07:00
Rusty Russell	8c6ffba0ed	PTR_RET is now PTR_ERR_OR_ZERO(): Replace most. Sweep of the simple cases. Cc: netdev@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-arm-kernel@lists.infradead.org Cc: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-07-15 11:25:01 +09:30
Linus Torvalds	c552441373	Main batch of InfiniBand/RDMA changes for 3.11 merge window: - AF_IB (native IB addressing) for CMA from Sean Hefty - New mlx5 driver for Mellanox Connect-IB adapters (including post merge request fixes) - SRP fixes from Bart Van Assche (including fix to first merge request) - qib HW driver updates - Resurrection of ocrdma HW driver development - uverbs conversion to create fds with O_CLOEXEC set - Other small changes and fixes -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABCAAGBQJR30TKAAoJEENa44ZhAt0h854P/jvAhK5u+XTM5VyjAi0DKJ7P bWcsu+KxbOIFnjEdsYQl1mGP44gdO8GPZp7+JR5nDHDRpw9K76qy6QQiPbaF6Y8D cZH8Xlq4hzBfElTWBkExEemPrVUUq77j03FE9TBatdLAtEyYkgrNyqr7Ys6zVwVK ugR8nAahvnB7Jh1tsyZBBd9kfbWtXJnaGC8/Zk3Na4n4zXRAbr0DcnRF0sncTL38 VFnWbi33OQAxu5bsb2jGec/SNP3BbNwspFPjSCKqiiItRaCj13JiHhrKKvVk4RZe hIRnPH47kjLRp2/PwBo6o+gTXZuRg48VGBx4CKUTwx1nCzPPN1iz9ZOfqUv9Qwcv LX8mxC7QS/Yvud4KeEBsj6kotb80EkRF2KV5RkIKCxQiwetGD9127bZylC8ttxGw 2f6MzYtAGD4R4C10lO8N+59VugSg1xAvwsqz0a/jy2XyVHbI1ugQedzkB20x5WPY 51S08ABvtU9yIxIYrw2VEaa/5WN+XJ6+LpG9QBAGXdMLiCiiAe7n/YzyXI6AgwaW Jl/uKr6H6/jEHUHKwkyqsmbpVGPhtGWu8deyr1oYvOEP4i48gcDqMQsfMcCISrQV MeQU3hS/obykUlNeqjmMI2CXrecqSsiq0hXd4DLaSoZ2Rb4Drx2Wj6sTQLIAgL2q GBYjHWMUpZXIFHQaH7am =nZh8 -----END PGP SIGNATURE----- Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull InfiniBand/RDMA changes from Roland Dreier: - AF_IB (native IB addressing) for CMA from Sean Hefty - new mlx5 driver for Mellanox Connect-IB adapters (including post merge request fixes) - SRP fixes from Bart Van Assche (including fix to first merge request) - qib HW driver updates - resurrection of ocrdma HW driver development - uverbs conversion to create fds with O_CLOEXEC set - other small changes and fixes * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (66 commits) mlx5: Return -EFAULT instead of -EPERM IB/qib: Log all SDMA errors unconditionally IB/qib: Fix module-level leak mlx5_core: Adjust hca_cap.uar_page_sz to conform to Connect-IB spec IB/srp: Let srp_abort() return FAST_IO_FAIL if TL offline IB/uverbs: Use get_unused_fd_flags(O_CLOEXEC) instead of get_unused_fd() mlx5_core: Fixes for sparse warnings IB/mlx5: Make profile[] static in main.c mlx5: Fix parameter type of health_handler_t mlx5: Add driver for Mellanox Connect-IB adapters IB/core: Add reserved values to enums for low-level driver use IB/srp: Bump driver version and release date IB/srp: Make HCA completion vector configurable IB/srp: Maintain a single connection per I_T nexus IB/srp: Fail I/O fast if target offline IB/srp: Skip host settle delay IB/srp: Avoid skipping srp_reset_host() after a transport error IB/srp: Fix remove_one crash due to resource exhaustion IB/qib: New transmitter tunning settings for Dell 1.1 backplane IB/core: Fix error return code in add_port() ...	2013-07-13 12:57:21 -07:00
Linus Torvalds	496322bc91	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next Pull networking updates from David Miller: "This is a re-do of the net-next pull request for the current merge window. The only difference from the one I made the other day is that this has Eliezer's interface renames and the timeout handling changes made based upon your feedback, as well as a few bug fixes that have trickeled in. Highlights: 1) Low latency device polling, eliminating the cost of interrupt handling and context switches. Allows direct polling of a network device from socket operations, such as recvmsg() and poll(). Currently ixgbe, mlx4, and bnx2x support this feature. Full high level description, performance numbers, and design in commit `0a4db187a9` ("Merge branch 'll_poll'") From Eliezer Tamir. 2) With the routing cache removed, ip_check_mc_rcu() gets exercised more than ever before in the case where we have lots of multicast addresses. Use a hash table instead of a simple linked list, from Eric Dumazet. 3) Add driver for Atheros CQA98xx 802.11ac wireless devices, from Bartosz Markowski, Janusz Dziedzic, Kalle Valo, Marek Kwaczynski, Marek Puzyniak, Michal Kazior, and Sujith Manoharan. 4) Support reporting the TUN device persist flag to userspace, from Pavel Emelyanov. 5) Allow controlling network device VF link state using netlink, from Rony Efraim. 6) Support GRE tunneling in openvswitch, from Pravin B Shelar. 7) Adjust SOCK_MIN_RCVBUF and SOCK_MIN_SNDBUF for modern times, from Daniel Borkmann and Eric Dumazet. 8) Allow controlling of TCP quickack behavior on a per-route basis, from Cong Wang. 9) Several bug fixes and improvements to vxlan from Stephen Hemminger, Pravin B Shelar, and Mike Rapoport. In particular, support receiving on multiple UDP ports. 10) Major cleanups, particular in the area of debugging and cookie lifetime handline, to the SCTP protocol code. From Daniel Borkmann. 11) Allow packets to cross network namespaces when traversing tunnel devices. From Nicolas Dichtel. 12) Allow monitoring netlink traffic via AF_PACKET sockets, in a manner akin to how we monitor real network traffic via ptype_all. From Daniel Borkmann. 13) Several bug fixes and improvements for the new alx device driver, from Johannes Berg. 14) Fix scalability issues in the netem packet scheduler's time queue, by using an rbtree. From Eric Dumazet. 15) Several bug fixes in TCP loss recovery handling, from Yuchung Cheng. 16) Add support for GSO segmentation of MPLS packets, from Simon Horman. 17) Make network notifiers have a real data type for the opaque pointer that's passed into them. Use this to properly handle network device flag changes in arp_netdev_event(). From Jiri Pirko and Timo Teräs. 18) Convert several drivers over to module_pci_driver(), from Peter Huewe. 19) tcp_fixup_rcvbuf() can loop 500 times over loopback, just use a O(1) calculation instead. From Eric Dumazet. 20) Support setting of explicit tunnel peer addresses in ipv6, just like ipv4. From Nicolas Dichtel. 21) Protect x86 BPF JIT against spraying attacks, from Eric Dumazet. 22) Prevent a single high rate flow from overruning an individual cpu during RX packet processing via selective flow shedding. From Willem de Bruijn. 23) Don't use spinlocks in TCP md5 signing fast paths, from Eric Dumazet. 24) Don't just drop GSO packets which are above the TBF scheduler's burst limit, chop them up so they are in-bounds instead. Also from Eric Dumazet. 25) VLAN offloads are missed when configured on top of a bridge, fix from Vlad Yasevich. 26) Support IPV6 in ping sockets. From Lorenzo Colitti. 27) Receive flow steering targets should be updated at poll() time too, from David Majnemer. 28) Fix several corner case regressions in PMTU/redirect handling due to the routing cache removal, from Timo Teräs. 29) We have to be mindful of ipv4 mapped ipv6 sockets in upd_v6_push_pending_frames(). From Hannes Frederic Sowa. 30) Fix L2TP sequence number handling bugs, from James Chapman." * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1214 commits) drivers/net: caif: fix wrong rtnl_is_locked() usage drivers/net: enic: release rtnl_lock on error-path vhost-net: fix use-after-free in vhost_net_flush net: mv643xx_eth: do not use port number as platform device id net: sctp: confirm route during forward progress virtio_net: fix race in RX VQ processing virtio: support unlocked queue poll net/cadence/macb: fix bug/typo in extracting gem_irq_read_clear bit Documentation: Fix references to defunct linux-net@vger.kernel.org net/fs: change busy poll time accounting net: rename low latency sockets functions to busy poll bridge: fix some kernel warning in multicast timer sfc: Fix memory leak when discarding scattered packets sit: fix tunnel update via netlink dt:net:stmmac: Add dt specific phy reset callback support. dt:net:stmmac: Add support to dwmac version 3.610 and 3.710 dt:net:stmmac: Allocate platform data only if its NULL. net:stmmac: fix memleak in the open method ipv6: rt6_check_neigh should successfully verify neigh if no NUD information are available net: ipv6: fix wrong ping_v6_sendmsg return value ...	2013-07-09 18:24:39 -07:00
Roland Dreier	0eba551148	Merge branches 'af_ib', 'cxgb4', 'misc', 'mlx5', 'ocrdma', 'qib' and 'srp' into for-next	2013-07-08 11:22:11 -07:00
Roland Dreier	da183c7af8	IB/uverbs: Use get_unused_fd_flags(O_CLOEXEC) instead of get_unused_fd() The macro get_unused_fd() is used to allocate a file descriptor with default flags. Those default flags (0) can be "unsafe": O_CLOEXEC must be used by default to not leak file descriptor across exec(). Replace calls to get_unused_fd() in uverbs with calls to get_unused_fd_flags(O_CLOEXEC). Inheriting uverbs fds across exec() cannot be used to do anything useful. Based on a patch/suggestion from Yann Droneaud <ydroneaud@opteya.com>. Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-07-08 11:15:45 -07:00
Kees Cook	02aa2a3763	drivers: avoid format string in dev_set_name Calling dev_set_name with a single paramter causes it to be handled as a format string. Many callers are passing potentially dynamic string content, so use "%s" in those cases to avoid any potential accidents, including wrappers like device_create*() and bdi_register(). Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-07-03 16:07:41 -07:00
Wei Yongjun	80b15043e3	IB/core: Fix error return code in add_port() Fix to return -ENOMEM in the add_port() error handling case instead of 0, as done elsewhere in this function. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-24 13:52:22 -07:00
Sean Hefty	ce117ffac2	RDMA/cma: Export AF_IB statistics Report AF_IB source and destination addresses through netlink interface. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-20 23:35:45 -07:00
Sean Hefty	5bc2b7b397	RDMA/ucma: Allow user space to specify AF_IB when joining multicast Allow user space applications to join multicast groups using MGIDs directly. MGIDs may be passed using AF_IB addresses. Since the current multicast join command only supports addresses as large as sockaddr_in6, define a new structure for joining addresses specified using sockaddr_ib. Since AF_IB allows the user to specify the qkey when resolving a remote UD QP address, when joining the multicast group use the qkey value, if one has been assigned. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-20 23:35:45 -07:00
Sean Hefty	209cf2a751	RDMA/ucma: Allow user space to pass AF_IB into resolve Allow user space applications to call resolve_addr using AF_IB. To support sockaddr_ib, we need to define a new structure capable of handling the larger address size. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-20 23:35:44 -07:00
Sean Hefty	eebe4c3a62	RDMA/ucma: Allow user space to bind to AF_IB Support user space binding to addresses using AF_IB. Since sockaddr_ib is larger than sockaddr_in6, we need to define a larger structure when binding using AF_IB. This time we use sockaddr_storage to cover future cases. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-20 23:35:43 -07:00
Sean Hefty	05ad94577e	RDMA/ucma: Name changes to indicate only IP addresses supported Several commands into the RDMA CM from user space are restricted to supporting addresses which fit into a sockaddr_in6 structure: bind address, resolve address, and join multicast. With the addition of AF_IB, we need to support addresses which are larger than sockaddr_in6. This will be done by adding new commands that exchange address information using sockaddr_storage. However, to support existing applications, we maintain the current commands and structures, but rename them to indicate that they only support IPv4 and v6 addresses. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-20 23:35:42 -07:00
Sean Hefty	edaa7a5578	RDMA/ucma: Add ability to query GID addresses Part of address resolution is mapping IP addresses to IB GIDs. With the changes to support querying larger addresses and more path records, also provide a way to query IB GIDs after resolution completes. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-20 23:35:42 -07:00
Sean Hefty	cf53936f22	RDMA/cma: Export cma_get_service_id() Allow the rdma_ucm to query the IB service ID formed or allocated by the rdma_cm by exporting the cma_get_service_id() functionality. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-20 23:35:41 -07:00
Sean Hefty	ac53b264b2	RDMA/ucma: Support querying when IB paths are not reversible The current query_route call can return up to two path records. The assumption being that one is the primary path, with optional support for an alternate path. In both cases, the paths are assumed to be reversible and are used to send CM MADs. With the ability to manually set IB path data, the rdma cm can eventually be capable of using up to 6 paths per connection: forward primary, reverse primary, forward alternate, reverse alternate, reversible primary path for CM MADs reversible alternate path for CM MADs. (It is unclear at this time if IB routing will complicate this) In order to handle more flexible routing topologies, add a new command to report any number of paths. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-20 23:35:40 -07:00
Sean Hefty	2e08b5879e	IB/sa: Export function to pack a path record into wire format Allow converting from struct ib_sa_path_rec to the IB defined SA path record wire format. This will be used to report path data from the rdma cm into user space. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-20 23:35:39 -07:00
Sean Hefty	ee7aed4528	RDMA/ucma: Support querying for AF_IB addresses The sockaddr structure for AF_IB is larger than sockaddr_in6. The rdma cm user space ABI uses the latter to exchange address information between user space and the kernel. To support querying for larger addresses, define a new query command that exchanges data using sockaddr_storage, rather than sockaddr_in6. Unlike the existing query_route command, the new command only returns address information. Route (i.e. path record) data is separated. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-20 23:35:39 -07:00
Sean Hefty	94d0c93941	RDMA/cma: Only listen on IB devices when using AF_IB If an rdma_cm_id is bound to AF_IB, with a wild card address, only listen on IB devices. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-20 23:35:38 -07:00
Sean Hefty	5c438135ad	RDMA/cma: Set qkey for AF_IB Allow the user to specify the qkey when using AF_IB. The qkey is added to struct rdma_ucm_conn_param in place of a reserved field, but for backwards compatability, is only accessed if the associated rdma_cm_id is using AF_IB. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-20 23:35:37 -07:00
Sean Hefty	e8160e1593	RDMA/cma: Expose private data when using AF_IB If the source or destination address is AF_IB, then do not reserve a portion of the private data in the IB CM REQ or SIDR REQ messages for the cma header. Instead, all private data should be exported to the user. When AF_IB is used, the rdma cm does not have sufficient information to fill in the cma header. Additionally, this will be necessary to support any IB connection through the rdma cm interface, Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-20 23:35:36 -07:00
Sean Hefty	fbaa1a6d85	RDMA/cma: Merge cma_get/save_net_info With the removal of SDP related code, we can merge cma_get_net_info() with cma_save_net_info(), since we're only ever dealing with a single header format. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-20 23:35:29 -07:00
Sean Hefty	01602f113f	RDMA/cma: Remove unused SDP related code The SDP protocol was never merged upstream. Remove unused SDP related code from the RDMA CM. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-20 13:08:05 -07:00
Sean Hefty	496ce3ce17	RDMA/cma: Add support for AF_IB to cma_get_service_id() cma_get_service_id() forms the service ID based on the port space and port number of the rdma_cm_id. Extend the call to support AF_IB, which contains the service ID directly. This will be needed to support any arbitrary SID. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-20 13:08:05 -07:00
Sean Hefty	f68194ca88	RDMA/cma: Add support for AF_IB to rdma_resolve_route() Allow rdma_resolve_route() to handle the case where the user specified the source and destination addresses using AF_IB. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-20 13:08:04 -07:00
Sean Hefty	f17df3b0de	RDMA/cma: Add support for AF_IB to rdma_resolve_addr() Allow the user to specify the remote address using AF_IB format. When AF_IB is used, the remote address simply needs to be recorded, and no resolution using ARP is done. The local address may still need to be matched with a local IB device. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-20 13:08:04 -07:00
Sean Hefty	4ae7152e0b	RDMA/cma: Verify that source and dest sa_family are the same Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-20 13:08:04 -07:00
Sean Hefty	b0569e4075	RDMA/cma: Restrict AF_IB loopback to binding to IB devices only If a user specifies AF_IB as the source address for a loopback connection, limit the resolution to IB devices only. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-20 13:08:04 -07:00
Sean Hefty	f4753834b5	RDMA/cma: Add helper functions to return id address information Provide inline helpers to extract source and destination address data from the rdma_cm_id. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-20 13:08:04 -07:00
Sean Hefty	6a3e362d3c	RDMA/cma: Do not modify sa_family when setting loopback address cma_resolve_loopback is called after an rdma_cm_id has been bound to a specific sa_family and port. Once the source sa_family for the id has been set, do not modify it. Only the actual IP address portion of the source address needs to be set. As part of this fix, we can simplify setting the source address by moving the loopback address assignment from cma_resolve_loopback to cma_bind_loopback. cma_bind_loopback is only invoked when the source address is the loopback address. Finally, add loopback support for AF_IB as part of the change. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-20 13:08:03 -07:00
Sean Hefty	680f920a2e	RDMA/cma: Allow user to specify AF_IB when binding Modify rdma_bind_addr to allow the user to specify AF_IB when binding to a device. AF_IB indicates that the user is not mapping an IP address to the native IB addressing. (The mapping may have already been done, or is not needed) Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-20 13:08:03 -07:00
Sean Hefty	58afdcb738	RDMA/cma: Update port reservation to support AF_IB The AF_IB uses a 64-bit service id (SID), which the user can control through the use of a mask. The rdma_cm will assign values to the unmasked portions of the SID based on the selected port space and port number. Because the IB spec divides the SID range into several regions, a SID/mask combination may fall into one of the existing port space ranges as defined by the RDMA CM IP Annex. Map the AF_IB SID to the correct RDMA port space. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-20 13:08:03 -07:00
Sean Hefty	ef560861c0	IB/addr: Add AF_IB support to ip_addr_size Add support for AF_IB to ip_addr_size, and rename the function to account for the change. Give the compiler more control over whether the call should be inline or not by moving the definition into the .c file, removing the static inline, and exporting it. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-20 13:08:02 -07:00
Sean Hefty	2e2d190c5e	RDMA/cma: Include AF_IB in loopback and any address checks Enhance checks for loopback and any address to support AF_IB in addition to AF_INET and AF_INT6. This will allow future patches to use AF_IB when binding and resolving addresses. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-20 13:08:02 -07:00
Sean Hefty	c8dea2f9f0	RDMA/cma: Allow enabling reuseaddr in any state The rdma_cm only allows setting reuseaddr if the corresponding rdma_cm_id is in the idle state. Allow setting this value in other states. This brings the behavior more inline with sockets. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-20 13:08:01 -07:00
Jiri Pirko	351638e7de	net: pass info struct via netdevice notifier So far, only net_device * could be passed along with netdevice notifier event. This patch provides a possibility to pass custom structure able to provide info that event listener needs to know. Signed-off-by: Jiri Pirko <jiri@resnulli.us> v2->v3: fix typo on simeth shortened dev_getter shortened notifier_info struct name v1->v2: fix notifier_call parameter in call_netdevice_notifier() Signed-off-by: David S. Miller <davem@davemloft.net>	2013-05-28 13:11:01 -07:00
Roland Dreier	ea9627c800	Merge branches 'cxgb4', 'ipoib', 'iser', 'misc', 'mlx4', 'qib' and 'srp' into for-next	2013-05-08 14:12:37 -07:00
Steve Wise	e413a823f6	RDMA/iwcm: Don't touch cmid after dropping reference The function cm_work_handler() cannot touch the cm_id after it derefs it, because it might be freed on another concurrent thread. If there are more work items queued for this cm_id, then we know there must be more references because they are added when the work items are queued. So in the while loop inside cm_work_handler(), after derefing, if the queue is empty, then exit the function. Otherwise we know it's safe to re-acquire the lock. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-04-24 17:47:33 -07:00
Shlomo Pongratz	eec9e29fc5	IB/core: Verify that QP handler is valid before dispatching events For QPs of type IB_QPT_XRC_TGT the IB core assigns a common event handler __ib_shared_qp_event_handler(), and the optionally supplied event handler is stored. When the common handler is called it iterates on all opened QPs and calles the original handlers without checking if they are NULL. Fix that. Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-04-16 22:42:54 -07:00
Sasha Levin	b67bfe0d42	hlist: drop the node parameter from iterators I'm not sure why, but the hlist for each entry iterators were conceived list_for_each_entry(pos, head, member) The hlist ones were greedy and wanted an extra parameter: hlist_for_each_entry(tpos, pos, head, member) Why did they need an extra pos parameter? I'm not quite sure. Not only they don't really need it, it also prevents the iterator from looking exactly like the list iterator, which is unfortunate. Besides the semantic patch, there was some manual work required: - Fix up the actual hlist iterators in linux/list.h - Fix up the declaration of other iterators based on the hlist ones. - A very small amount of places were using the 'node' parameter, this was modified to use 'obj->member' instead. - Coccinelle didn't handle the hlist_for_each_entry_safe iterator properly, so those had to be fixed up manually. The semantic patch which is mostly the work of Peter Senna Tschudin is here: @@ iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host; type T; expression a,c,d,e; identifier b; statement S; @@ -T b; <+... when != b ( hlist_for_each_entry(a, - b, c, d) S \| hlist_for_each_entry_continue(a, - b, c) S \| hlist_for_each_entry_from(a, - b, c) S \| hlist_for_each_entry_rcu(a, - b, c, d) S \| hlist_for_each_entry_rcu_bh(a, - b, c, d) S \| hlist_for_each_entry_continue_rcu_bh(a, - b, c) S \| for_each_busy_worker(a, c, - b, d) S \| ax25_uid_for_each(a, - b, c) S \| ax25_for_each(a, - b, c) S \| inet_bind_bucket_for_each(a, - b, c) S \| sctp_for_each_hentry(a, - b, c) S \| sk_for_each(a, - b, c) S \| sk_for_each_rcu(a, - b, c) S \| sk_for_each_from -(a, b) +(a) S + sk_for_each_from(a) S \| sk_for_each_safe(a, - b, c, d) S \| sk_for_each_bound(a, - b, c) S \| hlist_for_each_entry_safe(a, - b, c, d, e) S \| hlist_for_each_entry_continue_rcu(a, - b, c) S \| nr_neigh_for_each(a, - b, c) S \| nr_neigh_for_each_safe(a, - b, c, d) S \| nr_node_for_each(a, - b, c) S \| nr_node_for_each_safe(a, - b, c, d) S \| - for_each_gfn_sp(a, c, d, b) S + for_each_gfn_sp(a, c, d) S \| - for_each_gfn_indirect_valid_sp(a, c, d, b) S + for_each_gfn_indirect_valid_sp(a, c, d) S \| for_each_host(a, - b, c) S \| for_each_host_safe(a, - b, c, d) S \| for_each_mesh_entry(a, - b, c, d) S ) ...+> [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c] [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c] [akpm@linux-foundation.org: checkpatch fixes] [akpm@linux-foundation.org: fix warnings] [akpm@linux-foudnation.org: redo intrusive kvm changes] Tested-by: Peter Senna Tschudin <peter.senna@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-27 19:10:24 -08:00
Tejun Heo	e8c8d1bc06	idr: remove MAX_IDR_MASK and move left MAX_IDR_* into idr.c MAX_IDR_MASK is another weirdness in the idr interface. As idr covers whole positive integer range, it's defined as 0x7fffffff or INT_MAX. Its usage in idr_find(), idr_replace() and idr_remove() is bizarre. They basically mask off the sign bit and operate on the rest, so if the caller, by accident, passes in a negative number, the sign bit will be masked off and the remaining part will be used as if that was the input, which is worse than crashing. The constant is visible in idr.h and there are several users in the kernel. * drivers/i2c/i2c-core.c:i2c_add_numbered_adapter() Basically used to test if adap->nr is a negative number which isn't -1 and returns -EINVAL if so. idr_alloc() already has negative @start checking (w/ WARN_ON_ONCE), so this can go away. * drivers/infiniband/core/cm.c:cm_alloc_id() drivers/infiniband/hw/mlx4/cm.c:id_map_alloc() Used to wrap cyclic @start. Can be replaced with max(next, 0). Note that this type of cyclic allocation using idr is buggy. These are prone to spurious -ENOSPC failure after the first wraparound. * fs/super.c:get_anon_bdev() The ID allocated from ida is masked off before being tested whether it's inside valid range. ida allocated ID can never be a negative number and the masking is unnecessary. Update idr_() functions to fail with -EINVAL when negative @id is specified and update other MAX_IDR_MASK users as described above. This leaves MAX_IDR_MASK without any user, remove it and relocate other MAX_IDR_ constants to lib/idr.c. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jean Delvare <khali@linux-fr.org> Cc: Roland Dreier <roland@kernel.org> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Cc: "Marciniszyn, Mike" <mike.marciniszyn@intel.com> Cc: Jack Morgenstein <jackm@dev.mellanox.co.il> Cc: Or Gerlitz <ogerlitz@mellanox.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Wolfram Sang <wolfram@the-dreams.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-27 19:10:20 -08:00
Tejun Heo	3b069c5d85	IB/core: convert to idr_alloc() Convert to the much saner new idr interface. v2: Mike triggered WARN_ON() in idr_preload() because send_mad(), which may be used from non-process context, was calling idr_preload() unconditionally. Preload iff @gfp_mask has __GFP_WAIT. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reported-by: "Marciniszyn, Mike" <mike.marciniszyn@intel.com> Cc: Roland Dreier <roland@kernel.org> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-27 19:10:16 -08:00
Linus Torvalds	d895cb1af1	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs pile (part one) from Al Viro: "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent locking violations, etc. The most visible changes here are death of FS_REVAL_DOT (replaced with "has ->d_weak_revalidate()") and a new helper getting from struct file to inode. Some bits of preparation to xattr method interface changes. Misc patches by various people sent this cycle and ocfs2 fixes from several cycles ago that should've been upstream right then. PS: the next vfs pile will be xattr stuff." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits) saner proc_get_inode() calling conventions proc: avoid extra pde_put() in proc_fill_super() fs: change return values from -EACCES to -EPERM fs/exec.c: make bprm_mm_init() static ocfs2/dlm: use GFP_ATOMIC inside a spin_lock ocfs2: fix possible use-after-free with AIO ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero target: writev() on single-element vector is pointless export kernel_write(), convert open-coded instances fs: encode_fh: return FILEID_INVALID if invalid fid_type kill f_vfsmnt vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op nfsd: handle vfs_getattr errors in acl protocol switch vfs_getattr() to struct path default SET_PERSONALITY() in linux/elf.h ceph: prepopulate inodes only when request is aborted d_hash_and_lookup(): export, switch open-coded instances 9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate() 9p: split dropping the acls from v9fs_set_create_acl() ...	2013-02-26 20:16:07 -08:00
Al Viro	496ad9aa8e	new helper: file_inode(file) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-22 23:31:31 -05:00
Shani Michaeli	6b52a12bc3	IB/uverbs: Implement memory windows support in uverbs The existing user/kernel uverbs API has IB_USER_VERBS_CMD_ALLOC/DEALLOC_MW. Implement these calls, along with destroying user memory windows during process cleanup. Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Shani Michaeli <shanim@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-02-21 11:59:09 -08:00
Shani Michaeli	7083e42ee2	IB/core: Add "type 2" memory windows support This patch enhances the IB core support for Memory Windows (MWs). MWs allow an application to have better/flexible control over remote access to memory. Two types of MWs are supported, with the second type having two flavors: Type 1 - associated with PD only Type 2A - associated with QPN only Type 2B - associated with PD and QPN Applications can allocate a MW once, and then repeatedly bind the MW to different ranges in MRs that are associated to the same PD. Type 1 windows are bound through a verb, while type 2 windows are bound by posting a work request. The 32-bit memory key is composed of a 24-bit index and an 8-bit key. The key is changed with each bind, thus allowing more control over the peer's use of the memory key. The changes introduced are the following: * add memory window type enum and a corresponding parameter to ib_alloc_mw. * type 2 memory window bind work request support. * create a struct that contains the common part of the bind verb struct ibv_mw_bind and the bind work request into a single struct. * add the ib_inc_rkey helper function to advance the tag part of an rkey. Consumer interface details: * new device capability flags IB_DEVICE_MEM_WINDOW_TYPE_2A and IB_DEVICE_MEM_WINDOW_TYPE_2B are added to indicate device support for these features. Devices can set either IB_DEVICE_MEM_WINDOW_TYPE_2A or IB_DEVICE_MEM_WINDOW_TYPE_2B if it supports type 2A or type 2B memory windows. It can set neither to indicate it doesn't support type 2 windows at all. * modify existing provides and consumers code to the new param of ib_alloc_mw and the ib_mw_bind_info structure Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Shani Michaeli <shanim@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-02-21 11:51:45 -08:00
shefty	63f05be2c0	RDMA/cm: Change return value from find_gid_port() Problem reported by Dan Carpenter <dan.carpenter@oracle.com>: The patch 3c86aa70bf67: "RDMA/cm: Add RDMA CM support for IBoE devices" from Oct 13, 2010, leads to the following warning: net/sunrpc/xprtrdma/svc_rdma_transport.c:722 svc_rdma_create() error: passing non neg 1 to ERR_PTR This bug would result in a NULL dereference. svc_rdma_create() is supposed to return ERR_PTRs or valid pointers, but instead it returns ERR_PTRs, valid pointers and 1. The call tree is: svc_rdma_create() => rdma_bind_addr() => cma_acquire_dev() => find_gid_port() rdma_bind_addr() should return a valid errno. Fix this by having find_gid_port() also return a valid errno. If we can't find the specified GID on a given port, return -EADDRNOTAVAIL, rather than -EAGAIN, to better indicate the error. We also drop using the special return value of '1' and instead pass through the error returned by the underlying verbs call. On such errors, rather than aborting the search, we simply continue to check the next device/port. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-11-29 12:16:29 -08:00
David S. Miller	8dd9117cc7	Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux Pulled mainline in order to get the UAPI infrastructure already merged before I pull in David Howells's UAPI trees for networking. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-10-09 13:14:32 -04:00
Linus Torvalds	ca4da6948b	Second batch of changes for the 3.7 merge window: - Late-breaking fix for IPoIB on mlx4 SR-IOV VFs. - Fix for IPoIB build breakage with CONFIG_INFINIBAND_IPOIB_CM=n (new netlink config changes are to blame). - Make sure retry count values are in range in RDMA CM. - A few nes hardware driver fixes and cleanups. - Have iSER initiator use >1 interrupt vectors if available. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABCAAGBQJQbkIkAAoJEENa44ZhAt0hrLEP/3jNpwR+6/rbVxcPkmVVITsn yUz9S3V7loJqQqlHt7ktphgcoUpar+xs2C8DbCyVupyee4K3zd7Lw+wOVO2OJ16Y +gE8PZB6mSYOVaI0JW26Qqe2gcQ5DLZ4ic6E8gEiuyjNo3ig3MZl17ZffTI+5lXi igR+Qnsy2bm18JI6aumSeKYa17f7EmOK63jaIp0jc95vrtWke3jRRs6+bmmDLV7q 4ZxCP7ckBTRFVkisXlWZ95MmUbg/lhWosZGdJwjRYs+LW3tzgY18OQZgbw6FJKpJ 0OouAO51UMKcizfg7ikwUIw6/apaCFXXv/pH3rIeq18qiJeKcDsIXseY7sFRboHR fIPpfhHENokcxtlOReaw7hnlyMuwrLiYbF6TjPN9t8dWTNbwe+4W0K9G60fu1wmk qjaUTBCCuDF4KtKqCxP01cZzXyjJrworw5p+Nc1wxds+JeyKRCpVRrfYep8vOgft z6KJjQGDF16MvXmhF1dD9VBpmaMCOA4NDhfa0IcgHPTOLc/7Nbe5NTD95InlzM3Q /C0CPj+4J4kzEMnTBHVJvv1NeD3G1RQD80bce6ViTbgOUg9pmeGHkLWe420h7xT4 u/LBnwM1dcqY7BDOnn3ycYoJgr5OCQZOpLcNWjTXv4t51mBOYtM6AJ6IBIvkOmSz QjEyiDEEvT2K7c/rhYOz =9Jv4 -----END PGP SIGNATURE----- Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull infiniband changes from Roland Dreier: "Second batch of changes for the 3.7 merge window: - Late-breaking fix for IPoIB on mlx4 SR-IOV VFs. - Fix for IPoIB build breakage with CONFIG_INFINIBAND_IPOIB_CM=n (new netlink config changes are to blame). - Make sure retry count values are in range in RDMA CM. - A few nes hardware driver fixes and cleanups. - Have iSER initiator use >1 interrupt vectors if available." * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: RDMA/cma: Check that retry count values are in range IB/iser: Add more RX CQs to scale out processing of SCSI responses RDMA/nes: Bump the version number of nes driver RDMA/nes: Remove unused module parameter "send_first" RDMA/nes: Remove unnecessary if-else statement RDMA/nes: Add missing break to switch. mlx4_core: Adjust flow steering attach wrapper so that IB works on SR-IOV VFs IPoIB: Fix build with CONFIG_INFINIBAND_IPOIB_CM=n	2012-10-07 17:19:49 +09:00
Gao feng	809d5fc9bf	infiniband: pass rdma_cm module to netlink_dump_start set netlink_dump_control.module to avoid panic. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Cc: Roland Dreier <roland@kernel.org> Cc: Sean Hefty <sean.hefty@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-10-07 00:30:56 -04:00
Fengguang Wu	125c4c706b	idr: rename MAX_LEVEL to MAX_IDR_LEVEL To avoid name conflicts: drivers/video/riva/fbdev.c:281:9: sparse: preprocessor token MAX_LEVEL redefined While at it, also make the other names more consistent and add parentheses. [akpm@linux-foundation.org: repair fallout] [sfr@canb.auug.org.au: IB/mlx4: fix for MAX_ID_MASK to MAX_IDR_MASK name change] Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at> Cc: walter harms <wharms@bfs.de> Cc: Glauber Costa <glommer@parallels.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Roland Dreier <roland@purestorage.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:04:56 +09:00
Sean Hefty	4ede178a5e	RDMA/cma: Check that retry count values are in range The retry_count and rnr_retry_count connection parameters are both 3-bit values. Check that the values are in range and reduce if they're not. This fixes a problem reported by Doug Ledford <dledford@redhat.com> that resulted in the userspace rping test (part of the librdmacm samples) failing to run over Intel IB HCAs. Signed-off-by: Sean Hefty <sean.hefty@intel.com> [ Use min_t() to avoid warnings about type mismatch. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-10-04 19:11:54 -07:00
Linus Torvalds	aab174f0df	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs update from Al Viro: - big one - consolidation of descriptor-related logics; almost all of that is moved to fs/file.c (BTW, I'm seriously tempted to rename the result to fd.c. As it is, we have a situation when file_table.c is about handling of struct file and file.c is about handling of descriptor tables; the reasons are historical - file_table.c used to be about a static array of struct file we used to have way back). A lot of stray ends got cleaned up and converted to saner primitives, disgusting mess in android/binder.c is still disgusting, but at least doesn't poke so much in descriptor table guts anymore. A bunch of relatively minor races got fixed in process, plus an ext4 struct file leak. - related thing - fget_light() partially unuglified; see fdget() in there (and yes, it generates the code as good as we used to have). - also related - bits of Cyrill's procfs stuff that got entangled into that work; _not_ all of it, just the initial move to fs/proc/fd.c and switch of fdinfo to seq_file. - Alex's fs/coredump.c spiltoff - the same story, had been easier to take that commit than mess with conflicts. The rest is a separate pile, this was just a mechanical code movement. - a few misc patches all over the place. Not all for this cycle, there'll be more (and quite a few currently sit in akpm's tree)." Fix up trivial conflicts in the android binder driver, and some fairly simple conflicts due to two different changes to the sock_alloc_file() interface ("take descriptor handling from sock_alloc_file() to callers" vs "net: Providing protocol type via system.sockprotoname xattr of /proc/PID/fd entries" adding a dentry name to the socket) * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (72 commits) MAX_LFS_FILESIZE should be a loff_t compat: fs: Generic compat_sys_sendfile implementation fs: push rcu_barrier() from deactivate_locked_super() to filesystems btrfs: reada_extent doesn't need kref for refcount coredump: move core dump functionality into its own file coredump: prevent double-free on an error path in core dumper usb/gadget: fix misannotations fcntl: fix misannotations ceph: don't abuse d_delete() on failure exits hypfs: ->d_parent is never NULL or negative vfs: delete surplus inode NULL check switch simple cases of fget_light to fdget new helpers: fdget()/fdput() switch o2hb_region_dev_write() to fget_light() proc_map_files_readdir(): don't bother with grabbing files make get_file() return its argument vhost_set_vring(): turn pollstart/pollstop into bool switch prctl_set_mm_exe_file() to fget_light() switch xfs_find_handle() to fget_light() switch xfs_swapext() to fget_light() ...	2012-10-02 20:25:04 -07:00
Linus Torvalds	7a9a2970b5	First batch of InfiniBand/RDMA changes for the 3.7 merge window: - mlx4 IB support for SR-IOV - A couple of SRP initiator fixes - Batch of nes hardware driver fixes - Fix for long-standing use-after-free crash in IPoIB - Other miscellaneous fixes -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABCAAGBQJQav4jAAoJEENa44ZhAt0hmL0QAJTuMdSOzYFd/NB38owJCNM2 kz/N1GlBm3z98fIlGo8u+lzgV2qxqZSAzsJsouMeK38KiAX3CL8HKe44A1QvTM6v dXTNL4JFX24/YF+nlmMY8Av518I9Mkte3BZCnpYkBjVFBWe0ePwoRC/btfBXPDIV 0snq4OtjoBAn00dOOyuZ5PoyY9xf0z4UB0Gple9sM4mzEb8wVWdNDDPOiuPJc6fA L+gk6HLkZDg54+QswafdKYwpeTq45wIKLmCdS3oUNmppMLVhZY8rECOwzSa+KiTr /Yo19n+zl+IBlvjQHhmUqGHvdD17PaGlr+TckAsQqmVfXUH5qqpEnkF8FoEK59c5 YA3lVU8Sj4BPhJ7qX54CuN3767mZizakkxCr9iPRzABFTgzWVgcSgCrE8jjx4i0h Pam+L5bmANFStgmGR8PmXiNgcrCUcEqYHsOWDDAnHa5ekb2nyv1JL1c18hlY9hC3 Xb1YTMZFwvofGza89hBu7oHrMbLOUc5kW2lBpvUn2nlyf3i0F8ISlVbVbNjFA54p 60/jHa2VOQ2CcJUJKnJOk4ajOOEfHnPtMn2q96XJ69Dp8+eSYEO/G+0i1OlChq4h ClnG0Yp+NkT1o8WXMd7guDR+RsXt+DXIij5TiUWRIqnIlopIsMTRhNH28tMu4jQL fgN5n987wru91ewdX4gW =PAcy -----END PGP SIGNATURE----- Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull infiniband updates from Roland Dreier: "First batch of InfiniBand/RDMA changes for the 3.7 merge window: - mlx4 IB support for SR-IOV - A couple of SRP initiator fixes - Batch of nes hardware driver fixes - Fix for long-standing use-after-free crash in IPoIB - Other miscellaneous fixes" This merge also removes a new use of __cancel_delayed_work(), and replaces it with the regular cancel_delayed_work() that is now irq-safe thanks to the workqueue updates. That said, I suspect the sequence in question should probably use "mod_delayed_work()". I just did the minimal "don't use deprecated functions" fixup, though. * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (45 commits) IB/qib: Fix local access validation for user MRs mlx4_core: Disable SENSE_PORT for multifunction devices mlx4_core: Clean up enabling of SENSE_PORT for older (ConnectX-1/-2) HCAs mlx4_core: Stash PCI ID driver_data in mlx4_priv structure IB/srp: Avoid having aborted requests hang IB/srp: Fix use-after-free in srp_reset_req() IB/qib: Add a qib driver version RDMA/nes: Fix compilation error when nes_debug is enabled RDMA/nes: Print hardware resource type RDMA/nes: Fix for crash when TX checksum offload is off RDMA/nes: Cosmetic changes RDMA/nes: Fix for incorrect MSS when TSO is on RDMA/nes: Fix incorrect resolving of the loopback MAC address mlx4_core: Fix crash on uninitialized priv->cmd.slave_sem mlx4_core: Trivial cleanups to driver log messages mlx4_core: Trivial readability fix: "0X30" -> "0x30" IB/mlx4: Create paravirt contexts for VFs when master IB driver initializes mlx4: Modify proxy/tunnel QP mechanism so that guests do no calculations mlx4: Paravirtualize Node Guids for slaves mlx4: Activate SR-IOV mode for IB ...	2012-10-02 17:20:40 -07:00
Linus Torvalds	aecdc33e11	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next Pull networking changes from David Miller: 1) GRE now works over ipv6, from Dmitry Kozlov. 2) Make SCTP more network namespace aware, from Eric Biederman. 3) TEAM driver now works with non-ethernet devices, from Jiri Pirko. 4) Make openvswitch network namespace aware, from Pravin B Shelar. 5) IPV6 NAT implementation, from Patrick McHardy. 6) Server side support for TCP Fast Open, from Jerry Chu and others. 7) Packet BPF filter supports MOD and XOR, from Eric Dumazet and Daniel Borkmann. 8) Increate the loopback default MTU to 64K, from Eric Dumazet. 9) Use a per-task rather than per-socket page fragment allocator for outgoing networking traffic. This benefits processes that have very many mostly idle sockets, which is quite common. From Eric Dumazet. 10) Use up to 32K for page fragment allocations, with fallbacks to smaller sizes when higher order page allocations fail. Benefits are a) less segments for driver to process b) less calls to page allocator c) less waste of space. From Eric Dumazet. 11) Allow GRO to be used on GRE tunnels, from Eric Dumazet. 12) VXLAN device driver, one way to handle VLAN issues such as the limitation of 4096 VLAN IDs yet still have some level of isolation. From Stephen Hemminger. 13) As usual there is a large boatload of driver changes, with the scale perhaps tilted towards the wireless side this time around. Fix up various fairly trivial conflicts, mostly caused by the user namespace changes. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1012 commits) hyperv: Add buffer for extended info after the RNDIS response message. hyperv: Report actual status in receive completion packet hyperv: Remove extra allocated space for recv_pkt_list elements hyperv: Fix page buffer handling in rndis_filter_send_request() hyperv: Fix the missing return value in rndis_filter_set_packet_filter() hyperv: Fix the max_xfer_size in RNDIS initialization vxlan: put UDP socket in correct namespace vxlan: Depend on CONFIG_INET sfc: Fix the reported priorities of different filter types sfc: Remove EFX_FILTER_FLAG_RX_OVERRIDE_IP sfc: Fix loopback self-test with separate_tx_channels=1 sfc: Fix MCDI structure field lookup sfc: Add parentheses around use of bitfield macro arguments sfc: Fix null function pointer in efx_sriov_channel_type vxlan: virtual extensible lan igmp: export symbol ip_mc_leave_group netlink: add attributes to fdb interface tg3: unconditionally select HWMON support when tg3 is enabled. Revert "net: ti cpsw ethernet: allow reading phy interface mode from DT" gre: fix sparse warning ...	2012-10-02 13:38:27 -07:00
Linus Torvalds	033d9959ed	Merge branch 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue changes from Tejun Heo: "This is workqueue updates for v3.7-rc1. A lot of activities this round including considerable API and behavior cleanups. * delayed_work combines a timer and a work item. The handling of the timer part has always been a bit clunky leading to confusing cancelation API with weird corner-case behaviors. delayed_work is updated to use new IRQ safe timer and cancelation now works as expected. * Another deficiency of delayed_work was lack of the counterpart of mod_timer() which led to cancel+queue combinations or open-coded timer+work usages. mod_delayed_work[_on]() are added. These two delayed_work changes make delayed_work provide interface and behave like timer which is executed with process context. * A work item could be executed concurrently on multiple CPUs, which is rather unintuitive and made flush_work() behavior confusing and half-broken under certain circumstances. This problem doesn't exist for non-reentrant workqueues. While non-reentrancy check isn't free, the overhead is incurred only when a work item bounces across different CPUs and even in simulated pathological scenario the overhead isn't too high. All workqueues are made non-reentrant. This removes the distinction between flush_[delayed_]work() and flush_[delayed_]_work_sync(). The former is now as strong as the latter and the specified work item is guaranteed to have finished execution of any previous queueing on return. * In addition to the various bug fixes, Lai redid and simplified CPU hotplug handling significantly. * Joonsoo introduced system_highpri_wq and used it during CPU hotplug. There are two merge commits - one to pull in IRQ safe timer from tip/timers/core and the other to pull in CPU hotplug fixes from wq/for-3.6-fixes as Lai's hotplug restructuring depended on them." Fixed a number of trivial conflicts, but the more interesting conflicts were silent ones where the deprecated interfaces had been used by new code in the merge window, and thus didn't cause any real data conflicts. Tejun pointed out a few of them, I fixed a couple more. * 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (46 commits) workqueue: remove spurious WARN_ON_ONCE(in_irq()) from try_to_grab_pending() workqueue: use cwq_set_max_active() helper for workqueue_set_max_active() workqueue: introduce cwq_set_max_active() helper for thaw_workqueues() workqueue: remove @delayed from cwq_dec_nr_in_flight() workqueue: fix possible stall on try_to_grab_pending() of a delayed work item workqueue: use hotcpu_notifier() for workqueue_cpu_down_callback() workqueue: use __cpuinit instead of __devinit for cpu callbacks workqueue: rename manager_mutex to assoc_mutex workqueue: WORKER_REBIND is no longer necessary for idle rebinding workqueue: WORKER_REBIND is no longer necessary for busy rebinding workqueue: reimplement idle worker rebinding workqueue: deprecate __cancel_delayed_work() workqueue: reimplement cancel_delayed_work() using try_to_grab_pending() workqueue: use mod_delayed_work() instead of __cancel + queue workqueue: use irqsafe timer for delayed_work workqueue: clean up delayed_work initializers and add missing one workqueue: make deferrable delayed_work initializer names consistent workqueue: cosmetic whitespace updates for macro definitions workqueue: deprecate system_nrt[_freezable]_wq workqueue: deprecate flush[_delayed]_work_sync() ...	2012-10-02 09:54:49 -07:00
Roland Dreier	d172f5a4ab	Merge branches 'cma', 'cxgb4', 'ipoib', 'mlx4', 'mlx4-sriov', 'nes', 'qib' and 'srp' into for-linus	2012-10-02 07:43:59 -07:00
Jack Morgenstein	73aaa7418f	IB/core: Add ib_find_exact_cached_pkey() When P_Key tables potentially contain both full and partial membership copies for the same P_Key, we need a function to find the index for an exact (16-bit) P_Key. This is necessary when the master forwards QP1 MADs sent by guests. If the guest has sent the MAD with a limited membership P_Key, we need to to forward the MAD using the same limited membership P_Key. Since the master may have both the limited and the full member P_Keys in its table, we must make sure to retrieve the limited membership P_Key in this case. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-09-30 20:33:30 -07:00
Jack Morgenstein	ff7166c447	IB/core: Handle table with full and partial membership for the same P_Key Extend the cached and non-cached P_Key table lookups to handle limited and full membership of the same P_Key to co-exist in the P_Key table. This is necessary for SR-IOV, to allow for some guests would to have the full membership P_Key in their virtual P_Key table, while other guests on the same physical HCA would have the limited one. To support this, we need both the limited and full membership P_Keys to be present in the master's (hypervisor physical port) P_Key table. The algorithm for handling P_Key tables which contain both the limited and the full membership versions of the same P_Key works as follows: When scanning the P_Key table for a 15-bit P_Key: A. If there is a full member version of that P_Key anywhere in the table, return its index (even if a limited-member version of the P_Key exists earlier in the table). B. If the full member version is not in the table, but the limited-member version is in the table, return the index of the limited P_Key. Signed-off-by: Liran Liss <liranl@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-09-30 20:33:29 -07:00
Dotan Barak	2a22fb8c69	RDMA/cma: Use consistent component mask for IPoIB port space multicast joins CMA multicast joins for the IPoIB port space need to use the same component mask used by the ipoib driver. Otherwise, it's possible for the CMA to create a group to which a join made by ipoib will fail, or vise-versa. Some of the component mask fields set by ipoib weren't set by the CMA, fix that. Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il> Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Acked-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-09-30 20:31:47 -07:00
Dotan Barak	d8c9166961	IB/core: Remove unused variables in ucm/ucma Remove unused wait objects from ucm/ucma events flow. Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Acked-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-09-30 20:31:46 -07:00

1 2 3 4 5 ...

846 Commits