Commit Graph

846 Commits

Author SHA1 Message Date
Roland Dreier 7b909bb49a Merge branches 'core', 'cxgb4', 'iser', 'mlx5' and 'ocrdma' into for-next 2014-10-14 14:09:12 -07:00
Jack Morgenstein a040f95dc8 IB/core: Fix XRC race condition in ib_uverbs_open_qp
In ib_uverbs_open_qp, the sharable xrc target qp is created as a
"pseudo" qp and added to a list of qp's sharing the same physical
QP.  This is done before the "pseudo" qp is assigned a uobject.

There is a race condition here if an async event arrives at the
physical qp.  If the event is handled after the pseudo qp is added to
the list, but before it is assigned a uobject, the kernel crashes in
ib_uverbs_qp_event_handler, due to trying to dereference a NULL
uobject pointer.

Note that simply checking for non-NULL is not enough, due to error
flows in ib_uverbs_open_qp.  If the failure is after assigning the
uobject, but before the qp has fully been created, we still have a
problem.

Thus, in ib_uverbs_qp_event_handler, we test that the uobject is
present, and also that it is live.

Reported-by: Matthew Finlay <matt@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-14 00:30:56 -07:00
Devesh Sharma 8b0f93d949 IB/core: Clear AH attr variable to prevent garbage data
During create-ah from userspace, uverbs is sending garbage data in
attr.dmac and attr.vlan_id.  This patch sets attr.dmac and
attr.vlan_id to zero.

Fixes: dd5f03beb4 ("IB/core: Ethernet L2 attributes in verbs/cm structures")
Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-14 00:29:06 -07:00
Eli Cohen 377b513485 IB/core: Avoid leakage from kernel to user space
Clear the reserved field of struct ib_uverbs_async_event_desc which is
copied to user space.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09 00:08:40 -07:00
Roland Dreier 3bdad2d13f Merge branches 'core', 'ipoib', 'iser', 'mlx4', 'ocrdma' and 'qib' into for-next 2014-09-22 10:05:40 -07:00
Matan Barak a59c5850f0 IB/core: When marshaling uverbs path, clear unused fields
When marsheling a user path to the kernel struct ib_sa_path, need
to zero smac, dmac and set the vlan id to the "no vlan" value.

Fixes: dd5f03beb4 ("IB/core: Ethernet L2 attributes in verbs/cm structures")
Reported-by: Aleksey Senin <alekseys@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-09-22 09:46:52 -07:00
Shawn Bohrer 87773dd56d IB: ib_umem_release() should decrement mm->pinned_vm from ib_umem_get
In debugging an application that receives -ENOMEM from ib_reg_mr(), I
found that ib_umem_get() can fail because the pinned_vm count has
wrapped causing it to always be larger than the lock limit even with
RLIMIT_MEMLOCK set to RLIM_INFINITY.

The wrapping of pinned_vm occurs because the process that calls
ib_reg_mr() will have its mm->pinned_vm count incremented.  Later a
different process with a different mm_struct than the one that
allocated the ib_umem struct ends up releasing it which results in
decrementing the new processes mm->pinned_vm count past zero and
wrapping.

I'm not entirely sure what circumstances cause a different process to
release the ib_umem than the one that allocated it but the kernel
stack trace of the freeing process from my situation looks like the
following:

    Call Trace:
     [<ffffffff814d64b1>] dump_stack+0x19/0x1b
     [<ffffffffa0b522a5>] ib_umem_release+0x1f5/0x200 [ib_core]
     [<ffffffffa0b90681>] mlx4_ib_destroy_qp+0x241/0x440 [mlx4_ib]
     [<ffffffffa0b4d93c>] ib_destroy_qp+0x12c/0x170 [ib_core]
     [<ffffffffa0cc7129>] ib_uverbs_close+0x259/0x4e0 [ib_uverbs]
     [<ffffffff81141cba>] __fput+0xba/0x240
     [<ffffffff81141e4e>] ____fput+0xe/0x10
     [<ffffffff81060894>] task_work_run+0xc4/0xe0
     [<ffffffff810029e5>] do_notify_resume+0x95/0xa0
     [<ffffffff814e3dd0>] int_signal+0x12/0x17

The following patch fixes the issue by storing the pid struct of the
process that calls ib_umem_get() so that ib_umem_release and/or
ib_umem_account() can properly decrement the pinned_vm count of the
correct mm_struct.

Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
Reviewed-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-09-19 09:55:42 -07:00
Roland Dreier d087f6ad72 Merge branches 'core', 'cxgb4', 'ipoib', 'iser', 'iwcm', 'mad', 'misc', 'mlx4', 'mlx5', 'ocrdma' and 'srp' into for-next 2014-08-14 08:58:04 -07:00
Ira Weiny 1471cb6ca6 IB/mad: Add user space RMPP support
Using the new registration mechanism, define a flag that indicates the
user wishes to process RMPP messages in user space rather than have
the kernel process them.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-08-10 20:36:00 -07:00
Ira Weiny 0f29b46d49 IB/mad: add new ioctl to ABI to support new registration options
Registrations options are specified through flags.  Definitions of flags will
be in subsequent patches.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-08-10 20:36:00 -07:00
Ira Weiny 9ad13a4234 IB/mad: Add dev_notice messages for various umad/mad registration failures
Registration failures can be difficult to debug from userspace.  This
gives more visibility.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-08-10 20:36:00 -07:00
Ira Weiny 7ef5d4b046 IB/mad: Update module to [pr|dev]_* style print messages
Use dev_* style print when struct device is available.

Also combine previously line broken user-visible strings as per
Documentation/CodingStyle:

"However, never break user-visible strings such as printk messages,
because that breaks the ability to grep for them."

Signed-off-by: Ira Weiny <ira.weiny@intel.com>

[ Remove PFX so the patch actually builds.  - Roland ]

Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-08-10 20:35:39 -07:00
Ira Weiny f426a40eb6 IB/umad: Update module to [pr|dev]_* style print messages
Use dev_* style print when struct device is available.

Also combine previously line broken user-visible strings as per
Documentation/CodingStyle:

"However, never break user-visible strings such as printk messages,
because that breaks the ability to grep for them."

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-08-10 19:50:07 -07:00
Steve Wise 2f0304d218 RDMA/iwcm: Use a default listen backlog if needed
If the user creates a listening cm_id with backlog of 0 the IWCM ends
up not allowing any connection requests at all.  The correct behavior
is for the IWCM to pick a default value if the user backlog parameter
is zero.

Lustre from version 1.8.8 onward uses a backlog of 0, which breaks
iwarp support without this fix.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-08-05 07:33:24 -07:00
Matan Barak 7e6edb9b2e IB/core: Add user MR re-registration support
Memory re-registration is a feature that enables changing the
attributes of a memory region registered by user-space, including PD,
translation (address and length) and access flags.

Add the required support in uverbs and the kernel verbs API.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-08-01 15:11:13 -07:00
Roland Dreier eeaddf3670 Merge branches 'core', 'cxgb3', 'cxgb4', 'iser', 'iwpm', 'misc', 'mlx4', 'mlx5', 'noio', 'ocrdma', 'qib', 'srp' and 'usnic' into for-next 2014-06-10 10:12:14 -07:00
Tatyana Nikolova 30dc5e63d6 RDMA/core: Add support for iWARP Port Mapper user space service
This patch adds iWARP Port Mapper (IWPM) Version 2 support.  The iWARP
Port Mapper implementation is based on the port mapper specification
section in the Sockets Direct Protocol paper -
http://www.rdmaconsortium.org/home/draft-pinkerton-iwarp-sdp-v1.0.pdf

Existing iWARP RDMA providers use the same IP address as the native
TCP/IP stack when creating RDMA connections.  They need a mechanism to
claim the TCP ports used for RDMA connections to prevent TCP port
collisions when other host applications use TCP ports.  The iWARP Port
Mapper provides a standard mechanism to accomplish this.  Without this
service it is possible for RDMA application to bind/listen on the same
port which is already being used by native TCP host application.  If
that happens the incoming TCP connection data can be passed to the
RDMA stack with error.

The iWARP Port Mapper solution doesn't contain any changes to the
existing network stack in the kernel space.  All the changes are
contained with the infiniband tree and also in user space.

The iWARP Port Mapper service is implemented as a user space daemon
process.  Source for the IWPM service is located at
http://git.openfabrics.org/git?p=~tnikolova/libiwpm-1.0.0/.git;a=summary

The iWARP driver (port mapper client) sends to the IWPM service the
local IP address and TCP port it has received from the RDMA
application, when starting a connection.  The IWPM service performs a
socket bind from user space to get an available TCP port, called a
mapped port, and communicates it back to the client.  In that sense,
the IWPM service is used to map the TCP port, which the RDMA
application uses to any port available from the host TCP port
space. The mapped ports are used in iWARP RDMA connections to avoid
collisions with native TCP stack which is aware that these ports are
taken. When an RDMA connection using a mapped port is terminated, the
client notifies the IWPM service, which then releases the TCP port.

The message exchange between the IWPM service and the iWARP drivers
(between user space and kernel space) is implemented using netlink
sockets.

1) Netlink interface functions are added: ibnl_unicast() and
   ibnl_mulitcast() for sending netlink messages to user space

2) The signature of the existing ibnl_put_msg() is changed to be more
   generic

3) Two netlink clients are added: RDMA_NL_NES, RDMA_NL_C4IW
   corresponding to the two iWarp drivers - nes and cxgb4 which use
   the IWPM service

4) Enums are added to enumerate the attributes in the netlink
   messages, which are exchanged between the user space IWPM service
   and the iWARP drivers

Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: PJ Waskiewicz <pj.waskiewicz@solidfire.com>

[ Fold in range checking fixes and nlh_next removal as suggested by Dan
  Carpenter and Steve Wise.  Fix sparse endianness in hash.  - Roland ]

Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-10 10:11:45 -07:00
Bart Van Assche 60e1751cb5 IB/umad: Fix use-after-free on close
Avoid that closing /dev/infiniband/umad<n> or /dev/infiniband/issm<n>
triggers a use-after-free.  __fput() invokes f_op->release() before it
invokes cdev_put().  Make sure that the ib_umad_device structure is
freed by the cdev_put() call instead of f_op->release().  This avoids
that changing the port mode from IB into Ethernet and back to IB
followed by restarting opensmd triggers the following kernel oops:

    general protection fault: 0000 [#1] PREEMPT SMP
    RIP: 0010:[<ffffffff810cc65c>]  [<ffffffff810cc65c>] module_put+0x2c/0x170
    Call Trace:
     [<ffffffff81190f20>] cdev_put+0x20/0x30
     [<ffffffff8118e2ce>] __fput+0x1ae/0x1f0
     [<ffffffff8118e35e>] ____fput+0xe/0x10
     [<ffffffff810723bc>] task_work_run+0xac/0xe0
     [<ffffffff81002a9f>] do_notify_resume+0x9f/0xc0
     [<ffffffff814b8398>] int_signal+0x12/0x17

Reference: https://bugzilla.kernel.org/show_bug.cgi?id=75051
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Yann Droneaud <ydroneaud@opteya.com>
Cc: <stable@vger.kernel.org> # 3.x: 8ec0a0e6b58: IB/umad: Fix error handling
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-06 11:38:31 -07:00
Haggai Eran 584482ac80 IB/core: Fix kobject leak on device register error flow
The ports kobject isn't being released during error flow in device
registration.  This patch refactors the ports kobject cleanup into a
single function called from both the error flow in device registration
and from the unregistration function.

A couple of attributes aren't being deleted (iw_stats_group, and
ib_class_attributes).  While this may be handled implicitly by the
destruction of their kobjects, it seems better to handle all the
attributes the same way.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>

[ Make free_port_list_attributes() static.  - Roland ]

Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-05 09:37:10 -07:00
Haggai Eran cad6d02acc IB/core: Fix port kobject deletion during error flow
When encountering an error during the add_port function, adding a port
to sysfs, the port kobject is freed without being deleted from sysfs.

Instead of freeing it directly, the patch uses kobject_put to release
the kobject and delete it.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-04 10:03:49 -07:00
Haggai Eran 373c0ea181 IB/core: Remove unneeded kobject_get/put calls
The ib_core module will call kobject_get on the parent object of each
kobject it creates.  This is redundant since kobject_add does that
anyway.

As a side effect, this patch should fix leaking the ports kobject and
the device kobject during unregister flow, since the previous code
didn't seem to take into account the kobject_get calls on behalf of
the child kobjects.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-04 10:03:49 -07:00
Roland Dreier 8385fd8414 IB/core: Fix sparse warnings about redeclared functions
Fix a few functions that are declared with __attribute_const__ in the
ib_verbs.h header file but defined without it in verbs.c.  This gets rid
of the following sparse warnings:

    drivers/infiniband/core/verbs.c:51:5: error: symbol 'ib_rate_to_mult' redeclared with different type (originally declared at include/rdma/ib_verbs.h:469) - different modifiers
    drivers/infiniband/core/verbs.c:68:14: error: symbol 'mult_to_ib_rate' redeclared with different type (originally declared at include/rdma/ib_verbs.h:607) - different modifiers
    drivers/infiniband/core/verbs.c:85:5: error: symbol 'ib_rate_to_mbps' redeclared with different type (originally declared at include/rdma/ib_verbs.h:476) - different modifiers
    drivers/infiniband/core/verbs.c:111:1: error: symbol 'rdma_node_get_transport' redeclared with different type (originally declared at include/rdma/ib_verbs.h:84) - different modifiers

Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-04 10:01:42 -07:00
Roland Dreier 5343c00dd0 IB/mad: Fix sparse warning about gfp_t use
Properly convert gfp_t & result to bool to fix:

    drivers/infiniband/core/sa_query.c:621:33: warning: incorrect type in initializer (different base types)
    drivers/infiniband/core/sa_query.c:621:33:    expected bool [unsigned] [usertype] preload
    drivers/infiniband/core/sa_query.c:621:33:    got restricted gfp_t

Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-03 10:24:24 -07:00
Bart Van Assche 8ec0a0e6b5 IB/umad: Fix error handling
Avoid leaking a kref count in ib_umad_open() if port->ib_dev == NULL
or if nonseekable_open() fails.

Avoid leaking a kref count, that sm_sem is kept down and also that the
IB_PORT_SM capability mask is not cleared in ib_umad_sm_open() if
nonseekable_open() fails.

Since container_of() never returns NULL, remove the code that tests
whether container_of() returns NULL.

Moving the kref_get() call from the start of ib_umad_*open() to the
end is safe since it is the responsibility of the caller of these
functions to ensure that the cdev pointer remains valid until at least
when these functions return.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: <stable@vger.kernel.org>

[ydroneaud@opteya.com: rework a bit to reduce the amount of code changed]

Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>

[ nonseekable_open() can't actually fail, but....  - Roland ]

Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-29 21:27:30 -07:00
Roland Dreier f7eaa7ed8f Merge branches 'core', 'cxgb4', 'ip-roce', 'iser', 'misc', 'mlx4', 'nes', 'ocrdma', 'qib', 'sgwrapper', 'srp' and 'usnic' into for-next 2014-04-03 08:30:17 -07:00
Moni Shoua b2853fd6c2 IB/core: Don't resolve passive side RoCE L2 address in CMA REQ handler
The code that resolves the passive side source MAC within the rdma_cm
connection request handler was both redundant and buggy, so remove it.

It was redundant since later, when an RC QP is modified to RTR state,
the resolution will take place in the ib_core module.  It was buggy
because this callback also deals with UD SIDR exchange, for which we
incorrectly looked at the REQ member of the CM event and dereferenced
a random value.

Fixes: dd5f03beb4 ("IB/core: Ethernet L2 attributes in verbs/cm structures")
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-01 14:05:26 -07:00
Yan Burman 2c34e68f42 IB/mad: Check and handle potential DMA mapping errors
Running with DMA_API_DEBUG enabled and not checking for DMA mapping
errors triggers a kernel stack trace with "DMA-API: device driver
failed to check map error" message.  Add these checks to the MAD
module, both to be be more robust and also eliminate these
false-positive stack traces.

Signed-off-by: Yan Burman <yanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-01 10:36:07 -07:00
Sagi Grimberg 1b01d33560 IB/core: Introduce signature verbs API
Introduce a verbs interface for signature-related operations.  A
signature handover operation configures the layouts of data and
protection attributes both in memory and wire domains.

Signature operations are:

- INSERT:
  Generate and insert protection information when handing over
  data from input space to output space.
- validate and STRIP:
  Validate protection information and remove it when handing over
  data from input space to output space.
- validate and PASS:
  Validate protection information and pass it when handing over
  data from input space to output space.

Once the signature handover opration is done, the HCA will offload
data integrity generation/validation while performing the actual data
transfer.

Additions:

1. HCA signature capabilities in device attributes
    Verbs provider supporting signature handover operations fills
    relevant fields in device attributes structure returned by
    ib_query_device.

2. QP creation flag IB_QP_CREATE_SIGNATURE_EN
    Creating a QP that will carry signature handover operations may
    require some special preparations from the verbs provider.  So we
    add QP creation flag IB_QP_CREATE_SIGNATURE_EN to declare that the
    created QP may carry out signature handover operations.  Expose
    signature support to verbs layer (no support for now).

3. New send work request IB_WR_REG_SIG_MR
    Signature handover work request. This WR will define the signature
    handover properties of the memory/wire domains as well as the
    domains layout. The purpose of this work request is to bind all
    the needed information for the signature operation:

    - data to be transferred:  wr->sg_list (ib_sge).
      * The raw data, pre-registered to a single MR (normally, before
        signature, this MR would have been used directly for the data
        transfer)
    - data protection guards: sig_handover.prot (ib_sge).
      * The data protection buffer, pre-registered to a single MR, which
        contains the data integrity guards of the raw data blocks.
        Note that it may not always exist, only in cases where the user is
        interested in storing protection guards in memory.
    - signature operation attributes: sig_handover.sig_attrs.
      * Tells the HCA how to validate/generate the protection information.

    Once the work request is executed, the memory region that will
    describe the signature transaction will be the sig_mr.  The
    application can now go ahead and send the sig_mr.rkey or use the
    sig_mr.lkey for data transfer.

4. New Verb ib_check_mr_status
    check_mr_status verb checks the status of the memory region post
    transaction.  The first check that may be used is
    IB_MR_CHECK_SIG_STATUS, which will indicate if any signature
    errors are pending for a specific signature-enabled ib_mr.  This
    verb is a lightwight check and is allowed to be taken from
    interrupt context.  An application must call this verb after it is
    known that the actual data transfer has finished.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-07 11:26:49 -08:00
Sagi Grimberg 17cd3a2db8 IB/core: Introduce protected memory regions
This commit introduces verbs for creating/destoying memory
regions which will allow new types of memory key operations such
as protected memory registration.

Indirect memory registration is registering several (one
of more) pre-registered memory regions in a specific layout.
The Indirect region may potentialy describe several regions
and some repitition format between them.

Protected Memory registration is registering a memory region
with various data integrity attributes which will describe protection
schemes that will be handled by the HCA in an offloaded manner.
These memory regions will be applicable for a new REG_SIG_MR
work request introduced later in this patchset.

In the future these routines may replace or implement current memory
regions creation routines existing today:
- ib_reg_user_mr
- ib_alloc_fast_reg_mr
- ib_get_dma_mr
- ib_dereg_mr

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-07 11:26:49 -08:00
Yishai Hadas eeb8461e36 IB: Refactor umem to use linear SG table
This patch refactors the IB core umem code and vendor drivers to use a
linear (chained) SG table instead of chunk list.  With this change the
relevant code becomes clearer—no need for nested loops to build and
use umem.

Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-04 10:34:28 -08:00
Linus Torvalds 4ba9920e5e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) BPF debugger and asm tool by Daniel Borkmann.

 2) Speed up create/bind in AF_PACKET, also from Daniel Borkmann.

 3) Correct reciprocal_divide and update users, from Hannes Frederic
    Sowa and Daniel Borkmann.

 4) Currently we only have a "set" operation for the hw timestamp socket
    ioctl, add a "get" operation to match.  From Ben Hutchings.

 5) Add better trace events for debugging driver datapath problems, also
    from Ben Hutchings.

 6) Implement auto corking in TCP, from Eric Dumazet.  Basically, if we
    have a small send and a previous packet is already in the qdisc or
    device queue, defer until TX completion or we get more data.

 7) Allow userspace to manage ipv6 temporary addresses, from Jiri Pirko.

 8) Add a qdisc bypass option for AF_PACKET sockets, from Daniel
    Borkmann.

 9) Share IP header compression code between Bluetooth and IEEE802154
    layers, from Jukka Rissanen.

10) Fix ipv6 router reachability probing, from Jiri Benc.

11) Allow packets to be captured on macvtap devices, from Vlad Yasevich.

12) Support tunneling in GRO layer, from Jerry Chu.

13) Allow bonding to be configured fully using netlink, from Scott
    Feldman.

14) Allow AF_PACKET users to obtain the VLAN TPID, just like they can
    already get the TCI.  From Atzm Watanabe.

15) New "Heavy Hitter" qdisc, from Terry Lam.

16) Significantly improve the IPSEC support in pktgen, from Fan Du.

17) Allow ipv4 tunnels to cache routes, just like sockets.  From Tom
    Herbert.

18) Add Proportional Integral Enhanced packet scheduler, from Vijay
    Subramanian.

19) Allow openvswitch to mmap'd netlink, from Thomas Graf.

20) Key TCP metrics blobs also by source address, not just destination
    address.  From Christoph Paasch.

21) Support 10G in generic phylib.  From Andy Fleming.

22) Try to short-circuit GRO flow compares using device provided RX
    hash, if provided.  From Tom Herbert.

The wireless and netfilter folks have been busy little bees too.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2064 commits)
  net/cxgb4: Fix referencing freed adapter
  ipv6: reallocate addrconf router for ipv6 address when lo device up
  fib_frontend: fix possible NULL pointer dereference
  rtnetlink: remove IFLA_BOND_SLAVE definition
  rtnetlink: remove check for fill_slave_info in rtnl_have_link_slave_info
  qlcnic: update version to 5.3.55
  qlcnic: Enhance logic to calculate msix vectors.
  qlcnic: Refactor interrupt coalescing code for all adapters.
  qlcnic: Update poll controller code path
  qlcnic: Interrupt code cleanup
  qlcnic: Enhance Tx timeout debugging.
  qlcnic: Use bool for rx_mac_learn.
  bonding: fix u64 division
  rtnetlink: add missing IFLA_BOND_AD_INFO_UNSPEC
  sfc: Use the correct maximum TX DMA ring size for SFC9100
  Add Shradha Shah as the sfc driver maintainer.
  net/vxlan: Share RX skb de-marking and checksum checks with ovs
  tulip: cleanup by using ARRAY_SIZE()
  ip_tunnel: clear IPCB in ip_tunnel_xmit() in case dst_link_failure() is called
  net/cxgb4: Don't retrieve stats during recovery
  ...
2014-01-25 11:17:34 -08:00
Roland Dreier fb1b5034e4 Merge branch 'ip-roce' into for-next
Conflicts:
	drivers/infiniband/hw/mlx4/main.c
2014-01-22 23:24:21 -08:00
Roland Dreier 8f399921ea Merge branches 'cma', 'cxgb4', 'flowsteer', 'ipoib', 'misc', 'mlx4', 'mlx5', 'ocrdma', 'qib', 'srp' and 'usnic' into for-next 2014-01-22 23:24:13 -08:00
Or Gerlitz 05633102d8 IB/core: Fix unused variable warning
Fix the below "make W=1" build warning:

    drivers/infiniband/core/iwcm.c: In function ‘destroy_cm_id’:
    drivers/infiniband/core/iwcm.c:330: warning: variable ‘ret’ set but not used

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-22 22:55:04 -08:00
Somnath Kotur 5462eddd7a RDMA/cma: Handle global/non-linklocal IPv6 addresses in cma_check_linklocal()
If addr is not a linklocal address, the code incorrectly fails to
return and ends up assigning the scope ID to the scope id of the
address, which is wrong.  Fix by checking if it's a link local address
first, and immediately return 0 if not.

Signed-off-by: Somnath Kotur <somnath.kotur@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-22 22:49:17 -08:00
Wei Yongjun 990acea616 IB/cm: Fix missing unlock on error in cm_init_qp_rtr_attr()
Add the missing unlock before return from function cm_init_qp_rtr_attr()
in the error handling case.

Fixes: dd5f03beb4 ("IB/core: Ethernet L2 attributes in verbs/cm structures")
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-19 15:14:04 -08:00
Matan Barak 2f85d24e60 IB/core: Make ib_addr a core IB module
IP based addressing introduces the usage of rdma_addr_find_dmac_by_grh()
within ib_core.  Since this function is declared in ib_addr, ib_addr
should be a part of the core INFINIBAND modules, rather than
INFINIBAND_ADDR_TRANS.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-19 15:14:04 -08:00
Or Gerlitz ed4c54e5b4 IB/core: Resolve Ethernet L2 addresses when modifying QP
Existing user space applications provide only IBoE L3 address
attributes to the kernel when they issue a modify QP modify.  To work
with them and let such apps (plus kernel consumers which don't use the
RDMA-CM) keep working transparently under the IBoE GID IP addressing
changes, add an Eth L2 address resolution helper.

Signed-off-by: Moni Shoua <monis@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-19 15:14:04 -08:00
Moni Shoua 7b85627b9f IB/cma: IBoE (RoCE) IP-based GID addressing
Currently, the IB core and specifically the RDMA-CM assumes that IBoE
(RoCE) gids encode related Ethernet netdevice interface MAC address
and possibly VLAN id.

Change GIDs to be treated as they encode interface IP address.

Since Ethernet layer 2 address parameters are not longer encoded
within gids, we have to extend the Infiniband address structures (e.g.
ib_ah_attr) with layer 2 address parameters, namely mac and vlan.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-18 14:12:35 -08:00
Upinder Malhi 5db5765e25 IB/core: Add support for RDMA_NODE_USNIC_UDP
Add the complementary RDMA_NODE_USNIC_UDP for RDMA_TRANSPORT_USNIC_UDP.

Signed-off-by: Upinder Malhi <umalhi@cisco.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-18 13:48:54 -08:00
Aruna-Hewapathirane 63862b5bef net: replace macros net_random and net_srandom with direct calls to prandom
This patch removes the net_random and net_srandom macros and replaces
them with direct calls to the prandom ones. As new commits only seem to
use prandom_u32 there is no use to keep them around.
This change makes it easier to grep for users of prandom_u32.

Signed-off-by: Aruna-Hewapathirane <aruna.hewapathirane@gmail.com>
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 15:15:25 -08:00
Matan Barak dd5f03beb4 IB/core: Ethernet L2 attributes in verbs/cm structures
This patch add the support for Ethernet L2 attributes in the
verbs/cm/cma structures.

When dealing with L2 Ethernet, we should use smac, dmac, vlan ID and priority
in a similar manner that the IB L2 (and the L4 PKEY) attributes are used.

Thus, those attributes were added to the following structures:

* ib_ah_attr - added dmac
* ib_qp_attr - added smac and vlan_id, (sl remains vlan priority)
* ib_wc - added smac, vlan_id
* ib_sa_path_rec - added smac, dmac, vlan_id
* cm_av - added smac and vlan_id

For the path record structure, extra care was taken to avoid the new
fields when packing it into wire format, so we don't break the IB CM
and SA wire protocol.

On the active side, the CM fills. its internal structures from the
path provided by the ULP.  We add there taking the ETH L2 attributes
and placing them into the CM Address Handle (struct cm_av).

On the passive side, the CM fills its internal structures from the WC
associated with the REQ message.  We add there taking the ETH L2
attributes from the WC.

When the HW driver provides the required ETH L2 attributes in the WC,
they set the IB_WC_WITH_SMAC and IB_WC_WITH_VLAN flags. The IB core
code checks for the presence of these flags, and in their absence does
address resolution from the ib_init_ah_from_wc() helper function.

ib_modify_qp_is_ok is also updated to consider the link layer. Some
parameters are mandatory for Ethernet link layer, while they are
irrelevant for IB.  Vendor drivers are modified to support the new
function signature.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-14 14:20:54 -08:00
Upinder Malhi 248567f793 IB/core: Add RDMA_TRANSPORT_USNIC_UDP
Add RDMA_TRANSPORT_USNIC_UDP which will be used by usNIC.

Signed-off-by: Upinder Malhi <umalhi@cisco.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-14 00:44:45 -08:00
Roland Dreier 22f12c60e1 Merge branches 'cxgb4', 'flowsteer' and 'misc' into for-linus 2013-12-23 09:19:02 -08:00
Yann Droneaud 6cc3df840a IB/uverbs: Check access to userspace response buffer in extended command
This patch adds a check on the output buffer with access_ok(VERIFY_WRITE, ...)
to ensure the whole buffer is in userspace memory before using the
pointer in uverbs functions.  If the buffer or a subset of it is not
valid, returns -EFAULT to the caller.

This will also catch invalid buffer before the final call to
copy_to_user() which happen late in most uverb functions.

Just like the check in read(2) syscall, it's a sanity check to detect
invalid parameters provided by userspace. This particular check was added
in vfs_read() by Linus Torvalds for v2.6.12 with following commit message:

https://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/?id=fd770e66c9a65b14ce114e171266cf6f393df502

  Make read/write always do the full "access_ok()" tests.

  The actual user copy will do them too, but only for the
  range that ends up being actually copied. That hides
  bugs when the range has been clamped by file size or other
  issues.

Note: there's no need to check input buffer since vfs_write() already does
access_ok(VERIFY_READ, ...) as part of write() syscall.

Link: http://marc.info/?i=cover.1387273677.git.ydroneaud@opteya.com
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-12-20 10:54:34 -08:00
Yann Droneaud 6bcca3d4a3 IB/uverbs: Check input length in flow steering uverbs
Since ib_copy_from_udata() doesn't check yet the available input data
length before accessing userspace memory, an explicit check of this
length is required to prevent:

- reading past the user provided buffer,
- underflow when subtracting the expected command size from the input
  length.

This will ensure the newly added flow steering uverbs don't try to
process truncated commands.

Link: http://marc.info/?i=cover.1386798254.git.ydroneaud@opteya.com>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-12-20 10:54:33 -08:00
Yann Droneaud 98a37510ec IB/uverbs: Set error code when fail to consume all flow_spec items
If the flow_spec items parsed count does not match the number of items
declared in the flow_attr command, or if not all bytes are used for
flow_spec items (eg. trailing garbage), a log message is reported and
the function leave through the error path. Unfortunately the error
code is currently not set.

This patch set error code to -EINVAL in such cases, so that the error
is reported to userspace instead of silently fail.

Link: http://marc.info/?i=cover.1386798254.git.ydroneaud@opteya.com>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-12-20 10:54:33 -08:00
Yann Droneaud c780d82a74 IB/uverbs: Check reserved fields in create_flow
As noted by Daniel Vetter in its article "Botching up ioctls"[1]

  "Check *all* unused fields and flags and all the padding for whether
   it's 0, and reject the ioctl if that's not the case.  Otherwise
   your nice plan for future extensions is going right down the
   gutters since someone *will* submit an ioctl struct with random
   stack garbage in the yet unused parts. Which then bakes in the ABI
   that those fields can never be used for anything else but garbage."

It's important to ensure that reserved fields are set to known value,
so that it will be possible to use them latter to extend the ABI.

The same reasonning apply to comp_mask field present in newer uverbs
command: per commit 22878dbc91 ("IB/core: Better checking of
userspace values for receive flow steering"), unsupported values in
comp_mask are rejected.

[1] http://blog.ffwll.ch/2013/11/botching-up-ioctls.html

Link: http://marc.info/?i=cover.1386798254.git.ydroneaud@opteya.com>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-12-20 10:54:32 -08:00
Yann Droneaud 2782c2d302 IB/uverbs: Check comp_mask in destroy_flow
Just like the check added to create_flow in 22878dbc91 ("IB/core:
Better checking of userspace values for receive flow steering"),
comp_mask must be checked in destroy_flow too.

Since only empty comp_mask is currently supported, any other value
must be rejected.

This check was silently added in a previous patch[1] to move comp_mask
in extended command header, part of previous patchset[2] against
create/destroy_flow uverbs. The idea of moving comp_mask to the header
was discarded for the final patchset[3].

Unfortunately the check added in destroy_flow uverb was not integrated
in the final patchset.

[1] http://marc.info/?i=40175eda10d670d098204da6aa4c327a0171ae5f.1381510045.git.ydroneaud@opteya.com
[2] http://marc.info/?i=cover.1381510045.git.ydroneaud@opteya.com
[3] http://marc.info/?i=cover.1383773832.git.ydroneaud@opteya.com

Cc: Matan Barak <matanb@mellanox.com>
Link: http://marc.info/?i=cover.1386798254.git.ydroneaud@opteya.com>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-12-20 10:54:31 -08:00
Yann Droneaud 7efb1b19b3 IB/uverbs: Check reserved field in extended command header
As noted by Daniel Vetter in its article "Botching up ioctls"[1]

  "Check *all* unused fields and flags and all the padding for whether
   it's 0, and reject the ioctl if that's not the case.  Otherwise
   your nice plan for future extensions is going right down the
   gutters since someone *will* submit an ioctl struct with random
   stack garbage in the yet unused parts. Which then bakes in the ABI
   that those fields can never be used for anything else but garbage."

It's important to ensure that reserved fields are set to known value,
so that it will be possible to use them latter to extend the ABI.

The same reasonning apply to comp_mask field present in newer uverbs
command: per commit 22878dbc91 ("IB/core: Better checking of
userspace values for receive flow steering"), unsupported values in
comp_mask are rejected.

[1] http://blog.ffwll.ch/2013/11/botching-up-ioctls.html

Link: http://marc.info/?i=cover.1386798254.git.ydroneaud@opteya.com>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-12-20 10:54:30 -08:00