First of a series of patches which add support for direct userspace access to
InfiniBand hardware -- so-called "userspace verbs." I believe these patches
are ready to merge, but a final review would be useful.
These patches should incorporate all of the feedback from the discussion when
I posted an earlier version back in April (see
http://lkml.org/lkml/2005/4/4/267 for the start of the thread). In
particular, memory pinned for use by userspace is accounted for in
current->mm->vm_locked and requests to pin memory are checked against
RLIMIT_MEMLOCK.
This patch:
Modify the ib_verbs.h header file with changes required for InfiniBand
userspace verbs support. We add a few structures to keep track of userspace
context, and extend the driver API so that low-level drivers know when they're
creating resources that will be used from userspace.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix handling of fields with size_bits == 64. Pointed out by Hal Rosenstock.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use a copy of the id we'll return to the consumer so that we don't
dereference query->sa_query after calling send_mad(). A completion may
occur very quickly and end up freeing the query before we get to do
anything after send_mad().
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It's about time for a version bump.
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Future versions of Mellanox HCA firmware will require command mailboxes to be
aligned to 4K. Support this by using a pci_pool to allocate all mailboxes.
This has the added benefit of shrinking the source and text of mthca.
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Free page_list buffer on error path of mthca_reg_phys_mr().
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Split allocation of MTT range from creation of MR. This will be useful for
implementing shared memory regions and userspace verbs.
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make mthca_table_put() and mthca_table_put_range() NOPs if the device is not
mem-free, so that we don't have to have "if (mthca_is_memfree())" tests in the
callers of these functions. This makes our code more readable and
maintainable, and saves a couple dozen bytes of text in ib_mthca.ko as well.
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix memset to use sizeof *props instead of just sizeof props.
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add support for unreliable connected (UC) transport to mthca driver:
- Add attributes for UC to modify QP table.
- Add support for posting UC work requests.
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
mthca apparently had the meanings of the max_rd_atomic and max_dest_rd_atomic
QP attributes backwards. max_rd_atomic limits the maximum number of
outstanding RDMA/atomic requests as an initiator (on a send queue), and
max_dest_rd_atomic specifies the resources allocated to handle RMDA/atomic
requests from the remote end of the connection. We were programming our QP
context with these values swapped.
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix offset of static_rate in QP context. Pointed out by Dror Goldenberg.
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Switch all allocations of coherent memory from pci_alloc_consistent() to
dma_alloc_coherent(), so that we can pass GFP_KERNEL. This should help when
the system is low on memory.
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Clean up CQ debugging code: make dump_cqe print on one line, and only dump
error CQ entries for local operation errors.
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- Fix incorrect cut-n-paste in error messages.
- Add missing newlines in error messages.
- Use DRV_NAME instead of "ib_mthca" in a couple of places.
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add Sun copyright to files modified by Tom Duffy.
Signed-off-by: Tom Duffy <tduffy@sun.com>
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
sysfs: fix the rest of the kernel so if an attribute doesn't
implement show or store method read/write will return
-EIO instead of 0 or -EINVAL or -EPERM.
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Make MTU field in SA PathRecord and MCMemberRecord a u8 rather than an enum
to avoid complications with endianness.
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Free all unclaimed MAD receive buffers when userspace closes our file so we
don't leak memory.
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Check if a client passes a NULL callback into an SA query, and if so, never
call back. This fixes an oops if someone unloads ib_ipoib and ib_sa in
rapid succession. ib_ipoib does an MCMember delete with a NULL callback
and 0 timeout on unload, which is usually fine since the delete completes
successfully. However, if ib_sa is unloaded immediately afterwards, the
delete will be canceled and ib_sa will try to call the (now already
unloaded) ib_ipoib module back with the cancel completion, which triggers
the oops.
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix order of #include lines in mthca_memfree.c
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Correct unwinding in error path of mthca_init_icm().
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Decouple table of HCA features from exact HCA device type. Add a current FW
version field so we can warn when someone is using old FW. Add support for
new MT25204 HCA.
Remove the warning about mem-free support, since it should be pretty solid at
this point.
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix RDMA in mem-free mode: we need to make sure that the RDMA context memory
is mapped for the HCA.
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Update initialization of receive queue to match new documentation. This
change is required to support new MT25204 HCA.
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Clean up mem-free mode support by introducing mthca_is_memfree() function,
which encapsulates the logic of deciding if a device is mem-free.
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Minor tweaks to firmware command handling: kill off an unused get of a value,
and add a little more info to debug output.
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Implement fast memory regions (FMRs), where the driver writes directly into
the HCA's translation tables rather than requiring a firmware command. For
Tavor, MTTs for FMR are separate from regular MTTs, and are reserved at driver
initialization. This is done to limit the amount of virtual memory needed to
map the MTTs. For Arbel, there's no such limitation, and all MTTs and MPTs
may be used for FMR or for regular MR.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Split Tavor and Arbel/mem-free index<->hw key munging routines, so that FMR
implementation can call correct implementation without testing HCA type (which
it already knows).
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add mthca_table_find() function, which returns the lowmem address of an entry
in a mem-free HCA's context tables. This will be used by the FMR
implementation.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add code for SYNC_TPT firmware command, which will be used by FMR
implementation.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add mthca_write64_raw() function, which will be used to write FMR entries that
are in ioremapped PCI memory.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Encapsulate the buddy allocator used for MTT segments. This cleans up the
code and also gets us ready to add FMR support.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make address handle verbs usable from interrupt context.
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fill in missing fields in send completions.
Signed-off-by: Itamar Rabenstein <itamar@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix bug in MTT allocation in mem-free mode.
I misunderstood the MTT size value returned by the firmware -- it is really
the size of a single MTT entry, since mem-free mode does not segment the MTT
as the original firmware did. This meant that our MTT addresses ended up
being off by a factor of 8. This meant that our MTT allocations might
overlap, and so we could overwrite and corrupt earlier memory regions when
writing new MTT entries.
We fix this by always using our 64-byte MTT segment size. This allows some
simplification of the code as well, since there's no reason to put the MTT
segment size in a variable -- we can always use our enum value directly.
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add code to support RDMA and atomic send work requests in mem-free mode.
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
CQ numbers are only 24 bits, so only print 6 hex digits and mask off reserved
part when reporting a CQ event.
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
On error path, only free doorbell records if we're in mem-free mode.
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Print IRQ number when NOP command interrupt test fails to help debugging.
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Release mutex on error return path from mthca_alloc_db().
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix error handling in MR allocation for mem-free mode: mthca_free must get an
MR index, not a key.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Doorbell record pages are allocated in HCA page size chunks (always 4096
bytes), so we need to divide by 4096 and not PAGE_SIZE when figuring out how
many pages we'll need space for.
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It's cleaner to kfree mthca_mr, and not rely on the fact that ib_mr is the
first field in mthca_mr.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The first buffer of a memory region is not required to be page-aligned, so
don't return an error if it's not.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When posting a work request with immediate data, put the immediate data in the
immediate data field of the hardware's work request (rather than overwriting
the flags field).
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix calculation of rdb_shift by using original number of QPs, not
their slot in profile[] (which will be rearranged when we sort it).
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Implement more of the device_query method in mthca.
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In mem-free mode, when allocating memory regions, make sure that the HCA has
context memory mapped to cover the virtual space used for the MPT and MTTs
being used.
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix handling of MAD agent registrations with mgmt_class == 0. In this case
ib_umad should pass a NULL registration request to the MAD core rather than a
request with mgmt_class set to 0.
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Mask bits correctly from jhash result in ib_fmr_hash() so that the
computed bucket index is within our hash table. This fixes an SDP
crash.
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Eliminate no longer needed include files
Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Replace the *wc field in ib_mad_recv_wc from pointing to a structure on the
stack to one allocated with the received MAD buffer. This allows a client to
access the *wc field after their receive completion handler has returned.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Convert IPoIB to use debugfs instead of its own custom debugging filesystem.
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Correct and simplify calculation of static rate. We need to round up the
quotient of (local_rate - path_rate) / path_rate. To round up we add
(path_rate - 1) to the numerator, so the quotient simplifies to (local_rate -
1) / path_rate.
No idea how I came up with the old formula.
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Set skb->mac.raw on receive. This fixes crashes when this is
dereferenced, for example by netfilter or when PF_PACKET is used.
Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.
Let it rip!