Increment version number for DMEM toleration.
Signed-off-by: Alexander Schmidt <alexs@linux.vnet.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
dma_sync_single() is deprecated now, and the use in mthca is wrong:
there should be a dma_sync_single_for_cpu() before touching the memory
from the CPU, and a dma_sync_single_for_device() afterwards. Fix
this, prompted by a kick in the pants from a patch from FUJITA
Tomonori <fujita.tomonori@lab.ntt.co.jp>.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
During cluster testing, one QP was not closed, as FIN is not handled
properly when its rexmit count expires or in some cases when RST is is
received after sending FIN. The reason is that the cm_id does not get
decremented under these conditions.
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
In nes_query_device(), max_qp_init_rd_atom is incorrectly set to
max_qp_wr. This was found when a test application had a dapl async
event error.
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This prevents the memcpy() of a guid_entries element using a negative index.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Implement toleration of dynamic memory operations and 16 GB gigantic
pages, where "toleration" means that the driver can cope with dynamic
memory operations that happen before the driver is loaded. While the
ehca driver is loaded, dynamic memory operations are still prohibited
by returning NOTIFY_BAD from the memory notifier.
On module load the driver walks through available system memory,
checks for available memory ranges and then registers the kernel
internal memory region accordingly. The translation of address ranges
is implemented via a 3-level busmap.
Signed-off-by: Hannes Hering <hering2@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
In the near future, the driver core is going to not allow direct access
to the driver_data pointer in struct device. Instead, the functions
dev_get_drvdata() and dev_set_drvdata() should be used. These functions
have been around since the beginning, so are backwards compatible with
all older kernel versions.
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: general@lists.openfabrics.org
Cc: Christoph Raisch <raisch@de.ibm.com>
Acked-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
In the near future, the driver core is going to not allow direct access
to the driver_data pointer in struct device. Instead, the functions
dev_get_drvdata() and dev_set_drvdata() should be used. These functions
have been around since the beginning, so are backwards compatible with
all older kernel versions.
Cc: general@lists.openfabrics.org
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
mlx4_core: Don't double-free IRQs when falling back from MSI-X to INTx
IB/mthca: Don't double-free IRQs when falling back from MSI-X to INTx
IB/mlx4: Add strong ordering to local inval and fast reg work requests
IB/ehca: Remove superfluous bitmasks from QP control block
RDMA/cxgb3: Limit fast register size based on T3 limitations
RDMA/cxgb3: Report correct port state and MTU
mlx4_core: Add module parameter for number of MTTs per segment
IB/mthca: Add module parameter for number of MTTs per segment
RDMA/nes: Fix off-by-one bugs in reset_adapter_ne020() and init_serdes()
infiniband: Remove void casts
IB/ehca: Increment version number
IB/ehca: Remove unnecessary memory operations for userspace queue pairs
IB/ehca: Fall back to vmalloc() for big allocations
IB/ehca: Replace vmalloc() with kmalloc() for queue allocation
When both MSI-X and legacy INTx fail to generate an interrupt, the
driver frees the MSI-X interrupts twice. Fix this by clearing the
have_irq flag for the MSI-X interrupts when they are freed the first
time.
Reported-by: Yinghai Lu <yhlu.kernel@gmail.com>
Tested-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The ConnectX Programmer's Reference Manual states that the "SO" bit
must be set when posting Fast Register and Local Invalidate send work
requests. When this bit is set, the work request will be executed
only after all previous work requests on the send queue have been
executed. (If the bit is not set, Fast Register and Local Invalidate
WQEs may begin execution too early, which violates the defined
semantics for these operations)
This fixes the issue with NFS/RDMA reported in
<http://lists.openfabrics.org/pipermail/general/2009-April/059253.html>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Cc: <stable@kernel.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
All the fields in the control block are nicely right-aligned, so no
masking is necessary.
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Define three accessors to get/set dst attached to a skb
struct dst_entry *skb_dst(const struct sk_buff *skb)
void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)
void skb_dst_drop(struct sk_buff *skb)
This one should replace occurrences of :
dst_release(skb->dst)
skb->dst = NULL;
Delete skb->dst field
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Last two drivers that need skb->dst in their start_xmit() function
Tell dev_hard_start_xmit() to no release it by unsetting IFF_XMIT_DST_RELEASE
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
T3 firmware only supports one WRs worth of page list for fast register
work requests. The driver currently allows 2 WRs worth, which
doesn't work for T3, so reduce the limit in the driver.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The current MTT allocator uses kmalloc() to allocate a buffer for its
buddy allocator, and thus is limited in the amount of MTT segments
that it can control. As a result, the size of memory that can be
registered is limited too. This patch uses a module parameter to
control the number of MTT entries that each segment represents,
allowing more memory to be registered with the same number of
segments.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If a task did not complete normally due to a TMF, libiscsi will
now complete the task with the state ISCSI_TASK_ABRT_TMF. Drivers
like bnx2i that need to free resources if a command did not complete normally
can then check the task state. If a driver does not need to send
a special command if we have dropped the session then they can check
for ISCSI_TASK_ABRT_SESS_RECOV.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
When we create the tcp/ip connection by calling ep_connect, we currently
just go by the routing table info.
I think there are two problems with this.
1. Some drivers do not have access to a routing table. Some drivers like
qla4xxx do not even know about other ports.
2. If you have two initiator ports on the same subnet, the user may have
set things up so that session1 was supposed to be run through port1. and
session2 was supposed to be run through port2. It looks like we could
end with both sessions going through one of the ports.
Fixes for cxgb3i from Karen Xie.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Network device sysfs files that grab the rtnl_lock unconditionally
will deadlock if accessed when the network device is being
unregistered. So use trylock and syscall_restart to avoid this
deadlock.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With a postfix increment, i is incremented one past 10K/5K before the
loop ends, so the error messages will be displayed too soon if the
test succeeds on the last iteration. Fix the comparisons to be >
instead of >=.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The queue map for flush completion circumvention is only used for
kernel space queue pairs. This patch skips the allocation of the
queue maps in case the QP is created for userspace. In addition, this
patch does not iomap the galpas for kernel usage if the queue pair is
only used in userspace. These changes will improve the performance of
creation of userspace queue pairs.
Signed-off-by: Stefan Roscher <stefan.roscher@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
In case of large queue pairs there is the possibillity of allocation
failures due to memory fragmentation when using kmalloc(). To ensure
the memory is allocated even if kmalloc() can not find chunks which
are big enough, we fall back to allocating the memory with vmalloc().
Signed-off-by: Stefan Roscher <stefan.roscher@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
To improve performance of driver resource allocation, replace
vmalloc() calls with kmalloc().
Signed-off-by: Stefan Roscher <stefan.roscher@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
IB/mlx4: Don't overwrite fast registration page list when posting work request
RDMA/cxgb3: Don't complete flushed send work requests twice
The low-level mlx4 driver modified the page-list addresses for fast
register work requests post send to big-endian, and set a "present"
bit. This caused problems later when the consumer attempted to unmap
the pages using the page-list (using the list addresses which were
assumed to be still in CPU-endian order). Fix the mlx4 driver to
allocate two buffers and use a private buffer for the hardware-format
bus addresses.
This patch fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1571>,
an NFS/RDMA server crash. The cause of the crash was found by Vu Pham
of Mellanox. The fix is along the lines suggested by Steve Wise in
comment #21 in bug 1571.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When the SQ is flushed, mark the flushed entries as not signaled so
the poll logic doesn't re-insert the CQ entry thinking its an out of
order completion.
The bug can cause the NFS/RDMA server to crash due to processing the
same completed work request twice.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If reg_phys_mem() fails, we need to free memory allocated for MPA
frame with private data before returning the error. Also move
nes_add_ref() after the reg_phys_mem() is successful.
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Running large cluster setup, we are hanging after many hours of
testing. Fixing this required going over the code and making sure the
rexmit entry was properly removed based on the cm_node's state and
packet received. Also when receiving a FIN packet, check seq# and
make sure there were no errors before calling handle_fin().
Following are the changes done in nes_cm.c:
* handle_ack_pkt() needs to return error value, so in case of error,
handle_fin() is not called. Some cleanup done while going over the code.
* handle_rst_pkt(), handling of cm_node's NES_CM_STATE_LAST_ACK is missing.
* process_packet(), in case of FIN only packet is received, call
check_seq() before processing.
* in handle_fin_pkt(), we are calling cleanup_retrans_entry() for all
conditions, even if the packets need to be dropped.
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Under heavy load with large cluster testing, it may take longer to
receive a response to MPA requests. Change the driver to wait longer
after each rexmit to max time value.
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
check_seq() was not checking if the seq#s have wrapped. Fix it.
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When a connect request comes, apbvt should only be set for
non-loopback connections.
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Remove the NES_DEBUG that is causing the compile warning about an
unused variable when INFINIBAND_NES_DEBUG is not enabled.
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
/sys/class/infiniband/nes?/fw_ver is not displaying firmware version
properly (it shows 0.0.0 with the current code). Fill in the correct
firmware version number.
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
With updated PHY firmware for SFP_D, setting the trace length to 1
inch for SFP_D provides a more stable link.
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Enable repause timer for port 1. Without this setting, under stress,
the chip may misbehave.
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
In commit 1b949324 ("RDMA/nes: Fix SFP+ PHY initialization") there is
a mistake in the clean up code that removed port 1 CDR loop filter
settings for 10G cards other than CX4. Put the correct setting back
for appropriate PHY types.
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Change thermo mitigation code to flip the SerDes1 reference clock to
internal, to match the change in commit a4849fc1 ("RDMA/nes: Add
wide_ppm_offset parm for switch compatibility").
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Set target can queue limit to the number of preallocated
session tasks we have.
This along with the cxgb3i can_queue patch will fix a throughput
problem where it could only queue one LU worth of data at a time.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
In error paths where a CQ is not created, pbl is not freeed properly.
In nes_destroy_cq(), add the corresponding check for nescq->mcrqf to
not call nes_free_resource() when it is already done in nes_create_cq().
Signed-off-by: Miroslaw Walukiewicz <miroslaw.walukiewicz@intel.com>
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Commands INIT_HCA, CLOSE_HCA, SYS_EN, SYS_DIS, and CLOSE_IB all have 1
second timeouts. For INIT_HCA this causes problems when had more than
2^18 are QPs configured, since the command takes more than 1 second to
complete.
All other commands have 60-second timeouts. This patch makes the
above commands consistent with the rest of the commands (and with the
chip documentation).
This patch is an expansion of a patch from Arthur Kepner
<akepner@sgi.com> fixing just the INIT_HCA timeout.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
QP attributes must stay initialized when moving back to IDLE. Zeroing
them will crash the system in _flush_qp() if the QP is subsequently
moved to ERROR and back to IDLE.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The code incorrectly failed memory registration if the buffer was not
page aligned. Also, the length field is mangled causing the hardware
to think the registration is much larger than it really is.
The fix is to remove the page alignment restriction as well the
incorrect length adjustment. Also make sure that all buffers after
the first start at a page boundary, and all buffers except the last
end on a page boundary.
Signed-off-by: Don Wood <donald.e.wood@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Initialize pbl_count_256 to 0 to get rid of the warning:
drivers/infiniband/hw/nes/nes_verbs.c: In function 'nes_reg_mr':
drivers/infiniband/hw/nes/nes_verbs.c:1955: warning: 'pbl_count_256' may be used uninitialized in this function
Reported-by: Roland Dreier <rdreier@cisco.com>
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If NAPI is enabled while IPoIB's CQ is being drained, it creates a
race on priv->ibwc between ipoib_poll() and ipoib_drain_cq(), leading
to memory corruption.
The solution is to enable/disable NAPI in ipoib_ib_dev_{open/stop}()
instead of in ipoib_{open/stop}(), and sync NAPI on the INITIALIZED
flag instead on the ADMIN_UP flag. This way NAPI will be disabled when
ipoib_drain_cq() is called.
This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1587>.
Signed-off-by: Yossi Etigin <yosefe@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
NFS/RDMA currently fails to set up connections if peer2peer is on.
This is due to the fact that the NFS/RDMA client sets its ORD to 0.
If peer2peer is set, make sure the active side ORD is >= 1 and the
passive side IRD is >=1.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
RDMA/nes: Add support for new SFP+ PHY
RDMA/nes: Add wide_ppm_offset parm for switch compatibility
RDMA/nes: Fix SFP+ PHY initialization
RDMA/nes: Fix nes_nic_cm_xmit() error handling
RDMA/nes: Fix error handling issues
RDMA/nes: Fix incorrect casts on 32-bit architectures
IPoIB: Document newish features
RDMA/cma: Create cm id even when IB port is down
RDMA/cma: Use rate from IPoIB broadcast when joining IPoIB multicast groups
IPoIB: Avoid free_netdev() BUG when destroying a child interface
mlx4_core: Don't leak mailbox for SET_PORT on Ethernet ports
RDMA/cxgb3: Release dependent resources only when endpoint memory is freed.
RDMA/cxgb3: Handle EEH events
IB/mlx4: Use pgprot_writecombine() for BlueFlame pages
Add new register settings for new SFP+ PHY/firmware.
Add new PHY to to nes_netdev_get/set_settings.
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
We have observed unstable link with a new BNT switch.
Add wide_ppm_offset parameter to allow the user to control the clock
ppm offset on the CX4 interface for better compatibility. Default is
100ppm, setting it to 1 will increase it to 300ppm. Change default
SerDes1 reference clock to external source.
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
SFP+ PHY initialization has very long delays, incorrect settings for
direct attach copper cables, and inconsistent link detection.
Adjust delays to the minimum required by the PHY. Worst case is now
less than 4 seconds. Add new register settings for direct attach
cables. Change link detection logic to use two new registers for more
consistent link state detection. Reorganize code to shorten line
length.
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
We are getting crash or hung situation when we are running network
cable pull tests during RDMA traffic.
In schedule_nes_timer(), we return an error if nes_nic_cm_xmit()
returns failure. This is changed to success as skb is being put on
the timer routines to be processed later. In send_syn() case, we are
indicating connect failure once from nes_connect() and the other when
the rexmit retries expires.
The other issue is skb->users which we are incrementing before calling
nes_nic_cm_xmit() which calls dev_queue_xmit() but in case of failure
we are decrementing the skb->users at the same time putting the skb on
the rexmit path. Even if dev_queue_xmit() fails, the skb->users is
decremented already. We are removing the decrement of skb->users in
case of failure from both schedule_nes_timer() as well as from
nes_cm_timer_tick().
There is also extra check in nes_cm_timer_tick() for rexmit failure
which does a break from the loop is removed. This causes problem as
the other nodes have their cm_node->ref_count incremented and are not
processed.
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix issues found by static code analysis:
(1) Check if cm_node was successfully created for loopback connection.
(2) schedule_nes_timer() does not free up allocated memory after
encountering an error. There is a WARN_ON() for this condition.
(3) there is a cm_node->freed flag which is set but not used.
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The were some incorrect casts to unsigned long that caused 64-bit values
to be truncated on 32-bit architectures and made the driver pass invalid
adresses and lengths to the hardware. The problems were primarily seen
with kernels with highmem configured but some could show up in
non-highmem kernels, too.
Signed-off-by: Don Wood <donald.e.wood@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When doing rdma_resolve_addr(), if the relevant IB port is down, the
function fails and the cm_id is not bound to the correct device.
Therefore, application does not have a device handle and cannot wait
for the port to become active. The function fails because the
underlying IPoIB interface is not joined to the broadcast group and
therefore the SA does not have a multicast record to take a Q_Key
from.
The fix is to use lazy Q_Key resolution - cma_set_qkey() will set
id_priv->qkey if it was not set, and will be called just before the
Q_Key is really required.
Signed-off-by: Yossi Etigin <yosefe@voltaire.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Replace all DMA_32BIT_MASK macro with DMA_BIT_MASK(32)
Signed-off-by: Yang Hongyang<yanghy@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Replace all DMA_64BIT_MASK macro with DMA_BIT_MASK(64)
Signed-off-by: Yang Hongyang<yanghy@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When joining an IPoIB multicast group, use the same rate as in the
broadcast group. Otherwise, if the RDMA CM creates this group before
IPoIB does, it might get a different rate. This will cause IPoIB to
fail joining to the same group later on, because IPoIB uses strict
rate selection.
Signed-off-by: Yossi Etigin <yosefe@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
We have to release the RTNL before calling free_netdev() so that the
device state has a chance to become NETREG_UNREGISTERED. Otherwise
when removing a child interface, we hit the BUG() that tests the
device state in free_netdev().
Reported-by: Yossi Etigin <yosefe@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The cxgb3 l2t entry, hwtid, and dst entry were being released before
all the iwch_ep references were released. This can cause a crash in
t3_l2t_send_slow() and other places where the l2t entry is used.
The fix is to defer releasing these resources until all endpoint
references are gone.
Details:
- move flags field to the iwch_ep_common struct.
- add a flag indicating resources are to be released.
- release resources at endpoint free time instead of close/abort time.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
- wrap calls into cxgb3 and fail them if we're in the middle
of a PCI EEH event.
- correctly unwind and release endpoint and other resources when
we are in an EEH event.
- dispatch IB_EVENT_DEVICE_FATAL event when cxgb3 notifies iw_cxgb3 of
a fatal error.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The PAT work on x86 has finally made pgprot_writecombine() a usable API
for modular drivers. As the comment indicates, this is exactly what we
want to use in mlx4_ib to map BlueFlame pages up to userspace, since
using WC for these pages improves small message latency significantly.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When net-next and infiniband were merged upstream, each branch deleted
one of a pair of adjacent lines from nes_nic.c, but when Linus fixed the
conflict up, he brought back both of the lines. Fix up to the intended
final tree state.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1750 commits)
ixgbe: Allow Priority Flow Control settings to survive a device reset
net: core: remove unneeded include in net/core/utils.c.
e1000e: update version number
e1000e: fix close interrupt race
e1000e: fix loss of multicast packets
e1000e: commonize tx cleanup routine to match e1000 & igb
netfilter: fix nf_logger name in ebt_ulog.
netfilter: fix warning in ebt_ulog init function.
netfilter: fix warning about invalid const usage
e1000: fix close race with interrupt
e1000: cleanup clean_tx_irq routine so that it completely cleans ring
e1000: fix tx hang detect logic and address dma mapping issues
bridge: bad error handling when adding invalid ether address
bonding: select current active slave when enslaving device for mode tlb and alb
gianfar: reallocate skb when headroom is not enough for fcb
Bump release date to 25Mar2009 and version to 0.22
r6040: Fix second PHY address
qeth: fix wait_event_timeout handling
qeth: check for completion of a running recovery
qeth: unregister MAC addresses during recovery.
...
Manually fixed up conflicts in:
drivers/infiniband/hw/cxgb3/cxio_hal.h
drivers/infiniband/hw/nes/nes_nic.c
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (30 commits)
RDMA/cxgb3: Enforce required firmware
IB/mlx4: Unregister IB device prior to CLOSE PORT command
mlx4_core: Add link type autosensing
mlx4_core: Don't perform SET_PORT command for Ethernet ports
RDMA/nes: Handle MPA Reject message properly
RDMA/nes: Improve use of PBLs
RDMA/nes: Remove LLTX
RDMA/nes: Inform hardware that asynchronous event has been handled
RDMA/nes: Fix tmp_addr compilation warning
RDMA/nes: Report correct vendor_id and vendor_part_id
RDMA/nes: Update copyright to new legal entity and year
RDMA/nes: Account for freed PBL after HW operation
IB: Remove useless ibdev_is_alive() tests from sysfs code
IB/sa_query: Fix AH leak due to update_sm_ah() race
IB/mad: Fix ib_post_send_mad() returning 0 with no generate send comp
IB/mad: initialize mad_agent_priv before putting on lists
IB/mad: Fix null pointer dereference in local_completions()
IB/mad: Fix RMPP header RRespTime manipulation
IB/iser: Remove hard setting of path MTU
mlx4_core: Add device IDs for MT25458 10GigE devices
...
The cxgb3 NIC driver can handle more firmware versions than iw_cxgb3,
and since commit 8207befa ("cxgb3: untie strict FW matching") cxgb3
will load with firmware versions that iw_cxgb3 can't handle. The FW
major number indicates a specific interface between the FW and
iw_cxgb3. Thus if the major number of the running firmware does not
match the required version compiled into iw_cxgb3, then iw_cxgb3 must
not register that device.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Also, removed unnecessary memset() since alloc_netdev returns
zeroed memory.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert this driver to new net_device_ops infrastructure.
Also use default net_device get-stats infrastructure
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
According to the ConnectX programmer's reference manual, all
operations should be stopped, all QPs should be torn down and all WQEs
flushed before the CLOSE_PORT command is invoked. In some cases
reversing the order of operations (as implemented now) could cause
a loss of completions.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
We do not need to have llds set the host no for the session's
parent, because we know the session's parent is going to be
the host. This removes it from the session creation callback
and converts the drivers.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
The qdepth setting was useful when we needed libiscsi to verify
the setting. Now we just need to make sure if older tools
passed in zero then we need to set some default.
So this patch just has us use the sht->cmd_per_lun or if
for LLD does a host per session then we can set it on per
host basis.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
We were using the shost work queue which ended up being
a little akward since all iscsi hosts need a thread for
scanning, but only drivers hooked into libiscsi need
a workqueue for transmitting. So this patch moves the
xmit workqueue to the lib.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
There is no need to cap the queue depth in the modules. We set
this in userspace and can do that there. For performance testing
with ram based targets, this is helpful since we can have very
high queue depths.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
iser has its own logging inrfastrucutre. Convert it to use
it instead of libiscsi.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
STag zero is a special STag that allows consumers to access any bus
address without registering memory. The nes driver unfortunately
allows STag zero to be used even with QPs created by unprivileged
userspace consumers, which means that any process with direct verbs
access to the nes device can read and write any memory accessible to
the underlying PCI device (usually any memory in the system). Such
access is usually given for cluster software such as MPI to use, so
this is a local privilege escalation bug on most systems running this
driver.
The driver was using STag zero to receive the last streaming mode
data; to allow STag zero to be disabled for unprivileged QPs, the
driver now registers a special MR for this data.
Cc: <stable@kernel.org>
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
While doing testing, there are failures as MPA Reject call is not
handled. To handle MPA Reject call, following changes are done:
*Handle inbound/outbound MPA Reject response message.
When nes_reject() is called for pending MPA request reply,
send the MPA Reject message to its peer (active
side)cm_node. The peer cm_node (active side) will indicate
Reject message event for the pending Connect Request.
*Handle MPA Reject response message for loopback connections and listener.
When MPA Request is rejected, check if it is a loopback
connection and if it is then it will send Reject message event
to its peer loopback node. Also when destroying listener,
check if the cm_nodes for that listener are loopback or not.
*Add gracefull connection close with the MPA Reject response message.
Send gracefull close (FIN, FIN ACK..) to terminate the cm_nodes.
*Some code re-org while making the above changes.
Removed recv_list and recv_list_lock from the cm_node
structure as there can be only one receive close entry on the
timer. Also implemented handle_recv_entry() as receive close
entry is processed from both nes_rem_ref_cm_node() as well as
nes_cm_timer_tick().
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Two level 256 byte PBLs was not implemented so the driver could report
out of memory when in fact there were PBLs still available.
This solution prefers to use 4KB PBLs over two level 256B PBLs until
the number of 4KB PBLs falls below a threshold. At this point the 4KB
PBL structure is converted to use 256B PBLs which prevents the driver
from running out of 4KB PBLs too quickly.
Signed-off-by: Don Wood <donald.e.wood@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
NETIF_F_LLTX is deprecated. Remove private TX locking from the driver
and remove the NETIF_F_LLTX feature flag. This also fixes a warning
in some configs that comes from doing skb_linearize() call in the
hard_start_xmit method with IRQs disabled (if HIGHMEM is enabled,
skb_linearize() may end up enabling BHs, which is a no-no if hard IRQs
are disabled in that context). By getting rid of LLTX, we do not
disable IRQs when skb_linearize() is called.
Remove the sq_lock as it is not needed for non-LLTX. Fix ethtool not
to show the counter for sq_lock.
Reported-by: aluno3@poczta.onet.pl
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When asynchronous events are processed by software, it is necessary
to let the hardware know that software has handled the event. This
frees up the entry in the asynchronous event queue.
Signed-off-by: Don Wood <donald.e.wood@intel.com>
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
In find_node(), tmp_addr causes an "unused variable" warning when
INFINIBAND_NES_DEBUG is not defined. It's only used in a nes_debug()
and the print does not make sense. So take out the whole thing.
Reported-by: Manish Katiyar <mkatiyar@gmail.com>
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
ibv_devinfo displays 0 for vendor_id and vendor_part_id. Fill in OUI
and device_id for those two fields.
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Update copyright to the new legal entity, Intel-NE, Inc., an Intel
company. Update copyright for the new year.
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix occurrences where the software PBL counts were changed before the
hardware was updated. This bug allowed another thread to overallocate
the hardware resources.
Add proper PBL accounting in case nes_reg_mr() fails.
Signed-off-by: Don Wood <donald.e.wood@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Some attribute show functions test ibdev_is_alive() to make sure that
it's OK to access device state. However, the sysfs attributes will
not be registered until the device is fully initialized, and they'll
be unregistered before anything is torn down, so ibdev_is_alive()
doesn't do anything useful. Remove it.
Signed-off-by: Roland Dreier <rolandd@cisco.com>