* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
mlx4_core: Don't double-free IRQs when falling back from MSI-X to INTx
IB/mthca: Don't double-free IRQs when falling back from MSI-X to INTx
IB/mlx4: Add strong ordering to local inval and fast reg work requests
IB/ehca: Remove superfluous bitmasks from QP control block
RDMA/cxgb3: Limit fast register size based on T3 limitations
RDMA/cxgb3: Report correct port state and MTU
mlx4_core: Add module parameter for number of MTTs per segment
IB/mthca: Add module parameter for number of MTTs per segment
RDMA/nes: Fix off-by-one bugs in reset_adapter_ne020() and init_serdes()
infiniband: Remove void casts
IB/ehca: Increment version number
IB/ehca: Remove unnecessary memory operations for userspace queue pairs
IB/ehca: Fall back to vmalloc() for big allocations
IB/ehca: Replace vmalloc() with kmalloc() for queue allocation
When both MSI-X and legacy INTx fail to generate an interrupt, the
driver frees the MSI-X interrupts twice. Fix this by clearing the
have_irq flag for the MSI-X interrupts when they are freed the first
time. This is the same bug that was reported in ib_mthca by Yinghai
Lu <yhlu.kernel@gmail.com>.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If mlx4_create_eq() would fail for one of EQ's assigned for
completion handling, the code would try to free the same EQ
we failed to create.
The crash was found by Christoph Lameter
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
By default the driver opens 8 TX queues (defined by MLX4_EN_NUM_TX_RINGS).
If the driver is configured to support Per Priority Flow Control, we open
8 additional TX rings.
dev->real_num_tx_queues is always set to be MLX4_EN_NUM_TX_RINGS.
The mlx4_en_select_queue() function uses standard hashing (skb_tx_hash)
in case that PPFC is not supported or the skb contain a vlan tag,
otherwise the queue is selected according to vlan priority.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
The interrupt moderation should not depend on number of incoming
bytes, but on number of incoming packets.
The previous scheme caused very high interrupts rate for small
messages when big MTU was configured.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the initialization of one of the ports failed,
there is no need to fail the other one as well.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
en_params.c file now only handles Ethtool functionality
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
For each debug message, the message will show interface name in case
that the net device was registered, and PCI bus ID with port number
if we were not registered yet. Messages that are not port/netdev specific
stayed in the old format
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the transmit queue gets full we enable interrupts for TX completions
There was a race that we handled the TX queue both from the interrupt context
and from the transmit function. Using "spin_trylock_irq()" ensures this
doesn't happen.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
This replaces dma_sync_single() with dma_sync_single_for_cpu() because
dma_sync_single() is an obsolete API; include/linux/dma-mapping.h says:
/* Backwards compat, remove in 2.7.x */
#define dma_sync_single dma_sync_single_for_cpu
#define dma_sync_sg dma_sync_sg_for_cpu
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
Followup of commits 9d21493b4b
and 08baf56108
(net: tx scalability works : trans_start)
(net: txq_trans_update() helper)
Now that core network takes care of trans_start updates, dont do it
in drivers themselves, if possible. Multi queue drivers can
avoid one cache miss (on dev->trans_start) in their start_xmit()
handler.
Exceptions are NETIF_F_LLTX drivers (vxge & tehuti)
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current MTT allocator uses kmalloc() to allocate a buffer for its
buddy allocator, and thus is limited in the amount of MTT segments
that it can control. As a result, the size of memory that can be
registered is limited too. This patch uses a module parameter to
control the number of MTT entries that each segment represents,
allowing more memory to be registered with the same number of
segments.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
In case of allocation failure, the actual ring size is rounded down to
nearest power of 2. The remaining descriptors are freed.
The CQ and SRQ are allocated with the actual size and the mask is updated.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
Napi structures are being created each time we open a port, but when
the port is closed the napi structure is only disabled but not removed.
This bug caused hang while removing the driver.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the return after the goto. We want the goto because it frees
memory as well as returning err.
Found by smatch (http://repo.or.cz/w/smatch.git).
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (24 commits)
e100: do not go D3 in shutdown unless system is powering off
netfilter: revised locking for x_tables
Bluetooth: Fix connection establishment with low security requirement
Bluetooth: Add different pairing timeout for Legacy Pairing
Bluetooth: Ensure that HCI sysfs add/del is preempt safe
net: Avoid extra wakeups of threads blocked in wait_for_packet()
net: Fix typo in net_device_ops description.
ipv4: Limit size of route cache hash table
Add reference to CAPI 2.0 standard
Documentation/isdn/INTERFACE.CAPI
update Documentation/isdn/00-INDEX
ixgbe: Fix WoL functionality for 82599 KX4 devices
veth: prevent oops caused by netdev destructor
xfrm: wrong hash value for temporary SA
forcedeth: tx timeout fix
net: Fix LL_MAX_HEADER for CONFIG_TR_MODULE
mlx4_en: Handle page allocation failure during receive
mlx4_en: Fix cleanup flow on cq activation
vlan: update vlan carrier state for admin up/down
netfilter: xt_recent: fix stack overread in compat code
...
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (21 commits)
RDMA/nes: Update iw_nes version
RDMA/nes: Fix error path in nes_accept()
RDMA/nes: Fix hang issues for large cluster dynamic connections
RDMA/nes: Increase rexmit timeout interval
RDMA/nes: Check for sequence number wrap-around
RDMA/nes: Do not set apbvt entry for loopback
RDMA/nes: Fix unused variable compile warning when INFINIBAND_NES_DEBUG=n
RDMA/nes: Fix fw_ver in /sys
RDMA/nes: Set trace length to 1 inch for SFP_D
RDMA/nes: Enable repause timer for port 1
RDMA/nes: Correct CDR loop filter setting for port 1
RDMA/nes: Modify thermo mitigation to flip SerDes1 ref clk to internal
RDMA/nes: Fix resource issues in nes_create_cq() and nes_destroy_cq()
RDMA/nes: Remove root_256()'s unused pbl_count_256 parameter
mlx4_core: Fix memory leak in mlx4_enable_msi_x()
IB/mthca: Fix timeout for INIT_HCA and a few other commands
RDMA/cxgb3: Don't zero QP attrs when moving to IDLE
RDMA/nes: Fix bugs in nes_reg_phys_mr()
RDMA/nes: Fix compiler warning at nes_verbs.c:1955
IPoIB: Disable NAPI while CQ is being drained
...
If we failed to allocate new fragments for receive buffer,
the packet should be dropped and packets should be reused.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case of mlx4_en_activate_cq() failure, the cleanup
code would go to rx_err and try to disable unactivated rings.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the msi_x option is enabled but pci_enable_msix() fails (not
enough vectors are available etc), the entries array was not freed on
the error path.
Signed-off-by: Nicolas Morey-Chaisemartin <nicolas.morey-chaisemartin@ext.bull.net>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If creating a workqueue fails, don't jump to the error path where that
same workqueue is destroyed, since destroy_workqueue() can't handle a
NULL pointer.
This was spotted by the Coverity checker (CID 2617).
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The per ring counters are implemented in SW. Now moving to have the total
counters as the sum of all rings. This way the numbers will always be consistent
and we no longer depend on HW buffer size limitations for those counters
that can be insufficient in some cases.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
The former usage was to set the NETIF_F_HW_CSUM flag which is not used
in get_tx_csum. It caused Ethtool to show tx checksum as "on" even
though it was turned off in previous operation.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
The low level driver always assumes this handler exists.
The lack of it could cause kernel panic
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
The query whether the port is up or not should be done at
the execution of the restart task and not when it is queued.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case of failure of either srq creation or page allocation,
the cleanup code handled the failed ring as well, and tried
to destroy resources that where not allocated.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
RDMA/nes: Add support for new SFP+ PHY
RDMA/nes: Add wide_ppm_offset parm for switch compatibility
RDMA/nes: Fix SFP+ PHY initialization
RDMA/nes: Fix nes_nic_cm_xmit() error handling
RDMA/nes: Fix error handling issues
RDMA/nes: Fix incorrect casts on 32-bit architectures
IPoIB: Document newish features
RDMA/cma: Create cm id even when IB port is down
RDMA/cma: Use rate from IPoIB broadcast when joining IPoIB multicast groups
IPoIB: Avoid free_netdev() BUG when destroying a child interface
mlx4_core: Don't leak mailbox for SET_PORT on Ethernet ports
RDMA/cxgb3: Release dependent resources only when endpoint memory is freed.
RDMA/cxgb3: Handle EEH events
IB/mlx4: Use pgprot_writecombine() for BlueFlame pages
Replace all DMA_32BIT_MASK macro with DMA_BIT_MASK(32)
Signed-off-by: Yang Hongyang<yanghy@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Replace all DMA_64BIT_MASK macro with DMA_BIT_MASK(64)
Signed-off-by: Yang Hongyang<yanghy@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It is a fairly common operation to have a pointer to a work and to need a
pointer to the delayed work it is contained in. In particular, all
delayed works which want to rearm themselves will have to do that. So it
would seem fair to offer a helper function for this operation.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Greg KH <greg@kroah.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 793730bf ("mlx4_core: Don't perform SET_PORT command for
Ethernet ports") introduced a leak of mailbox buffers when SET_PORT
was called for Ethernet ports, since it added a return after the
mailbox was allocated. Fix this by checking the port type and
returning *before* allocating the mailbox.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1750 commits)
ixgbe: Allow Priority Flow Control settings to survive a device reset
net: core: remove unneeded include in net/core/utils.c.
e1000e: update version number
e1000e: fix close interrupt race
e1000e: fix loss of multicast packets
e1000e: commonize tx cleanup routine to match e1000 & igb
netfilter: fix nf_logger name in ebt_ulog.
netfilter: fix warning in ebt_ulog init function.
netfilter: fix warning about invalid const usage
e1000: fix close race with interrupt
e1000: cleanup clean_tx_irq routine so that it completely cleans ring
e1000: fix tx hang detect logic and address dma mapping issues
bridge: bad error handling when adding invalid ether address
bonding: select current active slave when enslaving device for mode tlb and alb
gianfar: reallocate skb when headroom is not enough for fcb
Bump release date to 25Mar2009 and version to 0.22
r6040: Fix second PHY address
qeth: fix wait_event_timeout handling
qeth: check for completion of a running recovery
qeth: unregister MAC addresses during recovery.
...
Manually fixed up conflicts in:
drivers/infiniband/hw/cxgb3/cxio_hal.h
drivers/infiniband/hw/nes/nes_nic.c
When a port's link is down (except to driver restart) and the port is
configured for auto sensing, we try to sense port link type (Ethernet
or InfiniBand) in order to determine how to initialize the port. If
the port type needs to be changed, all mlx4 for the device interfaces
are unregistered and then registered again with the new port
types. Sensing is done with intervals of 3 seconds.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The same operation is performed when the Ethernet driver initializes
the port.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Following the removal of the unused struct net_device * parameter from
the NAPI functions named *netif_rx_* in commit 908a7a1, they are
exactly equivalent to the corresponding *napi_* functions and are
therefore redundant.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix
drivers/net/mlx4/profile.c: In function `mlx4_make_profile':
drivers/net/mlx4/profile.c:110: warning: comparison of distinct pointer types lacks a cast
This happened because num_possible_cpus() was secretly changed by
commit ae7a47e7 ("cpumask: make cpumask.h eat its own dogfood.") from
returning "int" to (now) returning "unsigned int". I think that was a
good change, so we should just swallow the fallout.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Cc: Jack Morgenstein <jackm@dev.mellanox.co.il>
Cc: Vladimir Sokolovsky <vlad@mellanox.co.il>
Cc: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
IB/iser: Add dependency on INFINIBAND_ADDR_TRANS
IPoIB: Do not join broadcast group if interface is brought down
RDMA/nes: Fix for NIPQUAD removal
IPoIB: Fix loss of connectivity after bonding failover on both sides
IB/mlx4: Don't register IB device for adapters with no IB ports
mlx4_core: Fix warning from min()
IB/ehca: spin_lock_irqsave() takes an unsigned long
Some devices were converted incorrectly and are missing the validate
address hooks.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Recent cpumask changes changed num_possible_cpus() from returning an int
to returning an unsigned int. This means that doing
min(num_possible_cpus(), <int expression>)
now produces a warning like
drivers/net/mlx4/main.c: In function 'mlx4_enable_msi_x':
drivers/net/mlx4/main.c:915: warning: comparison of distinct pointer types lacks a cast
Fix this by using min_t(int, ...).
Signed-off-by: Roland Dreier <rolandd@cisco.com>