Commit Graph

251 Commits

Author SHA1 Message Date
David J. Wilder 0cd4d0fd9b IPoIB: Clear ipoib_neigh.dgid in ipoib_neigh_alloc()
IPoIB can miss a change in destination GID under some conditions.  The
problem is caused when ipoib_neigh->dgid contains a stale address.
The fix is to set ipoib_neigh->dgid to zero in ipoib_neigh_alloc().

This can happen when a system using bonding on its IPoIB interfaces
has switched its active interface from interface A to B and back to A.
The system that fails over will not correctly processes the 2nd
address change, as described below.

When an address has changed neighbor->ha is updated with the new
address.  Each neighbor has an associated ipoib_neigh.
ipoib_neigh->dgid also holds a copy of the remote node's hardware
address.  When an address changes neighbor->ha is updated by the
network layer (arp code) with the new address.  IPoIB detects this
change in ipoib_start_xmit() by comparing neighbor->ha with
ipoib_neigh->dgid.  The bug is that ipoib_neigh->dgid may already
contain the new address (A) thus the change from B to A is missed by
ipoib.  Here is the sequence of events:

    ipoib_neigh->dgid = A  and  neighbor->ha = A

The address is switched to B (the first switch)

    neighbor->ha = B

The change is seen in ipoib_start_xmit() -- neighbor->ha !=
ipoib_neigh->dgid so ipoib_neigh is released, and a new one is
allocated.

The allocator may return the same chunk of memory that was just
released, therefore ipoib_neigh->dgid still contains A at this point.

ipoib_neigh->dgid should be updated in neigh_add_path(), but if the
following conditions are true dgid is not updated:

        1) __path_find() returns a path
        2) path->ah is NULL

The remote system now switches from address B to A, neighbor->ha is
updated to A.

Now we have again : ipoib_neigh->dgid = A  and  neighbor->ha = A

Since the addresses are the same ipoib won't process the change in
address.  Fix this by zeroing out the dgid field when allocating a new
struct ipoib_neigh.

Signed-off-by: David Wilder <dwilder@us.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-12-09 10:03:00 -08:00
Linus Torvalds d7757be133 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  IPoIB: Don't turn on carrier for a non-active port
  IB/mthca: Fix access to freed memory in catastrophic event handling
  mlx4_core: Pass cache line size to device FW
  RDMA/nes: Remove duplicate .ndo_set_mac_address field initialization
  IB/mad: Fix lock-lock-timer deadlock in RMPP code
2009-09-24 17:06:01 -07:00
Moni Shoua 5ee9512084 IPoIB: Don't turn on carrier for a non-active port
Multicast joins can succeed even if the IB port is down.  This happens
when the SM runs on the same port with the requesting port.  However,
IPoIB calls netif_carrier_on() when the join of the broadcast group
succeeds, without caring about the state of the IB port.  The result
is an IPoIB interface in RUNNING state but without an active IB port
to support it.

If a bonding interface uses this IPoIB interface as a slave it might
not detect that this slave is almost useless and failover
functionality will be damaged.  The fix checks the state of the IB
port in the carrier_task before calling netif_carrier_on().

Adresses: https://bugs.openfabrics.org/show_bug.cgi?id=1726
Signed-off-by: Moni Shoua <monis@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-24 12:01:05 -07:00
Linus Torvalds d7e9660ad9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1623 commits)
  netxen: update copyright
  netxen: fix tx timeout recovery
  netxen: fix file firmware leak
  netxen: improve pci memory access
  netxen: change firmware write size
  tg3: Fix return ring size breakage
  netxen: build fix for INET=n
  cdc-phonet: autoconfigure Phonet address
  Phonet: back-end for autoconfigured addresses
  Phonet: fix netlink address dump error handling
  ipv6: Add IFA_F_DADFAILED flag
  net: Add DEVTYPE support for Ethernet based devices
  mv643xx_eth.c: remove unused txq_set_wrr()
  ucc_geth: Fix hangs after switching from full to half duplex
  ucc_geth: Rearrange some code to avoid forward declarations
  phy/marvell: Make non-aneg speed/duplex forcing work for 88E1111 PHYs
  drivers/net/phy: introduce missing kfree
  drivers/net/wan: introduce missing kfree
  net: force bridge module(s) to be GPL
  Subject: [PATCH] appletalk: Fix skb leak when ipddp interface is not loaded
  ...

Fixed up trivial conflicts:

 - arch/x86/include/asm/socket.h

   converted to <asm-generic/socket.h> in the x86 tree.  The generic
   header has the same new #define's, so that works out fine.

 - drivers/net/tun.c

   fix conflict between 89f56d1e9 ("tun: reuse struct sock fields") that
   switched over to using 'tun->socket.sk' instead of the redundantly
   available (and thus removed) 'tun->sk', and 2b980dbd ("lsm: Add hooks
   to the TUN driver") which added a new 'tun->sk' use.

   Noted in 'next' by Stephen Rothwell.
2009-09-14 10:37:28 -07:00
Jason Gunthorpe 5e47596bee IPoIB: Check multicast address format
Check that the format of multicast link addresses is correct before
taking them from dev->mc_list to priv->multicast_list.  This way we
never try to send a bogus address to the SA, which prevents badness
from erronous 'ip maddr addr add', broken bonding drivers, etc.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:23:40 -07:00
Roland Dreier 721d67cdca IPoIB: Drop priv->lock before calling ipoib_send()
IPoIB currently must use irqsave locking for priv->lock, since it is
taken from interrupt context in one path.  However, ipoib_send() does
skb_orphan(), and the network stack locking is not IRQ-safe.
Therefore we need to make sure we don't hold priv->lock when calling
ipoib_send() to avoid lockdep warnings (the code was almost certainly
safe in practice, since the only code path that takes priv->lock from
interrupt context would never call into the network stack).

Addresses: http://bugzilla.kernel.org/show_bug.cgi?id=13757
Reported-by: Bart Van Assche <bart.vanassche@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:23:40 -07:00
Roland Dreier cd0bcf4cb9 IPoIB: Remove unused <rdma/ib_cache.h> includes
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:23:38 -07:00
Eric Dumazet 451f144398 drivers: Kill now superfluous ->last_rx stores
The generic packet receive code takes care of setting
netdev->last_rx when necessary, for the sake of the
bonding ARP monitor.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Neil Horman <nhorman@txudriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-02 23:07:36 -07:00
Eric Dumazet adf30907d6 net: skb->dst accessors
Define three accessors to get/set dst attached to a skb

struct dst_entry *skb_dst(const struct sk_buff *skb)

void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)

void skb_dst_drop(struct sk_buff *skb)
This one should replace occurrences of :
dst_release(skb->dst)
skb->dst = NULL;

Delete skb->dst field

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-03 02:51:04 -07:00
Eric Dumazet 86d15cd833 net: unset IFF_XMIT_DST_RELEASE for qeth and ipoib
Last two drivers that need skb->dst in their start_xmit() function

Tell dev_hard_start_xmit() to no release it by unsetting  IFF_XMIT_DST_RELEASE

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-30 23:04:46 -07:00
Eric W. Biederman 26574401fe net: Fix ipoib rtnl_lock sysfs deadlock.
Network device sysfs files that grab the rtnl_lock unconditionally
will deadlock if accessed when the network device is being
unregistered.  So use trylock and syscall_restart to avoid this
deadlock.

Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-18 22:15:59 -07:00
Yossi Etigin e028cc55cc IPoIB: Disable NAPI while CQ is being drained
If NAPI is enabled while IPoIB's CQ is being drained, it creates a
race on priv->ibwc between ipoib_poll() and ipoib_drain_cq(), leading
to memory corruption.

The solution is to enable/disable NAPI in ipoib_ib_dev_{open/stop}()
instead of in ipoib_{open/stop}(), and sync NAPI on the INITIALIZED
flag instead on the ADMIN_UP flag. This way NAPI will be disabled when
ipoib_drain_cq() is called.

This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1587>.

Signed-off-by: Yossi Etigin <yosefe@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-04-20 13:58:08 -07:00
Roland Dreier edb5abb1e2 IPoIB: Avoid free_netdev() BUG when destroying a child interface
We have to release the RTNL before calling free_netdev() so that the
device state has a chance to become NETREG_UNREGISTERED.  Otherwise
when removing a child interface, we hit the BUG() that tests the
device state in free_netdev().

Reported-by: Yossi Etigin <yosefe@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-03-31 10:22:32 -07:00
Linus Torvalds 13220a94d3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1750 commits)
  ixgbe: Allow Priority Flow Control settings to survive a device reset
  net: core: remove unneeded include in net/core/utils.c.
  e1000e: update version number
  e1000e: fix close interrupt race
  e1000e: fix loss of multicast packets
  e1000e: commonize tx cleanup routine to match e1000 & igb
  netfilter: fix nf_logger name in ebt_ulog.
  netfilter: fix warning in ebt_ulog init function.
  netfilter: fix warning about invalid const usage
  e1000: fix close race with interrupt
  e1000: cleanup clean_tx_irq routine so that it completely cleans ring
  e1000: fix tx hang detect logic and address dma mapping issues
  bridge: bad error handling when adding invalid ether address
  bonding: select current active slave when enslaving device for mode tlb and alb
  gianfar: reallocate skb when headroom is not enough for fcb
  Bump release date to 25Mar2009 and version to 0.22
  r6040: Fix second PHY address
  qeth: fix wait_event_timeout handling
  qeth: check for completion of a running recovery
  qeth: unregister MAC addresses during recovery.
  ...

Manually fixed up conflicts in:
	drivers/infiniband/hw/cxgb3/cxio_hal.h
	drivers/infiniband/hw/nes/nes_nic.c
2009-03-26 15:54:36 -07:00
Stephen Hemminger fe8114e8e1 infiniband: convert ipoib to net_device_ops
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-21 19:19:14 -07:00
Jack Morgenstein 71d98b4628 IPoIB: In unicast_arp_send(), only free newly-created paths
If path_rec_start() returns error, call path_free() only if the path
was newly-created.  If we free an existing path whose valid flag was zero,
(but do not detach it from the list) we cause corruption of the
path list (of which it is a member), and get a kernel crash.

The simplest solution is to not free an existing path -- just leave it
in the list as-is (i.e., with its valid flag cleared).

Thanks to Yossi Etigin of Voltaire for identifying the problem flow
which caused the kernel crash.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Moni Shua <monis@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-02-17 14:51:47 -08:00
Ben Hutchings 288379f050 net: Remove redundant NAPI functions
Following the removal of the unused struct net_device * parameter from
the NAPI functions named *netif_rx_* in commit 908a7a1, they are
exactly equivalent to the corresponding *napi_* functions and are
therefore redundant.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-21 14:33:50 -08:00
Yossi Etigin 3c20962086 IPoIB: Do not print error messages for multicast join retries
When IPoIB tries to join a multicast group, and the SA module's SM
address handle is NULL (because of an SM change, etc), the join
returns with -EAGAIN status.  In that case, don't print an error
message unless multicast debugging is enabled.

Signed-off-by: Yossi Etigin <yosefe@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-01-16 13:42:59 -08:00
Roland Dreier cbbe1efa49 IPoIB: Fix deadlock between ipoib_open() and child interface create
Fix a deadlock between child interface creation/deletion and ipoib
start/stop.  The former takes vlan_mutex, and then might take RTNL via
register_netdev()/unregister_netdev().  The latter is executed with
RTNL held, and tries to take vlan_mutex, which can lead to an AB-BA
deadlock.

Fix this by having the child interface creation/deletion code take the
RTNL first so vlan_mutex always nests inside RTNL.  We can use
register_netdevice() for child interfaces because we form the
interface name from the parent interface and hence don't need the '%'
expansion of register_netdev().

Reported-by: Yossi Etigin <yosefe@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-01-14 21:44:39 -08:00
Roland Dreier b8a1b1ce14 IPoIB: Fix hang in napi_disable() if P_Key is never found
After commit fe25c561 ("IPoIB: Don't enable NAPI when it's already
enabled"), if an interface is brought up but the corresponding P_Key
never appears, then ipoib_stop() will hang in napi_disable(), because
ipoib_open() returns before it does napi_enable().

Fix this by changing ipoib_open() to call napi_enable() even if the
P_Key isn't present.

Reported-by: Yossi Etigin <yosefe@Voltaire.COM>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-01-14 14:55:41 -08:00
Yossi Etigin 50df48f59d IPoIB: Do not join broadcast group if interface is brought down
Because the ipoib_workqueue is not flushed when ipoib interface is
brought down, ipoib_mcast_join() may trigger a join to the broadcast
group after priv->broadcast was set to NULL (during cleanup).  This
will cause the system to be a member of the broadcast group when
interface is down.  As a side effect, this breaks the optimization of
setting the Q_key only when joining the broadcast group.

Signed-off-by: Yossi Etigin <yosefe@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-01-12 19:28:42 -08:00
Yossi Etigin a50df398cd IPoIB: Fix loss of connectivity after bonding failover on both sides
Fix bonding failover in the case both peers failover and the
gratuitous ARP is lost.  In that case, the sender side will create an
ipoib_neigh and issue a path request with the old GID first.  When
skb->dst->neighbour->ha changes due to ARP refresh, this ipoib_neigh
will not be added to the path->list of the path of the new GID,
because the ipoib_neigh already exists.  It will not have an AH
either, because of sender-side failover.  Therefore, it will not get
an AH when the path is resolved.

The solution here is to compare GIDs in ipoib_start_xmit() even if
neigh->ah is invalid.  Comparing with an uninitialized value of
neigh->dgid should be fine, since a spurious match is harmless (and
astronomically unlikely too).

Signed-off-by: Moni Shoua <monis@voltaire.com>
Signed-off-by: Yossi Etigin <yosefe@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-01-09 14:05:11 -08:00
Neil Horman 908a7a16b8 net: Remove unused netdev arg from some NAPI interfaces.
When the napi api was changed to separate its 1:1 binding to the net_device
struct, the netif_rx_[prep|schedule|complete] api failed to remove the now
vestigual net_device structure parameter.  This patch cleans up that api by
properly removing it..

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-22 20:43:12 -08:00
David S. Miller 198d6ba4d7 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	drivers/isdn/i4l/isdn_net.c
	fs/cifs/connect.c
2008-11-18 23:38:23 -08:00
Yossi Etigin ff79ae8083 IPoIB: Fix crash in path_rec_completion()
Fix a crash in path_rec_completion() during an SM up/down loop.  If
more than one path record request is issued, the first completion
releases path->done, allowing ipoib_flush_paths() to free the path,
and thus corrupting it for the second completion.

Commit ee1e2c82 ("IPoIB: Refresh paths instead of flushing them on SM
change events") added the field path->valid and changed the test "if
(!path)" to "if (!path || !path->valid)".  This change made it
possible for a path with an outstanding query to pass the test and
issue another query on the same path.  Having two queries on the same
path leads to a crash.

This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1325>.

Signed-off-by: Yossi Etigin <yosefe@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-11-12 10:24:39 -08:00
Yossi Etigin 93a3ab939b IPoIB: Fix hang in ipoib_flush_paths()
ipoib_flush_paths() can hang during an SM up/down loop: if
path_rec_start() fails (for instance, because there is no sm_ah), the
path is still added to the path list by neigh_add_path().  Then,
ipoib_flush_paths() will wait for path->done, but it will never
complete because the request was not issued at all.  Fix this by
completing path->done if issuing the query fails.

This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1329>.

Signed-off-by: Yossi Etigin <yosefe@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-11-12 10:24:38 -08:00
Yossi Etigin fe25c56190 IPoIB: Don't enable NAPI when it's already enabled
If a P_Key is not present when an interface is created, ipoib_open()
will return after doing napi_enable().  ipoib_open() will be called
again from ipoib_pkey_poll() when the P_Key appears, after NAPI has
already been enabled, and try to enable it again. This triggers a
BUG_ON() in napi_enable().

Fix this by moving the call to napi_enable() to after the test for
P_Key presence.

Signed-off-by: Yossi Etigin <yosefe@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-11-12 10:24:36 -08:00
Harvey Harrison 5b095d9892 net: replace %p6 with %pI6
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-29 12:52:50 -07:00
Harvey Harrison 8c165a8383 infiniband: remove IPOIB_GID_RAW_ARG, IPOIB_GID_ARG, IPOIB_GID_FMT
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-28 23:02:37 -07:00
Harvey Harrison fcace2fe7a infiniband: ipoib replace IPOIB_GID_FMT with %p6
Replace all uses of IPOIB_GID_FMT, IPOIB_GID_RAW_ARG() and IPOIB_GID_ARG()

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-28 23:02:36 -07:00
Or Gerlitz 83bb63f62b IPoIB: Set netdev offload features properly for child (VLAN) interfaces
Child devices were created without any offload features set, fix this by
moving the code that computes the features into generic function which is
now called through non-child and child device creation.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>

-- v1 has a bug where the 'result' flag in ipoib_vlan_add may be used uninitialized
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-10-22 15:49:49 -07:00
Or Gerlitz 70c9c0db54 IPoIB: Clean up ethtool support
Add a get_rx_csum method.  Remove the driver's own get_tso method, as
the ethtool kernel code uses the default one if nothing is provided.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-10-22 15:49:29 -07:00
Roland Dreier 2767840a5c IPoIB: Always initialize poll_timer to avoid crash on unload
ipoib_ib_dev_stop() does del_timer_sync(&priv->poll_timer), but if a
P_key for an interface is not found, poll_timer is not initialized, so
this leads to a crash or hang.  Fix this by moving where poll_timer is
initialized to ipoib_ib_dev_init(), which is always called.

This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1172>.

Debugged-by: Yosef Etigin <yosefe@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-10-10 15:58:52 -07:00
Roland Dreier 943c246e9b IPoIB: Use netif_tx_lock() and get rid of private tx_lock, LLTX
Currently, IPoIB is an LLTX driver that uses its own IRQ-disabling
tx_lock.  Not only do we want to get rid of LLTX, this actually causes
problems because of the skb_orphan() done with this tx_lock held: some
skb destructors expect to be run with interrupts enabled.

The simplest fix for this is to get rid of the driver-private tx_lock
and stop using LLTX.  We kill off priv->tx_lock and use
netif_tx_lock[_bh]() instead; the patch to do this is a tiny bit
tricky because we need to update places that take priv->lock inside
the tx_lock to disable IRQs, rather than relying on tx_lock having
already disabled IRQs.

Also, there are a couple of places where we need to disable BHs to
make sure we have a consistent context to call netif_tx_lock() (since
we no longer can use _irqsave() variants), and we also have to change
ipoib_send_comp_handler() to call drain_tx_cq() through a timer rather
than directly, because ipoib_send_comp_handler() runs in interrupt
context and drain_tx_cq() must run in BH context so it can call
netif_tx_lock().

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-09-30 10:36:21 -07:00
Roland Dreier c9da4bad5b IPoIB: Fix crash when path record fails after path flush
Commit ee1e2c82 ("IPoIB: Refresh paths instead of flushing them on SM
change events") changed how paths are flushed on an SM event.  This
change introduces a problem if the path record query triggered by
fails, causing path->ah to become NULL.  A later successful path query
will then trigger WARN_ON() in path_rec_completion(), and crash
because path->ah has already been freed, so the ipoib_put_ah() inside
the lock in path_rec_completion() may actually drop the last reference
(contrary to the comment that claims this is safe).

Fix this by updating path->ah and freeing old_ah only when the path
record query is successful.  This prevents the neighbour AH and that
path AH from getting out of sync.

This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1194>

Reported-by: Rabah Salem <ravah@mellanox.com>
Debugged-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-09-25 15:26:15 -07:00
Yossi Etigin e8224e4b80 IPoIB: Fix deadlock on RTNL between bcast join comp and ipoib_stop()
Taking rtnl_lock in ipoib_mcast_join_complete() causes a deadlock with
ipoib_stop().  We avoid it by scheduling the piece of code that takes
the lock on ipoib_workqueue instead of executing it directly.  This
works because we only flush the ipoib_workqueue with the RTNL not held.

The deadlock happens because ipoib_stop() calls ipoib_ib_dev_down()
which calls ipoib_mcast_dev_flush(), which calls ipoib_mcast_free(),
which calls ipoib_mcast_leave(). The latter calls
ib_sa_free_multicast(), and this waits until the multicast completion
handler finishes.  This handler is ipoib_mcast_join_complete(), which
waits for the rtnl_lock(), which was already taken by ipoib_stop().

This bug was introduced in commit a77a57a1 ("IPoIB: Fix deadlock on
RTNL in ipoib_stop()").

Signed-off-by: Yossi Etigin <yosefe@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-09-16 11:57:45 -07:00
Roland Dreier a77a57a1a2 IPoIB: Fix deadlock on RTNL in ipoib_stop()
Commit c8c2afe3 ("IPoIB: Use rtnl lock/unlock when changing device
flags") added a call to rtnl_lock() in ipoib_mcast_join_task(), which
is run from the ipoib_workqueue.  However, ipoib_stop() (which is run
inside rtnl_lock()) flushes this workqueue, which leads to a deadlock
if the join task is pending.

Fix this by simply not flushing the workqueue from ipoib_stop().  It
turns out that we really don't care about workqueue tasks running
during or after ipoib_stop(), as long as we make sure to flush the
workqueue before unregistering a netdev.

This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1114>.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-08-19 15:01:32 -07:00
David J. Wilder b1404069f6 IPoIB/cm: Use vmalloc() to allocate rx_rings
There are users that are running UDP applications that require a large
receive queue size in order to get good performance.  To prevent
allocation failures for rx_rings when using non-SRQ mode and large
recv_queue_size (1K or larger), use vmalloc() instead of kcalloc() to
alocate rx_rings.

Signed-off-by: David Wilder <dwilder@us.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-08-08 15:51:29 -07:00
Roland Dreier e08198169e IPoIB/cm: Set correct SG list in ipoib_cm_init_rx_wr()
wr->sg_list should be set to the sge pointer passed in, not
priv->cm.rx_sge.

Reported-by: Hoang-Nam Nguyen <HNGUYEN@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-30 07:21:46 -07:00
Roland Dreier 9905922446 IPoIB: Correct help text for INFINIBAND_IPOIB_DEBUG
The help text for INFINIBAND_IPOIB_DEBUG refers to "ipoib_debugfs,"
which no longer exists.  Correct this to talk about the files under
debugfs that are really created.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-24 20:37:25 -07:00
Roland Dreier 99c3a5a9e3 IPoIB/cm: Connected mode is no longer EXPERIMENTAL
Connected mode is now tested and used by lots of people.  No need to
hide it under CONFIG_EXPERIMENTAL.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-24 20:37:25 -07:00
Or Gerlitz 01b3fc8b15 IPoIB: Include err code in trace message for ib_sa_path_rec_get() failures
Print the return code of ib_sa_path_rec_get() if it fails to help
debug errors.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-22 14:18:34 -07:00
David S. Miller 49997d7515 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6
Conflicts:

	Documentation/powerpc/booting-without-of.txt
	drivers/atm/Makefile
	drivers/net/fs_enet/fs_enet-main.c
	drivers/pci/pci-acpi.c
	net/8021q/vlan.c
	net/iucv/iucv.c
2008-07-18 02:39:39 -07:00
David S. Miller b9e4085768 netdev: Do not use TX lock to protect address lists.
Now that we have a specific lock to protect the network
device unicast and multicast lists, remove extraneous
grabs of the TX lock in cases where the code only needs
address list protection.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-15 00:15:08 -07:00
David S. Miller e308a5d806 netdev: Add netdev->addr_list_lock protection.
Add netif_addr_{lock,unlock}{,_bh}() helpers.

Use them to protect operations that operate on or read
the network device unicast and multicast address lists.

Also use them in cases where the code simply wants to
block calls into the driver's ->set_rx_mode() and
->set_multicast_list() methods.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-15 00:13:44 -07:00
Eli Cohen bc3a290b51 IPoIB: Double default RX/TX ring sizes
Increase IPoIB ring sizes to twice their original sizes (RX: 128->256,
TX: 64->128) to act as a shock absorber for high traffic peaks.  With
the current settings, we have seen cases that there are many calls to
netif_stop_queue(), which causes degradation in throughput.  Also,
larger receive buffer sizes help IPoIB in CM mode to avoid experiencing
RNR NAK conditions due to insufficient receive buffers at the SRQ.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:52 -07:00
Eli Cohen e112373fd6 IPoIB/cm: Reduce connected mode TX object size
Since IPoIB connected mode does not NETIF_F_SG, we only have one DMA
mapping per send, so we don't need a mapping[] array.  Define a new
struct with a single u64 mapping member and use it for the CM tx_ring.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:52 -07:00
Eli Cohen bd3606715e IPoIB: Use dev_set_mtu() to change mtu
When the driver sets the MTU of the net device outside of its
change_mtu method, it should make use of dev_set_mtu() instead of
directly setting the mtu field of struct netdevice.  Otherwise
functions registered to be called upon MTU change will not get called
(this is done through call_netdevice_notifiers() in dev_set_mtu()).

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:51 -07:00
Eli Cohen c8c2afe360 IPoIB: Use rtnl lock/unlock when changing device flags
Use of this lock is required to synchronize changes to the netdvice's
data structs.  Also move the call to ipoib_flush_paths() after the
modification of the netdevice flags in set_mode().

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:51 -07:00
Roland Dreier 9eae554c17 IPoIB: Get rid of ipoib_mcast_detach() wrapper
ipoib_mcast_detach() does nothing except call ib_detach_mcast(), so just
use the core API in the one place that does a multicast group detach.

add/remove: 0/1 grow/shrink: 0/1 up/down: 0/-105 (-105)
function                                     old     new   delta
ipoib_mcast_leave                            357     319     -38
ipoib_mcast_detach                            67       -     -67

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:50 -07:00