Commit Graph

235461 Commits

Author SHA1 Message Date
Allan Stephens 672d99e19a tipc: Convert node object array to a hash table
Replaces the dynamically allocated array of pointers to the cluster's
node objects with a static hash table. Hash collisions are resolved
using chaining, with a typical hash chain having only a single node,
to avoid degrading performance during processing of incoming packets.
The conversion to a hash table reduces the memory requirements for
TIPC's node table to approximately the same size it had prior to
the previous commit.

In addition to the hash table itself, TIPC now also maintains a
linked list for the node objects, sorted by ascending network address.
This list allows TIPC to continue sending responses to user space
applications that request node and link information in sorted order.
The list also improves performance when name table update messages are
sent by making it easier to identify the nodes that must be notified.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-03-13 16:35:17 -04:00
Allan Stephens f831c963b5 tipc: Eliminate configuration for maximum number of cluster nodes
Gets rid of the need for users to specify the maximum number of
cluster nodes supported by TIPC. TIPC now automatically provides
support for all 4K nodes allowed by its addressing scheme.

Note: This change sets TIPC's memory usage to the amount used by
a maximum size node table with 4K entries.  An upcoming patch that
converts the node table from a linear array to a hash table will
compact the node table to a more efficient design, but for clarity
it is nice to have all the Kconfig infrastruture go away separately.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-03-13 16:35:17 -04:00
Allan Stephens d1bcb11544 tipc: Split up unified structure of network-related variables
Converts the fields of the global "tipc_net" structure into individual
variables.  Since the struct was never referenced as a complete unit,
its existence was pointless.  This will facilitate upcoming changes to
TIPC's node table and simpify upcoming relocation of the variables so
they are only visible to the files that actually use them.

This change is essentially cosmetic in nature, and doesn't affect the
operation of TIPC.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-03-13 16:35:17 -04:00
Allan Stephens 9df3b7eb6e tipc: Fix problem with missing link in "tipc-config -l" output
Removes a race condition that could cause TIPC's internal counter
of the number of links it has to neighboring nodes to have the
incorrect value if two independent threads of control simultaneously
create new link endpoints connecting to two different nodes using two
different bearers. Such under counting would result in TIPC failing to
list the final link(s) in its response to a configuration request to
list all of the node's links. The counter is now updated atomically
to ensure that simultaneous increments do not interfere with each
other.

Thanks go to Peter Butler <pbutler@pt.com> for his assistance in
diagnosing and fixing this problem.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-03-13 16:35:16 -04:00
Allan Stephens 71092ea122 tipc: Add support for SO_RCVTIMEO socket option
Adds support for the SO_RCVTIMEO socket option to TIPC's socket
receive routines.

Thanks go out to Raj Hegde <rajenhegde@yahoo.ca> for his contribution
to the development and testing this enhancement.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-03-13 16:35:16 -04:00
Allan Stephens f137917332 tipc: Cosmetic changes to node subscription code
Relocates the code that notifies users of node subscriptions so that
it is adjacent to the rest of the routines that implement TIPC's node
subscription capability. Renames the name table routine that is
invoked by a node subscription to better reflect its purpose and to
be consistent with other, similar name table routines.

These changes are cosmetic in nature, and do not alter the behavior
of TIPC.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-03-13 16:35:16 -04:00
Allan Stephens 431697eb60 tipc: Prevent null pointer error when removing a node subscription
Prevents a null pointer dereference from occurring if a node subscription
is triggered at the same time that the subscribing port or publication is
terminating the subscription. The problem arises if the triggering routine
asynchronously activates and deregisters the node subscription while
deregistration is already underway -- the deregistration routine may find
that the pointer it has just verified to be non-NULL is now NULL.
To avoid this race condition the triggering routine now simply marks the
node subscription as defunct (to prevent it from re-activating)
instead of deregistering it. The subscription is now both deregistered
and destroyed only when the subscribing port or publication code terminates
the node subscription.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-03-13 16:35:16 -04:00
Allan Stephens a3796f895f tipc: Add network address mask helper routines
Introduces a pair of helper routines that convert the network address
for a TIPC node into the network address for its cluster or zone.

This is a cosmetic change designed to avoid future errors caused by
the incorrect use of address bitmasks, and does not alter the existing
operation of TIPC.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-03-13 16:35:16 -04:00
Allan Stephens aa84729484 tipc: Correct broadcast link peer info when displaying links
Fixes a typo in the calculation of the network address of a node's own
cluster when generating a response to the configuration command that
lists all of the node's links. The correct mask value for a <Z.C.N>
network address uses 1's for the 8-bit zone and 12-bit cluster parts
and 0's for the 12-bit node part.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-03-13 16:35:16 -04:00
Allan Stephens 0232fd0ac4 tipc: Allow receiving into iovec containing multiple entries
Enhances TIPC's socket receive routines to support iovec structures
containing more than a single entry. This change leverages existing
sk_buff routines to do most of the work; the only significant change
to TIPC itself is that an sk_buff now records how much data has been
already consumed as an numeric offset, rather than as a pointer to
the first unread data byte.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-03-13 16:35:16 -04:00
David S. Miller bef55aebd5 decnet: Convert to use flowidn where applicable.
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-12 15:08:55 -08:00
David S. Miller 1958b856c1 net: Put fl6_* macros to struct flowi6 and use them again.
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-12 15:08:55 -08:00
David S. Miller 4c9483b2fb ipv6: Convert to use flowi6 where applicable.
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-12 15:08:54 -08:00
David S. Miller 9cce96df5b net: Put fl4_* macros to struct flowi4 and use them again.
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-12 15:08:54 -08:00
David S. Miller f42454d632 ipv4: Kill fib_semantic_match declaration from fib_lookup.h
This function no longer exists.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-12 15:08:53 -08:00
David S. Miller 7e1dc7b6f7 net: Use flowi4 and flowi6 in xfrm layer.
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-12 15:08:52 -08:00
David S. Miller 2032656e76 net: Add flowi6_* member helper macros.
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-12 15:08:52 -08:00
David S. Miller a1bbb0e698 netfilter: Use flowi4 and flowi6 in xt_TCPMSS
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-12 15:08:51 -08:00
David S. Miller 5a49d0e04d netfilter: Use flowi4 and flowi6 in nf_conntrack_h323_main
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-12 15:08:51 -08:00
David S. Miller b6f21b2680 ipv4: Use flowi4 in UDP
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-12 15:08:50 -08:00
David S. Miller 3073e5ab92 netfilter: Use flowi4 in nf_nat_standalone.c
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-12 15:08:50 -08:00
David S. Miller da91981bee ipv4: Use flowi4 in ipmr code.
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-12 15:08:49 -08:00
David S. Miller 9ade22861f ipv4: Use flowi4 in FIB layer.
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-12 15:08:49 -08:00
David S. Miller 9d6ec93801 ipv4: Use flowi4 in public route lookup interfaces.
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-12 15:08:48 -08:00
David S. Miller 68a5e3dd0a ipv4: Use struct flowi4 internally in routing lookups.
We will change the externally visible APIs next.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-12 15:08:48 -08:00
David S. Miller 22bd5b9b13 ipv4: Pass ipv4 flow objects into fib_lookup() paths.
To start doing these conversions, we need to add some temporary
flow4_* macros which will eventually go away when all the protocol
code paths are changed to work on AF specific flowi objects.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-12 15:08:47 -08:00
David S. Miller 59b1a94c9a net: Add flowiX_to_flowi() shorthands.
This is just a shorthand which will help in passing around AF
specific flow structures as generic ones.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-12 15:08:47 -08:00
David S. Miller 56bb8059e1 net: Break struct flowi out into AF specific instances.
Now we have struct flowi4, flowi6, and flowidn for each address
family.  And struct flowi is just a union of them all.

It might have been troublesome to convert flow_cache_uli_match() but
as it turns out this function is completely unused and therefore can
be simply removed.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-12 15:08:46 -08:00
David S. Miller 6281dcc94a net: Make flowi ports AF dependent.
Create two sets of port member accessors, one set prefixed by fl4_*
and the other prefixed by fl6_*

This will let us to create AF optimal flow instances.

It will work because every context in which we access the ports,
we have to be fully aware of which AF the flowi is anyways.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-12 15:08:46 -08:00
David S. Miller 08704bcbf0 net: Create union flowi_uli
This will be used when we have seperate flowi types.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-12 15:08:45 -08:00
David S. Miller 806566cc78 net: Create struct flowi_common
Pull out the AF independent members of struct flowi into a
new struct flowi_common

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-12 15:08:45 -08:00
David S. Miller 1d28f42c1b net: Put flowi_* prefix on AF independent members of struct flowi
I intend to turn struct flowi into a union of AF specific flowi
structs.  There will be a common structure that each variant includes
first, much like struct sock_common.

This is the first step to move in that direction.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-12 15:08:44 -08:00
David S. Miller ca116922af xfrm: Eliminate "fl" and "pol" args to xfrm_bundle_ok().
There is only one caller of xfrm_bundle_ok(), and that always passes these
parameters as NULL.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-12 15:08:43 -08:00
David S. Miller fbef0a4091 net: Remove unnecessary padding in struct flowi
Move tos, scope, proto, and flags to the beginning of
the structure.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-12 15:08:43 -08:00
David S. Miller 78fbfd8a65 ipv4: Create and use route lookup helpers.
The idea here is this minimizes the number of places one has to edit
in order to make changes to how flows are defined and used.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-12 15:08:42 -08:00
David S. Miller 1561747ddf Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/jkirsher/net-next-2.6 2011-03-12 14:41:02 -08:00
David S. Miller ab1ebc9530 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2011-03-12 11:06:59 -08:00
John Fastabend 1f4a0244ff ixgbe: DCB, PFC not cleared until reset occurs
The PFC configuration is not cleared until the device is reset. This
has not been a problem because setting DCB attributes forced a
hardware reset. Now that we no longer require this reset to occur
PFC remains configured even after being disabled until the
device is reset.

This removes a goto in the PFC hardware set routines for 82598 and
82599 devices that was short circuiting the clear.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2011-03-12 04:15:35 -08:00
Lior Levy ff4ab20611 ixgbe: add support for VF Transmit rate limit using iproute2
Implemented ixgbe_ndo_set_vf_bw function which is being used by iproute2
tool. In addition, updated ixgbe_ndo_get_vf_config function to show the
actual rate limit to the user.

The rate limitation can be configured only when the link is up and the
link speed is 10Gb.
The rate limit value can be 0 or ranged between 11 and actual link
speed measured in Mbps. A value of '0' disables the rate limit for
this specific VF.

iproute2 usage will be 'ip link set ethX vf Y rate Z'.
After the command is made, the rate will be changed instantly.
To view the current rate limit, use 'ip link show ethX'.

The rates will be zeroed only upon driver reload or a link speed change.

This feature is being supported by 82599 and X540 devices.

Signed-off-by: Lior Levy <lior.levy@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2011-03-12 04:15:08 -08:00
John Fastabend 1390a59452 ixgbe: DCB, set minimum bandwidth per traffic class
DCB provides a guaranteed bandwidth in the case with 0%
bandwidth then no bandwidth is guaranteed. However the
traffic class should still be able to transmit traffic.
For this to work the traffic class must be given the
minimum credits required to send a frame.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2011-03-12 04:14:48 -08:00
Emil Tantilov 6fb456a07c ixgbe: correct typo in define name
VF Free Running Timer register name missing an F.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Acked-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Evan Swanson <evan.swanson@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2011-03-12 04:14:28 -08:00
Emil Tantilov 9dda173667 ixgbe: update PHY code to support 100Mbps as well as 1G/10G
This change updates the PHY setup code to support 100Mbps capable PHYs
as well as 10G and 1Gbps.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2011-03-12 04:14:01 -08:00
Emil Tantilov 7e7eb43463 ixgbe: remove timer reset to 0 on timeout
The VF mailbox polling for acks and messages would reset the timer to zero
on a timeout. Under heavy load a timeout may actually occur without being
the result of an error and when this occurs it is not practical to perform
a full VF driver reset on every message timeout. Instead, just return an
error (which is already done) and the VF driver will have an opportunity
to retry the operation.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Acked-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2011-03-12 04:13:41 -08:00
John Fastabend c27931da83 ixgbe: DCB during ifup use correct CEE or IEEE mode
DCB settings are cleared in the hardware across link events
during ifup ixgbe reprograms the hardware for DCB if it is
enabled. Now that we have two modes CEE or IEEE we need to
use the correct set of configuration data.

This patch checks the dcbx_cap bits and then enables the
device in the correct mode.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2011-03-12 04:13:20 -08:00
John Fastabend 17049d30c2 ixgbe: IEEE 802.1Qaz, implement priority assignment table
This patch adds support to use the priority assignment
table in the ieee_ets structure to map priorities to
traffic classes. Previously ixgbe only supported a
1:1 mapping. Now we can enable and disable hardware
DCB support when multiple traffic classes are actually
being used. This allows the default case all priorities
mapped to traffic class 0 to work in normal hardware
mode and utilize the full packet buffer.

This patch does not address putting the hardware in
4TC mode so packet buffer space may be underutilized
in this case. A follow up patch can address this
optimization. But at least we have the hooks to do
this now.

Also CEE will behave as it always has and map priorities
1:1 with traffic classes.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2011-03-12 04:12:54 -08:00
John Fastabend 3b97fd6954 ixgbe: DCB, missed translation from 8021Qaz TSA to CEE link strict
The patch below  allowed IEEE 802.1Qaz and CEE DCB hardware
configurations to use common hardware set routines,

commit 88eb696cc6a7af8f9272266965b1a4dd7d6a931b
Author: John Fastabend <john.r.fastabend@intel.com>
Date:   Thu Feb 10 03:02:11 2011 -0800

    ixgbe: DCB, abstract out dcb_config from DCB hardware configuration

However the case when CEE link strict and group strict
are set was missed and are currently being mapped
incorrectly in some configurations.

This patch resolves this.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2011-03-12 04:12:35 -08:00
John Fastabend 8187cd485b ixgbe: DCB: enable RSS to be used with DCB
RSS had previously been disabled when DCB was enabled because
DCB was single queued per traffic class. Now that DCB implements
multiple Tx/Rx rings per traffic class enable RSS.

Here RSS hashes across the queues in the traffic class.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Ross Brattain <ross.b.brattain.@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2011-03-12 04:12:14 -08:00
John Fastabend 24095aa347 ixgbe: enable ndo_tc_setup
This patch adds the ndo_tc_setup to ixgbe. By default we set
the device to use strict priority.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Ross Brattain <ross.b.brattain.@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2011-03-12 04:11:53 -08:00
John Fastabend e5b6463557 ixgbe: DCB, use multiple Tx rings per traffic class
This enables multiple {Tx|Rx} rings per traffic class while in DCB
mode. In order to get this working as expected the tc_to_tx net
device mapping is configured as well as the prio_tc_map.

skb priorities are mapped across a range of queue pairs to get
a distribution per traffic class. The maximum number of
queue pairs used while in DCB mode is capped at 64. The hardware
max is actually 128 queues but 64 is sufficient for now and
allocating more seemed a bit excessive. It is easy enough to
increase the cap later if need be.

To get the 802.1Q priority tags inserted correctly ixgbe was
previously using the skb queue_mapping field to directly set
the 802.1Q priority. This no longer works because we have removed
the 1:1 mapping between queues and traffic class. Each ring
is aligned with an 802.1Qaz traffic class so here we add an
extra field to the ring struct to identify the 802.1Q traffic
class. This uses an extra byte of the ixgbe_ring struct
fortunately there was a 2byte hole,

struct ixgbe_ring {
        void *                     desc;                 /*     0     8 */
        struct device *            dev;                  /*     8     8 */
        struct net_device *        netdev;               /*    16     8 */
        union {
                struct ixgbe_tx_buffer * tx_buffer_info; /*           8 */
                struct ixgbe_rx_buffer * rx_buffer_info; /*           8 */
        };                                               /*    24     8 */
        long unsigned int          state;                /*    32     8 */
        u8                         atr_sample_rate;      /*    40     1 */
        u8                         atr_count;            /*    41     1 */
        u16                        count;                /*    42     2 */
        u16                        rx_buf_len;           /*    44     2 */
        u16                        next_to_use;          /*    46     2 */
        u16                        next_to_clean;        /*    48     2 */
        u8                         queue_index;          /*    50     1 */
        u8                         reg_idx;              /*    51     1 */
        u16                        work_limit;           /*    52     2 */

        /* XXX 2 bytes hole, try to pack */

        u8 *                       tail;                 /*    56     8 */
        /* --- cacheline 1 boundary (64 bytes) --- */

Now we can set the VLAN priority directly and it will be
correct. User space can indicate the 802.1Qaz priority
using the SO_PRIORITY setsocket() option and QOS layer will
steer the skb to the correct rings. Additionally using
the multiq qdisc with a queue_mapping action works as
well.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2011-03-12 04:11:29 -08:00
John Fastabend dc166e22ed ixgbe: DCB remove ixgbe_fcoe_getapp routine
Remove ixgbe_fcoe_getapp() and use the generic kernel
routine instead. Also add application priority to the
kernel maintained list on setapp so applications and
stacks can query the value.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2011-03-12 04:11:11 -08:00