Removes a race condition that could cause TIPC's internal counter
of the number of links it has to neighboring nodes to have the
incorrect value if two independent threads of control simultaneously
create new link endpoints connecting to two different nodes using two
different bearers. Such undercounting would result in TIPC failing to
list the final link(s) in its response to a configuration request to
list all of the node's links. The counter is now updated atomically
to ensure that simultaneous increments do not interfere with each
other.
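
As a rough userspace illustration of why an atomic update avoids lost
increments, here is a minimal sketch using C11 <stdatomic.h>. It is not
the kernel code itself, which would use the kernel's own atomic
primitives; the names link_count, link_created and link_count_read are
hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the per-node link counter. */
static atomic_int link_count;

/*
 * Called whenever a new link endpoint is created. The read-modify-write
 * is performed as one indivisible step, so two threads creating links
 * at the same time cannot overwrite each other's increment.
 */
static void link_created(void)
{
    atomic_fetch_add(&link_count, 1);
}

/* Snapshot of the counter, e.g. when sizing a "list all links" reply. */
static int link_count_read(void)
{
    return atomic_load(&link_count);
}
```

With a plain int and "link_count++", two concurrent callers could both
read the same old value and both store old + 1, which is the
undercounting described above.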
Thanks go to Peter Butler <pbutler@pt.com> for his assistance in
diagnosing and fixing this problem.
Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Adds support for the SO_RCVTIMEO socket option to TIPC's socket
receive routines.
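
From user space the option is set through the standard setsockopt()
call. The following is a minimal sketch; set_recv_timeout is a
hypothetical helper name, and the same call applies to any socket
descriptor, including one obtained for a TIPC socket.

```c
#include <sys/socket.h>
#include <sys/time.h>

/*
 * Ask the kernel to give up on a blocking receive after the given time.
 * Returns 0 on success, -1 (with errno set) on failure.
 */
static int set_recv_timeout(int fd, long sec, long usec)
{
    struct timeval tv;

    tv.tv_sec = sec;
    tv.tv_usec = usec;
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
}
```

Once the timeout is set, a blocking recv() that finds no data within
that time returns -1 instead of waiting indefinitely.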
Thanks go out to Raj Hegde <rajenhegde@yahoo.ca> for his contribution
to the development and testing of this enhancement.
Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Relocates the code that notifies users of node subscriptions so that
it is adjacent to the rest of the routines that implement TIPC's node
subscription capability. Renames the name table routine that is
invoked by a node subscription to better reflect its purpose and to
be consistent with other, similar name table routines.
These changes are cosmetic in nature, and do not alter the behavior
of TIPC.
Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Prevents a null pointer dereference from occurring if a node subscription
is triggered at the same time that the subscribing port or publication is
terminating the subscription. The problem arises if the triggering routine
asynchronously activates and deregisters the node subscription while
deregistration is already underway -- the deregistration routine may find
that the pointer it has just verified to be non-NULL is now NULL.
To avoid this race condition the triggering routine now simply marks the
node subscription as defunct (to prevent it from re-activating)
instead of deregistering it. The subscription is now both deregistered
and destroyed only when the subscribing port or publication code terminates
the node subscription.
Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Introduces a pair of helper routines that convert the network address
for a TIPC node into the network address for its cluster or zone.
This is a cosmetic change designed to avoid future errors caused by
the incorrect use of address bitmasks, and does not alter the existing
operation of TIPC.
Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Fixes a typo in the calculation of the network address of a node's own
cluster when generating a response to the configuration command that
lists all of the node's links. The correct mask value for a <Z.C.N>
network address uses 1's for the 8-bit zone and 12-bit cluster parts
and 0's for the 12-bit node part.
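
The mask arithmetic described above can be sketched as follows. The
macro and function names here are illustrative only and are not taken
from the TIPC sources; only the 8/12/12 bit layout comes from the text.

```c
#include <stdint.h>

/* <Z.C.N> layout: 8-bit zone, 12-bit cluster, 12-bit node. */
#define ZONE_MASK    0xff000000u  /* 1's for the zone part only */
#define CLUSTER_MASK 0xfffff000u  /* 1's for the zone and cluster parts */

/* Build a network address from its three parts. */
static inline uint32_t addr_build(uint32_t z, uint32_t c, uint32_t n)
{
    return (z << 24) | (c << 12) | n;
}

/* Address of the cluster that contains the given node. */
static inline uint32_t cluster_addr(uint32_t node_addr)
{
    return node_addr & CLUSTER_MASK;
}

/* Address of the zone that contains the given node. */
static inline uint32_t zone_addr(uint32_t node_addr)
{
    return node_addr & ZONE_MASK;
}
```

For example, node <1.1.10> is 0x0100100a; its cluster <1.1.0> is
0x01001000 and its zone <1.0.0> is 0x01000000. Routing every caller
through helpers like these keeps the mask constants in one place.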
Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Enhances TIPC's socket receive routines to support iovec structures
containing more than a single entry. This change leverages existing
sk_buff routines to do most of the work; the only significant change
to TIPC itself is that an sk_buff now records how much data has been
already consumed as a numeric offset, rather than as a pointer to
the first unread data byte.
Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
To start doing these conversions, we need to add some temporary
flow4_* macros which will eventually go away when all the protocol
code paths are changed to work on AF specific flowi objects.
Signed-off-by: David S. Miller <davem@davemloft.net>
This is just a shorthand which will help in passing around AF
specific flow structures as generic ones.
Signed-off-by: David S. Miller <davem@davemloft.net>
Now we have struct flowi4, flowi6, and flowidn for each address
family. And struct flowi is just a union of them all.
It might have been troublesome to convert flow_cache_uli_match() but
as it turns out this function is completely unused and therefore can
be simply removed.
Signed-off-by: David S. Miller <davem@davemloft.net>
Create two sets of port member accessors, one set prefixed by fl4_*
and the other prefixed by fl6_*.
This will let us create AF-optimal flow instances.
It will work because in every context in which we access the ports,
we have to be fully aware of which AF the flowi is anyway.
Signed-off-by: David S. Miller <davem@davemloft.net>
I intend to turn struct flowi into a union of AF specific flowi
structs. There will be a common structure that each variant includes
first, much like struct sock_common.
This is the first step to move in that direction.
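
A simplified sketch of the intended layout is shown below. All type and
member names are hypothetical and heavily trimmed; they only illustrate
the "common struct first, union of AF variants" idea and are not the
kernel definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Members that every address family shares (illustrative subset). */
struct flow_common {
    int      oif;
    int      iif;
    uint32_t mark;
    uint8_t  tos;
    uint8_t  proto;
};

/* IPv4 variant: the common part must come first. */
struct flow4 {
    struct flow_common common;
    uint32_t daddr;
    uint32_t saddr;
};

/* IPv6 variant: the common part must come first. */
struct flow6 {
    struct flow_common common;
    uint8_t daddr[16];
    uint8_t saddr[16];
};

/* Generic flow: a union of the common part and all AF variants. */
union flow {
    struct flow_common common;
    struct flow4 ip4;
    struct flow6 ip6;
};

/* View an AF-specific flow through its generic, common part. */
static inline struct flow_common *flow4_common(struct flow4 *fl4)
{
    return &fl4->common;
}
```

Because each variant starts with the same common structure, generic code
can inspect the shared members without knowing which address family it
was handed, while AF-specific code allocates only the variant it needs.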
Signed-off-by: David S. Miller <davem@davemloft.net>
The idea here is that this minimizes the number of places one has to
edit in order to make changes to how flows are defined and used.
Signed-off-by: David S. Miller <davem@davemloft.net>
The PFC configuration is not cleared until the device is reset. This
has not been a problem because setting DCB attributes forced a
hardware reset. Now that we no longer require this reset to occur,
PFC remains configured even after being disabled, until the
device is reset.
This removes a goto in the PFC hardware set routines for 82598 and
82599 devices that was short circuiting the clear.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Implement the ixgbe_ndo_set_vf_bw function, which is used by the iproute2
tool. In addition, update the ixgbe_ndo_get_vf_config function to show the
actual rate limit to the user.
The rate limitation can be configured only when the link is up and the
link speed is 10Gb.
The rate limit value can be 0 or range from 11 to the actual link
speed, measured in Mbps. A value of '0' disables the rate limit for
this specific VF.
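
The acceptance rule stated above can be sketched as a small predicate.
This is an illustration only, not the driver code: the function name is
hypothetical, the upper bound is assumed to be inclusive, and the link
checks are assumed to apply to a value of 0 as well.

```c
#include <stdbool.h>

/*
 * Returns true if tx_rate (in Mbps) may be applied to a VF.
 * Assumptions: the link must be up at 10Gb (10000 Mbps); 0 means
 * "no limit"; otherwise the rate must lie in [11, link speed].
 */
static bool vf_rate_is_valid(bool link_up, int link_speed_mbps, int tx_rate)
{
    if (!link_up || link_speed_mbps != 10000)
        return false;
    if (tx_rate == 0)
        return true;
    return tx_rate >= 11 && tx_rate <= link_speed_mbps;
}
```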
iproute2 usage will be 'ip link set ethX vf Y rate Z'.
After the command is issued, the rate will be changed instantly.
To view the current rate limit, use 'ip link show ethX'.
The rates will be zeroed only upon driver reload or a link speed change.
This feature is supported on 82599 and X540 devices.
Signed-off-by: Lior Levy <lior.levy@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
DCB provides guaranteed bandwidth; in the case of 0%
bandwidth, no bandwidth is guaranteed. However, the
traffic class should still be able to transmit traffic.
For this to work, the traffic class must be given the
minimum credits required to send a frame.
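
The idea can be sketched as a simple floor on the computed credits. The
helper name and the proportional formula are assumptions made for
illustration; the hardware's actual credit calculation is not described
here.

```c
/*
 * Hypothetical helper: derive a traffic class's credits from its
 * bandwidth percentage, but never hand out fewer credits than are
 * needed to send one frame, even when the percentage is 0.
 */
static unsigned int tc_credits(unsigned int link_credits,
                               unsigned int bw_percent,
                               unsigned int min_credits)
{
    unsigned int credits = link_credits * bw_percent / 100;

    if (credits < min_credits)
        credits = min_credits;
    return credits;
}
```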
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
VF Free Running Timer register name missing an F.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Acked-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Evan Swanson <evan.swanson@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change updates the PHY setup code to support 100Mbps capable PHYs
as well as 10G and 1Gbps.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The VF mailbox polling for acks and messages would reset the timer to zero
on a timeout. Under heavy load a timeout may actually occur without being
the result of an error and when this occurs it is not practical to perform
a full VF driver reset on every message timeout. Instead, just return an
error (which is already done) and the VF driver will have an opportunity
to retry the operation.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Acked-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
DCB settings are cleared in the hardware across link events.
During ifup, ixgbe reprograms the hardware for DCB if it is
enabled. Now that we have two modes, CEE and IEEE, we need to
use the correct set of configuration data.
This patch checks the dcbx_cap bits and then enables the
device in the correct mode.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch adds support to use the priority assignment
table in the ieee_ets structure to map priorities to
traffic classes. Previously ixgbe only supported a
1:1 mapping. Now we can enable and disable hardware
DCB support when multiple traffic classes are actually
being used. This allows the default case, with all priorities
mapped to traffic class 0, to work in normal hardware
mode and utilize the full packet buffer.
This patch does not address putting the hardware in
4TC mode so packet buffer space may be underutilized
in this case. A follow up patch can address this
optimization. But at least we have the hooks to do
this now.
Also CEE will behave as it always has and map priorities
1:1 with traffic classes.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The patch below allowed IEEE 802.1Qaz and CEE DCB hardware
configurations to use common hardware set routines,
commit 88eb696cc6a7af8f9272266965b1a4dd7d6a931b
Author: John Fastabend <john.r.fastabend@intel.com>
Date: Thu Feb 10 03:02:11 2011 -0800
ixgbe: DCB, abstract out dcb_config from DCB hardware configuration
However, the case where CEE link strict and group strict
are both set was missed, and these settings are currently
being mapped incorrectly in some configurations.
This patch resolves this.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
RSS had previously been disabled when DCB was enabled because
DCB was single queued per traffic class. Now that DCB implements
multiple Tx/Rx rings per traffic class, enable RSS.
Here RSS hashes across the queues in the traffic class.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch adds the ndo_setup_tc op to ixgbe. By default we set
the device to use strict priority.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This enables multiple {Tx|Rx} rings per traffic class while in DCB
mode. In order to get this working as expected, the tc_to_txq net
device mapping is configured as well as the prio_tc_map.
skb priorities are mapped across a range of queue pairs to get
a distribution per traffic class. The maximum number of
queue pairs used while in DCB mode is capped at 64. The hardware
max is actually 128 queues but 64 is sufficient for now and
allocating more seemed a bit excessive. It is easy enough to
increase the cap later if need be.
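
As an illustration of the two-level mapping described above, here is a
minimal sketch: a priority selects a traffic class, and a flow hash
spreads packets over that class's range of queues. The structures and
the pick_queue name are hypothetical and do not reproduce the net
device's real tables.

```c
#include <stdint.h>

#define MAX_TC 8

/* Hypothetical per-TC queue range: first queue and number of queues. */
struct tc_queue_range {
    uint16_t offset;
    uint16_t count;
};

struct dcb_map {
    uint8_t prio_tc[8];                    /* priority -> traffic class */
    struct tc_queue_range tc_txq[MAX_TC];  /* traffic class -> queues */
};

/*
 * Pick a queue for a packet: the priority selects the traffic class,
 * and the hash distributes packets across that class's queues.
 */
static uint16_t pick_queue(const struct dcb_map *m, uint8_t prio,
                           uint32_t hash)
{
    const struct tc_queue_range *r = &m->tc_txq[m->prio_tc[prio & 7]];

    if (r->count == 0)
        return r->offset;
    return (uint16_t)(r->offset + hash % r->count);
}
```

With 8 traffic classes and 8 queues each, this layout would use the 64
queue pairs mentioned above.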
To get the 802.1Q priority tags inserted correctly, ixgbe was
previously using the skb queue_mapping field to directly set
the 802.1Q priority. This no longer works because we have removed
the 1:1 mapping between queues and traffic classes. Each ring
is aligned with an 802.1Qaz traffic class, so here we add an
extra field to the ring struct to identify the 802.1Q traffic
class. This uses an extra byte of the ixgbe_ring struct;
fortunately there was a 2-byte hole:
struct ixgbe_ring {
        void *                     desc;                 /*     0     8 */
        struct device *            dev;                  /*     8     8 */
        struct net_device *        netdev;               /*    16     8 */
        union {
                struct ixgbe_tx_buffer * tx_buffer_info; /*           8 */
                struct ixgbe_rx_buffer * rx_buffer_info; /*           8 */
        };                                               /*    24     8 */
        long unsigned int          state;                /*    32     8 */
        u8                         atr_sample_rate;      /*    40     1 */
        u8                         atr_count;            /*    41     1 */
        u16                        count;                /*    42     2 */
        u16                        rx_buf_len;           /*    44     2 */
        u16                        next_to_use;          /*    46     2 */
        u16                        next_to_clean;        /*    48     2 */
        u8                         queue_index;          /*    50     1 */
        u8                         reg_idx;              /*    51     1 */
        u16                        work_limit;           /*    52     2 */

        /* XXX 2 bytes hole, try to pack */

        u8 *                       tail;                 /*    56     8 */
        /* --- cacheline 1 boundary (64 bytes) --- */
Now we can set the VLAN priority directly and it will be
correct. User space can indicate the 802.1Qaz priority
using the SO_PRIORITY setsockopt() option, and the QoS layer will
steer the skb to the correct rings. Additionally, using
the multiq qdisc with a queue_mapping action works as
well.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Remove ixgbe_fcoe_getapp() and use the generic kernel
routine instead. Also add application priority to the
kernel maintained list on setapp so applications and
stacks can query the value.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Implement ieee_setapp dcbnl ops in ixgbe. This is required
to set up FCoE, which requires dedicated resources. If the
app data is not for FCoE then no action is taken in ixgbe
except to add it to the dcb_app_list.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This implements dcbnl get and set capabilities ops. The
devices supported by ixgbe can be configured to run in
IEEE or CEE modes but not both.
With the DCBX set capabilities bit we add an explicit
signal that must be used to toggle between these modes.
This patch adds logic to fail the CEE command set_hw_all()
which programs the device with a CEE configuration if
the CEE caps bit is not set. Similarly, IEEE set
commands will fail if the IEEE caps bit is not set. We
allow most CEE config set commands to occur because they
do not touch the hardware until set_hw_all() is called.
The one exception to the above is the {set|get}app routines.
These must always be protected by caps bits to ensure
side effects do not corrupt the current configured mode.
By requiring the caps bit to be set correctly we can
maintain a consistent configuration in the hardware
for CEE or IEEE modes and prevent partial hardware
configurations that may occur if user space does
not send a complete IEEE or CEE configuration.
It is expected that user space will signal a DCBX mode
before programming the device.
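
The gating logic can be sketched as below. This is an illustration under
stated assumptions, not the driver code: the structure, function names,
bit values and the choice of -EINVAL as the error are all hypothetical.

```c
#include <errno.h>
#include <stdint.h>

/* Illustrative capability bits; the values are assumptions. */
#define CAP_DCBX_VER_CEE  0x04
#define CAP_DCBX_VER_IEEE 0x08

struct adapter {
    uint8_t dcbx_cap;   /* currently selected DCBX mode */
};

/* Select a mode; the device runs in CEE or IEEE mode, never both. */
static int set_dcbx_mode(struct adapter *a, uint8_t mode)
{
    if ((mode & CAP_DCBX_VER_CEE) && (mode & CAP_DCBX_VER_IEEE))
        return -EINVAL;
    a->dcbx_cap = mode;
    return 0;
}

/* CEE path: refuse to touch the hardware unless CEE mode is selected. */
static int cee_set_hw_all(const struct adapter *a)
{
    if (!(a->dcbx_cap & CAP_DCBX_VER_CEE))
        return -EINVAL;
    /* ... program the hardware with the CEE configuration ... */
    return 0;
}

/* IEEE path: refuse unless IEEE mode is selected. */
static int ieee_set_config(const struct adapter *a)
{
    if (!(a->dcbx_cap & CAP_DCBX_VER_IEEE))
        return -EINVAL;
    /* ... program the hardware with the IEEE configuration ... */
    return 0;
}
```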
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch updates igb version to 3.0.6.
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>