eth_type_trans() will use these functions if DSA is enabled, which
blocks building DSA as a module.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 88491d8103 ("drivers/net: Kconfig
& Makefile cleanup") changed the type of these options to bool, but
they select code that could (and still can) be built as modules.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
The pmtu informations on the inetpeer are visible for output and
input routes. On packet forwarding, we might propagate a learned
pmtu to the sender. As we update the pmtu informations of the
inetpeer on demand, the original sender of the forwarded packets
might never notice when the pmtu to that inetpeer increases.
So use the mtu of the outgoing device on packet forwarding instead
of the pmtu to the final destination.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We move all mtu handling from dst_mtu() down to the protocol
layer. So each protocol can implement the mtu handling in
a different manner.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We plan to invoke the dst_opt->default_mtu() method unconditioally
from dst_mtu(). So rename the method to dst_opt->mtu() to match
the name with the new meaning.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As it is, we return null as the default mtu of blackhole routes.
This may lead to a propagation of a bogus pmtu if the default_mtu
method of a blackhole route is invoked. So return dst->dev->mtu
as the default mtu instead.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Skip entries from foreign network namespaces.
Signed-off-by: Jorge Boncompte [DTI2] <jorge@dti2.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This was copy and pasted from the IPv4 code. We're calling the
ip4 version of that function and map4 is NULL.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix below build error:
CC drivers/net/ethernet/marvell/mv643xx_eth.o
drivers/net/ethernet/marvell/mv643xx_eth.c: In function 'mv643xx_eth_get_drvinfo':
drivers/net/ethernet/marvell/mv643xx_eth.c:1505: error: 'info' undeclared (first use in this function)
drivers/net/ethernet/marvell/mv643xx_eth.c:1505: error: (Each undeclared identifier is reported only once
drivers/net/ethernet/marvell/mv643xx_eth.c:1505: error: for each function it appears in.)
make[4]: *** [drivers/net/ethernet/marvell/mv643xx_eth.o] Error 1
Signed-off-by: Axel Lin <axel.lin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We can not update iph->daddr in ip_options_rcv_srr(), It is too early.
When some exception ocurred later (eg. in ip_forward() when goto
sr_failed) we need the ip header be identical to the original one as
ICMP need it.
Add a field 'nexthop' in struct ip_options to save nexthop of LSRR
or SSRR option.
Signed-off-by: Li Wei <lw@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use round_jiffies_relative to align the ehea workqueue and avoid
extra wakeups.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that we enable multiqueue by default the ehea driver is using
quite a lot of memory for its buffer pools. With 4 queues we
consume 64MB in the jumbo packet ring, 16MB in the medium packet
ring and 16MB in the tiny packet ring.
We should only fill the jumbo ring once the MTU is increased but
for now halve it's size so it consumes 32MB. Also reduce the tiny
packet ring, with 4 queues we had 16k entries which is overkill.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When transmiting a fragmented skb, qlge fills a descriptor with the
fragment addresses, after DMA-mapping them. If there are more than eight
fragments, it will use the eighth descriptor as a pointer to an external
list. After mapping this external list, called OAL to a structure
containing more descriptors, it fills it with the extra fragments.
However, considering that systems with pages larger than 8KiB would have
less than 8 fragments, which was true before commit a715dea3c8, it
defined a macro for the OAL size as 0 in those cases.
Now, if a skb with more than 8 fragments (counting skb->data as one
fragment), this would start overwriting the list of addresses already
mapped and would make the driver fail to properly unmap the right
addresses on architectures with pages larger than 8KiB.
Besides that, the list of mappings was one size too small, since it must
have a mapping for the maxinum number of skb fragments plus one for
skb->data and another for the OAL. So, even on architectures with page
sizes 4KiB and 8KiB, a skb with the maximum number of fragments would
make the driver overwrite its counter for the number of mappings, which,
again, would make it fail to unmap the mapped DMA addresses.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix port identify test on 5461x PHY by driving LEDs through MDIO.
Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
rcu_assign_pointer(ptr, NULL) can be safely replaced by
RCU_INIT_POINTER(ptr, NULL)
(old rcu_assign_pointer() macro was testing the NULL value and could
omit the smp_wmb(), but this had to be removed because of compiler
warnings)
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When add sources to interface failure, need to roll back the sfcount[MODE]
to before state. We need to match it corresponding.
Acked-by: David L Stevens <dlstevens@us.ibm.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jun Zhao <mypopydev@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since linux 2.6.26 (commit c6aefafb7e : Add IPv6 support to TCP SYN
cookies), we can drop a SYN packet reusing a TIME_WAIT socket.
(As a matter of fact we fail to send the SYNACK answer)
As the client resends its SYN packet after a one second timeout, we
accept it, because first packet removed the TIME_WAIT socket before
being dropped.
This probably explains why nobody ever noticed or complained.
Reported-by: Jesse Young <jlyo@jlyo.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reported issues when using dev_kfree_skb() on UP systems and
systems with low numbers of cores. dev_kfree_skb_irq() will
properly save IRQ state before freeing the skb.
Tested on 3.1.1 and 3.2_rc2
Example of reproducible trace of kernel 3.1.1
------------[ cut here ]------------
WARNING: at kernel/softirq.c:159 local_bh_enable+0x32/0x79()
...
Pid: 0, comm: swapper Not tainted 3.1.1-gentoo #1
Call Trace:
[<c1022970>] warn_slowpath_common+0x65/0x7a
[<c102699e>] ? local_bh_enable+0x32/0x79
[<c1022994>] warn_slowpath_null+0xf/0x13
[<c102699e>] local_bh_enable+0x32/0x79
[<c134bfd8>] destroy_conntrack+0x7c/0x9b
[<c134890b>] nf_conntrack_destroy+0x1f/0x26
[<c132e3a6>] skb_release_head_state+0x74/0x83
[<c132e286>] __kfree_skb+0xb/0x6b
[<c132e30a>] consume_skb+0x24/0x26
[<c127c925>] b44_poll+0xaa/0x449
[<c1333ca1>] net_rx_action+0x3f/0xea
[<c1026a44>] __do_softirq+0x5f/0xd5
[<c10269e5>] ? local_bh_enable+0x79/0x79
<IRQ> [<c1026c32>] ? irq_exit+0x34/0x8d
[<c1003628>] ? do_IRQ+0x74/0x87
[<c13f5329>] ? common_interrupt+0x29/0x30
[<c1006e18>] ? default_idle+0x29/0x3e
[<c10015a7>] ? cpu_idle+0x2f/0x5d
[<c13e91c5>] ? rest_init+0x79/0x7b
[<c15c66a9>] ? start_kernel+0x297/0x29c
[<c15c60b0>] ? i386_start_kernel+0xb0/0xb7
---[ end trace 583f33bb1aa207a9 ]---
Signed-off-by: Xander Hover <LKML@hover.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Distributions are using this in their default scripts, so don't hide
them behind the advanced setting.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
I broke the build with the addition of netprio_cgroups if CONFIG_CGROUPS=n.
This patch corrects it by moving the offending struct into an ifdef
CONFIG_CGROUPS block. Also clean up a few needless defines and inline functions
that don't get called if CONFIG_CGROUPS isn't defined while Im at it.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 72a3effaf6 ([NET]: Size listen hash tables using backlog
hint) added a bug allowing inet6_synq_hash() to return an out of bound
array index, because of u16 overflow.
Bug can happen if system admins set net.core.somaxconn &
net.ipv4.tcp_max_syn_backlog sysctls to values greater than 65536
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Given we dont use anymore the struct net_device *dev argument, and this
interface brings litle benefit, remove netdev_{alloc|free}_page(), to
debloat include/linux/skbuff.h a bit.
(Some drivers used a mix of these interfaces and alloc_pages())
When allocating a page given to device for DMA transfer (device to
memory), it makes sense to use a cold one (__GFP_COLD)
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
CC: Dimitris Michailidis <dm@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Per discussion with Ben Hutchings and David Miller, go through and
remove assignments of "N/A" to fw_version in various drivers'
.get_drvinfo routines. While there clean-up some use of bare
constants and such.
Signed-off-by: Rick Jones <rick.jones2@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
C assignment can handle struct in6_addr copying.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 4ba7d99978.
The original patch was a misguided attempt to improve performance on
some hardware that is apparently prone to spurious interrupt generation.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This reverts commit 23085d5796.
The original patch was a misguided attempt to improve performance on
some hardware that is apparently prone to spurious interrupt generation.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
when skb_shift, we want to shift paged data from skb to tgt frag area.
Original comments revert the shift order
Signed-off-by: Feng King <kinwin2008@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that the vcc backends do the right thing with respect the receive
queue on registration, allow MSK_PEEK for atm sockets.
This allows a userspace program to inspect the packets and decide what
backend to use to handle them.
Signed-off-by: Jorge Boncompte [DTI2] <jorge@dti2.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This function moves the implementation found in the clip and br2684
modules to common code, correctly unlinks the skb from the queue
before pushing it and makes pppoatm use it.
Signed-off-by: Jorge Boncompte [DTI2] <jorge@dti2.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
I don't see the point on substracting the skb len from the netdev
stats.
Signed-off-by: Jorge Boncompte [DTI2] <jorge@dti2.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This snippet has caused several bugs in the past, and I don't see the
point on substracting the skb len from netdev stats.
Signed-off-by: Jorge Boncompte [DTI2] <jorge@dti2.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Future devices may or may not be capable of supporting larger rx
producer rings. This patch changes the code so that this flag is set on
an ASIC rev to ASIC rev basis. Also, this patch changes a place where
the LRG_PROD_RING_CAP flag was not controlling how the rx standard
producer ring size was set.
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The BD replenish thresholds for the 57765 and newer ASIC revs are a
little strict. They were tuned for a mode that is currently unused.
This patch relaxes the thresholds so that they are set to values more
inline with the resources available.
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch changes tg3's 1000Base-X flow control resolution to look like
the 1000Base-T flow control resolution code.
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This considers some simple cases that are common and easy to validate
Note in particular that there are no ...s in the rule, so all of the
matched code has to be contiguous
The semantic patch that makes this change is available
in scripts/coccinelle/api/alloc/kzalloc-simple.cocci.
Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current addr_match() is errh, under-optimized.
Compiler doesn't know that memcmp() branch doesn't trigger for IPv4.
Also, pass addresses by value -- they fit into register.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the requisite documentation to explain to new users how net_prio cgroups work
Signed-off-by:Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
CC: Robert Love <robert.w.love@intel.com>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds in the infrastructure code to create the network priority
cgroup. The cgroup, in addition to the standard processes file creates two
control files:
1) prioidx - This is a read-only file that exports the index of this cgroup.
This is a value that is both arbitrary and unique to a cgroup in this subsystem,
and is used to index the per-device priority map
2) priomap - This is a writeable file. On read it reports a table of 2-tuples
<name:priority> where name is the name of a network interface and priority is
indicates the priority assigned to frames egresessing on the named interface and
originating from a pid in this cgroup
This cgroup allows for skb priority to be set prior to a root qdisc getting
selected. This is benenficial for DCB enabled systems, in that it allows for any
application to use dcb configured priorities so without application modification
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
CC: Robert Love <robert.w.love@intel.com>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 28011cf19b (net: Add ethtool to mii advertisment conversion
helpers) added a helper function ethtool_adv_to_mii_100bt() and
tg3_copper_is_advertising_all(), tg3_phy_autoneg_cfg() were
modified to use this.
Before that commit, ethtool to mii advertisement conversion was
done wrt speed, but now pause operation is also taken account.
So, in tg3_copper_is_advertising_all(), below condition becomes
true and this makes link up fails.
if ((adv_reg & ADVERTISE_ALL) != all_mask)
return 0;
To fix this add ADVERTISE_ALL bit and operation to cap speed,
and change default advertisement not including ADVERTISED_Pause.
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SKB_TRUESIZE() provides a better approximation of expected skb truesize.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>