linux

Commit Graph

Author	SHA1	Message	Date
YOSHIFUJI Hideaki / 吉藤英明	12fd84f438	ipv6: Remove unused neigh argument for icmp6_dst_alloc() and its callers. Because of rt->n removal, we do not need neigh argument any more. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-18 14:41:13 -05:00
Steffen Klassert	6f809da27c	ipv6: Add an error handler for icmp6 pmtu and redirect events are now handled in the protocols error handler, so add an error handler for icmp6 to do this. It is needed in the case when we have no socket context. Based on a patch by Duan Jiong. Reported-by: Duan Jiong <djduanjiong@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-18 14:19:42 -05:00
Greg Kroah-Hartman	ed408f7c0f	Merge 3.9-rc4 into driver-core-next This is to fix up a build problem with a wireless driver due to the dynamic-debug patches in this branch. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-01-17 19:48:18 -08:00
YOSHIFUJI Hideaki / 吉藤英明	887c95cc1d	ipv6: Complete neighbour entry removal from dst_entry. CC: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-17 18:38:19 -05:00
YOSHIFUJI Hideaki / 吉藤英明	6fd6ce2056	ipv6: Do not depend on rt->n in ip6_finish_output2(). If neigh is not found, create new one. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-17 18:38:19 -05:00
YOSHIFUJI Hideaki / 吉藤英明	707be1ff3d	ipv6: Do not depend on rt->n in ip6_dst_lookup_tail(). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-17 18:38:19 -05:00
YOSHIFUJI Hideaki / 吉藤英明	2152caea71	ipv6: Do not depend on rt->n in rt6_probe(). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-17 18:38:19 -05:00
YOSHIFUJI Hideaki / 吉藤英明	145a36217a	ipv6: Do not depend on rt->n in rt6_check_neigh(). CC: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-17 18:38:19 -05:00
YOSHIFUJI Hideaki / 吉藤英明	c440f1609b	ipv6: Do not depend on rt->n in ip6_pol_route(). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-17 18:38:19 -05:00
YOSHIFUJI Hideaki / 吉藤英明	dd0cbf29b1	ipv6 route: Dump gateway based on RTF_GATEWAY flag and rt->rt6i_gateway. Do not depend on rt->n. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-17 18:38:18 -05:00
YOSHIFUJI Hideaki / 吉藤英明	8e022ee63f	ndisc: Remove tbl argument for __ipv6_neigh_lookup(). We can refer to nd_tbl directly. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-17 18:38:18 -05:00
YOSHIFUJI Hideaki / 吉藤英明	7ff74a596b	ndisc: Update neigh->updated with write lock. neigh->nud_state and neigh->updated are under protection of neigh->lock. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-17 18:38:18 -05:00
Romain KUNTZ	7efdba5bd9	ipv6: fix header length calculation in ip6_append_data() Commit `299b0767` (ipv6: Fix IPsec slowpath fragmentation problem) has introduced a error in the header length calculation that provokes corrupted packets when non-fragmentable extensions headers (Destination Option or Routing Header Type 2) are used. rt->rt6i_nfheader_len is the length of the non-fragmentable extension header, and it should be substracted to rt->dst.header_len, and not to exthdrlen, as it was done before commit `299b0767`. This patch reverts to the original and correct behavior. It has been successfully tested with and without IPsec on packets that include non-fragmentable extensions headers. Signed-off-by: Romain Kuntz <r.kuntz@ipflavors.com> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-17 03:37:13 -05:00
David S. Miller	4b87f92259	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: Documentation/networking/ip-sysctl.txt drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c Both conflicts were simply overlapping context. A build fix for qlcnic is in here too, simply removing the added devinit annotations which no longer exist. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-15 15:05:59 -05:00
YOSHIFUJI Hideaki / 吉藤英明	6059283378	ipv6 netevent: Remove old_neigh from netevent_redirect. The only user is cxgb3 driver. old_neigh is used to check device change, but it must not happen on redirect. In this sense, we can remove old_neigh argument. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-14 15:04:59 -05:00
YOSHIFUJI Hideaki / 吉藤英明	dd3332bfcb	ipv6: Store Router Alert option in IP6CB directly. Router Alert option is very small and we can store the value itself in the skb. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-13 20:17:14 -05:00
YOSHIFUJI Hideaki / 吉藤英明	2b464f61f0	ipv6 xfrm: Use ipv6_addr_hash() in xfrm6_tunnel_spi_hash_byaddr(). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-13 20:17:14 -05:00
YOSHIFUJI Hideaki / 吉藤英明	c08977bb2b	ipv6 route: Use ipv6_addr_hash() in rt6_info_hash_nhsfn(). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-13 20:17:14 -05:00
YOSHIFUJI Hideaki / 吉藤英明	daad151263	ipv6: Make ipv6_is_mld() inline and use it from ip6_mc_input(). Move generalized version of ipv6_is_mld() to header, and use it from ip6_mc_input(). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-13 20:17:14 -05:00
YOSHIFUJI Hideaki / 吉藤英明	e7219858ac	ipv6: Use ipv6_get_dsfield() instead of ipv6_tclass(). Commit `7a3198a8` ("ipv6: helper function to get tclass") introduced ipv6_tclass(), but similar function is already available as ipv6_get_dsfield(). We might be able to call ipv6_tclass() from ipv6_get_dsfield(), but it is confusing to have two versions. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-13 20:17:14 -05:00
YOSHIFUJI Hideaki / 吉藤英明	6502ca527f	ipv6: Introduce ip6_flowinfo() to extract flowinfo (tclass + flowlabel). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-13 20:17:13 -05:00
YOSHIFUJI Hideaki / 吉藤英明	3e4e4c1f2d	ipv6: Introduce ip6_flow_hdr() to fill version, tclass and flowlabel. This is not only for readability but also for optimization. What we do here is to build the 32bit word at the beginning of the ipv6 header (the "ip6_flow" virtual member of struct ip6_hdr in RFC3542) and we do not need to read the tclass portion of the target buffer. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-13 20:17:13 -05:00
Kees Cook	f9ceb16ec4	net/ipv6: remove depends on CONFIG_EXPERIMENTAL The CONFIG_EXPERIMENTAL config item has not carried much meaning for a while now and is almost always enabled by default. As agreed during the Linux kernel summit, remove it from any "depends on" lines in Kconfigs. CC: "David S. Miller" <davem@davemloft.net> CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> CC: James Morris <jmorris@namei.org> CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> CC: Patrick McHardy <kaber@trash.net> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: David S. Miller <davem@davemloft.net>	2013-01-11 11:40:01 -08:00
Romain Kuntz	21caa6622b	ipv6: use addrconf_get_prefix_route for prefix route lookup [v2] Replace ip6_route_lookup() with addrconf_get_prefix_route() when looking up for a prefix route. This ensures that the connected prefix is looked up in the main table, and avoids the selection of other matching routes located in different tables as well as blackhole or prohibited entries. In addition, this fixes an Opps introduced by commit `64c6d08e` (ipv6: del unreachable route when an addr is deleted on lo), that would occur when a blackhole or prohibited entry is selected by ip6_route_lookup(). Such entries have a NULL rt6i_table argument, which is accessed by __ip6_del_rt() when trying to lock rt6i_table->tb6_lock. The function addrconf_is_prefix_route() is not used anymore and is removed. [v2] Minor indentation cleanup and log updates. Signed-off-by: Romain Kuntz <r.kuntz@ipflavors.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-10 14:22:54 -08:00
Romain Kuntz	85da53bf1c	ipv6: fix the noflags test in addrconf_get_prefix_route The tests on the flags in addrconf_get_prefix_route() does no make much sense: the 'noflags' parameter contains the set of flags that must not match with the route flags, so the test must be done against 'noflags', and not against 'flags'. Signed-off-by: Romain Kuntz <r.kuntz@ipflavors.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-10 14:13:33 -08:00
YOSHIFUJI Hideaki / 吉藤英明	6c40d100ce	ipv6: Use container_of macro instead of magic number to get ipv6 header. In ipv6_recv_error(), addr_offset points to daddr field of the ip header. To get ipv6 header, use container_of() macro instead of substracting magic number (24). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-09 23:59:53 -08:00
YOSHIFUJI Hideaki / 吉藤英明	ba96bcbcd2	ipv6: Use FIELD_SIZEOF() in inet6_init(). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-09 23:38:23 -08:00
Cong Wang	acb3e04119	ipv6: move csum_ipv6_magic() and udp6_csum_init() into static library As suggested by David, udp6_csum_init() is too big to be inlined, move it to ipv6 static library, net/ipv6/ip6_checksum.c. And the generic csum_ipv6_magic() too. Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-08 17:56:10 -08:00
Li RongQing	a9403f8aeb	ah6/esp6: set transport header correctly for IPsec tunnel mode. IPsec tunnel does not set ECN field to CE in inner header when the ECN field in the outer header is CE, and the ECN field in the inner header is ECT(0) or ECT(1). The cause is ipip6_hdr() does not return the correct address of inner header since skb->transport-header is not the inner header after esp6_input_done2(), or ah6_input(). Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2013-01-08 12:41:30 +01:00
Hannes Frederic Sowa	5d134f1c1f	tcp: make sysctl_tcp_ecn namespace aware As per suggestion from Eric Dumazet this patch makes tcp_ecn sysctl namespace aware. The reason behind this patch is to ease the testing of ecn problems on the internet and allows applications to tune their own use of ecn. Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David Miller <davem@davemloft.net> Cc: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-06 21:09:56 -08:00
YOSHIFUJI Hideaki / 吉藤英明	71bcdba06d	ndisc: Use struct rd_msg for redirect message. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-06 21:08:38 -08:00
YOSHIFUJI Hideaki / 吉藤英明	b7dc8c3959	ndisc: Remove unused space at tail of skb for ndisc messages. (TAKE 3) Currently, the size of skb allocated for NDISC is MAX_HEADER + LL_RESERVED_SPACE(dev) + packet length + dev->needed_tailroom, but only LL_RESERVED_SPACE(dev) bytes is "reserved" for headers. As a result, the skb looks like this (after construction of the message): head data tail end +--------------------------------------------------------------+ + \| \| \| \| +--------------------------------------------------------------+ \|<-hlen---->\|<---ipv6 packet------>\|<--tlen-->\|<--MAX_HEADER-->\| =LL_ = dev RESERVED_ ->needed_ SPACE(dev) tailroom As the name implies, "MAX_HEADER" is used for headers, and should be "reserved" in prior to packet construction. Or, if some space is really required at the tail of ther skb, it should be explicitly documented. We have several option after construction of NDISC message: Option 1: head data tail end +---------------------------------------------+ + \| \| \| +---------------------------------------------+ \|<-hlen---->\|<---ipv6 packet------>\|<--tlen-->\| =LL_ = dev RESERVED_ ->needed_ SPACE(dev) tailroom Option 2: head data tail end +--------------------------------------------------+ + \| \| \| +--------------------------------------------------+ \|<--MAX_HEADER-->\|<---ipv6 packet------>\|<--tlen-->\| = dev ->needed_ tailroom Option 3: head data tail end +--------------------------------------------------------------+ + \| \| \| \| +--------------------------------------------------------------+ \|<--MAX_HEADER-->\|<-hlen---->\|<---ipv6 packet------>\|<--tlen-->\| =LL_ = dev RESERVED_ ->needed_ SPACE(dev) tailroom Our tunnel drivers try expanding headroom and the space for tunnel encapsulation was not a mandatory space -- so we are not seeing bugs here --, but just for optimization for performance critial situations. Since NDISC messages are not performance critical unlike TCP, and as we know outgoing device, LL_RESERVED_SPACE(dev) should be just enough for the device in most (if not all) cases: LL_RESERVED_SPACE(dev) <= LL_MAX_HEADER <= MAX_HEADER Note that LL_RESERVED_SPACE(dev) is also enough for NDISC over SIT (e.g., ISATAP). So, I think Option 1 is just fine here. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-04 15:16:44 -08:00
Ulrich Weber	429da4c0b1	netfilter: ip6t_NPT: fix IPv6 NTP checksum calculation csum16_add() has a broken carry detection, should be: sum += sum < (__force u16)b; Instead of fixing csum16_add, remove the custom checksum functions and use the generic csum_add/csum_sub ones. Signed-off-by: Ulrich Weber <ulrich.weber@sophos.com> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-01-04 20:03:02 +01:00
David S. Miller	ac196f8c92	Merge branch 'master' of git://1984.lsi.us.es/nf Pablo Neira Ayuso says: ==================== The following batch contains Netfilter fixes for 3.8-rc1. They are a mixture of old bugs that have passed unnoticed (I'll pass these to stable) and more fresh ones from the previous merge window, they are: * Fix for MAC address in 6in4 tunnels via NFLOG that results in ulogd showing up wrong address, from Bob Hockney. * Fix a comment in nf_conntrack_ipv6, from Florent Fourcot. * Fix a leak an error path in ctnetlink while creating an expectation, from Jesper Juhl. * Fix missing ICMP time exceeded in the IPv6 defragmentation code, from Haibo Xi. * Fix inconsistent handling of routing changes in MASQUERADE for the new connections case, from Andrew Collins. * Fix a missing skb_reset_transport in ip[6]t_REJECT that leads to crashes in the ixgbe driver (since it seems to access the transport header with TSO enabled), from Mukund Jampala. * Recover obsoleted NOTRACK target by including it into the CT and spot a warning via printk about being obsoleted. Many people don't check the scheduled to be removal file under Documentation, so we follow some less agressive approach to kill this in a year or so. Spotted by Florian Westphal, patch from myself. * Fix race condition in xt_hashlimit that allows to create two or more entries, from myself. * Fix crash if the CT is used due to the recently added facilities to consult the dying and unconfirmed conntrack lists, from myself. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-28 14:28:17 -08:00
Isaku Yamahata	ae782bb16c	ipv6/ip6_gre: set transport header correctly ip6gre_xmit2() incorrectly sets transport header to inner payload instead of GRE header. It seems copy-and-pasted from ipip.c. Set transport header to gre header. (In ipip case the transport header is the inner ip header, so that's correct.) Found by inspection. In practice the incorrect transport header doesn't matter because the skb usually is sent to another net_device or socket, so the transport header isn't referenced. Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-26 15:19:56 -08:00
Cong Ding	bd7790286b	ipv6: addrconf.c: remove unnecessary "if" the value of err is always negative if it goes to errout, so we don't need to check the value of err. Signed-off-by: Cong Ding <dinggnu@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-19 12:50:06 -08:00
Haibo Xi	97cf00e93c	netfilter: nf_ct_reasm: fix conntrack reassembly expire code Commit `b836c99fd6` (ipv6: unify conntrack reassembly expire code with standard one) use the standard IPv6 reassembly code(ip6_expire_frag_queue) to handle conntrack reassembly expire. In ip6_expire_frag_queue, it invoke dev_get_by_index_rcu to get which device received this expired packet.so we must save ifindex when NF_conntrack get this packet. With this patch applied, I can see ICMP Time Exceeded sent from the receiver when the sender sent out 1/2 fragmented IPv6 packet. Signed-off-by: Haibo Xi <haibbo@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2012-12-16 23:41:25 +01:00
Florent Fourcot	d7a769ff0e	netfilter: nf_conntrack_ipv6: fix comment for packets without data Remove ambiguity of double negation. Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr> Acked-by: Rick Jones <rick.jones2@hp.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2012-12-16 23:28:31 +01:00
Andrew Collins	c65ef8dc7b	netfilter: nf_nat: Also handle non-ESTABLISHED routing changes in MASQUERADE Since (`a0ecb85` netfilter: nf_nat: Handle routing changes in MASQUERADE target), the MASQUERADE target handles routing changes which affect the output interface of a connection, but only for ESTABLISHED connections. It is also possible for NEW connections which already have a conntrack entry to be affected by routing changes. This adds a check to drop entries in the NEW+conntrack state when the oif has changed. Signed-off-by: Andrew Collins <bsderandrew@gmail.com> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2012-12-16 23:28:30 +01:00
Mukund Jampala	c6f408996c	netfilter: ip[6]t_REJECT: fix wrong transport header pointer in TCP reset The problem occurs when iptables constructs the tcp reset packet. It doesn't initialize the pointer to the tcp header within the skb. When the skb is passed to the ixgbe driver for transmit, the ixgbe driver attempts to access the tcp header and crashes. Currently, other drivers (such as our 1G e1000e or igb drivers) don't access the tcp header on transmit unless the TSO option is turned on. <1>BUG: unable to handle kernel NULL pointer dereference at 0000000d <1>IP: [<d081621c>] ixgbe_xmit_frame_ring+0x8cc/0x2260 [ixgbe] <4>pdpt = 0000000085e5d001 pde = 0000000000000000 <0>Oops: 0000 [#1] SMP [...] <4>Pid: 0, comm: swapper Tainted: P 2.6.35.12 #1 Greencity/Thurley <4>EIP: 0060:[<d081621c>] EFLAGS: 00010246 CPU: 16 <4>EIP is at ixgbe_xmit_frame_ring+0x8cc/0x2260 [ixgbe] <4>EAX: c7628820 EBX: 00000007 ECX: 00000000 EDX: 00000000 <4>ESI: 00000008 EDI: c6882180 EBP: dfc6b000 ESP: ced95c48 <4> DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 <0>Process swapper (pid: 0, ti=ced94000 task=ced73bd0 task.ti=ced94000) <0>Stack: <4> cbec7418 c779e0d8 c77cc888 c77cc8a8 0903010a 00000000 c77c0008 00000002 <4><0> cd4997c0 00000010 dfc6b000 00000000 d0d176c9 c77cc8d8 c6882180 cbec7318 <4><0> 00000004 00000004 cbec7230 cbec7110 00000000 cbec70c0 c779e000 00000002 <0>Call Trace: <4> [<d0d176c9>] ? 0xd0d176c9 <4> [<d0d18a4d>] ? 0xd0d18a4d <4> [<411e243e>] ? dev_hard_start_xmit+0x218/0x2d7 <4> [<411f03d7>] ? sch_direct_xmit+0x4b/0x114 <4> [<411f056a>] ? __qdisc_run+0xca/0xe0 <4> [<411e28b0>] ? dev_queue_xmit+0x2d1/0x3d0 <4> [<411e8120>] ? neigh_resolve_output+0x1c5/0x20f <4> [<411e94a1>] ? neigh_update+0x29c/0x330 <4> [<4121cf29>] ? arp_process+0x49c/0x4cd <4> [<411f80c9>] ? nf_hook_slow+0x3f/0xac <4> [<4121ca8d>] ? arp_process+0x0/0x4cd <4> [<4121ca8d>] ? arp_process+0x0/0x4cd <4> [<4121c6d5>] ? T.901+0x38/0x3b <4> [<4121c918>] ? arp_rcv+0xa3/0xb4 <4> [<4121ca8d>] ? arp_process+0x0/0x4cd <4> [<411e1173>] ? __netif_receive_skb+0x32b/0x346 <4> [<411e19e1>] ? netif_receive_skb+0x5a/0x5f <4> [<411e1ea9>] ? napi_skb_finish+0x1b/0x30 <4> [<d0816eb4>] ? ixgbe_xmit_frame_ring+0x1564/0x2260 [ixgbe] <4> [<41013468>] ? lapic_next_event+0x13/0x16 <4> [<410429b2>] ? clockevents_program_event+0xd2/0xe4 <4> [<411e1b03>] ? net_rx_action+0x55/0x127 <4> [<4102da1a>] ? __do_softirq+0x77/0xeb <4> [<4102dab1>] ? do_softirq+0x23/0x27 <4> [<41003a67>] ? do_IRQ+0x7d/0x8e <4> [<41002a69>] ? common_interrupt+0x29/0x30 <4> [<41007bcf>] ? mwait_idle+0x48/0x4d <4> [<4100193b>] ? cpu_idle+0x37/0x4c <0>Code: df 09 d7 0f 94 c2 0f b6 d2 e9 e7 fb ff ff 31 db 31 c0 e9 38 ff ff ff 80 78 06 06 0f 85 3e fb ff ff 8b 7c 24 38 8b 8f b8 00 00 00 <0f> b6 51 0d f6 c2 01 0f 85 27 fb ff ff 80 e2 02 75 0d 8b 6c 24 <0>EIP: [<d081621c>] ixgbe_xmit_frame_ring+0x8cc/0x2260 [ixgbe] SS:ESP Signed-off-by: Mukund Jampala <jbmukund@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2012-12-16 23:27:35 +01:00
Simon Arlott	df48419107	ipv6: Fix Makefile offload objects The following commit breaks IPv6 TCP transmission for me: Commit `75fe83c322` Author: Vlad Yasevich <vyasevic@redhat.com> Date: Fri Nov 16 09:41:21 2012 +0000 ipv6: Preserve ipv6 functionality needed by NET This patch fixes the typo "ipv6_offload" which should be "ipv6-offload". I don't know why not including the offload modules should break TCP. Disabling all offload options on the NIC didn't help. Outgoing pulseaudio traffic kept stalling. Signed-off-by: Simon Arlott <simon@fire.lp0.eu> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-16 09:15:53 -08:00
Christoph Paasch	e337e24d66	inet: Fix kmemleak in tcp_v4/6_syn_recv_sock and dccp_v4/6_request_recv_sock If in either of the above functions inet_csk_route_child_sock() or __inet_inherit_port() fails, the newsk will not be freed: unreferenced object 0xffff88022e8a92c0 (size 1592): comm "softirq", pid 0, jiffies 4294946244 (age 726.160s) hex dump (first 32 bytes): 0a 01 01 01 0a 01 01 02 00 00 00 00 a7 cc 16 00 ................ 02 00 03 01 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff8153d190>] kmemleak_alloc+0x21/0x3e [<ffffffff810ab3e7>] kmem_cache_alloc+0xb5/0xc5 [<ffffffff8149b65b>] sk_prot_alloc.isra.53+0x2b/0xcd [<ffffffff8149b784>] sk_clone_lock+0x16/0x21e [<ffffffff814d711a>] inet_csk_clone_lock+0x10/0x7b [<ffffffff814ebbc3>] tcp_create_openreq_child+0x21/0x481 [<ffffffff814e8fa5>] tcp_v4_syn_recv_sock+0x3a/0x23b [<ffffffff814ec5ba>] tcp_check_req+0x29f/0x416 [<ffffffff814e8e10>] tcp_v4_do_rcv+0x161/0x2bc [<ffffffff814eb917>] tcp_v4_rcv+0x6c9/0x701 [<ffffffff814cea9f>] ip_local_deliver_finish+0x70/0xc4 [<ffffffff814cec20>] ip_local_deliver+0x4e/0x7f [<ffffffff814ce9f8>] ip_rcv_finish+0x1fc/0x233 [<ffffffff814cee68>] ip_rcv+0x217/0x267 [<ffffffff814a7bbe>] __netif_receive_skb+0x49e/0x553 [<ffffffff814a7cc3>] netif_receive_skb+0x50/0x82 This happens, because sk_clone_lock initializes sk_refcnt to 2, and thus a single sock_put() is not enough to free the memory. Additionally, things like xfrm, memcg, cookie_values,... may have been initialized. We have to free them properly. This is fixed by forcing a call to tcp_done(), ending up in inet_csk_destroy_sock, doing the final sock_put(). tcp_done() is necessary, because it ends up doing all the cleanup on xfrm, memcg, cookie_values, xfrm,... Before calling tcp_done, we have to set the socket to SOCK_DEAD, to force it entering inet_csk_destroy_sock. To avoid the warning in inet_csk_destroy_sock, inet_num has to be set to 0. As inet_csk_destroy_sock does a dec on orphan_count, we first have to increase it. Calling tcp_done() allows us to remove the calls to tcp_clear_xmit_timer() and tcp_cleanup_congestion_control(). A similar approach is taken for dccp by calling dccp_done(). This is in the kernel since `093d282321` (tproxy: fix hash locking issue when using port redirection in __inet_inherit_port()), thus since version >= 2.6.37. Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-14 13:14:07 -05:00
Duan Jiong	093d04d42f	ipv6: Change skb->data before using icmpv6_notify() to propagate redirect In function ndisc_redirect_rcv(), the skb->data points to the transport header, but function icmpv6_notify() need the skb->data points to the inner IP packet. So before using icmpv6_notify() to propagate redirect, change skb->data to point the inner IP packet that triggered the sending of the Redirect, and introduce struct rd_msg to make it easy. Signed-off-by: Duan Jiong <djduanjiong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-14 13:14:07 -05:00
YOSHIFUJI Hideaki / 吉藤英明	7bdc1b4aba	ndisc: Fix padding error in link-layer address option. If a natural number n exists where 2 + data_len <= 8n < 2 + data_len + pad, post padding is not initialized correctly. (Un)fortunately, the only type that requires pad is Infiniband, whose pad is 2 and data_len is 20, and this logical error has not become obvious, but it is better to fix. Note that ndisc_opt_addr_space() handles the situation described above correctly. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-13 12:58:11 -05:00
YOSHIFUJI Hideaki	fd0ea7dbfa	ndisc: Unexport ndisc_{build,send}_skb(). These symbols were exported for bonding device by commit `305d552a` ("bonding: send IPv6 neighbor advertisement on failover"). It bacame obsolete by commit `7c899432` ("bonding, ipv4, ipv6, vlan: Handle NETDEV_BONDING_FAILOVER like NETDEV_NOTIFY_PEERS") and removed by commit `4f5762ec` ("bonding: Remove obsolete source file 'bond_ipv6.c'"). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-12 12:42:29 -05:00
Eric Dumazet	0e1efe9d5e	ipv6: avoid taking locks at socket dismantle ipv6_sock_mc_close() is called for ipv6 sockets at close time, and most of them don't use multicast. Add a test to avoid contention on a shared spinlock. Same heuristic applies for ipv6_sock_ac_close(), to avoid contention on a shared rwlock. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-05 16:01:28 -05:00
David S. Miller	b1afce9538	ipv6: Protect ->mc_forwarding access with CONFIG_IPV6_MROUTE Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-04 14:46:34 -05:00
Nicolas Dichtel	193c1e478c	ip6mr: fix rtm_family of rtnl msg We talk about IPv6, hence the family is RTNL_FAMILY_IP6MR! rtnl_register() is already called with RTNL_FAMILY_IP6MR. The bug is here since the beginning of this function (commit `5b285cac35`). Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-04 13:27:24 -05:00
Nicolas Dichtel	812e44dd18	ip6mr: advertise new mfc entries via rtnl This patch allows to monitor mf6c activities via rtnetlink. To avoid parsing two times the mf6c oifs, we use maxvif to allocate the rtnl msg, thus we may allocate some superfluous space. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-04 13:08:11 -05:00
Nicolas Dichtel	1eb99af52c	ipmr/ip6mr: allow to get unresolved cache via netlink /proc/net/ip[6]_mr_cache allows to get all mfc entries, even if they are put in the unresolved list (mfc[6]_unres_queue). But only the table RT_TABLE_DEFAULT is displayed. This patch adds the parsing of the unresolved list when the dump is made via rtnetlink, hence each table can be checked. In IPv6, we set rtm_type in ip6mr_fill_mroute(), because in case of unresolved mfc __ip6mr_fill_mroute() will not set it. In IPv4, it is already done. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-04 13:08:11 -05:00
Nicolas Dichtel	9a68ac72a4	ipmr/ip6mr: report origin of mfc entry into rtnl msg A mfc entry can be static or not (added via the mroute_sk socket). The patch reports MFC_STATIC flag into rtm_protocol by setting rtm_protocol to RTPROT_STATIC or RTPROT_MROUTED. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-04 13:08:11 -05:00
Nicolas Dichtel	adfa85e45d	ipmr/ip6mr: advertise mfc stats via rtnetlink These statistics can be checked only via /proc/net/ip_mr_cache or SIOCGETSGCNT[_IN6] and thus only for the table RT_TABLE_DEFAULT. Advertising them via rtnetlink allows to get statistics for all cache entries, whatever the table is. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-04 13:08:10 -05:00
Nicolas Dichtel	70b386a0cc	ip6mr: use nla_nest_* helpers This patch removes the skb manipulations when nested attributes are added by using standard helpers. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-04 13:08:10 -05:00
Nicolas Dichtel	d67b8c616b	netconf: advertise mc_forwarding status This patch advertise the MC_FORWARDING status for IPv4 and IPv6. This field is readonly, only multicast engine in the kernel updates it. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-04 13:08:10 -05:00
David S. Miller	e8ad1a8fab	Merge branch 'master' of git://1984.lsi.us.es/nf-next Pablo Neira Ayuso says: ==================== * Remove limitation in the maximum number of supported sets in ipset. Now ipset automagically increments the number of slots in the array of sets by 64 new spare slots, from Jozsef Kadlecsik. * Partially remove the generic queue infrastructure now that ip_queue is gone. Its only client is nfnetlink_queue now, from Florian Westphal. * Add missing attribute policy checkings in ctnetlink, from Florian Westphal. * Automagically kill conntrack entries that use the wrong output interface for the masquerading case in case of routing changes, from Jozsef Kadlecsik. * Two patches two improve ct object traceability. Now ct objects are always placed in any of the existing lists. This allows us to dump the content of unconfirmed and dying conntracks via ctnetlink as a way to provide more instrumentation in case you suspect leaks, from myself. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-04 13:01:19 -05:00
Paul Marks	a5a81f0b90	ipv6: Fix default route failover when CONFIG_IPV6_ROUTER_PREF=n I believe this commit from 2008 was incorrect: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=398bcbebb6f721ac308df1e3d658c0029bb74503 When CONFIG_IPV6_ROUTER_PREF is disabled, the kernel should follow RFC4861 section 6.3.6: if no route is NUD_VALID, then traffic should be sprayed across all routers (indirectly triggering NUD) until one of them becomes NUD_VALID. However, the following experiment demonstrates that this does not work: 1) Connect to an IPv6 network. 2) Change the router's MAC (and link-local) address. The kernel will lock onto the first router and never try the new one, even if the first becomes unreachable. This patch fixes the problem by allowing rt6_check_neigh() to return 0; if all routers return 0, then rt6_select() will fall back to round-robin behavior. This patch should have no effect when CONFIG_IPV6_ROUTER_PREF=y. Note that rt6_check_neigh() is only used in a boolean context, so I've changed its return type accordingly. Signed-off-by: Paul Marks <pmarks@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-03 15:34:47 -05:00
Shmulik Ladkani	9ba2add3cf	ipv6: Make 'addrconf_rs_timer' send Router Solicitations (and re-arm itself) if Router Advertisements are accepted As of `026359b` [ipv6: Send ICMPv6 RSes only when RAs are accepted], Router Solicitations are sent whenever kernel accepts Router Advertisements on the interface. However, this logic isn't reflected in 'addrconf_rs_timer'. The timer fails to issue subsequent RS messages (and fails to re-arm itself) if forwarding is enabled and the special hybrid mode is enabled (accept_ra=2). Fix the condition determining whether next RS should be sent, by using 'ipv6_accept_ra()'. Reported-by: Ami Koren <amikoren@yahoo.com> Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-03 13:59:57 -05:00
Jozsef Kadlecsik	a0ecb85a2c	netfilter: nf_nat: Handle routing changes in MASQUERADE target When the route changes (backup default route, VPNs) which affect a masqueraded target, the packets were sent out with the outdated source address. The patch addresses the issue by comparing the outgoing interface directly with the masqueraded interface in the nat table. Events are inefficient in this case, because it'd require adding route events to the network core and then scanning the whole conntrack table and re-checking the route for all entry. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2012-12-03 15:14:20 +01:00
Shmulik Ladkani	aeaf6e9d2f	ipv6: unify logic evaluating inet6_dev's accept_ra property As of `026359b` [ipv6: Send ICMPv6 RSes only when RAs are accepted], the logic determining whether to send Router Solicitations is identical to the logic determining whether kernel accepts Router Advertisements. However the condition itself is repeated in several code locations. Unify it by introducing 'ipv6_accept_ra()' accessor. Also, simplify the condition expression, making it more readable. No semantic change. Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-01 11:36:37 -05:00
Eric Dumazet	ce43b03e88	net: move inet_dport/inet_num in sock_common commit `68835aba4d` (net: optimize INET input path further) moved some fields used for tcp/udp sockets lookup in the first cache line of struct sock_common. This patch moves inet_dport/inet_num as well, filling a 32bit hole on 64 bit arches and reducing number of cache line misses in lookups. Also change INET_MATCH()/INET_TW_MATCH() to perform the ports match before addresses match, as this check is more discriminant. Remove the hash check from MATCH() macros because we dont need to re validate the hash value after taking a refcount on socket, and use likely/unlikely compiler hints, as the sk_hash/hash check makes the following conditional tests 100% predicted by cpu. Introduce skc_addrpair/skc_portpair pair values to better document the alignment requirements of the port/addr pairs used in the various MATCH() macros, and remove some casts. The namespace check can also be done at last. This slightly improves TCP/UDP lookup times. IP/TCP early demux needs inet->rx_dst_ifindex and TCP needs inet->min_ttl, lets group them together in same cache line. With help from Ben Hutchings & Joe Perches. Idea of this patch came after Ling Ma proposal to move skc_hash to the beginning of struct sock_common, and should allow him to submit a final version of his patch. My tests show an improvement doing so. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Cc: Joe Perches <joe@perches.com> Cc: Ling Ma <ling.ma.program@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-30 15:02:56 -05:00
David S. Miller	e7165030db	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch Conflicts: net/ipv6/exthdrs_core.c Jesse Gross says: ==================== This series of improvements for 3.8/net-next contains four components: * Support for modifying IPv6 headers * Support for matching and setting skb->mark for better integration with things like iptables * Ability to recognize the EtherType for RARP packets * Two small performance enhancements The movement of ipv6_find_hdr() into exthdrs_core.c causes two small merge conflicts. I left it as is but can do the merge if you want. The conflicts are: * ipv6_find_hdr() and ipv6_find_tlv() were both moved to the bottom of exthdrs_core.c. Both should stay. * A new use of ipv6_find_hdr() was added to net/netfilter/ipvs/ip_vs_core.c after this patch. The IPVS user has two instances of the old constant name IP6T_FH_F_FRAG which has been renamed to IP6_FH_F_FRAG. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-30 12:01:30 -05:00
Nicolas Dichtel	f4e0b4c5e1	ip6tnl/sit: drop packet if ECN present with not-ECT This patch reports the change made by Stephen Hemminger in ipip and gre[6] in commit `eccc1bb8d4` (tunnel: drop packet if ECN present with not-ECT). Goal is to handle RFC6040, Section 4.2: Default Tunnel Egress Behaviour. o If the inner ECN field is Not-ECT, the decapsulator MUST NOT propagate any other ECN codepoint onwards. This is because the inner Not-ECT marking is set by transports that rely on dropped packets as an indication of congestion and would not understand or respond to any other ECN codepoint [RFC4774]. Specifically: * If the inner ECN field is Not-ECT and the outer ECN field is CE, the decapsulator MUST drop the packet. * If the inner ECN field is Not-ECT and the outer ECN field is Not-ECT, ECT(0), or ECT(1), the decapsulator MUST forward the outgoing packet with the ECN field cleared to Not-ECT. The patch takes benefits from common function added in net/inet_ecn.h. Like it was done for Xin4 tunnels, it adds logging to allow detecting broken systems that set ECN bits incorrectly when tunneling (or an intermediate router might be changing the header). Errors are also tracked via rx_frame_error. CC: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-28 11:37:11 -05:00
Joe Perches	03f52a0a55	ip6mr: Add sizeof verification to MRT6_ASSERT and MT6_PIM Verify the length of the user-space arguments. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-26 17:35:58 -05:00
Joe Perches	53d6841d22	ipv4/ipmr and ipv6/ip6mr: Convert int mroute_do_<foo> to bool Save a few bytes per table by convert mroute_do_assert and mroute_do_pim from int to bool. Remove !! as the compiler does that when assigning int to bool. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-25 16:34:17 -05:00
David S. Miller	24bc518a68	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/wireless/iwlwifi/pcie/tx.c Minor iwlwifi conflict in TX queue disabling between 'net', which removed a bogus warning, and 'net-next' which added some status register poking code. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-25 12:49:17 -05:00
Andrey Vagin	2b9164771e	ipv6: adapt connect for repair move This is work the same as for ipv4. All other hacks about tcp repair are in common code for ipv4 and ipv6, so this patch is enough for repairing ipv6 connections. Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Andrey Vagin <avagin@openvz.org> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-22 15:30:14 -05:00
David S. Miller	242a18d137	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== This pull request is intended for net-next and contains the following changes: 1) Remove a redundant check when initializing the xfrm replay functions, from Ulrich Weber. 2) Use a faster per-cpu helper when allocating ipcomt transforms, from Shan Wei. 3) Use a static gc threshold value for ipv6, simmilar to what we do for ipv4 now. 4) Remove a commented out function call. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-22 15:25:55 -05:00
Eric Dumazet	b4dd006760	ipv6: fix inet6_csk_update_pmtu() return value In case of error, inet6_csk_update_pmtu() should consistently return NULL. Bug added in commit `35ad9b9cf7` (ipv6: Add helper inet6_csk_update_pmtu().) Reported-by: Lluís Batlle i Rossell <viric@viric.name> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-20 15:16:15 -05:00
Nicolas Dichtel	e2f1f072db	sit: allow to configure 6rd tunnels via netlink This patch add the support of 6RD tunnels management via netlink. Note that netdev_state_change() is now called when 6RD parameters are updated. 6RD parameters are updated only if there is at least one 6RD attribute. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-20 13:43:28 -05:00
Eric W. Biederman	3594698a1f	net: Make CAP_NET_BIND_SERVICE per user namespace Allow privileged users in any user namespace to bind to privileged sockets in network namespaces they control. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-18 20:33:37 -05:00
Eric W. Biederman	b51642f6d7	net: Enable a userns root rtnl calls that are safe for unprivilged users - Only allow moving network devices to network namespaces you have CAP_NET_ADMIN privileges over. - Enable creating/deleting/modifying interfaces - Enable adding/deleting addresses - Enable adding/setting/deleting neighbour entries - Enable adding/removing routes - Enable adding/removing fib rules - Enable setting the forwarding state - Enable adding/removing ipv6 address labels - Enable setting bridge parameter Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-18 20:33:36 -05:00
Eric W. Biederman	c027aab4a6	net: Enable some sysctls that are safe for the userns root - Enable the per device ipv4 sysctls: net/ipv4/conf/<if>/forwarding net/ipv4/conf/<if>/mc_forwarding net/ipv4/conf/<if>/accept_redirects net/ipv4/conf/<if>/secure_redirects net/ipv4/conf/<if>/shared_media net/ipv4/conf/<if>/rp_filter net/ipv4/conf/<if>/send_redirects net/ipv4/conf/<if>/accept_source_route net/ipv4/conf/<if>/accept_local net/ipv4/conf/<if>/src_valid_mark net/ipv4/conf/<if>/proxy_arp net/ipv4/conf/<if>/medium_id net/ipv4/conf/<if>/bootp_relay net/ipv4/conf/<if>/log_martians net/ipv4/conf/<if>/tag net/ipv4/conf/<if>/arp_filter net/ipv4/conf/<if>/arp_announce net/ipv4/conf/<if>/arp_ignore net/ipv4/conf/<if>/arp_accept net/ipv4/conf/<if>/arp_notify net/ipv4/conf/<if>/proxy_arp_pvlan net/ipv4/conf/<if>/disable_xfrm net/ipv4/conf/<if>/disable_policy net/ipv4/conf/<if>/force_igmp_version net/ipv4/conf/<if>/promote_secondaries net/ipv4/conf/<if>/route_localnet - Enable the global ipv4 sysctl: net/ipv4/ip_forward - Enable the per device ipv6 sysctls: net/ipv6/conf/<if>/forwarding net/ipv6/conf/<if>/hop_limit net/ipv6/conf/<if>/mtu net/ipv6/conf/<if>/accept_ra net/ipv6/conf/<if>/accept_redirects net/ipv6/conf/<if>/autoconf net/ipv6/conf/<if>/dad_transmits net/ipv6/conf/<if>/router_solicitations net/ipv6/conf/<if>/router_solicitation_interval net/ipv6/conf/<if>/router_solicitation_delay net/ipv6/conf/<if>/force_mld_version net/ipv6/conf/<if>/use_tempaddr net/ipv6/conf/<if>/temp_valid_lft net/ipv6/conf/<if>/temp_prefered_lft net/ipv6/conf/<if>/regen_max_retry net/ipv6/conf/<if>/max_desync_factor net/ipv6/conf/<if>/max_addresses net/ipv6/conf/<if>/accept_ra_defrtr net/ipv6/conf/<if>/accept_ra_pinfo net/ipv6/conf/<if>/accept_ra_rtr_pref net/ipv6/conf/<if>/router_probe_interval net/ipv6/conf/<if>/accept_ra_rt_info_max_plen net/ipv6/conf/<if>/proxy_ndp net/ipv6/conf/<if>/accept_source_route net/ipv6/conf/<if>/optimistic_dad net/ipv6/conf/<if>/mc_forwarding net/ipv6/conf/<if>/disable_ipv6 net/ipv6/conf/<if>/accept_dad net/ipv6/conf/<if>/force_tllao - Enable the global ipv6 sysctls: net/ipv6/bindv6only net/ipv6/icmp/ratelimit Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-18 20:33:00 -05:00
Eric W. Biederman	af31f412c7	net: Allow userns root to control ipv6 Allow an unpriviled user who has created a user namespace, and then created a network namespace to effectively use the new network namespace, by reducing capable(CAP_NET_ADMIN) and capable(CAP_NET_RAW) calls to be ns_capable(net->user_ns, CAP_NET_ADMIN), or capable(net->user_ns, CAP_NET_RAW) calls. Settings that merely control a single network device are allowed. Either the network device is a logical network device where restrictions make no difference or the network device is hardware NIC that has been explicity moved from the initial network namespace. In general policy and network stack state changes are allowed while resource control is left unchanged. Allow the SIOCSIFADDR ioctl to add ipv6 addresses. Allow the SIOCDIFADDR ioctl to delete ipv6 addresses. Allow the SIOCADDRT ioctl to add ipv6 routes. Allow the SIOCDELRT ioctl to delete ipv6 routes. Allow creation of ipv6 raw sockets. Allow setting the IPV6_JOIN_ANYCAST socket option. Allow setting the IPV6_FL_A_RENEW parameter of the IPV6_FLOWLABEL_MGR socket option. Allow setting the IPV6_TRANSPARENT socket option. Allow setting the IPV6_HOPOPTS socket option. Allow setting the IPV6_RTHDRDSTOPTS socket option. Allow setting the IPV6_DSTOPTS socket option. Allow setting the IPV6_IPSEC_POLICY socket option. Allow setting the IPV6_XFRM_POLICY socket option. Allow sending packets with the IPV6_2292HOPOPTS control message. Allow sending packets with the IPV6_2292DSTOPTS control message. Allow sending packets with the IPV6_RTHDRDSTOPTS control message. Allow setting the multicast routing socket options on non multicast routing sockets. Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL, and SIOCDELTUNNEL ioctls for setting up, changing and deleting tunnels over ipv6. Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL, SIOCDELTUNNEL ioctls for setting up, changing and deleting ipv6 over ipv4 tunnels. Allow the SIOCADDPRL, SIOCDELPRL, SIOCCHGPRL ioctls for adding, deleting, and changing the potential router list for ISATAP tunnels. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-18 20:32:45 -05:00
Eric W. Biederman	dfc47ef863	net: Push capable(CAP_NET_ADMIN) into the rtnl methods - In rtnetlink_rcv_msg convert the capable(CAP_NET_ADMIN) check to ns_capable(net->user-ns, CAP_NET_ADMIN). Allowing unprivileged users to make netlink calls to modify their local network namespace. - In the rtnetlink doit methods add capable(CAP_NET_ADMIN) so that calls that are not safe for unprivileged users are still protected. Later patches will remove the extra capable calls from methods that are safe for unprivilged users. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-18 20:32:44 -05:00
Eric W. Biederman	464dc801c7	net: Don't export sysctls to unprivileged users In preparation for supporting the creation of network namespaces by unprivileged users, modify all of the per net sysctl exports and refuse to allow them to unprivileged users. This makes it safe for unprivileged users in general to access per net sysctls, and allows sysctls to be exported to unprivileged users on an individual basis as they are deemed safe. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-18 20:30:55 -05:00
Vlad Yasevich	75fe83c322	ipv6: Preserve ipv6 functionality needed by NET Some pieces of network use core pieces of IPv6 stack. Keep them available while letting new GSO offload pieces depend on CONFIG_INET. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-18 02:34:00 -05:00
David S. Miller	67f4efdce7	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Minor line offset auto-merges. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-17 22:00:43 -05:00
David S. Miller	545b29019c	Merge branch 'master' of git://1984.lsi.us.es/nf-next Conflicts: net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c Minor conflict due to some IS_ENABLED conversions done in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-16 12:42:43 -05:00
Steffen Klassert	0afe21fdf6	xfrm6: Remove commented out function call to xfrm6_input_fini xfrm6_input_fini() is not in the tree since more than 10 years, so remove the commented out function call. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2012-11-16 08:07:56 +01:00
Vlad Yasevich	d4d0d3557b	ipv6: Fix build error with udp_offload Add ip6_checksum.h include. This should resolve the following issue that shows up on power: net/ipv6/udp_offload.c: In function 'udp6_ufo_send_check': net/ipv6/udp_offload.c:29:2: error: implicit declaration of function 'csum_ipv6_magic' [-Werror=implicit-function-declaration] cc1: some warnings being treated as errors Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-15 22:48:32 -05:00
Vlad Yasevich	f191a1d17f	net: Remove code duplication between offload structures Move the offload callbacks into its own structure. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-15 17:39:51 -05:00
Vlad Yasevich	c6b641a4c6	ipv6: Pull IPv6 GSO registration out of the module Sing GSO support is now separate, pull it out of the module and make it its own init call. Remove the cleanup functions as they are no longer called. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-15 17:39:24 -05:00
Vlad Yasevich	3c73a0368e	ipv6: Update ipv6 static library with newly needed functions UDP offload needs some additional functions to be in the static kernel for it work correclty. Move those functions into the core. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-15 17:39:23 -05:00
Vlad Yasevich	2207afc8bf	ipv6: Move exthdr offload support into separate file Move the exthdr offload functionality into a separeate file in preparate for moving it out of the module Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-15 17:36:18 -05:00
Vlad Yasevich	5edbb07dc9	ipv6: Separate out UDP offload functionality Pull UDP GSO code into a separate file in preparation for moving the code out of the module. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-15 17:36:18 -05:00
Vlad Yasevich	8663e02aba	ipv6: Separate tcp offload functionality Pull TCPv6 offload functionality into its won file in preparation for moving it out of the module. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-15 17:36:18 -05:00
Vlad Yasevich	d1da932ed4	ipv6: Separate ipv6 offload support Separate IPv6 offload functionality into its own file in preparation for the move out of the module Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-15 17:36:17 -05:00
Vlad Yasevich	3336288a9f	ipv6: Switch to using new offload infrastructure. Switch IPv6 protocol to using the new GRO/GSO calls and data. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-15 17:36:17 -05:00
Vlad Yasevich	8ca896cfdd	ipv6: Add new offload registration infrastructure. Create a new data structure for IPv6 protocols that holds GRO/GSO callbacks and a new array to track the protocols that register GRO/GSO. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-15 17:36:17 -05:00
Vlad Yasevich	22061d8014	net: Switch to using the new packet offload infrustructure Convert to using the new GSO/GRO registration mechanism and new packet offload structure. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-15 17:36:17 -05:00
Nicolas Dichtel	1ff05fb711	ip6tnl: fix sparse warnings in ip6_tnl_netlink_parms() This change fixes a sparse warning triggered by casting the flowinfo from netlink messages in an u32 instead of be32. This change corrects that in order to resolve the sparse warning. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-15 13:46:29 -05:00
Nicolas Dichtel	d440b72068	sit: fix sparse warnings This change fixes several sparse warnings about endianness problem. The wrong nla_*() functions were used. It also fix a sparse warning about a flag test (field i_flags). This field is used in this file like a local flag only, so it is more an u16 (gre uses it as a be16). This sparse warning was already there before the patch that add netlink management, the code has just been moved. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-15 13:46:29 -05:00
Steffen Klassert	c381328659	xfrm: Use a static gc threshold value for ipv6 Unlike ipv4 did, ipv6 does not handle the maximum number of cached routes dynamically. So no need to try to handle the IPsec gc threshold value dynamically. This patch sets the IPsec gc threshold value back to 1024 routes, as it is for non-IPsec routes. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2012-11-15 09:00:18 +01:00
Nicolas Dichtel	f372341602	sit: add support of link creation via rtnl This patch add the support of 'ip link .. type sit'. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-14 22:02:39 -05:00
Nicolas Dichtel	e4c94a9cdc	sit: rename rtnl functions for consistency Functions in this file start with ipip6_. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-14 22:02:38 -05:00
Nicolas Dichtel	a12c9a8581	sit/rtnl: add missing parameters on dump IFLA_IPTUN_FLAGS and IFLA_IPTUN_PMTUDISC were missing. There is only one possible flag in i_flag: SIT_ISATAP. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-14 22:02:38 -05:00
Nicolas Dichtel	f9cd5a5536	sit: always notify change when params are updated netdev_state_change() was called only when end points or link was updated. Now that all parameters are advertised via netlink, we must advertise any change. This patch also prepares the support of sit tunnels management via rtnl. The code which update tunnels will be put in a new function. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-14 22:02:38 -05:00
Nicolas Dichtel	0b11245722	ip6tnl: add support of link creation via rtnl This patch add the support of 'ip link .. type ip6tnl'. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-14 22:02:37 -05:00
Nicolas Dichtel	b58d731acc	ip6tnl: rename rtnl functions for consistency Functions in this file start with ip6_tnl_. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-14 22:02:37 -05:00
Nicolas Dichtel	cfa323b6b9	ip6tnl/rtnl: add IFLA_IPTUN_PROTO on dump IPv6 tunnels can have three mode: 4in6, 6in6 and xin6. This information was missing in the netlink message. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-14 22:02:37 -05:00
Li RongQing	c75ea26040	ipv6: remove obsolete comments in route.c Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-14 18:51:45 -05:00
Amerigo Wang	e086cadc08	net: unify for_each_ip_tunnel_rcu() The defitions of for_each_ip_tunnel_rcu() are same, so unify it. Also, don't hide the parameter 't'. Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-14 18:49:50 -05:00
Amerigo Wang	aa0010f880	net: convert __IPTUNNEL_XMIT() to an inline function __IPTUNNEL_XMIT() is an ugly macro, convert it to a static inline function, so make it more readable. IPTUNNEL_XMIT() is unused, just remove it. Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-14 18:49:50 -05:00
Hannes Frederic Sowa	d4596bad2a	ipv6: setsockopt(IPIPPROTO_IPV6, IPV6_MINHOPCOUNT) forgot to set return value Cc: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-13 14:38:47 -05:00
Hannes Frederic Sowa	5cb04436ee	ipv6: add knob to send unsolicited ND on link-layer address change This patch introduces a new knob ndisc_notify. If enabled, the kernel will transmit an unsolicited neighbour advertisement on link-layer address change to update the neighbour tables of the corresponding hosts more quickly. This is the equivalent to arp_notify in ipv4 world. Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-13 14:27:45 -05:00
Florian Westphal	d3976a53ce	netfilter: ipv6: only provide sk_bound_dev_if for link-local addr yoshfuji points out that sk_bound_dev_if should only be provided for link-local addresses. IPv6 getpeer/sockname also has this test, i.e. we will now only set sin6_scope_id if the original(!) destination was a link-local address. Reported-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2012-11-13 13:42:29 +01:00
Ansis Atteka	9195bb8e38	ipv6: improve ipv6_find_hdr() to skip empty routing headers This patch prepares ipv6_find_hdr() function so that it could be able to skip routing headers, where segements_left is 0. This is required to handle multiple routing header case correctly when changing IPv6 addresses. Signed-off-by: Ansis Atteka <aatteka@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2012-11-12 12:31:37 -08:00
YOSHIFUJI Hideaki / 吉藤英明	9fafd65ad4	ipv6 ndisc: Use pre-defined in6addr_linklocal_allnodes. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-12 15:23:21 -05:00
David S. Miller	d4185bbf62	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c Minor conflict between the BCM_CNIC define removal in net-next and a bug fix added to net. Based upon a conflict resolution patch posted by Stephen Rothwell. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-10 18:32:51 -05:00
Jesse Gross	f8f626754e	ipv6: Move ipv6_find_hdr() out of Netfilter code. Open vSwitch will soon also use ipv6_find_hdr() so this moves it out of Netfilter-specific code into a more common location. Signed-off-by: Jesse Gross <jesse@nicira.com>	2012-11-09 17:05:07 -08:00
Nicolas Dichtel	c075b13098	ip6tnl: advertise tunnel param via rtnl It is usefull for daemons that monitor link event to have the full parameters of these interfaces when a rtnl message is sent. It allows also to dump them via rtnetlink. It is based on what is done for GRE tunnels. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-09 19:36:20 -05:00
Nicolas Dichtel	ba3e3f50a0	sit: advertise tunnel param via rtnl It is usefull for daemons that monitor link event to have the full parameters of these interfaces when a rtnl message is sent. It allows also to dump them via rtnetlink. It is based on what is done for GRE tunnels. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-09 19:36:20 -05:00
Nicolas Dichtel	a375413311	gre6: fix rtnl dump messages Spotted after a code review. Introduced by `c12b395a46` (gre: Support GRE over IPv6). Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-09 17:11:17 -05:00
Hannes Frederic Sowa	60713a0ca7	ipv6: send unsolicited neighbour advertisements to all-nodes As documented in RFC4861 (Neighbor Discovery for IP version 6) 7.2.6., unsolicited neighbour advertisements should be sent to the all-nodes multicast address. Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-09 16:18:52 -05:00
Li RongQing	a4477c4ddb	ipv6: remove rt6i_peer_genid from rt6_info and its handler 6431cbc25f(Create a mechanism for upward inetpeer propagation into routes) introduces these codes, but this mechanism is never enabled since rt6i_peer_genid always is zero whether it is not assigned or assigned by rt6_peer_genid(). After `5943634fc5` (ipv4: Maintain redirect and PMTU info in struct rtable again), the ipv4 related codes of this mechanism has been removed, I think we maybe able to remove them now. Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-08 21:16:08 -05:00
Nicolas Dichtel	b20b6d9726	ndisc: fix a typo in a comment in ndisc_recv_na() Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-07 19:03:16 -05:00
Amerigo Wang	94e187c015	ipv6: introduce ip6_rt_put() As suggested by Eric, we could introduce a helper function for ipv6 too, to avoid checking if rt is NULL before dst_release(). Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-03 14:59:05 -04:00
Amerigo Wang	1a9408355e	ipv6: remove a useless NULL check In dev_forward_change(), it is useless to check if idev->dev is NULL, it is always non-NULL here. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-03 14:50:15 -04:00
Eric Dumazet	e6c022a4fa	tcp: better retrans tracking for defer-accept For passive TCP connections using TCP_DEFER_ACCEPT facility, we incorrectly increment req->retrans each time timeout triggers while no SYNACK is sent. SYNACK are not sent for TCP_DEFER_ACCEPT that were established (for which we received the ACK from client). Only the last SYNACK is sent so that we can receive again an ACK from client, to move the req into accept queue. We plan to change this later to avoid the useless retransmit (and potential problem as this SYNACK could be lost) TCP_INFO later gives wrong information to user, claiming imaginary retransmits. Decouple req->retrans field into two independent fields : num_retrans : number of retransmit num_timeout : number of timeouts num_timeout is the counter that is incremented at each timeout, regardless of actual SYNACK being sent or not, and used to compute the exponential timeout. Introduce inet_rtx_syn_ack() helper to increment num_retrans only if ->rtx_syn_ack() succeeded. Use inet_rtx_syn_ack() from tcp_check_req() to increment num_retrans when we re-send a SYNACK in answer to a (retransmitted) SYN. Prior to this patch, we were not counting these retransmits. Change tcp_v[46]_rtx_synack() to increment TCP_MIB_RETRANSSEGS only if a synack packet was successfully queued. Reported-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Julian Anastasov <ja@ssi.bg> Cc: Vijay Subramanian <subramanian.vijay@gmail.com> Cc: Elliott Hughes <enh@google.com> Cc: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-03 14:45:00 -04:00
Nicolas Dichtel	1a72418bd7	ipv6/multipath: remove flag NLM_F_EXCL after the first nexthop fib6_add_rt2node() will reject the nexthop if this flag is set, so we perform the check only for the first nexthop. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-02 21:38:19 -04:00
Florian Westphal	121d1e0941	netfilter: ipv6: add getsockopt to retrieve origdst userspace can query the original ipv4 destination address of a REDIRECTed connection via getsockopt(m_sock, SOL_IP, SO_ORIGINAL_DST, &m_server_addr, &addrsize) but for ipv6 no such option existed. This adds getsockopt(..., IPPROTO_IPV6, IP6T_SO_ORIGINAL_DST, ...). Without this, userspace needs to parse /proc or use ctnetlink, which appears to be overkill. This uses option number 80 for IP6T_SO_ORIGINAL_DST, which is spare, to use the same number we use in the IPv4 socket option SO_ORIGINAL_DST. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2012-11-02 12:26:32 +01:00
Amerigo Wang	07a936260a	ipv6: use IS_ENABLED() #if defined(CONFIG_FOO) \|\| defined(CONFIG_FOO_MODULE) can be replaced by #if IS_ENABLED(CONFIG_FOO) Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-01 12:41:35 -04:00
David S. Miller	f8450bbe8c	Merge branch 'master' of git://1984.lsi.us.es/nf Pablo Neira Ayuso says: ==================== The following patchset contains fixes for your net tree, two of them are due to relatively recent changes, one has been a longstanding bug, they are: * Fix incorrect usage of rt_gateway in the H.323 helper, from Julian Anastasov. * Skip re-route in nf_nat code for ICMP traffic. If CONFIG_XFRM is enabled, we waste cycles to look up for the route again. This problem seems to be there since really long time. From Ulrich Weber. * Fix mismatching section in nf_conntrack_reasm, from Hein Tibosch. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2012-10-31 14:54:15 -04:00
Wu Fengguang	6229b75d8d	netfilter: nf_nat: use PTR_RET Use PTR_RET rather than if(IS_ERR(...)) + PTR_ERR Generated by: coccinelle/api/ptr_ret.cocci Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2012-10-29 09:59:59 +01:00
Nicolas Dichtel	76f8f6cb76	rtnl/ipv6: add support of RTM_GETNETCONF This message allows to get the devconf for an interface. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-10-28 19:05:00 -04:00
Nicolas Dichtel	f3a1bfb11c	rtnl/ipv6: use netconf msg to advertise forwarding status Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-10-28 19:05:00 -04:00
Hein Tibosch	f1df1374dc	netfilter: nf_defrag_ipv6: solve section mismatch in nf_conntrack_reasm WARNING: net/ipv6/netfilter/nf_defrag_ipv6.o(.text+0xe0): Section mismatch in reference from the function nf_ct_net_init() to the function .init.text:nf_ct_frag6_sysctl_register() The function nf_ct_net_init() references the function __init nf_ct_frag6_sysctl_register(). In case nf_conntrack_ipv6 is compiled as a module, nf_ct_net_init could be called after the init code and data are unloaded. Therefore remove the "__net_init" annotation from nf_ct_frag6_sysctl_register(). Signed-off-by: Hein Tibosch <hein_tibosch@yahoo.es> Acked-by: Cong Wang <amwang@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2012-10-28 22:44:15 +01:00
Ulrich Weber	38fe36a248	netfilter: nf_nat: don't check for port change on ICMP tuples ICMP tuples have id in src and type/code in dst. So comparing src.u.all with dst.u.all will always fail here and ip_xfrm_me_harder() is called for every ICMP packet, even if there was no NAT. Signed-off-by: Ulrich Weber <ulrich.weber@sophos.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2012-10-28 22:43:34 +01:00
Li RongQing	14edd87dc6	ipv6: Set default hoplimit as zero. Commit a02e4b7dae4551(Demark default hoplimit as zero) only changes the hoplimit checking condition and default value in ip6_dst_hoplimit, not zeros all hoplimit default value. Keep the zeroing ip6_template_metrics[RTAX_HOPLIMIT - 1] to force it as const, cause as a37e6e344910(net: force dst_default_metrics to const section) Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-10-24 23:14:17 -04:00
Nicolas Dichtel	b3ce5ae1fb	ipv6: fix sparse warnings in rt6_info_hash_nhsfn() Adding by commit `51ebd31815` which adds the support of ECMP for IPv6. Spotted-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-10-23 13:03:45 -04:00
Neal Cardwell	f3f121359c	ipv6: tcp: clean up tcp_v6_early_demux() icsk variable Remove an icsk variable, which by convention should refer to an inet_connection_sock rather than an inet_sock. In the process, make the tcp_v6_early_demux() code and formatting a bit more like tcp_v4_early_demux(), to ease comparisons and maintenance. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-10-23 13:03:44 -04:00
Nicolas Dichtel	51ebd31815	ipv6: add support of equal cost multipath (ECMP) Each nexthop is added like a single route in the routing table. All routes that have the same metric/weight and destination but not the same gateway are considering as ECMP routes. They are linked together, through a list called rt6i_siblings. ECMP routes can be added in one shot, with RTA_MULTIPATH attribute or one after the other (in both case, the flag NLM_F_EXCL should not be set). The patch is based on a previous work from Luc Saillard <luc.saillard@6wind.com>. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-10-23 02:38:32 -04:00
Eric Dumazet	9f0d3c2781	ipv6: addrconf: fix /proc/net/if_inet6 Commit `1d5783030a` (ipv6/addrconf: speedup /proc/net/if_inet6 filling) added bugs hiding some devices from if_inet6 and breaking applications. "ip -6 addr" could still display all IPv6 addresses, while "ifconfig -a" couldnt. One way to reproduce the bug is by starting in a shell : unshare -n /bin/bash ifconfig lo up And in original net namespace, lo device disappeared from if_inet6 Reported-by: Jan Hinnerk Stosch <janhinnerk.stosch@gmail.com> Tested-by: Jan Hinnerk Stosch <janhinnerk.stosch@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Mihai Maruseac <mihai.maruseac@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-10-16 14:41:47 -04:00
Alexey Kuznetsov	4c67525849	tcp: resets are misrouted After commit `e2446eaa` ("tcp_v4_send_reset: binding oif to iif in no sock case").. tcp resets are always lost, when routing is asymmetric. Yes, backing out that patch will result in misrouting of resets for dead connections which used interface binding when were alive, but we actually cannot do anything here. What's died that's died and correct handling normal unbound connections is obviously a priority. Comment to comment: > This has few benefits: > 1. tcp_v6_send_reset already did that. It was done to route resets for IPv6 link local addresses. It was a mistake to do so for global addresses. The patch fixes this as well. Actually, the problem appears to be even more serious than guaranteed loss of resets. As reported by Sergey Soloviev <sol@eqv.ru>, those misrouted resets create a lot of arp traffic and huge amount of unresolved arp entires putting down to knees NAT firewalls which use asymmetric routing. Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>	2012-10-12 13:52:40 -04:00
Eric Dumazet	863472454c	ipv6: gro: fix PV6_GRO_CB(skb)->proto problem It seems IPV6_GRO_CB(skb)->proto can be destroyed in skb_gro_receive() if a new skb is allocated (to serve as an anchor for frag_list) We copy NAPI_GRO_CB() only (not the IPV6 specific part) in : NAPI_GRO_CB(nskb) = NAPI_GRO_CB(p); So we leave IPV6_GRO_CB(nskb)->proto to 0 (fresh skb allocation) instead of IPPROTO_TCP (6) ipv6_gro_complete() isnt able to call ops->gro_complete() [ tcp6_gro_complete() ] Fix this by moving proto in NAPI_GRO_CB() and getting rid of IPV6_GRO_CB Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-10-08 15:40:43 -04:00
Eric Dumazet	51ec04038c	ipv6: GRO should be ECN friendly IPv4 side of the problem was addressed in commit `a9e050f4e7` (net: tcp: GRO should be ECN friendly) This patch does the same, but for IPv6 : A Traffic Class mismatch doesnt mean flows are different, but instead should force a flush of previous packets. This patch removes artificial packet reordering problem. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-10-07 14:44:36 -04:00
Linus Torvalds	283dbd8205	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking changes from David Miller: "The most important bit in here is the fix for input route caching from Eric Dumazet, it's a shame we couldn't fully analyze this in time for 3.6 as it's a 3.6 regression introduced by the routing cache removal. Anyways, will send quickly to -stable after you pull this in. Other changes of note: 1) Fix lockdep splats in team and bonding, from Eric Dumazet. 2) IPV6 adds link local route even when there is no link local address, from Nicolas Dichtel. 3) Fix ixgbe PTP implementation, from Jacob Keller. 4) Fix excessive stack usage in cxgb4 driver, from Vipul Pandya. 5) MAC length computed improperly in VLAN demux, from Antonio Quartulli." * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (26 commits) ipv6: release reference of ip6_null_entry's dst entry in __ip6_del_rt Remove noisy printks from llcp_sock_connect tipc: prevent dropped connections due to rcvbuf overflow silence some noisy printks in irda team: set qdisc_tx_busylock to avoid LOCKDEP splat bonding: set qdisc_tx_busylock to avoid LOCKDEP splat sctp: check src addr when processing SACK to update transport state sctp: fix a typo in prototype of __sctp_rcv_lookup() ipv4: add a fib_type to fib_info can: mpc5xxx_can: fix section type conflict can: peak_pcmcia: fix error return code can: peak_pci: fix error return code cxgb4: Fix build error due to missing linux/vmalloc.h include. bnx2x: fix ring size for 10G functions cxgb4: Dynamically allocate memory in t4_memory_rw() and get_vpd_params() ixgbe: add support for X540-AT1 ixgbe: fix poll loop for FDIRCTRL.INIT_DONE bit ixgbe: fix PTP ethtool timestamping function ixgbe: (PTP) Fix PPS interrupt code ixgbe: Fix PTP X540 SDP alignment code for PPS signal ...	2012-10-06 03:11:59 +09:00
Andi Kleen	04a6f82cf0	sections: fix section conflicts in net Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:04:45 +09:00
Gao feng	6825a26c2d	ipv6: release reference of ip6_null_entry's dst entry in __ip6_del_rt as we hold dst_entry before we call __ip6_del_rt, so we should alse call dst_release not only return -ENOENT when the rt6_info is ip6_null_entry. and we already hold the dst entry, so I think it's safe to call dst_release out of the write-read lock. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-10-04 16:00:07 -04:00
Nicolas Dichtel	62b54dd915	ipv6: don't add link local route when there is no link local address When an address is added on loopback (ip -6 a a 2002::1/128 dev lo), a route to fe80::/64 is added in the main table: unreachable fe80::/64 dev lo proto kernel metric 256 error -101 This route does not match any prefix (no fe80:: address on lo). In fact, addrconf_dev_config() will not add link local address because this function filters interfaces by type. If the link local address is added manually, the route to the link local prefix will be automatically added by addrconf_add_linklocal(). Note also, that this route is not deleted when the address is removed. After looking at the code, it seems that addrconf_add_lroute() is redundant with addrconf_add_linklocal(), because this function will add the link local route when the link local address is configured. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-10-02 22:36:23 -04:00
Eric Dumazet	861b650101	tcp: gro: add checksuming helpers skb with CHECKSUM_NONE cant currently be handled by GRO, and we notice this deep in GRO stack in tcp[46]_gro_receive() But there are cases where GRO can be a benefit, even with a lack of checksums. This preliminary work is needed to add GRO support to tunnels. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-10-01 17:00:27 -04:00
Nicolas Dichtel	64c6d08e64	ipv6: del unreachable route when an addr is deleted on lo When an address is added on loopback (ip -6 a a 2002::1/128 dev lo), two routes are added: - one in the local table: local 2002::1 via :: dev lo proto none metric 0 - one the in main table (for the prefix): unreachable 2002::1 dev lo proto kernel metric 256 error -101 When the address is deleted, the route inserted in the main table remains because we use rt6_lookup(), which returns NULL when dst->error is set, which is the case here! Thus, it is better to use ip6_route_lookup() to avoid this kind of filter. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-10-01 16:49:23 -04:00
Lin Ming	188c517a05	ipv6: return errno pointers consistently for fib6_add_1() fib6_add_1() should consistently return errno pointers, rather than a mixture of NULL and errno pointers. Signed-off-by: Lin Ming <mlin@ss.pku.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-09-28 18:48:28 -04:00
David S. Miller	6a06e5e1bb	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/team/team.c drivers/net/usb/qmi_wwan.c net/batman-adv/bat_iv_ogm.c net/ipv4/fib_frontend.c net/ipv4/route.c net/l2tp/l2tp_netlink.c The team, fib_frontend, route, and l2tp_netlink conflicts were simply overlapping changes. qmi_wwan and bat_iv_ogm were of the "use HEAD" variety. With help from Antonio Quartulli. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-09-28 14:40:49 -04:00
Eric Dumazet	bcc452935d	ipv6: gre: remove ip6gre_header_parse() dev_parse_header() callers provide 8 bytes of storage, so it's not possible to store an IPv6 address. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-09-27 18:49:22 -04:00
Konstantin Khlebnikov	4b7cc7fc26	nf_defrag_ipv6: fix oops on module unloading fix copy-paste error introduced in linux-next commit "ipv6: add a new namespace for nf_conntrack_reasm" Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Amerigo Wang <amwang@redhat.com> Cc: David S. Miller <davem@davemloft.net> Acked-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-09-27 18:14:55 -04:00
stephen hemminger	eccc1bb8d4	tunnel: drop packet if ECN present with not-ECT Linux tunnels were written before RFC6040 and therefore never implemented the corner case of ECN getting set in the outer header and the inner header not being ready for it. Section 4.2. Default Tunnel Egress Behaviour. o If the inner ECN field is Not-ECT, the decapsulator MUST NOT propagate any other ECN codepoint onwards. This is because the inner Not-ECT marking is set by transports that rely on dropped packets as an indication of congestion and would not understand or respond to any other ECN codepoint [RFC4774]. Specifically: * If the inner ECN field is Not-ECT and the outer ECN field is CE, the decapsulator MUST drop the packet. * If the inner ECN field is Not-ECT and the outer ECN field is Not-ECT, ECT(0), or ECT(1), the decapsulator MUST forward the outgoing packet with the ECN field cleared to Not-ECT. This patch moves the ECN decap logic out of the individual tunnels into a common place. It also adds logging to allow detecting broken systems that set ECN bits incorrectly when tunneling (or an intermediate router might be changing the header). Overloads rx_frame_error to keep track of ECN related error. Thanks to Chris Wright who caught this while reviewing the new VXLAN tunnel. This code was tested by injecting faulty logic in other end GRE to send incorrectly encapsulated packets. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-09-27 18:12:37 -04:00
stephen hemminger	b0558ef24a	xfrm: remove extranous rcu_read_lock The handlers for xfrm_tunnel are always invoked with rcu read lock already. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-09-27 18:12:37 -04:00
stephen hemminger	0c5794a66c	gre: remove unnecessary rcu_read_lock/unlock The gre function pointers for receive and error handling are always called (from gre.c) with rcu_read_lock already held. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-09-27 18:12:37 -04:00
Eric Dumazet	96af69ea2a	ipv6: mip6: fix mip6_mh_filter() mip6_mh_filter() should not modify its input, or else its caller would need to recompute ipv6_hdr() if skb->head is reallocated. Use skb_header_pointer() instead of pskb_may_pull() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-09-25 16:04:44 -04:00

1 2 3 4 5 ...

3244 Commits