Commit Graph

195 Commits

Author SHA1 Message Date
YOSHIFUJI Hideaki / 吉藤英明 3e4e4c1f2d ipv6: Introduce ip6_flow_hdr() to fill version, tclass and flowlabel.
This is not only for readability but also for optimization.
What we do here is to build the 32bit word at the beginning of the ipv6
header (the "ip6_flow" virtual member of struct ip6_hdr in RFC3542) and
we do not need to read the tclass portion of the target buffer.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-13 20:17:13 -05:00
Nicolas Dichtel f4e0b4c5e1 ip6tnl/sit: drop packet if ECN present with not-ECT
This patch reports the change made by Stephen Hemminger in ipip and gre[6] in
commit eccc1bb8d4 (tunnel: drop packet if ECN present with not-ECT).

Goal is to handle RFC6040, Section 4.2:

Default Tunnel Egress Behaviour.
 o If the inner ECN field is Not-ECT, the decapsulator MUST NOT
      propagate any other ECN codepoint onwards.  This is because the
      inner Not-ECT marking is set by transports that rely on dropped
      packets as an indication of congestion and would not understand or
      respond to any other ECN codepoint [RFC4774].  Specifically:

      *  If the inner ECN field is Not-ECT and the outer ECN field is
         CE, the decapsulator MUST drop the packet.

      *  If the inner ECN field is Not-ECT and the outer ECN field is
         Not-ECT, ECT(0), or ECT(1), the decapsulator MUST forward the
         outgoing packet with the ECN field cleared to Not-ECT.

The patch takes benefits from common function added in net/inet_ecn.h.

Like it was done for Xin4 tunnels, it adds logging to allow detecting broken
systems that set ECN bits incorrectly when tunneling (or an intermediate
router might be changing the header). Errors are also tracked via
rx_frame_error.

CC: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-28 11:37:11 -05:00
Eric W. Biederman af31f412c7 net: Allow userns root to control ipv6
Allow an unpriviled user who has created a user namespace, and then
created a network namespace to effectively use the new network
namespace, by reducing capable(CAP_NET_ADMIN) and
capable(CAP_NET_RAW) calls to be ns_capable(net->user_ns,
CAP_NET_ADMIN), or capable(net->user_ns, CAP_NET_RAW) calls.

Settings that merely control a single network device are allowed.
Either the network device is a logical network device where
restrictions make no difference or the network device is hardware NIC
that has been explicity moved from the initial network namespace.

In general policy and network stack state changes are allowed while
resource control is left unchanged.

Allow the SIOCSIFADDR ioctl to add ipv6 addresses.
Allow the SIOCDIFADDR ioctl to delete ipv6 addresses.
Allow the SIOCADDRT ioctl to add ipv6 routes.
Allow the SIOCDELRT ioctl to delete ipv6 routes.

Allow creation of ipv6 raw sockets.

Allow setting the IPV6_JOIN_ANYCAST socket option.
Allow setting the IPV6_FL_A_RENEW parameter of the IPV6_FLOWLABEL_MGR
socket option.

Allow setting the IPV6_TRANSPARENT socket option.
Allow setting the IPV6_HOPOPTS socket option.
Allow setting the IPV6_RTHDRDSTOPTS socket option.
Allow setting the IPV6_DSTOPTS socket option.
Allow setting the IPV6_IPSEC_POLICY socket option.
Allow setting the IPV6_XFRM_POLICY socket option.

Allow sending packets with the IPV6_2292HOPOPTS control message.
Allow sending packets with the IPV6_2292DSTOPTS control message.
Allow sending packets with the IPV6_RTHDRDSTOPTS control message.

Allow setting the multicast routing socket options on non multicast
routing sockets.

Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL, and SIOCDELTUNNEL ioctls for
setting up, changing and deleting tunnels over ipv6.

Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL, SIOCDELTUNNEL ioctls for
setting up, changing and deleting ipv6 over ipv4 tunnels.

Allow the SIOCADDPRL, SIOCDELPRL, SIOCCHGPRL ioctls for adding,
deleting, and changing the potential router list for ISATAP tunnels.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-18 20:32:45 -05:00
Nicolas Dichtel 1ff05fb711 ip6tnl: fix sparse warnings in ip6_tnl_netlink_parms()
This change fixes a sparse warning triggered by casting the flowinfo from
netlink messages in an u32 instead of be32. This change corrects that in order
to resolve the sparse warning.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-15 13:46:29 -05:00
Nicolas Dichtel 0b11245722 ip6tnl: add support of link creation via rtnl
This patch add the support of 'ip link .. type ip6tnl'.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-14 22:02:37 -05:00
Nicolas Dichtel b58d731acc ip6tnl: rename rtnl functions for consistency
Functions in this file start with ip6_tnl_.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-14 22:02:37 -05:00
Nicolas Dichtel cfa323b6b9 ip6tnl/rtnl: add IFLA_IPTUN_PROTO on dump
IPv6 tunnels can have three mode: 4in6, 6in6 and xin6.
This information was missing in the netlink message.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-14 22:02:37 -05:00
Amerigo Wang aa0010f880 net: convert __IPTUNNEL_XMIT() to an inline function
__IPTUNNEL_XMIT() is an ugly macro, convert it to a static
inline function, so make it more readable.

IPTUNNEL_XMIT() is unused, just remove it.

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-14 18:49:50 -05:00
Nicolas Dichtel c075b13098 ip6tnl: advertise tunnel param via rtnl
It is usefull for daemons that monitor link event to have the full parameters of
these interfaces when a rtnl message is sent.
It allows also to dump them via rtnetlink.

It is based on what is done for GRE tunnels.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-09 19:36:20 -05:00
Amerigo Wang 94e187c015 ipv6: introduce ip6_rt_put()
As suggested by Eric, we could introduce a helper function
for ipv6 too, to avoid checking if rt is NULL before
dst_release().

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-03 14:59:05 -04:00
Dan Carpenter 5ef5d6c569 gre: information leak in ip6_tnl_ioctl()
There is a one byte hole between p->hop_limit and p->flowinfo where
stack memory is leaked to the user.  This was introduced in c12b395a46
"gre: Support GRE over IPv6".

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
2012-08-20 02:21:30 -07:00
xeb@mail.ru c12b395a46 gre: Support GRE over IPv6
GRE over IPv6 implementation.

Signed-off-by: Dmitry Kozlov <xeb@mail.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 14:28:32 -07:00
Eric Dumazet ddbe503203 ipv6: add ipv6_addr_hash() helper
Introduce ipv6_addr_hash() helper doing a XOR on all bits
of an IPv6 address, with an optimized x86_64 version.

Use it in flow dissector, as suggested by Andrew McGregor,
to reduce hash collision probabilities in fq_codel (and other
users of flow dissector)

Use it in ip6_tunnel.c and use more bit shuffling, as suggested
by David Laight, as existing hash was ignoring most of them.

Use it in sunrpc and use more bit shuffling, using hash_32().

Use it in net/ipv6/addrconf.c, using hash_32() as well.

As a cleanup, use it in net/ipv4/tcp_metrics.c

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrew McGregor <andrewmcgr@gmail.com>
Cc: Dave Taht <dave.taht@gmail.com>
Cc: Tom Herbert <therbert@google.com>
Cc: David Laight <David.Laight@ACULAB.COM>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-18 11:28:46 -07:00
David S. Miller 6700c2709c net: Pass optional SKB and SK arguments to dst_ops->{update_pmtu,redirect}()
This will be used so that we can compose a full flow key.

Even though we have a route in this context, we need more.  In the
future the routes will be without destination address, source address,
etc. keying.  One ipv4 route will cover entire subnets, etc.

In this environment we have to have a way to possess persistent storage
for redirects and PMTU information.  This persistent storage will exist
in the FIB tables, and that's why we'll need to be able to rebuild a
full lookup flow key here.  Using that flow key will do a fib_lookup()
and create/update the persistent entry.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-17 03:29:28 -07:00
David S. Miller 1ed5c48f23 net: Remove checks for dst_ops->redirect being NULL.
No longer necessary.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-12 00:41:25 -07:00
David S. Miller ec18d9a269 ipv6: Add redirect support to all protocol icmp error handlers.
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-12 00:25:15 -07:00
Ben Hutchings 2c53040f01 net: Fix (nearly-)kernel-doc comments for various functions
Fix incorrect start markers, wrapped summary lines, missing section
breaks, incorrect separators, and some name mismatches.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-10 23:13:45 -07:00
Ville Nuorvala d0087b29f7 ipv6_tunnel: Allow receiving packets on the fallback tunnel if they pass sanity checks
At Facebook, we do Layer-3 DSR via IP-in-IP tunneling. Our load balancers wrap
an extra IP header on incoming packets so they can be routed to the backend.
In the v4 tunnel driver, when these packets fall on the default tunl0 device,
the behavior is to decapsulate them and drop them back on the stack. So our
setup is that tunl0 has the VIP and eth0 has (obviously) the backend's real
address.

In IPv6 we do the same thing, but the v6 tunnel driver didn't have this same
behavior - if you didn't have an explicit tunnel setup, it would drop the
packet.

This patch brings that v4 feature to the v6 driver.

The same IPv6 address checks are performed as with any normal tunnel,
but as the fallback tunnel endpoint addresses are unspecified, the checks
must be performed on a per-packet basis, rather than at tunnel
configuration time.

[Patch description modified by phil@ipom.com]

Signed-off-by: Ville Nuorvala <ville.nuorvala@gmail.com>
Tested-by: Phil Dibowitz <phil@ipom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-29 00:52:32 -07:00
Eric Dumazet 92113bfde2 ipv6: bool conversions phase1
ipv6_opt_accepted() returns a bool, and can use const pointers

ipv6_addr_equal(), ipv6_addr_any(), ipv6_addr_loopback(),
ipv6_addr_orchid() return a bool.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-18 02:24:13 -04:00
Joe Perches 91df42bedc net: ipv4 and ipv6: Convert printk(KERN_DEBUG to pr_debug
Use the current debugging style and enable dynamic_debug.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-16 01:01:03 -04:00
Joe Perches f32138319c net: ipv6: Standardize prefixes for message logging
Add #define pr_fmt(fmt) as appropriate.

Add "IPv6: " to appropriate files.

Convert printk(KERN_<LEVEL> to pr_<level> (but not KERN_DEBUG).
Standardize on "%s: " not "%s(): " when emitting __func__.
Use "%s: ", __func__ instead of embedding function name.
Coalesce formats, align arguments.

ADDRCONF output is now prefixed with "IPv6: "

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-16 01:01:03 -04:00
Joe Perches e87cc4728f net: Convert net_ratelimit uses to net_<level>_ratelimited
Standardize the net core ratelimited logging functions.

Coalesce formats, align arguments.
Change a printk then vprintk sequence to use printf extension %pV.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-15 13:45:03 -04:00
Eric Dumazet 9ff264492f ip6_tunnel: dont drop packet but consume it
When we need to reallocate skb, we dont drop a packet.
Call consume_skb() to not confuse dropwatch.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-19 14:23:55 -04:00
Eric Dumazet 95c9617472 net: cleanup unsigned to unsigned int
Use of "unsigned int" is preferred to bare "unsigned" in net tree.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-15 12:44:40 -04:00
Eric Dumazet cf778b00e9 net: reintroduce missing rcu_assign_pointer() calls
commit a9b3cd7f32 (rcu: convert uses of rcu_assign_pointer(x, NULL) to
RCU_INIT_POINTER) did a lot of incorrect changes, since it did a
complete conversion of rcu_assign_pointer(x, y) to RCU_INIT_POINTER(x,
y).

We miss needed barriers, even on x86, when y is not NULL.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Stephen Hemminger <shemminger@vyatta.com>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-12 12:26:56 -08:00
David S. Miller d191854282 ipv6: Kill rt6i_dev and rt6i_expires defines.
It just obscures that the netdevice pointer and the expires value are
implemented in the dst_entry sub-object of the ipv6 route.

And it makes grepping for dst_entry member uses much harder too.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-28 20:19:20 -05:00
Alexey Dobriyan 4e3fd7a06d net: remove ipv6_addr_copy()
C assignment can handle struct in6_addr copying.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-22 16:43:32 -05:00
David S. Miller efd0bf97de Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
The forcedeth changes had a conflict with the conversion over
to atomic u64 statistics in net-next.

The libertas cfg.c code had a conflict with the bss reference
counting fix by John Linville in net-next.

Conflicts:
	drivers/net/ethernet/nvidia/forcedeth.c
	drivers/net/wireless/libertas/cfg.c
2011-11-21 13:50:33 -05:00
Josh Boyer 731abb9cb2 ip6_tunnel: copy parms.name after register_netdevice
Commit 1c5cae815d removed an explicit call to dev_alloc_name in ip6_tnl_create
because register_netdevice will now create a valid name.  This works for the
net_device itself.

However the tunnel keeps a copy of the name in the parms structure for the
ip6_tnl associated with the tunnel.  parms.name is set by copying the net_device
name in ip6_tnl_dev_init_gen.  That function is called from ip6_tnl_dev_init in
ip6_tnl_create, but it is done before register_netdevice is called so the name
is set to a bogus value in the parms.name structure.

This shows up if you do a simple tunnel add, followed by a tunnel show:

[root@localhost ~]# ip -6 tunnel add remote fec0::100 local fec0::200
[root@localhost ~]# ip -6 tunnel show
ip6tnl0: ipv6/ipv6 remote :: local :: encaplimit 0 hoplimit 0 tclass 0x00 flowlabel 0x00000 (flowinfo 0x00000000)
ip6tnl%d: ipv6/ipv6 remote fec0::100 local fec0::200 encaplimit 4 hoplimit 64 tclass 0x00 flowlabel 0x00000 (flowinfo 0x00000000)
[root@localhost ~]#

Fix this by moving the strcpy out of ip6_tnl_dev_init_gen, and calling it after
register_netdevice has successfully returned.

Cc: stable@vger.kernel.org
Signed-off-by: Josh Boyer <jwboyer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-14 00:24:06 -05:00
Eric Dumazet 8ce120f118 net: better pcpu data alignment
Tunnels can force an alignment of their percpu data to reduce number of
cache lines used in fast path, or read in .ndo_get_stats()

percpu_alloc() is a very fine grained allocator, so any small hole will
be used anyway.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-08 15:10:59 -05:00
Eric Dumazet d24f22f3df ip6_tunnel: add optional fwmark inherit
Add IP6_TNL_F_USE_ORIG_FWMARK to ip6_tunnel, so that ip6_tnl_xmit2()
makes a route lookup taking into account skb->fwmark and doesnt cache
lookup result.

This permits more flexibility in policies and firewall setups.

To setup such a tunnel, "fwmark inherit" option should be added to "ip
-f inet6 tunnel" command.

Reported-by: Anders Franzen <Anders.Franzen@ericsson.com>
CC: Hans Schillström <hans.schillstrom@ericsson.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-09-20 14:50:00 -04:00
Stephen Hemminger a9b3cd7f32 rcu: convert uses of rcu_assign_pointer(x, NULL) to RCU_INIT_POINTER
When assigning a NULL value to an RCU protected pointer, no barrier
is needed. The rcu_assign_pointer, used to handle that but will soon
change to not handle the special case.

Convert all rcu_assign_pointer of NULL value.

//smpl
@@ expression P; @@

- rcu_assign_pointer(P, NULL)
+ RCU_INIT_POINTER(P, NULL)

// </smpl>

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-02 04:29:23 -07:00
Eric Dumazet 89b0212697 ip6tnl: avoid touching dst refcount in ip6_tnl_xmit2()
Even using percpu stats, we still hit tunnel dst_entry refcount in
ip6_tnl_xmit2()

Since we are in RCU locked section, we can use skb_dst_set_noref() and
avoid these atomic operations, leaving dst shared on cpus.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-01 00:12:00 -07:00
Arun Sharma 60063497a9 atomic: use <linux/atomic.h>
This allows us to move duplicated code in <asm/atomic.h>
(atomic_inc_not_zero() for now) to <linux/atomic.h>

Signed-off-by: Arun Sharma <asharma@fb.com>
Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-26 16:49:47 -07:00
Jiri Pirko 1c5cae815d net: call dev_alloc_name from register_netdevice
Force dev_alloc_name() to be called from register_netdevice() by
dev_get_valid_name(). That allows to remove multiple explicit
dev_alloc_name() calls.

The possibility to call dev_alloc_name in advance remains.

This also fixes veth creation regresion caused by
84c49d8c3e

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-05 10:57:45 -07:00
David S. Miller 31e4543db2 ipv4: Make caller provide on-stack flow key to ip_route_output_ports().
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-03 20:25:42 -07:00
Eric Dumazet b71d1d426d inet: constify ip headers and in6_addr
Add const qualifiers to structs iphdr, ipv6hdr and in6_addr pointers
where possible, to make code intention more obvious.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-22 11:04:14 -07:00
David S. Miller 4c9483b2fb ipv6: Convert to use flowi6 where applicable.
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-12 15:08:54 -08:00
David S. Miller 1d28f42c1b net: Put flowi_* prefix on AF independent members of struct flowi
I intend to turn struct flowi into a union of AF specific flowi
structs.  There will be a common structure that each variant includes
first, much like struct sock_common.

This is the first step to move in that direction.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-12 15:08:44 -08:00
David S. Miller 78fbfd8a65 ipv4: Create and use route lookup helpers.
The idea here is this minimizes the number of places one has to edit
in order to make changes to how flows are defined and used.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-12 15:08:42 -08:00
David S. Miller 33175d84ee Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/bnx2x/bnx2x_cmn.c
2011-03-10 14:26:00 -08:00
stephen hemminger 6dfbd87a20 ip6ip6: autoload ip6 tunnel
Add necessary alias to autoload ip6ip6 tunnel module.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-10 14:18:48 -08:00
David S. Miller b23dd4fe42 ipv4: Make output route lookup return rtable directly.
Instead of on the stack.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-02 14:31:35 -08:00
David S. Miller 452edd598f xfrm: Return dst directly from xfrm_lookup()
Instead of on the stack.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-02 13:27:41 -08:00
David S. Miller fe6c791570 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/wireless/ath/ath9k/ar9003_eeprom.c
	net/llc/af_llc.c
2010-12-08 13:47:38 -08:00
Anders Franzen 381601e5bb Make the ip6_tunnel reflect the true mtu.
The ip6_tunnel always assumes it consumes 40 bytes (ip6 hdr) of the mtu of the
underlaying device. So for a normal ethernet bearer, the mtu of the ip6_tunnel is
1460.
However, when creating a tunnel the encap limit option is enabled by default, and it
consumes 8 bytes more, so the true mtu shall be 1452.

I dont really know if this breaks some statement in some RFC, so this is a request for
comments.

Signed-off-by: Anders Franzen <anders.franzen@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-01 10:55:47 -08:00
Shan Wei d3c15cab21 ipv6: kill two unused macro definition
1. IPV6_TLV_TEL_DST_SIZE
This has not been using for several years since created.

2. RT6_INFO_LEN
commit 33120b30 kill all RT6_INFO_LEN's references, but only this definition remained.

commit 33120b30cc
Author: Alexey Dobriyan <adobriyan@sw.ru>
Date:   Tue Nov 6 05:27:11 2007 -0800

    [IPV6]: Convert /proc/net/ipv6_route to seq_file interface

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-28 10:47:17 -08:00
Pavel Emelyanov 74b0b85b88 tunnels: Fix tunnels change rcu protection
After making rcu protection for tunnels (ipip, gre, sit and ip6) a bug
was introduced into the SIOCCHGTUNNEL code.

The tunnel is first unlinked, then addresses change, then it is linked
back probably into another bucket. But while changing the parms, the
hash table is unlocked to readers and they can lookup the improper tunnel.

Respective commits are b7285b79 (ipip: get rid of ipip_lock), 1507850b
(gre: get rid of ipgre_lock), 3a43be3c (sit: get rid of ipip6_lock) and
94767632 (ip6tnl: get rid of ip6_tnl_lock).

The quick fix is to wait for quiescent state to pass after unlinking,
but if it is inappropriate I can invent something better, just let me
know.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-27 14:20:08 -07:00
Anders Franzen 7e223de84b ip6_tunnel dont update the mtu on the route.
The ip6_tunnel device did not unset the flag,
IFF_XMIT_DST_RELEASE. This will make the dev layer
to release the dst before calling the tunnel.
The tunnel will not update any mtu/pmtu info, since
it does not have a dst on the skb.
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-24 15:23:36 -07:00
Eric Dumazet caf586e5f2 net: add a core netdev->rx_dropped counter
In various situations, a device provides a packet to our stack and we
drop it before it enters protocol stack :
- softnet backlog full (accounted in /proc/net/softnet_stat)
- bad vlan tag (not accounted)
- unknown/unregistered protocol (not accounted)

We can handle a per-device counter of such dropped frames at core level,
and automatically adds it to the device provided stats (rx_dropped), so
that standard tools can be used (ifconfig, ip link, cat /proc/net/dev)

This is a generalization of commit 8990f468a (net: rx_dropped
accounting), thus reverting it.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-05 14:47:55 -07:00
Eric Dumazet 8560f2266b ip6tnl: percpu stats accounting
Maintain per_cpu tx_bytes, tx_packets, rx_bytes, rx_packets.

Other seldom used fields are kept in netdev->stats structure, possibly
unsafe.

This is a preliminary work to support lockless transmit path, and
correct RX stats, that are already unsafe.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-29 13:25:52 -07:00
Eric Dumazet 8990f468ae net: rx_dropped accounting
Under load, netif_rx() can drop incoming packets but administrators dont
have a chance to spot which device needs some tuning (RPS activation for
example)

This patch adds rx_dropped accounting in vlans and tunnels.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-20 10:08:58 -07:00
Eric Dumazet 9476763262 ip6tnl: get rid of ip6_tnl_lock
As RTNL is held while doing tunnels inserts and deletes, we can remove
ip6_tnl_lock spinlock. My initial RCU conversion was conservative and
converted the rwlock to spinlock, with no RTNL requirement.

Use appropriate rcu annotations and modern lockdep checks as well.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-16 21:58:44 -07:00
Eric Dumazet 3ff2cfa55f ipv6: struct xfrm6_tunnel in read_mostly section
tunnel6_handlers chain being scanned for each incoming packet,
make sure it doesnt share an often dirtied cache line.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-30 13:50:46 -07:00
Changli Gao d8d1f30b95 net-next: remove useless union keyword
remove useless union keyword in rtable, rt6_info and dn_route.

Since there is only one member in a union, the union keyword isn't useful.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-10 23:31:35 -07:00
Eric Dumazet d19d56ddc8 net: Introduce skb_tunnel_rx() helper
skb rxhash should be cleared when a skb is handled by a tunnel before
being delivered again, so that correct packet steering can take place.

There are other cleanups and accounting that we can factorize in a new
helper, skb_tunnel_rx()

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 22:36:55 -07:00
Tejun Heo 5a0e3ad6af include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability.  As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and try to put the new include such that its order conforms
  to its surrounding.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others.  This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   wildly available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-30 22:02:32 +09:00
Alexey Dobriyan 3ffe533c87 ipv6: drop unused "dev" arg of icmpv6_send()
Dunno, what was the idea, it wasn't used for a long time.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-18 14:30:17 -08:00
Alexey Dobriyan d5aa407f59 tunnels: fix netns vs proto registration ordering
Same stuff as in ip_gre patch: receive hook can be called before netns
setup is done, oopsing in net_generic().

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-16 14:55:25 -08:00
Alexey Dobriyan 2c8c1e7297 net: spread __net_init, __net_exit
__net_init/__net_exit are apparently not going away, so use them
to full extent.

In some cases __net_init was removed, because it was called from
__net_exit code.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-01-17 19:16:02 -08:00
Eric W. Biederman ac31cd3cba net: Simplify ip6_tunnel pernet operations.
Take advantage of the new pernet automatic storage management,
and stop using compatibility network namespace functions.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-01 16:15:59 -08:00
Eric Dumazet f99189b186 netns: net_identifiers should be read_mostly
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-18 05:03:25 -08:00
Eric Dumazet f1a28eab20 ip6tnl: less dev_put() calls
Using dev_get_by_index_rcu() in ip6_tnl_rcv_ctl() & ip6_tnl_xmit_ctl()
avoids touching device refcount.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-02 03:42:40 -08:00
Eric Dumazet cf4432f550 ip6tnl: Optimize multiple unregistration
Speedup module unloading by factorizing synchronize_rcu() calls

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-29 01:13:48 -07:00
Eric Dumazet 2922bc8aed ip6tnl: convert hash tables locking to RCU
ip6_tunnels use one rwlock to protect their hash tables.

This locking scheme can be converted to RCU for free, since netdevice
already must wait for a RCU grace period at dismantle time.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-24 06:07:58 -07:00
Eric Dumazet a43912ab19 tunnel: eliminate recursion field
It seems recursion field from "struct ip_tunnel" is not anymore needed.
recursion prevention is done at the upper level (in dev_queue_xmit()),
since we use HARD_TX_LOCK protection for tunnels.

This avoids a cache line ping pong on "struct ip_tunnel" : This structure
should be now mostly read on xmit and receive paths.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-24 15:39:22 -07:00
Stephen Hemminger 6fef4c0c8e netdev: convert pseudo-devices to netdev_tx_t
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01 01:13:07 -07:00
Patrick McHardy 6ed106549d net: use NETDEV_TX_OK instead of 0 in ndo_start_xmit() functions
This patch is the result of an automatic spatch transformation to convert
all ndo_start_xmit() return values of 0 to NETDEV_TX_OK.

Some occurences are missed by the automatic conversion, those will be
handled in a seperate patch.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-05 19:16:04 -07:00
Brian Haley d5fdd6babc ipv6: Use correct data types for ICMPv6 type and code
Change all the code that deals directly with ICMPv6 type and code
values to use u8 instead of a signed int as that's the actual data
type.

Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-23 04:31:07 -07:00
Eric Dumazet adf30907d6 net: skb->dst accessors
Define three accessors to get/set dst attached to a skb

struct dst_entry *skb_dst(const struct sk_buff *skb)

void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)

void skb_dst_drop(struct sk_buff *skb)
This one should replace occurrences of :
dst_release(skb->dst)
skb->dst = NULL;

Delete skb->dst field

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-03 02:51:04 -07:00
Jiri Pirko 3a6d54c563 net: remove needless (now buggy) & from dev->dev_addr
Patch fixes issues with dev->dev_addr changing from array to pointer.
Hopefully there are no others.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-17 11:59:47 -07:00
Noriaki TAKAMIYA 20461c1740 IPv6: fix to set device name when new IPv6 over IPv6 tunnel device is created.
When the user creates IPv6 over IPv6 tunnel, the device name created
by the kernel isn't set to t->parm.name, which is referred as the
result of ioctl().

Signed-off-by: Noriaki TAKAMIYA <takamiya@po.ntts.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-09 15:01:19 -08:00
Alexey Dobriyan 52479b623d netns xfrm: lookup in netns
Pass netns to xfrm_lookup()/__xfrm_lookup(). For that pass netns
to flow_cache_lookup() and resolver callback.

Take it from socket or netdevice. Stub DECnet to init_net.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-25 17:35:18 -08:00
Alexey Dobriyan be77e59307 net: fix tunnels in netns after ndo_ changes
dev_net_set() should be the very first thing after alloc_netdev().

"ndo_" changes turned simple assignment (which is OK to do before netns
assignment) into quite non-trivial operation (which is not OK, init_net was
used). This leads to incomplete initialisation of tunnel device in netns.

BUG: unable to handle kernel NULL pointer dereference at 00000004
IP: [<c02efdb5>] ip6_tnl_exit_net+0x37/0x4f
*pde = 00000000 
Oops: 0000 [#1] PREEMPT DEBUG_PAGEALLOC
last sysfs file: /sys/class/net/lo/operstate

Pid: 10, comm: netns Not tainted (2.6.28-rc6 #1) 
EIP: 0060:[<c02efdb5>] EFLAGS: 00010246 CPU: 0
EIP is at ip6_tnl_exit_net+0x37/0x4f
EAX: 00000000 EBX: 00000020 ECX: 00000000 EDX: 00000003
ESI: c5caef30 EDI: c782bbe8 EBP: c7909f50 ESP: c7909f48
 DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
Process netns (pid: 10, ti=c7908000 task=c7905780 task.ti=c7908000)
Stack:
 c03e75e0 c7390bc8 c7909f60 c0245448 c7390bd8 c7390bf0 c7909fa8 c012577a
 00000000 00000002 00000000 c0125736 c782bbe8 c7909f90 c0308fe3 c782bc04
 c7390bd4 c0245406 c084b718 c04f0770 c03ad785 c782bbe8 c782bc04 c782bc0c
Call Trace:
 [<c0245448>] ? cleanup_net+0x42/0x82
 [<c012577a>] ? run_workqueue+0xd6/0x1ae
 [<c0125736>] ? run_workqueue+0x92/0x1ae
 [<c0308fe3>] ? schedule+0x275/0x285
 [<c0245406>] ? cleanup_net+0x0/0x82
 [<c0125ae1>] ? worker_thread+0x81/0x8d
 [<c0128344>] ? autoremove_wake_function+0x0/0x33
 [<c0125a60>] ? worker_thread+0x0/0x8d
 [<c012815c>] ? kthread+0x39/0x5e
 [<c0128123>] ? kthread+0x0/0x5e
 [<c0103b9f>] ? kernel_thread_helper+0x7/0x10
Code: db e8 05 ff ff ff 89 c6 e8 dc 04 f6 ff eb 08 8b 40 04 e8 38 89 f5 ff 8b 44 9e 04 85 c0 75 f0 43 83 fb 20 75 f2 8b 86 84 00 00 00 <8b> 40 04 e8 1c 89 f5 ff e8 98 04 f6 ff 89 f0 e8 f8 63 e6 ff 5b 
EIP: [<c02efdb5>] ip6_tnl_exit_net+0x37/0x4f SS:ESP 0068:c7909f48
---[ end trace 6c2f2328fccd3e0c ]---

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-23 17:26:26 -08:00
Qinghuang Feng cf005b1d0e net: remove redundant argument comments
Remove redundant argument comments in files of net/*

Signed-off-by: Qinghuang Feng <qhfeng.kernel@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-21 17:15:03 -08:00
Stephen Hemminger 1326c3d5a4 ipv6: convert tunnels to net_device_ops
Like IPV4, convert the tunnel virtual devices to use net_device_ops.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-20 20:33:56 -08:00
Arnaldo Carvalho de Melo 6067804047 net: Use hton[sl]() instead of __constant_hton[sl]() where applicable
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-20 22:20:49 -07:00
Adrian Bunk 0b04082995 net: remove CVS keywords
This patch removes CVS keywords that weren't updated for a long time
from comments.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-11 21:00:38 -07:00
Pavel Emelyanov 3dca02af38 ip6tnl: Use on-device stats instead of private ones.
This tunnel uses its own private structure and requires separate
patch to switch from private stats to on-device ones.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-21 14:17:05 -07:00
Pavel Emelyanov 554eb27782 [IP6TUNNEL]: Allow to create IP6 tunnels in net namespaces.
And no need in some IPPROTO_XXX enabling, since ipv6 code
doesn't have any filtering.

So, just set proper net and mark device with NETNS_LOCAL.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-16 01:24:13 -07:00
Pavel Emelyanov 2f7f54b725 [IP6TUNNEL]: Use proper net instead of init_net stubs.
All the ip_route_output_key(), dev_get_by_...() and ipv6_chk_addr()
calls are now stubbed with init_net.

Fortunately, all the places already have where to get the proper
net from.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-16 01:23:44 -07:00
Pavel Emelyanov 3e6c9fb5f5 [IP6TUNNEL]: Make tunnels hashes per-net.
Move hashes in the struct ip6_tnl_net, replace tnls_xxx[] with 
ip6n->tnlx_xxx[] and handle init and exit appropriately.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-16 01:23:22 -07:00
Pavel Emelyanov 15820e1290 [IP6TUNNEL]: Make the fallback tunnel device per-net.
All the code, that reference it already has the ip6_tnl_net pointer,
so s/ip6_fb_tnl_dev/ip6n->fb_tnl_dev/ and move creation/releasing
code into net init/exit ops.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-16 01:23:02 -07:00
Pavel Emelyanov 8704ca7e91 [IP6TUNNEL]: Use proper net in hash-lookup functions.
Calls to ip6_tnl_lookup were stubbed with init_net - give them
a proper one.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-16 01:22:43 -07:00
Pavel Emelyanov 2dd02c897d [IP6TUNNEL]: Add (ip6_tnl_)net argument to some calls.
Hashes and fallback device used in them will be per-net.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-16 01:22:23 -07:00
Pavel Emelyanov 13eeb8e92c [IP6TUNNEL]: Introduce empty ip6_tnl_net structure and net ops.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-16 01:22:02 -07:00
Harvey Harrison 0dc47877a3 net: replace remaining __FUNCTION__ occurrences
__FUNCTION__ is gcc-specific, use __func__

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-05 20:47:47 -08:00
Daniel Lezcano 4591db4f37 [NETNS][IPV6] route6 - add netns parameter to ip6_route_output
Add an netns parameter to ip6_route_output. That will allow to access
to the right routing table for outgoing traffic.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-05 10:48:10 -08:00
Daniel Lezcano 606a2b4862 [NETNS][IPV6] route6 - Pass the network namespace parameter to rt6_lookup
Add a network namespace parameter to rt6_lookup().

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-04 13:45:59 -08:00
Pavel Emelyanov b37d428b24 [INET]: Don't create tunnels with '%' in name.
Four tunnel drivers (ip_gre, ipip, ip6_tunnel and sit) can receive a
pre-defined name for a device from the userspace.  Since these drivers
call the register_netdevice() (rtnl_lock, is held), which does _not_
generate the device's name, this name may contain a '%' character.

Not sure how bad is this to have a device with a '%' in its name, but
all the other places either use the register_netdev(), which call the
dev_alloc_name(), or explicitly call the dev_alloc_name() before
registering, i.e. do not allow for such names.

This had to be prior to the commit 34cc7b, but I forgot to number the
patches and this one got lost, sorry.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-26 23:51:04 -08:00
Pavel Emelyanov 34cc7ba639 [IP_TUNNEL]: Don't limit the number of tunnels with generic name explicitly.
Use the added dev_alloc_name() call to create tunnel device name,
rather than iterate in a hand-made loop with an artificial limit.

Thanks Patrick for noticing this.

[ The way this works is, when the device is actually registered,
  the generic code noticed the '%' in the name and invokes
  dev_alloc_name() to fully resolve the name.  -DaveM ]

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-23 20:19:20 -08:00
Denis V. Lunev 9937ded8e4 [IPV6]: dst_entry leak in ip4ip6_err. (resend)
The result of the ip_route_output is not assigned to skb. This means that
- it is leaked
- possible OOPS below dereferrencing skb->dst
- no ICMP message for this case

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-18 20:49:36 -08:00
Denis V. Lunev f206351a50 [NETNS]: Add namespace parameter to ip_route_output_key.
Needed to propagate it down to the ip_route_output_flow.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:07 -08:00
Daniel Lezcano bfeade0870 [NETNS][IPV6]: inet6_addr - check ipv6 address per namespace
When a new address is added, we must check if the new address does not
already exists.  This patch makes this check to be aware of a network
namespace, so the check will look if the address already exists for
the specified network namespace. While the addresses are browsed, the
addresses which do not belong to the namespace are discarded.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:44 -08:00
Herbert Xu ef76bc23ef [IPV6]: Add ip6_local_out
Most callers of the LOCAL_OUT chain will set the IP packet length
before doing so.  They also share the same output function dst_output.

This patch creates a new function called ip6_local_out which does all
of that and converts the appropriate users over to it.

Apart from removing duplicate code, it will also help in merging the
IPsec output path.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:47 -08:00
Herbert Xu 29bb43b4ec [INET]: Give outer DSCP directly to ip*_copy_dscp
This patch changes the prototype of ipv4_copy_dscp and ipv6_copy_dscp so
that they directly take the outer DSCP rather than the outer IP header.
This will help us to unify the code for inter-family tunnels.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:45 -08:00
Chuck Lever c2636b4d9e [NET]: Treat the sign of the result of skb_headroom() consistently
In some places, the result of skb_headroom() is compared to an unsigned
integer, and in others, the result is compared to a signed integer.  Make
the comparisons consistent and correct.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-23 21:27:55 -07:00
Ralf Baechle 10d024c1b2 [NET]: Nuke SET_MODULE_OWNER macro.
It's been a useless no-op for long enough in 2.6 so I figured it's time to
remove it.  The number of people that could object because they're
maintaining unified 2.4 and 2.6 drivers is probably rather small.

[ Handled drivers added by netdev tree and some missed IRDA cases... -DaveM ]

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:51:13 -07:00
Eric W. Biederman 881d966b48 [NET]: Make the device list and device lookups per namespace.
This patch makes most of the generic device layer network
namespace safe.  This patch makes dev_base_head a
network namespace variable, and then it picks up
a few associated variables.  The functions:
dev_getbyhwaddr
dev_getfirsthwbytype
dev_get_by_flags
dev_get_by_name
__dev_get_by_name
dev_get_by_index
__dev_get_by_index
dev_ioctl
dev_ethtool
dev_load
wireless_process_ioctl

were modified to take a network namespace argument, and
deal with it.

vlan_ioctl_set and brioctl_set were modified so their
hooks will receive a network namespace argument.

So basically anthing in the core of the network stack that was
affected to by the change of dev_base was modified to handle
multiple network namespaces.  The rest of the network stack was
simply modified to explicitly use &init_net the initial network
namespace.  This can be fixed when those components of the network
stack are modified to handle multiple network namespaces.

For now the ifindex generator is left global.

Fundametally ifindex numbers are per namespace, or else
we will have corner case problems with migration when
we get that far.

At the same time there are assumptions in the network stack
that the ifindex of a network device won't change.  Making
the ifindex number global seems a good compromise until
the network stack can cope with ifindex changes when
you change namespaces, and the like.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:49:10 -07:00
Al Viro 704eae1f32 ip6_tunnel - endianness annotations
Convert rel_info to host-endian before calling ip6_tnl_err().
The things become much more straightforward that way.
The key observation (and the reason why that code actually
worked) is that after ip6_tnl_err() we either immediately
bailed out or had rel_info set to 0 or had it set to host-endian
and guaranteed to hit
(rel_type == ICMP_DEST_UNREACH && rel_code == ICMP_FRAG_NEEDED)
case.  So inconsistent endianness didn't really lead to bugs,
but it had been subtle and prone to breakage.  New variant is
saner and obviously safe.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-26 11:11:56 -07:00