Commit Graph

1260 Commits

Author SHA1 Message Date
Ilpo Järvinen 50eb431d6e [IPV6] route: kill some bloat
net/ipv6/route.c:
  ip6_pkt_prohibit_out | -130
  ip6_pkt_discard      | -261
  ip6_pkt_discard_out  | -130
  ip6_pkt_prohibit     | -261
 4 functions changed, 782 bytes removed, diff: -782

net/ipv6/route.c:
  ip6_pkt_drop | +300
 1 function changed, 300 bytes added, diff: +300

net/ipv6/route.o:
 5 functions changed, 300 bytes added, 782 bytes removed, diff: -482

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:49 -08:00
Daniel Lezcano 389f661224 [NETNS][IPV6]: inet6_addr - make ipv6_chk_home_addr namespace aware
Looks if the address is belonging to the network namespace, otherwise
discard the address for the check.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:46 -08:00
Daniel Lezcano 1cab3da6be [NETNS][IPV6]: inet6_addr - ipv6_get_ifaddr namespace aware
The inet6_addr_lst is browsed taking into account the network
namespace specified as parameter. If an address does not belong
to the specified namespace, it is ignored.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:45 -08:00
Daniel Lezcano 06bfe655e7 [NETNS][IPV6]: inet6_addr - ipv6_chk_same_addr namespace aware
This patch makes ipv6_chk_same_addr function to be aware of the
network namespace. The addresses not belonging to the network
namespace are discarded.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:45 -08:00
Daniel Lezcano bfeade0870 [NETNS][IPV6]: inet6_addr - check ipv6 address per namespace
When a new address is added, we must check if the new address does not
already exists.  This patch makes this check to be aware of a network
namespace, so the check will look if the address already exists for
the specified network namespace. While the addresses are browsed, the
addresses which do not belong to the namespace are discarded.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:44 -08:00
Daniel Lezcano 3c40090a0f [NETNS][IPV6]: inet6_addr - isolate inet6 addresses from proc file
Make /proc/net/if_inet6 show only inet6 addresses belonging to the
namespace.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:43 -08:00
Pavel Emelyanov e186932b3d [NETNS]: Use the per-net ipv6_devconf(_all) in sysctl handlers
Actually the net->ipv6.devconf_all can be used in a few places,
but to keep the /proc/sys/net/ipv6/conf/ sysctls work consistently
in the namespace we should use the per-net devconf_all in the
sysctl "forwarding" handler.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:41 -08:00
Pavel Emelyanov 441fc2a239 [NETNS]: Use the per-net ipv6_devconf_dflt
All its users are in net/ipv6/addrconf.c's sysctl handlers.
Since they already have the struct net to get from, the
per-net ipv6_devconf_dflt can already be used.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:40 -08:00
Pavel Emelyanov e0da5a480c [NETNS]: Create ipv6 devconf-s for namespaces
This is the core. Declare and register the pernet subsys for
addrconf. The init callback the will create the devconf-s.

The init_net will reuse the existing statically declared confs,
so that accessing them from inside the ipv6 code will still
work.

The register_pernet_subsys() is moved above the ipv6_add_dev()
call for loopback, because this function will need the
net->devconf_dflt pointer to be already set.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:40 -08:00
Pavel Emelyanov bff16c2f99 [NETNS]: Make the ctl-tables per-namespace
This includes passing the net to __addrconf_sysctl_register
and saving this on the ctl_table->extra2 to be used in
handlers (those, needing it).

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:39 -08:00
Pavel Emelyanov 9589731220 [NETNS]: Make the __addrconf_sysctl_register return an error
This error code will be needed to abort the namespace
creation if needed.

Probably, this is to be checked when a new device is
created (currently it is ignored).

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:38 -08:00
Pavel Emelyanov 408c4768cd [NETNS]: Clean out the ipv6-related sysctls creation/destruction
The addrconf sysctls and neigh sysctls are registered and
unregistered always in pairs, so they can be joined into
one (well, two) functions, that accept the struct inet6_dev
and do all the job.

This also get rids of unneeded ifdefs inside the code.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:38 -08:00
Denis V. Lunev 4d1169c1e7 [NETNS]: Add netns to nl_info structure.
nl_info is used to track the end-user destination of routing change
notification. This is a natural object to hold a namespace on. Place
it there and utilize the context in the appropriate places.

Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:29 -08:00
Eric W. Biederman 6b175b26c1 [NETNS]: Add netns parameter to inet_(dev_)add_type.
The patch extends the inet_addr_type and inet_dev_addr_type with the
network namespace pointer. That allows to access the different tables
relatively to the network namespace.

The modification of the signature function is reported in all the
callers of the inet_addr_type using the pointer to the well known
init_net.

Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:27 -08:00
Denis V. Lunev 868d13ac81 [NETNS]: Pass fib_rules_ops into default_pref method.
fib_rules_ops contains operations and the list of configured rules. ops will
become per/namespace soon, so we need them to be known in the default_pref
callback.

Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:22 -08:00
Denis V. Lunev f8c26b8d58 [NETNS]: Add netns parameter to fib_rules_(un)register.
The patch extends the different fib rules API in order to pass the
network namespace pointer. That will allow to access the different
tables from a namespace relative object. As usual, the pointer to the
init_net variable is passed as parameter so we don't break the
network.

Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:21 -08:00
Daniel Lezcano 41a76906b3 [NETNS][IPV6]: Make icmpv6_time sysctl per namespace.
This patch moves the icmpv6_time sysctl to the network namespace
structure.

Because the ipv6 protocol is not yet per namespace, the variable is
accessed relatively to the initial network namespace.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:20 -08:00
Daniel Lezcano 4990509f19 [NETNS][IPV6]: Make sysctls route per namespace.
All the sysctl concerning the routes are moved to the network
namespace structure. A helper function is called to initialize the
variables.

Because the ipv6 protocol is not yet per namespace, the variables are
accessed relatively from the network namespace.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:20 -08:00
Daniel Lezcano 7c76509d0d [NETNS][IPV6]: Make mld_max_msf readonly in other namespaces.
The mld_max_msf protects the system with a maximum allowed multicast
source filters. Making this variable per namespace can be potentially
an problem if someone inside a namespace set it to a big value, that
will impact the whole system including other namespaces.

I don't see any benefits to have it per namespace for now, so in order
to keep a directory entry in a newly created namespace, I make it
read-only when we are not in the initial network namespace.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:19 -08:00
Daniel Lezcano e71e0349eb [NETNS][IPV6]: Make ip6_frags per namespace.
The ip6_frags is moved to the network namespace structure.  Because
there can be multiple instances of the network namespaces, and the
ip6_frags is no longer a global static variable, a helper function has
been added to facilitate the initialization of the variables.

Until the ipv6 protocol is not per namespace, the variables are
accessed relatively from the initial network namespace.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:18 -08:00
Daniel Lezcano 99bc9c4e45 [NETNS][IPV6]: Make bindv6only sysctl per namespace.
This patch moves the bindv6only sysctl to the network namespace
structure. Until the ipv6 protocol is not per namespace, the sysctl
variable is always from the initial network namespace.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:18 -08:00
Daniel Lezcano 760f2d0186 [NETNS][IPV6]: Make multiple instance of sysctl tables.
Each network namespace wants its own set of sysctl value, eg. we
should not be able from a namespace to set a sysctl value for another
namespace , especially for the initial network namespace.

This patch duplicates the sysctl table when we register a new network
namespace for ipv6. The duplicated table are postfixed with the
"template" word to notify the developper the table is cloned.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:17 -08:00
Daniel Lezcano 89918fc270 [NETNS][IPV6]: Make the ipv6 sysctl to be a netns subsystem.
The initialization of the sysctl for the ipv6 protocol is changed to a
network namespace subsystem. That means when a new network namespace
is created the initialization function for the sysctl will be called.

That do not change the behavior of the sysctl in case of the kernel
with the network namespace disabled.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:16 -08:00
Daniel Lezcano 81c1c17804 [NETNS][IPV6]: Make a subsystem for af_inet6.
This patch add a network namespace subsystem for the af_inet6 module.
It does nothing right now, but one of its purpose is to receive the
different variables for sysctl in order to initialize them.

When the sysctl variable will be moved to the network namespace
structure, they will be no longer initialized as global static
variables, so we must find a place to initialize them. Because the
sysctl can be disabled, it has no sense to store them in the
sysctl_net_ipv6 file.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:15 -08:00
Daniel Lezcano 291480c09a [NETNS][IPV6]: Make ipv6_sysctl_register to return a value.
This patch makes the function ipv6_sysctl_register to return a
value. The af_inet6 init function is now able to handle an error and
catch it from the initialization of the sysctl.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:14 -08:00
Pavel Emelyanov 3d7cc2ba62 [NETFILTER]: Switch to using ctl_paths in nf_queue and conntrack modules
This includes the most simple cases for netfilter.

The first part is tne queue modules for ipv4 and ipv6,
on which the net/ipv4/ and net/ipv6/ paths are reused
from the appropriate ipv4 and ipv6 code.

The conntrack module is also patched, but this hunk is
very small and simple.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:10 -08:00
Eric Dumazet 65f7651788 [NET]: prot_inuse cleanups and optimizations
1) Cleanups (all functions are prefixed by sock_prot_inuse)

sock_prot_inc_use(prot) -> sock_prot_inuse_add(prot,-1)
sock_prot_dec_use(prot) -> sock_prot_inuse_add(prot,-1)
sock_prot_inuse()       -> sock_prot_inuse_get()

New functions :

sock_prot_inuse_init() and sock_prot_inuse_free() to abstract pcounter use.

2) if CONFIG_PROC_FS=n, we can zap 'inuse' member from "struct proto",
since nobody wants to read the inuse value.

This saves 1372 bytes on i386/SMP and some cpu cycles.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:36 -08:00
Eric Dumazet 9a429c4983 [NET]: Add some acquires/releases sparse annotations.
Add __acquires() and __releases() annotations to suppress some sparse
warnings.

example of warnings :

net/ipv4/udp.c:1555:14: warning: context imbalance in 'udp_seq_start' - wrong
count at exit
net/ipv4/udp.c:1571:13: warning: context imbalance in 'udp_seq_stop' -
unexpected unlock

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:31 -08:00
Hideo Aoki 95766fff6b [UDP]: Add memory accounting.
Signed-off-by: Takahiro Yasui <tyasui@redhat.com>
Signed-off-by: Hideo Aoki <haoki@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:19 -08:00
Gui Jianfeng a06b494b61 [IPV6]: Remove useless code from fib6_del_route().
There are useless codes in fib6_del_route(). The following patch has
been tested, every thing looks fine, as usual.

Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:17 -08:00
Herbert Xu 9dd3245a2a [IPSEC]: Move all calls to xfrm_audit_state_icvfail to xfrm_input
Let's nip the code duplication in the bud :)

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:10 -08:00
Herbert Xu 0883ae0e55 [IPSEC]: Fix transport-mode async resume on intput without netfilter
When netfilter is off the transport-mode async resumption doesn't work
because we don't push back the IP header.  This patch fixes that by
moving most of the code outside of ifdef NETFILTER since the only part
that's not common is the short-circuit in the protocol handler.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:10 -08:00
Eric W. Biederman 426b5303eb [NETNS]: Modify the neighbour table code so it handles multiple network namespaces
I'm actually surprised at how much was involved.  At first glance it
appears that the neighbour table data structures are already split by
network device so all that should be needed is to modify the user
interface commands to filter the set of neighbours by the network
namespace of their devices.

However a couple things turned up while I was reading through the
code.  The proxy neighbour table allows entries with no network
device, and the neighbour parms are per network device (except for the
defaults) so they now need a per network namespace default.

So I updated the two structures (which surprised me) with their very
own network namespace parameter.  Updated the relevant lookup and
destroy routines with a network namespace parameter and modified the
code that interacts with users to filter out neighbour table entries
for devices of other namespaces.

I'm a little concerned that we can modify and display the global table
configuration and from all network namespaces.  But this appears good
enough for now.

I keep thinking modifying the neighbour table to have per network
namespace instances of each table type would should be cleaner.  The
hash table is already dynamically sized so there are it is not a
limiter.  The default parameter would be straight forward to take care
of.  However when I look at the how the network table is built and
used I still find some assumptions that there is only a single
neighbour table for each type of table in the kernel.  The netlink
operations, neigh_seq_start, the non-core network users that call
neigh_lookup.  So while it might be doable it would require more
refactoring than my current approach of just doing a little extra
filtering in the code.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:03 -08:00
Paul Moore afeb14b490 [XFRM]: RFC4303 compliant auditing
This patch adds a number of new IPsec audit events to meet the auditing
requirements of RFC4303.  This includes audit hooks for the following events:

 * Could not find a valid SA [sections 2.1, 3.4.2]
   . xfrm_audit_state_notfound()
   . xfrm_audit_state_notfound_simple()

 * Sequence number overflow [section 3.3.3]
   . xfrm_audit_state_replay_overflow()

 * Replayed packet [section 3.4.3]
   . xfrm_audit_state_replay()

 * Integrity check failure [sections 3.4.4.1, 3.4.4.2]
   . xfrm_audit_state_icvfail()

While RFC4304 deals only with ESP most of the changes in this patch apply to
IPsec in general, i.e. both AH and ESP.  The one case, integrity check
failure, where ESP specific code had to be modified the same was done to the
AH code for the sake of consistency.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:01 -08:00
YOSHIFUJI Hideaki 9cb5734e5b [TCP]: Convert several length variable to unsigned.
Several length variables cannot be negative, so convert int to
unsigned int.  This also allows us to do sane shift operations
on those variables.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:56 -08:00
Masahide NAKAMURA 0aa647746e [XFRM]: Support to increment packet dropping statistics.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:39 -08:00
Masahide NAKAMURA 9473e1f631 [XFRM] MIPv6: Fix to input RO state correctly.
Disable spin_lock during xfrm_type.input() function.
Follow design as IPsec inbound does.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:37 -08:00
Masahide NAKAMURA a1b051405b [XFRM] IPv6: Fix dst/routing check at transformation.
IPv6 specific thing is wrongly removed from transformation at net-2.6.25.
This patch recovers it with current design.

o Update "path" of xfrm_dst since IPv6 transformation should
  care about routing changes. It is required by MIPv6 and
  off-link destined IPsec.
o Rename nfheader_len which is for non-fragment transformation used by
  MIPv6 to rt6i_nfheader_len as IPv6 name space.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:36 -08:00
Herbert Xu 195ad6a3ac [IPSEC]: Rename tunnel-mode functions to avoid collisions with tunnels
It appears that I've managed to create two different functions both
called xfrm6_tunnel_output.  This is because we have the plain tunnel
encapsulation named xfrmX_tunnel as well as the tunnel-mode encapsulation
which lives in the files xfrmX_mode_tunnel.c.

This patch renames functions from the latter to use the xfrmX_mode_tunnel
prefix to avoid name-space conflicts.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:18 -08:00
Patrick McHardy 33b8e77605 [NETFILTER]: Add CONFIG_NETFILTER_ADVANCED option
The NETFILTER_ADVANCED option hides lots of the rather obscure netfilter
options when disabled and provides defaults (M) that should allow to
run a distribution firewall without further thinking.

Defaults to 'y' to avoid breaking current configurations.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:12 -08:00
Jan Engelhardt e79ec50b95 [NETFILTER]: Parenthesize macro parameters
Parenthesize macro parameters.

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:08 -08:00
Patrick McHardy 1e796fda00 [NETFILTER]: constify nf_afinfo
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:05 -08:00
Patrick McHardy 7b2f9631e7 [NETFILTER]: nf_log: constify struct nf_logger and nf_log_packet loginfo arg
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:59 -08:00
Patrick McHardy f01ffbd6e7 [NETFILTER]: nf_log: move logging stuff to seperate header
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:58 -08:00
Patrick McHardy 77236b6e33 [NETFILTER]: ctnetlink: use netlink attribute helpers
Use NLA_PUT_BE32, nla_get_be32() etc.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:54 -08:00
Patrick McHardy da4d0f6b3d [NETFILTER]: ip6_tables: use raw_smp_processor_id() in do_add_counters()
Use raw_smp_processor_id() in do_add_counters() as in ip_tables.c.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:41 -08:00
Patrick McHardy b5dd674b2a [NETFILTER]: ip6_tables: fix stack leagage
Fix leakage of local variable on stack. This already got fixed in
ip_tables silently by the compat patches.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:40 -08:00
Patrick McHardy c9d8fe1317 [NETFILTER]: {ip,ip6}_tables: fix format strings
Use %zu for sizeof() and remove casts.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:39 -08:00
Patrick McHardy 9c54795950 [NETFILTER]: {ip,ip6}_tables: reformat to eliminate differences
Reformat ip_tables.c and ip6_tables.c in order to eliminate non-functional
differences and minimize diff output.

This allows to get a view of the real differences using:

sed -e 's/IP6T/IPT/g' \
    -e 's/IP6/IP/g' \
    -e 's/INET6/INET/g' \
    -e 's/ip6t/ipt/g' \
    -e 's/ip6/ip/g' \
    -e 's/ipv6/ip/g' \
    -e 's/icmp6/icmp/g' \
    net/ipv6/netfilter/ip6_tables.c | \
    diff -wup /dev/stdin net/ipv4/netfilter/ip_tables.c

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:39 -08:00
Patrick McHardy 3bc3fe5eed [NETFILTER]: ip6_tables: add compat support
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:36 -08:00
Patrick McHardy d924357c50 [NETFILTER]: ip6_tables: resync get_entries() with ip_tables
Resync get_entries() with ip_tables.c by moving the checks from the
setsockopt handler to the function itself.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:35 -08:00
Patrick McHardy 433665c9d1 [NETFILTER]: ip6_tables: move IP6T_SO_GET_INFO handling to seperate function
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:35 -08:00
Patrick McHardy ed1a6f5e77 [NETFILTER]: ip6_tables: move counter allocation to seperate function
More resyncing with ip_tables.c as preparation for compat support.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:34 -08:00
Patrick McHardy 3b84e92b0d [NETFILTER]: ip6_tables: use vmalloc_node()
Consistently use vmalloc_node for all counter allocations.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:33 -08:00
Patrick McHardy f173c8a1f2 [NETFILTER]: ip6_tables: move entry, match and target checks to seperate functions
Resync with ip_tables.c as preparation for compat support.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:32 -08:00
Patrick McHardy 72f36ec14f [NETFILTER]: ip6_tables: kill a few useless defines/forward declarations
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:32 -08:00
Herbert Xu 9055e051b8 [UDP]: Move udp_stats_in6 into net/ipv4/udp.c
Now that external users may increment the counters directly, we need
to ensure that udp_stats_in6 is always available.  Otherwise we'd
either have to requrie the external users to be built as modules or
ipv6 to be built-in.

This isn't too bad because udp_stats_in6 is just a pair of pointers
plus an EXPORT, e.g., just 40 (16 + 24) bytes on x86-64.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:06 -08:00
Michal Schmidt 8a4a50f98b [IPV6] sit: Rebinding of SIT tunnels to other interfaces
This is similar to the change already done for IPIP tunnels.

Once created, a SIT tunnel can't be bound to another device.
To reproduce:

# create a tunnel:
ip tunnel add tunneltest0 mode sit remote 10.0.0.1 dev eth0
# try to change the bounding device from eth0 to eth1:
ip tunnel change tunneltest0 dev eth1
# show the result:
ip tunnel show tunneltest0

tunneltest0: ipv6/ip  remote 10.0.0.1  local any  dev eth0  ttl inherit

Notice the bound device has not changed from eth0 to eth1.

This patch fixes it. When changing the binding, it also recalculates the
MTU according to the new bound device's MTU.

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:56 -08:00
Denis V. Lunev 528c4ceb42 [IPV6]: Always pass a valid nl_info to inet6_rt_notify.
This makes the code in the inet6_rt_notify more straightforward and provides
groud for namespace passing.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:55 -08:00
Daniel Lezcano 09f7709f49 [IPV6]: fix section mismatch warnings
Removed useless and buggy __exit section in the different
ipv6 subsystems. Otherwise they will be called inside an
init section during rollbacking in case of an error in the
protocol initialization.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:46 -08:00
Herbert Xu aebcf82c1f [IPSEC]: Do not let packets pass when ICMP flag is off
This fixes a logical error in ICMP policy checks which lets
packets through if the state ICMP flag is off.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:43 -08:00
Herbert Xu bb72845e69 [IPSEC]: Make callers of xfrm_lookup to use XFRM_LOOKUP_WAIT
This patch converts all callers of xfrm_lookup that used an
explicit value of 1 to indiciate blocking to use the new flag
XFRM_LOOKUP_WAIT.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:42 -08:00
Herbert Xu 7233b9f33e [IPSEC]: Fix reversed ICMP6 policy check
The policy check I added for ICMP on IPv6 is reversed.  This
patch fixes that.

It also adds an skb->sp check so that unprotected packets that
fail the policy check do not crash the machine.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:41 -08:00
Herbert Xu 8b7817f3a9 [IPSEC]: Add ICMP host relookup support
RFC 4301 requires us to relookup ICMP traffic that does not match any
policies using the reverse of its payload.  This patch implements this
for ICMP traffic that originates from or terminates on localhost.

This is activated on outbound with the new policy flag XFRM_POLICY_ICMP,
and on inbound by the new state flag XFRM_STATE_ICMP.

On inbound the policy check is now performed by the ICMP protocol so
that it can repeat the policy check where necessary.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:23 -08:00
Herbert Xu d5422efe68 [IPSEC]: Added xfrm_decode_session_reverse and xfrmX_policy_check_reverse
RFC 4301 requires us to relookup ICMP traffic that does not match any
policies using the reverse of its payload.  This patch adds the functions
xfrm_decode_session_reverse and xfrmX_policy_check_reverse so we can get
the reverse flow to perform such a lookup.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:22 -08:00
Daniel Lezcano 7f4e4868f3 [IPV6]: make the protocol initialization to return an error code
This patchset makes the different protocols to return an error code, so
the af_inet6 module can check the initialization was correct or not.

The raw6 was taken into account to be consistent with the rest of the
protocols, but the registration is at the same place.
Because the raw6 has its own init function, the proto and the ops structure
can be moved inside the raw6.c file.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:13 -08:00
Daniel Lezcano 87c3efbfdd [IPV6]: make inet6_register_protosw to return an error code
This patch makes the inet6_register_protosw to return an error code.
The different protocols can be aware the registration was successful or
not and can pass the error to the initial caller, af_inet6.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:12 -08:00
Daniel Lezcano 853cbbaaa4 [IPV6]: make frag to return an error at initialization
This patch makes the frag_init to return an error code, so the af_inet6
module can handle the error.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:11 -08:00
Daniel Lezcano 248b238dc9 [IPV6]: make extended headers to return an error at initialization
This patch factorize the code for the differents init functions for rthdr,
nodata, destopt in a single function exthdrs_init.
This function returns an error so the af_inet6 module can check correctly
the initialization.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:10 -08:00
Daniel Lezcano 0a3e78ac2c [IPV6]: make flowlabel to return an error
This patch makes the flowlab subsystem to return an error code and makes
some cleanup with procfs ifdefs.
The af_inet6 will use the flowlabel init return code to check the initialization
was correct.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:10 -08:00
YOSHIFUJI Hideaki c69bce20dd [NET]: Remove unused "mibalign" argument for snmp_mib_init().
With fixes from Arnaldo Carvalho de Melo.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:02 -08:00
Daniel Lezcano 7e5449c215 [IPV6]: route6 remove ifdef for fib_rules
The patch defines the usual static inline functions when the code is
disabled for fib6_rules. That's allow to remove some ifdef in route.c
file and make the code a little more clear.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:59 -08:00
Daniel Lezcano c35b7e72cd [IPV6]: remove ifdef in route6 for xfrm6
The following patch create the usual static inline functions to disable
the xfrm6_init and xfrm6_fini function when XFRM is off.
That's allow to remove some ifdef and make the code a little more clear.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:59 -08:00
Daniel Lezcano 75314fb383 [IPV6]: create route6 proc init-fini functions
Make the proc creation/destruction to be a separate function. That
allows to remove the #ifdef CONFIG_PROC_FS in the init/fini function
and make them more readable.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:58 -08:00
Daniel Lezcano f845ab6b7d [IPV6] route6/fib6: Don't panic a kmem_cache_create.
If the kmem_cache_creation fails, the kernel will panic. It is
acceptable if the system is booting, but if the ipv6 protocol is
compiled as a module and it is loaded after the system has booted, do
we want to panic instead of just failing to initialize the protocol ?

The init function is now returning an error and this one is checked
for protocol initialization. So the ipv6 protocol will safely fails.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:48 -08:00
Daniel Lezcano e2fddf5e96 [IPV6]: Make af_inet6 to check ip6_route_init return value.
The af_inet6 initialization function does not check the return code of
the route initilization, so if something goes wrong, the protocol
initialization will continue anyway.  This patch takes into account
the modification made in the different route's initialization
subroutines to check the return value and to make the protocol
initialization to fail.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:47 -08:00
Daniel Lezcano 433d49c3bb [IPV6]: Make ip6_route_init to return an error code.
The route initialization function does not return any value to notify
if the initialization is successful or not. This patch checks all
calls made for the initilization in order to return a value for the
caller.

Unfortunately, proc_net_fops_create will return a NULL pointer if
CONFIG_PROC_FS is off, so we can not check the return code without an
ifdef CONFIG_PROC_FS block in the ip6_route_init function.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:47 -08:00
Daniel Lezcano 9eb87f3f7e [IPV6]: Make fib6_rules_init to return an error code.
When the fib_rules initialization finished, no return code is provided
so there is no way to know, for the caller, if the initialization has
been successful or has failed. This patch fix that.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:46 -08:00
Daniel Lezcano 0013cabab3 [IPV6]: Make xfrm6_init to return an error code.
The xfrm initialization function does not return any error code, so if
there is an error, the caller can not be advise of that.  This patch
checks the return code of the different called functions in order to
return a successful or failed initialization.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:45 -08:00
Daniel Lezcano d63bddbe90 [IPV6]: Make fib6_init to return an error code.
If there is an error in the initialization function, nothing is
followed up to the caller. So I add a return value to be set for the
init function.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:45 -08:00
Denis V. Lunev 5a3e55d68e [NET]: Multiple namespaces in the all dst_ifdown routines.
Move dst entries to a namespace loopback to catch refcounting leaks.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:44 -08:00
Herbert Xu a59322be07 [UDP]: Only increment counter on first peek/recv
The previous move of the the UDP inDatagrams counter caused each
peek of the same packet to be counted separately.  This may be
undesirable.

This patch fixes this by adding a bit to sk_buff to record whether
this packet has already been seen through skb_recv_datagram.  We
then only increment the counter when the packet is seen for the
first time.

The only dodgy part is the fact that skb_recv_datagram doesn't have
a good way of returning this new bit of information.  So I've added
a new function __skb_recv_datagram that does return this and made
skb_recv_datagram a wrapper around it.

The plan is to eventually replace all uses of skb_recv_datagram with
this new function at which time it can be renamed its proper name.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:34 -08:00
Herbert Xu 1781f7f580 [UDP]: Restore missing inDatagrams increments
The previous move of the the UDP inDatagrams counter caused the
counting of encapsulated packets, SUNRPC data (as opposed to call)
packets and RXRPC packets to go missing.

This patch restores all of these.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:33 -08:00
Herbert Xu 27ab256864 [UDP]: Avoid repeated counting of checksum errors due to peeking
Currently it is possible for two processes to peek on the same socket
and end up incrementing the error counter twice for the same packet.

This patch fixes it by making skb_kill_datagram return whether it
succeeded in unlinking the packet and only incrementing the counter
if it did.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:32 -08:00
Pavel Emelyanov c8fecf2242 [IPV6]: Eliminate difference in actions of sysctl and proc handler for conf.all.forwarding
The only difference in this case is that updating all.forwarding
causes the update in default.forwarding when done via proc, but
not via the system call.

Besides, this consolidates a good portion of code.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:32 -08:00
Pavel Emelyanov 4d43b78ac2 [IPV6]: Use sysctl paths to register ipv6 sysctl tables
I have already done this for core, ipv4 and tr tables, so repeat this
for the ipv6 ones.

This makes the ipv6.ko smaller and creates the ground needed for net
namespaces support in ipv6.ko ssctls.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:30 -08:00
Pavel Emelyanov 4a61b586cd [IPV6]: Make the ipv6/sysctl_net_ipv6.c compilation cleaner
Since this file is entirely enclosed with the
#ifdef CONFIG_SYSCTL/#endif pair, it's OK to move this
CONFIG_ into a Makefile.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:29 -08:00
Patrick McHardy 4b3d15ef4a [NETFILTER]: {nfnetlink,ip,ip6}_queue: kill issue_verdict
Now that issue_verdict doesn't need to free the queue entries anymore,
all it does is disable local BHs and call nf_reinject. Move the BH
disabling to the okfn invocation in nf_reinject and kill the
issue_verdict functions.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:15 -08:00
Patrick McHardy 02f014d888 [NETFILTER]: nf_queue: move list_head/skb/id to struct nf_info
Move common fields for queue management to struct nf_info and rename it
to struct nf_queue_entry. The avoids one allocation/free per packet and
simplifies the code a bit.

Alternatively we could add some private room at the tail, but since
all current users use identical structs this seems easier.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:14 -08:00
Patrick McHardy 7a6c6653b3 [NETFILTER]: ip6_queue: resync dev-index based flushing
Resync dev_cmp to take bridge devices into account.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:14 -08:00
Patrick McHardy 171b7fc4fc [NETFILTER]: ip6_queue: deobfuscate entry lookups
A queue entry lookup currently looks like this:

ipq_find_dequeue_entry -> __ipq_find_dequeue_entry ->
	__ipq_find_entry -> cmpfn -> id_cmp

Use simple open-coded list walking and kill the cmpfn for
ipq_find_dequeue_entry. Instead add it to ipq_flush (after
similar cleanups) and use ipq_flush for both complete flushes
and flushing entries related to a device.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:13 -08:00
Patrick McHardy 0ac41e8146 [NETFILTER]: {nf_netlink,ip,ip6}_queue: use list_for_each_entry
Use list_add_tail/list_for_each_entry instead of list_add and
list_for_each_prev as a preparation for switching to RCU.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:11 -08:00
Patrick McHardy c01cd429fc [NETFILTER]: nf_queue: move queueing related functions/struct to seperate header
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:10 -08:00
Patrick McHardy f9d8928f83 [NETFILTER]: nf_queue: remove unused data pointer
Remove the data pointer from struct nf_queue_handler. It has never been used
and is useless for the only handler that really matters, nfnetlink_queue,
since the handler is shared between all instances.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:10 -08:00
Patrick McHardy e3ac529815 [NETFILTER]: nf_queue: make queue_handler const
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:09 -08:00
Patrick McHardy 1999414a4e [NETFILTER]: Mark hooks __read_mostly
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:07 -08:00
Patrick McHardy 1841a4c7ae [NETFILTER]: nf_ct_h323: remove ipv6 module dependency
nf_conntrack_h323 needs ip6_route_output for the call forwarding filter.
Add a ->route function to nf_afinfo and use that to avoid pulling in the
ipv6 module.

Fix the #ifdef for the IPv6 code while I'm at it - the IPv6 support is
only needed when IPv6 conntrack is enabled.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:05 -08:00
Maciej Soltysiak 17dfc93f6d [NETFILTER]: {ip,ip6}t_LOG: log GID
Log GID in addition to UID

Signed-off-by: Maciej Soltysiak <maciej.soltysiak@ae.poznan.pl>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:03 -08:00
Jan Engelhardt 4c37799ccf [NETFILTER]: Use lowercase names for matches in Kconfig
Unify netfilter match kconfig descriptions

Consistently use lowercase for matches in kconfig one-line
descriptions and name the match module.

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:57 -08:00
Jan Engelhardt 0265ab44ba [NETFILTER]: merge ipt_owner/ip6t_owner in xt_owner
xt_owner merges ipt_owner and ip6t_owner, and adds a flag to match
on socket (non-)existence.

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:55 -08:00
Patrick McHardy 9e67d5a739 [NETFILTER]: x_tables: remove obsolete overflow check
We're not multiplying the size with the number of CPUs anymore, so the
check is obsolete.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:54 -08:00
Eric Dumazet 259d4e41f3 [NETFILTER]: x_tables: struct xt_table_info diet
Instead of using a big array of NR_CPUS entries, we can compute the size
needed at runtime, using nr_cpu_ids

This should save some ram (especially on David's machines where NR_CPUS=4096 :
32 KB can be saved per table, and 64KB for dynamically allocated ones (because
of slab/slub alignements) )

In particular, the 'bootstrap' tables are not any more static (in data
section) but on stack as their size is now very small.

This also should reduce the size used on stack in compat functions
(get_info() declares an automatic variable, that could be bigger than kernel
stack size for big NR_CPUS)

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:54 -08:00
Jan Engelhardt d3c5ee6d54 [NETFILTER]: x_tables: consistent and unique symbol names
Give all Netfilter modules consistent and unique symbol names.

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:53 -08:00
Herbert Xu 2fcb45b6b8 [IPSEC]: Use the correct family for input state lookup
When merging the input paths of IPsec I accidentally left a hard-coded
AF_INET for the state lookup call.  This broke IPv6 obviously.  This
patch fixes by getting the input callers to specify the family through
skb->cb.

Credit goes to Kazunori Miyazawa for diagnosing this and providing an
initial patch.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:49 -08:00
Wang Chen b2bf1e2659 [UDP]: Clean up for IS_UDPLITE macro
Since we have macro IS_UDPLITE, we can use it.

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:48 -08:00
Wang Chen cb75994ec3 [UDP]: Defer InDataGrams increment until recvmsg() does checksum
Thanks dave, herbert, gerrit, andi and other people for your
discussion about this problem.

UdpInDatagrams can be confusing because it counts packets that
might be dropped later.
Move UdpInDatagrams into recvmsg() as allowed by the RFC.

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:47 -08:00
Pavel Emelyanov 1dab62226d [IPV6]: Use ctl paths to register addrconf sysctls
This looks very much like the patch for ipv4's devinet.

This is also intended to help us with the net namespaces
and saves the ipv6.ko size by ~320 bytes.

The difference from the first version is just the patch
offsets, that changed due to changes in the patch #2.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:38 -08:00
Pavel Emelyanov f52295a9c5 [IPV6]: Unify and cleanup calls to addrconf_sysctl_register
Currently this call is (ab)used similar to devinet one - it
registers sysctls for devices and for the "default" confs, while
the "all" sysctls are registered separately. But unlike its
devinet brother, the passed inet6_device is needed.

The fix is to make a __addrconf_sysctl_register(), which registers
sysctls for all "devices" we need, including "default" and "all" :)

The original addrconf_sysctl_register() calls the introduced
function, passing the inet6_device, device name and ifindex (to
be used as procname and ctl_name) into it.

Thanks to Herbert again for pointing out, that we can shrink the
argument list to 1 :)

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:38 -08:00
Pavel Emelyanov f68635e627 [IPV6]: Cleanup the addconf_sysctl_register
This only includes fixing the space-indented lines and
removing one unneeded else after the goto.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:26 -08:00
Fred L. Templin c7dc89c0ac [IPV6]: Add RFC4214 support
This patch includes support for the Intra-Site Automatic Tunnel
Addressing Protocol (ISATAP) per RFC4214. It uses the SIT
module, and is configured using extensions to the "iproute2"
utility. The diffs are specific to the Linux 2.6.24-rc2 kernel
distribution.

This version includes the diff for ./include/linux/if.h which was
missing in the v2.4 submission and is needed to make the
patch compile. The patch has been installed, compiled and
tested in a clean 2.6.24-rc2 kernel build area.

Signed-off-by: Fred L. Templin <fred.l.templin@boeing.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:09 -08:00
Pavel Emelyanov f126734735 [IPV6]: Correct the comment concerning inetsw6 table
It seems that net/ipv6/af_inet6.c was copied from net/ipv4/af_inet.c,
but one comment was not fixed.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:49 -08:00
Pavel Emelyanov 42a73808ed [RAW]: Consolidate proc interface.
Both ipv6/raw.c and ipv4/raw.c use the seq files to walk
through the raw sockets hash and show them.

The "walking" code is rather huge, but is identical in both
cases. The difference is the hash table to walk over and
the protocol family to check (this was not in the first
virsion of the patch, which was noticed by YOSHIFUJI)

Make the ->open store the needed hash table and the family
on the allocated raw_iter_state and make the start/next/stop
callbacks work with it.

This removes most of the code.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:32 -08:00
Pavel Emelyanov ab70768ec7 [RAW]: Consolidate proto->unhash callback
Same as the ->hash one, this is easily consolidated.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:31 -08:00
Pavel Emelyanov 65b4c50b47 [RAW]: Consolidate proto->hash callback
Having the raw_hashinfo it's easy to consolidate the
raw[46]_hash functions.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:31 -08:00
Pavel Emelyanov b673e4dfc8 [RAW]: Introduce raw_hashinfo structure
The ipv4/raw.c and ipv6/raw.c contain many common code (most
of which is proc interface) which can be consolidated.

Most of the places to consolidate deal with the raw sockets
hashtable, so introduce a struct raw_hashinfo which describes
the raw sockets hash.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:30 -08:00
Pavel Emelyanov 69d6da0b0f [IPv6] RAW: Compact the API for the kernel
Same as in the previous patch for ipv4, compact the
API and hide hash table and rwlock inside the raw.c
file.

Plus fix some "bad" places from checkpatch.pl point
of view (assignments inside if()).

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:29 -08:00
Denis V. Lunev 97c53cacf0 [NET]: Make rtnetlink infrastructure network namespace aware (v3)
After this patch none of the netlink callback support anything
except the initial network namespace but the rtnetlink infrastructure
now handles multiple network namespaces.

Changes from v2:
- IPv6 addrlabel processing

Changes from v1:
- no need for special rtnl_unlock handling
- fixed IPv6 ndisc

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:25 -08:00
Denis V. Lunev b854272b3c [NET]: Modify all rtnetlink methods to only work in the initial namespace (v2)
Before I can enable rtnetlink to work in all network namespaces I need
to be certain that something won't break.  So this patch deliberately
disables all of the rtnletlink methods in everything except the
initial network namespace.  After the methods have been audited this
extra check can be disabled.

Changes from v1:
- added IPv6 addrlabel protection

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2008-01-28 14:54:24 -08:00
YOSHIFUJI Hideaki 2a8cc6c890 [IPV6] ADDRCONF: Support RFC3484 configurable address selection policy table.
Policy table is implemented as an RCU linear list since we do not expect
large list nor frequent updates.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:58 -08:00
YOSHIFUJI Hideaki 303065a854 [IPV6] ADDRCONF: Allow address selection policy with ifindex.
This patch allows ifindex to be a key for address selection policy table.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:57 -08:00
YOSHIFUJI Hideaki c1ee656ccb [IPV6] ADDRCONF: Rename ipv6_saddr_label() to ipv6_addr_label().
This patch renames ipv6_saddr_label() to ipv6_addr_label() because
address label is used for both of source address and destination
address.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:56 -08:00
David S. Miller 294b4baf29 [IPSEC]: Kill afinfo->nf_post_routing
After changeset:

	[NETFILTER]: Introduce NF_INET_ hook values

It always evaluates to NF_INET_POST_ROUTING.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:55 -08:00
Patrick McHardy 6e23ae2a48 [NETFILTER]: Introduce NF_INET_ hook values
The IPv4 and IPv6 hook values are identical, yet some code tries to figure
out the "correct" value by looking at the address family. Introduce NF_INET_*
values for both IPv4 and IPv6. The old values are kept in a #ifndef __KERNEL__
section for userspace compatibility.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:55 -08:00
Herbert Xu 1bf06cd2e3 [IPSEC]: Add async resume support on input
This patch adds support for async resumptions on input.  To do so, the
transform would return -EINPROGRESS and subsequently invoke the
function xfrm_input_resume to resume processing.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:54 -08:00
Herbert Xu 60d5fcfb19 [IPSEC]: Remove nhoff from xfrm_input
The nhoff field isn't actually necessary in xfrm_input.  For tunnel
mode transforms we now throw away the output IP header so it makes no
sense to fill in the nexthdr field.  For transport mode we can now let
the function transport_finish do the setting and it knows where the
nexthdr field is.

The only other thing that needs the nexthdr field to be set is the
header extraction code.  However, we can simply move the protocol
extraction out of the generic header extraction.

We want to minimise the amount of info we have to carry around between
transforms as this simplifies the resumption process for async crypto.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:53 -08:00
Herbert Xu d26f398400 [IPSEC]: Make x->lastused an unsigned long
Currently x->lastused is u64 which means that it cannot be
read/written atomically on all architectures.  David Miller observed
that the value stored in it is only an unsigned long which is always
atomic.

So based on his suggestion this patch changes the internal
representation from u64 to unsigned long while the user-interface
still refers to it as u64.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:52 -08:00
Herbert Xu 0ebea8ef35 [IPSEC]: Move state lock into x->type->input
This patch releases the lock on the state before calling
x->type->input.  It also adds the lock to the spots where they're
currently needed.

Most of those places (all except mip6) are expected to disappear with
async crypto.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:52 -08:00
Herbert Xu 668dc8af31 [IPSEC]: Move integrity stat collection into xfrm_input
Similar to the moving out of the replay processing on the output, this
patch moves the integrity stat collectin from x->type->input into
xfrm_input.

This would eventually allow transforms such as AH/ESP to be lockless.

The error value EBADMSG (currently unused in the crypto layer) is used
to indicate a failed integrity check.  In future this error can be
directly returned by the crypto layer once we switch to aead
algorithms.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:51 -08:00
Herbert Xu 716062fd4c [IPSEC]: Merge most of the input path
As part of the work on asynchronous cryptographic operations, we need
to be able to resume from the spot where they occur.  As such, it
helps if we isolate them to one spot.

This patch moves most of the remaining family-specific processing into
the common input code.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:50 -08:00
Herbert Xu 862b82c6f9 [IPSEC]: Merge most of the output path
As part of the work on asynchrnous cryptographic operations, we need
to be able to resume from the spot where they occur.  As such, it
helps if we isolate them to one spot.

This patch moves most of the remaining family-specific processing into
the common output code.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:48 -08:00
Herbert Xu ef76bc23ef [IPV6]: Add ip6_local_out
Most callers of the LOCAL_OUT chain will set the IP packet length
before doing so.  They also share the same output function dst_output.

This patch creates a new function called ip6_local_out which does all
of that and converts the appropriate users over to it.

Apart from removing duplicate code, it will also help in merging the
IPsec output path.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:47 -08:00
Herbert Xu 227620e295 [IPSEC]: Separate inner/outer mode processing on input
With inter-family transforms the inner mode differs from the outer
mode.  Attempting to handle both sides from the same function means
that it needs to handle both IPv4 and IPv6 which creates duplication
and confusion.

This patch separates the two parts on the input path so that each
function deals with one family only.

In particular, the functions xfrm4_extract_inut/xfrm6_extract_inut
moves the pertinent fields from the IPv4/IPv6 IP headers into a
neutral format stored in skb->cb.  This is then used by the inner mode
input functions to modify the inner IP header.  In this way the input
function no longer has to know about the outer address family.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:46 -08:00
Herbert Xu 36cf9acf93 [IPSEC]: Separate inner/outer mode processing on output
With inter-family transforms the inner mode differs from the outer
mode.  Attempting to handle both sides from the same function means
that it needs to handle both IPv4 and IPv6 which creates duplication
and confusion.

This patch separates the two parts on the output path so that each
function deals with one family only.

In particular, the functions xfrm4_extract_output/xfrm6_extract_output
moves the pertinent fields from the IPv4/IPv6 IP headers into a
neutral format stored in skb->cb.  This is then used by the outer mode
output functions to write the outer IP header.  In this way the output
function no longer has to know about the inner address family.

Since the extract functions are only called by tunnel modes (the only
modes that can support inter-family transforms), I've also moved the
xfrm*_tunnel_check_size calls into them.  This allows the correct ICMP
message to be sent as opposed to now where you might call icmp_send
with an IPv6 packet and vice versa.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:45 -08:00
Herbert Xu 29bb43b4ec [INET]: Give outer DSCP directly to ip*_copy_dscp
This patch changes the prototype of ipv4_copy_dscp and ipv6_copy_dscp so
that they directly take the outer DSCP rather than the outer IP header.
This will help us to unify the code for inter-family tunnels.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:45 -08:00
Herbert Xu a2deb6d26f [IPSEC]: Move x->outer_mode->output out of locked section
RO mode is the only one that requires a locked output function.  So
it's easier to move the lock into that function rather than requiring
everyone else to run under the lock.

In particular, this allows us to move the size check into the output
function without causing a potential dead-lock should the ICMP error
somehow hit the same SA on transmission.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:44 -08:00
Herbert Xu e40b328615 [IPSEC]: Forbid BEET + ipcomp for now
While BEET can theoretically work with IPComp the current code can't
do that because it tries to construct a BEET mode tunnel type which
doesn't (and cannot) exist.  In fact as it is it won't even attach a
tunnel object at all for BEET which is bogus.

To support this fully we'd also need to change the policy checks on
input to recognise a plain tunnel as a legal variant of an optional
BEET transform.

This patch simply fails such constructions for now.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:43 -08:00
Herbert Xu 25ee3286dc [IPSEC]: Merge common code into xfrm_bundle_create
Half of the code in xfrm4_bundle_create and xfrm6_bundle_create are
common.  This patch extracts that logic and puts it into
xfrm_bundle_create.  The rest of it are then accessed through afinfo.

As a result this fixes the problem with inter-family transforms where
we treat every xfrm dst in the bundle as if it belongs to the top
family.

This patch also fixes a long-standing error-path bug where we may free
the xfrm states twice.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:43 -08:00
Herbert Xu 66cdb3ca27 [IPSEC]: Move flow construction into xfrm_dst_lookup
This patch moves the flow construction from the callers of
xfrm_dst_lookup into that function.  It also changes xfrm_dst_lookup
so that it takes an xfrm state as its argument instead of explicit
addresses.

This removes any address-specific logic from the callers of
xfrm_dst_lookup which is needed to correctly support inter-family
transforms.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:42 -08:00
Herbert Xu f04e7e8d7f [IPSEC]: Replace x->type->{local,remote}_addr with flags
The functions local_addr and remote_addr are more than what they're
needed for.  The same thing can be done easily with flags on the type
object.  This patch does that and simplifies the wrapper functions in
xfrm6_policy accordingly.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:41 -08:00
Herbert Xu fff6938880 [IPSEC]: Make sure idev is consistent with dev in xfrm_dst
Previously we took the device from the bottom route and idev from the
top route.  This is bad because idev may well point to a different
device.  This patch changes it so that we get the idev from the device
directly.

It also makes it an error if either dev or idev is NULL.  This is
consistent with the rest of the routing code which also treats these
cases as errors.

I've removed the err initialisation in xfrm6_policy.c because it
achieves no purpose and hid a bug when an initial version of this
patch neglected to set err to -ENODEV (fortunately the IPv4 version
warned about it).

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:40 -08:00
Herbert Xu 45ff5a3f9a [IPSEC]: Set dst->input to dst_discard
The input function should never be invoked on IPsec dst objects.  This
is because we don't apply IPsec on input until after we've made the
routing decision.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:40 -08:00
Herbert Xu 8ce68ceb55 [IPSEC]: Only set neighbour on top xfrm dst
The neighbour field is only used by dst_confirm which only ever happens on
the top-most xfrm dst.  So it's a waste to duplicate for every other xfrm
dst.  This patch moves its setting out of the loop so that only the top one
gets set.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:39 -08:00
Herbert Xu 352e512c32 [NET]: Eliminate duplicate copies of dst_discard
We have a number of copies of dst_discard scattered around the place
which all do the same thing, namely free a packet on the input or
output paths.

This patch deletes all of them except dst_discard and points all the
users to it.

The only non-trivial bit is decnet where it returns an error.
However, conceptually this is identical to the blackhole functions
used in IPv4 and IPv6 which do not return errors.  So they should
either all return errors or all return zero.  For now I've stuck with
the majority and picked zero as the return value.

It doesn't really matter in practice since few if any driver would
react differently depending on a zero return value or NET_RX_DROP.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:37 -08:00
Herbert Xu b4ce92775c [IPV6]: Move nfheader_len into rt6_info
The dst member nfheader_len is only used by IPv6.  It's also currently
creating a rather ugly alignment hole in struct dst.  Therefore this patch
moves it from there into struct rt6_info.

It also reorders the fields in rt6_info to minimize holes.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:37 -08:00
Herbert Xu 0148894223 [IPV6]: Only set nfheader_len for top xfrm dst
We only need to set nfheader_len in the top xfrm dst.  This is because
we only ever read the nfheader_len from the top xfrm dst.

It is also easier to count nfheader_len as part of header_len which
then lets us remove the ugly wrapper functions for incrementing and
decrementing header lengths in xfrm6_policy.c.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:35 -08:00
Pavel Emelyanov b24b8a247f [NET]: Convert init_timer into setup_timer
Many-many code in the kernel initialized the timer->function
and  timer->data together with calling init_timer(timer). There
is already a helper for this. Use it for networking code.

The patch is HUGE, but makes the code 130 lines shorter
(98 insertions(+), 228 deletions(-)).

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:35 -08:00
Wang Chen a92aa318b4 [IPV6]: Add raw6 drops counter.
Add raw drops counter for IPv6 in /proc/net/raw6 .

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:34 -08:00
Jens Axboe a0974dd3da [TCP] splice: add tcp_splice_read() to IPV6
Thanks to YOSHIFUJI Hideaki for the hint!

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:32 -08:00
Rolf Manderscheid a9e527e3f9 IPoIB: improve IPv4/IPv6 to IB mcast mapping functions
An IPoIB subnet on an IB fabric that spans multiple IB subnets can't
use link-local scope in multicast GIDs.  The existing routines that
map IP/IPv6 multicast addresses into IB link-level addresses hard-code
the scope to link-local, and they also leave the partition key field
uninitialised.  This patch adds a parameter (the link-level broadcast
address) to the mapping routines, allowing them to initialise both the
scope and the P_Key appropriately, and fixes up the call sites.

The next step will be to add a way to configure the scope for an IPoIB
interface.

Signed-off-by: Rolf Manderscheid <rvm@obsidianresearch.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:37 -08:00
Herbert Xu f945fa7ad9 [INET]: Fix truesize setting in ip_append_data
As it is ip_append_data only counts page fragments to the skb that
allocated it.  As such it means that the first skb gets hit with a
4K charge even though it might have only used a fraction of it while
all subsequent skb's that use the same page gets away with no charge
at all.

This bug was exposed by the UDP accounting patch.

[ The wmem_alloc bumping needs to be moved with the truesize,
  noticed by Takahiro Yasui.  -DaveM ]

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-23 03:11:43 -08:00