Commit Graph

8486 Commits

Author SHA1 Message Date
Pavel Emelyanov 0a826406d4 [IPIP]: Allow to create IPIP tunnels in net namespaces.
Set the proper net before calling register_netdev and disable 
the tunnel device netns changing.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-16 01:06:18 -07:00
Pavel Emelyanov b99f0152e5 [IPIP]: Use proper net in (mostly) routing calls.
There are some ip_route_output_key() calls in there that require
a proper net so give one to them.

Besides - give a proper net to a single __get_dev_by_index call 
in ipip_tunnel_bind_dev().

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-16 01:05:57 -07:00
Pavel Emelyanov 44d3c299dc [IPIP]: Make tunnels hashes per net.
Either net or ipip_net already exists in all the required 
places, so just use one.

Besides, tune net_init and net_exit calls to respectively 
initialize the hashes and destroy devices.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-16 01:05:32 -07:00
Pavel Emelyanov cec3ffae1a [IPIP]: Use proper net in hash-lookup functions.
This is the part#2 of the previous patch - get the proper
net for these functions.

I make it in a separate patch, so that this change does not
get lost in a large previous patch.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-16 01:05:03 -07:00
Pavel Emelyanov b9fae5c913 [IPIP]: Add net/ipip_net argument to some functions.
The hashes of tunnels will be per-net too, so prepare all the 
functions that uses them for this change by adding an argument.

Use init_net temporarily in places, where the net does not exist
explicitly yet.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-16 01:04:35 -07:00
Pavel Emelyanov b9855c54da [IPIP]: Make the fallback tunnel device per-net.
Create on in ipip_init_net(), use it all over the code (the
proper place to get the net from already exists) and destroy
in ipip_net_exit().

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-16 01:04:13 -07:00
Pavel Emelyanov 10dc4c7bb7 [IPIP]: Introduce empty ipip_net structure and net init/exit ops.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-16 01:03:13 -07:00
Pavel Emelyanov 30688a9a3e [VLAN]: Handle vlan devices net namespace changing.
When van device is moved to another namespace proc files,
related to this device, should also change one.

Use the netdev REGISTER and UNREGISTER event handlers for this.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-16 00:57:01 -07:00
Pavel Emelyanov 65d292a2ef [VLAN]: Allow vlan devices registration in net namespaces.
This one is similar to what I've done for TUN - set the proper
net after device allocation and clean VLANs on net exit (use the
rtnl_kill_links helper finally).

Plus, drop explicit init_net usage and net != &init_net checks.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-16 00:56:37 -07:00
Pavel Emelyanov 7a17a2f79f [VLAN]: Make the vlan_name_type per-net.
This includes moving one on the struct vlan_net and
s/vlan_name_type/vn->name_type/ over the code.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-16 00:56:18 -07:00
Pavel Emelyanov 80de2d9821 [VLAN]: Make the /proc/net/vlan/conf file show per-net info.
It is created in a proper net, so make is show info, related
to this particular net.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-16 00:52:24 -07:00
Pavel Emelyanov a59a8c1c86 [VLAN]: Create proc entries in the proper net.
The proc_vlan_dir and proc_vlan_conf migrate on the struct
vlan_net and their creation uses the struct net.

The devices' entries use the corresponding device's net.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-16 00:51:51 -07:00
Pavel Emelyanov cd1c701432 [VLAN]: Add a net argument to proc init and cleanup calls.
All proc files will be created in each net, so prepare them for 
this change now, not to mess it with real creation patch.

The net != &init_net checks in them are for git-bisect sanity, 
but I will drop them soon.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-16 00:51:12 -07:00
Pavel Emelyanov d9ed0f0e2d [VLAN]: Introduce the vlan_net structure and init/exit net ops.
Unlike TUN, it is empty from the very beginning, and will 
be eventually populated later.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-16 00:49:09 -07:00
Pavel Emelyanov a9fde26078 [VLAN]: Tag vlan_group_device with net device, not ifindex.
Currently vlan group is searched using one key - the ifindex.
We'll have to lookup the vlan_group by two keys - ifindex and
net. Turning the vlan_group lookup key to struct net_device
pointer will make this process easier.

Besides, this will eliminate one more place in the networking,
that assumes that indexes are unique in the kernel.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-16 00:48:04 -07:00
Pavel Emelyanov 669f87baab [RTNL]: Introduce the rtnl_kill_links helper.
This one is responsible for calling ->dellink on each net
device found in net to help with vlan net_exit hook in the
nearest future.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-16 00:46:52 -07:00
Pavel Emelyanov 3a931a80cb [RTNL]: Relax for_each_netdev_safe in __rtnl_link_unregister.
Each potential list_del (happening from inside a ->dellink call)
is followed by goto restart, so there's no need in _safe iteration.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-16 00:45:56 -07:00
Ilpo Järvinen 17515408a1 [TCP]: Remove superflushious skb == write_queue_tail() check
Needed can only be more strict than what was checked by the
earlier common case check for non-tail skbs, thus
cwnd_len <= needed will never match in that case anyway.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-15 20:36:55 -07:00
Mandeep Singh Baines b131dd5d65 [ETHTOOL]: Add support for large eeproms
Currently, it is not possible to read/write to an eeprom larger than
128k in size because the buffer used for temporarily storing the
eeprom contents is allocated using kmalloc. kmalloc can only allocate
a maximum of 128k depending on architecture.

Modified ethtool_get/set_eeprom to only allocate a page of memory and
then copy the eeprom a page at a time.

Updated original patch as per suggestions from Joe Perches.

Signed-off-by: Mandeep Singh Baines <msb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-15 19:29:17 -07:00
Oliver Hartkopp 73e87e02ec CAN: use hrtimers in can-bcm protocol
Make use of hrtimers to support high resolution capabilities, when
provided by the system clocksource.

The conversion to hrtimers additionally discovered and solved an
unlikely race condition that has been reproduced under (unrealistic)
massive receive load, which can only be produced on vcan software devices.

[ Fix printf format warnings on 64-bit -DaveM ]

Signed-off-by: Oliver Hartkopp <oliver@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-15 19:29:14 -07:00
Allan Stephens 85035568a9 [TIPC]: Enhance validation of format on incoming messages
This patch ensures that TIPC properly handles incoming messages
that have incorrect or unexpected formats.  Most significantly,
it now ensures that each sl_buff has at least as much data as
the message header indicates it should, and that the entire
message header is stored contiguously; this prevents TIPC from
accidentally accessing memory that is not part of the sk_buff.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-15 19:04:54 -07:00
Allan Stephens fe13dda2d2 [TIPC]: Force linearization of non-linear sk_buffs
This patch allows TIPC to process incoming messages that are
stored in a fragmented sk_buff, by forcing the linearization
of any such messages it receives.

Note: This is an interim solution to allow TIPC to operate with
Ethernet devices that generate non-linear buffers (such as the
gianfar driver), until such time as the rest of TIPC is enhanced
to handle sk_buffs with multiple data areas.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-15 19:03:23 -07:00
Allan Stephens bdc82bee43 [TIPC]: Use fast buffer cloning to improve performance
This patch causes TIPC to allocate fast clonable sk_buffs,
rather than standard ones.  This speeds up the cloning
operation done by the link code each time a message is sent
off-node.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-15 19:02:30 -07:00
Allan Stephens 11ecede787 [TIPC]: Remove redundant NULL check when discarding buffers
This patch eliminates a null pointer check when discarding a
TIPC message buffer, since kfree_skb() already handles this
situation.

Acknowledgements to Florian Westphal (fw@strlen.de> for
suggesting this enhancement.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-15 19:01:43 -07:00
Pavel Emelyanov dec827d174 [NETNS]: The generic per-net pointers.
Add the elastic array of void * pointer to the struct net.
The access rules are simple:

 1. register the ops with register_pernet_gen_device to get
    the id of your private pointer
 2. call net_assign_generic() to put the private data on the
    struct net (most preferably this should be done in the
    ->init callback of the ops registered)
 3. do not store any private reference on the net_generic array;
 4. do not change this pointer while the net is alive;
 5. use the net_generic() to get the pointer.

When adding a new pointer, I copy the old array, replace it
with a new one and schedule the old for kfree after an RCU
grace period.

Since the net_generic explores the net->gen array inside rcu
read section and once set the net->gen->ptr[x] pointer never 
changes, this grants us a safe access to generic pointers.

Quoting Paul: "... RCU is protecting -only- the net_generic 
structure that net_generic() is traversing, and the [pointer]
returned by net_generic() is protected by a reference counter 
in the upper-level struct net."

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-15 00:36:08 -07:00
Pavel Emelyanov c93cf61fd1 [NETNS]: The net-subsys IDs generator.
To make some per-net generic pointers, we need some way to address
them, i.e. - IDs. This is simple IDA-based IDs generator for pernet
subsystems.

Addressing questions about potential checkpoint/restart problems: 
these IDs are "lite-offsets" within the net structure and are by no 
means supposed to be exported to the userspace.

Since it will be used in the nearest future by devices only (tun,
vlan, tunnels, bridge, etc), I make it resemble the functionality
of register_pernet_device().

The new ids is stored in the *id pointer _before_ calling the init
callback to make this id available in this callback.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-15 00:35:23 -07:00
Adrian Bunk 7ef3abd210 [IRDA]: Remove irlan_eth_send_gratuitous_arp()
Even kernel 2.2.26 (sic) already contains the
  #undef CONFIG_IRLAN_SEND_GRATUITOUS_ARP
with the comment "but for some reason the machine crashes if you use DHCP".

Either someone finally looks into this or it's simply time to remove 
this dead code.

Reported-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-15 00:29:24 -07:00
Allan Stephens 0c3141e910 [TIPC]: Overhaul of socket locking logic
This patch modifies TIPC's socket code to follow the same approach
used by other protocols.  This change eliminates the need for a
mutex in the TIPC-specific portion of the socket protocol data
structure -- in its place, the standard Linux socket backlog queue
and associated locking routines are utilized.  These changes fix
a long-standing receive queue bug on SMP systems, and also enable
individual read and write threads to utilize a socket without
unnecessarily interfering with each other.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-15 00:22:02 -07:00
Allan Stephens b89741a0cc [TIPC]: Cosmetic changes to TIPC connect() code
This patch fixes TIPC's connect routine to conform to Linux
kernel style norms of indentation, line length, etc.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-15 00:20:37 -07:00
Allan Stephens 4934c69a38 [TIPC]: Add error check to detect non-blocking form of connect()
This patch causes TIPC to return an error indication if the non-
blocking form of connect() is requested (which TIPC does not yet
support).

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-15 00:16:19 -07:00
Allan Stephens 1819b83718 [TIPC]: Correct "off by 1" error in socket queue limit enforcement
This patch fixes a bug that allowed TIPC to queue 1 more message
than allowed by the socket receive queue threshold limits.  The
patch also improves the threshold code's logic and naming to help
prevent this sort of error from recurring in the future.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-15 00:15:50 -07:00
Allan Stephens 7a8036c2b9 [TIPC]: Ignore message padding when receiving stream data
This patch ensures that padding bytes appearing at the end of
an incoming TIPC message are not returned as valid stream data.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-15 00:15:15 -07:00
Allan Stephens a198d3a200 [TIPC]: Allow stream receive to read from multiple TIPC messages
This patch allows a stream socket to receive data from multiple
TIPC messages in its receive queue, without requiring the use of
the MSG_WAITALL flag.

Acknowledgements to Florian Westphal <fw-tipc@strlen.de> for
identifying this issue and suggesting how to correct it.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-15 00:07:15 -07:00
Allan Stephens 990098068f [TIPC]: Skip connection flow control in connectionless sockets
This patch optimizes the receive path for SOCK_DGRAM and SOCK_RDM
messages by skipping over code that handles connection-based flow
control.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-15 00:06:12 -07:00
Denis V. Lunev 2c8dd11636 [XFRM]: Compilation warnings in xfrm_user.c.
When CONFIG_SECURITY_NETWORK_XFRM is undefined the following warnings appears:
net/xfrm/xfrm_user.c: In function 'xfrm_add_pol_expire':
net/xfrm/xfrm_user.c:1576: warning: 'ctx' may be used uninitialized in this function
net/xfrm/xfrm_user.c: In function 'xfrm_get_policy':
net/xfrm/xfrm_user.c:1340: warning: 'ctx' may be used uninitialized in this function
(security_xfrm_policy_alloc is noop for the case).

It seems that they are result of the commit
03e1ad7b5d ("LSM: Make the Labeled IPsec
hooks more stack friendly")

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-14 14:47:48 -07:00
YOSHIFUJI Hideaki 569508c964 [TCP]: Format addresses appropriately in debug messages.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-14 04:09:36 -07:00
YOSHIFUJI Hideaki a7d632b6b4 [IPV4]: Use NIPQUAD_FMT to format ipv4 addresses.
And use %u to format port.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-14 04:09:00 -07:00
David S. Miller 334f8b2afd Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6.26 2008-04-14 03:50:43 -07:00
Pavel Emelyanov 7477fd2e6b [SOCK]: Add some notes about per-bind-bucket sock lookup.
I was asked about "why don't we perform a sk_net filtering in
bind_conflict calls, like we do in other sock lookup places"
for a couple of times.

Can we please add a comment about why we do not need one?

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-14 02:42:27 -07:00
Pavel Emelyanov 13f51d82ac [DCCP]: Fix comment about control sockets.
These sockets now have a bit other names and are no longer global.

Shame on me, I haven't provided a good comment for this when
sending DCCP netnsization patches.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-14 02:38:45 -07:00
David S. Miller df39e8ba56 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	drivers/net/ehea/ehea_main.c
	drivers/net/wireless/iwlwifi/Kconfig
	drivers/net/wireless/rt2x00/rt61pci.c
	net/ipv4/inet_timewait_sock.c
	net/ipv6/raw.c
	net/mac80211/ieee80211_sta.c
2008-04-14 02:30:23 -07:00
Patrick McHardy ef1a5a50bb [NETFILTER]: nf_conntrack: fix incorrect check for expectations
The expectation classes changed help->expectations to an array,
fix use as scalar value.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:21:01 +02:00
Peter Warasin e7bfd0a1a6 [NETFILTER]: bridge: add ebt_nflog watcher
This patch adds the ebtables nflog watcher to the kernel in order to
allow ebtables log through the nfnetlink_log backend.

Signed-off-by: Peter Warasin <peter@endian.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:54 +02:00
Jan Engelhardt 3c9fba656a [NETFILTER]: nf_conntrack: replace NF_CT_DUMP_TUPLE macro indrection by function call
Directly call IPv4 and IPv6 variants where the address family is
easily known.

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:54 +02:00
Jan Engelhardt 12c33aa20e [NETFILTER]: nf_conntrack: const annotations in nf_conntrack_sctp, nf_nat_proto_gre
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:54 +02:00
Jan Engelhardt f2ea825f48 [NETFILTER]: nf_nat: use bool type in nf_nat_proto
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:53 +02:00
Jan Engelhardt 5f2b4c9006 [NETFILTER]: nf_conntrack: use bool type in struct nf_conntrack_tuple.h
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:53 +02:00
Jan Engelhardt 09f263cd39 [NETFILTER]: nf_conntrack: use bool type in struct nf_conntrack_l4proto
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:53 +02:00
Jan Engelhardt 8ce8439a31 [NETFILTER]: nf_conntrack: use bool type in struct nf_conntrack_l3proto
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:52 +02:00
Patrick McHardy 5e8fbe2ac8 [NETFILTER]: nf_conntrack: add tuplehash l3num/protonum accessors
Add accessors for l3num and protonum and get rid of some overly long
expressions.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:52 +02:00
Patrick McHardy 5f7da4d26d [NETFILTER]: nf_conntrack_tcp: catch invalid state updates over ctnetlink
Invalid states can cause out-of-bound memory accesses of the state table.
Also don't insist on having a new state contained in the netlink message.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:52 +02:00
Patrick McHardy dd13b01036 [NETFILTER]: nf_nat: kill helper and seq_adjust hooks
Connection tracking helpers (specifically FTP) need to be called
before NAT sequence numbers adjustments are performed to be able
to compare them against previously seen ones. We've introduced
two new hooks around 2.6.11 to maintain this ordering when NAT
modules were changed to get called from conntrack helpers directly.

The cost of netfilter hooks is quite high and sequence number
adjustments are only rarely needed however. Add a RCU-protected
sequence number adjustment function pointer and call it from
IPv4 conntrack after calling the helper.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:52 +02:00
Patrick McHardy 55871d0479 [NETFILTER]: nf_conntrack_extend: warn on confirmed conntracks
New extensions may only be added to unconfirmed conntracks to avoid races
when reallocating the storage.

Also change NF_CT_ASSERT to use WARN_ON to get backtraces.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:51 +02:00
Patrick McHardy 8c87238b72 [NETFILTER]: nf_nat: don't add NAT extension for confirmed conntracks
Adding extensions to confirmed conntracks is not allowed to avoid races
on reallocation. Don't setup NAT for confirmed conntracks in case NAT
module is loaded late.

The has one side-effect, the connections existing before the NAT module
was loaded won't enter the bysource hash. The only case where this actually
makes a difference is in case of SNAT to a multirange where the IP before
NAT is also part of the range. Since old connections don't enter the
bysource hash the first new connection from the IP will have a new address
selected. This shouldn't matter at all.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:51 +02:00
Patrick McHardy 42cf800c24 [NETFILTER]: nf_nat: remove obsolete check for ICMP redirects
Locally generated ICMP packets have a reference to the conntrack entry
of the original packet manually attached by icmp_send(). Therefore the
check for locally originated untracked ICMP redirects can never be
true.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:50 +02:00
Patrick McHardy 9d908a69a3 [NETFILTER]: nf_nat: add SCTP protocol support
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:50 +02:00
Patrick McHardy 4910a08799 [NETFILTER]: nf_nat: add DCCP protocol support
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:50 +02:00
Patrick McHardy 2bc780499a [NETFILTER]: nf_conntrack: add DCCP protocol support
Add DCCP conntrack helper. Thanks to Gerrit Renker <gerrit@erg.abdn.ac.uk>
for review and testing.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:49 +02:00
Patrick McHardy d63a650736 [NETFILTER]: Add partial checksum validation helper
Move the UDP-Lite conntrack checksum validation to a generic helper
similar to nf_checksum() and make it fall back to nf_checksum()
in case the full packet is to be checksummed and hardware checksums
are available. This is to be used by DCCP conntrack, which also
needs to verify partial checksums.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:49 +02:00
Patrick McHardy 6185f870e2 [NETFILTER]: nf_nat: add UDP-Lite support
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:48 +02:00
Patrick McHardy 2d2d84c40e [NETFILTER]: nf_nat: remove unused name from struct nf_nat_protocol
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:48 +02:00
Patrick McHardy ca6a507490 [NETFILTER]: nf_conntrack_netlink: clean up NAT protocol parsing
Move responsibility for setting the IP_NAT_RANGE_PROTO_SPECIFIED flag
to the NAT protocol, properly propagate errors and get rid of ugly
return value convention.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:47 +02:00
Patrick McHardy 535b57c7c1 [NETFILTER]: nf_nat: move NAT ctnetlink helpers to nf_nat_proto_common
Move to nf_nat_proto_common and rename to nf_nat_proto_... since they're
also used by protocols that don't have port numbers.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:47 +02:00
Patrick McHardy 5abd363f73 [NETFILTER]: nf_nat: fix random mode not to overwrite port rover
The port rover should not get overwritten when using random mode,
otherwise other rules will also use more or less random ports.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:46 +02:00
Patrick McHardy 937e0dfd87 [NETFILTER]: nf_nat: add helpers for common NAT protocol operations
Add generic ->in_range and ->unique_tuple ops to avoid duplicating them
again and again for future NAT modules and save a few bytes of text:

net/ipv4/netfilter/nf_nat_proto_tcp.c:
  tcp_in_range     |  -62 (removed)
  tcp_unique_tuple | -259 # 271 -> 12, # inlines: 1 -> 0, size inlines: 7 -> 0
 2 functions changed, 321 bytes removed

net/ipv4/netfilter/nf_nat_proto_udp.c:
  udp_in_range     |  -62 (removed)
  udp_unique_tuple | -259 # 271 -> 12, # inlines: 1 -> 0, size inlines: 7 -> 0
 2 functions changed, 321 bytes removed

net/ipv4/netfilter/nf_nat_proto_gre.c:
  gre_in_range |  -62 (removed)
 1 function changed, 62 bytes removed

vmlinux:
 5 functions changed, 704 bytes removed

Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:46 +02:00
Patrick McHardy 544473c166 [NETFILTER]: {ip,ip6,arp}_tables: return EAGAIN for invalid SO_GET_ENTRIES size
Rule dumping is performed in two steps: first userspace gets the
ruleset size using getsockopt(SO_GET_INFO) and allocates memory,
then it calls getsockopt(SO_GET_ENTRIES) to actually dump the
ruleset. When another process changes the ruleset in between the
sizes from the first getsockopt call doesn't match anymore and
the kernel aborts. Unfortunately it returns EAGAIN, as for multiple
other possible errors, so userspace can't distinguish this case
from real errors.

Return EAGAIN so userspace can retry the operation.

Fixes (with current iptables SVN version) netfilter bugzilla #104.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:45 +02:00
Patrick McHardy fa913ddf63 [NETFILTER]: nf_conntrack_sip: clear address in parse_addr()
Some callers pass uninitialized structures, clear the address to make
sure later comparisions work properly.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:45 +02:00
Jan Engelhardt c2f9c68398 [NETFILTER]: Explicitly initialize .priority in arptable_filter
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:44 +02:00
Jan Engelhardt 3bb0362d2f [NETFILTER]: remove arpt_(un)register_target indirection macros
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:44 +02:00
Jan Engelhardt 95eea855af [NETFILTER]: remove arpt_target indirection macro
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:43 +02:00
Jan Engelhardt 4abff0775d [NETFILTER]: remove arpt_table indirection macro
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:43 +02:00
Jan Engelhardt 72b72949db [NETFILTER]: annotate rest of nf_nat_* with const
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:42 +02:00
Jan Engelhardt 58c0fb0ddd [NETFILTER]: annotate rest of nf_conntrack_* with const
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:42 +02:00
Jan Engelhardt 5452e425ad [NETFILTER]: annotate {arp,ip,ip6,x}tables with const
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:35 +02:00
Jan Engelhardt 3cf93c96af [NETFILTER]: annotate xtables targets with const and remove casts
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 09:56:05 +02:00
Robert P. J. Day fdccecd0cc [NETFILTER]: Use non-deprecated __RW_LOCK_UNLOCKED macro
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 09:56:03 +02:00
Robert P. J. Day 0718300c06 [NETFILTER]: bridge netfilter: use non-deprecated __RW_LOCK_UNLOCKED macro.
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 09:56:03 +02:00
Alexey Dobriyan 666953df35 [NETFILTER]: ip_tables: per-netns FILTER/MANGLE/RAW tables for real
Commit 9335f047fe aka
"[NETFILTER]: ip_tables: per-netns FILTER, MANGLE, RAW"
added per-netns _view_ of iptables rules. They were shown to user, but
ignored by filtering code. Now that it's possible to at least ping loopback,
per-netns tables can affect filtering decisions.

netns is taken in case of
	PRE_ROUTING, LOCAL_IN -- from in device,
	POST_ROUTING, LOCAL_OUT -- from out device,
	FORWARD -- from in device which should be equal to out device's netns.
		   This code is relatively new, so BUG_ON was plugged.

Wrappers were added to a) keep code the same from CONFIG_NET_NS=n users
(overwhelming majority), b) consolidate code in one place -- similar
changes will be done in ipv6 and arp netfilter code.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 09:56:02 +02:00
Patrick McHardy 36e2a1b0f7 [NETFILTER]: {ip,ip6}t_LOG: print MARK value in log output
Dump the mark value in log messages similar to nfnetlink_log. This
is useful for debugging complex setups where marks are used for
routing or traffic classification.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 09:56:01 +02:00
Alexey Dobriyan b916f7d4b7 [NETFILTER]: nf_conntrack: less hairy ifdefs around proc and sysctl
Patch splits creation of /proc/net/nf_conntrack, /proc/net/stat/nf_conntrack
and net.netfilter hierarchy into their own functions with dummy ones
if PROC_FS or SYSCTL is not set. Also, remove dead "ret = 0" write
while I'm at it.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 09:56:01 +02:00
Patrick McHardy 159d83363b [BRIDGE]: Fix crash in __ip_route_output_key with bridge netfilter
The bridge netfilter code attaches a fake dst_entry with a pointer to a
fake net_device structure to skbs it passes up to IPv4 netfilter. This
leads to crashes when the skb is passed to __ip_route_output_key when
dereferencing the namespace pointer.

Since bridging can currently only operate in the init_net namespace,
the easiest fix for now is to initialize the nd_net pointer of the
fake net_device struct to &init_net.

Should fix bugzilla 10323: http://bugzilla.kernel.org/show_bug.cgi?id=10323

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-14 00:46:01 -07:00
Pavel Emelyanov 4dee959723 [NETFILTER]: ipt_CLUSTERIP: fix race between clusterip_config_find_get and _entry_put
Consider we are putting a clusterip_config entry with the "entries"
count == 1, and on the other CPU there's a clusterip_config_find_get
in progress:

CPU1:							CPU2:
clusterip_config_entry_put:				clusterip_config_find_get:
if (atomic_dec_and_test(&c->entries)) {
	/* true */
							read_lock_bh(&clusterip_lock);
							c = __clusterip_config_find(clusterip);
							/* found - it's still in list */
							...
							atomic_inc(&c->entries);
							read_unlock_bh(&clusterip_lock);

	write_lock_bh(&clusterip_lock);
	list_del(&c->list);
	write_unlock_bh(&clusterip_lock);
	...
	dev_put(c->dev);

Oops! We have an entry returned by the clusterip_config_find_get,
which is a) not in list b) has a stale dev pointer.

The problems will happen when the CPU2 will release the entry - it
will remove it from the list for the 2nd time, thus spoiling it, and
will put a stale dev pointer.

The fix is to make atomic_dec_and_test under the clusterip_lock.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 00:44:52 -07:00
Gerrit Renker 7de6c03336 [SKB]: __skb_append = __skb_queue_after
This expresses __skb_append in terms of __skb_queue_after, exploiting that

  __skb_append(old, new, list) = __skb_queue_after(list, old, new).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-14 00:05:09 -07:00
Rami Rosen 0912ea38de [IPV6] MROUTE: Add stats in multicast routing module method ip6_mr_forward().
This patches adds a call to increment IPSTATS_MIB_OUTFORWDATAGRAMS
when forwarding the packet in ip6_mr_forward() in the IPv6 multicast
routing module (net/ipv6/ip6mr.c).

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 23:59:13 -07:00
YOSHIFUJI Hideaki 9625ed72e8 [IPV6] ADDRCONF: Don't generate temporary address for ip6-ip6 interface.
As far as I can remember, I was going to disable privacy extensions
on all "tunnel" interfaces.  Disable it on ip6-ip6 interface as well.

Also, just remove ifdefs for SIT for simplicity.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 23:47:11 -07:00
YOSHIFUJI Hideaki b077d7abab [IPV6] ADDRCONF: Ensure disabling multicast RS even if privacy extensions are disabled.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 23:42:18 -07:00
Jan Engelhardt 0b18542b7f [NET]: Sink IPv6 menuoptions into its own submenu
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 23:30:47 -07:00
YOSHIFUJI Hideaki e7712f1a7c [IPV6]: Share common code-paths for sticky socket options.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 23:21:52 -07:00
YOSHIFUJI Hideaki cee8947338 [IPV6] MROUTE: Do not call ipv6_find_idev() directly.
Since NETDEV_REGISTER notifier chain is responsible for creating
inet6_dev{}, we do not need to call ipv6_find_idev() directly here.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 23:21:16 -07:00
David S. Miller b45e9189c0 [IPV6]: Fix ipv6 address fetching in raw6_icmp_error().
Fixes kernel bugzilla 10437

Based almost entirely upon a patch by Dmitry Butskoy.

When deciding what raw sockets to deliver the ICMPv6
to, we should use the addresses in the ICMPv6 quoted
IPV6 header, not the top-level one.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 23:14:15 -07:00
Patrick McHardy 2ed9926e16 [NET]: Return more appropriate error from eth_validate_addr().
Paul Bolle wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=9923 would have been much easier to
> track down if eth_validate_addr() would somehow complain aloud if an address 
> is invalid. Shouldn't it make at least some noise?

I guess it should return -EADDRNOTAVAIL similar to eth_mac_addr()
when validation fails.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 22:45:40 -07:00
Pavel Emelyanov 671a1c7401 [NETNS][DCCPV6]: Make per-net socket lookup.
The inet6_lookup family of functions requires a net to lookup
a socket in, so give a proper one to them.

No more things to do for dccpv6, since routing is OK and the
ipv4-like transport layer filtering is not done for ipv6.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 22:33:06 -07:00
Pavel Emelyanov 334527d351 [NETNS][DCCPV6]: Actually create ctl socket on each net and use it.
Move the call to inet_ctl_sock_create to init callback (and
inet_ctl_sock_destroy to exit one) and use proper ctl sock
in dccp_v6_ctl_send_reset.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 22:32:45 -07:00
Pavel Emelyanov 0204774191 [NETNS][DCCPV6]: Move the dccp_v6_ctl_sk on the struct net.
And replace all its usage with init_net's socket.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 22:32:25 -07:00
Pavel Emelyanov 8231bd270d [NETNS][DCCPV6]: Add dummy per-net operations.
They will be responsible for ctl socket initialization, but
currently they are void.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 22:32:02 -07:00
Pavel Emelyanov 68d185980f [NETNS][DCCPV6]: Don't pass NULL to ip6_dst_lookup.
This call uses the sock to get the net to lookup the routing
in. With CONFIG_NET_NS this code will OOPS, since the sk ptr
is NULL.

After looking inside the ip6_dst_lookup and drawing the analogy
with respective ipv6 code, it seems, that the dccp ctl socket 
is a good candidate for the first argument.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 22:31:32 -07:00
Pavel Emelyanov fc5f8580d3 [NETNS][DCCPV4]: Enable DCCPv4 in net namespaces.
This enables sockets creation with IPPROTO_DCCP and enables
the ip level to pass DCCP packets to the DCCP level.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 22:31:05 -07:00
Pavel Emelyanov b9901a84c9 [NETNS][DCCPV4]: Make per-net socket lookup.
The inet_lookup family of functions requires a net to lookup
a socket in, so give a proper one to them.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 22:30:43 -07:00
Pavel Emelyanov f54873982c [NETNS][DCCPV4]: Use proper net to route the reset packet.
The dccp_v4_route_skb used in dccp_v4_ctl_send_reset, currently
works with init_net's routing tables - fix it.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 22:30:19 -07:00
Pavel Emelyanov b76c4b27fe [NETNS][DCCPV4]: Actually create ctl socket on each net and use it.
Move the call to inet_ctl_sock_create to init callback (and
inet_ctl_sock_destroy to exit one) and use proper ctl sock
in dccp_v4_ctl_send_reset.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 22:29:59 -07:00