Commit Graph

1681 Commits

Author SHA1 Message Date
Alexey Dobriyan 400dad39d1 netfilter: netns nf_conntrack: per-netns conntrack hash
* make per-netns conntrack hash

  Another solution is to add a ->ct_net pointer to tuplehashes and keep a
  single hash. I tried that; it's ugly and requires more code deep down in
  protocol modules et al.

* propagate netns pointer to where needed, e.g. to conntrack iterators.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-10-08 11:35:03 +02:00
Alexey Dobriyan e10aad9998 netfilter: netns: ip6t_REJECT in netns for real
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-10-08 11:35:02 +02:00
Alexey Dobriyan 7dd1b8dad8 netfilter: netns: ip6table_mangle in netns for real
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-10-08 11:35:02 +02:00
Alexey Dobriyan 1339dd9171 netfilter: netns: ip6table_raw in netns for real
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-10-08 11:35:01 +02:00
Alexey Dobriyan 48dc7865aa netfilter: netns: remove nf_*_net() wrappers
Now that dev_net() exists, these wrappers are even less useful. They are
also a big problem in resolving the circular header dependencies necessary
for the NOTRACK-in-netns patch. See below.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-10-08 11:35:01 +02:00
Jan Engelhardt ee999d8b95 netfilter: x_tables: use NFPROTO_* in extensions
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-10-08 11:35:01 +02:00
Jan Engelhardt 76108cea06 netfilter: Use unsigned types for hooknum and pf vars
and (try to) consistently use u_int8_t for the L3 family.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-10-08 11:35:00 +02:00
Denis V. Lunev be713a443e netns: make udplitev6 mib per/namespace
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-07 14:50:06 -07:00
Denis V. Lunev 0c7ed677fb netns: make udpv6 mib per/namespace
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-07 14:49:36 -07:00
Denis V. Lunev e43291cb37 netns: add stub functions for per/namespace mibs allocation
The content of init_ipv6_mibs/cleanup_ipv6_mibs will be moved to new
calls one by one next.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-07 14:48:53 -07:00
Denis V. Lunev ab38dc7a70 netns: allow per device ipv6 snmp statistics in non-initial namespace
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-07 14:47:55 -07:00
Denis V. Lunev 2b4209e4b7 netns: register global ipv6 mibs statistics in each namespace
Unused net variable will become used very soon.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-07 14:47:37 -07:00
Denis V. Lunev 7b43ccecc7 ipv6: separate seq_ops for global & per/device ipv6 statistics
idev has been stored on seq->private. NULL has been stored for global
statistics.

The situation is changed with net namespace. We need to store pointer to
struct net and the only place is seq->private. So, we'll have for
/proc/net/dev_snmp6/* and for /proc/net/snmp6 pointers of two different
types stored in the same field.

This effectively requires separating the seq_ops of these files.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-07 14:47:12 -07:00
Denis V. Lunev 35f0a5df6c ipv6: consolidate ipv6 sock_stat code at the beginning of net/ipv6/proc.c
Simple: consolidate the sockstat6 stuff in one place, at the beginning of
the file. Right now sockstat6_seq_open/sockstat6_seq_fops looks like an
intrusion in the middle of snmp6 code.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-07 14:46:47 -07:00
Denis V. Lunev 06f38527de netns: register /proc/net/dev_snmp6/* in each ns
Do the same for /proc/net/snmp6.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-07 14:46:18 -07:00
Denis V. Lunev 835bcc0497 netns: move /proc/net/dev_snmp6 to struct net
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-07 14:45:55 -07:00
Peter Zijlstra b339a47c37 ipv6: initialize ip6_route sysctl vars in ip6_route_net_init()
This makes ip6_route_net_init() do all of the route init code.
There used to be a race between ip6_route_net_init() and ip6_net_init(),
and someone relying on the combined result was left out cold.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-07 14:15:00 -07:00
Peter Zijlstra 68fffc6796 ipv6: clean up ip6_route_net_init() error handling
ip6_route_net_init() error handling looked less than solid, fix 'er up.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-07 14:12:10 -07:00
KOVACS Krisztian 23542618de inet: Don't lookup the socket if there's a socket attached to the skb
Use the socket cached in the skb if it's present.

Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-07 12:41:01 -07:00
KOVACS Krisztian 607c4aaf03 inet: Add udplib_lookup_skb() helpers
To be able to use the cached socket reference in the skb during input
processing we add a new set of lookup functions that receive the skb on
their argument list.

Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-07 12:38:32 -07:00
Arnaldo Carvalho de Melo 9a1f27c480 inet_hashtables: Add inet_lookup_skb helpers
To be able to use the cached socket reference in the skb during input
processing we add a new set of lookup functions that receive the skb on
their argument list.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-07 11:41:57 -07:00
KOVACS Krisztian 1668e010cb ipv4: Make inet_sock.h independent of route.h
inet_iif() in inet_sock.h requires route.h. Since users of inet_iif()
usually require other route.h functionality anyway this patch moves
inet_iif() to route.h.

Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-01 07:33:10 -07:00
David S. Miller b262e60309 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	drivers/net/wireless/ath9k/core.c
	drivers/net/wireless/ath9k/main.c
	net/core/dev.c
2008-10-01 06:12:56 -07:00
Ilpo Järvinen 93c8b90f01 ipv6: almost identical frag hashing funcs combined
$ diff-funcs ip6qhashfn reassembly.c netfilter/nf_conntrack_reasm.c
 --- reassembly.c:ip6qhashfn()
 +++ netfilter/nf_conntrack_reasm.c:ip6qhashfn()
@@ -1,5 +1,5 @@
-static unsigned int ip6qhashfn(__be32 id, struct in6_addr *saddr,
-			       struct in6_addr *daddr)
+static unsigned int ip6qhashfn(__be32 id, const struct in6_addr *saddr,
+			       const struct in6_addr *daddr)
 {
 	u32 a, b, c;

@@ -9,7 +9,7 @@

 	a += JHASH_GOLDEN_RATIO;
 	b += JHASH_GOLDEN_RATIO;
-	c += ip6_frags.rnd;
+	c += nf_frags.rnd;
 	__jhash_mix(a, b, c);

 	a += (__force u32)saddr->s6_addr32[3];

And codiff xx.o.old xx.o.new:

net/ipv6/netfilter/nf_conntrack_reasm.c:
  ip6qhashfn         | -512
  nf_hashfn          |   +6
  nf_ct_frag6_gather |  +36
 3 functions changed, 42 bytes added, 512 bytes removed, diff: -470
net/ipv6/reassembly.c:
  ip6qhashfn    | -512
  ip6_hashfn    |   +7
  ipv6_frag_rcv |  +89
 3 functions changed, 96 bytes added, 512 bytes removed, diff: -416

net/ipv6/reassembly.c:
  inet6_hash_frag | +510
 1 function changed, 510 bytes added, diff: +510

Total: -376

Compile tested.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-01 02:48:31 -07:00
Arnaud Ebalard 5dc121e9a7 XFRM,IPv6: initialize ip6_dst_blackhole_ops.kmem_cachep
ip6_dst_blackhole_ops.kmem_cachep is not expected to be NULL (i.e. to
be initialized) when dst_alloc() is called from ip6_dst_blackhole().
Otherwise, it results in the following (xfrm_larval_drop is now set to
1 by default):

[   78.697642] Unable to handle kernel paging request for data at address 0x0000004c
[   78.703449] Faulting instruction address: 0xc0097f54
[   78.786896] Oops: Kernel access of bad area, sig: 11 [#1]
[   78.792791] PowerMac
[   78.798383] Modules linked in: btusb usbhid bluetooth b43 mac80211 cfg80211 ehci_hcd ohci_hcd sungem sungem_phy usbcore ssb
[   78.804263] NIP: c0097f54 LR: c0334a28 CTR: c002d430
[   78.809997] REGS: eef19ad0 TRAP: 0300   Not tainted  (2.6.27-rc5)
[   78.815743] MSR: 00001032 <ME,IR,DR>  CR: 22242482  XER: 20000000
[   78.821550] DAR: 0000004c, DSISR: 40000000
[   78.827278] TASK = eef0df40[3035] 'mip6d' THREAD: eef18000
[   78.827408] GPR00: 00001032 eef19b80 eef0df40 00000000 00008020 eef19c30 00000001 00000000
[   78.833249] GPR08: eee5101c c05a5c10 ef9ad500 00000000 24242422 1005787c 00000000 1004f960
[   78.839151] GPR16: 00000000 10024e90 10050040 48030018 0fe44150 00000000 00000000 eef19c30
[   78.845046] GPR24: eef19e44 00000000 eef19bf8 efb37c14 eef19bf8 00008020 00009032 c0596064
[   78.856671] NIP [c0097f54] kmem_cache_alloc+0x20/0x94
[   78.862581] LR [c0334a28] dst_alloc+0x40/0xc4
[   78.868451] Call Trace:
[   78.874252] [eef19b80] [c03c1810] ip6_dst_lookup_tail+0x1c8/0x1dc (unreliable)
[   78.880222] [eef19ba0] [c0334a28] dst_alloc+0x40/0xc4
[   78.886164] [eef19bb0] [c03cd698] ip6_dst_blackhole+0x28/0x1cc
[   78.892090] [eef19be0] [c03d9be8] rawv6_sendmsg+0x75c/0xc88
[   78.897999] [eef19cb0] [c038bca4] inet_sendmsg+0x4c/0x78
[   78.903907] [eef19cd0] [c03207c8] sock_sendmsg+0xac/0xe4
[   78.909734] [eef19db0] [c03209e4] sys_sendmsg+0x1e4/0x2a0
[   78.915540] [eef19f00] [c03220a8] sys_socketcall+0xfc/0x210
[   78.921406] [eef19f40] [c0014b3c] ret_from_syscall+0x0/0x38
[   78.927295] --- Exception: c01 at 0xfe2d730
[   78.927297]     LR = 0xfe2d71c
[   78.939019] Instruction dump:
[   78.944835] 91640018 9144001c 900a0000 4bffff44 9421ffe0 7c0802a6 bf810010 7c9d2378
[   78.950694] 90010024 7fc000a6 57c0045e 7c000124 <83e3004c> 8383005c 2f9f0000 419e0050
[   78.956464] ---[ end trace 05fa1ed7972487a1 ]---

As commented by Benjamin Thery, the bug was introduced by f2fc6a5458
while adding network namespace support to ipv6 routes.

Signed-off-by: Arnaud Ebalard <arno@natisbad.org>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-01 02:37:56 -07:00
Denis V. Lunev 2a5b82751f ipv6: NULL pointer dereference in tcp_v6_send_ack
The following actions are possible:
tcp_v6_rcv
  skb->dev = NULL;
  tcp_v6_do_rcv
    tcp_v6_hnd_req
      tcp_check_req
        req->rsk_ops->send_ack == tcp_v6_send_ack

So, skb->dev can be NULL in tcp_v6_send_ack. We must obtain namespace
from dst entry.

Thanks to Vitaliy Gusev <vgusev@openvz.org> for initial problem finding
in IPv4 code.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-01 02:13:16 -07:00
Yasuyuki Kozakai 8ca31ce52a netfilter: ip6t_{hbh,dst}: Rejects not-strict mode on rule insertion
The current code ignores rules for internal options in the HBH/DST options
header during packet processing if 'Not strict' mode is specified (which is
not implemented). Clearly this is not what the user expects.

The kernel should reject HBH/DST rule insertion with 'Not strict' mode
in the first place.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-24 15:53:39 -07:00
Arnaldo Carvalho de Melo 6067804047 net: Use hton[sl]() instead of __constant_hton[sl]() where applicable
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-20 22:20:49 -07:00
Vegard Nossum 78d15e8275 tcp_ipv6: fix use of uninitialized memory
inet6_rsk() is called on a struct request_sock * before we
have checked whether the socket is an ipv6 socket or an ipv6-mapped
ipv4 socket. The access that triggers this is the
inet_rsk(rsk)->inet6_rsk_offset dereference in inet6_rsk().

This is arguably not a critical error as the inet6_rsk_offset
is only used to compute a pointer which is never really used
(in the code path in question) anyway. But it might be a
latent error, so let's fix it.

Spotted by kmemcheck.

Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-12 16:17:43 -07:00
David S. Miller 1e493d1946 ipv6: On interface down/unregister, purge icmp routes too.
Johannes Berg reported that occasionally, bringing an interface
down or unregistering it would hang for up to 30 seconds.  Using
debugging output he provided it became clear that ICMP6 routes
were the culprit.

The problem is that ICMP6 routes live in their own world totally
separate from normal ipv6 routes.  So there are all kinds of special
cases throughout the ipv6 code to handle this.

While we should really try to unify all of this stuff somehow,
for the time being let's fix this by purging the ICMP6 routes
that match the device in question during rt6_ifdown().

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-10 23:39:28 -07:00
Neil Horman e550dfb0c2 ipv6: Fix OOPS in ip6_dst_lookup_tail().
This fixes kernel bugzilla 11469: "TUN with 1024 neighbours:
ip6_dst_lookup_tail NULL crash"

dst->neighbour is not necessarily hooked up at this point
in the processing path, so blindly dereferencing it is
the wrong thing to do.  This NULL check exists in other
similar paths and this case was just an oversight.

Also fix the completely wrong and confusing indentation
here while we're at it.

Based upon a patch by Evgeniy Polyakov.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-09 13:51:35 -07:00
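The NULL check described in e550dfb0c2 above can be illustrated with a small userspace model. This is only a sketch: struct model_dst, struct model_neigh, MODEL_NUD_VALID and neigh_needs_attention() are hypothetical names invented for the illustration and are not the kernel's definitions or the actual patch.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag; not the kernel's NUD_VALID definition. */
#define MODEL_NUD_VALID 0x01

/* Hypothetical stand-in for a neighbour entry. */
struct model_neigh {
	unsigned int nud_state;
};

/* Hypothetical stand-in for a dst entry. */
struct model_dst {
	struct model_neigh *neighbour;	/* may be NULL this early in the path */
};

/* Returns 1 only when a neighbour exists and is not in a valid state. */
int neigh_needs_attention(const struct model_dst *dst)
{
	assert(dst != NULL);

	if (dst->neighbour == NULL)
		return 0;	/* nothing to inspect, and nothing to dereference */
	return !(dst->neighbour->nud_state & MODEL_NUD_VALID);
}
```

Without the first check, a dst whose neighbour has not been hooked up yet would be dereferenced blindly, which is the failure mode the commit message describes.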
Daniel Lezcano d315492b1a netns : fix kernel panic in timewait socket destruction
How to reproduce ?
 - create a network namespace
 - use tcp protocol and get timewait socket
 - exit the network namespace
 - after a moment (when the timewait socket is destroyed), the kernel
   panics.

# BUG: unable to handle kernel NULL pointer dereference at
0000000000000007
IP: [<ffffffff821e394d>] inet_twdr_do_twkill_work+0x6e/0xb8
PGD 119985067 PUD 11c5c0067 PMD 0
Oops: 0000 [1] SMP
CPU 1
Modules linked in: ipv6 button battery ac loop dm_mod tg3 libphy ext3 jbd
edd fan thermal processor thermal_sys sg sata_svw libata dock serverworks
sd_mod scsi_mod ide_disk ide_core [last unloaded: freq_table]
Pid: 0, comm: swapper Not tainted 2.6.27-rc2 #3
RIP: 0010:[<ffffffff821e394d>] [<ffffffff821e394d>]
inet_twdr_do_twkill_work+0x6e/0xb8
RSP: 0018:ffff88011ff7fed0 EFLAGS: 00010246
RAX: ffffffffffffffff RBX: ffffffff82339420 RCX: ffff88011ff7ff30
RDX: 0000000000000001 RSI: ffff88011a4d03c0 RDI: ffff88011ac2fc00
RBP: ffffffff823392e0 R08: 0000000000000000 R09: ffff88002802a200
R10: ffff8800a5c4b000 R11: ffffffff823e4080 R12: ffff88011ac2fc00
R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000000
FS: 0000000041cbd940(0000) GS:ffff8800bff839c0(0000)
knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000007 CR3: 00000000bd87c000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffff8800bff9e000, task
ffff88011ff76690)
Stack: ffffffff823392e0 0000000000000100 ffffffff821e3a3a
0000000000000008
0000000000000000 ffffffff821e3a61 ffff8800bff7c000 ffffffff8203c7e7
ffff88011ff7ff10 ffff88011ff7ff10 0000000000000021 ffffffff82351108
Call Trace:
<IRQ> [<ffffffff821e3a3a>] ? inet_twdr_hangman+0x0/0x9e
[<ffffffff821e3a61>] ? inet_twdr_hangman+0x27/0x9e
[<ffffffff8203c7e7>] ? run_timer_softirq+0x12c/0x193
[<ffffffff820390d1>] ? __do_softirq+0x5e/0xcd
[<ffffffff8200d08c>] ? call_softirq+0x1c/0x28
[<ffffffff8200e611>] ? do_softirq+0x2c/0x68
[<ffffffff8201a055>] ? smp_apic_timer_interrupt+0x8e/0xa9
[<ffffffff8200cad6>] ? apic_timer_interrupt+0x66/0x70
<EOI> [<ffffffff82011f4c>] ? default_idle+0x27/0x3b
[<ffffffff8200abbd>] ? cpu_idle+0x5f/0x7d


Code: e8 01 00 00 4c 89 e7 41 ff c5 e8 8d fd ff ff 49 8b 44 24 38 4c 89 e7
65 8b 14 25 24 00 00 00 89 d2 48 8b 80 e8 00 00 00 48 f7 d0 <48> 8b 04 d0
48 ff 40 58 e8 fc fc ff ff 48 89 df e8 c0 5f 04 00
RIP [<ffffffff821e394d>] inet_twdr_do_twkill_work+0x6e/0xb8
RSP <ffff88011ff7fed0>
CR2: 0000000000000007

This patch provides a function to purge all timewait sockets related
to a network namespace. The life cycle of timewait sockets is not tied to
the network namespace, which means the timewait sockets stay alive while
the network namespace dies. Timewait sockets exist to avoid receiving
a duplicate packet from the network; if the network namespace is
freed, the network stack is removed, so there is no chance of receiving any
packets from the outside world. Furthermore, having a pending destruction
timer on these sockets after the network namespace is freed is not safe and
will lead to an oops if the timer callback tries to access data belonging
to the namespace, for example in:
	inet_twdr_do_twkill_work
		-> NET_INC_STATS_BH(twsk_net(tw), LINUX_MIB_TIMEWAITED);

Purging the timewait sockets at the network namespace destruction will:
 1) speed up memory freeing for the namespace
 2) fix kernel panic on asynchronous timewait destruction

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: Denis V. Lunev <den@openvz.org>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-08 13:17:27 -07:00
Yang Hongyang 3cc76caa98 ipv6: When we dropped a packet, we should return NET_RX_DROP instead of 0
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-29 14:27:51 -07:00
Al Viro ce3113ec57 ipv6: sysctl fixes
Braino: net.ipv6 in ipv6 skeleton has no business in rotable
class

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-25 15:18:15 -07:00
Stephen Hemminger f410a1fba7 ipv6: protocol for address routes
This fixes a problem spotted with zebra, though I am not sure it is
necessarily a kernel problem.  With IPV6, when an address is added to an
interface, Zebra creates a duplicate RIB entry: one as a connected
route, and the other as a kernel route.

When an address is added to an interface, the RTM_NEWADDR message
causes Zebra to create a connected route. In IPV4, when an address is
added to an interface, an RTM_NEWROUTE message is sent to user space with
the protocol RTPROT_KERNEL. Zebra ignores these messages because it
already has the connected route.

The problem is that the route created in IPV6 has route protocol ==
RTPROT_BOOT.  Was this a design decision or a bug? This fixes it. The same
patch applies to both net-2.6 and stable.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-23 05:16:46 -07:00
Denis V. Lunev fdc0bde90a icmp: icmp_sk() should not use smp_processor_id() in preemptible code
Pass namespace into icmp_xmit_lock, obtain socket inside and return
it as a result for caller.

Thanks to Alexey Dobriyan for this report:

Steps to reproduce:

	CONFIG_PREEMPT=y
	CONFIG_DEBUG_PREEMPT=y
	tracepath <something>

BUG: using smp_processor_id() in preemptible [00000000] code: tracepath/3205
caller is icmp_sk+0x15/0x30
Pid: 3205, comm: tracepath Not tainted 2.6.27-rc4 #1

Call Trace:
 [<ffffffff8031af14>] debug_smp_processor_id+0xe4/0xf0
 [<ffffffff80409405>] icmp_sk+0x15/0x30
 [<ffffffff8040a17b>] icmp_send+0x4b/0x3f0
 [<ffffffff8025a415>] ? trace_hardirqs_on_caller+0xd5/0x160
 [<ffffffff8025a4ad>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff8023a475>] ? local_bh_enable_ip+0x95/0x110
 [<ffffffff804285b9>] ? _spin_unlock_bh+0x39/0x40
 [<ffffffff8025a26c>] ? mark_held_locks+0x4c/0x90
 [<ffffffff8025a4ad>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff8025a415>] ? trace_hardirqs_on_caller+0xd5/0x160
 [<ffffffff803e91b4>] ip_fragment+0x8d4/0x900
 [<ffffffff803e7030>] ? ip_finish_output2+0x0/0x290
 [<ffffffff803e91e0>] ? ip_finish_output+0x0/0x60
 [<ffffffff803e6650>] ? dst_output+0x0/0x10
 [<ffffffff803e922c>] ip_finish_output+0x4c/0x60
 [<ffffffff803e92e3>] ip_output+0xa3/0xf0
 [<ffffffff803e68d0>] ip_local_out+0x20/0x30
 [<ffffffff803e753f>] ip_push_pending_frames+0x27f/0x400
 [<ffffffff80406313>] udp_push_pending_frames+0x233/0x3d0
 [<ffffffff804067d1>] udp_sendmsg+0x321/0x6f0
 [<ffffffff8040d155>] inet_sendmsg+0x45/0x80
 [<ffffffff803b967f>] sock_sendmsg+0xdf/0x110
 [<ffffffff8024a100>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff80257ce5>] ? validate_chain+0x415/0x1010
 [<ffffffff8027dc10>] ? __do_fault+0x140/0x450
 [<ffffffff802597d0>] ? __lock_acquire+0x260/0x590
 [<ffffffff803b9e55>] ? sockfd_lookup_light+0x45/0x80
 [<ffffffff803ba50a>] sys_sendto+0xea/0x120
 [<ffffffff80428e42>] ? _spin_unlock_irqrestore+0x42/0x80
 [<ffffffff803134bc>] ? __up_read+0x4c/0xb0
 [<ffffffff8024e0c6>] ? up_read+0x26/0x30
 [<ffffffff8020b8bb>] system_call_fastpath+0x16/0x1b

icmp6_sk() is similar.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-23 04:43:33 -07:00
Yang Hongyang 13601cd8e4 ipv6: Fix the return interface index when get it while no message is received.
When the receiving interface index is requested while no message has been
received, the index of the socket's bound device should be returned.

RFC 3542:
   Issuing getsockopt() for the above options will return the sticky
   option value i.e., the value set with setsockopt().  If no sticky
   option value has been set getsockopt() will return the following
   values:

   -  For the IPV6_PKTINFO option, it will return an in6_pktinfo
      structure with ipi6_addr being in6addr_any and ipi6_ifindex being
      zero.

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-17 23:21:52 -07:00
Brian Haley 191cd58250 netns: Add network namespace argument to rt6_fill_node() and ipv6_dev_get_saddr()
ipv6_dev_get_saddr() blindly de-references dst_dev to get the network
namespace, but some callers might pass NULL.  Change callers to pass a
namespace pointer instead.

Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-14 15:33:21 -07:00
Brian Haley 5e0115e500 ipv6: Fix OOPS, ip -f inet6 route get fec0::1, linux-2.6.26, ip6_route_output, rt6_fill_node+0x175
Alexey Dobriyan wrote:
> On Thu, Aug 07, 2008 at 07:00:56PM +0200, John Gumb wrote:
>> Scenario: no ipv6 default route set.
> 
>> # ip -f inet6 route get fec0::1
>>
>> BUG: unable to handle kernel NULL pointer dereference at 00000000
>> IP: [<c0369b85>] rt6_fill_node+0x175/0x3b0
>> EIP is at rt6_fill_node+0x175/0x3b0
> 
> 0xffffffff80424dd3 is in rt6_fill_node (net/ipv6/route.c:2191).
> 2186                    } else
> 2187    #endif
> 2188                            NLA_PUT_U32(skb, RTA_IIF, iif);
> 2189            } else if (dst) {
> 2190                    struct in6_addr saddr_buf;
> 2191      ====>         if (ipv6_dev_get_saddr(ip6_dst_idev(&rt->u.dst)->dev,
>					       ^^^^^^^^^^^^^^^^^^^^^^^^
>											NULL
> 
> 2192                                           dst, 0, &saddr_buf) == 0)
> 2193                            NLA_PUT(skb, RTA_PREFSRC, 16, &saddr_buf);
> 2194            }

The commit that changed this can't be reverted easily, but the patch
below works for me.

Fix NULL de-reference in rt6_fill_node() when there's no IPv6 input
device present in the dst entry.

Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-13 01:58:57 -07:00
Herbert Xu d97106ea52 udp: Drop socket lock for encapsulated packets
The socket lock is there to protect the normal UDP receive path.
Encapsulation UDP sockets don't need that protection.  In fact
the locking is deadly for them as they may contain another UDP
packet within, possibly with the same addresses.

Also the nested bit was copied from TCP.  TCP needs it because
of accept(2) spawning sockets.  This simply doesn't apply to UDP
so I've removed it.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-09 00:35:05 -07:00
Gui Jianfeng 6edafaaf6f tcp: Fix kernel panic when calling tcp_v(4/6)_md5_do_lookup
If the following packet flow happens, the kernel will panic.
Machine A			Machine B
		SYN
	---------------------->
		SYN+ACK
	<----------------------
		ACK(bad seq)
	---------------------->
When an ACK with a bad sequence number is received,
tcp_v4_md5_do_lookup(skb->sk, ip_hdr(skb)->daddr) is finally called by
tcp_v4_reqsk_send_ack(), but the first parameter (skb->sk) is NULL at that
moment, so the kernel panics.
This patch fixes this bug.

OOPS output is as follows:
[  302.812793] IP: [<c05cfaa6>] tcp_v4_md5_do_lookup+0x12/0x42
[  302.817075] Oops: 0000 [#1] SMP 
[  302.819815] Modules linked in: ipv6 loop dm_multipath rtc_cmos rtc_core rtc_lib pcspkr pcnet32 mii i2c_piix4 parport_pc i2c_core parport ac button ata_piix libata dm_mod mptspi mptscsih mptbase scsi_transport_spi sd_mod scsi_mod crc_t10dif ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last unloaded: scsi_wait_scan]
[  302.849946] 
[  302.851198] Pid: 0, comm: swapper Not tainted (2.6.27-rc1-guijf #5)
[  302.855184] EIP: 0060:[<c05cfaa6>] EFLAGS: 00010296 CPU: 0
[  302.858296] EIP is at tcp_v4_md5_do_lookup+0x12/0x42
[  302.861027] EAX: 0000001e EBX: 00000000 ECX: 00000046 EDX: 00000046
[  302.864867] ESI: ceb69e00 EDI: 1467a8c0 EBP: cf75f180 ESP: c0792e54
[  302.868333]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[  302.871287] Process swapper (pid: 0, ti=c0792000 task=c0712340 task.ti=c0746000)
[  302.875592] Stack: c06f413a 00000000 cf75f180 ceb69e00 00000000 c05d0d86 000016d0 ceac5400 
[  302.883275]        c05d28f8 000016d0 ceb69e00 ceb69e20 681bf6e3 00001000 00000000 0a67a8c0 
[  302.890971]        ceac5400 c04250a3 c06f413a c0792eb0 c0792edc cf59a620 cf59a620 cf59a634 
[  302.900140] Call Trace:
[  302.902392]  [<c05d0d86>] tcp_v4_reqsk_send_ack+0x17/0x35
[  302.907060]  [<c05d28f8>] tcp_check_req+0x156/0x372
[  302.910082]  [<c04250a3>] printk+0x14/0x18
[  302.912868]  [<c05d0aa1>] tcp_v4_do_rcv+0x1d3/0x2bf
[  302.917423]  [<c05d26be>] tcp_v4_rcv+0x563/0x5b9
[  302.920453]  [<c05bb20f>] ip_local_deliver_finish+0xe8/0x183
[  302.923865]  [<c05bb10a>] ip_rcv_finish+0x286/0x2a3
[  302.928569]  [<c059e438>] dev_alloc_skb+0x11/0x25
[  302.931563]  [<c05a211f>] netif_receive_skb+0x2d6/0x33a
[  302.934914]  [<d0917941>] pcnet32_poll+0x333/0x680 [pcnet32]
[  302.938735]  [<c05a3b48>] net_rx_action+0x5c/0xfe
[  302.941792]  [<c042856b>] __do_softirq+0x5d/0xc1
[  302.944788]  [<c042850e>] __do_softirq+0x0/0xc1
[  302.948999]  [<c040564b>] do_softirq+0x55/0x88
[  302.951870]  [<c04501b1>] handle_fasteoi_irq+0x0/0xa4
[  302.954986]  [<c04284da>] irq_exit+0x35/0x69
[  302.959081]  [<c0405717>] do_IRQ+0x99/0xae
[  302.961896]  [<c040422b>] common_interrupt+0x23/0x28
[  302.966279]  [<c040819d>] default_idle+0x2a/0x3d
[  302.969212]  [<c0402552>] cpu_idle+0xb2/0xd2
[  302.972169]  =======================
[  302.974274] Code: fc ff 84 d2 0f 84 df fd ff ff e9 34 fe ff ff 83 c4 0c 5b 5e 5f 5d c3 90 90 57 89 d7 56 53 89 c3 50 68 3a 41 6f c0 e8 e9 55 e5 ff <8b> 93 9c 04 00 00 58 85 d2 59 74 1e 8b 72 10 31 db 31 c9 85 f6 
[  303.011610] EIP: [<c05cfaa6>] tcp_v4_md5_do_lookup+0x12/0x42 SS:ESP 0068:c0792e54
[  303.018360] Kernel panic - not syncing: Fatal exception in interrupt

Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-06 23:50:04 -07:00
Joakim Koskela abf5cdb89d ipsec: Interfamily IPSec BEET, ipv4-inner ipv6-outer
Here's a revised version, based on Herbert's comments, of a fix for
the ipv4-inner, ipv6-outer interfamily ipsec beet mode. It fixes the
network header adjustment during interfamily, as well as makes sure
that we reserve enough room for the new ipv6 header if we might have
something else as the inner family. Also, the ipv4 pseudo header
construction was added.

Signed-off-by: Joakim Koskela <jookos@gmail.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-06 02:40:25 -07:00
Rami Rosen 1ca615fb81 ipv6: replace dst_metric() with dst_mtu() in net/ipv6/route.c.
This patch replaces dst_metric() with dst_mtu() in net/ipv6/route.c.

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-06 02:34:21 -07:00
Wei Yongjun 283d07ac20 ipv6: Do not drop packet if skb->local_df is set to true
The old code drops an IPv6 packet if ipfragok is not set. Since
ipfragok is obsolete and has been replaced by skb->local_df, this
check must be changed to use skb->local_df.

This patch fixes the problem and does not drop the packet if skb->local_df
is set to true.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-03 21:15:59 -07:00
Yang Hongyang cfb266c0ee ipv6: Fix the return value of Set Hop-by-Hop options header with NULL data pointer
When setsockopt() is used to set the Hop-by-Hop options header with a NULL
data pointer and a non-zero optlen, the kernel successfully returns 0
instead of returning the error EINVAL or EFAULT.

This patch fixes the problem.

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-03 18:16:15 -07:00
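The argument check described in cfb266c0ee above can be sketched in userspace as follows. The names model_set_hopopts(), struct model_opts and MODEL_OPT_MAX are hypothetical and exist only for this illustration; the choice of -EINVAL (rather than -EFAULT) and the rule that a zero length clears the stored option are assumptions of the sketch, not statements about the actual patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MODEL_OPT_MAX 64	/* illustrative upper bound for the sketch */

/* Hypothetical stand-in for the stored sticky option. */
struct model_opts {
	unsigned char data[MODEL_OPT_MAX];
	size_t len;
};

/* Validate (optval, optlen) before storing the option. */
int model_set_hopopts(struct model_opts *o, const void *optval, size_t optlen)
{
	assert(o != NULL);

	if (optlen == 0) {
		o->len = 0;	/* assumed: an empty option clears the stored value */
		return 0;
	}
	if (optval == NULL)
		return -EINVAL;	/* non-zero length with no data is an error */
	if (optlen > MODEL_OPT_MAX)
		return -EINVAL;

	memcpy(o->data, optval, optlen);
	o->len = optlen;
	return 0;
}
```

The case the commit message complains about is the second branch: a NULL pointer with a non-zero length must not be reported as success.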
Florian Westphal 1730554f25 ipv6: syncookies: free reqsk on xfrm_lookup error
cookie_v6_check() did not call reqsk_free() if xfrm_lookup() fails,
leaking the request sock.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-03 18:13:44 -07:00
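The leak described in 1730554f25 above follows a common pattern: an object is allocated, a later step fails, and the error path returns without releasing the object. A minimal userspace sketch, assuming hypothetical helpers model_req_alloc(), model_req_free() and model_cookie_check() and a counter live_reqs that exists only to make the leak observable:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

int live_reqs;	/* number of request objects currently allocated */

/* Hypothetical stand-in for a request sock. */
struct model_req {
	int id;
};

struct model_req *model_req_alloc(void)
{
	struct model_req *r = malloc(sizeof(*r));

	if (r != NULL)
		live_reqs++;
	return r;
}

void model_req_free(struct model_req *r)
{
	if (r == NULL)
		return;
	assert(live_reqs > 0);
	live_reqs--;
	free(r);
}

/* lookup_ok stands in for the result of the policy lookup step. */
struct model_req *model_cookie_check(int lookup_ok)
{
	struct model_req *r = model_req_alloc();

	if (r == NULL)
		return NULL;
	if (!lookup_ok) {
		model_req_free(r);	/* the release that the error path lacked */
		return NULL;
	}
	return r;
}
```

If the model_req_free() call in the error branch is removed, live_reqs stays at 1 after a failed lookup, which is the leak in miniature.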
Adam Langley 00b1304c4c tcp: MD5: Fix IPv6 signatures
Reported by Stefanos Harhalakis; although 2.6.27-rc1 talks to itself using IPv6
TCP MD5 packets just fine, Stefanos noted that tcpdump claimed that the
signatures were invalid.

I broke this in 49a72dfb88 ("tcp: Fix MD5
signatures for non-linear skbs"), it was just a typo.

Note that tcpdump will still sometimes claim that the signatures are incorrect.
A patch to tcpdump has been submitted for this[1].

[1] http://tinyurl.com/6a4fl2

Signed-off-by: Adam Langley <agl@imperialviolet.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-31 21:36:07 -07:00
Adam Langley 90b7e1120b tcp: MD5: Fix MD5 signatures on certain ACK packets
I noticed, looking at tcpdumps, that timewait ACKs were getting sent
with an incorrect MD5 signature when signatures were enabled.

I broke this in 49a72dfb88 ("tcp: Fix
MD5 signatures for non-linear skbs"). I didn't take into account that
the skb passed to tcp_*_send_ack was the inbound packet, thus the
source and dest addresses need to be swapped when calculating the MD5
pseudoheader.

Signed-off-by: Adam Langley <agl@imperialviolet.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-31 20:49:48 -07:00
Wei Yongjun 77e2f14f71 ipv6: Fix ip6_xmit to send fragments if ipfragok is true
SCTP uses ip6_xmit() to send fragments after receiving an ICMP "packet
too big" message. But when a packet is sent with ip6_xmit(),
skb->local_df is not initialized, so when the skb enters ip6_fragment(),
the following code discards it.

ip6_fragment(...)
{
    if (!skb->local_df) {
        ...
        return -EMSGSIZE;
    }
    ...
}

SCTP performs the following steps:
1. Send the packet with ip6_xmit(skb, ipfragok=0).
2. Receive an ICMP "packet too big" message.
3. If PMTUD_ENABLE: ip6_xmit(skb, ipfragok=1).

This patch fixes the problem by setting local_df if ipfragok is true.
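
The behaviour can be sketched in user space as follows. This is a toy
model, not the kernel code: toy_skb, toy_ip6_fragment and toy_ip6_xmit are
hypothetical names, and the explicit length/MTU comparison is an assumption
added so the example is self-contained.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct sk_buff: only the fields this sketch needs. */
struct toy_skb {
    size_t len;       /* packet length in bytes */
    bool   local_df;  /* true: local fragmentation is allowed */
};

/* Models the guard quoted above: refuse to fragment unless local_df is set. */
static int toy_ip6_fragment(const struct toy_skb *skb, size_t mtu)
{
    if (skb->len > mtu && !skb->local_df)
        return -EMSGSIZE;
    return 0; /* fragments would be built and sent here */
}

/* Models the fixed transmit path: ipfragok is propagated into local_df. */
static int toy_ip6_xmit(struct toy_skb *skb, size_t mtu, bool ipfragok)
{
    if (ipfragok)
        skb->local_df = true;
    return toy_ip6_fragment(skb, mtu);
}
```

With ipfragok false an oversized packet is still rejected with -EMSGSIZE;
with ipfragok true it passes the guard.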

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-31 20:46:47 -07:00
Daniel Lezcano 17ef51fce0 ipv6: Fix useless proc net sockstat6 removal
This call is no longer needed, sockstat6 is per namespace so it is
removed at the namespace subsystem destruction.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-30 03:27:52 -07:00
David S. Miller 785957d3e8 tcp: MD5: Use MIB counter instead of warning for MD5 mismatch.
From a report by Matti Aarnio, and preliminary patch by Adam Langley.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-30 03:27:25 -07:00
Miao Xie 4a36702e01 IPv6: datagram_send_ctl() should exit immediately when an error occured
When an error occurs, datagram_send_ctl() should exit immediately rather than
continue to run the for loop. Otherwise, the variable err might be overwritten
and the error hidden.

Fix this bug by using "goto" instead of "break".
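
One way such an error can get hidden, sketched with hypothetical names
(toy_kind, toy_parse_break, toy_parse_goto): inside a switch nested in a
for loop, "break" leaves only the switch, so a later iteration can reset
err. This illustrates the pattern and is not the actual
datagram_send_ctl() body.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical item kinds; the real code switches on the cmsg type. */
enum toy_kind { TOY_OK, TOY_BAD };

/* Buggy shape: "break" exits the switch only, the loop keeps running. */
static int toy_parse_break(const enum toy_kind *items, size_t n)
{
    int err = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        switch (items[i]) {
        case TOY_BAD:
            err = -EINVAL;
            break;          /* leaves the switch, not the loop */
        case TOY_OK:
            err = 0;        /* a later valid item overwrites the error */
            break;
        }
    }
    return err;
}

/* Fixed shape: jump out of the loop as soon as an error is seen. */
static int toy_parse_goto(const enum toy_kind *items, size_t n)
{
    int err = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        switch (items[i]) {
        case TOY_BAD:
            err = -EINVAL;
            goto out;
        case TOY_OK:
            err = 0;
            break;
        }
    }
out:
    return err;
}
```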

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-29 23:57:58 -07:00
Al Viro 6f9f489a4e net: missing bits of net-namespace / sysctl
Piss-poor sysctl registration API strikes again, film at 11...
What we really need is _pathname_ required to be present in
already registered table, so that kernel could warn about bad
order.  That's the next target for sysctl stuff (and generally
saner and more explicit order of initialization of ipv[46]
internals wouldn't hurt either).

For the time being, here are full fixups required by ..._rotable()
stuff; we make per-net sysctl sets descendants of "ro" one and
make sure that sufficient skeleton is there before we start registering
per-net sysctls.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-27 04:40:51 -07:00
David S. Miller 15d3b4a262 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6 2008-07-27 04:40:08 -07:00
David S. Miller 2c3abab7c9 ipcomp: Fix warnings after ipcomp consolidation.
net/ipv4/ipcomp.c: In function ‘ipcomp4_init_state’:
net/ipv4/ipcomp.c:109: warning: unused variable ‘calg_desc’
net/ipv4/ipcomp.c:108: warning: unused variable ‘ipcd’
net/ipv4/ipcomp.c:107: warning: ‘err’ may be used uninitialized in this function
net/ipv6/ipcomp6.c: In function ‘ipcomp6_init_state’:
net/ipv6/ipcomp6.c:139: warning: unused variable ‘calg_desc’
net/ipv6/ipcomp6.c:138: warning: unused variable ‘ipcd’
net/ipv6/ipcomp6.c:137: warning: ‘err’ may be used uninitialized in this function

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-27 03:59:24 -07:00
Linus Torvalds 2284284281 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  netns: fix ip_rt_frag_needed rt_is_expired
  netfilter: nf_conntrack_extend: avoid unnecessary "ct->ext" dereferences
  netfilter: fix double-free and use-after free
  netfilter: arptables in netns for real
  netfilter: ip{,6}tables_security: fix future section mismatch
  selinux: use nf_register_hooks()
  netfilter: ebtables: use nf_register_hooks()
  Revert "pkt_sched: sch_sfq: dump a real number of flows"
  qeth: use dev->ml_priv instead of dev->priv
  syncookies: Make sure ECN is disabled
  net: drop unused BUG_TRAP()
  net: convert BUG_TRAP to generic WARN_ON
  drivers/net: convert BUG_TRAP to generic WARN_ON
2008-07-26 20:17:56 -07:00
Alexey Dobriyan f858b4869a netfilter: ip{,6}tables_security: fix future section mismatch
Currently not visible, because NET_NS is mutually exclusive with SYSFS
which is required by SECURITY.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-26 17:48:38 -07:00
Florian Westphal 16df845f45 syncookies: Make sure ECN is disabled
ecn_ok is not initialized when a connection is established by cookies.
The cookie syn-ack never sets ECN, so ecn_ok must be set to 0.

Spotted using ns-3/network simulation cradle simulator and valgrind.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-26 02:21:54 -07:00
Ilpo Järvinen 547b792cac net: convert BUG_TRAP to generic WARN_ON
Removes a legacy, reinvent-the-wheel construct. The generic
machinery integrates much better with automated debugging aids
such as kerneloops.org (and others), and is unambiguous due to
better naming. Non-intuitively, BUG_TRAP() is actually equivalent to
WARN_ON() rather than BUG_ON(). Some uses might deserve promotion
to BUG_ON(), but I left that for the future.

I could make at least one BUILD_BUG_ON conversion.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-25 21:43:18 -07:00
Linus Torvalds 1ff8419871 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  ipsec: ipcomp - Decompress into frags if necessary
  ipsec: ipcomp - Merge IPComp implementations
  pkt_sched: Fix locking in shutdown_scheduler_queue()
2008-07-25 17:40:16 -07:00
Paul E. McKenney 696adfe84c list_for_each_rcu must die: networking
All uses of list_for_each_rcu() can be profitably replaced by the
easier-to-use list_for_each_entry_rcu().  This patch makes this change for
networking, in preparation for removing the list_for_each_rcu() API
entirely.

Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:27 -07:00
Herbert Xu 6fccab671f ipsec: ipcomp - Merge IPComp implementations
This patch merges the IPv4/IPv6 IPComp implementations since most
of the code is identical.  As a result future enhancements will no
longer need to be duplicated.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-25 02:54:40 -07:00
Patrick McHardy 70eed75d76 netfilter: make security table depend on NETFILTER_ADVANCED
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-23 16:42:42 -07:00
Stephen Hemminger 3d0f24a74e ipv6: icmp6_dst_gc return change
Change icmp6_dst_gc to return the one value the caller cares about rather
than using call by reference.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-22 14:35:50 -07:00
Stephen Hemminger 75307c0fe7 ipv6: use kcalloc
The fib_table_hash is an array, so use kcalloc.
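
For readers unfamiliar with the idiom, a user-space analogue of an
array-style zeroing allocator is sketched below. toy_kcalloc is a
hypothetical helper built on malloc(); it is not the kernel's kcalloc(),
only an illustration of why an (n, size) interface is preferable to
multiplying by hand: the overflow check lives in one place.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Allocate a zeroed array of n elements, refusing sizes that overflow. */
static void *toy_kcalloc(size_t n, size_t size)
{
    void *p;

    if (size != 0 && n > SIZE_MAX / size)
        return NULL;            /* n * size would wrap around */

    p = malloc(n * size);
    if (p)
        memset(p, 0, n * size);
    return p;
}
```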

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-22 14:35:07 -07:00
Stephen Hemminger a76d7345a3 ipv6: use spin_trylock_bh
Now there is spin_trylock_bh, use it rather than open coding.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-22 14:34:35 -07:00
Stephen Hemminger c8a4522245 ipv6: use round_jiffies
This timer normally fires once a minute. There is no need to cause an
early wakeup for it, so align it to the next second boundary to save power.
It can't be deferred, because then cleanup could take too long or a DoS
could result.
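
The alignment idea can be sketched as below. This is a simplification
under stated assumptions: TOY_HZ and toy_round_up_to_second are
hypothetical, and the kernel's round_jiffies() uses its own rounding rule
and a per-CPU skew that are not modelled here.

```c
/* Assumed tick rate for the sketch: 1000 ticks per second. */
#define TOY_HZ 1000UL

/* Round a tick count up to the next whole-second boundary. */
static unsigned long toy_round_up_to_second(unsigned long j)
{
    unsigned long rem = j % TOY_HZ;

    return rem ? j + (TOY_HZ - rem) : j;
}
```

Timers aligned this way tend to expire together, so the CPU wakes up once
for several of them instead of once per timer.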

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-22 14:34:09 -07:00
Stephen Hemminger 417f28bb34 netns: dont alloc ipv6 fib timer list
FIB timer list is a trivial size structure, avoid indirection and just
put it in existing ns.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-22 14:33:45 -07:00
Adrian Bunk 888c848ed3 ipv6: make struct ipv6_devconf static
struct ipv6_devconf can now become static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-22 14:21:58 -07:00
Stephen Hemminger 847499ce71 ipv6: use timer pending
This fixes the bridge reference count problem and cleans up ipv6 FIB
timer management.  Don't use the expires field, because it is not a proper
way to test whether the timer is armed; use timer_pending() instead.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-21 13:21:35 -07:00
David Miller 702beb87d6 ipv6: Fix warning in addrconf code.
Reported by Linus.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-20 21:18:26 -07:00
YOSHIFUJI Hideaki a6ffb404dc ipv6 mcast: Omit redundant address family checks in ip6_mc_source().
The caller has already checked for them.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-19 22:36:07 -07:00
YOSHIFUJI Hideaki 53b7997fd5 ipv6 netns: Make several "global" sysctl variables namespace aware.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-19 22:35:03 -07:00
YOSHIFUJI Hideaki 721499e893 netns: Use net_eq() to compare net-namespaces for optimization.
Without CONFIG_NET_NS, namespace is always &init_net.
Compiler will be able to omit namespace comparisons with this patch.
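
A sketch of the idea, using a hypothetical toy_net_eq() and a toy struct
net rather than the real kernel definitions: when CONFIG_NET_NS is not
defined, the helper is a constant, so any branch guarded by it can be
folded away by the compiler.

```c
/* Toy stand-in; the real struct net is much larger. */
struct net {
    int id;
};

static inline int toy_net_eq(const struct net *net1, const struct net *net2)
{
#ifdef CONFIG_NET_NS
    /* Several namespaces can exist, so the pointers must be compared. */
    return net1 == net2;
#else
    /* Only one namespace exists, so every comparison is trivially true. */
    (void)net1;
    (void)net2;
    return 1;
#endif
}
```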

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-19 22:34:43 -07:00
Denis V. Lunev 725a8ff04a ipv6: remove unused parameter from ip6_ra_control
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-19 00:28:58 -07:00
Adam Langley 49a72dfb88 tcp: Fix MD5 signatures for non-linear skbs
Currently, the MD5 code assumes that the SKBs are linear and, in the case
that they aren't, happily goes off and hashes off the end of the SKB and
into random memory.
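
The general shape of hashing a non-linear buffer is sketched below: feed
the linear head first, then each fragment, instead of reading head_len +
frag bytes from the head pointer. All names (toy_frag, toy_nl_skb,
toy_hash_update, toy_hash_skb) are hypothetical and a trivial
multiplicative hash stands in for MD5.

```c
#include <stddef.h>
#include <stdint.h>

/* One paged fragment of the toy buffer. */
struct toy_frag {
    const uint8_t *data;
    size_t len;
};

/* Toy non-linear buffer: a linear head plus an array of fragments. */
struct toy_nl_skb {
    const uint8_t *head;
    size_t head_len;
    const struct toy_frag *frags;
    size_t nr_frags;
};

/* Streaming hash step (stand-in for an MD5 update call). */
static uint32_t toy_hash_update(uint32_t h, const uint8_t *p, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++)
        h = h * 31u + p[i];
    return h;
}

/* Hash the head, then walk the fragments; never read past either. */
static uint32_t toy_hash_skb(const struct toy_nl_skb *skb)
{
    uint32_t h = 0;
    size_t i;

    h = toy_hash_update(h, skb->head, skb->head_len);
    for (i = 0; i < skb->nr_frags; i++)
        h = toy_hash_update(h, skb->frags[i].data, skb->frags[i].len);
    return h;
}
```

Because the hash is fed as a stream, the same payload gives the same
result whether it sits in one linear area or is split across fragments.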

Reported by Stephen Hemminger in [1]. Advice thanks to Stephen and Evgeniy
Polyakov. Also includes a couple of missed route_caps from Stephen's patch
in [2].

[1] http://marc.info/?l=linux-netdev&m=121445989106145&w=2
[2] http://marc.info/?l=linux-netdev&m=121459157816964&w=2

Signed-off-by: Adam Langley <agl@imperialviolet.org>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-19 00:01:42 -07:00
Pavel Emelyanov b6fcbdb4f2 proc: consolidate per-net single-release callers
They are symmetrical to single_open ones :)

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-18 04:07:44 -07:00
Pavel Emelyanov de05c557b2 proc: consolidate per-net single_open callers
There are already 7 of them - time to kill some duplicate code.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-18 04:07:21 -07:00
Pavel Emelyanov de0744af1f mib: add net to NET_INC_STATS_BH
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16 20:31:16 -07:00
Pavel Emelyanov ca12a1a443 inet: prepare net on the stack for NET accounting macros
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16 20:28:42 -07:00
Pavel Emelyanov 63231bddf6 mib: add net to TCP_INC_STATS_BH
Same as before - the sock is always there to get the net from,
but there are also some places with the net already saved on 
the stack.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16 20:22:25 -07:00
Pavel Emelyanov a86b1e3019 inet: prepare struct net for TCP MIB accounting
This is the same as the first patch in the set, but preparing
the net for TCP_XXX_STATS - save the struct net on the stack
where required and possible.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16 20:20:58 -07:00
Wang Chen 7af3db78a9 ipv6: Fix using after dev_put()
Patrick McHardy pointed it out.

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14 20:54:54 -07:00
Wang Chen 5ae7b44413 ipv6: Check return of dev_set_allmulti
allmulti might overflow.
The commit "netdevice: Fix promiscuity and allmulti overflow" in net-next makes
dev_set_promiscuity/allmulti return an error number if an overflow happens.

Here, we check the return value for positive allmulti increments.
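
A sketch of what an overflow-aware counter update can look like. The
names toy_dev and toy_set_allmulti and the 16-bit counter width are
assumptions chosen for illustration; this is not the netdevice code.

```c
#include <errno.h>

/* Toy device holding only the allmulti reference counter (width assumed). */
struct toy_dev {
    unsigned short allmulti;
};

/* Adjust the counter by inc; report -EOVERFLOW if an increment wraps to 0. */
static int toy_set_allmulti(struct toy_dev *dev, int inc)
{
    unsigned short old = dev->allmulti;

    dev->allmulti = (unsigned short)(old + inc);
    if (dev->allmulti == 0 && inc > 0) {
        dev->allmulti = old;    /* undo the wrapped increment */
        return -EOVERFLOW;
    }
    return 0;
}
```

Only a positive increment can overflow, which is why a caller needs to
check the return value on the "join" path but not on the matching
decrement.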

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Acked-by: Patrick McHardy <kaber@trash.net> 
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14 20:54:23 -07:00
David S. Miller 2aec609fb4 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	net/netfilter/nf_conntrack_proto_tcp.c
2008-07-14 20:23:54 -07:00
Denis V. Lunev 0ce28553cc ipv6: missed namespace context in ipv6_rthdr_rcv
Signed-off-by: Denis V. Lunev <den@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-10 16:54:50 -07:00
David S. Miller 052979499c pkt_sched: Add qdisc_tx_is_noop() helper and use in IPV6.
This indicates if the NOOP scheduler is what is active for TX on a
given device.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 23:01:27 -07:00
David S. Miller b0e1e6462d netdev: Move rest of qdisc state into struct netdev_queue
Now qdisc, qdisc_sleeping, and qdisc_list also live there.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 17:42:10 -07:00
David S. Miller 7c3ceb4a40 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	drivers/net/wireless/iwlwifi/iwl-3945.c
	net/mac80211/mlme.c
2008-07-08 16:30:17 -07:00
Andrey Vagin b223856640 ipv6: fix race between ipv6_del_addr and DAD timer
Consider the following scenario:

ipv6_del_addr(ifp)
  ipv6_ifa_notify(RTM_DELADDR, ifp)
    ip6_del_rt(ifp->rt)

after returning from the ipv6_ifa_notify and enabling BH-s
back, but *before* calling the addrconf_del_timer the 
ifp->timer fires and:

addrconf_dad_timer(ifp)
  addrconf_dad_completed(ifp)
    ipv6_ifa_notify(RTM_NEWADDR, ifp)
      ip6_ins_rt(ifp->rt)

then return back to the ipv6_del_addr and:

in6_ifa_put(ifp)
  inet6_ifa_finish_destroy(ifp)
    dst_release(&ifp->rt->u.dst)

After this we have an ifp->rt inserted into fib6 lists, but 
queued for gc, which in turn can result in oopses in the
fib6_run_gc. Maybe some other nasty things, but we caught 
only the oops in gc so far.

The solution is to disarm the ifp->timer before flushing the
rt from it.
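
The ordering problem can be modelled deterministically in user space.
Everything below is a toy: toy_ifp and the toy_* functions are
hypothetical, and the timer_fires_in_window flag stands in for the timer
firing in the gap described above.

```c
#include <stdbool.h>

/* Toy model of an address with its DAD timer and host route. */
struct toy_ifp {
    bool timer_armed;   /* DAD timer still pending */
    bool rt_in_fib;     /* route currently linked into the FIB */
    bool rt_released;   /* reference to the route has been dropped */
};

/* What the timer handler does in the scenario: re-insert the route. */
static void toy_dad_timer_fire(struct toy_ifp *ifp)
{
    if (!ifp->timer_armed)
        return;                 /* disarmed timers never run */
    ifp->timer_armed = false;
    ifp->rt_in_fib = true;      /* models ip6_ins_rt(ifp->rt) */
}

/* Old ordering: flush the route first, disarm the timer afterwards. */
static bool toy_del_addr_old(struct toy_ifp *ifp, bool timer_fires_in_window)
{
    ifp->rt_in_fib = false;     /* models ip6_del_rt(ifp->rt) */
    if (timer_fires_in_window)
        toy_dad_timer_fire(ifp);
    ifp->timer_armed = false;   /* models addrconf_del_timer(ifp) */
    ifp->rt_released = true;    /* models dst_release() */
    return ifp->rt_in_fib && ifp->rt_released;  /* true: dangling route */
}

/* New ordering: disarm the timer before flushing the route. */
static bool toy_del_addr_new(struct toy_ifp *ifp, bool timer_fires_in_window)
{
    ifp->timer_armed = false;
    ifp->rt_in_fib = false;
    if (timer_fires_in_window)
        toy_dad_timer_fire(ifp);    /* no-op: timer already disarmed */
    ifp->rt_released = true;
    return ifp->rt_in_fib && ifp->rt_released;
}
```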

Signed-off-by: Andrey Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 15:13:31 -07:00
Alexey Dobriyan 43de9dfeaa netfilter: ip6table_filter in netns for real
One still needs to remove checks in nf_hook_slow() and nf_sockopt_find()
to test this, though.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 02:36:18 -07:00
Alexey Dobriyan d2789312cc netfilter: use correct namespace in ip6table_security
Signed-off-by: Alexey Dobriyan <adobriyan@parallels.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 02:34:52 -07:00
Pavel Emelyanov ef28d1a20f MIB: add struct net to UDP6_INC_STATS_BH
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-05 21:19:40 -07:00
Pavel Emelyanov 235b9f7ac5 MIB: add struct net to UDP6_INC_STATS_USER
As simple as the patch #1 in this set.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-05 21:19:20 -07:00
YOSHIFUJI Hideaki e0835f8fa5 ipv4,ipv6 mroute: Add some helper inline functions to remove ugly ifdefs.
ip{,v6}_mroute_{set,get}sockopt() should not matter by optimization but
it would be better not to depend on optimization semantically.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-07-03 17:51:57 +09:00
Wang Chen 623d1a1af7 ipv6: Do cleanup for ip6_mr_init.
Without this cleanup we get the following issues:
1. Junk is left behind when inet6_init fails halfway.
2. proc and notifier junk is left behind after the ipv6 module is unloaded.

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-07-03 17:51:56 +09:00
YOSHIFUJI Hideaki dd3abc4ef5 ipv6 route: Prefer outgoing interface with source address assigned.
Outgoing interface is selected by the route decision if unspecified.
Let's prefer routes via interface(s) with the address assigned if we
have multiple routes with same cost.
With help from Naohiro Ooiwa <nooiwa@miraclelinux.com>.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-07-03 17:51:56 +09:00
YOSHIFUJI Hideaki 1b34be74cb ipv6 addrconf: add accept_dad sysctl to control DAD operation.
- If 0, disable DAD.
- If 1, perform DAD (default).
- If >1, perform DAD and disable IPv6 operation if DAD for the MAC-based
  link-local address has failed (RFC4862 5.4.5).
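
The three cases above can be written down as a small decision helper.
toy_dad_policy and toy_accept_dad_policy are hypothetical names, and
treating negative values like 0 is an assumption of this sketch; the
message above only defines 0, 1 and >1.

```c
#include <stdbool.h>

/* What a given accept_dad value asks for. */
struct toy_dad_policy {
    bool perform_dad;                 /* run Duplicate Address Detection */
    bool disable_ipv6_on_ll_failure;  /* RFC4862 5.4.5 behaviour */
};

static struct toy_dad_policy toy_accept_dad_policy(int accept_dad)
{
    struct toy_dad_policy p;

    p.perform_dad = accept_dad >= 1;               /* 0 disables DAD */
    p.disable_ipv6_on_ll_failure = accept_dad > 1; /* only for values > 1 */
    return p;
}
```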

We do not follow RFC4862 by default.  Refer to the netdev thread entitled
"Linux IPv6 DAD not full conform to RFC 4862 ?"
	http://www.spinics.net/lists/netdev/msg52027.html

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-07-03 17:51:56 +09:00
YOSHIFUJI Hideaki 778d80be52 ipv6: Add disable_ipv6 sysctl to disable IPv6 operaion on specific interface.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-07-03 17:51:55 +09:00
YOSHIFUJI Hideaki 5ce83afaac ipv6: Assume the loopback address in link-local scope.
Handle interface property strictly when looking up a route
for the loopback address (RFC4291 2.5.3).

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-07-03 17:51:55 +09:00