linux

Commit Graph

Author	SHA1	Message	Date
Ilpo Järvinen	c137f3dda0	[TCP]: Fix NewReno's fast rexmit/recovery problems with GSOed skb Fixes a long-standing bug which makes NewReno recovery crippled. With GSO the whole head skb was marked as LOST which is in violation of NewReno procedure that only wants to mark one packet and ended up breaking our TCP code by causing counter overflow because our code was built on top of assumption about valid NewReno procedure. This manifested as triggering a WARN_ON for the overflow in a number of places. It seems relatively safe alternative to just do nothing if tcp_fragment fails due to oom because another duplicate ACK is likely to be received soon and the fragmentation will be retried. Special thanks goes to Soeren Sonnenburg <kernel@nn7.de> who was lucky enough to be able to reproduce this so that the warning for the overflow was hit. It's not as easy task as it seems even if this bug happens quite often because the amount of outstanding data is pretty significant for the mismarkings to lead to an overflow. Because it's very late in 2.6.25-rc cycle (if this even makes in time), I didn't want to touch anything with SACK enabled here. Fragmenting might be useful for it as well but it's more or less a policy decision rather than mandatory fix. Thus there's no need to rush and we can postpone considering tcp_fragment with SACK for 2.6.26. In 2.6.24 and earlier, this very same bug existed but the effect is slightly different because of a small changes in the if conditions that fit to the patch's context. With them nothing got lost marker and thus no retransmissions happened. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-07 22:32:38 -07:00
Ilpo Järvinen	1b69d74539	[TCP]: Restore 2.6.24 mark_head_lost behavior for newreno/fack The fast retransmission can be forced locally to the rfc3517 branch in tcp_update_scoreboard instead of making such fragile constructs deeper in tcp_mark_head_lost. This is necessary for the next patch which must not have loopholes for cnt > packets check. As one can notice, readability got some improvements too because of this :-). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-07 22:31:38 -07:00
Pavel Emelyanov	23556323b2	[VLAN]: Fix egress priority mappings leak. These entries are allocated in vlan_dev_set_egress_priority, but are never released and leaks on vlan device removal. Drop these in vlan's ->uninit callback - after the device is brought down and everyone is notified about it is going to be unregistered. Found during testing vlan netnsization patchset. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-04 12:45:12 -07:00
Denis V. Lunev	84f59370c5	[IPV6]: Fix refcounting for anycast dst entries. Anycast DST entries allocated inside ipv6_dev_ac_inc are leaked when network device is stopped without removing IPv6 addresses from it. The bug has been observed in the reality on 2.6.18-rhel5 kernel. In the above case addrconf_ifdown marks all entries as obsolete and ip6_del_rt called from __ipv6_dev_ac_dec returns ENOENT. The referrence is not dropped. The fix is simple. DST entry should not keep referrence when stored in the FIB6 tree. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-03 13:33:00 -07:00
Denis V. Lunev	eb86757931	[IPV6]: inet6_dev on loopback should be kept until namespace stop. In the other case it will be destroyed when last address will be removed from lo inside a namespace. This will break IPv6 in several places. The most obvious one is ip6_dst_ifdown. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-03 13:31:53 -07:00
Denis V. Lunev	439e23857a	[IPV6]: Event type in addrconf_ifdown is mis-used. addrconf_ifdown is broken in respect to the usage of how parameter. This function is called with (event != NETDEV_DOWN) and (2) on the IPv6 stop. It the latter case inet6_dev from loopback device should be destroyed. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-03 13:30:17 -07:00
Herbert Xu	af2681828a	[ICMP]: Ensure that ICMP relookup maintains status quo The ICMP relookup path is only meant to modify behaviour when appropriate IPsec policies are in place and marked as requiring relookups. It is certainly not meant to modify behaviour when IPsec policies don't exist at all. However, due to an oversight on the error paths existing behaviour may in fact change should one of the relookup steps fail. This patch corrects this by redirecting all errors on relookup failures to the previous code path. That is, if the initial xfrm_lookup let the packet pass, we will stand by that decision should the relookup fail due to an error. This should be safe from a security point-of-view because compliant systems must install a default deny policy so the packet would'nt have passed in that case. Many thanks to Julian Anastasov for pointing out this error. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-03 12:52:19 -07:00
Linus Torvalds	2f819ae881	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (45 commits) [VLAN]: Proc entry is not renamed when vlan device name changes. [IPV6]: Fix ICMP relookup error path dst leak [ATM] drivers/atm/iphase.c: compilation warning fix IPv6: do not create temporary adresses with too short preferred lifetime IPv6: only update the lifetime of the relevant temporary address bluetooth : __rfcomm_dlc_close lock fix bluetooth : use lockdep sub-classes for diffrent bluetooth protocol [ROSE/AX25] af_rose: rose_release() fix mac80211: correct use_short_preamble handling b43: Fix PCMCIA IRQ routing b43: Add DMA mapping failure messages mac80211: trigger ieee80211_sta_work after opening interface [LLC]: skb allocation size for responses [IP] UDP: Use SEQ_START_TOKEN. [NET]: Remove Documentation/networking/sk98lin.txt [ATM] atm/idt77252.c: Make 2 functions static [ATM]: Make atm/he.c:read_prom_byte() static [IPV6] MCAST: Ensure to check multicast listener(s). [LLC]: Kill llc_station_mac_sa symbol export. forcedeth: fix locking bug with netconsole ...	2008-04-02 07:46:18 -07:00
Pavel Emelyanov	802fb176d8	[VLAN]: Proc entry is not renamed when vlan device name changes. This may lead to situations, when each of two proc entries produce data for the other's device. Looks like a BUG, so this patch is for net-2.6. It will not apply to net-2.6.26 since dev->nd_net access is replaced with dev_net(dev) one. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-02 00:08:01 -07:00
Herbert Xu	f32c5f2c38	[IPV6]: Fix ICMP relookup error path dst leak When we encounter an error while looking up the dst the second time we need to drop the first dst. This patch is pretty much the same as the one for IPv4. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-02 00:06:09 -07:00
Benoit Boissinot	eac55bf970	IPv6: do not create temporary adresses with too short preferred lifetime From RFC341: A temporary address is created only if this calculated Preferred Lifetime is greater than REGEN_ADVANCE time units. In particular, an implementation must not create a temporary address with a zero Preferred Lifetime. Signed-off-by: Benoit Boissinot <benoit.boissinot@ens-lyon.org> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-02 00:01:35 -07:00
Benoit Boissinot	c6fbfac2e6	IPv6: only update the lifetime of the relevant temporary address When receiving a prefix information from a routeur, only update the lifetimes of the temporary address associated with that prefix. Otherwise if one deprecated prefix is advertized, all your temporary addresses will become deprecated. Signed-off-by: Benoit Boissinot <benoit.boissinot@ens-lyon.org> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-02 00:00:58 -07:00
Dave Young	1905f6c736	bluetooth : __rfcomm_dlc_close lock fix Lockdep warning will be trigged while rfcomm connection closing. The locks taken in rfcomm_dev_add: rfcomm_dev_lock --> d->lock In __rfcomm_dlc_close: d->lock --> rfcomm_dev_lock (in rfcomm_dev_state_change) There's two way to fix it, one is in rfcomm_dev_add we first locking d->lock then the rfcomm_dev_lock The other (in this patch), remove the locking of d->lock for rfcomm_dev_state_change because just locking "d->state = BT_CLOSED;" is enough. [ 295.002046] ======================================================= [ 295.002046] [ INFO: possible circular locking dependency detected ] [ 295.002046] 2.6.25-rc7 #1 [ 295.002046] ------------------------------------------------------- [ 295.002046] krfcommd/2705 is trying to acquire lock: [ 295.002046] (rfcomm_dev_lock){-.--}, at: [<f89a090a>] rfcomm_dev_state_change+0x6a/0xd0 [rfcomm] [ 295.002046] [ 295.002046] but task is already holding lock: [ 295.002046] (&d->lock){--..}, at: [<f899c533>] __rfcomm_dlc_close+0x43/0xd0 [rfcomm] [ 295.002046] [ 295.002046] which lock already depends on the new lock. [ 295.002046] [ 295.002046] [ 295.002046] the existing dependency chain (in reverse order) is: [ 295.002046] [ 295.002046] -> #1 (&d->lock){--..}: [ 295.002046] [<c0149b23>] check_prev_add+0xd3/0x200 [ 295.002046] [<c0149ce5>] check_prevs_add+0x95/0xe0 [ 295.002046] [<c0149f6f>] validate_chain+0x23f/0x320 [ 295.002046] [<c014b7b1>] __lock_acquire+0x1c1/0x760 [ 295.002046] [<c014c349>] lock_acquire+0x79/0xb0 [ 295.002046] [<c03d6b99>] _spin_lock+0x39/0x80 [ 295.002046] [<f89a01c0>] rfcomm_dev_add+0x240/0x360 [rfcomm] [ 295.002046] [<f89a047e>] rfcomm_create_dev+0x6e/0xe0 [rfcomm] [ 295.002046] [<f89a0823>] rfcomm_dev_ioctl+0x33/0x60 [rfcomm] [ 295.002046] [<f899facc>] rfcomm_sock_ioctl+0x2c/0x50 [rfcomm] [ 295.002046] [<c0363d38>] sock_ioctl+0x118/0x240 [ 295.002046] [<c0194196>] vfs_ioctl+0x76/0x90 [ 295.002046] [<c0194446>] do_vfs_ioctl+0x56/0x140 [ 295.002046] [<c0194569>] sys_ioctl+0x39/0x60 [ 295.002046] [<c0104faa>] syscall_call+0x7/0xb [ 295.002046] [<ffffffff>] 0xffffffff [ 295.002046] [ 295.002046] -> #0 (rfcomm_dev_lock){-.--}: [ 295.002046] [<c0149a84>] check_prev_add+0x34/0x200 [ 295.002046] [<c0149ce5>] check_prevs_add+0x95/0xe0 [ 295.002046] [<c0149f6f>] validate_chain+0x23f/0x320 [ 295.002046] [<c014b7b1>] __lock_acquire+0x1c1/0x760 [ 295.002046] [<c014c349>] lock_acquire+0x79/0xb0 [ 295.002046] [<c03d6639>] _read_lock+0x39/0x80 [ 295.002046] [<f89a090a>] rfcomm_dev_state_change+0x6a/0xd0 [rfcomm] [ 295.002046] [<f899c548>] __rfcomm_dlc_close+0x58/0xd0 [rfcomm] [ 295.002046] [<f899d44f>] rfcomm_recv_ua+0x6f/0x120 [rfcomm] [ 295.002046] [<f899e061>] rfcomm_recv_frame+0x171/0x1e0 [rfcomm] [ 295.002046] [<f899e357>] rfcomm_run+0xe7/0x550 [rfcomm] [ 295.002046] [<c013c18c>] kthread+0x5c/0xa0 [ 295.002046] [<c0105c07>] kernel_thread_helper+0x7/0x10 [ 295.002046] [<ffffffff>] 0xffffffff [ 295.002046] [ 295.002046] other info that might help us debug this: [ 295.002046] [ 295.002046] 2 locks held by krfcommd/2705: [ 295.002046] #0: (rfcomm_mutex){--..}, at: [<f899e2eb>] rfcomm_run+0x7b/0x550 [rfcomm] [ 295.002046] #1: (&d->lock){--..}, at: [<f899c533>] __rfcomm_dlc_close+0x43/0xd0 [rfcomm] [ 295.002046] [ 295.002046] stack backtrace: [ 295.002046] Pid: 2705, comm: krfcommd Not tainted 2.6.25-rc7 #1 [ 295.002046] [<c0128a38>] ? printk+0x18/0x20 [ 295.002046] [<c014927f>] print_circular_bug_tail+0x6f/0x80 [ 295.002046] [<c0149a84>] check_prev_add+0x34/0x200 [ 295.002046] [<c0149ce5>] check_prevs_add+0x95/0xe0 [ 295.002046] [<c0149f6f>] validate_chain+0x23f/0x320 [ 295.002046] [<c014b7b1>] __lock_acquire+0x1c1/0x760 [ 295.002046] [<c014c349>] lock_acquire+0x79/0xb0 [ 295.002046] [<f89a090a>] ? rfcomm_dev_state_change+0x6a/0xd0 [rfcomm] [ 295.002046] [<c03d6639>] _read_lock+0x39/0x80 [ 295.002046] [<f89a090a>] ? rfcomm_dev_state_change+0x6a/0xd0 [rfcomm] [ 295.002046] [<f89a090a>] rfcomm_dev_state_change+0x6a/0xd0 [rfcomm] [ 295.002046] [<f899c548>] __rfcomm_dlc_close+0x58/0xd0 [rfcomm] [ 295.002046] [<f899d44f>] rfcomm_recv_ua+0x6f/0x120 [rfcomm] [ 295.002046] [<f899e061>] rfcomm_recv_frame+0x171/0x1e0 [rfcomm] [ 295.002046] [<c014abd9>] ? trace_hardirqs_on+0xb9/0x130 [ 295.002046] [<c03d6e89>] ? _spin_unlock_irqrestore+0x39/0x70 [ 295.002046] [<f899e357>] rfcomm_run+0xe7/0x550 [rfcomm] [ 295.002046] [<c03d4559>] ? __sched_text_start+0x229/0x4c0 [ 295.002046] [<c0120000>] ? cpu_avg_load_per_task+0x20/0x30 [ 295.002046] [<f899e270>] ? rfcomm_run+0x0/0x550 [rfcomm] [ 295.002046] [<c013c18c>] kthread+0x5c/0xa0 [ 295.002046] [<c013c130>] ? kthread+0x0/0xa0 [ 295.002046] [<c0105c07>] kernel_thread_helper+0x7/0x10 [ 295.002046] ======================= Signed-off-by: Dave Young <hidave.darkstar@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-01 23:59:06 -07:00
Dave Young	68845cb2c8	bluetooth : use lockdep sub-classes for diffrent bluetooth protocol 'rfcomm connect' will trigger lockdep warnings which is caused by locking diffrent kinds of bluetooth sockets at the same time. So using sub-classes per AF_BLUETOOTH sub-type for lockdep. Thanks for the hints from dave jones. --- > From: Dave Jones <davej@codemonkey.org.uk> > Date: Thu, 27 Mar 2008 12:21:56 -0400 > > > Mar 27 08:10:57 localhost kernel: Pid: 3611, comm: obex-data-serve Not tainted 2.6.25-0.121.rc5.git4.fc9 #1 > > Mar 27 08:10:57 localhost kernel: [__lock_acquire+2287/3089] __lock_acquire+0x8ef/0xc11 > > Mar 27 08:10:57 localhost kernel: [sched_clock+8/11] ? sched_clock+0x8/0xb > > Mar 27 08:10:57 localhost kernel: [lock_acquire+106/144] lock_acquire+0x6a/0x90 > > Mar 27 08:10:57 localhost kernel: [<f8bd9321>] ? l2cap_sock_bind+0x29/0x108 [l2cap] > > Mar 27 08:10:57 localhost kernel: [lock_sock_nested+182/198] lock_sock_nested+0xb6/0xc6 > > Mar 27 08:10:57 localhost kernel: [<f8bd9321>] ? l2cap_sock_bind+0x29/0x108 [l2cap] > > Mar 27 08:10:57 localhost kernel: [security_socket_post_create+22/27] ? security_socket_post_create+0x16/0x1b > > Mar 27 08:10:57 localhost kernel: [__sock_create+388/472] ? __sock_create+0x184/0x1d8 > > Mar 27 08:10:57 localhost kernel: [<f8bd9321>] l2cap_sock_bind+0x29/0x108 [l2cap] > > Mar 27 08:10:57 localhost kernel: [kernel_bind+10/13] kernel_bind+0xa/0xd > > Mar 27 08:10:57 localhost kernel: [<f8dad3d7>] rfcomm_dlc_open+0xc8/0x294 [rfcomm] > > Mar 27 08:10:57 localhost kernel: [lock_sock_nested+187/198] ? lock_sock_nested+0xbb/0xc6 > > Mar 27 08:10:57 localhost kernel: [<f8dae18c>] rfcomm_sock_connect+0x8b/0xc2 [rfcomm] > > Mar 27 08:10:57 localhost kernel: [sys_connect+96/125] sys_connect+0x60/0x7d > > Mar 27 08:10:57 localhost kernel: [__lock_acquire+1370/3089] ? __lock_acquire+0x55a/0xc11 > > Mar 27 08:10:57 localhost kernel: [sys_socketcall+140/392] sys_socketcall+0x8c/0x188 > > Mar 27 08:10:57 localhost kernel: [syscall_call+7/11] syscall_call+0x7/0xb --- Signed-off-by: Dave Young <hidave.darkstar@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-01 23:58:35 -07:00
Jarek Poplawski	4965291acf	[ROSE/AX25] af_rose: rose_release() fix rose_release() doesn't release sockets properly, e.g. it skips sock_orphan(), so OOPSes are triggered in sock_def_write_space(), which was observed especially while ROSE skbs were kfreed from ax25_frames_acked(). There is also sock_hold() and lock_sock() added - similarly to ax25_release(). Thanks to Bernard Pidoux for substantial help in debugging this problem. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Reported-and-tested-by: Bernard Pidoux <bpidoux@free.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-01 23:56:17 -07:00
Vladimir Koutny	d43c7b37ad	mac80211: correct use_short_preamble handling ERP IE bit for preamble mode is 0 for short and 1 for long, not the other way around. This fixes the value reported to the driver via bss_conf->use_short_preamble field. Signed-off-by: Vladimir Koutny <vlado@ksp.sk> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-04-01 15:44:08 -04:00
Jan Niehusmann	64f851e410	mac80211: trigger ieee80211_sta_work after opening interface ieee80211_sta_work is disabled while network interface is down. Therefore, if you configure wireless parameters before bringing the interface up, these configurations are not yet effective and association fails. A workaround from userspace is calling a command like 'iwconfig wlan0 ap any' after the interface is brought up. To fix this behaviour, trigger execution of ieee80211_sta_work from ieee80211_open when in STA or IBSS mode. Signed-off-by: Jan Niehusmann <jan@gondor.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-04-01 15:44:07 -04:00
Joonwoo Park	f83f1768f8	[LLC]: skb allocation size for responses Allocate the skb for llc responses with the received packet size by using the size adjustable llc_frame_alloc. Don't allocate useless extra payload. Cleanup magic numbers. So, this fixes oops. Reported by Jim Westfall: kernel: skb_over_panic: text:c0541fc7 len:1000 put:997 head:c166ac00 data:c166ac2f tail:0xc166b017 end:0xc166ac80 dev:eth0 kernel: ------------[ cut here ]------------ kernel: kernel BUG at net/core/skbuff.c:95! Signed-off-by: Joonwoo Park <joonwpark81@gmail.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-31 21:02:47 -07:00
YOSHIFUJI Hideaki	b50660f1fe	[IP] UDP: Use SEQ_START_TOKEN. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-31 19:38:15 -07:00
YOSHIFUJI Hideaki	4c7966b86b	[IPV6] MCAST: Ensure to check multicast listener(s). In ip6_mc_input(), we need to check whether we have listener(s) for the packet. After commit `ae7bf20a63`, all packets for multicast destinations are delivered to upper layer if IFF_PROMISC or IFF_ALLMULTI is set. In fact, bug was rather ancient; the original (before the commit) intent of the dev->flags check was to skip the ipv6_chk_mcast_addr() call, assuming L2 filters packets appropriately, but it was even not true. Let's explicitly check our multicast list. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-31 19:30:45 -07:00
Al Viro	91e916cffe	net/rxrpc trivial annotations Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-30 14:20:23 -07:00
David S. Miller	9f09243890	[LLC]: Kill llc_station_mac_sa symbol export. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-28 19:51:40 -07:00
David S. Miller	e8e16b706e	[INET]: inet_frag_evictor() must run with BH disabled Based upon a lockdep trace from Dave Jones. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-28 17:30:18 -07:00
Joonwoo Park	a5a04819c5	[LLC]: station source mac address kill unnecessary llc_station_mac_sa. Signed-off-by: Joonwoo Park <joonwpark81@gmail.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-28 16:28:36 -07:00
Joonwoo Park	27785d83e4	[LLC]: bogus llc packet length discard llc packet which has bogus packet length. Signed-off-by: Joonwoo Park <joonwpark81@gmail.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-28 16:27:33 -07:00
Herbert Xu	2ba2506ca7	[NET]: Add preemption point in qdisc_run The qdisc_run loop is currently unbounded and runs entirely in a softirq. This is bad as it may create an unbounded softirq run. This patch fixes this by calling need_resched and breaking out if necessary. It also adds a break out if the jiffies value changes since that would indicate we've been transmitting for too long which starves other softirqs. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-28 16:25:26 -07:00
Rusty Russell	32aced7509	[NET]: Don't send ICMP_FRAG_NEEDED for GSO packets Commit `9af3912ec9` ("[NET] Move DF check to ip_forward") added a new check to send ICMP fragmentation needed for large packets. Unlike the check in ip_finish_output(), it doesn't check for GSO. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-28 16:23:19 -07:00
Robert P. J. Day	d5fb2962c6	bluetooth: replace deprecated RW_LOCK_UNLOCKED macros The older RW_LOCK_UNLOCKED macros defeat lockdep state tracing so replace them with the newer __RW_LOCK_UNLOCKED macros. Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-28 16:17:38 -07:00
Andrew Morton	3387b804d8	net/9p/trans_fd.c:p9_trans_fd_init(): module_init functions should return 0 on success Mar 23 09:06:31 opensuse103 kernel: Installing 9P2000 support Mar 23 09:06:31 opensuse103 kernel: sys_init_module: '9pnet_fd'->init suspiciously returned 1, it should follow 0/-E convention Mar 23 09:06:31 opensuse103 kernel: sys_init_module: loading module anyway... Mar 23 09:06:31 opensuse103 kernel: Pid: 5323, comm: modprobe Not tainted 2.6.25-rc6-git7-default #1 Mar 23 09:06:31 opensuse103 kernel: [<c013c253>] sys_init_module+0x172b/0x17c9 Mar 23 09:06:31 opensuse103 kernel: [<c0108a6a>] sys_mmap2+0x62/0x77 Mar 23 09:06:31 opensuse103 kernel: [<c01059c4>] sysenter_past_esp+0x6d/0xa9 Mar 23 09:06:31 opensuse103 kernel: ======================= Cc: Latchesar Ionkov <lucho@ionkov.net> Cc: Eric Van Hensbergen <ericvh@opteron.(none)> Cc: David S. Miller <davem@davemloft.net> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: <devzero@web.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-28 14:45:22 -07:00
Patrick McHardy	3480c63bdf	[LLC]: Restrict LLC sockets to root LLC currently allows users to inject raw frames, including IP packets encapsulated in SNAP. While Linux doesn't handle IP over SNAP, other systems do. Restrict LLC sockets to root similar to packet sockets. [ Modified Patrick's patch to use CAP_NEW_RAW --DaveM ] Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-27 20:28:10 -07:00
Denis V. Lunev	8eeee8b152	[NETFILTER]: Replate direct proc_fops assignment with proc_create call. This elliminates infamous race during module loading when one could lookup proc entry without proc_fops assigned. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-27 16:55:53 -07:00
Thomas Graf	920fc941a9	[ESP]: Ensure IV is in linear part of the skb to avoid BUG() due to OOB access ESP does not account for the IV size when calling pskb_may_pull() to ensure everything it accesses directly is within the linear part of a potential fragment. This results in a BUG() being triggered when the both the IPv4 and IPv6 ESP stack is fed with an skb where the first fragment ends between the end of the esp header and the end of the IV. This bug was found by Dirk Nehring <dnehring@gmx.net> . Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-27 16:08:03 -07:00
Linus Torvalds	ee20a0dd54	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (43 commits) [IPSEC]: Fix BEET output [ICMP]: Dst entry leak in icmp_send host re-lookup code (v2). [AX25]: Remove obsolete references to BKL from TODO file. [NET]: Fix multicast device ioctl checks [IRDA]: Store irnet_socket termios properly. [UML]: uml-net: don't set IFF_ALLMULTI in set_multicast_list [VLAN]: Don't copy ALLMULTI/PROMISC flags from underlying device netxen, phy/marvell, skge: minor checkpatch fixes S2io: Handle TX completions on the same CPU as the sender for MIS-X interrupts b44: Truncate PHY address skge napi->poll() locking bug rndis_host: fix oops when query for OID_GEN_PHYSICAL_MEDIUM fails cxgb3: Fix lockdep problems with sge.reg_lock ehea: Fix IPv6 support dm9000: Support promisc and all-multi modes dm9601: configure MAC to drop invalid (crc/length) packets dm9601: add Hirose USB-100 device ID Marvell PHY m88e1111 driver fix netxen: fix rx dropped stats netxen: remove low level tx lock ...	2008-03-26 18:35:50 -07:00
Herbert Xu	732c8bd590	[IPSEC]: Fix BEET output The IPv6 BEET output function is incorrectly including the inner header in the payload to be protected. This causes a crash as the packet doesn't actually have that many bytes for a second header. The IPv4 BEET output on the other hand is broken when it comes to handling an inner IPv6 header since it always assumes an inner IPv4 header. This patch fixes both by making sure that neither BEET output function touches the inner header at all. All access is now done through the protocol-independent cb structure. Two new attributes are added to make this work, the IP header length and the IPv4 option length. They're filled in by the inner mode's output function. Thanks to Joakim Koskela for finding this problem. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-26 16:51:09 -07:00
Tom Tucker	c8237a5fce	SVCRDMA: Check num_sge when setting LAST_CTXT bit The RDMACTXT_F_LAST_CTXT bit was getting set incorrectly when the last chunk in the read-list spanned multiple pages. This resulted in a kernel panic when the wrong context was used to build the RPC iovec page list. RDMA_READ is used to fetch RPC data from the client for NFS_WRITE requests. A scatter-gather is used to map the advertised client side buffer to the server-side iovec and associated page list. WR contexts are used to convey which scatter-gather entries are handled by each WR. When the write data is large, a single RPC may require multiple RDMA_READ requests so the contexts for a single RPC are chained together in a linked list. The last context in this list is marked with a bit RDMACTXT_F_LAST_CTXT so that when this WR completes, the CQ handler code can enqueue the RPC for processing. The code in rdma_read_xdr was setting this bit on the last two contexts on this list when the last read-list chunk spanned multiple pages. This caused the svc_rdma_recvfrom logic to incorrectly build the RPC and caused the kernel to crash because the second-to-last context doesn't contain the iovec page list. Modified the condition that sets this bit so that it correctly detects the last context for the RPC. Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Tested-by: Roland Dreier <rolandd@cisco.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-26 11:24:19 -07:00
Pavel Emelyanov	7c0ecc4c4f	[ICMP]: Dst entry leak in icmp_send host re-lookup code (v2). Commit `8b7817f3a9` ([IPSEC]: Add ICMP host relookup support) introduced some dst leaks on error paths: the rt pointer can be forgotten to be put. Fix it bu going to a proper label. Found after net namespace's lo refused to unregister :) Many thanks to Den for valuable help during debugging. Herbert pointed out, that xfrm_lookup() will put the rtable in case of error itself, so the first goto fix is redundant. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-26 02:27:09 -07:00
Robert P. J. Day	5c2e2e239e	[AX25]: Remove obsolete references to BKL from TODO file. Given that there are no apparent calls to lock_kernel() or unlock_kernel() under net/ax25, delete the TODO reference related to that. Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-26 02:14:38 -07:00
Patrick McHardy	61ee6bd487	[NET]: Fix multicast device ioctl checks SIOCADDMULTI/SIOCDELMULTI check whether the driver has a set_multicast_list method to determine whether it supports multicast. Drivers implementing secondary unicast support use set_rx_mode however. Check for both dev->set_multicast_mode and dev->set_rx_mode to determine multicast capabilities. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-26 02:12:11 -07:00
David S. Miller	8c7230f781	[IRDA]: Store irnet_socket termios properly. It should be a "struct ktermios" not a "struct termios". Based upon a build warning reported by Stephen Rothwell. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-26 00:55:50 -07:00
Patrick McHardy	0ed21b321a	[VLAN]: Don't copy ALLMULTI/PROMISC flags from underlying device Changing these flags requires to use dev_set_allmulti/dev_set_promiscuity or dev_change_flags. Setting it directly causes two unwanted effects: - the next dev_change_flags call will notice a difference between dev->gflags and the actual flags, enable promisc/allmulti mode and incorrectly update dev->gflags - this keeps the underlying device in promisc/allmulti mode until the VLAN device is deleted Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-26 00:15:17 -07:00
Kazunori MIYAZAWA	df9dcb4588	[IPSEC]: Fix inter address family IPsec tunnel handling. Signed-off-by: Kazunori MIYAZAWA <kazunori@miyazawa.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-24 14:51:51 -07:00
Pavel Emelyanov	fa86d322d8	[NEIGH]: Fix race between pneigh deletion and ipv6's ndisc_recv_ns (v3). Proxy neighbors do not have any reference counting, so any caller of pneigh_lookup (unless it's a netlink triggered add/del routine) should _not_ perform any actions on the found proxy entry. There's one exception from this rule - the ipv6's ndisc_recv_ns() uses found entry to check the flags for NTF_ROUTER. This creates a race between the ndisc and pneigh_delete - after the pneigh is returned to the caller, the nd_tbl.lock is dropped and the deleting procedure may proceed. One of the fixes would be to add a reference counting, but this problem exists for ndisc only. Besides such a patch would be too big for -rc4. So I propose to introduce a __pneigh_lookup() which is supposed to be called with the lock held and use it in ndisc code to check the flags on alive pneigh entry. Changes from v2: As David noticed, Exported the __pneigh_lookup() to ipv6 module. The checkpatch generates a warning on it, since the EXPORT_SYMBOL does not follow the symbol itself, but in this file all the exports come at the end, so I decided no to break this harmony. Changes from v1: Fixed comments from YOSHIFUJI - indentation of prototype in header and the pndisc_check_router() name - and a compilation fix, pointed by Daniel - the is_routed was (falsely) considered as uninitialized by gcc. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-24 14:48:59 -07:00
Linus Torvalds	ca1a6ba57c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: sch_htb: fix "too many events" situation connector: convert to single-threaded workqueue [ATM]: When proc_create() fails, do some error handling work and return -ENOMEM. [SUNGEM]: Fix NAPI assertion failure. BNX2X: prevent ethtool from setting port type [9P] net/9p/trans_fd.c: remove unused variable [IPV6] net/ipv6/ndisc.c: remove unused variable [IPV4] fib_trie: fix warning from rcu_assign_poinger [TCP]: Let skbs grow over a page on fast peers [DLCI]: Fix tiny race between module unload and sock_ioctl. [SCTP]: Fix build warnings with IPV6 disabled. [IPV4]: Fix null dereference in ip_defrag	2008-03-24 13:07:24 -07:00
Roland Dreier	d3073779f8	SVCRDMA: Use only 1 RDMA read scatter entry for iWARP adapters The iWARP protocol limits RDMA read requests to a single scatter entry. NFS/RDMA has code in rdma_read_max_sge() that is supposed to limit the sge_count for RDMA read requests to 1, but the code to do that is inside an #ifdef RDMA_TRANSPORT_IWARP block. In the mainline kernel at least, RDMA_TRANSPORT_IWARP is an enum and not a preprocessor #define, so the #ifdef'ed code is never compiled. In my test of a kernel build with -j8 on an NFS/RDMA mount, this problem eventually leads to trouble starting with: svcrdma: Error posting send = -22 svcrdma : RDMA_READ error = -22 and things go downhill from there. The trivial fix is to delete the #ifdef guard. The check seems to be a remnant of when the NFS/RDMA code was not merged and needed to compile against multiple kernel versions, although I don't think it ever worked as intended. In any case now that the code is upstream there's no need to test whether the RDMA_TRANSPORT_IWARP constant is defined or not. Without this patch, my kernel build on an NFS/RDMA mount using NetEffect adapters quickly and 100% reproducibly failed with an error like: ld: final link failed: Software caused connection abort With the patch applied I was able to complete a kernel build on the same setup. (Tom Tucker says this is "actually an _ancient_ remnant when it had to compile against iWARP vs. non-iWARP enabled OFA trees.") Signed-off-by: Roland Dreier <rolandd@cisco.com> Acked-by: Tom Tucker <tom@opengridcomputing.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-24 11:25:25 -07:00
Martin Devera	8f3ea33a50	sch_htb: fix "too many events" situation HTB is event driven algorithm and part of its work is to apply scheduled events at proper times. It tried to defend itself from livelock by processing only limited number of events per dequeue. Because of faster computers some users already hit this hardcoded limit. This patch limits processing up to 2 jiffies (why not 1 jiffie ? because it might stop prematurely when only fraction of jiffie remains). Signed-off-by: Martin Devera <devik@cdi.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-23 22:00:38 -07:00
Wang Chen	dbee0d3f46	[ATM]: When proc_create() fails, do some error handling work and return -ENOMEM. Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-23 21:45:36 -07:00
Julia Lawall	53a6201fdf	[9P] net/9p/trans_fd.c: remove unused variable The variable cb is initialized but never used otherwise. The semantic patch that makes this change is as follows: (http://www.emn.fr/x-info/coccinelle/) // <smpl> @@ type T; identifier i; constant C; @@ ( extern T i; \| - T i; <+... when != i - i = C; ...+> ) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-22 18:05:33 -07:00
Julia Lawall	421f099bc5	[IPV6] net/ipv6/ndisc.c: remove unused variable The variable hlen is initialized but never used otherwise. The semantic patch that makes this change is as follows: (http://www.emn.fr/x-info/coccinelle/) // <smpl> @@ type T; identifier i; constant C; @@ ( extern T i; \| - T i; <+... when != i - i = C; ...+> ) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-22 18:04:16 -07:00
Stephen Hemminger	6440cc9e0f	[IPV4] fib_trie: fix warning from rcu_assign_poinger This gets rid of a warning caused by the test in rcu_assign_pointer. I tried to fix rcu_assign_pointer, but that devolved into a long set of discussions about doing it right that came to no real solution. Since the test in rcu_assign_pointer for constant NULL would never succeed in fib_trie, just open code instead. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-22 17:59:58 -07:00
Herbert Xu	69d1506731	[TCP]: Let skbs grow over a page on fast peers While testing the virtio-net driver on KVM with TSO I noticed that TSO performance with a 1500 MTU is significantly worse compared to the performance of non-TSO with a 16436 MTU. The packet dump shows that most of the packets sent are smaller than a page. Looking at the code this actually is quite obvious as it always stop extending the packet if it's the first packet yet to be sent and if it's larger than the MSS. Since each extension is bound by the page size, this means that (given a 1500 MTU) we're very unlikely to construct packets greater than a page, provided that the receiver and the path is fast enough so that packets can always be sent immediately. The fix is also quite obvious. The push calls inside the loop is just an optimisation so that we don't end up doing all the sending at the end of the loop. Therefore there is no specific reason why it has to do so at MSS boundaries. For TSO, the most natural extension of this optimisation is to do the pushing once the skb exceeds the TSO size goal. This is what the patch does and testing with KVM shows that the TSO performance with a 1500 MTU easily surpasses that of a 16436 MTU and indeed the packet sizes sent are generally larger than 16436. I don't see any obvious downsides for slower peers or connections, but it would be prudent to test this extensively to ensure that those cases don't regress. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-22 15:47:05 -07:00

1 2 3 4 5 ...

7855 Commits