linux

Commit Graph

Author	SHA1	Message	Date
Patrick McHarrdy	3c158f7f57	[NETFILTER]: nf_conntrack: fix helper module unload races When a helper module is unloaded all conntracks refering to it have their helper pointer NULLed out, leading to lots of races. In most places this can be fixed by proper use of RCU (they do already check for != NULL, but in a racy way), additionally nf_conntrack_expect_related needs to bail out when no helper is present. Also remove two paranoid BUG_ONs in nf_conntrack_proto_gre that are racy and not worth fixing. Signed-off-by: Patrick McHarrdy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-06-07 13:40:26 -07:00
Patrick McHardy	ef7c79ed64	[NETLINK]: Mark netlink policies const Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-06-07 13:40:10 -07:00
David S. Miller	14a49e1fd2	[TCP] tcp_probe: Attach printf attribute properly to printl(). GCC doesn't like the way Stephen initially did it: net/ipv4/tcp_probe.c:83: warning: empty declaration Signed-off-by: David S. Miller <davem@davemloft.net>	2007-06-07 13:40:09 -07:00
Eric Dumazet	274707cff9	[TCP]: Use LIMIT_NETDEBUG in tcp_retransmit_timer(). LIMIT_NETDEBUG allows the admin to disable some warning messages (echo 0 >/proc/sys/net/core/warnings). The "TCP: Treason uncloaked!" message can use this facility. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-06-07 13:40:08 -07:00
Herbert Xu	71e27da961	[IPV4]: Restore old behaviour of default config values Previously inet devices were only constructed when addresses are added (or rarely in ipmr). Therefore the default config values they get are the ones at the time of these operations. Now that we're creating inet devices earlier, this changes the behaviour of default config values in an incompatible way (see bug #8519). This patch creates a compromise by setting the default values at the same point as before but only for those that have not been explicitly set by the user since the inet device's creation. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-06-07 13:39:26 -07:00
Herbert Xu	31be308541	[IPV4]: Add default config support after inetdev_init Previously once inetdev_init has been called on a device any changes made to ipv4_devconf_dflt would have no effect on that device's configuration. This creates a problem since we have moved the point where inetdev_init is called from when an address is added to where the device is registered. This patch is the first half of a set that tries to mimic the old behaviour while still calling inetdev_init. It propagates any changes to ipv4_devconf_dflt to those devices that have not had the corresponding attribute set. The next patch will forcibly set all values at the point where inetdev_init was previously called. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-06-07 13:39:19 -07:00
Herbert Xu	42f811b8bc	[IPV4]: Convert IPv4 devconf to an array This patch converts the ipv4_devconf config members (everything except sysctl) to an array. This allows easier manipulation which will be needed later on to provide better management of default config values. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-06-07 13:39:13 -07:00
Herbert Xu	8d76527e72	[IPV4]: Only panic if inetdev_init fails for loopback When I made the inetdev_init call work on all devices I incorrectly left in the panic call as well. It is obviously undesirable to panic on an allocation failure for a normal network device. This patch moves the panic call under the loopback if clause. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-06-07 13:39:03 -07:00
Patrick McHardy	f0e48dbfc5	[TCP]: Honour sk_bound_dev_if in tcp_v4_send_ack A time_wait socket inherits sk_bound_dev_if from the original socket, but it is not used when sending ACK packets using ip_send_reply. Fix by passing the oif to ip_send_reply in struct ip_reply_arg and use it for output routing. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-06-07 13:38:51 -07:00
Patrick McHardy	6e1d91039b	[ICMP]: Fix icmp_errors_use_inbound_ifaddr sysctl Currently when icmp_errors_use_inbound_ifaddr is set and an ICMP error is sent after the packet passed through ip_output(), an address from the outgoing interface is chosen as ICMP source address since skb->dev doesn't point to the incoming interface anymore. Fix this by doing an interface lookup on rt->dst.iif and using that device. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-06-03 18:08:51 -07:00
Wei Dong	584bdf8cbd	[IPV4]: Fix "ipOutNoRoutes" counter error for TCP and UDP Signed-off-by: Wei Dong <weidong@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-06-03 18:08:50 -07:00
Ilpo Järvinen	6418204f91	[TCP]: Fix GSO ignorance of pkts_acked arg (cong.cntrl modules) The code used to ignore GSO completely, passing either way too small or zero pkts_acked when GSO skb or part of it got ACKed. In addition, there is no need to calculate the value in the loop but simple arithmetics after the loop is sufficient. There is no need to handle SYN case specially because congestion control modules are not yet initialized when FLAG_SYN_ACKED is set. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-06-03 18:08:48 -07:00
Mark Glines	3f196eb519	[TCP]: Use default 32768-61000 outgoing port range in all cases. This diff changes the default port range used for outgoing connections, from "use 32768-61000 in most cases, but use N-4999 on small boxes (where N is a multiple of 1024, depending on just how small the box is)" to just "use 32768-61000 in all cases". I don't believe there are any drawbacks to this change, and it keeps outgoing connection ports farther away from the mess of IANA-registered ports. Signed-off-by: Mark Glines <mark@glines.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-06-03 18:08:43 -07:00
Stephen Hemminger	67403754bc	[TCP] tcp_probe: use GCC printf attribute The function in tcp_probe is printf like, use GCC to check the args. Sighed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-31 01:23:37 -07:00
Sangtae Ha	63313494c4	[TCP] tcp_probe: a trivial fix for mismatched number of printl arguments. Just a fix to correct the number of printl arguments. Now, srtt is logging correctly. Signed-off-by: Sangtae Ha <sangtae.ha@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-31 01:23:36 -07:00
Pavel Emelianov	e4fd5da39f	[TCP]: Consolidate checking for tcp orphan count being too big. tcp_out_of_resources() and tcp_close() perform the same checking of number of orphan sockets. Move this code into common place. Signed-off-by: Pavel Emelianov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-31 01:23:34 -07:00
David S. Miller	ddc31ce311	[IPV4]: Kill references to bogus non-existent CONFIG_IP_NOSIOCRT Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-31 01:23:29 -07:00
Kazunori MIYAZAWA	f282d45cb4	[IPSEC]: Fix panic when using inter address familiy IPsec on loopback. Signed-off-by: Kazunori MIYAZAWA <kazunori@miyazawa.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-31 01:23:28 -07:00
David S. Miller	14e50e57ae	[XFRM]: Allow packet drops during larval state resolution. The current IPSEC rule resolution behavior we have does not work for a lot of people, even though technically it's an improvement from the -EAGAIN buisness we had before. Right now we'll block until the key manager resolves the route. That works for simple cases, but many folks would rather packets get silently dropped until the key manager resolves the IPSEC rules. We can't tell these folks to "set the socket non-blocking" because they don't have control over the non-block setting of things like the sockets used to resolve DNS deep inside of the resolver libraries in libc. With that in mind I coded up the patch below with some help from Herbert Xu which provides packet-drop behavior during larval state resolution, controllable via sysctl and off by default. This lays the framework to either: 1) Make this default at some point or... 2) Move this logic into xfrm{4,6}_policy.c and implement the ARP-like resolution queue we've all been dreaming of. The idea would be to queue packets to the policy, then once the larval state is resolved by the key manager we re-resolve the route and push the packets out. The packets would timeout if the rule didn't get resolved in a certain amount of time. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-24 18:17:54 -07:00
Jing Min Zhao	1ff75ed254	[NETFILTER]: nf_nat_h323: call set_h225_addr instead of set_h225_addr_hook They're the same. Signed-off-by: Jing Min Zhao <zhaojingmin@vivecode.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-24 16:44:40 -07:00
Patrick McHardy	25b86e0546	[NETFILTER]: nf_conntrack_ftp: fix newline sequence number calculation When the packet size is changed by the FTP NAT helper, the connection tracking helper adjusts the sequence number of the newline character by the size difference. This is wrong because NAT sequence number adjustment happens after helpers are called, so the unadjusted number is compared to the already adjusted one. Based on report by YU, Haitao <yuhaitao@tsinghua.org.cn> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-24 16:41:50 -07:00
Milan Kocian	b8f5583135	[RTNETLINK]: Fix sending netlink message when replace route. When you replace route via ip r r command the netlink multicast message is not send. This patch corrects it. NL message is sent with NLM_F_REPLACE flag. Addresses http://bugzilla.kernel.org/show_bug.cgi?id=8320 Signed-off-by: Milan Kocian <milon@wq.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-24 16:36:53 -07:00
Jan Engelhardt	a6938a1e0e	[IPVS]: Use menuconfig objects. Use menuconfigs instead of menus, so the whole menu can be disabled at once instead of going through all options. Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-24 16:36:47 -07:00
Patrick McHardy	d8cf27287a	[IPV4]: icmp: fix crash with sysctl_icmp_errors_use_inbound_ifaddr When icmp_send is called on the local output path before the packet hits ip_output, skb->dev is not set, causing a crash when sysctl_icmp_errors_use_inbound_ifaddr is set. This can happen with the netfilter REJECT target or IPsec tunnels. Let routing decide the ICMP source address in that case, since the packet is locally generated there is no inbound interface and the sysctl should not apply. The option actually seems to be unfixable broken, on the path after ip_output() skb->dev points to the outgoing device and we don't know the incoming device anymore, so its going to do the absolute wrong thing and pick the address of the outgoing interface. Add a comment about this. Reported by Curtis Doty <Curtis@GreenKey.net>. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-19 14:44:15 -07:00
Patrick McHardy	3ad2a6fb6b	[NETFILTER]: nf_conntrack_ipv4: fix incorrect #ifdef config name The option is named CONFIG_NF_NAT not CONFIG_IP_NF_NAT. Remove the ifdef completely since helpers also expect defragmented packet even without NAT. Noticed by Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-19 14:24:16 -07:00
Ilpo Järvinen	580e572a4a	[TCP] FRTO: Prevent state inconsistency in corner cases State could become inconsistent in two cases: 1) Userspace disabled FRTO by tuning sysctl when one of the TCP flows was in the middle of FRTO algorithm (and then RTO is again triggered) 2) SACK reneging occurs during FRTO algorithm A simple solution is just to abort the previous FRTO when such obscure condition occurs... Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-19 13:56:57 -07:00
Ilpo Järvinen	463236557d	[TCP] FRTO: Add missing ECN CWR sending to one of the responses The conservative spurious RTO response did not queue CWR even though the sending rate was lowered. Whenever reduction happens regardless of reason, CWR should be sent (forgetting to send it is not very fatal though). A better approach would be to queue CWR when one of the sending rate reducing responses (rate-halving one or this conservative response) is used already at RTO. Doing that would allow CWR to be sent along with the two new data segments that are sent during FRTO. However, it's a bit "racy" because userland could tune the response sysctl to a more aggressive one in between. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-19 13:56:23 -07:00
David S. Miller	f6c5d736af	[IPV4]: Remove IPVS icmp hack from route.c for now. Revert: `2d771cd86d` This is dangerous if enabled and a better solution to the problem is being worked on. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-18 02:07:50 -07:00
Dave Jones	d739437207	[IPV4]: Correct rp_filter help text. As mentioned in http://bugzilla.kernel.org/show_bug.cgi?id=5015 The helptext implies that this is on by default. This may be true on some distros (Fedora/RHEL have it enabled in /etc/sysctl.conf), but the kernel defaults to it off. Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-17 15:02:21 -07:00
David S. Miller	2ff011efa4	[TCP]: TCP_CONG_YEAH requires TCP_CONG_VEGAS These two congestion control modules share code. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-17 14:20:32 -07:00
Stephen Hemminger	a02ba04166	[TCP] slow start: Make comments and code logic clearer. Add more comments to describe our version of tcp_slow_start(). Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-17 14:20:31 -07:00
Mitsuru Chinen	d831666e98	[IPV4] SNMP: Display new statistics at /proc/net/netstat This displays the statistics specified in the updated IP-MIB RFC (RFC4293) in /proc/net/netstat. The reason why these are not displayed in /proc/net/snmp is that some existing utilities are developed under the assumption which ipstat items in /proc/net/snmp is unchanged. Signed-off-by: Mitsuru Chinen <mitch@linux.vnet.ibm.com> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-14 03:07:30 -07:00
Patrick McHardy	802169a4b0	[NETFILTER]: iptable_raw: ignore short packets sent by SOCK_RAW sockets iptables matches and targets expect packets to have at least a full IP header and a valid header length. Ignore packets sent through raw sockets for which this isn't true as in the other tables. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-10 23:47:59 -07:00
Patrick McHardy	4a176c1a61	[NETFILTER]: iptable_{filter,mangle}: more descriptive "happy cracking" message Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-10 23:47:59 -07:00
Yasuyuki Kozakai	ba4c7cbadd	[NETFILTER]: nf_nat: remove unused argument of function allocating binding nf_nat_rule_find, alloc_null_binding and alloc_null_binding_confirmed do not use the argument 'info', which is actually ct->nat.info. If they are necessary to access it again, we can use the argument 'ct' instead. Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-10 23:47:44 -07:00
Patrick McHardy	3c2ad469c3	[NETFILTER]: Clean up table initialization - move arp_tables initial table structure definitions to arp_tables.h similar to ip_tables and ip6_tables - use C99 initializers - use initializer macros where possible Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-10 23:47:43 -07:00
David S. Miller	fc038410b4	[UDP]: Fix AF-specific references in AF-agnostic code. __udp_lib_port_inuse() cannot make direct references to inet_sk(sk)->rcv_saddr as that is ipv4 specific state and this code is used by ipv6 too. Use an operations vector to solve this, and this also paves the way for ipv6 support for non-wild saddr hashing in UDP. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-10 23:47:22 -07:00
Linus Torvalds	9a9136e270	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial * git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (25 commits) sound: convert "sound" subdirectory to UTF-8 MAINTAINERS: Add cxacru website/mailing list include files: convert "include" subdirectory to UTF-8 general: convert "kernel" subdirectory to UTF-8 documentation: convert the Documentation directory to UTF-8 Convert the toplevel files CREDITS and MAINTAINERS to UTF-8. remove broken URLs from net drivers' output Magic number prefix consistency change to Documentation/magic-number.txt trivial: s/i_sem /i_mutex/ fix file specification in comments drivers/base/platform.c: fix small typo in doc misc doc and kconfig typos Remove obsolete fat_cvf help text Fix occurrences of "the the " Fix minor typoes in kernel/module.c Kconfig: Remove reference to external mqueue library Kconfig: A couple of grammatical fixes in arch/i386/Kconfig Correct comments in genrtc.c to refer to correct /proc file. Fix more "deprecated" spellos. Fix "deprecated" typoes. ... Fix trivial comment conflict in kernel/relay.c.	2007-05-09 12:54:17 -07:00
Oleg Nesterov	28e53bddf8	unify flush_work/flush_work_keventd and rename it to cancel_work_sync flush_work(wq, work) doesn't need the first parameter, we can use cwq->wq (this was possible from the very beginnig, I missed this). So we can unify flush_work_keventd and flush_work. Also, rename flush_work() to cancel_work_sync() and fix all callers. Perhaps this is not the best name, but "flush_work" is really bad. (akpm: this is why the earlier patches bypassed maintainers) Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Jeff Garzik <jeff@garzik.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Tejun Heo <htejun@gmail.com> Cc: Auke Kok <auke-jan.h.kok@intel.com>, Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:53 -07:00
Oleg Nesterov	c214b2cc5f	ipvs: flush defense_work before module unload net/ipv4/ipvs/ip_vs_core.c module_exit ip_vs_cleanup ip_vs_control_cleanup cancel_rearming_delayed_work // done This is unsafe. The module may be unloaded and the memory may be freed while defense_work's handler is still running/preempted. Do flush_work(&defense_work.work) after cancel_rearming_delayed_work(). Alternatively, we could add flush_work() to cancel_rearming_delayed_work(), but note that we can't change cancel_delayed_work() in the same manner because it may be called from atomic context. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:52 -07:00
Michael Opdenacker	59c51591a0	Fix occurrences of "the the " Signed-off-by: Michael Opdenacker <michael@free-electrons.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2007-05-09 08:57:56 +02:00
David Sterba	3dde6ad8fc	Fix trivial typos in Kconfig* files Fix several typos in help text in Kconfig* files. Signed-off-by: David Sterba <dave@jikos.cz> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2007-05-09 07:12:20 +02:00
Randy Dunlap	e63340ae6b	header cleaning: don't include smp_lock.h when not used Remove includes of <linux/smp_lock.h> where it is not used/needed. Suggested by Al Viro. Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc, sparc64, and arm (all 59 defconfigs). Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:07 -07:00
Srinivas Aji	b40b4f79ce	[TCP]: zero out rx_opt in tcp_disconnect() When the server drops its connection, NFS client reconnects using the same socket after disconnecting. If the new connection's SYN,ACK doesn't contain the TCP timestamp option and the old connection's did, tp->tcp_header_len is recomputed assuming no timestamp header but tp->rx_opt.tstamp_ok remains set. Then tcp_build_and_update_options() adds in a timestamp option past the end of the allocated TCP header, overwriting TCP data, or when the data is in skb_shinfo(skb)->frags[], overwriting skb_shinfo(skb) causing a crash soon after. (The issue was debugged from such a crash.) Similarly, wscale_ok and sack_ok also get set based on the SYN,ACK packet but not reset on disconnect, since they are zeroed out at initialization. The patch zeroes out the entire tp->rx_opt struct in tcp_disconnect() to avoid this sort of problem. Signed-off-by: Srinivas Aji <Aji_Srinivas@emc.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-03 17:32:28 -07:00
Pavel Emelianov	7562f876cd	[NET]: Rework dev_base via list_head (v3) Cleanup of dev_base list use, with the aim to simplify making device list per-namespace. In almost every occasion, use of dev_base variable and dev->next pointer could be easily replaced by for_each_netdev loop. A few most complicated places were converted to using first_netdev()/next_netdev(). Signed-off-by: Pavel Emelianov <xemul@openvz.org> Acked-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-03 15:13:45 -07:00
Ilpo Järvinen	03fba04796	[TCP] Highspeed: Limited slow-start is nowadays in tcp_slow_start Reuse limited slow-start (RFC3742) included into tcp_cong instead of having another implementation in High Speed TCP. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-03 13:28:35 -07:00
Herbert Xu	cfd6c38096	[NETFILTER]: sip: Fix RTP address NAT I needed to use this recently to talk to a Cisco server. In my case I only did SNAT while the Cisco server used a different address for RTP traffic than the one for SIP. I discovered that nf_nat_sip NATed the RTP address to the SIP one which was unnecessary but OK. However, in doing so it did not DNAT the destination address on the RTP traffic to the Cisco back to the original RTP address. This patch corrects this by noting down the RTP address and using it when the expectation fires. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-03 03:35:31 -07:00
Jorge Boncompte	c2a1910b06	[NETFILTER]: nf_nat_proto_gre: do not modify/corrupt GREv0 packets through NAT While porting some changes of the 2.6.21-rc7 pptp/proto_gre conntrack and nat modules to a 2.4.32 kernel I noticed that the gre_key function returns a wrong pointer to the GRE key of a version 0 packet thus corrupting the packet payload. The intended behaviour for GREv0 packets is to act like nf_conntrack_proto_generic/nf_nat_proto_unknown so I have ripped the offending functions (not used anymore) and modified the nf_nat_proto_gre modules to not touch version 0 (non PPTP) packets. Signed-off-by: Jorge Boncompte <jorge@dti2.net> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-03 03:34:42 -07:00
Patrick McHardy	327850070b	[NETFILTER]: ipt_DNAT: accept port randomization option Also accept the --random option for DNAT to allow randomly selecting a destination port from the given range. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-03 03:34:03 -07:00
Robert P. J. Day	825e7d45cf	[TCP]: Delete unused header file net/ipv4/tcp_yeah.h. Delete the apparently unused header file net/ipv4/tcp_yeah.h. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-03 03:13:35 -07:00
David S. Miller	de34ed91c4	[UDP]: Do not allow specific bind when wildcard bind exists. When allocating local ports, do not allow a bind to a port with a specific local address when a bind to that port with a wildcard local address already exists. Noticed by Linus. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-30 14:51:58 -07:00
David S. Miller	b7b5f487ab	[IPV4] UDP: Fix endianness bugs in hashing changes. I accidently applied an earlier version of Eric Dumazet's patch, from March 21st. His version from March 30th didn't have these bugs, so this just interdiffs to the correct patch. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-30 13:35:29 -07:00
Mitsuru Chinen	80787ebc2b	[IPV4] SNMP: Support OutMcastPkts and OutBcastPkts A transmitted IP multicast datagram should be counted as OutMcastPkts. By the same token, a transmitted IP broadcast datagram should be counted as OutBcastPkts. Signed-off-by: Mitsuru Chinen <mitch@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-30 00:58:32 -07:00
Mitsuru Chinen	5506b54b36	[IPV4] SNMP: Support InMcastPkts and InBcastPkts A received IP multicast datagram should be counted as InMcastPkts. By the same token, a received IP broadcast datagram should be counted as InBcastPkts. Signed-off-by: Mitsuru Chinen <mitch@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-30 00:58:29 -07:00
Mitsuru Chinen	704aed53b4	[IPV4] SNMP: Support InTruncatedPkts An IP datagram which is being discarded because the datagram frame didn't carry enough data should be counted as InTruncatedPkts. Signed-off-by: Mitsuru Chinen <mitch@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-30 00:58:26 -07:00
Mitsuru Chinen	e91a47ebb1	[IPV4] SNMP: Support InNoRoutes An IP datagram which is being discarded because of no routes in the forwarding path should be counted as InNoRoutes. Signed-off-by: Mitsuru Chinen <mitch@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-30 00:58:22 -07:00
Ilpo Järvinen	d551e4541d	[TCP] FRTO: RFC4138 allows Nagle override when new data must be sent This is a corner case where less than MSS sized new data thingie is awaiting in the send queue. For F-RTO to work correctly, a new data segment must be sent at certain point or F-RTO cannot be used at all. RFC4138 allows overriding of Nagle at that point. Implementation uses frto_counter states 2 and 3 to distinguish when Nagle override is needed. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-30 00:58:16 -07:00
Ilpo Järvinen	575ee7140d	[TCP] FRTO: Delay skb available check until it's mandatory No new data is needed until the first ACK comes, so no need to check for application limitedness until then. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-30 00:58:12 -07:00
Eric Dumazet	6aaf47fa48	[PATCH] INET : IPV4 UDP lookups converted to a 2 pass algo Some people want to have many UDP sockets, binded to a single port but many different addresses. We currently hash all those sockets into a single chain. Processing of incoming packets is very expensive, because the whole chain must be examined to find the best match. I chose in this patch to hash UDP sockets with a hash function that take into account both their port number and address : This has a drawback because we need two lookups : one with a given address, one with a wildcard (null) address. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-30 00:26:00 -07:00
Gerrit Renker	65bb723c95	[TCP]: Update references in two old comments This updates references to drafts in comments which must be about 10 years old. Internet draft draft-ietf-tcpimpl-prob-03.txt expired in 1998 and was replaced by RFC 2525 in March 1999. Section 3.10 of the draft maps almost identically into section 2.17 of RFC 2525: both are entitled "Failure to RST on close with data pending", the differences in text body amount to a typo and minor sentence change. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-28 21:21:46 -07:00
Linus Torvalds	a205752d1a	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6: selinux: preserve boolean values across policy reloads selinux: change numbering of boolean directory inodes in selinuxfs selinux: remove unused enumeration constant from selinuxfs selinux: explicitly number all selinuxfs inodes selinux: export initial SID contexts via selinuxfs selinux: remove userland security class and permission definitions SELinux: move security_skb_extlbl_sid() out of the security server MAINTAINERS: update selinux entry SELinux: rename selinux_netlabel.h to netlabel.h SELinux: extract the NetLabel SELinux support from the security server NetLabel: convert a BUG_ON in the CIPSO code to a runtime check NetLabel: cleanup and document CIPSO constants	2007-04-27 10:47:29 -07:00
Sergey Vlasov	912a41a4ab	[IPV4] nl_fib_lookup: Initialise res.r before fib_res_put(&res) When CONFIG_IP_MULTIPLE_TABLES is enabled, the code in nl_fib_lookup() needs to initialize the res.r field before fib_res_put(&res) - unlike fib_lookup(), a direct call to ->tb_lookup does not set this field. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-27 02:17:19 -07:00
Paul Moore	128c6b6cbf	NetLabel: convert a BUG_ON in the CIPSO code to a runtime check This patch changes a BUG_ON in the CIPSO code to a runtime check. It should also increase the readability of the code as it replaces an unexplained constant with a well defined macro. Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: James Morris <jmorris@namei.org>	2007-04-26 01:35:47 -04:00
Paul Moore	f998e8cb52	NetLabel: cleanup and document CIPSO constants This patch collects all of the CIPSO constants and puts them in one place; it also documents each value explaining how the value is derived. Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: James Morris <jmorris@namei.org>	2007-04-26 01:35:45 -04:00
YOSHIFUJI Hideaki	5056a1ef9e	[IPV4] IP_GRE: Unify code path to get hash array index. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2007-04-25 22:29:56 -07:00
YOSHIFUJI Hideaki	87d1a164df	[IPV4] IPIP: Unify code path to get hash array index. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2007-04-25 22:29:55 -07:00
Herbert Xu	5e0f04351d	[IPV4]: Consolidate common SNMP code This patch moves the SNMP code shared between IPv4/IPv6 from proc.c into net/ipv4/af_inet.c. This makes sense because these functions aren't specific to /proc. As a result we can again skip proc.o if /proc is disabled. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:51 -07:00
YOSHIFUJI Hideaki	bb7ec6dfb5	[IPV4]: Fix build without procfs. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:50 -07:00
YOSHIFUJI Hideaki	84299b3bc4	[TCP]: Fix linkage errors on i386. To avoid raw division, use ktime_to_timeval() to get usec. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:49 -07:00
Stephen Hemminger	7752237e9f	[TCP] TCP YEAH: Use vegas dont copy it. Rather than using a copy of vegas code, the YEAH code should just have it exported so there is common code. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:46 -07:00
Stephen Hemminger	164891aadf	[TCP]: Congestion control API update. Do some simple changes to make congestion control API faster/cleaner. * use ktime_t rather than timeval * merge rtt sampling into existing ack callback this means one indirect call versus two per ack. * use flags bits to store options/settings Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:45 -07:00
Stephen Hemminger	65d1b4a7e7	[TCP]: TCP Illinois update. This version more closely matches the paper, and fixes several math errors. The biggest difference is that it updates alpha/beta once per RTT Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:44 -07:00
Ilpo Järvinen	9e412ba763	[TCP]: Sed magic converts func(sk, tp, ...) -> func(sk, ...) This is (mostly) automated change using magic: sed -e '/struct sock \sk/ N' -e '/struct sock \sk/ N' -e '/struct sock \sk/ N' -e '/struct sock \sk/ N' -e 's\|struct sock \sk,[\n\t ]struct tcp_sock \tp$[^{]\n{\n$\| struct sock \sk\1\tstruct tcp_sock tp = tcp_sk(sk);\n\|g' -e 's\|struct sock \sk, struct tcp_sock \tp\| struct sock \*sk\|g' -e 's\|sk, tp$[^-]$\|sk\1\|g' Fixed four unused variable (tp) warnings that were introduced. In addition, manually added newlines after local variables and tweaked function arguments positioning. $ gcc --version gcc (GCC) 4.1.1 20060525 (Red Hat 4.1.1-1) ... $ codiff -fV built-in.o.old built-in.o.new net/ipv4/route.c: rt_cache_flush \| +14 1 function changed, 14 bytes added net/ipv4/tcp.c: tcp_setsockopt \| -5 tcp_sendpage \| -25 tcp_sendmsg \| -16 3 functions changed, 46 bytes removed net/ipv4/tcp_input.c: tcp_try_undo_recovery \| +3 tcp_try_undo_dsack \| +2 tcp_mark_head_lost \| -12 tcp_ack \| -15 tcp_event_data_recv \| -32 tcp_rcv_state_process \| -10 tcp_rcv_established \| +1 7 functions changed, 6 bytes added, 69 bytes removed, diff: -63 net/ipv4/tcp_output.c: update_send_head \| -9 tcp_transmit_skb \| +19 tcp_cwnd_validate \| +1 tcp_write_wakeup \| -17 __tcp_push_pending_frames \| -25 tcp_push_one \| -8 tcp_send_fin \| -4 7 functions changed, 20 bytes added, 63 bytes removed, diff: -43 built-in.o.new: 18 functions changed, 40 bytes added, 178 bytes removed, diff: -138 Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:34 -07:00
Andi Kleen	4ac02bab77	[TCP]: Uninline tcp_done(). The function is quite big and has several call sites and nothing to collapse by compiler optimization on inlining. Besides it's nicer to read in a in .c file. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:25 -07:00
Stephen Hemminger	3ff50b7997	[NET]: cleanup extra semicolons Spring cleaning time... There seems to be a lot of places in the network code that have extra bogus semicolons after conditionals. Most commonly is a bogus semicolon after: switch() { } Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:24 -07:00
Stephen Hemminger	c462238d6a	[TCP]: TCP Illinois congestion control (rev3) This is an implementation of TCP Illinois invented by Shao Liu at University of Illinois. It is a another variant of Reno which adapts the alpha and beta parameters based on RTT. The basic idea is to increase window less rapidly as delay approaches the maximum. See the papers and talks to get a more complete description. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:23 -07:00
YOSHIFUJI Hideaki	334901700f	[IPV4] SNMP: Move some statistic bits to net/ipv4/proc.c. This also fixes memory leak in error path. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:12 -07:00
John Heffner	628a5c5618	[INET]: Add IP(V6)_PMTUDISC_RPOBE Add IP(V6)_PMTUDISC_PROBE value for IP(V6)_MTU_DISCOVER. This option forces us not to fragment, but does not make use of the kernel path MTU discovery. That is, it allows for user-mode MTU probing (or, packetization-layer path MTU discovery). This is particularly useful for diagnostic utilities, like traceroute/tracepath. Signed-off-by: John Heffner <jheffner@psc.edu> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:10 -07:00
Patrick McHardy	6313c1e099	[RTNETLINK]: Remove unnecessary locking in dump callbacks Since we're now holding the rtnl during the entire dump operation, we can remove additional locking for rtnl protected data. This patch does that for all simple cases (dev_base_lock for dev_base walking, RCU protection for FIB rule dumping). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:05 -07:00
Patrick McHardy	af65bdfce9	[NETLINK]: Switch cb_lock spinlock to mutex and allow to override it Switch cb_lock to mutex and allow netlink kernel users to override it with a subsystem specific mutex for consistent locking in dump callbacks. All netlink_dump_start users have been audited not to rely on any side-effects of the previously used spinlock. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:03 -07:00
Patrick McHardy	b076deb849	[NETFILTER]: ipt_ULOG: add compat conversion functions Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:02 -07:00
Patrick McHardy	3b5018d676	[NETFILTER]: {eb,ip6,ip}t_LOG: remove remains of LOG target overloading All LOG targets always use their internal logging function nowadays, so remove the incorrect error message and handle real errors (!= -EEXIST) by failing to load. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:00 -07:00
Patrick McHardy	fe6092ea00	[NETFILTER]: nf_nat: use HW checksumming when possible When mangling packets forwarded to a HW checksumming capable device, offload recalculation of the checksum instead of doing it in software. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:28:59 -07:00
Herbert Xu	604763722c	[NET]: Treat CHECKSUM_PARTIAL as CHECKSUM_UNNECESSARY When a transmitted packet is looped back directly, CHECKSUM_PARTIAL maps to the semantics of CHECKSUM_UNNECESSARY. Therefore we should treat it as such in the stack. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:28:43 -07:00
Herbert Xu	663ead3bb8	[NET]: Use csum_start offset instead of skb_transport_header The skb transport pointer is currently used to specify the start of the checksum region for transmit checksum offload. Unfortunately, the same pointer is also used during receive side processing. This creates a problem when we want to retransmit a received packet with partial checksums since the skb transport pointer would be overwritten. This patch solves this problem by creating a new 16-bit csum_start offset value to replace the skb transport header for the purpose of checksums. This offset is calculated from skb->head so that it does not have to change when skb->data changes. No extra space is required since csum_offset itself fits within a 16-bit word so we can use the other 16 bits for csum_start. For backwards compatibility, just before we push a packet with partial checksums off into the device driver, we set the skb transport header to what it would have been under the old scheme. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:28:40 -07:00
Patrick McHardy	ac758e3c55	[XFRM]: beet: fix worst case header_len calculation esp_init_state doesn't account for the beet pseudo header in the header_len calculation, which may result in undersized skbs hitting xfrm4_beet_output, causing unnecessary reallocations in ip_finish_output2. The skbs should still always have enough room to avoid causing skb_under_panic in skb_push since we have at least 16 bytes available from LL_RESERVED_SPACE in xfrm_state_check_space. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:28:39 -07:00
Patrick McHardy	c5c2523893	[XFRM]: Optimize MTU calculation Replace the probing based MTU estimation, which usually takes 2-3 iterations to find a fitting value and may underestimate the MTU, by an exact calculation. Also fix underestimation of the XFRM trailer_len, which causes unnecessary reallocations. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:28:38 -07:00
Patrick McHardy	557922584d	[XFRM]: esp: fix skb_tail_pointer conversion bug Fix incorrect switch of "trailer" skb by "skb" during skb_tail_pointer conversion: - (u8)(trailer->tail - 1) = top_iph->protocol; + (skb_tail_pointer(skb) - 1) = top_iph->protocol; - (u8 )(trailer->tail - 1) = skb_network_header(skb); + (skb_tail_pointer(skb) - 1) = skb_network_header(skb); Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:28:37 -07:00
Patrick McHardy	ea2f10a3c8	[XFRM]: beet: minor cleanups Remove unnecessary initialization/variable. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:28:34 -07:00
Arnaldo Carvalho de Melo	1a4e2d093f	[SK_BUFF]: Some more conversions to skb_copy_from_linear_data Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>	2007-04-25 22:28:30 -07:00
Arnaldo Carvalho de Melo	27d7ff46a3	[SK_BUFF]: Introduce skb_copy_to_linear_data{_offset} To clearly state the intent of copying to linear sk_buffs, _offset being a overly long variant but interesting for the sake of saving some bytes. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>	2007-04-25 22:28:29 -07:00
Thomas Graf	4b19ca44cb	[NET] fib_rules: delay route cache flush by ip_rt_min_delay Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:28:24 -07:00
Arnaldo Carvalho de Melo	d626f62b11	[SK_BUFF]: Introduce skb_copy_from_linear_data{_offset} To clearly state the intent of copying from linear sk_buffs, _offset being a overly long variant but interesting for the sake of saving some bytes. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2007-04-25 22:28:23 -07:00
Eric Dumazet	03d4f879b9	[IPV4]: align inet_protos[] on SMP As IPPROTO_TCP is 6, it makes sense to make sure inet_protos[] array is properly cache line aligned to avoid false sharing on SMP. c0680540 b peer_total c0680544 b inet_peer_unused_head c0680560 B inet_protos On i386 this example, we can see that inet_protos[IPPROTO_TCP] shares a potentially hot (and modified) cache line. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:28:20 -07:00
Eric Dumazet	4103f8cd5c	[TCP]: tcp_memory_pressure and tcp_socket are__read_mostly candidates tcp_memory_pressure and tcp_socket currently share a cache line with tcp_memory_allocated, tcp_sockets_allocated. (Very hot cache line) It makes sense to declare these variables as __read_mostly, to avoid false sharing on SMP. ffffffff8081d9c0 B tcp_orphan_count ffffffff8081d9c4 B tcp_memory_allocated ffffffff8081d9c8 B tcp_sockets_allocated ffffffff8081d9cc B tcp_memory_pressure ffffffff8081d9d0 b tcp_md5sig_users ffffffff8081d9d8 b tcp_md5sig_pool ffffffff8081d9e0 b warntime.31570 ffffffff8081d9e8 b tcp_socket Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:28:19 -07:00
Thomas Graf	73417f617a	[NET] fib_rules: Flush route cache after rule modifications The results of FIB rules lookups are cached in the routing cache except for IPv6 as no such cache exists. So far, it was the responsibility of the user to flush the cache after modifying any rules. This lead to many false bug reports due to misunderstanding of this concept. This patch automatically flushes the route cache after inserting or deleting a rule. Thanks to Muli Ben-Yehuda <muli@il.ibm.com> for catching a bug in the previous patch. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:28:18 -07:00
Eric Dumazet	be776281ae	[NET]: inet_ehash_secret should be __read_mostly and set only once There is a very tiny probability that build_ehash_secret() is called at the same time by different CPUS. Also, using __read_mostly is a must for inet_ehash_secret Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:28:17 -07:00
Herbert Xu	35fc92a9de	[NET]: Allow forwarding of ip_summed except CHECKSUM_COMPLETE Right now Xen has a horrible hack that lets it forward packets with partial checksums. One of the reasons that CHECKSUM_PARTIAL and CHECKSUM_COMPLETE were added is so that we can get rid of this hack (where it creates two extra bits in the skbuff to essentially mirror ip_summed without being destroyed by the forwarding code). I had forgotten that I've already gone through all the deivce drivers last time around to make sure that they're looking at ip_summed == CHECKSUM_PARTIAL rather than ip_summed != 0 on transmit. In any case, I've now done that again so it should definitely be safe. Unfortunately nobody has yet added any code to update CHECKSUM_COMPLETE values on forward so we I'm setting that to CHECKSUM_NONE. This should be safe to remove for bridging but I'd like to check that code path first. So here is the patch that lets us get rid of the hack by preserving ip_summed (mostly) on forwarded packets. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:28:16 -07:00
Janusz Krzysztofik	2d771cd86d	[IPV4] LVS: Allow to send ICMP unreachable responses when real-servers are removed this is a small patch by Janusz Krzysztofik to ip_route_output_slow() that allows VIP-less LVS linux director to generate packets originating >From VIP if sysctl_ip_nonlocal_bind is set. In a nutshell, the intention is for an LVS linux director to be able to send ICMP unreachable responses to end-users when real-servers are removed. http://archive.linuxvirtualserver.org/html/lvs-users/2007-01/msg00106.html Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:28:15 -07:00
Stephen Hemminger	85795d64ed	[TCP] tcp_probe: improvements for net-2.6.22 Change tcp_probe to use ktime (needed to add one export). Add option to only get events when cwnd changes - from Doug Leith Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:28:10 -07:00
Stephen Hemminger	e1c3e7ab6d	[TCP]: cubic update for net-2.6.22 The following update received from Injong updates TCP cubic to the latest version. I am running more complete tests and will have results after 4/1. According to Injong: the new version improves on its scalability, fairness and stability. So in all properties, we confirmed it shows better performance. NCSU results (for 2.6.18 and 2.6.20) available: http://netsrv.csc.ncsu.edu/wiki/index.php/TCP_Testing This version is described in a new Internet draft for CUBIC. http://www.ietf.org/internet-drafts/draft-rhee-tcp-cubic-00.txt Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:28:09 -07:00
John Heffner	9af3912ec9	[NET] Move DF check to ip_forward Do fragmentation check in ip_forward, similar to ipv6 forwarding. Signed-off-by: John Heffner <jheffner@psc.edu> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:28:07 -07:00
David S. Miller	b3da2cf37c	[INET]: Use jhash + random secret for ehash. The days are gone when this was not an issue, there are folks out there with huge bot networks that can be used to attack the established hash tables on remote systems. So just like the routing cache and connection tracking hash, use Jenkins hash with random secret input. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:28:06 -07:00
Patrick McHardy	e6f689db51	[NETFILTER]: Use setup_timer Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:43 -07:00
Patrick McHardy	1b53d9042c	[NETFILTER]: Remove changelogs and CVS IDs Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:35 -07:00
Thomas Graf	c702e8047f	[NETLINK]: Directly return -EINTR from netlink_dump_start() Now that all users of netlink_dump_start() use netlink_run_queue() to process the receive queue, it is possible to return -EINTR from netlink_dump_start() directly, therefore simplying the callers. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:33 -07:00
Thomas Graf	ead592ba24	[IPv4] diag: Use netlink_run_queue() to process the receive queue Makes use of netlink_run_queue() to process the receive queue and converts inet_diag_rcv_msg() to use the type safe netlink interface. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:32 -07:00
Thomas Graf	267281058c	[TCP] westwood: Use type safe netlink interface Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:26 -07:00
Thomas Graf	e9195d677d	[TCP] vegas: Use type safe netlink interface Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:26 -07:00
Stephen Hemminger	7e58886b45	[TCP]: cubic optimization Use willy's work in optimizing cube root by having table for small values. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:19 -07:00
Thomas Graf	c454673da7	[NET] rules: Unified rules dumping Implements a unified, protocol independant rules dumping function which is capable of both, dumping a specific protocol family or all of them. This speeds up dumping as less lookups are required. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:17 -07:00
Thomas Graf	63f3444fb9	[IPv4]: Use rtnl registration interface Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:08 -07:00
Arnaldo Carvalho de Melo	dc5fc579b9	[NETLINK]: Use nlmsg_trim() where appropriate Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:37 -07:00
Arnaldo Carvalho de Melo	b529ccf279	[NETLINK]: Introduce nlmsg_hdr() helper For the common "(struct nlmsghdr *)skb->data" sequence, so that we reduce the number of direct accesses to skb->data and for consistency with all the other cast skb member helpers. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:34 -07:00
Robert Olsson	965ffea43d	[IPV4]: fib_trie root node settings The threshold for root node can be more aggressive set to get better tree compression. The new setting mekes the root grow from 16 to 19 bits and substansial improvemnt in Aver depth this with the current table of 214393 prefixes But really the dynamic resize should need more investigation both in terms convergence and performance and maybe it should be possible to change... Maybe just for the brave to start with or we may have to back this out.	2007-04-25 22:26:32 -07:00
Robert Olsson	05eee48c5a	[IPV4]: fib_trie resize break The patch below adds break condition for the resize operations. If we don't achieve the desired fill factor a warning is printed. Trie should still be operational but new thresholds should be considered. Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:32 -07:00
Arnaldo Carvalho de Melo	27a884dc3c	[SK_BUFF]: Convert skb->tail to sk_buff_data_t So that it is also an offset from skb->head, reduces its size from 8 to 4 bytes on 64bit architectures, allowing us to combine the 4 bytes hole left by the layer headers conversion, reducing struct sk_buff size to 256 bytes, i.e. 4 64byte cachelines, and since the sk_buff slab cache is SLAB_HWCACHE_ALIGN... :-) Many calculations that previously required that skb->{transport,network, mac}_header be first converted to a pointer now can be done directly, being meaningful as offsets or pointers. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:28 -07:00
Arnaldo Carvalho de Melo	2e07fa9cd3	[SK_BUFF]: Use offsets for skb->{mac,network,transport}_header on 64bit architectures With this we save 8 bytes per network packet, leaving a 4 bytes hole to be used in further shrinking work, likely with the offsetization of other pointers, such as ->{data,tail,end}, at the cost of adds, that were minimized by the usual practice of setting skb->{mac,nh,n}.raw to a local variable that is then accessed multiple times in each function, it also is not more expensive than before with regards to most of the handling of such headers, like setting one of these headers to another (transport to network, etc), or subtracting, adding to/from it, comparing them, etc. Now we have this layout for sk_buff on a x86_64 machine: [acme@mica net-2.6.22]$ pahole vmlinux sk_buff struct sk_buff { struct sk_buff * next; /* 0 8 / struct sk_buff prev; /* 8 8 / struct rb_node rb; / 16 24 / struct sock sk; /* 40 8 / ktime_t tstamp; / 48 8 / struct net_device dev; /* 56 8 / / --- cacheline 1 boundary (64 bytes) --- / struct net_device input_dev; /* 64 8 / sk_buff_data_t transport_header; / 72 4 / sk_buff_data_t network_header; / 76 4 / sk_buff_data_t mac_header; / 80 4 / / XXX 4 bytes hole, try to pack / struct dst_entry dst; /* 88 8 / struct sec_path sp; /* 96 8 / char cb[48]; / 104 48 / / cacheline 2 boundary (128 bytes) was 24 bytes ago/ unsigned int len; / 152 4 / unsigned int data_len; / 156 4 / unsigned int mac_len; / 160 4 / union { __wsum csum; / 4 / __u32 csum_offset; / 4 / }; / 164 4 / __u32 priority; / 168 4 / __u8 local_df:1; / 172 1 / __u8 cloned:1; / 172 1 / __u8 ip_summed:2; / 172 1 / __u8 nohdr:1; / 172 1 / __u8 nfctinfo:3; / 172 1 / __u8 pkt_type:3; / 173 1 / __u8 fclone:2; / 173 1 / __u8 ipvs_property:1; / 173 1 / / XXX 2 bits hole, try to pack / __be16 protocol; / 174 2 / void (destructor)(struct sk_buff ); / 176 8 / struct nf_conntrack nfct; /* 184 8 / / --- cacheline 3 boundary (192 bytes) --- / struct sk_buff nfct_reasm; /* 192 8 / struct nf_bridge_info nf_bridge; /* 200 8 / __u16 tc_index; / 208 2 / __u16 tc_verd; / 210 2 / dma_cookie_t dma_cookie; / 212 4 / __u32 secmark; / 216 4 / __u32 mark; / 220 4 / unsigned int truesize; / 224 4 / atomic_t users; / 228 4 / unsigned char head; /* 232 8 / unsigned char data; /* 240 8 / unsigned char tail; /* 248 8 / / --- cacheline 4 boundary (256 bytes) --- / unsigned char end; /* 256 8 / }; / size: 264, cachelines: 5 / / sum members: 260, holes: 1, sum holes: 4 / / bit holes: 1, sum bit holes: 2 bits / / last cacheline: 8 bytes */ On 32 bits nothing changes, and pointers continue to be used with the compiler turning all this abstraction layer into dust. But there are some sk_buff validation tricks that are now possible, humm... :-) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:21 -07:00
Arnaldo Carvalho de Melo	b0e380b1d8	[SK_BUFF]: unions of just one member don't get anything done, kill them Renaming skb->h to skb->transport_header, skb->nh to skb->network_header and skb->mac to skb->mac_header, to match the names of the associated helpers (skb[_[re]set]_{transport,network,mac}_header). Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:20 -07:00
Arnaldo Carvalho de Melo	cfe1fc7759	[SK_BUFF]: Introduce skb_network_header_len For the common sequence "skb->h.raw - skb->nh.raw", similar to skb->mac_len, that is precalculated tho, don't think we need to bloat skb with one more member, so just use this new helper, reducing the number of non-skbuff.h references to the layer headers even more. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:19 -07:00
Arnaldo Carvalho de Melo	bff9b61ce3	[SK_BUFF]: Use the helpers to get the layer header pointer Some more cases... Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:18 -07:00
Arnaldo Carvalho de Melo	ddc7b8e32b	[SK_BUFF]: Some more layer header conversions Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:03 -07:00
Arnaldo Carvalho de Melo	d10ba34b00	[SK_BUFF]: More skb_put related skb_reset_transport_header This time we have to set it to skb->tail that is not anymore equal to skb->data, so we either add a new helper or just add the skb->tail - skb->data offset, for now do the later. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:01 -07:00
Yasuyuki Kozakai	e7ac05f340	[NETFILTER]: nf_conntrack: add nf_copy() to safely copy members in skb This unifies the codes to copy netfilter related datas. Before copying, nf_copy() puts original members in destination skb. Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:55 -07:00
Patrick McHardy	587aa64163	[NETFILTER]: Remove IPv4 only connection tracking/NAT Remove the obsolete IPv4 only connection tracking/NAT as scheduled in feature-removal-schedule. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:34 -07:00
David S. Miller	239254fedc	[IPV4] xfrm4_mode_beet: Use skb_transport_header(). Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:32 -07:00
Arnaldo Carvalho de Melo	9c70220b73	[SK_BUFF]: Introduce skb_transport_header(skb) For the places where we need a pointer to the transport header, it is still legal to touch skb->h.raw directly if just adding to, subtracting from or setting it to another layer header. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:31 -07:00
Arnaldo Carvalho de Melo	bd82393ca2	[SK_BUFF]: More skb_reset_transport_header conversions These are a bit more subtle, they are of this type: - skb->h.raw = payload; __skb_pull(skb, payload - skb->data); + skb_reset_transport_header(skb); __skb_pull results in: skb->data = skb->data + payload - skb->data; skb->data = payload; So after __skb_pull we have skb->data pointing to payload and we can just call skb_reset_transport_header(skb), that will do: skb->h.raw = payload; The others are similar, allowing us to get rid of some more cases where a pointer was being attributed to the layer headers. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:29 -07:00
Arnaldo Carvalho de Melo	b0061ce49c	[SK_BUFF]: Introduce ipip_hdr(), remove skb->h.ipiph Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:27 -07:00
Arnaldo Carvalho de Melo	aa8223c7bb	[SK_BUFF]: Introduce tcp_hdr(), remove skb->h.th Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:26 -07:00
Arnaldo Carvalho de Melo	ab6a5bb6b2	[TCP]: Introduce tcp_hdrlen() and tcp_optlen() The ip_hdrlen() buddy, created to reduce the number of skb->h.th-> uses and to avoid the longer, open coded equivalent. Ditched a no-op in bnx2 in the process. I wonder if we should have a BUG_ON(skb->h.th->doff < 5) in tcp_optlen()... Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:24 -07:00
Arnaldo Carvalho de Melo	88c7664f13	[SK_BUFF]: Introduce icmp_hdr(), remove skb->h.icmph Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:23 -07:00
Arnaldo Carvalho de Melo	4bedb45203	[SK_BUFF]: Introduce udp_hdr(), remove skb->h.uh Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:22 -07:00
Arnaldo Carvalho de Melo	d9edf9e2be	[SK_BUFF]: Introduce igmp_hdr() & friends, remove skb->h.igmph Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:21 -07:00
Arnaldo Carvalho de Melo	967b05f64e	[SK_BUFF]: Introduce skb_set_transport_header For the cases where the transport header is being set to a offset from skb->data. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:17 -07:00
Arnaldo Carvalho de Melo	ea2ae17d64	[SK_BUFF]: Introduce skb_transport_offset() For the quite common 'skb->h.raw - skb->data' sequence. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:16 -07:00
Arnaldo Carvalho de Melo	badff6d01a	[SK_BUFF]: Introduce skb_reset_transport_header(skb) For the common, open coded 'skb->h.raw = skb->data' operation, so that we can later turn skb->h.raw into a offset, reducing the size of struct sk_buff in 64bit land while possibly keeping it as a pointer on 32bit. This one touches just the most simple cases: skb->h.raw = skb->data; skb->h.raw = {skb_push\|[__]skb_pull}() The next ones will handle the slightly more "complex" cases. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:15 -07:00
Arnaldo Carvalho de Melo	0660e03f6b	[SK_BUFF]: Introduce ipv6_hdr(), remove skb->nh.ipv6h Now the skb->nh union has just one member, .raw, i.e. it is just like the skb->mac union, strange, no? I'm just leaving it like that till the transport layer is done with, when we'll rename skb->mac.raw to skb->mac_header (or ->mac_header_offset?), ditto for ->{h,nh}. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:14 -07:00
Arnaldo Carvalho de Melo	d0a92be05e	[SK_BUFF]: Introduce arp_hdr(), remove skb->nh.arph Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:12 -07:00
Arnaldo Carvalho de Melo	eddc9ec53b	[SK_BUFF]: Introduce ip_hdr(), remove skb->nh.iph Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:10 -07:00
Arnaldo Carvalho de Melo	e023dd6437	[IPMR]: Fix bug introduced when converting to skb_network_reset_header Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:08 -07:00
Arnaldo Carvalho de Melo	c9bdd4b525	[IP]: Introduce ip_hdrlen() For the common sequence "skb->nh.iph->ihl * 4", removing a good number of open coded skb->nh.iph uses, now to go after the rest... Just out of curiosity, here are the idioms found to get the same result: skb->nh.iph->ihl << 2 skb->nh.iph->ihl<<2 skb->nh.iph->ihl * 4 skb->nh.iph->ihl4 (skb->nh.iph)->ihl sizeof(u32) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:07 -07:00
Arnaldo Carvalho de Melo	0272ffc46f	[SK_BUFF] ipmr: Missed one conversion to skb_network_header() We can't access skb->nh.raw directly anymore, it will become an offset. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:05 -07:00
Stephen Hemminger	f690808e17	[NET]: make seq_operations const The seq_file operations stuff can be marked constant to get it out of dirty cache. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:03 -07:00
Arnaldo Carvalho de Melo	c14d2450cb	[SK_BUFF]: Introduce skb_set_network_header For the cases where the network header is being set to a offset from skb->data. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:01 -07:00
Arnaldo Carvalho de Melo	878c814500	[SK_BUFF] ipmr: Another skb_push related conversion to skb_reset_network_header Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:00 -07:00
Arnaldo Carvalho de Melo	d56f90a7c9	[SK_BUFF]: Introduce skb_network_header() For the places where we need a pointer to the network header, it is still legal to touch skb->nh.raw directly if just adding to, subtracting from or setting it to another layer header. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:59 -07:00
Arnaldo Carvalho de Melo	bbe735e424	[SK_BUFF]: Introduce skb_network_offset() For the quite common 'skb->nh.raw - skb->data' sequence. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:58 -07:00
Arnaldo Carvalho de Melo	7f5c0cb05f	[SK_BUFF] xfrm4: use skb_reset_network_header Setting it to skb->h.raw, which is valid, in the (to become) old pointer based world order and in the new world of offset based layer headers. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:55 -07:00
Arnaldo Carvalho de Melo	8856dfa3e9	[SK_BUFF]: Use skb_reset_network_header after skb_push Some more cases where skb->nh.iph was being set that were converted to using skb_reset_network_header. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:53 -07:00
Arnaldo Carvalho de Melo	04b964dbad	[SK_BUFF] ipconfig: Another conversion to skb_reset_network_header related to skb_put boot_pkt->iph is the first member, that is at skb->data, so just use skb_reset_network_header(). Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:52 -07:00
Arnaldo Carvalho de Melo	2ca9e6f2c2	[SK_BUFF]: Some more skb_put cases converted to skb_reset_network_header Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:51 -07:00
Arnaldo Carvalho de Melo	31c7711b50	[SK_BUFF]: Some more simple skb_reset_network_header conversions This time of the type: skb->nh.iph = (struct iphdr *)skb->data; That is completely equivalent to: skb->nh.raw = skb->data; Wonder why people love casts... :-) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:50 -07:00
Arnaldo Carvalho de Melo	4209fb601c	[SK_BUFF]: Use skb_reset_network_header where the return of __pskb_pull was being used It returns skb->data, so we can just use skb_reset_network_header after it. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:49 -07:00
Arnaldo Carvalho de Melo	7e28ecc282	[SK_BUFF]: Use skb_reset_network_header where the skb_pull return was being used But only in the cases where its a newly allocated skb, i.e. one where skb->tail is equal to skb->data, or just after skb_reserve, where this requirement is maintained. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:48 -07:00
Arnaldo Carvalho de Melo	e2d1bca7e6	[SK_BUFF]: Use skb_reset_network_header in skb_push cases skb_push updates and returns skb->data, so we can just call skb_reset_network_header after the call to skb_push. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:47 -07:00
Arnaldo Carvalho de Melo	c1d2bbe1cd	[SK_BUFF]: Introduce skb_reset_network_header(skb) For the common, open coded 'skb->nh.raw = skb->data' operation, so that we can later turn skb->nh.raw into a offset, reducing the size of struct sk_buff in 64bit land while possibly keeping it as a pointer on 32bit. This one touches just the most simple case, next will handle the slightly more "complex" cases. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:46 -07:00
Arnaldo Carvalho de Melo	98e399f82a	[SK_BUFF]: Introduce skb_mac_header() For the places where we need a pointer to the mac header, it is still legal to touch skb->mac.raw directly if just adding to, subtracting from or setting it to another layer header. This one also converts some more cases to skb_reset_mac_header() that my regex missed as it had no spaces before nor after '=', ugh. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:41 -07:00
Arnaldo Carvalho de Melo	31713c333d	[TCP]: Use skb_set_mac_header in tcp_collapse Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:39 -07:00
Arnaldo Carvalho de Melo	c51957dafa	[TCP]: Do the layer header setting in tcp_collapse relative to skb->data That is equal to skb->head before skb_reserve, to help in the layer header changes. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:38 -07:00
Arnaldo Carvalho de Melo	39f69c6f92	[SK_BUFF] xfrm: Use skb_set_mac_header in the memmove cases Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:37 -07:00
Arnaldo Carvalho de Melo	459a98ed88	[SK_BUFF]: Introduce skb_reset_mac_header(skb) For the common, open coded 'skb->mac.raw = skb->data' operation, so that we can later turn skb->mac.raw into a offset, reducing the size of struct sk_buff in 64bit land while possibly keeping it as a pointer on 32bit. This one touches just the most simple case, next will handle the slightly more "complex" cases. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:32 -07:00
Arnaldo Carvalho de Melo	c7a3c5da35	[UDP]: Use __skb_pull since we have checked it won't fail with pskb_may_pull Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:20 -07:00
Stephen Hemminger	3fbe070a42	[UDP]: deinline A couple of functions are exported or used indirectly so it is pointless to mark them as inline. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:15 -07:00
Stephen Hemminger	2de979bd7d	[TCP]: whitespace cleanup Add whitespace around keywords. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:13 -07:00
Stephen Hemminger	132adf5463	[IPV4]: cleanup Add whitespace around keywords. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:11 -07:00
Stephen Hemminger	6516c65573	[UDP]: ipv4 whitespace cleanup Fix whitespace around keywords. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:07 -07:00
Eric Dumazet	ae40eb1ef3	[NET]: Introduce SIOCGSTAMPNS ioctl to get timestamps with nanosec resolution Now network timestamps use ktime_t infrastructure, we can add a new ioctl() SIOCGSTAMPNS command to get timestamps in 'struct timespec'. User programs can thus access to nanosecond resolution. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> CC: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:04 -07:00
David S. Miller	fe067e8ab5	[TCP]: Abstract out all write queue operations. This allows the write queue implementation to be changed, for example, to one which allows fast interval searching. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:02 -07:00
YOSHIFUJI Hideaki	4412ec4948	[NET] IPV4: Use hton{s,l}() where appropriate. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:58 -07:00
Herbert Xu	759e5d0064	[UDP]: Clean up UDP-Lite receive checksum This patch eliminates some duplicate code for the verification of receive checksums between UDP-Lite and UDP. It does this by introducing __skb_checksum_complete_head which is identical to __skb_checksum_complete_head apart from the fact that it takes a length parameter rather than computing the first skb->len bytes. As a result UDP-Lite will be able to use hardware checksum offload for packets which do not use partial coverage checksums. It also means that UDP-Lite loopback no longer does unnecessary checksum verification. If any NICs start support UDP-Lite this would also start working automatically. This patch removes the assumption that msg_flags has MSG_TRUNC clear upon entry in recvmsg. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:51 -07:00
Eric Dumazet	243bbcaa09	[IPV4]: Optimize inet_getpeer() 1) Some sysctl vars are declared __read_mostly 2) We can avoid updating stack[] when doing an AVL lookup only. lookup() macro is extended to receive a second parameter, that may be NULL in case of a pure lookup (no need to save the AVL path). This removes unnecessary instructions, because compiler knows if this _stack parameter is NULL or not. text size of net/ipv4/inetpeer.o is 2063 bytes instead of 2107 on x86_64 Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:49 -07:00
Stephen Hemminger	43e683926f	[TCP] TCP Yeah: cleanup Eliminate need for full 6/4/64 divide to compute queue. Variable maxqueue was really a constant. Fix indentation. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:48 -07:00
Stephen Hemminger	c5f5877c04	[TCP] tcp_cubic: faster cube root The Newton-Raphson method is quadratically convergent so only a small fixed number of steps are necessary. Therefore it is faster to unroll the loop. Since div64_64 is no longer inline it won't cause code explosion. Also fixes a bug that can occur if x^2 was bigger than 32 bits. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:47 -07:00
Eric Dumazet	b7aa0bf70c	[NET]: convert network timestamps to ktime_t We currently use a special structure (struct skb_timeval) and plain 'struct timeval' to store packet timestamps in sk_buffs and struct sock. This has some drawbacks : - Fixed resolution of micro second. - Waste of space on 64bit platforms where sizeof(struct timeval)=16 I suggest using ktime_t that is a nice abstraction of high resolution time services, currently capable of nanosecond resolution. As sizeof(ktime_t) is 8 bytes, using ktime_t in 'struct sock' permits a 8 byte shrink of this structure on 64bit architectures. Some other structures also benefit from this size reduction (struct ipq in ipv4/ip_fragment.c, struct frag_queue in ipv6/reassembly.c, ...) Once this ktime infrastructure adopted, we can more easily provide nanosecond resolution on top of it. (ioctl SIOCGSTAMPNS and/or SO_TIMESTAMPNS/SCM_TIMESTAMPNS) Note : this patch includes a bug correction in compat_sock_get_timestamp() where a "err = 0;" was missing (so this syscall returned -ENOENT instead of 0) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> CC: Stephen Hemminger <shemminger@linux-foundation.org> CC: John find <linux.kernel@free.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:34 -07:00
Stephen Hemminger	3927f2e8f9	[NET]: div64_64 consolidate (rev3) Here is the current version of the 64 bit divide common code. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:33 -07:00
James Morris	9d729f72dc	[NET]: Convert xtime.tv_sec to get_seconds() Where appropriate, convert references to xtime.tv_sec to the get_seconds() helper function. Signed-off-by: James Morris <jmorris@namei.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:32 -07:00
Ilpo Järvinen	e317f6f69c	[TCP]: FRTO undo response falls back to ratehalving one if ECEd Undoing ssthresh is disabled in fastretrans_alert whenever FLAG_ECE is set by clearing prior_ssthresh. The clearing does not protect FRTO because FRTO operates before fastretrans_alert. Moving the clearing of prior_ssthresh earlier seems to be a suboptimal solution to the FRTO case because then FLAG_ECE will cause a second ssthresh reduction in try_to_open (the first occurred when FRTO was entered). So instead, FRTO falls back immediately to the rate halving response, which switches TCP to CA_CWR state preventing the latter reduction of ssthresh. If the first ECE arrived before the ACK after which FRTO is able to decide RTO as spurious, prior_ssthresh is already cleared. Thus no undoing for ssthresh occurs. Besides, FLAG_ECE should be set also in the following ACKs resulting in rate halving response that sees TCP is already in CA_CWR, which again prevents an extra ssthresh reduction on that round-trip. If the first ECE arrived before RTO, ssthresh has already been adapted and prior_ssthresh remains cleared on entry because TCP is in CA_CWR (the same applies also to a case where FRTO is entered more than once and ECE comes in the middle). High_seq must not be touched after tcp_enter_cwr because CWR round-trip calculation depends on it. I believe that after this patch, FRTO should be ECN-safe and even able to take advantage of synergy benefits. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:26 -07:00
Ilpo Järvinen	e01f9d7793	[TCP]: Complete icsk-to-local-variable change (in tcp_enter_cwr) A local variable for icsk was created but this change was missing. Spotted by Jarek Poplawski. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:25 -07:00
Ilpo Järvinen	3cfe3baaf0	[TCP]: Add two new spurious RTO responses to FRTO New sysctl tcp_frto_response is added to select amongst these responses: - Rate halving based; reuses CA_CWR state (default) - Very conservative; used to be the only one available (=1) - Undo cwr; undoes ssthresh and cwnd reductions (=2) The response with rate halving requires a new parameter to tcp_enter_cwr because FRTO has already reduced ssthresh and doing a second reduction there has to be prevented. In addition, to keep things nice on 80 cols screen, a local variable was added. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:23 -07:00
Ilpo Järvinen	c5e7af0df5	[TCP]: Correct reordering detection change (no FRTO case) The reordering detection must work also when FRTO has not been used at all which was the original intention of mine, just the expression of the idea was flawed. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:22 -07:00
Eric Dumazet	54287cc178	[TCP]: Keep copied_seq, rcv_wup and rcv_next together. I noticed in oprofile study a cache miss in tcp_rcv_established() to read copied_seq. ffffffff80400a80 <tcp_rcv_established>: /* tcp_rcv_established total: 4034293 2.0400 */ 55493 0.0281 :ffffffff80400bc9: mov 0x4c8(%r12),%eax copied_seq 543103 0.2746 :ffffffff80400bd1: cmp 0x3e0(%r12),%eax rcv_nxt if (tp->copied_seq == tp->rcv_nxt && len - tcp_header_len <= tp->ucopy.len) { In this function, the cache line 0x4c0 -> 0x500 is used only for this reading 'copied_seq' field. rcv_wup and copied_seq should be next to rcv_nxt field, to lower number of active cache lines in hot paths. (tcp_rcv_established(), tcp_poll(), ...) As you suggested, I changed tcp_create_openreq_child() so that these fields are changed together, to avoid adding a new store buffer stall. Patch is 64bit friendly (no new hole because of alignment constraints) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:21 -07:00
Ilpo Järvinen	cf4c6bf83d	[TCP]: struct *sock argument renamed: sp -> sk In general, TCP code uses "sk" for struct sock pointer. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:20 -07:00
John Heffner	886236c124	[TCP]: Add RFC3742 Limited Slow-Start, controlled by variable sysctl_tcp_max_ssthresh. Signed-off-by: John Heffner <jheffner@psc.edu> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:19 -07:00
Angelo P. Castellani	5ef814753e	[TCP] YeAH-TCP: algorithm implementation YeAH-TCP is a sender-side high-speed enabled TCP congestion control algorithm, which uses a mixed loss/delay approach to compute the congestion window. It's design goals target high efficiency, internal, RTT and Reno fairness, resilience to link loss while keeping network elements load as low as possible. For further details look here: http://wil.cs.caltech.edu/pfldnet2007/paper/YeAH_TCP.pdf Signed-off-by: Angelo P. Castellani <angelo.castellani@gmail.con> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:18 -07:00
Ilpo Järvinen	4dc2665e36	[TCP]: SACK enhanced FRTO Implements the SACK-enhanced FRTO given in RFC4138 using the variant given in Appendix B. RFC4138, Appendix B: "This means that in order to declare timeout spurious, the TCP sender must receive an acknowledgment for non-retransmitted segment between SND.UNA and RecoveryPoint in algorithm step 3. RecoveryPoint is defined in conservative SACK-recovery algorithm [RFC3517]" The basic version of the FRTO algorithm can still be used also when SACK is enabled. To enabled SACK-enhanced version, tcp_frto sysctl is set to 2. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:16 -07:00
Ilpo Järvinen	288035f915	[TCP]: Prevent reordering adjustments during FRTO To be honest, I'm not too sure how the reord stuff works in the first place but this seems necessary. When FRTO has been active, the one and only retransmission could be unnecessary but the state and sending order might not be what the sacktag code expects it to be (to work correctly). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:15 -07:00
Ilpo Järvinen	66e93e45c0	[TCP] FRTO: Fake cwnd for ssthresh callback TCP without FRTO would be in Loss state with small cwnd. FRTO, however, leaves cwnd (typically) to a larger value which causes ssthresh to become too large in case RTO is triggered again compared to what conventional recovery would do. Because consecutive RTOs result in only a single ssthresh reduction, RTO+cumulative ACK+RTO pattern is required to trigger this event. A large comment is included for congestion control module writers trying to figure out what CA_EVENT_FRTO handler should do because there exists a remote possibility of incompatibility between FRTO and module defined ssthresh functions. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:14 -07:00
Ilpo Järvinen	d1a54c6a0a	[TCP] FRTO: Reverse RETRANS bit clearing logic Previously RETRANS bits were cleared on the entry to FRTO. We postpone that into tcp_enter_frto_loss, which is really the place were the clearing should be done anyway. This allows simplification of the logic from a clearing loop to the head skb clearing only. Besides, the other changes made in the previous patches to tcp_use_frto made it impossible for the non-SACKed FRTO to be entered if other than the head has been rexmitted. With SACK-enhanced FRTO (and Appendix B), however, there can be a number retransmissions in flight when RTO expires (same thing could happen before this patchset also with non-SACK FRTO). To not introduce any jumpiness into the packet counting during FRTO, instead of clearing RETRANS bits from skbs during entry, do it later on. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:13 -07:00
Ilpo Järvinen	46d0de4ed9	[TCP] FRTO: Entry is allowed only during (New)Reno like recovery This interpretation comes from RFC4138: "If the sender implements some loss recovery algorithm other than Reno or NewReno [FHG04], the F-RTO algorithm SHOULD NOT be entered when earlier fast recovery is underway." I think the RFC means to say (especially in the light of Appendix B) that ...recovery is underway (not just fast recovery) or was underway when it was interrupted by an earlier (F-)RTO that hasn't yet been resolved (snd_una has not advanced enough). Thus, my interpretation is that whenever TCP has ever retransmitted other than head, basic version cannot be used because then the order assumptions which are used as FRTO basis do not hold. NewReno has only the head segment retransmitted at a time. Therefore, walk up to the segment that has not been SACKed, if that segment is not retransmitted nor anything before it, we know for sure, that nothing after the non-SACKed segment should be either. This assumption is valid because TCPCB_EVER_RETRANS does not leave holes but each non-SACKed segment is rexmitted in-order. Check for retrans_out > 1 avoids more expensive walk through the skb list, as we can know the result beforehand: F-RTO will not be allowed. SACKed skb can turn into non-SACked only in the extremely rare case of SACK reneging, in this case we might fail to detect retransmissions if there were them for any other than head. To get rid of that feature, whole rexmit queue would have to be walked (always) or FRTO should be prevented when SACK reneging happens. Of course RTO should still trigger after reneging which makes this issue even less likely to show up. And as long as the response is as conservative as it's now, nothing bad happens even then. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:12 -07:00
Ilpo Järvinen	7c9a4a5b67	[TCP]: Prevent unrelated cwnd adjustment while using FRTO FRTO controls cwnd when it still processes the ACK input or it has just reverted back to conventional RTO recovery; the normal rules apply when FRTO has reverted to standard congestion control. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:11 -07:00
Ilpo Järvinen	94d0ea7786	[TCP] FRTO: frto_counter modulo-op converted to two assignments Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:10 -07:00
Ilpo Järvinen	52c63f1e86	[TCP]: Don't enter to fast recovery while using FRTO Because TCP is not in Loss state during FRTO recovery, fast recovery could be triggered by accident. Non-SACK FRTO is more robust than not yet included SACK-enhanced version (that can receiver high number of duplicate ACKs with SACK blocks during FRTO), at least with unidirectional transfers, but under extraordinary patterns fast recovery can be incorrectly triggered, e.g., Data loss+ACK losses => cumulative ACK with enough SACK blocks to meet sacked_out >= dupthresh condition). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:09 -07:00
Ilpo Järvinen	aa8b6a7ad1	[TCP] FRTO: Response should reset also snd_cwnd_cnt Since purpose is to reduce CWND, we prevent immediate growth. This is not a major issue nor is "the correct way" specified anywhere. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:08 -07:00
Ilpo Järvinen	95c4922bf9	[TCP] FRTO: fixes fallback to conventional recovery The FRTO detection did not care how ACK pattern affects to cwnd calculation of the conventional recovery. This caused incorrect setting of cwnd when the fallback becames necessary. The knowledge tcp_process_frto() has about the incoming ACK is now passed on to tcp_enter_frto_loss() in allowed_segments parameter that gives the number of segments that must be added to packets-in-flight while calculating the new cwnd. Instead of snd_una we use FLAG_DATA_ACKED in duplicate ACK detection because RFC4138 states (in Section 2.2): If the first acknowledgment after the RTO retransmission does not acknowledge all of the data that was retransmitted in step 1, the TCP sender reverts to the conventional RTO recovery. Otherwise, a malicious receiver acknowledging partial segments could cause the sender to declare the timeout spurious in a case where data was lost. If the next ACK after RTO is duplicate, we do not retransmit anything, which is equal to what conservative conventional recovery does in such case. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:07 -07:00
Ilpo Järvinen	6408d206c7	[TCP] FRTO: Ignore some uninteresting ACKs Handles RFC4138 shortcoming (in step 2); it should also have case c) which ignores ACKs that are not duplicates nor advance window (opposite dir data, winupdate). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:06 -07:00
Ilpo Järvinen	7b0eb22b1d	[TCP] FRTO: Use Disorder state during operation instead of Open Retransmission counter assumptions are to be changed. Forcing reason to do this exist: Using sysctl in check would be racy as soon as FRTO starts to ignore some ACKs (doing that in the following patches). Userspace may disable it at any moment giving nice oops if timing is right. frto_counter would be inaccessible from userspace, but with SACK enhanced FRTO retrans_out can include other than head, and possibly leaving it non-zero after spurious RTO, boom again. Luckily, solution seems rather simple: never go directly to Open state but use Disorder instead. This does not really change much, since TCP could anyway change its state to Disorder during FRTO using path tcp_fastretrans_alert -> tcp_try_to_open (e.g., when a SACK block makes ACK dubious). Besides, Disorder seems to be the state where TCP should be if not recovering (in Recovery or Loss state) while having some retransmissions in-flight (see tcp_try_to_open), which is exactly what happens with FRTO. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:05 -07:00
Ilpo Järvinen	7487c48c4f	[TCP] FRTO: Consecutive RTOs keep prior_ssthresh and ssthresh In case a latency spike causes more than one RTO, the later should not cause the already reduced ssthresh to propagate into the prior_ssthresh since FRTO declares all such RTOs spurious at once or none of them. In treating of ssthresh, we mimic what tcp_enter_loss() does. The previous state (in frto_counter) must be available until we have checked it in tcp_enter_frto(), and also ACK information flag in process_frto(). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:04 -07:00
Ilpo Järvinen	30935cf4f9	[TCP] FRTO: Comment cleanup & improvement Moved comments out from the body of process_frto() to the head (preferred way; see Documentation/CodingStyle). Bonus: it's much easier to read in this compacted form. FRTO algorithm and implementation is described in greater detail. For interested reader, more information is available in RFC4138. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:03 -07:00
Ilpo Järvinen	bdaae17da8	[TCP] FRTO: Moved tcp_use_frto from tcp.h to tcp_input.c In addition, removed inline. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:02 -07:00
Ilpo Järvinen	9ead9a1d38	[TCP] FRTO: Separated response from FRTO detection algorithm FRTO spurious RTO detection algorithm (RFC4138) does not include response to a detected spurious RTO but can use different response algorithms. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:01 -07:00
Ilpo Järvinen	522e7548a9	[TCP] FRTO: Incorrectly clears TCPCB_EVER_RETRANS bit FRTO was slightly too brave... Should only clear TCPCB_SACKED_RETRANS bit. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:00 -07:00
Alexey Kuznetsov	1194ed0a3e	[NETLINK]: Infinite recursion in netlink. Reply to NETLINK_FIB_LOOKUP messages were misrouted back to kernel, which resulted in infinite recursion and stack overflow. The bug is present in all kernel versions since the feature appeared. The patch also makes some minimal cleanup: 1. Return something consistent (-ENOENT) when fib table is missing 2. Do not crash when queue is empty (does not happen, but yet) 3. Put result of lookup Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 13:07:28 -07:00
Patrick McHardy	05d224468a	[XFRM]: beet: fix pseudo header length value draft-nikander-esp-beet-mode-07.txt is not entirely clear on how the length value of the pseudo header should be calculated, it states "The Header Length field contains the length of the pseudo header, IPv4 options, and padding in 8 octets units.", but also states "Length in octets (Header Len + 1) * 8". draft-nikander-esp-beet-mode-08-pre1.txt [1] clarifies this, the header length should not include the first 8 byte. This change affects backwards compatibility, but option encapsulation didn't work until very recently anyway. [1] http://users.piuha.net/jmelen/BEET/draft-nikander-esp-beet-mode-08-pre1.txt Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-23 22:39:02 -07:00
Stephen Hemminger	4d4d3d1e88	[TCP]: Congestion control initialization. Change to defer congestion control initialization. If setsockopt() was used to change TCP_CONGESTION before connection is established, then protocols that use sequence numbers to keep track of one RTT interval (vegas, illinois, ...) get confused. Change the init hook to be called after handshake. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-23 22:32:11 -07:00
David S. Miller	49688c8431	[NETFILTER] arp_tables: Fix unaligned accesses. There are two device string comparison loops in arp_packet_match(). The first one goes byte-by-byte but the second one tries to be clever and cast the string to a long and compare by longs. The device name strings in the arp table entries are not guarenteed to be aligned enough to make this value, so just use byte-by-byte for both cases. Based upon a report by <drraid@gmail.com>. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-13 16:37:54 -07:00
Patrick McHardy	01102e7ca2	[NETFILTER]: ipt_ULOG: use put_unaligned Use put_unaligned to fix warnings about unaligned accesses. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-12 14:27:03 -07:00
Jaroslav Kysela	50c9cc2e54	[NETFILTER]: ipt_CLUSTERIP: fix oops in checkentry function The clusterip_config_find_get() already increases entries reference counter, so there is no reason to do it twice in checkentry() callback. This causes the config to be freed before it is removed from the list, resulting in a crash when adding the next rule. Signed-off-by: Jaroslav Kysela <perex@suse.cz> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-10 13:26:48 -07:00
David S. Miller	15d33c070d	[TCP]: slow_start_after_idle should influence cwnd validation too For the cases that slow_start_after_idle are meant to deal with, it is almost a certainty that the congestion window tests will think the connection is application limited and we'll thus decrease the cwnd there too. This defeats the whole point of setting slow_start_after_idle to zero. So test it there too. We do not cancel out the entire tcp_cwnd_validate() function so that if the sysctl is changed we still have the validation state maintained. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-09 13:31:15 -07:00
Patrick McHardy	254d0d24e3	[XFRM]: beet: fix IP option decapsulation Beet mode looks for the beet pseudo header after the outer IP header, which is wrong since that is followed by the ESP header. Additionally it needs to adjust the packet length after removing the pseudo header and point the data pointer to the real data location. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-05 16:03:33 -07:00
Patrick McHardy	d4b1e84062	[XFRM]: beet: fix beet mode decapsulation Beet mode decapsulation fails to properly set up the skb pointers, which only works by accident in combination with CONFIG_NETFILTER, since in that case the skb is fixed up in xfrm4_input before passing it to the netfilter hooks. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-05 15:59:41 -07:00
Patrick McHardy	04fef9893a	[XFRM]: beet: use IPOPT_NOP for option padding draft-nikander-esp-beet-mode-07.txt states "The padding MUST be filled with NOP options as defined in Internet Protocol [1] section 3.1 Internet header format.", so do that. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-05 15:54:39 -07:00
Patrick McHardy	c5027c9a89	[XFRM]: beet: fix IP option encapsulation Beet mode calculates an incorrect value for the transport header location when IP options are present, resulting in encapsulation errors. The correct location is 4 or 8 bytes before the end of the original IP header, depending on whether the pseudo header is padded. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-05 15:54:02 -07:00
John Heffner	84565070e4	[TCP]: Do receiver-side SWS avoidance for rcvbuf < MSS. Signed-off-by: John Heffner <jheffner@psc.edu> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-02 13:56:32 -07:00
Robert Olsson	d562f1f8a9	[IPV4] fib_trie: Document locking. Paul E. McKenney writes: > Those of use who dive into networking only occasionally would much > appreciate this. ;-) No problem here... Acked-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> (but trivial) Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-26 14:22:22 -07:00
Thomas Graf	a0ee18b9b7	[IPv4] fib: Fix out of bound access of fib_props[] Fixes a typo which caused fib_props[] to have the wrong size and makes sure the value used to index the array which is provided by userspace via netlink is checked to avoid out of bound access. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-25 18:48:03 -07:00
Thomas Graf	e1701c68c1	[NET]: Fix fib_rules compatibility breakage Based upon a patch from Patrick McHardy. The fib_rules netlink attribute policy introduced in 2.6.19 broke userspace compatibilty. When specifying a rule with "from all" or "to all", iproute adds a zero byte long netlink attribute, but the policy requires all addresses to have a size equal to sizeof(struct in_addr)/sizeof(struct in6_addr), resulting in a validation error. Check attribute length of FRA_SRC/FRA_DST in the generic framework by letting the family specific rules implementation provide the length of an address. Report an error if address length is non zero but no address attribute is provided. Fix actual bug by checking address length for non-zero instead of relying on availability of attribute. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-25 18:48:00 -07:00
Patrick McHardy	848c29fd64	[NETFILTER]: nat: avoid rerouting packets if only XFRM policy key changed Currently NAT not only reroutes packets in the OUTPUT chain when the routing key changed, but also if only the non-routing part of the IPsec policy key changed. This breaks ping -I since it doesn't use SO_BINDTODEVICE but IP_PKTINFO cmsg to specify the output device, and this information is lost. Only do full rerouting if the routing key changed, and just do a new policy lookup with the old route if only the ports changed. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-22 12:30:29 -07:00
John Heffner	53cdcc04c1	[TCP]: Fix tcp_mem[] initialization. Change tcp_mem initialization function. The fraction of total memory is now a continuous function of memory size, and independent of page size. Signed-off-by: John Heffner <jheffner@psc.edu> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-16 15:04:03 -07:00
Robert Olsson	d5cc4a73a5	[IPV4]: Do not disable preemption in trie_leaf_remove(). Hello, Just discussed this Patrick... We have two users of trie_leaf_remove, fn_trie_flush and fn_trie_delete both are holding RTNL. So there shouldn't be need for this preempt stuff. This is assumed to a leftover from an older RCU-take. > Mhh .. I think I just remembered something - me incorrectly suggesting > to add it there while we were talking about this at OLS :) IIRC the > idea was to make sure tnode_free (which at that time didn't use > call_rcu) wouldn't free memory while still in use in a rcu read-side > critical section. It should have been synchronize_rcu of course, > but with tnode_free using call_rcu it seems to be completely > unnecessary. So I guess we can simply remove it. Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-16 15:00:07 -07:00
Geert Uytterhoeven	08882669e0	[IPV4]: Fix warning in ip_mc_rejoin_group. Kill warning about unused variable `in_dev' when CONFIG_IP_MULTICAST is not set. Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-12 17:02:37 -07:00
Paul Moore	38c8947c1b	[NetLabel]: parse the CIPSO ranged tag on incoming packets Commit `484b366932` added support for the CIPSO ranged categories tag. However, it appears that I made a mistake when rebasing then patch to the latest upstream sources for submission and dropped the part of the patch that actually parses the tag on incoming packets. This patch fixes this mistake by adding the required function call to the cipso_v4_skbuff_getattr() function. I've run this patch over the weekend and have not noticed any problems. Signed-off-by: Paul Moore <paul.moore@hp.com> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-12 14:38:02 -07:00
Evgeniy Polyakov	c4e38f41e3	[IPV4]: Fix rtm_to_ifaddr() error handling. Return negative error value (embedded in the pointer) instead of returning NULL. Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-09 13:43:24 -08:00
Herbert Xu	d644329bc9	[UDP]: Reread uh pointer after pskb_trim The header may have moved when trimming. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-07 16:08:04 -08:00
Linus Torvalds	5b3c1184e7	Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: [DCCP]: Set RTO for newly created child socket [DCCP]: Correctly split CCID half connections [NET]: Fix compat_sock_common_getsockopt typo. [NET]: Revert incorrect accept queue backlog changes. [INET]: twcal_jiffie should be unsigned long, not int [GIANFAR]: Fix compile error in latest git [PPPOE]: Use ifindex instead of device pointer in key lookups. [NETFILTER]: ip6_route_me_harder should take into account mark [NETFILTER]: nfnetlink_log: fix reference counting [NETFILTER]: nfnetlink_log: fix module reference counting [NETFILTER]: nfnetlink_log: fix possible NULL pointer dereference [NETFILTER]: nfnetlink_log: fix NULL pointer dereference [NETFILTER]: nfnetlink_log: fix use after free [NETFILTER]: nfnetlink_log: fix reference leak [NETFILTER]: tcp conntrack: accept SYN\|URG as valid [NETFILTER]: nf_conntrack/nf_nat: fix incorrect config ifdefs [NETFILTER]: conntrack: fix {nf,ip}_ct_iterate_cleanup endless loops	2007-03-06 19:53:34 -08:00
Jay Vosburgh	a816c7c712	bonding: Improve IGMP join processing In active-backup mode, the current bonding code duplicates IGMP traffic to all slaves, so that switches are up to date in case of a failover from an active to a backup interface. If bonding then fails back to the original active interface, it is likely that the "active slave" switch's IGMP forwarding for the port will be out of date until some event occurs to refresh the switch (e.g., a membership query). This patch alters the behavior of bonding to no longer flood IGMP to all ports, and to issue IGMP JOINs to the newly active port at the time of a failover. This insures that switches are kept up to date for all cases. "GOELLESCH Niels" <niels.goellesch@eurocontrol.int> originally reported this problem, and included a patch. His original patch was modified by Jay Vosburgh to additionally remove the existing IGMP flood behavior, use RCU, streamline code paths, fix trailing white space, and adjust for style. Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>	2007-03-06 06:08:11 -05:00
Patrick McHardy	d3ab4298aa	[NETFILTER]: tcp conntrack: accept SYN\|URG as valid Some stacks apparently send packets with SYN\|URG set. Linux accepts these packets, so TCP conntrack should to. Pointed out by Martijn Posthuma <posthuma@sangine.com>. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-05 13:25:20 -08:00
Patrick McHardy	e281db5cdf	[NETFILTER]: nf_conntrack/nf_nat: fix incorrect config ifdefs The nf_conntrack_netlink config option is named CONFIG_NF_CT_NETLINK, but multiple files use CONFIG_IP_NF_CONNTRACK_NETLINK or CONFIG_NF_CONNTRACK_NETLINK for ifdefs. Fix this and reformat all CONFIG_NF_CT_NETLINK ifdefs to only use a line. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-05 13:25:19 -08:00
Patrick McHardy	ec68e97ded	[NETFILTER]: conntrack: fix {nf,ip}_ct_iterate_cleanup endless loops Fix {nf,ip}_ct_iterate_cleanup unconfirmed list handling: - unconfirmed entries can not be killed manually, they are removed on confirmation or final destruction of the conntrack entry, which means we might iterate forever without making forward progress. This can happen in combination with the conntrack event cache, which holds a reference to the conntrack entry, which is only released when the packet makes it all the way through the stack or a different packet is handled. - taking references to an unconfirmed entry and using it outside the locked section doesn't work, the list entries are not refcounted and another CPU might already be waiting to destroy the entry What the code really wants to do is make sure the references of the hash table to the selected conntrack entries are released, so they will be destroyed once all references from skbs and the event cache are dropped. Since unconfirmed entries haven't even entered the hash yet, simply mark them as dying and skip confirmation based on that. Reported and tested by Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-05 13:25:18 -08:00
Paul Moore	c6387a8694	[NetLabel]: Verify sensitivity level has a valid CIPSO mapping The current CIPSO engine has a problem where it does not verify that the given sensitivity level has a valid CIPSO mapping when the "std" CIPSO DOI type is used. The end result is that bad packets are sent on the wire which should have never been sent in the first place. This patch corrects this problem by verifying the sensitivity level mapping similar to what is done with the category mapping. This patch also changes the returned error code in this case to -EPERM to better match what the category mapping verification code returns. Signed-off-by: Paul Moore <paul.moore@hp.com> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-02 20:37:36 -08:00
Arnaldo Carvalho de Melo	a9948a7e15	[TCP]: Fix minisock tcp_create_openreq_child() typo. On 2/28/07, KOVACS Krisztian <hidden@balabit.hu> wrote: > > Hi, > > While reading TCP minisock code I've found this suspiciously looking > code fragment: > > - 8< - > struct sock tcp_create_openreq_child(struct sock sk, struct request_sock req, struct sk_buff skb) > { > struct sock newsk = inet_csk_clone(sk, req, GFP_ATOMIC); > > if (newsk != NULL) { > const struct inet_request_sock ireq = inet_rsk(req); > struct tcp_request_sock treq = tcp_rsk(req); > struct inet_connection_sock newicsk = inet_csk(sk); > struct tcp_sock *newtp; > - 8< - > > The above code initializes newicsk to inet_csk(sk), isn't that supposed > to be inet_csk(newsk)? As far as I can tell this might leave > icsk_ack.last_seg_size zero even if we do have received data. Good catch! David, please apply the attached patch. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-28 11:05:56 -08:00
Bernhard Walle	aef8811abb	[XFRM]: Fix oops in xfrm4_dst_destroy() With 2.6.21-rc1, I get an oops when running 'ifdown eth0' and an IPsec connection is active. If I shut down the connection before running 'ifdown eth0', then there's no problem. The critical operation of this script is to kill dhcpd. The problem is probably caused by commit with git identifier `4337226228` (Linus tree) "[IPSEC]: IPv4 over IPv6 IPsec tunnel". This patch fixes that oops. I don't know the network code of the Linux kernel in deep, so if that fix is wrong, please change it. But please fix the oops. :) Signed-off-by: Bernhard Walle <bwalle@suse.de> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-26 12:10:32 -08:00
Arnaldo Carvalho de Melo	e4396b544f	[XFRM_TUNNEL]: Reload header pointer after pskb_may_pull/pskb_expand_head Please consider applying, this was found on your latest net-2.6 tree while playing around with that ip_hdr() + turn skb->nh/h/mac pointers as offsets on 64 bits idea :-) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-26 11:43:01 -08:00
Joe Perches	4c3ae4d7e7	[IPV4]: Use random32() in net/ipv4/multipath Removed local random number generator function Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-26 11:43:00 -08:00
Herbert Xu	8030f54499	[IPV4] devinet: Register inetdev earlier. This patch allocates inetdev at registration for all devices in line with IPv6. This allows sysctl configuration on the devices to occur before they're brought up or addresses are added. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2007-02-26 11:42:56 -08:00
Baruch Even	f4b9479dc5	[IPV4]: Correct links in net/ipv4/Kconfig Correct dead/indirect links in net/ipv4/Kconfig Signed-off-by: Baruch Even <baruch@ev-en.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-26 11:42:51 -08:00
David S. Miller	2c4f6219ac	[TCP]: Fix MD5 signature pool locking. The locking calls assumed that these code paths were only invoked in software interrupt context, but that isn't true. Therefore we need to use spin_{lock,unlock}_bh() throughout. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-26 11:42:48 -08:00
Adrian Bunk	936bb14ce9	correct a dead URL in the IP_MULTICAST help text Reported in kernel Bugzilla #6216. Signed-off-by: Adrian Bunk <bunk@stusta.de>	2007-02-17 19:49:13 +01:00
Robert P. J. Day	d08df601a3	Various typo fixes. Correct mis-spellings of "algorithm", "appear", "consistent" and (shame, shame) "kernel". Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2007-02-17 19:07:33 +01:00
Eric W. Biederman	3fbfa98112	[PATCH] sysctl: remove the proc_dir_entry member for the sysctl tables It isn't needed anymore, all of the users are gone, and all of the ctl_table initializers have been converted to use explicit names of the fields they are initializing. [akpm@osdl.org: NTFS fix] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-14 08:10:00 -08:00
Eric W. Biederman	0b4d414714	[PATCH] sysctl: remove insert_at_head from register_sysctl The semantic effect of insert_at_head is that it would allow new registered sysctl entries to override existing sysctl entries of the same name. Which is pain for caching and the proc interface never implemented. I have done an audit and discovered that none of the current users of register_sysctl care as (excpet for directories) they do not register duplicate sysctl entries. So this patch simply removes the support for overriding existing entries in the sys_sysctl interface since no one uses it or cares and it makes future enhancments harder. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: David Howells <dhowells@redhat.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Andi Kleen <ak@muc.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: Corey Minyard <minyard@acm.org> Cc: Neil Brown <neilb@suse.de> Cc: "John W. Linville" <linville@tuxdriver.com> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Jan Kara <jack@ucw.cz> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Mark Fasheh <mark.fasheh@oracle.com> Cc: David Chinner <dgc@sgi.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-14 08:09:59 -08:00
Tim Schmielau	cd354f1ae7	[PATCH] remove many unneeded #includes of sched.h After Al Viro (finally) succeeded in removing the sched.h #include in module.h recently, it makes sense again to remove other superfluous sched.h includes. There are quite a lot of files which include it but don't actually need anything defined in there. Presumably these includes were once needed for macros that used to live in sched.h, but moved to other header files in the course of cleaning it up. To ease the pain, this time I did not fiddle with any header files and only removed #includes from .c-files, which tend to cause less trouble. Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha, arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig, allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all configs in arch/arm/configs on arm. I also checked that no new warnings were introduced by the patch (actually, some warnings are removed that were emitted by unnecessarily included header files). Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-14 08:09:54 -08:00
Kazunori MIYAZAWA	c0d56408e3	[IPSEC]: Changing API of xfrm4_tunnel_register. This patch changes xfrm4_tunnel register and deregister interface to prepare for solving the conflict of device tunnels with inter address family IPsec tunnel. Signed-off-by: Kazunori MIYAZAWA <miyazawa@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-13 12:54:47 -08:00
Ilpo Järvinen	600ff0c24b	[TCP]: Prevent pseudo garbage in SYN's advertized window TCP may advertize up to 16-bits window in SYN packets (no window scaling allowed). At the same time, TCP may have rcv_wnd (32-bits) that does not fit to 16-bits without window scaling resulting in pseudo garbage into advertized window from the low-order bits of rcv_wnd. This can happen at least when mss <= (1<<wscale) (see tcp_select_initial_window). This patch fixes the handling of SYN advertized windows (compile tested only). In worst case (which is unlikely to occur though), the receiver advertized window could be just couple of bytes. I'm not sure that such situation would be handled very well at all by the receiver!? Fortunately, the situation normalizes after the first non-SYN ACK is received because it has the correct, scaled window. Alternatively, tcp_select_initial_window could be changed to prevent too large rcv_wnd in the first place. [ tcp_make_synack() has the same bug, and I've added a fix for that to this patch -DaveM ] Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-13 12:42:11 -08:00
Herbert Xu	bbf4a6bc8c	[NETFILTER]: Clear GSO bits for TCP reset packet The TCP reset packet is copied from the original. This includes all the GSO bits which do not apply to the new packet. So we should clear those bits. Spotted by Patrick McHardy. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-13 12:32:58 -08:00
Patrick McHardy	e2e01bfef8	[XFRM]: Fix IPv4 tunnel mode decapsulation with IPV6=n Add missing break when CONFIG_IPV6=n. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-12 20:27:10 -08:00
Stephen Hemminger	9121c77706	[TCP]: cleanup of htcp (resend) Minor non-invasive cleanups: * white space around operators and line wrapping * use const * use __read_mostly Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-12 13:34:03 -08:00
Stephen Hemminger	59758f4459	[TCP]: Use read mostly for CUBIC parameters. These module parameters should be in the read mostly area to avoid cache pollution. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-12 13:15:20 -08:00
Patrick McHardy	a3c941b08d	[NETFILTER]: Kconfig: improve dependency handling Instead of depending on internally needed options and letting users figure out what is needed, select them when needed: - IP_NF_IPTABLES, IP_NF_ARPTABLES and IP6_NF_IPTABLES select NETFILTER_XTABLES - NETFILTER_XT_TARGET_CONNMARK, NETFILTER_XT_MATCH_CONNMARK and IP_NF_TARGET_CLUSTERIP select NF_CONNTRACK_MARK - NETFILTER_XT_MATCH_CONNBYTES selects NF_CT_ACCT Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-12 11:15:02 -08:00
Patrick McHardy	982d9a9ce3	[NETFILTER]: nf_conntrack: properly use RCU for nf_conntrack_destroyed callback Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-12 11:14:11 -08:00
Patrick McHardy	6b48a7d08d	[NETFILTER]: ip_conntrack: properly use RCU for ip_conntrack_destroyed callback Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-12 11:13:58 -08:00
Patrick McHardy	abbaccda4c	[NETFILTER]: ip_conntrack: fix invalid conntrack statistics RCU assumption CONNTRACK_STAT_INC assumes rcu_read_lock in nf_hook_slow disables preemption as well, making it legal to use __get_cpu_var without disabling preemption manually. The assumption is not correct anymore with preemptable RCU, additionally we need to protect against softirqs when not holding ip_conntrack_lock. Add CONNTRACK_STAT_INC_ATOMIC macro, which disables local softirqs, and use where necessary. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-12 11:13:14 -08:00
Patrick McHardy	923f4902fe	[NETFILTER]: nf_conntrack: properly use RCU API for nf_ct_protos/nf_ct_l3protos arrays Replace preempt_{enable,disable} based RCU by proper use of the RCU API and add missing rcu_read_lock/rcu_read_unlock calls in all paths not obviously only used within packet process context (nfnetlink_conntrack). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-12 11:12:57 -08:00
Patrick McHardy	642d628b2c	[NETFILTER]: ip_conntrack: properly use RCU API for ip_ct_protos array Replace preempt_{enable,disable} based RCU by proper use of the RCU API and add missing rcu_read_lock/rcu_read_unlock calls in all paths not obviously only used within packet process context (nfnetlink_conntrack). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-12 11:12:40 -08:00
Patrick McHardy	e22a054869	[NETFILTER]: nf_nat: properly use RCU API for nf_nat_protos array Replace preempt_{enable,disable} based RCU by proper use of the RCU API and add missing rcu_read_lock/rcu_read_unlock calls in paths used outside of packet processing context (nfnetlink_conntrack). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-12 11:12:26 -08:00
Patrick McHardy	a441dfdbb2	[NETFILTER]: ip_nat: properly use RCU API for ip_nat_protos array Replace preempt_{enable,disable} based RCU by proper use of the RCU API and add missing rcu_read_lock/rcu_read_unlock calls in paths used outside of packet processing context (nfnetlink_conntrack). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-12 11:12:09 -08:00
Patrick McHardy	e92ad99c78	[NETFILTER]: nf_log: minor cleanups - rename nf_logging to nf_loggers since its an array of registered loggers - rename nf_log_unregister_logger() to nf_log_unregister() to make it symetrical to nf_log_register() and convert all users Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-12 11:11:55 -08:00
Patrick McHardy	c3a47ab3e5	[NETFILTER]: Properly use RCU in nf_ct_attach Use rcu_assign_pointer/rcu_dereference for ip_ct_attach pointer instead of self-made RCU and use rcu_read_lock to make sure the conntrack module doesn't disappear below us while calling it, since this function can be called from outside the netfilter hooks. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-12 11:09:19 -08:00
Arjan van de Ven	9a32144e9d	[PATCH] mark struct file_operations const 7 Many struct file_operations in the kernel can be "const". Marking them const moves these to the .rodata section, which avoids false sharing with potential dirty data. In addition it'll catch accidental writes at compile time to these shared resources. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:46 -08:00
Linus Torvalds	cb18eccff4	Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (45 commits) [IPV4]: Restore multipath routing after rt_next changes. [XFRM] IPV6: Fix outbound RO transformation which is broken by IPsec tunnel patch. [NET]: Reorder fields of struct dst_entry [DECNET]: Convert decnet route to use the new dst_entry 'next' pointer [IPV6]: Convert ipv6 route to use the new dst_entry 'next' pointer [IPV4]: Convert ipv4 route to use the new dst_entry 'next' pointer [NET]: Introduce union in struct dst_entry to hold 'next' pointer [DECNET]: fix misannotation of linkinfo_dn [DECNET]: FRA_{DST,SRC} are le16 for decnet [UDP]: UDP can use sk_hash to speedup lookups [NET]: Fix whitespace errors. [NET] XFRM: Fix whitespace errors. [NET] X25: Fix whitespace errors. [NET] WANROUTER: Fix whitespace errors. [NET] UNIX: Fix whitespace errors. [NET] TIPC: Fix whitespace errors. [NET] SUNRPC: Fix whitespace errors. [NET] SCTP: Fix whitespace errors. [NET] SCHED: Fix whitespace errors. [NET] RXRPC: Fix whitespace errors. ...	2007-02-11 11:38:13 -08:00
Robert P. J. Day	c376222960	[PATCH] Transform kmem_cache_alloc()+memset(0) -> kmem_cache_zalloc(). Replace appropriate pairs of "kmem_cache_alloc()" + "memset(0)" with the corresponding "kmem_cache_zalloc()" call. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Andi Kleen <ak@muc.de> Cc: Roland McGrath <roland@redhat.com> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Greg KH <greg@kroah.com> Acked-by: Joel Becker <Joel.Becker@oracle.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Jan Kara <jack@ucw.cz> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Cc: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:27 -08:00
Eric Dumazet	5ef213f684	[IPV4]: Restore multipath routing after rt_next changes. I forgot to test build this part of the networking code... Sorry guys. This patch renames u.rt_next to u.dst.rt_next Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-10 23:20:50 -08:00
Eric Dumazet	093c2ca416	[IPV4]: Convert ipv4 route to use the new dst_entry 'next' pointer This patch removes the rt_next pointer from 'struct rtable.u' union, and renames u.rt_next to u.dst_rt_next. It also moves 'struct flowi' right after 'struct dst_entry' to prepare the gain on lookups. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-10 23:20:38 -08:00
Eric Dumazet	95f30b336b	[UDP]: UDP can use sk_hash to speedup lookups In a prior patch, I introduced a sk_hash field (__sk_common.skc_hash) to let tcp lookups use one cache line per unmatched entry instead of two. We can also use sk_hash to speedup UDP part as well. We store in sk_hash the hnum value, and use sk->sk_hash (same cache line than 'next' pointer), instead of inet->num (different cache line) Note : We still have a false sharing problem for SMP machines, because sock_hold(sock) dirties the cache line containing the 'next' pointer. Not counting the udp_hash_lock rwlock. (did someone mentioned RCU ? :) ) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-10 23:20:29 -08:00
YOSHIFUJI Hideaki	e905a9edab	[NET] IPV4: Fix whitespace errors. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-10 23:19:39 -08:00
Eric Dumazet	dbca9b2750	[NET]: change layout of ehash table ehash table layout is currently this one : First half of this table is used by sockets not in TIME_WAIT state Second half of it is used by sockets in TIME_WAIT state. This is non optimal because of for a given hash or socket, the two chain heads are located in separate cache lines. Moreover the locks of the second half are never used. If instead of this halving, we use two list heads in inet_ehash_bucket instead of only one, we probably can avoid one cache miss, and reduce ram usage, particularly if sizeof(rwlock_t) is big (various CONFIG_DEBUG_SPINLOCK, CONFIG_DEBUG_LOCK_ALLOC settings). So we still halves the table but we keep together related chains to speedup lookups and socket state change. In this patch I did not try to align struct inet_ehash_bucket, but a future patch could try to make this structure have a convenient size (a power of two or a multiple of L1_CACHE_SIZE). I guess rwlock will just vanish as soon as RCU is plugged into ehash :) , so maybe we dont need to scratch our heads to align the bucket... Note : In case struct inet_ehash_bucket is not a power of two, we could probably change alloc_large_system_hash() (in case it use __get_free_pages()) to free the unused space. It currently allocates a big zone, but the last quarter of it could be freed. Again, this should be a temporary 'problem'. Patch tested on ipv4 tcp only, but should be OK for IPV6 and DCCP. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 14:16:46 -08:00
Jan Engelhardt	e60a13e030	[NETFILTER]: {ip,ip6}_tables: use struct xt_table instead of redefined structure names Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:39:20 -08:00
Jan Engelhardt	6709dbbb19	[NETFILTER]: {ip,ip6}_tables: remove x_tables wrapper functions Use the x_tables functions directly to make it better visible which parts are shared between ip_tables and ip6_tables. Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:39:19 -08:00
Jan Engelhardt	e1fd0586b0	[NETFILTER]: x_tables: fix return values for LOG/ULOG Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:39:18 -08:00
Eric Leblond	41f4689a7c	[NETFILTER]: NAT: optional source port randomization support This patch adds support to NAT to randomize source ports. Signed-off-by: Eric Leblond <eric@inl.fr> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:39:17 -08:00
Patrick McHardy	cdd289a2f8	[NETFILTER]: add IPv6-capable TCPMSS target Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:39:16 -08:00
Patrick McHardy	a8d0f9526f	[NET]: Add UDPLITE support in a few missing spots Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:39:14 -08:00
Patrick McHardy	efbc597634	[NETFILTER]: nf_nat: remove broken HOOKNAME macro Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:39:12 -08:00
Patrick McHardy	a09113c2c8	[NETFILTER]: tcp conntrack: do liberal tracking for picked up connections Do liberal tracking (only RSTs need to be in-window) for connections picked up without seeing a SYN to deal with window scaling. Also change logging of invalid packets not to log packets accepted by liberal tracking to avoid spamming the logs. Based on suggestion from James Ralston <ralston@pobox.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:39:10 -08:00
Stephen Hemminger	22f8cde5bc	[NET]: unregister_netdevice as void There was no real useful information from the unregister_netdevice() return code, the only error occurred in a situation that was a driver bug. So change it to a void function. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:39:06 -08:00
Alexey Dobriyan	cc63f70b8b	[IPV4/IPV6] multicast: Check add_grhead() return value add_grhead() allocates memory with GFP_ATOMIC and in at least two places skb from it passed to skb_put() without checking. Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:39:04 -08:00
David S. Miller	f2f2102d1a	[XFRM]: Fix missed error setting in xfrm4_policy.c When we can't find the afinfo we should return EAFNOSUPPORT. GCC warned about the uninitialized 'err' for this path as well. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:39:03 -08:00
Miika Komu	4337226228	[IPSEC]: IPv4 over IPv6 IPsec tunnel This is the patch to support IPv4 over IPv6 IPsec. Signed-off-by: Miika Komu <miika@iki.fi> Signed-off-by: Diego Beltrami <Diego.Beltrami@hiit.fi> Signed-off-by: Kazunori Miyazawa <miyazawa@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:39:02 -08:00
Miika Komu	c82f963efe	[IPSEC]: IPv6 over IPv4 IPsec tunnel This is the patch to support IPv6 over IPv4 IPsec Signed-off-by: Miika Komu <miika@iki.fi> Signed-off-by: Diego Beltrami <Diego.Beltrami@hiit.fi> Signed-off-by: Kazunori Miyazawa <miyazawa@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:39:01 -08:00
Miika Komu	cdca72652a	[IPSEC]: exporting xfrm_state_afinfo This patch exports xfrm_state_afinfo. Signed-off-by: Miika Komu <miika@iki.fi> Signed-off-by: Diego Beltrami <Diego.Beltrami@hiit.fi> Signed-off-by: Kazunori Miyazawa <miyazawa@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:39:00 -08:00
John Heffner	104439a887	[TCP]: Don't apply FIN exception to full TSO segments. Signed-off-by: John Heffner <jheffner@psc.edu> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:38:51 -08:00
Baruch Even	8a3c3a9727	[TCP]: Check num sacks in SACK fast path We clear the unused parts of the SACK cache, This prevents us from mistakenly taking the cache data if the old data in the SACK cache is the same as the data in the SACK block. This assumes that we never receive an empty SACK block with start and end both at zero. Signed-off-by: Baruch Even <baruch@ev-en.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:38:50 -08:00
Baruch Even	6f74651ae6	[TCP]: Seperate DSACK from SACK fast path Move DSACK code outside the SACK fast-path checking code. If the DSACK determined that the information was too old we stayed with a partial cache copied. Most likely this matters very little since the next packet will not be DSACK and we will find it in the cache. but it's still not good form and there is little reason to couple the two checks. Since the SACK receive cache doesn't need the data to be in host order we also remove the ntohl in the checking loop. Signed-off-by: Baruch Even <baruch@ev-en.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:38:49 -08:00
Baruch Even	fda03fbb56	[TCP]: Advance fast path pointer for first block only Only advance the SACK fast-path pointer for the first block, the fast-path assumes that only the first block advances next time so we should not move the cached skb for the next sack blocks. Signed-off-by: Baruch Even <baruch@ev-en.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:38:48 -08:00
David S. Miller	8eb9086f21	[IPV4/IPV6]: Always wait for IPSEC SA resolution in socket contexts. Do this even for non-blocking sockets. This avoids the silly -EAGAIN that applications can see now, even for non-blocking sockets in some cases (f.e. connect()). With help from Venkat Tekkirala. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:38:45 -08:00
Frederik Deweerdt	ba7808eac1	[TCP]: remove tcp header from tcp_v4_check (take #2 ) The tcphdr struct passed to tcp_v4_check is not used, the following patch removes it from the parameter list. This adds the netfilter modifications missing in the patch I sent for rc3-mm1. Signed-off-by: Frederik Deweerdt <frederik.deweerdt@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:38:44 -08:00
Patrick McHardy	26932566a4	[NETLINK]: Don't BUG on undersized allocations Currently netlink users BUG when the allocated skb for an event notification is undersized. While this is certainly a kernel bug, its not critical and crashing the kernel is too drastic, especially when considering that these errors have appeared multiple times in the past and it BUGs even if no listeners are present. This patch replaces BUG by WARN_ON and changes the notification functions to inform potential listeners of undersized allocations using a unique error code (EMSGSIZE). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:38:41 -08:00
Patrick McHardy	40e0cb004a	[NETFILTER]: ctnetlink: fix compile failure with NF_CONNTRACK_MARK=n CC net/netfilter/nf_conntrack_netlink.o net/netfilter/nf_conntrack_netlink.c: In function 'ctnetlink_conntrack_event': net/netfilter/nf_conntrack_netlink.c:392: error: 'struct nf_conn' has no member named 'mark' make[3]: *** [net/netfilter/nf_conntrack_netlink.o] Error 1 Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-02 19:33:11 -08:00
Patrick McHardy	adcb471110	[NETFILTER]: SIP conntrack: fix out of bounds memory access When checking for an @-sign in skp_epaddr_len, make sure not to run over the packet boundaries. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-01-30 14:25:24 -08:00
Lars Immisch	7da5bfbb12	[NETFILTER]: SIP conntrack: fix skipping over user info in SIP headers When trying to skip over the username in the Contact header, stop at the end of the line if no @ is found to avoid mangling following headers. We don't need to worry about continuation lines because we search inside a SIP URI. Fixes Netfilter Bugzilla #532. Signed-off-by: Lars Immisch <lars@ibp.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-01-30 14:24:57 -08:00
Robert Olsson	095b8501e4	[IPV4]: Fix single-entry /proc/net/fib_trie output. When main table is just a single leaf this gets printed as belonging to the local table in /proc/net/fib_trie. A fix is below. Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-01-26 19:06:01 -08:00
Patrick McHardy	a46bf7d5a8	[NETFILTER]: nf_nat_pptp: fix expectation removal When removing the expectation for the opposite direction, the PPTP NAT helper initializes the tuple for lookup with the addresses of the opposite direction, which makes the lookup fail. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-01-26 01:07:30 -08:00
Patrick McHardy	c72c6b2a29	[NETFILTER]: nf_nat: fix ICMP translation with statically linked conntrack When nf_nat/nf_conntrack_ipv4 are linked statically, nf_nat is initialized before nf_conntrack_ipv4, which makes the nf_ct_l3proto_find_get(AF_INET) call during nf_nat initialization return the generic l3proto instead of the AF_INET specific one. This breaks ICMP error translation since the generic protocol always initializes the IPs in the tuple to 0. Change the linking order and put nf_conntrack_ipv4 first. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-01-26 01:06:47 -08:00
David S. Miller	e89862f4c5	[TCP]: Restore SKB socket owner setting in tcp_transmit_skb(). Revert `931731123a` We can't elide the skb_set_owner_w() here because things like certain netfilter targets (such as owner MATCH) need a socket to be set on the SKB for correct operation. Thanks to Jan Engelhardt and other netfilter list members for pointing this out. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-01-26 01:04:55 -08:00
Baruch Even	db3ccdac26	[TCP]: Fix sorting of SACK blocks. The sorting of SACK blocks actually munges them rather than sort, causing the TCP stack to ignore some SACK information and breaking the assumption of ordered SACK blocks after sorting. The sort takes the data from a second buffer which isn't moved causing subsequent data moves to occur from the wrong location. The fix is to use a temporary buffer as a normal sort does. Signed-off-By: Baruch Even <baruch@ev-en.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-01-25 13:35:06 -08:00
Eric W. Biederman	6640e69731	[IPV4]: Fix the fib trie iterator to work with a single entry routing tables In a kernel with trie routing enabled I had a simple routing setup with only a single route to the outside world and no default route. "ip route table list main" showed my the route just fine but /proc/net/route was an empty file. What was going on? Thinking it was a bug in something I did and I looked deeper. Eventually I setup a second route and everything looked correct, huh? Finally I realized that the it was just the iterator pair in fib_trie_get_first, fib_trie_get_next just could not handle a routing table with a single entry. So to save myself and others further confusion, here is a simple fix for the fib proc iterator so it works even when there is only a single route in a routing table. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-01-24 14:42:04 -08:00
Jarek Poplawski	52d570aabe	[TCP]: rare bad TCP checksum with 2.6.19 The patch "Replace CHECKSUM_HW by CHECKSUM_PARTIAL/CHECKSUM_COMPLETE" changed to unconditional copying of ip_summed field from collapsed skb. This patch reverts this change. The majority of substantial work including heavy testing and diagnosing by: Michael Tokarev <mjt@tls.msk.ru> Possible reasons pointed by: Herbert Xu and Patrick McHardy. Signed-off-by: Jarek Poplawski <jarkao2@o2.pl> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-01-23 22:07:12 -08:00
Masayuki Nakagawa	fb7e2399ec	[TCP]: skb is unexpectedly freed. I encountered a kernel panic with my test program, which is a very simple IPv6 client-server program. The server side sets IPV6_RECVPKTINFO on a listening socket, and the client side just sends a message to the server. Then the kernel panic occurs on the server. (If you need the test program, please let me know. I can provide it.) This problem happens because a skb is forcibly freed in tcp_rcv_state_process(). When a socket in listening state(TCP_LISTEN) receives a syn packet, then tcp_v6_conn_request() will be called from tcp_rcv_state_process(). If the tcp_v6_conn_request() successfully returns, the skb would be discarded by __kfree_skb(). However, in case of a listening socket which was already set IPV6_RECVPKTINFO, an address of the skb will be stored in treq->pktopts and a ref count of the skb will be incremented in tcp_v6_conn_request(). But, even if the skb is still in use, the skb will be freed. Then someone still using the freed skb will cause the kernel panic. I suggest to use kfree_skb() instead of __kfree_skb(). Signed-off-by: Masayuki Nakagawa <nakagawa.msy@ncos.nec.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-01-23 20:25:52 -08:00
Patrick McHardy	c54ea3b95a	[NETFILTER]: ctnetlink: fix leak in ctnetlink_create_conntrack error path Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-01-23 20:25:42 -08:00
Stephen Hemminger	65ebe63420	[PATCH] email change for shemminger@osdl.org Change my email address to reflect OSDL merger. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> [ The irony. Somebody still has his sign-off message hardcoded in a script or his brainstem ;^] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-01-23 14:18:49 -08:00

... 4 5 6 7 8 ...

1711 Commits