linux

Commit Graph

Author	SHA1	Message	Date
Stephen Hemminger	bb2f8cc0ec	[NETEM]: spelling errors Get rid of some of my creative spelling. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:33 -07:00
Thomas Graf	c702e8047f	[NETLINK]: Directly return -EINTR from netlink_dump_start() Now that all users of netlink_dump_start() use netlink_run_queue() to process the receive queue, it is possible to return -EINTR from netlink_dump_start() directly, therefore simplying the callers. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:33 -07:00
Thomas Graf	ead592ba24	[IPv4] diag: Use netlink_run_queue() to process the receive queue Makes use of netlink_run_queue() to process the receive queue and converts inet_diag_rcv_msg() to use the type safe netlink interface. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:32 -07:00
Thomas Graf	1d00a4eb42	[NETLINK]: Remove error pointer from netlink message handler The error pointer argument in netlink message handlers is used to signal the special case where processing has to be interrupted because a dump was started but no error happened. Instead it is simpler and more clear to return -EINTR and have netlink_run_queue() deal with getting the queue right. nfnetlink passed on this error pointer to its subsystem handlers but only uses it to signal the start of a netlink dump. Therefore it can be removed there as well. This patch also cleans up the error handling in the affected message handlers to be consistent since it had to be touched anyway. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:30 -07:00
Thomas Graf	45e7ae7f71	[NETLINK]: Ignore control messages directly in netlink_run_queue() Changes netlink_rcv_skb() to skip netlink controll messages and don't pass them on to the message handler. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:29 -07:00
Thomas Graf	d35b685640	[NETLINK]: Ignore !NLM_F_REQUEST messages directly in netlink_run_queue() netlink_rcv_skb() is changed to skip messages which don't have the NLM_F_REQUEST bit to avoid every netlink family having to perform this check on their own. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:29 -07:00
Thomas Graf	33a0543cd9	[NETLINK]: Remove unused groups variable Leftover from dynamic multicast groups allocation work. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:27 -07:00
Thomas Graf	267281058c	[TCP] westwood: Use type safe netlink interface Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:26 -07:00
Thomas Graf	e9195d677d	[TCP] vegas: Use type safe netlink interface Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:26 -07:00
Thomas Graf	51057f2fec	[RTNL]: Properly return rntl message handler Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:24 -07:00
Stephen Hemminger	1936502d00	[NET_SCHED] qdisc: avoid transmit softirq on watchdog wakeup If possible, avoid having to do a transmit softirq when a qdisc watchdog decides to re-enable. The watchdog routine runs off a timer, so it is already in the same effective context as the softirq. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:23 -07:00
Stephen Hemminger	11274e5a43	[NETEM]: avoid excessive requeues The netem code would call getnstimeofday() and dequeue/requeue after every packet, even if it was waiting. Avoid this overhead by using the throttled flag. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:22 -07:00
Stephen Hemminger	075aa573b7	[NETEM]: Optimize tfifo In most cases, the next packet will be sent after the last one. So optimize that case. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:21 -07:00
Stephen Hemminger	b407621c35	[NETEM]: use better types for time values The random number generator always generates 32 bit values. The time values are limited by psched_tdiff_t Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:20 -07:00
Stephen Hemminger	a362e0a789	[NETEM]: report reorder percent correctly. If you setup netem to just delay packets; "tc qdisc ls" will report the reordering as 100%. Well it's a lie, reorder isn't used unless gap is set, so just set value to 0 so the output of utility is correct. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:20 -07:00
Stephen Hemminger	7e58886b45	[TCP]: cubic optimization Use willy's work in optimizing cube root by having table for small values. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:19 -07:00
Thomas Graf	c454673da7	[NET] rules: Unified rules dumping Implements a unified, protocol independant rules dumping function which is capable of both, dumping a specific protocol family or all of them. This speeds up dumping as less lookups are required. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:17 -07:00
Thomas Graf	687ad8cc64	[RTNL]: Use rtnl registration interface for dump-all aliases Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:15 -07:00
Thomas Graf	32fe21c0c0	[BRIDGE]: Use rtnl registration interface Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:14 -07:00
Thomas Graf	c127ea2c45	[IPv6]: Use rtnl registration interface Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:13 -07:00
Thomas Graf	fa34ddd739	[DECNet]: Use rtnl registration interface Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:12 -07:00
Thomas Graf	708914cc5e	[PKT_SCHED] act: Use rtnl registration interface Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:11 -07:00
Thomas Graf	82623c0d73	[PKT_SCHED] cls: Use rtnl registration interface Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:10 -07:00
Thomas Graf	be577ddc2b	[PKT_SCHED] qdisc: Use rtnl registration interface Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:09 -07:00
Thomas Graf	63f3444fb9	[IPv4]: Use rtnl registration interface Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:08 -07:00
Thomas Graf	9d9e6a5819	[NET] rules: Use rtnl registration interface Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:07 -07:00
Thomas Graf	c8822a4e00	[NEIGH]: Use rtnl registration interface Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:06 -07:00
Thomas Graf	340d17fc9d	[NET] link: Use rtnl registration interface Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:05 -07:00
Thomas Graf	e284986385	[RTNL]: Message handler registration interface This patch adds a new interface to register rtnetlink message handlers replacing the exported rtnl_links[] array which required many message handlers to be exported unnecessarly. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:04 -07:00
Gerrit Renker	30833ffead	[CCID3]: Use initial RTT sample from SYN exchange The patch follows the following recommendation made in an erratum to RFC 4342: "Senders MAY additionally make use of other available RTT measurements, including those from the initial Request-Response packet exchange." It implements larger initial windows with regard to this inital RTT measurement, using the mechanism suggested in draft-ietf-dccp-rfc3448bis, section 4.2. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:04 -07:00
Gerrit Renker	89560b53b9	[DCCP]: Sample RTT from SYN exchange Function:	2007-04-25 22:27:02 -07:00
Gerrit Renker	7dfee1a9c0	[CCID3]: Use function for RTT sampling This replaces the existing occurrences of RTT sampling with the use of the new function dccp_sample_rtt. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:01 -07:00
Gerrit Renker	4712a792ee	[DCCP]: Provide function for RTT sampling A recurring problem, in particular in the CCID code, is that RTT samples from packets with timestamp echo and elapsed time options need to be taken. This service is provided via a new function dccp_sample_rtt in this patch. Furthermore, to protect against `insane' RTT samples, the sampled value is bounded between 100 microseconds and 4 seconds - for which u32 is sufficient. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:00 -07:00
Gerrit Renker	0c150efb28	[CCID3]: Handle Idle and Application-Limited periods This updates the code with regard to handling idle and application-limited periods as specified in [RFC 4342, 5.1]. Background:	2007-04-25 22:26:59 -07:00
Gerrit Renker	a21f9f96cd	[CCID3]: Wrap computation of RFC3390-initial rate into separate function The CCID 3 and TFRC specs (RFC 4342, RFC 3448, draft-3448bis) make frequent reference to the computation of the RFC-3390 initial sending rate: 1. Initial sending rate when RTT is known (RFC 4342, p. 6) 2. Response to Idle/Application-Limited periods (RFC 4342, 5.1) This warrants putting the code into its own function, for later code reuse. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:58 -07:00
Gerrit Renker	1761f7d7fe	[CCID3]: Remove build warnings for 64bit This clears the following sparc64 build warnings: 1) warning: format "%ld" expects type "long int", but argument 3 has type "suseconds_t" 2) warning: format "%llu" expects type "long long unsigned int", but argument 3 has type "__u64" Fixed by using typecast to unsigned. This is argued to be safe, since the quantities, after de-scaling (factor 2^6) fit all in u32. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:57 -07:00
Gerrit Renker	fddc2feb94	[CCID3]: More to see in dccp_probe This adds a few more fields of interest to /proc/net/dccpprobe, the following output ensues: 1 2 3 4 5 6 7 8 9 10 11 sec.usec src:sport dst:dport size s rtt p X_calc X_recv X t_ipi Also made the formatting consistent. Scripts that go with this can be downloaded from http://139.133.210.30/users/gerrit/dccp/dccp_probe/ Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:56 -07:00
Gerrit Renker	6626e3628f	[DCCP]: More debug information for dccp_wait_for_ccid This adds more detail in the wait_for_ccid packet scheduling loop. In particular, it informs about (i) when delay is used and (ii) why a packet is discarded. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:54 -07:00
Gerrit Renker	ac12b0c495	[DCCP]: Always use debug-toggle parameters Currently debugging output (when configured) is automatically enabled when DCCP modules are compiled into the kernel rather than built as loadable modules. This is not necessary, since the module parameters in this case become kernel commandline parameters, e.g. DCCP or CCID3 debug output can be enabled for a static build by appending the following at the boot prompt: dccp.dccp_debug=1 dccp_ccid3.ccid3_debug=1 This patch therefore does away with the more complicated way of always enabling debug output for static builds Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:53 -07:00
Gerrit Renker	1266adee12	[CCID3]: Remove race condition and update t_ipi when `s' changes This: 1. removes a race condition in the access to the scheduled send time t_nom which results from allowing asynchronous r/w access to t_nom without locks; 2. updates the inter-packet interval t_ipi = s/X when `s' changes, following a suggestion by Ian McDonald. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:52 -07:00
Ian McDonald	8699be7d24	[CCID3]: More verbose debugging This adds a few debugging statements to ccid3.c Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:51 -07:00
Ian McDonald	551dc5f7a1	[CCID3]: Fix use of invalid loss intervals This fixes a bug which uses an invalid comparison. The bug resulted in the use of invalid loss intervals. Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:50 -07:00
Gerrit Renker	371fe7779c	[CCID3]: Use MSS for larger initial windows This improves the slow-start phase by using the MSS (as suggested in RFC 4342, sec. 5) instead of the packet size s. Also figured out that __u32 is ample resource enough. After applying, I got the following in the logs: ccid3_hc_tx_packet_recv: client(f7421700), s=6, MSS=1424, w_init=4380, R_sample=176us, X=24886363 Had the previous variant been used, w_init would have been as low as 24. Committer note: removed unneeded cast to unsigned long long that was causing a compiler warning on 64bit architectures. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:49 -07:00
Gerrit Renker	9bf17475eb	[CCID3]: Re-order CCID 3 source file No code change at all. This splits ccid3.c into a RX and a TX section, so that the file has an organisation similar to the other ones (e.g. packet_history.{h,c}). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:48 -07:00
Gerrit Renker	353b13e10a	[CCID3]: Remove redundant `len' test Since CCID3 avoids sending 0-byte data packets (cf. ccid3_hc_tx_send_packet), testing for zero-payload length, as performed by ccid3_hc_tx_update_s, is redundant - hence removed by this patch. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:47 -07:00
Gerrit Renker	8d13bf9a0b	[DCCP]: Remove ambiguity in the way before48 is used This removes two ambiguities in employing the new definition of before48, following the analysis on http://www.mail-archive.com/dccp@vger.kernel.org/msg01295.html (1) Updating GSR when P.seqno >= S.SWL With the old definition we did not update when P.seqno and S.SWL are 2^47 apart. To ensure the same behaviour as with the old definition, this is replaced with the equivalent condition dccp_delta_seqno(S.SWL, P.seqno) >= 0 (2) Sending SYNC when P.seqno >= S.OSR Here it is debatable whether the new definition causes an ambiguity: the case is similar to (1); and to have consistency with the case (1), we use the equivalent condition dccp_delta_seqno(S.OSR, P.seqno) >= 0 Detailed Justification	2007-04-25 22:26:46 -07:00
Gerrit Renker	b16be51b5e	[DCCP]: Fix for follows48 The follows48 relation identifies whether 48-bit sequence number x is the direct successor of y. Currently, it does not handle cases of the following type correctly: follows48(0x(prefix)10000LL, 0x(prefix)0FFFFLL) where prefix is an arbitrary hex sequence of up to 7 digits. This is fixed by reusing the new dccp_delta_seqno function. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:45 -07:00
Gerrit Renker	d52de17b8c	[DCCP]: Make `before' relation unambiguous Problem:	2007-04-25 22:26:44 -07:00
Gerrit Renker	0aec51c869	[DCCP]: Make dccp_delta_seqno return signed numbers Problem:	2007-04-25 22:26:43 -07:00
Gerrit Renker	6b811d43f6	[DCCP]: 48-bit sequence number arithmetic This patch * organizes the sequence arithmetic functions into one corner of dccp.h * performs a small modification of dccp_set_seqno to make it more widely reusable (now it is safe to use any number, since it performs modulo-2^48 assignment) * adds functions and generic macros for 48-bit sequence arithmetic: --48 bit complement --modulo-48 addition and modulo-48 subtraction --dccp_inc_seqno now a special case of add48 Constants renamed following a suggestion by Arnaldo. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:42 -07:00
Arnaldo Carvalho de Melo	6b88dd966b	[SK_BUFF] ipv6: Use skb_network_offset in some more places So that we reduce the number of direct accesses to skb->data. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2007-04-25 22:26:38 -07:00
Arnaldo Carvalho de Melo	dc5fc579b9	[NETLINK]: Use nlmsg_trim() where appropriate Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:37 -07:00
Arnaldo Carvalho de Melo	897933bcdf	[SK_BUFF]: Remove skb_add_mtu() leftovers Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2007-04-25 22:26:35 -07:00
Arnaldo Carvalho de Melo	b529ccf279	[NETLINK]: Introduce nlmsg_hdr() helper For the common "(struct nlmsghdr *)skb->data" sequence, so that we reduce the number of direct accesses to skb->data and for consistency with all the other cast skb member helpers. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:34 -07:00
Robert Olsson	965ffea43d	[IPV4]: fib_trie root node settings The threshold for root node can be more aggressive set to get better tree compression. The new setting mekes the root grow from 16 to 19 bits and substansial improvemnt in Aver depth this with the current table of 214393 prefixes But really the dynamic resize should need more investigation both in terms convergence and performance and maybe it should be possible to change... Maybe just for the brave to start with or we may have to back this out.	2007-04-25 22:26:32 -07:00
Robert Olsson	05eee48c5a	[IPV4]: fib_trie resize break The patch below adds break condition for the resize operations. If we don't achieve the desired fill factor a warning is printed. Trie should still be operational but new thresholds should be considered. Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:32 -07:00
Arnaldo Carvalho de Melo	ca0605a7c8	[SK_BUFF]: Adjust the zeroing up to tail in __alloc_skb too I did it just in alloc_skb_from_cache, forgot __alloc_skb, fixed now. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:31 -07:00
Arnaldo Carvalho de Melo	4305b54135	[SK_BUFF]: Convert skb->end to sk_buff_data_t Now to convert the last one, skb->data, that will allow many simplifications and removal of some of the offset helpers. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:29 -07:00
Arnaldo Carvalho de Melo	27a884dc3c	[SK_BUFF]: Convert skb->tail to sk_buff_data_t So that it is also an offset from skb->head, reduces its size from 8 to 4 bytes on 64bit architectures, allowing us to combine the 4 bytes hole left by the layer headers conversion, reducing struct sk_buff size to 256 bytes, i.e. 4 64byte cachelines, and since the sk_buff slab cache is SLAB_HWCACHE_ALIGN... :-) Many calculations that previously required that skb->{transport,network, mac}_header be first converted to a pointer now can be done directly, being meaningful as offsets or pointers. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:28 -07:00
David S. Miller	be8bd86321	[VLAN] vlan_dev: Use skb_reset_network_header(). Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:26 -07:00
Samuel Ortiz	c7630a4b93	[IrDA]: irda lockdep annotation Rmmoding irda triggers a lockdep false positive. Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Samuel Ortiz <samuel@sortiz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:23 -07:00
Arnaldo Carvalho de Melo	2e07fa9cd3	[SK_BUFF]: Use offsets for skb->{mac,network,transport}_header on 64bit architectures With this we save 8 bytes per network packet, leaving a 4 bytes hole to be used in further shrinking work, likely with the offsetization of other pointers, such as ->{data,tail,end}, at the cost of adds, that were minimized by the usual practice of setting skb->{mac,nh,n}.raw to a local variable that is then accessed multiple times in each function, it also is not more expensive than before with regards to most of the handling of such headers, like setting one of these headers to another (transport to network, etc), or subtracting, adding to/from it, comparing them, etc. Now we have this layout for sk_buff on a x86_64 machine: [acme@mica net-2.6.22]$ pahole vmlinux sk_buff struct sk_buff { struct sk_buff * next; /* 0 8 / struct sk_buff prev; /* 8 8 / struct rb_node rb; / 16 24 / struct sock sk; /* 40 8 / ktime_t tstamp; / 48 8 / struct net_device dev; /* 56 8 / / --- cacheline 1 boundary (64 bytes) --- / struct net_device input_dev; /* 64 8 / sk_buff_data_t transport_header; / 72 4 / sk_buff_data_t network_header; / 76 4 / sk_buff_data_t mac_header; / 80 4 / / XXX 4 bytes hole, try to pack / struct dst_entry dst; /* 88 8 / struct sec_path sp; /* 96 8 / char cb[48]; / 104 48 / / cacheline 2 boundary (128 bytes) was 24 bytes ago/ unsigned int len; / 152 4 / unsigned int data_len; / 156 4 / unsigned int mac_len; / 160 4 / union { __wsum csum; / 4 / __u32 csum_offset; / 4 / }; / 164 4 / __u32 priority; / 168 4 / __u8 local_df:1; / 172 1 / __u8 cloned:1; / 172 1 / __u8 ip_summed:2; / 172 1 / __u8 nohdr:1; / 172 1 / __u8 nfctinfo:3; / 172 1 / __u8 pkt_type:3; / 173 1 / __u8 fclone:2; / 173 1 / __u8 ipvs_property:1; / 173 1 / / XXX 2 bits hole, try to pack / __be16 protocol; / 174 2 / void (destructor)(struct sk_buff ); / 176 8 / struct nf_conntrack nfct; /* 184 8 / / --- cacheline 3 boundary (192 bytes) --- / struct sk_buff nfct_reasm; /* 192 8 / struct nf_bridge_info nf_bridge; /* 200 8 / __u16 tc_index; / 208 2 / __u16 tc_verd; / 210 2 / dma_cookie_t dma_cookie; / 212 4 / __u32 secmark; / 216 4 / __u32 mark; / 220 4 / unsigned int truesize; / 224 4 / atomic_t users; / 228 4 / unsigned char head; /* 232 8 / unsigned char data; /* 240 8 / unsigned char tail; /* 248 8 / / --- cacheline 4 boundary (256 bytes) --- / unsigned char end; /* 256 8 / }; / size: 264, cachelines: 5 / / sum members: 260, holes: 1, sum holes: 4 / / bit holes: 1, sum bit holes: 2 bits / / last cacheline: 8 bytes */ On 32 bits nothing changes, and pointers continue to be used with the compiler turning all this abstraction layer into dust. But there are some sk_buff validation tricks that are now possible, humm... :-) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:21 -07:00
Arnaldo Carvalho de Melo	b0e380b1d8	[SK_BUFF]: unions of just one member don't get anything done, kill them Renaming skb->h to skb->transport_header, skb->nh to skb->network_header and skb->mac to skb->mac_header, to match the names of the associated helpers (skb[_[re]set]_{transport,network,mac}_header). Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:20 -07:00
Arnaldo Carvalho de Melo	cfe1fc7759	[SK_BUFF]: Introduce skb_network_header_len For the common sequence "skb->h.raw - skb->nh.raw", similar to skb->mac_len, that is precalculated tho, don't think we need to bloat skb with one more member, so just use this new helper, reducing the number of non-skbuff.h references to the layer headers even more. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:19 -07:00
Arnaldo Carvalho de Melo	bff9b61ce3	[SK_BUFF]: Use the helpers to get the layer header pointer Some more cases... Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:18 -07:00
Patrick McHardy	514bca322c	[NET_SCHED]: Fix warning net/sched/sch_api.c: In function 'psched_show': net/sched/sch_api.c:1219: warning: format '%08x' expects type 'unsigned int', but argument 6 has type 's64' Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:17 -07:00
Patrick McHardy	bb239acf56	[NET_SCHED]: sch_cbq: fix watchdog scheduled too late q->now is increased during dequeue and doesn't contain the current time afterwards, resulting in a too large timeout value for the qdisc watchdog. Use "now" instead, which still contains the current time. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:16 -07:00
Patrick McHardy	4361cb17f0	[NET_SCHED]: Export real timer resolution in /proc/net/psched The timer resolution exported in /proc/net/psched is used by userspace to calculate HTB's burst values. Currently it is set to HZ, since we're now using hrtimers, use KTIME_MONOTONIC_RES, which makes HTB use smaller burst values. This patch also affects libnl, which incorrectly uses this value for the SFQ perturbation parameter, which is always in seconds, and some routing cache values, which are in USER_HZ, so both cases are broken anyway. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:15 -07:00
Patrick McHardy	00c04af9df	[NET_SCHED]: kill jiffie conversion macros Now that all packet schedulers have been converted to hrtimers most users of PSCHED_JIFFIE2US and PSCHED_US2JIFFIE are gone. The remaining users use it to convert external time units to packet scheduler clock ticks, so use PSCHED_TICKS_PER_SEC instead. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:14 -07:00
Patrick McHardy	fb983d4578	[NET_SCHED]: sch_htb: use hrtimer based watchdog Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:13 -07:00
Patrick McHardy	1a13cb63d6	[NET_SCHED]: sch_cbq: use hrtimer for delay_timer Switch delay_timer to hrtimer. The class penalty parameter is changed to use psched ticks as units. Since iproute never supported using this and the only existing user (libnl) incorrectly assumes psched ticks as units anyway, this shouldn't break anything. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:12 -07:00
Patrick McHardy	e9054a339e	[NET_SCHED]: sch_cbq: fix cbq_undelay_prio for non-active priorites cbq_undelay_prio is supposed to return a time delta, but returns the current time for non-active priorities, causing cbq_undelay to mark the priority as active and schedule a timer for twice the current time. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:11 -07:00
Patrick McHardy	88a993540a	[NET_SCHED]: sch_cbq: use hrtimer based watchdog Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:09 -07:00
Patrick McHardy	59cb5c6734	[NET_SCHED]: sch_netem: use hrtimer based watchdog Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:08 -07:00
Patrick McHardy	f7f593e383	[NET_SCHED]: sch_tbf: use hrtimer based watchdog Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:07 -07:00
Patrick McHardy	ed2b229a97	[NET_SCHED]: sch_hfsc: use hrtimer based watchdog Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:06 -07:00
Patrick McHardy	4179477f63	[NET_SCHED]: Add hrtimer based qdisc watchdog Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:05 -07:00
Patrick McHardy	641b9e0e8b	[NET_SCHED]: Use ktime as clocksource Get rid of the manual clock source selection mess and use ktime. Also use a scalar representation, which allows to clean up pkt_sched.h a bit more and results in less ktime_to_ns() calls in most cases. The PSCHED_US2JIFFIE/PSCHED_JIFFIE2US macros are implemented quite inefficient by this patch, following patches will convert all qdiscs to hrtimers and get rid of them entirely. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:04 -07:00
Arnaldo Carvalho de Melo	ddc7b8e32b	[SK_BUFF]: Some more layer header conversions Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:03 -07:00
Arnaldo Carvalho de Melo	d10ba34b00	[SK_BUFF]: More skb_put related skb_reset_transport_header This time we have to set it to skb->tail that is not anymore equal to skb->data, so we either add a new helper or just add the skb->tail - skb->data offset, for now do the later. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:01 -07:00
Arnaldo Carvalho de Melo	55f79cc0c0	[IPV6]: Reset the network header in ip6_nd_hdr ip6_nd_hdr is always called immediately after a alloc_skb + skb_reserve sequence, i.e. when skb->tail is equal to skb->data, making it correct to use skb_reset_network_header(). Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:00 -07:00
Arnaldo Carvalho de Melo	eeeb03745b	[SK_BUFF]: More skb_put related conversions to skb_reset_transport_header This is similar to the skb_reset_network_header(), i.e. at the point we reset the transport header pointer/offset skb->tail is equal to skb->data. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:59 -07:00
Pablo Neira Ayuso	ac6d141dc7	[NETFILTER]: nfnetlink: parse attributes with nfattr_parse in nfnetlink_check_attribute Use nfattr_parse to parse attributes, this patch also modifies the default behaviour since unknown attributes will be ignored instead of returning EINVAL. This ensure backward compatibility: new libraries with new attributes and old kernels can work. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:58 -07:00
Pablo Neira Ayuso	c8e2078cfe	[NETFILTER]: ctnetlink: add support for internal tcp connection tracking flags handling This patch let userspace programs set the IP_CT_TCP_BE_LIBERAL flag to force the pickup of established connections. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:57 -07:00
Willy Tarreau	5c8ce7c921	[NETFILTER]: TCP conntrack: factorize out the PUSH flag The PUSH flag is accepted with every other valid combination. Let's get it out of the tcp_valid_flags table and reduce the number of combinations we have to handle. This does not significantly reduce the table size however (8 bytes). Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:56 -07:00
Willy Tarreau	8f5bd99071	[NETFILTER]: TCP conntrack: accept RST\|PSH as valid This combination has been encountered on an IBM AS/400 in response to packets sent to a closed session. There is no particular reason to mark it invalid. Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:56 -07:00
Yasuyuki Kozakai	e7ac05f340	[NETFILTER]: nf_conntrack: add nf_copy() to safely copy members in skb This unifies the codes to copy netfilter related datas. Before copying, nf_copy() puts original members in destination skb. Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:55 -07:00
Yasuyuki Kozakai	edda553c32	[NETFILTER]: nf_conntrack: add __nf_copy() to copy members in skb This unifies the codes to copy netfilter related datas. Note that __nf_copy() assumes destination skb doesn't have any netfilter related members. Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:54 -07:00
Sami Farin	9b88790972	[NETFILTER]: nf_conntrack: use jhash2 in __hash_conntrack Now it uses jhash, but using jhash2 would be around 3-4 times faster (on P4). Signed-off-by: Sami Farin <safari-netfilter@safari.iki.fi> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:53 -07:00
Pablo Neira Ayuso	f4bc177f0f	[NETFILTER]: nfnetlink: move EXPORT_SYMBOL declarations next to the exported symbol Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:51 -07:00
Pablo Neira Ayuso	8a2e89533a	[NETFILTER]: nfnetlink: remove unused includes in nfnetlink.c Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:50 -07:00
Pablo Neira Ayuso	ac0f1d9894	[NETFILTER]: nfnetlink: remove unrequired check in nfnetlink_get_subsys subsys_table is initialized to NULL, therefore just returns NULL in case that it is not set. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:49 -07:00
Pablo Neira Ayuso	d9e6d02949	[NETFILTER]: nfnetlink: remove duplicate checks in nfnetlink_check_attributes Remove nfnetlink_check_attributes duplicates message size and callback id checks. nfnetlink_find_client and nfnetlink_rcv_msg already do such checks. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:48 -07:00
Pablo Neira Ayuso	67ca396606	[NETFILTER]: nfnetlink: remove early debugging messages from nfnetlink Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:47 -07:00
Patrick McHardy	010c7d6f86	[NETFILTER]: nf_conntrack: uninline notifier registration functions Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:46 -07:00
Patrick McHardy	73c361862c	[NETFILTER]: nfnetlink: use netlink_run_queue() Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:44 -07:00
Patrick McHardy	a3c5029cf7	[NETFILTER]: nfnetlink: use mutex instead of semaphore Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:43 -07:00
Patrick McHardy	c6a1e615d1	[NETFILTER]: nf_conntrack: simplify l4 protocol array allocation The retrying after an allocation failure is not necessary anymore since we're holding the mutex the entire time, for the same reason the double allocation race can't happen anymore. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:43 -07:00
Patrick McHardy	0661cca9c2	[NETFILTER]: nf_conntrack: simplify protocol locking Now that we don't use nf_conntrack_lock anymore but a single mutex for all protocol handling, no need to release and grab it again for sysctl registration. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:41 -07:00
Patrick McHardy	ac5357ebac	[NETFILTER]: nf_conntrack: remove ugly hack in l4proto registration Remove ugly special-casing of nf_conntrack_l4proto_generic, all it wants is its sysctl tables registered, so do that explicitly in an init function and move the remaining protocol initialization and cleanup code to nf_conntrack_proto.c as well. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:40 -07:00
Patrick McHardy	b19caa0ca0	[NETFILTER]: nf_conntrack: switch protocol registration/unregistration to mutex The protocol lookups done by nf_conntrack are already protected by RCU, there is no need to keep taking nf_conntrack_lock for registration and unregistration. Switch to a mutex. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:39 -07:00
Patrick McHardy	587aa64163	[NETFILTER]: Remove IPv4 only connection tracking/NAT Remove the obsolete IPv4 only connection tracking/NAT as scheduled in feature-removal-schedule. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:34 -07:00
Tobias Klauser	ce18afe57b	[NETFILTER]: x_tables: remove duplicate of xt_prefix Remove xt_proto_prefix array which duplicates xt_prefix and change all users of xt_proto_prefix to xt_prefix. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:33 -07:00
David S. Miller	239254fedc	[IPV4] xfrm4_mode_beet: Use skb_transport_header(). Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:32 -07:00
Arnaldo Carvalho de Melo	9c70220b73	[SK_BUFF]: Introduce skb_transport_header(skb) For the places where we need a pointer to the transport header, it is still legal to touch skb->h.raw directly if just adding to, subtracting from or setting it to another layer header. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:31 -07:00
Arnaldo Carvalho de Melo	a27ef749e7	[SCTP]: Eliminate some pointer attributions to the skb layer headers Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:30 -07:00
Arnaldo Carvalho de Melo	bd82393ca2	[SK_BUFF]: More skb_reset_transport_header conversions These are a bit more subtle, they are of this type: - skb->h.raw = payload; __skb_pull(skb, payload - skb->data); + skb_reset_transport_header(skb); __skb_pull results in: skb->data = skb->data + payload - skb->data; skb->data = payload; So after __skb_pull we have skb->data pointing to payload and we can just call skb_reset_transport_header(skb), that will do: skb->h.raw = payload; The others are similar, allowing us to get rid of some more cases where a pointer was being attributed to the layer headers. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:29 -07:00
Arnaldo Carvalho de Melo	39b89160df	[SK_BUFF]: Introduce ipipv6_hdr(), remove skb->h.ipv6h Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:28 -07:00
Arnaldo Carvalho de Melo	b0061ce49c	[SK_BUFF]: Introduce ipip_hdr(), remove skb->h.ipiph Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:27 -07:00
Arnaldo Carvalho de Melo	aa8223c7bb	[SK_BUFF]: Introduce tcp_hdr(), remove skb->h.th Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:26 -07:00
Arnaldo Carvalho de Melo	ab6a5bb6b2	[TCP]: Introduce tcp_hdrlen() and tcp_optlen() The ip_hdrlen() buddy, created to reduce the number of skb->h.th-> uses and to avoid the longer, open coded equivalent. Ditched a no-op in bnx2 in the process. I wonder if we should have a BUG_ON(skb->h.th->doff < 5) in tcp_optlen()... Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:24 -07:00
Arnaldo Carvalho de Melo	88c7664f13	[SK_BUFF]: Introduce icmp_hdr(), remove skb->h.icmph Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:23 -07:00
Arnaldo Carvalho de Melo	4bedb45203	[SK_BUFF]: Introduce udp_hdr(), remove skb->h.uh Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:22 -07:00
Arnaldo Carvalho de Melo	d9edf9e2be	[SK_BUFF]: Introduce igmp_hdr() & friends, remove skb->h.igmph Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:21 -07:00
Arnaldo Carvalho de Melo	cc70ab261c	[ICMP6]: Introduce icmp6_hdr() For consistency with all the other skb->h.raw accessors. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:20 -07:00
Arnaldo Carvalho de Melo	2c0fd387b0	[SCTP]: Introduce sctp_hdr() For consistency with all the other skb->h.raw accessors. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:19 -07:00
Arnaldo Carvalho de Melo	967b05f64e	[SK_BUFF]: Introduce skb_set_transport_header For the cases where the transport header is being set to a offset from skb->data. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:17 -07:00
Arnaldo Carvalho de Melo	ea2ae17d64	[SK_BUFF]: Introduce skb_transport_offset() For the quite common 'skb->h.raw - skb->data' sequence. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:16 -07:00
Arnaldo Carvalho de Melo	badff6d01a	[SK_BUFF]: Introduce skb_reset_transport_header(skb) For the common, open coded 'skb->h.raw = skb->data' operation, so that we can later turn skb->h.raw into a offset, reducing the size of struct sk_buff in 64bit land while possibly keeping it as a pointer on 32bit. This one touches just the most simple cases: skb->h.raw = skb->data; skb->h.raw = {skb_push\|[__]skb_pull}() The next ones will handle the slightly more "complex" cases. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:15 -07:00
Arnaldo Carvalho de Melo	0660e03f6b	[SK_BUFF]: Introduce ipv6_hdr(), remove skb->nh.ipv6h Now the skb->nh union has just one member, .raw, i.e. it is just like the skb->mac union, strange, no? I'm just leaving it like that till the transport layer is done with, when we'll rename skb->mac.raw to skb->mac_header (or ->mac_header_offset?), ditto for ->{h,nh}. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:14 -07:00
Arnaldo Carvalho de Melo	d0a92be05e	[SK_BUFF]: Introduce arp_hdr(), remove skb->nh.arph Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:12 -07:00
Stephen Hemminger	fd74e6ccd5	[BRIDGE]: faster compare for link local addresses Use logic operations rather than memcmp() to compare destination address with link local multicast addresses. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:11 -07:00
Arnaldo Carvalho de Melo	eddc9ec53b	[SK_BUFF]: Introduce ip_hdr(), remove skb->nh.iph Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:10 -07:00
Arnaldo Carvalho de Melo	e023dd6437	[IPMR]: Fix bug introduced when converting to skb_network_reset_header Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:08 -07:00
Arnaldo Carvalho de Melo	c9bdd4b525	[IP]: Introduce ip_hdrlen() For the common sequence "skb->nh.iph->ihl * 4", removing a good number of open coded skb->nh.iph uses, now to go after the rest... Just out of curiosity, here are the idioms found to get the same result: skb->nh.iph->ihl << 2 skb->nh.iph->ihl<<2 skb->nh.iph->ihl * 4 skb->nh.iph->ihl4 (skb->nh.iph)->ihl sizeof(u32) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:07 -07:00
Arnaldo Carvalho de Melo	0272ffc46f	[SK_BUFF] ipmr: Missed one conversion to skb_network_header() We can't access skb->nh.raw directly anymore, it will become an offset. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:05 -07:00
Stephen Hemminger	0e1256ffd1	[NET]: show bound packet types Show what protocols are bound to what packet types in /proc/net/ptype Uses kallsyms to decode function pointers if possible. Example: Type Device Function ALL eth1 packet_rcv_spkt+0x0 0800 ip_rcv+0x0 0806 arp_rcv+0x0 86dd :ipv6:ipv6_rcv+0x0 Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:04 -07:00
Stephen Hemminger	f690808e17	[NET]: make seq_operations const The seq_file operations stuff can be marked constant to get it out of dirty cache. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:03 -07:00
Stephen Hemminger	6b2bedc3a6	[NET]: network dev read_mostly For Eric, mark packet type and network device watermarks as read mostly. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:02 -07:00
Arnaldo Carvalho de Melo	c14d2450cb	[SK_BUFF]: Introduce skb_set_network_header For the cases where the network header is being set to a offset from skb->data. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:01 -07:00
Arnaldo Carvalho de Melo	878c814500	[SK_BUFF] ipmr: Another skb_push related conversion to skb_reset_network_header Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:00 -07:00
Arnaldo Carvalho de Melo	d56f90a7c9	[SK_BUFF]: Introduce skb_network_header() For the places where we need a pointer to the network header, it is still legal to touch skb->nh.raw directly if just adding to, subtracting from or setting it to another layer header. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:59 -07:00
Arnaldo Carvalho de Melo	bbe735e424	[SK_BUFF]: Introduce skb_network_offset() For the quite common 'skb->nh.raw - skb->data' sequence. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:58 -07:00
Arnaldo Carvalho de Melo	7f5c0cb05f	[SK_BUFF] xfrm4: use skb_reset_network_header Setting it to skb->h.raw, which is valid, in the (to become) old pointer based world order and in the new world of offset based layer headers. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:55 -07:00
Arnaldo Carvalho de Melo	1ced98e81d	[SK_BUFF] ipv6: More skb_reset_network_header conversions related to skb_pull Now related to this form: skb->nh.ipv6h = (struct ipv6hdr )skb_put(skb, length); That, as the others, is done when skb->tail is still equal to skb->data, making the conversion to skb_reset_network_header possible. Also one more case equivalent to skb->nh.raw = skb->data, of this form: iph = (struct ipv6hdr )skb->data; <SNIP> skb->nh.ipv6h = iph; Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:54 -07:00
Arnaldo Carvalho de Melo	8856dfa3e9	[SK_BUFF]: Use skb_reset_network_header after skb_push Some more cases where skb->nh.iph was being set that were converted to using skb_reset_network_header. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:53 -07:00
Arnaldo Carvalho de Melo	04b964dbad	[SK_BUFF] ipconfig: Another conversion to skb_reset_network_header related to skb_put boot_pkt->iph is the first member, that is at skb->data, so just use skb_reset_network_header(). Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:52 -07:00
Arnaldo Carvalho de Melo	2ca9e6f2c2	[SK_BUFF]: Some more skb_put cases converted to skb_reset_network_header Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:51 -07:00
Arnaldo Carvalho de Melo	31c7711b50	[SK_BUFF]: Some more simple skb_reset_network_header conversions This time of the type: skb->nh.iph = (struct iphdr *)skb->data; That is completely equivalent to: skb->nh.raw = skb->data; Wonder why people love casts... :-) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:50 -07:00
Arnaldo Carvalho de Melo	4209fb601c	[SK_BUFF]: Use skb_reset_network_header where the return of __pskb_pull was being used It returns skb->data, so we can just use skb_reset_network_header after it. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:49 -07:00
Arnaldo Carvalho de Melo	7e28ecc282	[SK_BUFF]: Use skb_reset_network_header where the skb_pull return was being used But only in the cases where its a newly allocated skb, i.e. one where skb->tail is equal to skb->data, or just after skb_reserve, where this requirement is maintained. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:48 -07:00
Arnaldo Carvalho de Melo	e2d1bca7e6	[SK_BUFF]: Use skb_reset_network_header in skb_push cases skb_push updates and returns skb->data, so we can just call skb_reset_network_header after the call to skb_push. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:47 -07:00
Arnaldo Carvalho de Melo	c1d2bbe1cd	[SK_BUFF]: Introduce skb_reset_network_header(skb) For the common, open coded 'skb->nh.raw = skb->data' operation, so that we can later turn skb->nh.raw into a offset, reducing the size of struct sk_buff in 64bit land while possibly keeping it as a pointer on 32bit. This one touches just the most simple case, next will handle the slightly more "complex" cases. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:46 -07:00
Arnaldo Carvalho de Melo	57effc70a5	[IPV6]: Use skb->nh.ipv6h instead of casting skb->nh.raw nh.ipv6h is there exactly for this reason! Use it while it exists ;-) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:45 -07:00
Arnaldo Carvalho de Melo	98e399f82a	[SK_BUFF]: Introduce skb_mac_header() For the places where we need a pointer to the mac header, it is still legal to touch skb->mac.raw directly if just adding to, subtracting from or setting it to another layer header. This one also converts some more cases to skb_reset_mac_header() that my regex missed as it had no spaces before nor after '=', ugh. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:41 -07:00
Arnaldo Carvalho de Melo	31713c333d	[TCP]: Use skb_set_mac_header in tcp_collapse Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:39 -07:00
Arnaldo Carvalho de Melo	c51957dafa	[TCP]: Do the layer header setting in tcp_collapse relative to skb->data That is equal to skb->head before skb_reserve, to help in the layer header changes. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:38 -07:00
Arnaldo Carvalho de Melo	39f69c6f92	[SK_BUFF] xfrm: Use skb_set_mac_header in the memmove cases Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:37 -07:00
Arnaldo Carvalho de Melo	48d49d0ccd	[SK_BUFF]: Introduce skb_set_mac_header() For the cases where we want to set skb->mac.raw to an offset from skb->data. Simple cases first, the memmove ones and specially pktgen will be left for later. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:37 -07:00
Arnaldo Carvalho de Melo	f64955eb11	[LLC]: Use skb_reset_mac_header in llc_mac_hdr_init skb_push updates and returns skb->data, so we can just call skb_reset_mac_header after the call to skb_push. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:35 -07:00
Arnaldo Carvalho de Melo	0a1b0ad9ae	[LLC]: Use skb_reset_mac_header in llc_alloc_frame skb->head is equal to skb->data after alloc_skb, so reset the mac header while this is true, i.e. before skb_reserve. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:34 -07:00
Arnaldo Carvalho de Melo	459a98ed88	[SK_BUFF]: Introduce skb_reset_mac_header(skb) For the common, open coded 'skb->mac.raw = skb->data' operation, so that we can later turn skb->mac.raw into a offset, reducing the size of struct sk_buff in 64bit land while possibly keeping it as a pointer on 32bit. This one touches just the most simple case, next will handle the slightly more "complex" cases. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:32 -07:00
Arnaldo Carvalho de Melo	4c13eb6657	[ETH]: Make eth_type_trans set skb->dev like the other *_type_trans One less thing for drivers writers to worry about. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:30 -07:00
Arnaldo Carvalho de Melo	0a4f23fbbf	[HIPPI/FDDI]: Make {hippi,fddi}_type_trans set skb->dev Now all the _type_trans routines are consistent in this regard. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:26 -07:00
Arnaldo Carvalho de Melo	c8fb7948dc	[TR]: Make tr_type_trans set skb->dev Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:24 -07:00
Arnaldo Carvalho de Melo	c1a4b86e39	[TR]: Use tr_hdr() were appropriate Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:23 -07:00
Arnaldo Carvalho de Melo	7c81fd8bfb	[SOCKET]: Export __sock_recv_timestamp Kernel: arch/x86_64/boot/bzImage is ready (#2) MODPOST 1816 modules WARNING: "__sock_recv_timestamp" [net/sctp/sctp.ko] undefined! WARNING: "__sock_recv_timestamp" [net/packet/af_packet.ko] undefined! WARNING: "__sock_recv_timestamp" [net/key/af_key.ko] undefined! WARNING: "__sock_recv_timestamp" [net/ipv6/ipv6.ko] undefined! WARNING: "__sock_recv_timestamp" [net/atm/atm.ko] undefined! make[2]: * [__modpost] Error 1 make[1]: * [modules] Error 2 make: *** [_all] Error 2 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:22 -07:00
Eric Dumazet	92f37fd2ee	[NET]: Adding SO_TIMESTAMPNS / SCM_TIMESTAMPNS support Now that network timestamps use ktime_t infrastructure, we can add a new SOL_SOCKET sockopt SO_TIMESTAMPNS. This command is similar to SO_TIMESTAMP, but permits transmission of a 'timespec struct' instead of a 'timeval struct' control message. (nanosecond resolution instead of microsecond) Control message is labelled SCM_TIMESTAMPNS instead of SCM_TIMESTAMP A socket cannot mix SO_TIMESTAMP and SO_TIMESTAMPNS : the two modes are mutually exclusive. sock_recv_timestamp() became too big to be fully inlined so I added a __sock_recv_timestamp() helper function. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> CC: linux-arch@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:21 -07:00
Arnaldo Carvalho de Melo	c7a3c5da35	[UDP]: Use __skb_pull since we have checked it won't fail with pskb_may_pull Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:20 -07:00
Eric Dumazet	6dea649a8a	[NET]: New sysctls should use __read_mostly tags net_msg_warn should be placed in the read_mostly section, to avoid performance problems on SMP Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:19 -07:00
YOSHIFUJI Hideaki	e5268f12f2	[IPV6]: Ensure to truncate result and return full length for sticky options. Bug noticed by Chris Wright <chrisw@sous-sol.org>. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:17 -07:00
YOSHIFUJI Hideaki	4c6510a738	[IPV6]: Return correct result for sticky options. We returned incorrect result with IPV6_RTHDRDSTOPTS, IPV6_RTHDR and IPV6_DSTOPTS. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:16 -07:00
Stephen Hemminger	3fbe070a42	[UDP]: deinline A couple of functions are exported or used indirectly so it is pointless to mark them as inline. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:15 -07:00
Stephen Hemminger	6f05f62971	[NET]: deinline some functions Several functions are marked inline or forced inline, but it would be better to let the compiler decide. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:14 -07:00
Stephen Hemminger	2de979bd7d	[TCP]: whitespace cleanup Add whitespace around keywords. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:13 -07:00
Stephen Hemminger	132adf5463	[IPV4]: cleanup Add whitespace around keywords. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:11 -07:00
Stephen Hemminger	1ac58ee37f	[WIRELESS]: use ARRAY_SIZE() Use ARRAY_SIZE() macro now. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:10 -07:00
Stephen Hemminger	e71a4783aa	[NET] core: whitespace cleanup Fix whitespace around keywords. Fix indentation especially of switch statements. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:09 -07:00
Stephen Hemminger	add459aa1a	[UDP]: ipv6 style cleanup Fix whitespace around keywords. Eliminate unnecessary ()'s on return statements. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:08 -07:00
Stephen Hemminger	6516c65573	[UDP]: ipv4 whitespace cleanup Fix whitespace around keywords. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:07 -07:00
Stephen Hemminger	a2a316fd06	[NET]: Replace CONFIG_NET_DEBUG with sysctl. Covert network warning messages from a compile time to runtime choice. Removes kernel config option and replaces it with new /proc/sys/net/core/warnings. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:05 -07:00
Eric Dumazet	ae40eb1ef3	[NET]: Introduce SIOCGSTAMPNS ioctl to get timestamps with nanosec resolution Now network timestamps use ktime_t infrastructure, we can add a new ioctl() SIOCGSTAMPNS command to get timestamps in 'struct timespec'. User programs can thus access to nanosecond resolution. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> CC: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:04 -07:00
Adrian Bunk	cb69cc5236	[TCP/DCCP/RANDOM]: Remove unused exports. This patch removes the following not or no longer used exports: - drivers/char/random.c: secure_tcp_sequence_number - net/dccp/options.c: sysctl_dccp_feat_sequence_window - net/netlink/af_netlink.c: netlink_set_err Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:03 -07:00
David S. Miller	fe067e8ab5	[TCP]: Abstract out all write queue operations. This allows the write queue implementation to be changed, for example, to one which allows fast interval searching. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:02 -07:00
YOSHIFUJI Hideaki	02ea4923b4	[NET] TIPC: Use htons() where appropriate. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:01 -07:00
YOSHIFUJI Hideaki	b6d9bcb069	[NET] SCHED: Use htons() where appropriate. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:00 -07:00
YOSHIFUJI Hideaki	8f05ce91c8	[NET] NETFILTER: Use htonl() where appropriate. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:59 -07:00
YOSHIFUJI Hideaki	4412ec4948	[NET] IPV4: Use hton{s,l}() where appropriate. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:58 -07:00
YOSHIFUJI Hideaki	1c9e8ef7f7	[NET] IEEE80211: Use htons() where appropriate. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:57 -07:00
YOSHIFUJI Hideaki	f576e24ffa	[NET] ETHERNET: Use htons() where appropriate. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:56 -07:00
YOSHIFUJI Hideaki	724800d61b	[NET] CORE: Use htons() where appropriate. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:55 -07:00
YOSHIFUJI Hideaki	aca3192cc6	[NET] BLUETOOTH: Use cpu_to_le{16,32}() where appropriate. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:54 -07:00
YOSHIFUJI Hideaki	acde4855bb	[NET] ATM: Use htons() where appropriate. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:53 -07:00
YOSHIFUJI Hideaki	b93b7eebd3	[NET] 8021Q: Use htons() where appropriate. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:53 -07:00
YOSHIFUJI Hideaki	2953fd2468	[NET] 802: Use hton{s,l}() where appropriate. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:52 -07:00
Herbert Xu	759e5d0064	[UDP]: Clean up UDP-Lite receive checksum This patch eliminates some duplicate code for the verification of receive checksums between UDP-Lite and UDP. It does this by introducing __skb_checksum_complete_head which is identical to __skb_checksum_complete_head apart from the fact that it takes a length parameter rather than computing the first skb->len bytes. As a result UDP-Lite will be able to use hardware checksum offload for packets which do not use partial coverage checksums. It also means that UDP-Lite loopback no longer does unnecessary checksum verification. If any NICs start support UDP-Lite this would also start working automatically. This patch removes the assumption that msg_flags has MSG_TRUNC clear upon entry in recvmsg. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:51 -07:00
Herbert Xu	1ab6eb62b0	[UDP6]: Restore sk_filter optimisation This reverts the changeset [IPV6]: UDPv6 checksum. We always need to check UDPv6 checksum because it is mandatory. The sk_filter optimisation has nothing to do whether we verify the checksum. It simply postpones it to the point when the user calls recv or poll. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:50 -07:00
Eric Dumazet	243bbcaa09	[IPV4]: Optimize inet_getpeer() 1) Some sysctl vars are declared __read_mostly 2) We can avoid updating stack[] when doing an AVL lookup only. lookup() macro is extended to receive a second parameter, that may be NULL in case of a pure lookup (no need to save the AVL path). This removes unnecessary instructions, because compiler knows if this _stack parameter is NULL or not. text size of net/ipv4/inetpeer.o is 2063 bytes instead of 2107 on x86_64 Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:49 -07:00
Stephen Hemminger	43e683926f	[TCP] TCP Yeah: cleanup Eliminate need for full 6/4/64 divide to compute queue. Variable maxqueue was really a constant. Fix indentation. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:48 -07:00
Stephen Hemminger	c5f5877c04	[TCP] tcp_cubic: faster cube root The Newton-Raphson method is quadratically convergent so only a small fixed number of steps are necessary. Therefore it is faster to unroll the loop. Since div64_64 is no longer inline it won't cause code explosion. Also fixes a bug that can occur if x^2 was bigger than 32 bits. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:47 -07:00
YOSHIFUJI Hideaki	ca04356939	[IPV6] ADDRCONF: Fix possible inet6_ifaddr leakage with CONFIG_OPTIMISTIC_DAD. The inet6_ifaddr for source address of RS is leaked if the address is not an optimistic address. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:44 -07:00
Neil Horman	95c385b4d5	[IPV6] ADDRCONF: Optimistic Duplicate Address Detection (RFC 4429) Support. Nominally an autoconfigured IPv6 address is added to an interface in the Tentative state (as per RFC 2462). Addresses in this state remain in this state while the Duplicate Address Detection process operates on them to determine their uniqueness on the network. During this period, these tentative addresses may not be used for communication, increasing the time before a node may be able to communicate on a network. Using Optimistic Duplicate Address Detection, autoconfigured addresses may be used immediately for communication on the network, as long as certain rules are followed to avoid conflicts with other nodes during the Duplicate Address Detection process. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:43 -07:00
Yasuyuki Kozakai	502b093569	[IPV6] IP6TUNNEL: Enable to control the handled inner protocol. ip6_tunnel before supporting IPv4/IPv6 tunnel allows only IPPROTO_IPV6 in configurations from userland. This allows userland to set IPPROTO_IPIP and 0(wildcard). ip6_tunnel only handles allowed inner protocols. Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:42 -07:00
Yasuyuki Kozakai	3144581cb0	[IPV6] IP6TUNNEL: Rename functions ip6ip6_* to ip6_tnl_*. Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:41 -07:00
Yasuyuki Kozakai	c4d3efafcc	[IPV6] IP6TUNNEL: Add support to IPv4 over IPv6 tunnel. Some notes - Protocol number IPPROTO_IPIP is used for IPv4 over IPv6 packets. - If IP6_TNL_F_USE_ORIG_TCLASS is set, TOS in IPv4 header is copied to Traffic Class in outer IPv6 header on xmit. - IP6_TNL_F_USE_ORIG_FLOWLABEL is ignored on xmit of IPv4 packets, because IPv4 header does not have flow label. - Kernel sends ICMP error if IPv4 packet is too big on xmit, even if DF flag is not set. Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:40 -07:00
Yasuyuki Kozakai	61ec2aec28	[IPV6] IP6TUNNEL: Split out generic routine in ip6ip6_xmit(). This enables to add IPv4/IPv6 specific handling later, Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:39 -07:00
Yasuyuki Kozakai	8359925be8	[IPV6] IP6TUNNEL: Split out generic routine in ip6ip6_rcv(). This enables to add IPv4/IPv6 specific handling later, Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:38 -07:00
Yasuyuki Kozakai	e490d1d85c	[IPV6] IP6TUNNEL: Split out generic routine in ip6ip6_err(). This enables to add IPv4/IPv6 specific error handling later, Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:37 -07:00
YOSHIFUJI Hideaki	7159039a12	[IPV6]: Decentralize EXPORT_SYMBOLs. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2007-04-25 22:23:36 -07:00
David S. Miller	b558ff7999	[NETLINK]: Mirror UDP MSG_TRUNC semantics. If the user passes MSG_TRUNC in via msg_flags, return the full packet size not the truncated size. Idea from Herbert Xu and Thomas Graf. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:35 -07:00
Eric Dumazet	b7aa0bf70c	[NET]: convert network timestamps to ktime_t We currently use a special structure (struct skb_timeval) and plain 'struct timeval' to store packet timestamps in sk_buffs and struct sock. This has some drawbacks : - Fixed resolution of micro second. - Waste of space on 64bit platforms where sizeof(struct timeval)=16 I suggest using ktime_t that is a nice abstraction of high resolution time services, currently capable of nanosecond resolution. As sizeof(ktime_t) is 8 bytes, using ktime_t in 'struct sock' permits a 8 byte shrink of this structure on 64bit architectures. Some other structures also benefit from this size reduction (struct ipq in ipv4/ip_fragment.c, struct frag_queue in ipv6/reassembly.c, ...) Once this ktime infrastructure adopted, we can more easily provide nanosecond resolution on top of it. (ioctl SIOCGSTAMPNS and/or SO_TIMESTAMPNS/SCM_TIMESTAMPNS) Note : this patch includes a bug correction in compat_sock_get_timestamp() where a "err = 0;" was missing (so this syscall returned -ENOENT instead of 0) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> CC: Stephen Hemminger <shemminger@linux-foundation.org> CC: John find <linux.kernel@free.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:34 -07:00
Stephen Hemminger	3927f2e8f9	[NET]: div64_64 consolidate (rev3) Here is the current version of the 64 bit divide common code. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:33 -07:00
James Morris	9d729f72dc	[NET]: Convert xtime.tv_sec to get_seconds() Where appropriate, convert references to xtime.tv_sec to the get_seconds() helper function. Signed-off-by: James Morris <jmorris@namei.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:32 -07:00
Stephen Hemminger	39df232f1a	[PKTGEN]: fix device name handling Since devices can change name and other wierdness, don't hold onto a copy of device name, instead use pointer to output device. Fix a couple of leaks in error handling path as well. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:31 -07:00
Stephen Hemminger	d5f1ce9a5e	[PKTGEN]: don't use __constant_htonl() The existing htonl() macro is smart enough to do the same code as using __constant_htonl() and it looks cleaner. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:30 -07:00
Stephen Hemminger	5fa6fc76f5	[PKTGEN]: use random32 Can use random32() now. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:29 -07:00
Stephen Hemminger	25c4e53a4c	[PKTGEN]: use pr_debug Remove private debug macro and replace with standard version Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:28 -07:00
Eric Dumazet	fa438ccfdf	[NET]: Keep sk_backlog near sk_lock sk_backlog is a critical field of struct sock. (known famous words) It is (ab)used in hot paths, in particular in release_sock(), tcp_recvmsg(), tcp_v4_rcv(), sk_receive_skb(). It really makes sense to place it next to sk_lock, because sk_backlog is only used after sk_lock locked (and thus memory cache line in L1 cache). This should reduce cache misses and sk_lock acquisition time. (In theory, we could only move the head pointer near sk_lock, and leaving tail far away, because 'tail' is normally not so hot, but keep it simple :) ) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:27 -07:00
Ilpo Järvinen	e317f6f69c	[TCP]: FRTO undo response falls back to ratehalving one if ECEd Undoing ssthresh is disabled in fastretrans_alert whenever FLAG_ECE is set by clearing prior_ssthresh. The clearing does not protect FRTO because FRTO operates before fastretrans_alert. Moving the clearing of prior_ssthresh earlier seems to be a suboptimal solution to the FRTO case because then FLAG_ECE will cause a second ssthresh reduction in try_to_open (the first occurred when FRTO was entered). So instead, FRTO falls back immediately to the rate halving response, which switches TCP to CA_CWR state preventing the latter reduction of ssthresh. If the first ECE arrived before the ACK after which FRTO is able to decide RTO as spurious, prior_ssthresh is already cleared. Thus no undoing for ssthresh occurs. Besides, FLAG_ECE should be set also in the following ACKs resulting in rate halving response that sees TCP is already in CA_CWR, which again prevents an extra ssthresh reduction on that round-trip. If the first ECE arrived before RTO, ssthresh has already been adapted and prior_ssthresh remains cleared on entry because TCP is in CA_CWR (the same applies also to a case where FRTO is entered more than once and ECE comes in the middle). High_seq must not be touched after tcp_enter_cwr because CWR round-trip calculation depends on it. I believe that after this patch, FRTO should be ECN-safe and even able to take advantage of synergy benefits. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:26 -07:00
Ilpo Järvinen	e01f9d7793	[TCP]: Complete icsk-to-local-variable change (in tcp_enter_cwr) A local variable for icsk was created but this change was missing. Spotted by Jarek Poplawski. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:25 -07:00
Ilpo Järvinen	3cfe3baaf0	[TCP]: Add two new spurious RTO responses to FRTO New sysctl tcp_frto_response is added to select amongst these responses: - Rate halving based; reuses CA_CWR state (default) - Very conservative; used to be the only one available (=1) - Undo cwr; undoes ssthresh and cwnd reductions (=2) The response with rate halving requires a new parameter to tcp_enter_cwr because FRTO has already reduced ssthresh and doing a second reduction there has to be prevented. In addition, to keep things nice on 80 cols screen, a local variable was added. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:23 -07:00
Ilpo Järvinen	c5e7af0df5	[TCP]: Correct reordering detection change (no FRTO case) The reordering detection must work also when FRTO has not been used at all which was the original intention of mine, just the expression of the idea was flawed. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:22 -07:00
Eric Dumazet	54287cc178	[TCP]: Keep copied_seq, rcv_wup and rcv_next together. I noticed in oprofile study a cache miss in tcp_rcv_established() to read copied_seq. ffffffff80400a80 <tcp_rcv_established>: /* tcp_rcv_established total: 4034293 2.0400 */ 55493 0.0281 :ffffffff80400bc9: mov 0x4c8(%r12),%eax copied_seq 543103 0.2746 :ffffffff80400bd1: cmp 0x3e0(%r12),%eax rcv_nxt if (tp->copied_seq == tp->rcv_nxt && len - tcp_header_len <= tp->ucopy.len) { In this function, the cache line 0x4c0 -> 0x500 is used only for this reading 'copied_seq' field. rcv_wup and copied_seq should be next to rcv_nxt field, to lower number of active cache lines in hot paths. (tcp_rcv_established(), tcp_poll(), ...) As you suggested, I changed tcp_create_openreq_child() so that these fields are changed together, to avoid adding a new store buffer stall. Patch is 64bit friendly (no new hole because of alignment constraints) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:21 -07:00
Ilpo Järvinen	cf4c6bf83d	[TCP]: struct *sock argument renamed: sp -> sk In general, TCP code uses "sk" for struct sock pointer. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:20 -07:00
John Heffner	886236c124	[TCP]: Add RFC3742 Limited Slow-Start, controlled by variable sysctl_tcp_max_ssthresh. Signed-off-by: John Heffner <jheffner@psc.edu> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:19 -07:00
Angelo P. Castellani	5ef814753e	[TCP] YeAH-TCP: algorithm implementation YeAH-TCP is a sender-side high-speed enabled TCP congestion control algorithm, which uses a mixed loss/delay approach to compute the congestion window. It's design goals target high efficiency, internal, RTT and Reno fairness, resilience to link loss while keeping network elements load as low as possible. For further details look here: http://wil.cs.caltech.edu/pfldnet2007/paper/YeAH_TCP.pdf Signed-off-by: Angelo P. Castellani <angelo.castellani@gmail.con> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:18 -07:00
Ilpo Järvinen	4dc2665e36	[TCP]: SACK enhanced FRTO Implements the SACK-enhanced FRTO given in RFC4138 using the variant given in Appendix B. RFC4138, Appendix B: "This means that in order to declare timeout spurious, the TCP sender must receive an acknowledgment for non-retransmitted segment between SND.UNA and RecoveryPoint in algorithm step 3. RecoveryPoint is defined in conservative SACK-recovery algorithm [RFC3517]" The basic version of the FRTO algorithm can still be used also when SACK is enabled. To enabled SACK-enhanced version, tcp_frto sysctl is set to 2. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:16 -07:00
Ilpo Järvinen	288035f915	[TCP]: Prevent reordering adjustments during FRTO To be honest, I'm not too sure how the reord stuff works in the first place but this seems necessary. When FRTO has been active, the one and only retransmission could be unnecessary but the state and sending order might not be what the sacktag code expects it to be (to work correctly). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:15 -07:00
Ilpo Järvinen	66e93e45c0	[TCP] FRTO: Fake cwnd for ssthresh callback TCP without FRTO would be in Loss state with small cwnd. FRTO, however, leaves cwnd (typically) to a larger value which causes ssthresh to become too large in case RTO is triggered again compared to what conventional recovery would do. Because consecutive RTOs result in only a single ssthresh reduction, RTO+cumulative ACK+RTO pattern is required to trigger this event. A large comment is included for congestion control module writers trying to figure out what CA_EVENT_FRTO handler should do because there exists a remote possibility of incompatibility between FRTO and module defined ssthresh functions. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:14 -07:00
Ilpo Järvinen	d1a54c6a0a	[TCP] FRTO: Reverse RETRANS bit clearing logic Previously RETRANS bits were cleared on the entry to FRTO. We postpone that into tcp_enter_frto_loss, which is really the place were the clearing should be done anyway. This allows simplification of the logic from a clearing loop to the head skb clearing only. Besides, the other changes made in the previous patches to tcp_use_frto made it impossible for the non-SACKed FRTO to be entered if other than the head has been rexmitted. With SACK-enhanced FRTO (and Appendix B), however, there can be a number retransmissions in flight when RTO expires (same thing could happen before this patchset also with non-SACK FRTO). To not introduce any jumpiness into the packet counting during FRTO, instead of clearing RETRANS bits from skbs during entry, do it later on. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:13 -07:00
Ilpo Järvinen	46d0de4ed9	[TCP] FRTO: Entry is allowed only during (New)Reno like recovery This interpretation comes from RFC4138: "If the sender implements some loss recovery algorithm other than Reno or NewReno [FHG04], the F-RTO algorithm SHOULD NOT be entered when earlier fast recovery is underway." I think the RFC means to say (especially in the light of Appendix B) that ...recovery is underway (not just fast recovery) or was underway when it was interrupted by an earlier (F-)RTO that hasn't yet been resolved (snd_una has not advanced enough). Thus, my interpretation is that whenever TCP has ever retransmitted other than head, basic version cannot be used because then the order assumptions which are used as FRTO basis do not hold. NewReno has only the head segment retransmitted at a time. Therefore, walk up to the segment that has not been SACKed, if that segment is not retransmitted nor anything before it, we know for sure, that nothing after the non-SACKed segment should be either. This assumption is valid because TCPCB_EVER_RETRANS does not leave holes but each non-SACKed segment is rexmitted in-order. Check for retrans_out > 1 avoids more expensive walk through the skb list, as we can know the result beforehand: F-RTO will not be allowed. SACKed skb can turn into non-SACked only in the extremely rare case of SACK reneging, in this case we might fail to detect retransmissions if there were them for any other than head. To get rid of that feature, whole rexmit queue would have to be walked (always) or FRTO should be prevented when SACK reneging happens. Of course RTO should still trigger after reneging which makes this issue even less likely to show up. And as long as the response is as conservative as it's now, nothing bad happens even then. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:12 -07:00
Ilpo Järvinen	7c9a4a5b67	[TCP]: Prevent unrelated cwnd adjustment while using FRTO FRTO controls cwnd when it still processes the ACK input or it has just reverted back to conventional RTO recovery; the normal rules apply when FRTO has reverted to standard congestion control. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:11 -07:00
Ilpo Järvinen	94d0ea7786	[TCP] FRTO: frto_counter modulo-op converted to two assignments Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:10 -07:00
Ilpo Järvinen	52c63f1e86	[TCP]: Don't enter to fast recovery while using FRTO Because TCP is not in Loss state during FRTO recovery, fast recovery could be triggered by accident. Non-SACK FRTO is more robust than not yet included SACK-enhanced version (that can receiver high number of duplicate ACKs with SACK blocks during FRTO), at least with unidirectional transfers, but under extraordinary patterns fast recovery can be incorrectly triggered, e.g., Data loss+ACK losses => cumulative ACK with enough SACK blocks to meet sacked_out >= dupthresh condition). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:09 -07:00
Ilpo Järvinen	aa8b6a7ad1	[TCP] FRTO: Response should reset also snd_cwnd_cnt Since purpose is to reduce CWND, we prevent immediate growth. This is not a major issue nor is "the correct way" specified anywhere. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:08 -07:00
Ilpo Järvinen	95c4922bf9	[TCP] FRTO: fixes fallback to conventional recovery The FRTO detection did not care how ACK pattern affects to cwnd calculation of the conventional recovery. This caused incorrect setting of cwnd when the fallback becames necessary. The knowledge tcp_process_frto() has about the incoming ACK is now passed on to tcp_enter_frto_loss() in allowed_segments parameter that gives the number of segments that must be added to packets-in-flight while calculating the new cwnd. Instead of snd_una we use FLAG_DATA_ACKED in duplicate ACK detection because RFC4138 states (in Section 2.2): If the first acknowledgment after the RTO retransmission does not acknowledge all of the data that was retransmitted in step 1, the TCP sender reverts to the conventional RTO recovery. Otherwise, a malicious receiver acknowledging partial segments could cause the sender to declare the timeout spurious in a case where data was lost. If the next ACK after RTO is duplicate, we do not retransmit anything, which is equal to what conservative conventional recovery does in such case. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:07 -07:00
Ilpo Järvinen	6408d206c7	[TCP] FRTO: Ignore some uninteresting ACKs Handles RFC4138 shortcoming (in step 2); it should also have case c) which ignores ACKs that are not duplicates nor advance window (opposite dir data, winupdate). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:06 -07:00
Ilpo Järvinen	7b0eb22b1d	[TCP] FRTO: Use Disorder state during operation instead of Open Retransmission counter assumptions are to be changed. Forcing reason to do this exist: Using sysctl in check would be racy as soon as FRTO starts to ignore some ACKs (doing that in the following patches). Userspace may disable it at any moment giving nice oops if timing is right. frto_counter would be inaccessible from userspace, but with SACK enhanced FRTO retrans_out can include other than head, and possibly leaving it non-zero after spurious RTO, boom again. Luckily, solution seems rather simple: never go directly to Open state but use Disorder instead. This does not really change much, since TCP could anyway change its state to Disorder during FRTO using path tcp_fastretrans_alert -> tcp_try_to_open (e.g., when a SACK block makes ACK dubious). Besides, Disorder seems to be the state where TCP should be if not recovering (in Recovery or Loss state) while having some retransmissions in-flight (see tcp_try_to_open), which is exactly what happens with FRTO. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:05 -07:00
Ilpo Järvinen	7487c48c4f	[TCP] FRTO: Consecutive RTOs keep prior_ssthresh and ssthresh In case a latency spike causes more than one RTO, the later should not cause the already reduced ssthresh to propagate into the prior_ssthresh since FRTO declares all such RTOs spurious at once or none of them. In treating of ssthresh, we mimic what tcp_enter_loss() does. The previous state (in frto_counter) must be available until we have checked it in tcp_enter_frto(), and also ACK information flag in process_frto(). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:04 -07:00
Ilpo Järvinen	30935cf4f9	[TCP] FRTO: Comment cleanup & improvement Moved comments out from the body of process_frto() to the head (preferred way; see Documentation/CodingStyle). Bonus: it's much easier to read in this compacted form. FRTO algorithm and implementation is described in greater detail. For interested reader, more information is available in RFC4138. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:03 -07:00
Ilpo Järvinen	bdaae17da8	[TCP] FRTO: Moved tcp_use_frto from tcp.h to tcp_input.c In addition, removed inline. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:02 -07:00
Ilpo Järvinen	9ead9a1d38	[TCP] FRTO: Separated response from FRTO detection algorithm FRTO spurious RTO detection algorithm (RFC4138) does not include response to a detected spurious RTO but can use different response algorithms. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:01 -07:00
Ilpo Järvinen	522e7548a9	[TCP] FRTO: Incorrectly clears TCPCB_EVER_RETRANS bit FRTO was slightly too brave... Should only clear TCPCB_SACKED_RETRANS bit. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:00 -07:00
Alexey Kuznetsov	1194ed0a3e	[NETLINK]: Infinite recursion in netlink. Reply to NETLINK_FIB_LOOKUP messages were misrouted back to kernel, which resulted in infinite recursion and stack overflow. The bug is present in all kernel versions since the feature appeared. The patch also makes some minimal cleanup: 1. Return something consistent (-ENOENT) when fib table is missing 2. Do not crash when queue is empty (does not happen, but yet) 3. Put result of lookup Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 13:07:28 -07:00
YOSHIFUJI Hideaki	a23cf14b16	IPv6: fix Routing Header Type 0 handling thinko Oops, thinko. The test for accempting a RH0 was exatly the wrong way around. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-24 19:26:06 -07:00
YOSHIFUJI Hideaki	0bcbc92629	[IPV6]: Disallow RH0 by default. A security issue is emerging. Disallow Routing Header Type 0 by default as we have been doing for IPv4. Note: We allow RH2 by default because it is harmless. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-24 14:58:30 -07:00
Patrick McHardy	05d224468a	[XFRM]: beet: fix pseudo header length value draft-nikander-esp-beet-mode-07.txt is not entirely clear on how the length value of the pseudo header should be calculated, it states "The Header Length field contains the length of the pseudo header, IPv4 options, and padding in 8 octets units.", but also states "Length in octets (Header Len + 1) * 8". draft-nikander-esp-beet-mode-08-pre1.txt [1] clarifies this, the header length should not include the first 8 byte. This change affects backwards compatibility, but option encapsulation didn't work until very recently anyway. [1] http://users.piuha.net/jmelen/BEET/draft-nikander-esp-beet-mode-08-pre1.txt Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-23 22:39:02 -07:00
Stephen Hemminger	4d4d3d1e88	[TCP]: Congestion control initialization. Change to defer congestion control initialization. If setsockopt() was used to change TCP_CONGESTION before connection is established, then protocols that use sequence numbers to keep track of one RTT interval (vegas, illinois, ...) get confused. Change the init hook to be called after handshake. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-23 22:32:11 -07:00
Trond Myklebust	241c39b9ac	RPC: Fix the TCP resend semantics for NFSv4 Fix a regression due to the patch "NFS: disconnect before retrying NFSv4 requests over TCP" The assumption made in xprt_transmit() that the condition "req->rq_bytes_sent == 0 and request is on the receive list" should imply that we're dealing with a retransmission is false. Firstly, it may simply happen that the socket send queue was full at the time the request was initially sent through xprt_transmit(). Secondly, doing this for each request that was retransmitted implies that we disconnect and reconnect for _every_ request that happened to be retransmitted irrespective of whether or not a disconnection has already occurred. Fix is to move this logic into the call_status request timeout handler. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-20 22:56:30 -07:00
Denis Lunev	ac57b3a9ce	[NETLINK]: Don't attach callback to a going-away netlink socket There is a race between netlink_dump_start() and netlink_release() that can lead to the situation when a netlink socket with non-zero callback is freed. Here it is: CPU1: CPU2 netlink_release(): netlink_dump_start(): sk = netlink_lookup(); /* OK / netlink_remove(); spin_lock(&nlk->cb_lock); if (nlk->cb) { / false / ... } spin_unlock(&nlk->cb_lock); spin_lock(&nlk->cb_lock); if (nlk->cb) { / false / ... } nlk->cb = cb; spin_unlock(&nlk->cb_lock); ... sock_orphan(sk); / * proceed with releasing * the socket */ The proposal it to make sock_orphan before detaching the callback in netlink_release() and to check for the sock to be SOCK_DEAD in netlink_dump_start() before setting a new callback. Signed-off-by: Denis Lunev <den@openvz.org> Signed-off-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: Pavel Emelianov <xemul@openvz.org> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-18 17:05:58 -07:00
Olaf Kirch	bfb6709d0b	[IrDA]: Correctly handling socket error This patch fixes an oops first reported in mid 2006 - see http://lkml.org/lkml/2006/8/29/358 The cause of this bug report is that when an error is signalled on the socket, irda_recvmsg_stream returns without removing a local wait_queue variable from the socket's sk_sleep queue. This causes havoc further down the road. In response to this problem, a patch was made that invoked sock_orphan on the socket when receiving a disconnect indication. This is not a good fix, as this sets sk_sleep to NULL, causing applications sleeping in recvmsg (and other places) to oops. This is against the latest net-2.6 and should be considered for -stable inclusion. Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com> Signed-off-by: Samuel Ortiz <samuel@sortiz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-18 15:07:22 -07:00
Vlad Yasevich	d0cf0d9940	[SCTP]: Do not interleave non-fragments when in partial delivery The way partial delivery is currently implemnted, it is possible to intereleave a message (either from another steram, or unordered) that is not part of partial delivery process. The only way to this is for a message to not be a fragment and be 'in order' or unorderd for a given stream. This will result in bypassing the reassembly/ordering queues where things live duing partial delivery, and the message will be delivered to the socket in the middle of partial delivery. This is a two-fold problem, in that: 1. the app now must check the stream-id and flags which it may not be doing. 2. this clearing partial delivery state from the association and results in ulp hanging. This patch is a band-aid over a much bigger problem in that we don't do stream interleave. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-18 14:16:09 -07:00
David S. Miller	fefaa75e04	[IPSEC] af_key: Fix thinko in pfkey_xfrm_policy2msg() Make sure to actually assign the determined mode to rq->sadb_x_ipsecrequest_mode. Noticed by Joe Perches. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-18 14:16:07 -07:00
Linus Torvalds	80d74d5123	Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: [BRIDGE]: Unaligned access when comparing ethernet addresses [SCTP]: Unmap v4mapped addresses during SCTP_BINDX_REM_ADDR operation. [SCTP]: Fix assertion (!atomic_read(&sk->sk_rmem_alloc)) failed message [NET]: Set a separate lockdep class for neighbour table's proxy_queue [NET]: Fix UDP checksum issue in net poll mode. [KEY]: Fix conversion between IPSEC_MODE_xxx and XFRM_MODE_xxx. [NET]: Get rid of alloc_skb_from_cache	2007-04-17 16:51:32 -07:00
NeilBrown	30f3deeee8	knfsd: use a spinlock to protect sk_info_authunix sk_info_authunix is not being protected properly so the object that it points to can be cache_put twice, leading to corruption. We borrow svsk->sk_defer_lock to provide the protection. We should probably rename that lock to have a more generic name - later. Thanks to Gabriel for reporting this. Cc: Greg Banks <gnb@melbourne.sgi.com> Cc: Gabriel Barazer <gabriel@oxeva.fr> Signed-off-by: Neil Brown <neilb@suse.de> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-17 16:36:27 -07:00
Evgeny Kravtsunov	19bb3506e2	[BRIDGE]: Unaligned access when comparing ethernet addresses compare_ether_addr() implicitly requires that the addresses passed are 2-bytes aligned in memory. This is not true for br_stp_change_bridge_id() and br_stp_recalculate_bridge_id() in which one of the addresses is unsigned char *, and thus may not be 2-bytes aligned. Signed-off-by: Evgeny Kravtsunov <emkravts@openvz.org> Signed-off-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: Pavel Emelianov <xemul@openvz.org>	2007-04-17 14:16:00 -07:00
Paolo Galtieri	0304ff8a2d	[SCTP]: Unmap v4mapped addresses during SCTP_BINDX_REM_ADDR operation. During the sctp_bindx() call to add additional addresses to the endpoint, any v4mapped addresses are converted and stored as regular v4 addresses. However, when trying to remove these addresses, the v4mapped addresses are not converted and the operation fails. This patch unmaps the addresses on during the remove operation as well. Signed-off-by: Paolo Galtieri <pgaltieri@mvista.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-17 13:13:42 -07:00
Tsutomu Fujii	ea2bc483ff	[SCTP]: Fix assertion (!atomic_read(&sk->sk_rmem_alloc)) failed message In current implementation, LKSCTP does receive buffer accounting for data in sctp_receive_queue and pd_lobby. However, LKSCTP don't do accounting for data in frag_list when data is fragmented. In addition, LKSCTP doesn't do accounting for data in reasm and lobby queue in structure sctp_ulpq. When there are date in these queue, assertion failed message is printed in inet_sock_destruct because sk_rmem_alloc of oldsk does not become 0 when socket is destroyed. Signed-off-by: Tsutomu Fujii <t-fujii@nb.jp.nec.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-17 13:13:37 -07:00
Pavel Emelianov	c2ecba7171	[NET]: Set a separate lockdep class for neighbour table's proxy_queue Otherwise the following calltrace will lead to a wrong lockdep warning: neigh_proxy_process() `- lock(neigh_table->proxy_queue.lock); arp_redo /* via tbl->proxy_redo */ arp_process neigh_event_ns neigh_update skb_queue_purge `- lock(neighbor->arp_queue.lock); This is not a deadlock actually, as neighbor table's proxy_queue and the neighbor's arp_queue are different queues. Lockdep thinks there is a deadlock as both queues are initialized with skb_queue_head_init() and thus have a common class. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-17 13:13:31 -07:00
Aubrey.Li	5e7d7fa573	[NET]: Fix UDP checksum issue in net poll mode. In net poll mode, the current checksum function doesn't consider the kind of packet which is padded to reach a specific minimum length. I believe that's the problem causing my test case failed. The following patch fixed this issue. Signed-off-by: Aubrey.Li <aubreylee@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-17 13:13:26 -07:00

... 3 4 5 6 7 ...

4772 Commits