Commit Graph

219 Commits

Author SHA1 Message Date
Stephen Hemminger 7be87351a1 tcp: /proc/net/tcp rto,ato values not scaled properly (v2)
I found another case where we are sending information to userspace
in the wrong HZ scale.  This should have been fixed back in 2.5 :-(

This means an ABI change but as it stands there is no way for an application
like ss to get the right value.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-27 20:00:19 -07:00
David S. Miller caea902f72 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	drivers/net/wireless/rt2x00/Kconfig
	drivers/net/wireless/rt2x00/rt2x00usb.c
	net/sctp/protocol.c
2008-06-16 18:25:48 -07:00
Rami Rosen a9d246dbb0 ipv4: Remove unused definitions in net/ipv4/tcp_ipv4.c.
1) Remove ICMP_MIN_LENGTH, as it is unused.

2) Remove unneeded tcp_v4_send_check() declaration.

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-16 17:07:16 -07:00
Brian Haley 7d06b2e053 net: change proto destroy method to return void
Change struct proto destroy function pointer to return void.  Noticed
by Al Viro.

Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-14 17:04:49 -07:00
David S. Miller 4ae127d1b6 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	drivers/net/smc911x.c
2008-06-13 20:52:39 -07:00
David S. Miller ec0a196626 tcp: Revert 'process defer accept as established' changes.
This reverts two changesets, ec3c0982a2
("[TCP]: TCP_DEFER_ACCEPT updates - process as established") and
the follow-on bug fix 9ae27e0adb
("tcp: Fix slab corruption with ipv6 and tcp6fuzz").

This change causes several problems, first reported by Ingo Molnar
as a distcc-over-loopback regression where connections were getting
stuck.

Ilpo Järvinen first spotted the locking problems.  The new function
added by this code, tcp_defer_accept_check(), only has the
child socket locked, yet it is modifying state of the parent
listening socket.

Fixing that is non-trivial at best, because we can't simply just grab
the parent listening socket lock at this point, because it would
create an ABBA deadlock.  The normal ordering is parent listening
socket --> child socket, but this code path would require the
reverse lock ordering.

Next is a problem noticed by Vitaliy Gusev, he noted:

----------------------------------------
>--- a/net/ipv4/tcp_timer.c
>+++ b/net/ipv4/tcp_timer.c
>@@ -481,6 +481,11 @@ static void tcp_keepalive_timer (unsigned long data)
> 		goto death;
> 	}
>
>+	if (tp->defer_tcp_accept.request && sk->sk_state == TCP_ESTABLISHED) {
>+		tcp_send_active_reset(sk, GFP_ATOMIC);
>+		goto death;

Here socket sk is not attached to listening socket's request queue. tcp_done()
will not call inet_csk_destroy_sock() (and tcp_v4_destroy_sock() which should
release this sk) as socket is not DEAD. Therefore socket sk will be lost for
freeing.
----------------------------------------

Finally, Alexey Kuznetsov argues that there might not even be any
real value or advantage to these new semantics even if we fix all
of the bugs:

----------------------------------------
Hiding from accept() sockets with only out-of-order data only
is the only thing which is impossible with old approach. Is this really
so valuable? My opinion: no, this is nothing but a new loophole
to consume memory without control.
----------------------------------------

So revert this thing for now.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-12 16:34:35 -07:00
David S. Miller e6e30add6b Merge branch 'net-next-2.6-misc-20080612a' of git://git.linux-ipv6.org/gitroot/yoshfuji/linux-2.6-next 2008-06-11 22:33:59 -07:00
Adrian Bunk 0b04082995 net: remove CVS keywords
This patch removes CVS keywords that weren't updated for a long time
from comments.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-11 21:00:38 -07:00
YOSHIFUJI Hideaki 9501f97229 tcp md5sig: Let the caller pass appropriate key for tcp_v{4,6}_do_calc_md5_hash().
As we do for other socket/timewait-socket specific parameters,
let the callers pass appropriate arguments to
tcp_v{4,6}_do_calc_md5_hash().

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-12 03:46:30 +09:00
YOSHIFUJI Hideaki 8d26d76dd4 tcp md5sig: Share most of hash calcucaltion bits between IPv4 and IPv6.
We can share most part of the hash calculation code because
the only difference between IPv4 and IPv6 is their pseudo headers.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-12 02:38:20 +09:00
YOSHIFUJI Hideaki 076fb72233 tcp md5sig: Remove redundant protocol argument.
Protocol is always TCP, so remove useless protocol argument.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-12 02:38:19 +09:00
YOSHIFUJI Hideaki 7d5d5525bd tcp md5sig: Share MD5 Signature option parser between IPv4 and IPv6.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-12 02:38:18 +09:00
Arnaldo Carvalho de Melo ce4a7d0d48 inet{6}_request_sock: Init ->opt and ->pktopts in the constructor
Wei Yongjun noticed that we may call reqsk_free on request sock objects where
the opt fields may not be initialized, fix it by introducing inet_reqsk_alloc
where we initialize ->opt to NULL and set ->pktopts to NULL in
inet6_reqsk_alloc.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-10 12:39:35 -07:00
Denis V. Lunev 84841c3c6c ipv4: assign PDE->data before gluing PDE into /proc tree
The check for PDE->data != NULL becomes useless after the replacement
of proc_net_fops_create with proc_create_data.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-02 04:10:08 -07:00
Pavel Emelyanov 5e659e4cb0 [NET]: Fix heavy stack usage in seq_file output routines.
Plan C: we can follow the Al Viro's proposal about %n like in this patch.
The same applies to udp, fib (the /proc/net/route file), rt_cache and 
sctp debug. This is minus ~150-200 bytes for each.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-24 01:02:16 -07:00
YOSHIFUJI Hideaki a7d632b6b4 [IPV4]: Use NIPQUAD_FMT to format ipv4 addresses.
And use %u to format port.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-14 04:09:00 -07:00
Denis V. Lunev 5f4472c5a6 [TCP]: Remove owner from tcp_seq_afinfo.
Move it to tcp_seq_afinfo->seq_fops as should be.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 22:13:53 -07:00
Denis V. Lunev 68fcadd16c [TCP]: Place file operations directly into tcp_seq_afinfo.
No need to have separate never-used variable.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 22:13:30 -07:00
Denis V. Lunev 52d6f3f11b [TCP]: Cleanup /proc/tcp[6] creation/removal.
Replace seq_open with seq_open_net and remove tcp_seq_release
completely.  seq_release_net will do this job just fine.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 22:12:41 -07:00
Denis V. Lunev 9427c4b36b [TCP]: Move seq_ops from tcp_iter_state to tcp_seq_afinfo.
No need to create seq_operations for each instance of 'netstat'.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 22:12:13 -07:00
Denis V. Lunev 1abf4fb20d [TCP]: No need to check afinfo != NULL in tcp_proc_(un)register.
tcp_proc_register/tcp_proc_unregister are called with a static pointer only.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 22:11:46 -07:00
Denis V. Lunev a4146b1b2c [TCP]: Replace struct net on tcp_iter_state with seq_net_private.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 22:11:14 -07:00
Florian Westphal 4dfc281702 [Syncookies]: Add support for TCP options via timestamps.
Allow the use of SACK and window scaling when syncookies are used
and the client supports tcp timestamps. Options are encoded into
the timestamp sent in the syn-ack and restored from the timestamp
echo when the ack is received.

Based on earlier work by Glenn Griffin.
This patch avoids increasing the size of structs by encoding TCP
options into the least significant bits of the timestamp and
by not using any 'timestamp offset'.

The downside is that the timestamp sent in the packet after the synack
will increase by several seconds.

changes since v1:
 don't duplicate timestamp echo decoding function, put it into ipv4/syncookie.c
 and have ipv6/syncookies.c use it.
 Feedback from Glenn Griffin: fix line indented with spaces, kill redundant if ()

Reviewed-by: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-10 03:12:40 -07:00
Denis V. Lunev 7feb49c82a [NETNS]: Use TCP control socket from a correct namespace.
Signed-off-by: Denis V.Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-03 14:32:00 -07:00
Denis V. Lunev 046ee90235 [NETNS]: Create tcp control socket in the each namespace.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-03 14:31:33 -07:00
Denis V. Lunev 5677242f43 [NETNS]: Inet control socket should not hold a namespace.
This is a generic requirement, so make inet_ctl_sock_create namespace
aware and create a inet_ctl_sock_destroy wrapper around
sk_release_kernel.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-03 14:28:30 -07:00
Denis V. Lunev eee4fe4ded [INET]: Let inet_ctl_sock_create return sock rather than socket.
All upper protocol layers are already use sock internally.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-03 14:27:58 -07:00
Denis V. Lunev 3d58b5fa8e [INET]: Rename inet_csk_ctl_sock_create to inet_ctl_sock_create.
This call is nothing common with INET connection sockets code. It
simply creates an unhashes kernel sockets for protocol messages.

Move the new call into af_inet.c after the rename.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-03 14:22:32 -07:00
Denis V. Lunev 14c0c8e8e0 [TCP]: Replace socket with sock for reset sending.
Replace tcp_socket with tcp_sock. This is more effective (less
derefferences on fast paths). Additionally, the approach is unified to
one used in ICMP.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-03 14:19:38 -07:00
Pavel Emelyanov bdcde3d71a [SOCK]: Drop inuse pcounter from struct proto (v2).
An uppercut - do not use the pcounter on struct proto.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-28 16:39:33 -07:00
YOSHIFUJI Hideaki 878628fbf2 [NET] NETNS: Omit namespace comparision without CONFIG_NET_NS.
Introduce an inline net_eq() to compare two namespaces.
Without CONFIG_NET_NS, since no namespace other than &init_net
exists, it is always 1.

We do not need to convert 1) inline vs inline and
2) inline vs &init_net comparisons.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-26 04:40:00 +09:00
YOSHIFUJI Hideaki 3b1e0a655f [NET] NETNS: Omit sock->sk_net without CONFIG_NET_NS.
Introduce per-sock inlines: sock_net(), sock_net_set()
and per-inet_timewait_sock inlines: twsk_net(), twsk_net_set().
Without CONFIG_NET_NS, no namespace other than &init_net exists.
Let's explicitly define them to help compiler optimizations.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-26 04:39:55 +09:00
YOSHIFUJI Hideaki c346dca108 [NET] NETNS: Omit net_device->nd_net without CONFIG_NET_NS.
Introduce per-net_device inlines: dev_net(), dev_net_set().
Without CONFIG_NET_NS, no namespace other than &init_net exists.
Let's explicitly define them to help compiler optimizations.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-26 04:39:53 +09:00
Pavel Emelyanov 757764f61d [NETNS][TCP]: Register /proc/net/tcp in a namespace.
After the commit f40c8174d3 ([NETNS][IPV4] 
tcp - make proc handle the network namespaces) it is now possible to make
this file present in newly created namespaces.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-24 14:56:02 -07:00
Pavel Emelyanov 39d8cda76c [SOCK]: Add udp_hash member to struct proto.
Inspired by the commit ab1e0a13 ([SOCK] proto: Add hashinfo member to 
struct proto) from Arnaldo, I made similar thing for UDP/-Lite IPv4 
and -v6 protocols.

The result is not that exciting, but it removes some levels of
indirection in udpxxx_get_port and saves some space in code and text.

The first step is to union existing hashinfo and new udp_hash on the
struct proto and give a name to this union, since future initialization 
of tcpxxx_prot, dccp_vx_protinfo and udpxxx_protinfo will cause gcc 
warning about inability to initialize anonymous member this way.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-22 16:50:58 -07:00
Patrick McManus ec3c0982a2 [TCP]: TCP_DEFER_ACCEPT updates - process as established
Change TCP_DEFER_ACCEPT implementation so that it transitions a
connection to ESTABLISHED after handshake is complete instead of
leaving it in SYN-RECV until some data arrvies. Place connection in
accept queue when first data packet arrives from slow path.

Benefits:
  - established connection is now reset if it never makes it
   to the accept queue

 - diagnostic state of established matches with the packet traces
   showing completed handshake

 - TCP_DEFER_ACCEPT timeouts are expressed in seconds and can now be
   enforced with reasonable accuracy instead of rounding up to next
   exponential back-off of syn-ack retry.

Signed-off-by: Patrick McManus <mcmanus@ducksong.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-21 16:33:01 -07:00
Pavel Emelyanov 28518fc170 [NET]: NULL pointer dereference and other nasty things in /proc/net/(tcp|udp)[6]
Commits f40c81 ([NETNS][IPV4] tcp - make proc handle the network
namespaces) and a91275 ([NETNS][IPV6] udp - make proc handle the
network namespace) both introduced bad checks on sockets and tw
buckets to belong to proper net namespace.

I.e. when checking for socket to belong to given net and family the

	do {
		sk = sk_next(sk);
	} while (sk && sk->sk_net != net && sk->sk_family != family);

constructions were used. This is wrong, since as soon as the
sk->sk_net fits the net the socket is immediately returned, even if it
belongs to other family.

As the result four /proc/net/(udp|tcp)[6] entries show wrong info.
The udp6 entry even oopses when dereferencing inet6_sk(sk) pointer:

static void udp6_sock_seq_show(struct seq_file *seq, struct sock *sp, int bucket)
{
	...
        struct ipv6_pinfo *np = inet6_sk(sp);
	...

        dest  = &np->daddr; /* will be NULL for AF_INET sockets */
	...
	seq_printf(...
	           dest->s6_addr32[0], dest->s6_addr32[1],
                   dest->s6_addr32[2], dest->s6_addr32[3],
	...

Fix it by converting && to ||.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-21 15:52:00 -07:00
Daniel Lezcano 6f8b13bcb3 [NETNS][IPV6] tcp6 - make proc per namespace
Make the proc for tcp6 to be per namespace.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-21 04:14:45 -07:00
Daniel Lezcano f40c8174d3 [NETNS][IPV4] tcp - make proc handle the network namespaces
This patch, like udp proc, makes the proc functions to take care of
which namespace the socket belongs.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-21 04:13:54 -07:00
Eric Dumazet ee6b967301 [IPV4]: Add 'rtable' field in struct sk_buff to alias 'dst' and avoid casts
(Anonymous) unions can help us to avoid ugly casts.

A common cast it the (struct rtable *)skb->dst one.

Defining an union like  :
union {
     struct dst_entry *dst;
     struct rtable *rtable;
};
permits to use skb->rtable in place.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-05 18:30:47 -08:00
Denis V. Lunev 7cd04fa7e3 [TCP]: Merge exit paths in tcp_v4_conn_request.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-03 11:59:32 -08:00
Denis V. Lunev fd80eb942a [INET]: Remove struct dst_entry *dst from request_sock_ops.rtx_syn_ack.
It looks like dst parameter is used in this API due to historical
reasons.  Actually, it is really used in the direct call to
tcp_v4_send_synack only.  So, create a wrapper for tcp_v4_send_synack
and remove dst from rtx_syn_ack.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-29 11:43:03 -08:00
Denis V. Lunev 9b0f976f27 [INET]: Remove struct net_proto_family* from _init calls.
struct net_proto_family* is not used in icmp[v6]_init, ndisc_init,
igmp_init and tcp_v4_init. Remove it.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-29 11:13:15 -08:00
Kris Katterjohn 9bf1d83e7e [TCP]: Fix tcp_v4_send_synack() comment
Signed-off-by: Kris Katterjohn <katterjohn@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-17 22:29:19 -08:00
Arnaldo Carvalho de Melo ab1e0a13d7 [SOCK] proto: Add hashinfo member to struct proto
This way we can remove TCP and DCCP specific versions of

sk->sk_prot->get_port: both v4 and v6 use inet_csk_get_port
sk->sk_prot->hash:     inet_hash is directly used, only v6 need
                       a specific version to deal with mapped sockets
sk->sk_prot->unhash:   both v4 and v6 use inet_hash directly

struct inet_connection_sock_af_ops also gets a new member, bind_conflict, so
that inet_csk_get_port can find the per family routine.

Now only the lookup routines receive as a parameter a struct inet_hashtable.

With this we further reuse code, reducing the difference among INET transport
protocols.

Eventually work has to be done on UDP and SCTP to make them share this
infrastructure and get as a bonus inet_diag interfaces so that iproute can be
used with these protocols.

net-2.6/net/ipv4/inet_hashtables.c:
  struct proto			     |   +8
  struct inet_connection_sock_af_ops |   +8
 2 structs changed
  __inet_hash_nolisten               |  +18
  __inet_hash                        | -210
  inet_put_port                      |   +8
  inet_bind_bucket_create            |   +1
  __inet_hash_connect                |   -8
 5 functions changed, 27 bytes added, 218 bytes removed, diff: -191

net-2.6/net/core/sock.c:
  proto_seq_show                     |   +3
 1 function changed, 3 bytes added, diff: +3

net-2.6/net/ipv4/inet_connection_sock.c:
  inet_csk_get_port                  |  +15
 1 function changed, 15 bytes added, diff: +15

net-2.6/net/ipv4/tcp.c:
  tcp_set_state                      |   -7
 1 function changed, 7 bytes removed, diff: -7

net-2.6/net/ipv4/tcp_ipv4.c:
  tcp_v4_get_port                    |  -31
  tcp_v4_hash                        |  -48
  tcp_v4_destroy_sock                |   -7
  tcp_v4_syn_recv_sock               |   -2
  tcp_unhash                         | -179
 5 functions changed, 267 bytes removed, diff: -267

net-2.6/net/ipv6/inet6_hashtables.c:
  __inet6_hash |   +8
 1 function changed, 8 bytes added, diff: +8

net-2.6/net/ipv4/inet_hashtables.c:
  inet_unhash                        | +190
  inet_hash                          | +242
 2 functions changed, 432 bytes added, diff: +432

vmlinux:
 16 functions changed, 485 bytes added, 492 bytes removed, diff: -7

/home/acme/git/net-2.6/net/ipv6/tcp_ipv6.c:
  tcp_v6_get_port                    |  -31
  tcp_v6_hash                        |   -7
  tcp_v6_syn_recv_sock               |   -9
 3 functions changed, 47 bytes removed, diff: -47

/home/acme/git/net-2.6/net/dccp/proto.c:
  dccp_destroy_sock                  |   -7
  dccp_unhash                        | -179
  dccp_hash                          |  -49
  dccp_set_state                     |   -7
  dccp_done                          |   +1
 5 functions changed, 1 bytes added, 242 bytes removed, diff: -241

/home/acme/git/net-2.6/net/dccp/ipv4.c:
  dccp_v4_get_port                   |  -31
  dccp_v4_request_recv_sock          |   -2
 2 functions changed, 33 bytes removed, diff: -33

/home/acme/git/net-2.6/net/dccp/ipv6.c:
  dccp_v6_get_port                   |  -31
  dccp_v6_hash                       |   -7
  dccp_v6_request_recv_sock          |   +5
 3 functions changed, 5 bytes added, 38 bytes removed, diff: -33

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-03 04:28:52 -08:00
Pavel Emelyanov c67499c0e7 [NETNS]: Tcp-v4 sockets per-net lookup.
Add a net argument to inet_lookup and propagate it further
into lookup calls. Plus tune the __inet_check_established.

The dccp and inet_diag, which use that lookup functions
pass the init_net into them.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:19 -08:00
YOSHIFUJI Hideaki 9cb5734e5b [TCP]: Convert several length variable to unsigned.
Several length variables cannot be negative, so convert int to
unsigned int.  This also allows us to do sane shift operations
on those variables.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:56 -08:00
Pavel Emelyanov 152da81deb [INET]: Uninline the __inet_hash function.
This one is used in quite many places in the networking code and
seems to big to be inline.

After the patch net/ipv4/build-in.o loses ~650 bytes:
add/remove: 2/0 grow/shrink: 0/5 up/down: 461/-1114 (-653)
function                                     old     new   delta
__inet_hash_nolisten                           -     282    +282
__inet_hash                                    -     179    +179
tcp_sacktag_write_queue                     2255    2254      -1
__inet_lookup_listener                       284     274     -10
tcp_v4_syn_recv_sock                         755     493    -262
tcp_v4_hash                                  389      35    -354
inet_hash_connect                           1086     599    -487

This version addresses the issue pointed by Eric, that
while being inline this function was optimized by gcc
in respect to the 'listen_possible' argument.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:26 -08:00
YOSHIFUJI Hideaki 354faf0977 [IPV4] TCPMD5: Use memmove() instead of memcpy() because we have overlaps.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-20 17:30:31 -08:00
YOSHIFUJI Hideaki a80cc20da4 [IPV4] TCPMD5: Omit redundant NULL check for kfree() argument.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-20 17:30:06 -08:00
Eric Dumazet 230140cffa [INET]: Remove per bucket rwlock in tcp/dccp ehash table.
As done two years ago on IP route cache table (commit
22c047ccbc) , we can avoid using one
lock per hash bucket for the huge TCP/DCCP hash tables.

On a typical x86_64 platform, this saves about 2MB or 4MB of ram, for
litle performance differences. (we hit a different cache line for the
rwlock, but then the bucket cache line have a better sharing factor
among cpus, since we dirty it less often). For netstat or ss commands
that want a full scan of hash table, we perform fewer memory accesses.

Using a 'small' table of hashed rwlocks should be more than enough to
provide correct SMP concurrency between different buckets, without
using too much memory. Sizing of this table depends on
num_possible_cpus() and various CONFIG settings.

This patch provides some locking abstraction that may ease a future
work using a different model for TCP/DCCP table.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-07 04:15:11 -08:00
Eric Dumazet 47a31a6ffc [IPV4]: Use the {DEFINE|REF}_PROTO_INUSE infrastructure
Trivial patch to make "tcp,udp,udplite,raw" protocols uses the fast
"inuse sockets" infrastructure

Each protocol use then a static percpu var, instead of a dynamic one.
This saves some ram and some cpu cycles

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-07 04:08:58 -08:00
Jens Axboe c46f2334c8 [SG] Get rid of __sg_mark_end()
sg_mark_end() overwrites the page_link information, but all users want
__sg_mark_end() behaviour where we just set the end bit. That is the most
natural way to use the sg list, since you'll fill it in and then mark the
end point.

So change sg_mark_end() to only set the termination bit. Add a sg_magic
debug check as well, and clear a chain pointer if it is set.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-11-02 08:47:06 +01:00
David S. Miller 51c739d1f4 [NET]: Fix incorrect sg_mark_end() calls.
This fixes scatterlist corruptions added by

	commit 68e3f5dd4d
	[CRYPTO] users: Fix up scatterlist conversion errors

The issue is that the code calls sg_mark_end() which clobbers the
sg_page() pointer of the final scatterlist entry.

The first part fo the fix makes skb_to_sgvec() do __sg_mark_end().

After considering all skb_to_sgvec() call sites the most correct
solution is to call __sg_mark_end() in skb_to_sgvec() since that is
what all of the callers would end up doing anyways.

I suspect this might have fixed some problems in virtio_net which is
the sole non-crypto user of skb_to_sgvec().

Other similar sg_mark_end() cases were converted over to
__sg_mark_end() as well.

Arguably sg_mark_end() is a poorly named function because it doesn't
just "mark", it clears out the page pointer as a side effect, which is
what led to these bugs in the first place.

The one remaining plain sg_mark_end() call is in scsi_alloc_sgtable()
and arguably it could be converted to __sg_mark_end() if only so that
we can delete this confusing interface from linux/scatterlist.h

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-30 21:29:29 -07:00
Matthias M. Dellweg b0a713e9e6 [TCP] MD5: Remove some more unnecessary casting.
while reviewing the tcp_md5-related code further i came across with
another two of these casts which you probably have missed. I don't
actually think that they impose a problem by now, but as you said we
should remove them.

Signed-off-by: Matthias M. Dellweg <2500@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-29 22:37:27 -07:00
David S. Miller c7da57a183 [TCP]: Fix scatterlist handling in MD5 signature support.
Use sg_init_table() and sg_mark_end() as needed.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-26 00:41:21 -07:00
Stephen Hemminger 227b60f510 [INET]: local port range robustness
Expansion of original idea from Denis V. Lunev <den@openvz.org>

Add robustness and locking to the local_port_range sysctl.
1. Enforce that low < high when setting.
2. Use seqlock to ensure atomic update.

The locking might seem like overkill, but there are
cases where sysadmin might want to change value in the
middle of a DoS attack.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 17:30:46 -07:00
Eric W. Biederman 457c4cbc5a [NET]: Make /proc/net per network namespace
This patch makes /proc/net per network namespace.  It modifies the global
variables proc_net and proc_net_stat to be per network namespace.
The proc_net file helpers are modified to take a network namespace argument,
and all of their callers are fixed to pass &init_net for that argument.
This ensures that all of the /proc/net files are only visible and
usable in the initial network namespace until the code behind them
has been updated to be handle multiple network namespaces.

Making /proc/net per namespace is necessary as at least some files
in /proc/net depend upon the set of network devices which is per
network namespace, and even more files in /proc/net have contents
that are relevant to a single network namespace.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:49:06 -07:00
David S. Miller f8ab18d2d9 [TCP]: Fix MD5 signature handling on big-endian.
Based upon a report and initial patch by Peter Lieven.

tcp4_md5sig_key and tcp6_md5sig_key need to start with
the exact same members as tcp_md5sig_key.  Because they
are both cast to that type by tcp_v{4,6}_md5_do_lookup().

Unfortunately tcp{4,6}_md5sig_key use a u16 for the key
length instead of a u8, which is what tcp_md5sig_key
uses.  This just so happens to work by accident on
little-endian, but on big-endian it doesn't.

Instead of casting, just place tcp_md5sig_key as the first member of
the address-family specific structures, adjust the access sites, and
kill off the ugly casts.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-09-28 15:18:35 -07:00
David S. Miller 3516ffb0fe [TCP]: Invoke tcp_sendmsg() directly, do not use inet_sendmsg().
As discovered by Evegniy Polyakov, if we try to sendmsg after
a connection reset, we can do incredibly stupid things.

The core issue is that inet_sendmsg() tries to autobind the
socket, but we should never do that for TCP.  Instead we should
just go straight into TCP's sendmsg() code which will do all
of the necessary state and pending socket error checks.

TCP's sendpage already directly vectors to tcp_sendpage(), so this
merely brings sendmsg() in line with that.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-08-02 19:42:28 -07:00
Herbert Xu a7ab4b501f [TCPv4]: Improve BH latency in /proc/net/tcp
Currently the code for /proc/net/tcp disable BH while iterating
over the entire established hash table.  Even though we call
cond_resched_softirq for each entry, we still won't process
softirq's as regularly as we would otherwise do which results
in poor performance when the system is loaded near capacity.

This anomaly comes from the 2.4 code where this was all in a
single function and the local_bh_disable might have made sense
as a small optimisation.

The cost of each local_bh_disable is so small when compared
against the increased latency in keeping it disabled over a
large but mostly empty TCP established hash table that we
should just move it to the individual read_lock/read_unlock
calls as we do in inet_diag.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:06:20 -07:00
David S. Miller 3d7dbeac58 [TCP]: Disable TSO if MD5SIG is enabled.
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-12 14:36:42 -07:00
Patrick McHardy f0e48dbfc5 [TCP]: Honour sk_bound_dev_if in tcp_v4_send_ack
A time_wait socket inherits sk_bound_dev_if from the original socket,
but it is not used when sending ACK packets using ip_send_reply.

Fix by passing the oif to ip_send_reply in struct ip_reply_arg and
use it for output routing.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-07 13:38:51 -07:00
Wei Dong 584bdf8cbd [IPV4]: Fix "ipOutNoRoutes" counter error for TCP and UDP
Signed-off-by: Wei Dong <weidong@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-03 18:08:50 -07:00
Herbert Xu 604763722c [NET]: Treat CHECKSUM_PARTIAL as CHECKSUM_UNNECESSARY
When a transmitted packet is looped back directly, CHECKSUM_PARTIAL
maps to the semantics of CHECKSUM_UNNECESSARY.  Therefore we should
treat it as such in the stack.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:28:43 -07:00
Herbert Xu 663ead3bb8 [NET]: Use csum_start offset instead of skb_transport_header
The skb transport pointer is currently used to specify the start
of the checksum region for transmit checksum offload.  Unfortunately,
the same pointer is also used during receive side processing.

This creates a problem when we want to retransmit a received
packet with partial checksums since the skb transport pointer
would be overwritten.

This patch solves this problem by creating a new 16-bit csum_start
offset value to replace the skb transport header for the purpose
of checksums.  This offset is calculated from skb->head so that
it does not have to change when skb->data changes.

No extra space is required since csum_offset itself fits within
a 16-bit word so we can use the other 16 bits for csum_start.

For backwards compatibility, just before we push a packet with
partial checksums off into the device driver, we set the skb
transport header to what it would have been under the old scheme.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:28:40 -07:00
Eric Dumazet 4103f8cd5c [TCP]: tcp_memory_pressure and tcp_socket are__read_mostly candidates
tcp_memory_pressure and tcp_socket currently share a cache line with tcp_memory_allocated, tcp_sockets_allocated.
(Very hot cache line)
It makes sense to declare these variables as __read_mostly, to avoid false sharing on SMP.

ffffffff8081d9c0 B tcp_orphan_count
ffffffff8081d9c4 B tcp_memory_allocated
ffffffff8081d9c8 B tcp_sockets_allocated
ffffffff8081d9cc B tcp_memory_pressure
ffffffff8081d9d0 b tcp_md5sig_users
ffffffff8081d9d8 b tcp_md5sig_pool
ffffffff8081d9e0 b warntime.31570
ffffffff8081d9e8 b tcp_socket

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:28:19 -07:00
Arnaldo Carvalho de Melo aa8223c7bb [SK_BUFF]: Introduce tcp_hdr(), remove skb->h.th
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:25:26 -07:00
Arnaldo Carvalho de Melo ab6a5bb6b2 [TCP]: Introduce tcp_hdrlen() and tcp_optlen()
The ip_hdrlen() buddy, created to reduce the number of skb->h.th-> uses and to
avoid the longer, open coded equivalent.

Ditched a no-op in bnx2 in the process.

I wonder if we should have a BUG_ON(skb->h.th->doff < 5) in tcp_optlen()...

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:25:24 -07:00
Arnaldo Carvalho de Melo 88c7664f13 [SK_BUFF]: Introduce icmp_hdr(), remove skb->h.icmph
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:25:23 -07:00
Arnaldo Carvalho de Melo eddc9ec53b [SK_BUFF]: Introduce ip_hdr(), remove skb->nh.iph
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:25:10 -07:00
David S. Miller fe067e8ab5 [TCP]: Abstract out all write queue operations.
This allows the write queue implementation to be changed,
for example, to one which allows fast interval searching.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:24:02 -07:00
James Morris 9d729f72dc [NET]: Convert xtime.tv_sec to get_seconds()
Where appropriate, convert references to xtime.tv_sec to the
get_seconds() helper function.

Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:23:32 -07:00
Ilpo Järvinen cf4c6bf83d [TCP]: struct *sock argument renamed: sp -> sk
In general, TCP code uses "sk" for struct sock pointer.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:23:20 -07:00
YOSHIFUJI Hideaki e905a9edab [NET] IPV4: Fix whitespace errors.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-10 23:19:39 -08:00
Eric Dumazet dbca9b2750 [NET]: change layout of ehash table
ehash table layout is currently this one :

First half of this table is used by sockets not in TIME_WAIT state
Second half of it is used by sockets in TIME_WAIT state.

This is non optimal because of for a given hash or socket, the two chain heads 
are located in separate cache lines.
Moreover the locks of the second half are never used.

If instead of this halving, we use two list heads in inet_ehash_bucket instead 
of only one, we probably can avoid one cache miss, and reduce ram usage, 
particularly if sizeof(rwlock_t) is big (various CONFIG_DEBUG_SPINLOCK, 
CONFIG_DEBUG_LOCK_ALLOC settings). So we still halves the table but we keep 
together related chains to speedup lookups and socket state change.

In this patch I did not try to align struct inet_ehash_bucket, but a future 
patch could try to make this structure have a convenient size (a power of two 
or a multiple of L1_CACHE_SIZE).
I guess rwlock will just vanish as soon as RCU is plugged into ehash :) , so 
maybe we dont need to scratch our heads to align the bucket...

Note : In case struct inet_ehash_bucket is not a power of two, we could 
probably change alloc_large_system_hash() (in case it use __get_free_pages()) 
to free the unused space. It currently allocates a big zone, but the last 
quarter of it could be freed. Again, this should be a temporary 'problem'.

Patch tested on ipv4 tcp only, but should be OK for IPV6 and DCCP.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 14:16:46 -08:00
David S. Miller 8eb9086f21 [IPV4/IPV6]: Always wait for IPSEC SA resolution in socket contexts.
Do this even for non-blocking sockets.  This avoids the silly -EAGAIN
that applications can see now, even for non-blocking sockets in some
cases (f.e. connect()).

With help from Venkat Tekkirala.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 12:38:45 -08:00
Frederik Deweerdt ba7808eac1 [TCP]: remove tcp header from tcp_v4_check (take #2)
The tcphdr struct passed to tcp_v4_check is not used, the following
patch removes it from the parameter list.

This adds the netfilter modifications missing in the patch I sent
for rc3-mm1.

Signed-off-by: Frederik Deweerdt <frederik.deweerdt@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 12:38:44 -08:00
Craig Schlenter cb48cfe807 [TCP]: Fix iov_len calculation in tcp_v4_send_ack().
This fixes the ftp stalls present in the current kernels.

All credit goes to Komuro <komurojun-mbn@nifty.com> for tracking
this down. The patch is untested but it looks *cough* obviously
correct.

Signed-off-by: Craig Schlenter <craig@codefountain.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-01-09 00:30:08 -08:00
Leigh Brown a9fc00cca8 [TCP]: Trivial fix to message in tcp_v4_inbound_md5_hash
The message logged in tcp_v4_inbound_md5_hash when the hash was expected
but not found was reversed.

Signed-off-by: Leigh Brown <leigh@solinno.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-17 21:59:26 -08:00
Leigh Brown 8228a18dd3 [TCP]: Fix oops caused by tcp_v4_md5_do_del
md5sig_info.alloced4 must be set to zero when freeing keys4, otherwise
it will not be alloc'd again when another key is added to the same
socket by tcp_v4_md5_do_add.

Signed-off-by: Leigh Brown <leigh@solinno.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-17 21:59:25 -08:00
Andrew Morton b6332e6cf9 [TCP]: Fix warnings with TCP_MD5SIG disabled.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:31:52 -08:00
Adrian Bunk f5b99bcddd [NET]: Possible cleanups.
This patch contains the following possible cleanups:
- make the following needlessly global functions statis:
  - ipv4/tcp.c: __tcp_alloc_md5sig_pool()
  - ipv4/tcp_ipv4.c: tcp_v4_reqsk_md5_lookup()
  - ipv4/udplite.c: udplite_rcv()
  - ipv4/udplite.c: udplite_err()
- make the following needlessly global structs static:
  - ipv4/tcp_ipv4.c: tcp_request_sock_ipv4_ops
  - ipv4/tcp_ipv4.c: tcp_sock_ipv4_specific
  - ipv6/tcp_ipv6.c: tcp_request_sock_ipv6_ops
- net/ipv{4,6}/udplite.c: remove inline's from static functions
                          (gcc should know best when to inline them)

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:31:51 -08:00
David S. Miller 08dd1a506b [TCP] MD5SIG: Kill CONFIG_TCP_MD5SIG_DEBUG.
It just obfuscates the code and adds limited value.  And as Adrian
Bunk noticed, it lacked Kconfig help text too, so just kill it.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:31:47 -08:00
Al Viro ff1dcadb1b [NET]: Split skb->csum
... into anonymous union of __wsum and __u32 (csum and csum_offset resp.)

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:27:18 -08:00
Al Viro 8e5200f540 [NET]: Fix assorted misannotations (from md5 and udplite merges).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:27:16 -08:00
Arnaldo Carvalho de Melo f6685938f9 [TCP_IPV4]: Use kmemdup where appropriate
Also use a variable to avoid the longish tp->md5sig_info-> use
in tcp_v4_md5_do_add.

Code diff stats:

[acme@newtoy net-2.6.20]$ codiff /tmp/tcp_ipv4.o.before /tmp/tcp_ipv4.o.after
/pub/scm/linux/kernel/git/acme/net-2.6.20/net/ipv4/tcp_ipv4.c:
  tcp_v4_md5_do_add     |  -62
  tcp_v4_syn_recv_sock  |  -32
  tcp_v4_parse_md5_keys |  -86
 3 functions changed, 180 bytes removed
[acme@newtoy net-2.6.20]$

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:23:54 -08:00
Arnaldo Carvalho de Melo 7174259e6c [TCP_IPV4]: CodingStyle cleanups, no code change
Mostly related to CONFIG_TCP_MD5SIG recent merge.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:23:53 -08:00
Al Viro b51655b958 [NET]: Annotate __skb_checksum_complete() and friends.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:23:38 -08:00
Al Viro 714e85be35 [IPV6]: Assorted trivial endianness annotations.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:22:50 -08:00
YOSHIFUJI Hideaki cfb6eeb4c8 [TCP]: MD5 Signature Option (RFC2385) support.
Based on implementation by Rick Payne.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:22:39 -08:00
Gerrit Renker b9df3cb8cf [TCP/DCCP]: Introduce net_xmit_eval
Throughout the TCP/DCCP (and tunnelling) code, it often happens that the
return code of a transmit function needs to be tested against NET_XMIT_CN
which is a value that does not indicate a strict error condition.

This patch uses a macro for these recurring situations which is consistent
with the already existing macro net_xmit_errno, saving on duplicated code.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:22:27 -08:00
Gerrit Renker a94f723d59 [TCP]: Remove dead code in init_sequence
This removes two redundancies:

1) The test (skb->protocol == htons(ETH_P_IPV6) in tcp_v6_init_sequence()
   is always true, due to
	* tcp_v6_conn_request() is the only function calling this one
	* tcp_v6_conn_request() redirects all skb's with ETH_P_IP protocol to
	  tcp_v4_conn_request() [ cf. top of tcp_v6_conn_request()]

2) The first argument, `struct sock *sk' of tcp_v{4,6}_init_sequence() is
   never used.

Signed-off-by: Gerrit Renker  <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:22:10 -08:00
Eric Dumazet 72a3effaf6 [NET]: Size listen hash tables using backlog hint
We currently allocate a fixed size (TCP_SYNQ_HSIZE=512) slots hash table for
each LISTEN socket, regardless of various parameters (listen backlog for
example)

On x86_64, this means order-1 allocations (might fail), even for 'small'
sockets, expecting few connections. On the contrary, a huge server wanting a
backlog of 50000 is slowed down a bit because of this fixed limit.

This patch makes the sizing of listen hash table a dynamic parameter,
depending of :
- net.core.somaxconn tunable (default is 128)
- net.ipv4.tcp_max_syn_backlog tunable (default : 256, 1024 or 128)
- backlog value given by user application  (2nd parameter of listen())

For large allocations (bigger than PAGE_SIZE), we use vmalloc() instead of
kmalloc().

We still limit memory allocation with the two existing tunables (somaxconn &
tcp_max_syn_backlog). So for standard setups, this patch actually reduce RAM
usage.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:21:44 -08:00
Eric Dumazet 06ca719fad [TCP]: One NET_INC_STATS() could be NET_INC_STATS_BH in tcp_v4_err()
I believe this NET_INC_STATS() call can be replaced by
NET_INC_STATS_BH(), a little bit cheaper.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-20 00:22:25 -07:00
YOSHIFUJI Hideaki 9469c7b4aa [NET]: Use typesafe inet_twsk() inline function instead of cast.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-11 23:59:58 -07:00
YOSHIFUJI Hideaki 4244f8a9f8 [TCP]: Use TCPOLEN_TSTAMP_ALIGNED macro instead of magic number.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-11 23:59:54 -07:00
Al Viro 23f33c2d4f [IPV4]: struct inet_timewait_sock annotations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28 18:02:27 -07:00
Al Viro adaf345b53 [IPV4]: annotate address in inet_request_sock
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28 18:01:51 -07:00
Al Viro bada8adc4e [IPV4]: ip_route_connect() ipv4 address arguments annotated
annotated address arguments (port number left alone for now); ditto
for inferred net-endian variables in callers.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28 17:54:06 -07:00