linux/include/net
Eric Dumazet 81c3d5470e [INET]: speedup inet (tcp/dccp) lookups
Arnaldo and I agreed it could be applied now, because I have other
pending patches depending on this one (Thank you Arnaldo)

(The other important patch moves skc_refcnt into a separate cache line,
so that SMP/NUMA performance doesn't suffer from cache line ping-pong.)

1) First some performance data :
--------------------------------

tcp_v4_rcv() wastes a *lot* of time in __inet_lookup_established()

The most time critical code is :

sk_for_each(sk, node, &head->chain) {
     if (INET_MATCH(sk, acookie, saddr, daddr, ports, dif))
         goto hit; /* You sunk my battleship! */
}

sk_for_each() does use prefetch() hints, but only the beginning of
"struct sock" is prefetched.

As the first comparison in INET_MATCH uses inet_sk(__sk)->daddr, which
is far away from the beginning of "struct sock", it has to bring a cold
cache line into the CPU cache. Each iteration therefore touches at
least 2 cache lines.

This can be problematic if some chains are very long.
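
To make the cache-line argument concrete, here is a minimal sketch using a made-up layout. The struct `mock_sock_layout`, its 256-byte filler and the 64-byte line size are assumptions for illustration only; they are not the real `struct sock` layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: a small header, a large body, then the address
 * fields. Only the relative distances matter for this sketch. */
struct mock_sock_layout {
    uint16_t family;
    uint8_t  state;
    uint8_t  reuse;
    int32_t  bound_dev_if;
    uint8_t  other_fields[256];  /* stands in for the rest of the socket */
    uint32_t daddr;
    uint32_t rcv_saddr;
};

/* Number of distinct cache lines read when touching two fields at the
 * given byte offsets, assuming the object starts on a line boundary. */
static size_t lines_touched(size_t off_a, size_t off_b, size_t line_size)
{
    assert(line_size != 0);
    return (off_a / line_size == off_b / line_size) ? 1 : 2;
}
```

With a 64-byte line, `lines_touched(offsetof(struct mock_sock_layout, bound_dev_if), offsetof(struct mock_sock_layout, daddr), 64)` yields 2 for this layout, whereas two header fields share a single line.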

2) The goal
-----------

The idea I had is to change things so that INET_MATCH() can return
FALSE in 99% of cases using only data already in the CPU cache, i.e.
one cache line per iteration.

3) Description of the patch
---------------------------

Adds a new 'unsigned int skc_hash' field in 'struct sock_common',
filling a 32-bit hole on 64-bit platforms.

struct sock_common {
	unsigned short		skc_family;
	volatile unsigned char	skc_state;
	unsigned char		skc_reuse;
	int			skc_bound_dev_if;
	struct hlist_node	skc_node;
	struct hlist_node	skc_bind_node;
	atomic_t		skc_refcnt;
+	unsigned int		skc_hash;
	struct proto		*skc_prot;
};
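
The "hole" can be checked with a small sketch that mirrors the field order of the struct above. The `mock_*` types are assumptions: `atomic_t` is modeled as a plain `int` and `struct proto *` as `void *`, so this only illustrates padding on a typical LP64 ABI and says nothing definitive about the kernel's actual layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct hlist_node (two pointers). */
struct mock_hlist_node { struct mock_hlist_node *next, **pprev; };

/* Field order before the patch. */
struct mock_common_before {
    unsigned short         skc_family;
    volatile unsigned char skc_state;
    unsigned char          skc_reuse;
    int                    skc_bound_dev_if;
    struct mock_hlist_node skc_node;
    struct mock_hlist_node skc_bind_node;
    int                    skc_refcnt;  /* atomic_t modeled as int */
    void                  *skc_prot;    /* struct proto * modeled as void * */
};

/* Field order after the patch: skc_hash sits between refcnt and prot. */
struct mock_common_after {
    unsigned short         skc_family;
    volatile unsigned char skc_state;
    unsigned char          skc_reuse;
    int                    skc_bound_dev_if;
    struct mock_hlist_node skc_node;
    struct mock_hlist_node skc_bind_node;
    int                    skc_refcnt;
    unsigned int           skc_hash;
    void                  *skc_prot;
};

/* Bytes added to the struct by inserting skc_hash. */
static size_t hash_field_cost(void)
{
    assert(sizeof(struct mock_common_after) >= sizeof(struct mock_common_before));
    return sizeof(struct mock_common_after) - sizeof(struct mock_common_before);
}
```

On a platform with 8-byte pointers and 4-byte ints, `hash_field_cost()` is expected to be 0, because the new field occupies padding that already preceded the pointer. With 4-byte pointers the struct grows by the size of the field.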

Store the full hash in this 32-bit field, not masked by (ehash_size -
1). Using this full hash as the first comparison done in INET_MATCH
lets us immediately skip the element without touching a second cache
line in case of a miss.
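
The hash-first comparison can be sketched as follows. Everything named `mock_*`, as well as the `cold_reads` counter, is hypothetical and only models the idea; the real macro is INET_MATCH and the real chain walk is sk_for_each().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified socket: link and full hash are the "hot"
 * fields; the address tuple stands for the far-away, cold fields. */
struct mock_sock {
    struct mock_sock *next;
    uint32_t hash;          /* full, unmasked hash (role of skc_hash) */
    int      bound_dev_if;
    uint32_t daddr;
    uint32_t rcv_saddr;
    uint32_t ports;
};

/* Counts how many elements had their cold fields read. */
static unsigned long cold_reads;

static int mock_match(const struct mock_sock *sk, uint32_t hash,
                      uint32_t saddr, uint32_t daddr, uint32_t ports, int dif)
{
    assert(sk != NULL);
    if (sk->hash != hash)
        return 0;           /* rejected without touching cold fields */
    cold_reads++;
    return sk->daddr == saddr && sk->rcv_saddr == daddr &&
           sk->ports == ports &&
           (!sk->bound_dev_if || sk->bound_dev_if == dif);
}

/* Walk one chain and return the first matching element, or NULL. */
static struct mock_sock *mock_lookup(struct mock_sock *head, uint32_t hash,
                                     uint32_t saddr, uint32_t daddr,
                                     uint32_t ports, int dif)
{
    struct mock_sock *sk;

    for (sk = head; sk != NULL; sk = sk->next)
        if (mock_match(sk, hash, saddr, daddr, ports, dif))
            return sk;
    return NULL;
}
```

In a three-element chain where only the last element carries the wanted hash, `cold_reads` increases by one for the whole walk: the first two elements are skipped on the hash comparison alone.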

Suppress the sk_hashent/tw_hashent fields, since skc_hash (aliased to
sk_hash and tw_hash) already yields the slot number when masked with
(ehash_size - 1).
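
A minimal sketch of why a separate slot field becomes redundant: the slot can be recomputed from the stored full hash. The helper name `slot_from_hash` is hypothetical, and the sketch assumes the table size is a power of two, which the (ehash_size - 1) mask implies.

```c
#include <assert.h>
#include <stdint.h>

/* Derive the bucket index from a stored full hash.
 * ehash_size must be a non-zero power of two for the mask to work. */
static uint32_t slot_from_hash(uint32_t full_hash, uint32_t ehash_size)
{
    assert(ehash_size != 0 && (ehash_size & (ehash_size - 1)) == 0);
    return full_hash & (ehash_size - 1);
}
```

Two sockets with full hashes 0x00000005 and 0x00010005 land in the same slot of a 256-entry table, yet their full hashes differ, which is exactly the case the early hash comparison rejects cheaply.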

File include/net/inet_hashtables.h

64-bit platforms:
#define INET_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\
     (((__sk)->sk_hash == (__hash))                          &&  \
     ((*((__u64 *)&(inet_sk(__sk)->daddr))) == (__cookie))   &&  \
     ((*((__u32 *)&(inet_sk(__sk)->dport))) == (__ports))    &&  \
     (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif))))

32-bit platforms:
#define TCP_IPV4_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\
     (((__sk)->sk_hash == (__hash))                 &&  \
     (inet_sk(__sk)->daddr          == (__saddr))   &&  \
     (inet_sk(__sk)->rcv_saddr      == (__daddr))   &&  \
     ((*((__u32 *)&(inet_sk(__sk)->dport))) == (__ports))   &&  \
     (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif))))


- Adds a prefetch(head->chain.first) in
__inet_lookup_established()/__tcp_v4_check_established(),
__inet6_lookup_established()/__tcp_v6_check_established() and
__dccp_v4_check_established() to bring the first element of the list
into cache before the {read|write}_lock(&head->lock);
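
The ordering "prefetch first, then lock" can be sketched as below. This is an illustration under stated assumptions, not the kernel code: `__builtin_prefetch` is the GCC/Clang builtin used here in place of the kernel's prefetch(), and the `mock_*` bucket with its toy reader counter is a hypothetical stand-in for the real bucket and its rwlock.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct mock_node {
    struct mock_node *next;
    uint32_t hash;
};

struct mock_bucket {
    int readers;              /* toy stand-in for the bucket rwlock */
    struct mock_node *first;
};

static void mock_read_lock(struct mock_bucket *b)
{
    b->readers++;
}

static void mock_read_unlock(struct mock_bucket *b)
{
    assert(b->readers > 0);
    b->readers--;
}

static struct mock_node *mock_lookup(struct mock_bucket *head, uint32_t hash)
{
    struct mock_node *n;

    /* Issue the prefetch before taking the lock, so the memory load
     * can overlap with the time spent acquiring the lock. */
    __builtin_prefetch(head->first);

    mock_read_lock(head);
    for (n = head->first; n != NULL; n = n->next)
        if (n->hash == hash)
            break;
    mock_read_unlock(head);
    return n;
}
```

The prefetch is only a hint and does not change the result of the lookup; an empty bucket is harmless because prefetching a null address does not fault.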

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-03 14:13:38 -07:00
bluetooth [Bluetooth] Add support for extended inquiry responses 2005-09-13 01:32:25 +02:00
irda [IRDA]: IrDA prototype fixes 2005-09-05 18:08:11 -07:00
sctp [TCP]: Move the tcp sock states to net/tcp_states.h 2005-08-29 15:41:54 -07:00
tc_act
act_api.h [NET]: Kill skb->tc_classid 2005-08-29 15:31:18 -07:00
act_generic.h [PKT_SCHED]: Fixup simple action define. 2005-05-19 12:42:39 -07:00
addrconf.h [NET]: Fix sparse warnings 2005-08-29 16:01:32 -07:00
af_unix.h [NET]: Fix sparse warnings 2005-08-29 16:01:32 -07:00
ah.h
arp.h [NET]: Kill skb->real_dev 2005-08-29 15:32:25 -07:00
atmclip.h
ax25.h [AX.25]: Add descriptions to constants 2005-09-12 14:24:24 -07:00
checksum.h
compat.h [NET]: Need struct sock forward decl in net/compat.h 2005-09-08 12:32:46 -07:00
datalink.h [NET]: Kill skb->real_dev 2005-08-29 15:32:25 -07:00
dn_dev.h
dn_fib.h
dn_neigh.h
dn_nsp.h
dn_route.h
dn.h [TCP]: Move the tcp sock states to net/tcp_states.h 2005-08-29 15:41:54 -07:00
dsfield.h
dst.h
esp.h
flow.h
gen_stats.h
icmp.h [NET]: Fix sparse warnings 2005-08-29 16:01:32 -07:00
ieee80211_crypt.h [NET] ieee80211 subsystem 2005-05-12 22:48:20 -04:00
ieee80211.h [PATCH] ieee80211: Fix debug comments ipw->ieee80211 2005-08-28 19:23:07 -04:00
if_inet6.h
inet6_hashtables.h [INET]: speedup inet (tcp/dccp) lookups 2005-10-03 14:13:38 -07:00
inet_common.h [INET]: Move the TCP hashtable functions/structs to inet_hashtables.[ch] 2005-08-29 15:38:39 -07:00
inet_connection_sock.h [INET]: compile errors when DEBUG is defined 2005-08-29 22:51:28 -07:00
inet_ecn.h
inet_hashtables.h [INET]: speedup inet (tcp/dccp) lookups 2005-10-03 14:13:38 -07:00
inet_timewait_sock.h [INET]: speedup inet (tcp/dccp) lookups 2005-10-03 14:13:38 -07:00
inetpeer.h
ip6_checksum.h
ip6_fib.h [IPV6]: V6 route events reported with wrong netlink PID and seq number 2005-06-21 13:51:04 -07:00
ip6_route.h [TCP]: Move the tcp sock states to net/tcp_states.h 2005-08-29 15:41:54 -07:00
ip6_tunnel.h
ip_fib.h [NET]: Fix sparse warnings 2005-08-29 16:01:32 -07:00
ip_mp_alg.h
ip_vs.h [IPVS]: ip_vs_ftp breaks connections using persistence 2005-09-14 21:08:51 -07:00
ip.h [IP]: Introduce ip_options_get_from_user 2005-08-29 16:01:39 -07:00
ipcomp.h
ipconfig.h
ipip.h
ipv6.h [IPV6]: Support IPV6_{RECV,}TCLASS socket options / ancillary data. 2005-09-08 10:19:03 +09:00
ipx.h
iw_handler.h [PATCH] WE-19 for kernel 2.6.13 2005-09-06 22:40:24 -04:00
lapb.h
llc_c_ac.h
llc_c_ev.h
llc_c_st.h
llc_conn.h [LLC]: fix llc_ui_recvmsg, making it behave like tcp_recvmsg 2005-09-22 08:29:08 -03:00
llc_if.h
llc_pdu.h
llc_s_ac.h
llc_s_ev.h
llc_s_st.h
llc_sap.h [LLC]: Fix the accept path 2005-09-22 07:57:21 -03:00
llc.h [LLC]: Fix sparse warnings 2005-09-22 05:14:33 -03:00
ndisc.h
neighbour.h [NET]: Store skb->timestamp as offset to a base timestamp 2005-08-29 15:58:24 -07:00
netrom.h [NETROM]: Introduct stuct nr_private 2005-09-12 14:28:03 -07:00
p8022.h [NET]: Fix sparse warnings 2005-08-29 16:01:32 -07:00
pkt_act.h
pkt_cls.h [NET]: Remove explicit initializations of skb->input_dev 2005-08-29 15:33:26 -07:00
pkt_sched.h [PKT_SCHED]: Cleanup qdisc creation and alignment macros 2005-07-05 14:15:09 -07:00
protocol.h
psnap.h [NET]: Kill skb->real_dev 2005-08-29 15:32:25 -07:00
raw.h [NET]: Fix sparse warnings 2005-08-29 16:01:32 -07:00
rawv6.h [IPV4/6]: Check if packet was actually delivered to a raw socket to decide whether to send an ICMP unreachable 2005-08-29 15:37:22 -07:00
request_sock.h [ICSK]: Move generalised functions from tcp to inet_connection_sock 2005-08-29 15:49:50 -07:00
rose.h
route.h [NET]: Fix sparse warnings 2005-08-29 16:01:32 -07:00
sch_generic.h [PKT_SCHED]: Cleanup qdisc creation and alignment macros 2005-07-05 14:15:09 -07:00
scm.h
slhc_vj.h [NET]: Remove __ARGS from include/net/slhc_vj.h 2005-07-05 15:12:04 -07:00
snmp.h [PATCH] smp_processor_id() cleanup 2005-06-21 18:46:13 -07:00
sock.h [INET]: speedup inet (tcp/dccp) lookups 2005-10-03 14:13:38 -07:00
syncppp.h
tcp_ecn.h [NET]: Introduce inet_connection_sock 2005-08-29 15:43:19 -07:00
tcp_states.h [TCP]: Move the tcp sock states to net/tcp_states.h 2005-08-29 15:41:54 -07:00
tcp.h [TCP]: Keep TSO enabled even during loss events. 2005-09-01 22:47:01 -07:00
transp_v6.h [IPV6]: Support IPV6_{RECV,}TCLASS socket options / ancillary data. 2005-09-08 10:19:03 +09:00
udp.h [NET]: Fix sparse warnings 2005-08-29 16:01:32 -07:00
x25.h [NET]: Kill skb->real_dev 2005-08-29 15:32:25 -07:00
x25device.h [NET]: Remove explicit initializations of skb->input_dev 2005-08-29 15:33:26 -07:00
xfrm.h [IPV4]: possible cleanups 2005-08-29 15:33:20 -07:00