linux/net/ipv4
Mike Stroyan 18955cfcb2 [IPV4] tcp/route: Another look at hash table sizes
The tcp_ehash hash table gets too big on systems with really big memory.
It is worse on systems with pages larger than 4KB.  It wastes memory that
could be better used.  It also makes the netstat command slow because reading
/proc/net/tcp and /proc/net/tcp6 needs to go through the full hash table.

  The default value should not be larger for larger page sizes.  It seems
that the effect of page size is an unintended error dating back a long
time.  I also wonder if the default value really should be a larger
fraction of memory for systems with more memory.  While systems with
really big ram can afford more space for hash tables, it is not clear to
me that they benefit from increasing the allocation ratio for this table.

  The amount of memory allocated is determined by net/ipv4/tcp.c:tcp_init and
mm/page_alloc.c:alloc_large_system_hash.

tcp_init calls alloc_large_system_hash passing parameters-
    bucketsize=sizeof(struct tcp_ehash_bucket)
    numentries=thash_entries
    scale=(num_physpages >= 128 * 1024) ? (25-PAGE_SHIFT) : (27-PAGE_SHIFT)
    limit=0

On i386, PAGE_SHIFT is 12 for a page size of 4K
On ia64, PAGE_SHIFT defaults to 14 for a page size of 16K

The num_physpages test above makes the allocation take a larger fraction
of the total memory on systems with larger memory.  The threshold size
for an i386 system is 512MB.  For an ia64 system with 16KB pages the
threshold is 2GB.

For smaller memory systems-
On i386, scale = (27 - 12) = 15
On ia64, scale = (27 - 14) = 13
For larger memory systems-
On i386, scale = (25 - 12) = 13
On ia64, scale = (25 - 14) = 11
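
The scale values and thresholds listed above can be reproduced with a small sketch. The helper names tcp_ehash_scale and threshold_mb are hypothetical and are not kernel functions; only the ternary expression is taken from the parameter list quoted earlier.

```c
/*
 * Hypothetical helper (not a kernel function): evaluates the scale
 * expression quoted above for a given page count and page shift.
 */
static int tcp_ehash_scale(unsigned long num_physpages, int page_shift)
{
	return (num_physpages >= 128UL * 1024) ? (25 - page_shift)
					       : (27 - page_shift);
}

/*
 * Memory size, in megabytes, at which the num_physpages test starts
 * selecting the smaller scale: 128 * 1024 pages of 2^page_shift bytes.
 */
static unsigned long threshold_mb(int page_shift)
{
	return (128UL * 1024) >> (20 - page_shift);
}
```

With page_shift set to 12 the threshold comes out at 512 (MB), and with 14 at 2048 (MB), which matches the figures given for i386 and ia64.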

  For the rest of this discussion, I'll just track the larger memory case.

  The default behavior has numentries=thash_entries=0, so the allocated
size is determined by either scale or by the default limit of 1/16 of
total memory.

In alloc_large_system_hash-
|	numentries = (flags & HASH_HIGHMEM) ? nr_all_pages : nr_kernel_pages;
|	numentries += (1UL << (20 - PAGE_SHIFT)) - 1;
|	numentries >>= 20 - PAGE_SHIFT;
|	numentries <<= 20 - PAGE_SHIFT;

  At this point, numentries is the number of pages in all of memory,
rounded up to the next megabyte boundary.
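
As an illustration of the rounding step, the three statements quoted above can be wrapped in a standalone function. The name round_pages_to_mb and the explicit page_shift parameter are assumptions made for this sketch; the kernel uses the PAGE_SHIFT macro directly.

```c
/*
 * Sketch of the rounding quoted above: round a page count up to a
 * whole number of megabytes.  (1UL << (20 - page_shift)) is the
 * number of pages in one megabyte.
 */
static unsigned long round_pages_to_mb(unsigned long nr_pages, int page_shift)
{
	unsigned long numentries = nr_pages;

	numentries += (1UL << (20 - page_shift)) - 1;	/* add pages-per-MB minus one */
	numentries >>= 20 - page_shift;			/* now a count of megabytes */
	numentries <<= 20 - page_shift;			/* back to a count of pages */
	return numentries;
}
```

With 4KB pages there are 256 pages per megabyte, so a count of 257 pages rounds up to 512, while a count of exactly 256 is left unchanged.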

|	/* limit to 1 bucket per 2^scale bytes of low memory */
|	if (scale > PAGE_SHIFT)
|		numentries >>= (scale - PAGE_SHIFT);
|	else
|		numentries <<= (PAGE_SHIFT - scale);

On i386, numentries >>= (13 - 12), so numentries is 1/8192 of
bytes of total memory.
On ia64, numentries <<= (14 - 11), so numentries is 1/2048 of
bytes of total memory.

|        log2qty = long_log2(numentries);
|
|        do {
|                size = bucketsize << log2qty;

bucketsize is 16, so size is 16 times numentries, rounded
down to a power of two.

On i386, size is 1/512 of bytes of total memory.
On ia64, size is 1/128 of bytes of total memory.

For smaller memory systems the results are
On i386, size is 1/2048 of bytes of total memory.
On ia64, size is 1/512 of bytes of total memory.
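
The fractions above can be checked with a rough model of the sizing steps quoted from alloc_large_system_hash. This is a sketch under stated assumptions, not the kernel code: ehash_table_bytes and floor_log2 are hypothetical names, floor_log2 stands in for long_log2, and the default limit of 1/16 of total memory and the allocation retry loop are left out.

```c
/* Stand-in for long_log2: floor of log base 2, or -1 for n == 0. */
static int floor_log2(unsigned long n)
{
	int log = -1;

	while (n) {
		n >>= 1;
		log++;
	}
	return log;
}

/*
 * Hypothetical model of the sizing path: returns the table size in
 * bytes for nr_pages pages of 2^page_shift bytes each.
 */
static unsigned long ehash_table_bytes(unsigned long nr_pages, int page_shift,
				       int scale, unsigned long bucketsize)
{
	unsigned long numentries = nr_pages;

	/* round up to a megabyte boundary, as in the quoted code */
	numentries += (1UL << (20 - page_shift)) - 1;
	numentries >>= 20 - page_shift;
	numentries <<= 20 - page_shift;

	/* limit to 1 bucket per 2^scale bytes of memory */
	if (scale > page_shift)
		numentries >>= (scale - page_shift);
	else
		numentries <<= (page_shift - scale);

	if (numentries == 0)
		return 0;

	/* round the bucket count down to a power of two */
	return bucketsize << floor_log2(numentries);
}
```

For 4GB of memory this gives 8MB (1/512) with page_shift 12 and scale 13, and 32MB (1/128) with page_shift 14 and scale 11, in line with the larger memory figures above.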

  The large page effect can be removed by just replacing
the use of PAGE_SHIFT with a constant of 12 in the calls to
alloc_large_system_hash.  That makes them more like the other uses of
that function from fs/inode.c and fs/dcache.c
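
A sketch of the effect of that change, assuming the scale simply becomes 25 - 12 = 13 (or 27 - 12 = 15) regardless of page size. The function names below are hypothetical and the code is a model of the scaling step only, not the patch itself.

```c
/*
 * Assumed post-change scale: PAGE_SHIFT replaced by the constant 12,
 * so the result no longer depends on the page size.
 */
static int fixed_scale(unsigned long num_physpages)
{
	return (num_physpages >= 128UL * 1024) ? (25 - 12) : (27 - 12);
}

/*
 * Scaling step from alloc_large_system_hash, modelled with an
 * explicit page_shift: one bucket per 2^scale bytes of memory.
 */
static unsigned long buckets_for_pages(unsigned long nr_pages, int page_shift,
				       int scale)
{
	if (scale > page_shift)
		return nr_pages >> (scale - page_shift);
	return nr_pages << (page_shift - scale);
}
```

With a scale of 13, 4GB of memory yields 2^19 buckets whether it is counted as 2^20 pages of 4KB or 2^18 pages of 16KB, so the table takes the same fraction of memory on both page sizes.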

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-29 16:12:55 -08:00
ipvs [NET]: kfree cleanup 2005-11-08 09:41:34 -08:00
netfilter [NETFILTER]: ip_conntrack_netlink.c needs linux/interrupt.h 2005-11-23 19:03:46 -08:00
Kconfig [INET_DIAG]: Move the tcp_diag interface to the proper place 2005-08-29 15:57:54 -07:00
Makefile [INET_DIAG]: Move the tcp_diag interface to the proper place 2005-08-29 15:57:54 -07:00
af_inet.c [NET]: kfree cleanup 2005-11-08 09:41:34 -08:00
ah4.c [CRYPTO]: crypto_free_tfm() callers no longer need to check for NULL 2005-09-01 17:44:29 -07:00
arp.c [IPV4]: Replace __in_dev_get with __in_dev_get_rcu/rtnl 2005-10-03 14:35:55 -07:00
datagram.c [NET]: Fix sparse warnings 2005-08-29 16:01:32 -07:00
devinet.c [IPV4]: Fix secondary IP addresses after promotion 2005-11-22 14:47:37 -08:00
esp4.c [IPSEC] Fix block size/MTU bugs in ESP 2005-10-10 21:11:34 -07:00
fib_frontend.c [IPV4]: Fix secondary IP addresses after promotion 2005-11-22 14:47:37 -08:00
fib_hash.c [NET]: use __read_mostly on kmem_cache_t , DEFINE_SNMP_STAT pointers 2005-08-29 16:11:18 -07:00
fib_lookup.h [IPV4]: Prepare FIB core for RCU. 2005-08-29 16:08:31 -07:00
fib_rules.c [NETLINK]: Correctly set NLM_F_MULTI without checking the pid 2005-06-18 22:54:12 -07:00
fib_semantics.c [IPV4]: Replace __in_dev_get with __in_dev_get_rcu/rtnl 2005-10-03 14:35:55 -07:00
fib_trie.c [FIB_TRIE]: Don't show local table in /proc/net/route output 2005-11-20 21:09:00 -08:00
icmp.c [NET]: Detect hardware rx checksum faults correctly 2005-11-10 13:01:24 -08:00
igmp.c [NET]: Detect hardware rx checksum faults correctly 2005-11-10 13:01:24 -08:00
inet_connection_sock.c [TCP/DCCP]: Randomize port selection 2005-11-05 21:23:15 -02:00
inet_diag.c [NETLINK]: Make netlink_callback->done() optional 2005-11-10 02:26:40 +01:00
inet_hashtables.c [NET]: Introduce inet_connection_sock 2005-08-29 15:43:19 -07:00
inet_timewait_sock.c [TWSK]: Grab the module refcount for timewait sockets 2005-10-10 21:25:23 -07:00
inetpeer.c [PATCH] timer initialization cleanup: DEFINE_TIMER 2005-09-09 14:03:48 -07:00
ip_forward.c [IPV4]: Remove some dead code from ip_forward() 2005-08-29 16:03:06 -07:00
ip_fragment.c [IPV4,IPV6]: replace handmade list with hlist in IPv{4,6} reassembly 2005-11-16 12:55:37 -08:00
ip_gre.c [NET]: Detect hardware rx checksum faults correctly 2005-11-10 13:01:24 -08:00
ip_input.c [NET]: use __read_mostly on kmem_cache_t , DEFINE_SNMP_STAT pointers 2005-08-29 16:11:18 -07:00
ip_options.c [NET]: kfree cleanup 2005-11-08 09:41:34 -08:00
ip_output.c [IPV4]: Fix ip_queue_xmit identity increment for TSO packets 2005-11-08 09:41:56 -08:00
ip_sockglue.c [NET]: kfree cleanup 2005-11-08 09:41:34 -08:00
ipcomp.c [CRYPTO]: crypto_free_tfm() callers no longer need to check for NULL 2005-09-01 17:44:29 -07:00
ipconfig.c [NET]: fix-up schedule_timeout() usage 2005-09-12 14:15:34 -07:00
ipip.c [NET]: fix oops after tunnel module unload 2005-07-30 17:46:44 -07:00
ipmr.c [IPV4]: Replace __in_dev_get with __in_dev_get_rcu/rtnl 2005-10-03 14:35:55 -07:00
multipath.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
multipath_drr.c [IPV4]: possible cleanups 2005-08-29 15:33:20 -07:00
multipath_random.c [IPV4]: Multipath modules need a license to prevent kernel tainting. 2005-06-13 14:29:06 -07:00
multipath_rr.c [IPV4]: Multipath modules need a license to prevent kernel tainting. 2005-06-13 14:29:06 -07:00
multipath_wrandom.c [NET]: kfree cleanup 2005-11-08 09:41:34 -08:00
netfilter.c [NETFILTER]: Move reroute-after-queue code up to the nf_queue layer. 2005-08-29 15:36:19 -07:00
proc.c [NET]: Wider use of for_each_*cpu() 2005-10-25 23:54:01 -02:00
protocol.c [TCP]: Move the tcp sock states to net/tcp_states.h 2005-08-29 15:41:54 -07:00
raw.c [PATCH] raw_sendmsg DoS on 2.6 2005-09-19 18:45:42 -07:00
route.c [IPV4] tcp/route: Another look at hash table sizes 2005-11-29 16:12:55 -08:00
syncookies.c [NET]: Fix sparse warnings 2005-08-29 16:01:32 -07:00
sysctl_net_ipv4.c [TCP]: Appropriate Byte Count support 2005-11-10 17:09:53 -08:00
tcp.c [IPV4] tcp/route: Another look at hash table sizes 2005-11-29 16:12:55 -08:00
tcp_bic.c [TCP]: add tcp_slow_start helper 2005-11-10 17:07:24 -08:00
tcp_cong.c [TCP]: Appropriate Byte Count support 2005-11-10 17:09:53 -08:00
tcp_diag.c [INET_DIAG]: Move the tcp_diag interface to the proper place 2005-08-29 15:57:54 -07:00
tcp_highspeed.c [TCP]: TCP highspeed build error 2005-11-17 14:11:18 -08:00
tcp_htcp.c [TCP]: add tcp_slow_start helper 2005-11-10 17:07:24 -08:00
tcp_hybla.c [TCP]: fix congestion window update when using TSO deferal 2005-11-10 16:53:30 -08:00
tcp_input.c [TCP]: More spelling fixes. 2005-11-15 15:17:10 -08:00
tcp_ipv4.c [TCP]: spelling fixes 2005-11-10 17:13:47 -08:00
tcp_minisocks.c [TCP]: spelling fixes 2005-11-10 17:13:47 -08:00
tcp_output.c [TCP]: speed up SACK processing 2005-11-10 17:14:59 -08:00
tcp_scalable.c [TCP]: add tcp_slow_start helper 2005-11-10 17:07:24 -08:00
tcp_timer.c [TCP]: spelling fixes 2005-11-10 17:13:47 -08:00
tcp_vegas.c [PATCH] TCP: fix vegas build 2005-11-11 09:21:28 -08:00
tcp_westwood.c [INET_DIAG]: Rename tcp_diag.[ch] to inet_diag.[ch] 2005-08-29 15:57:48 -07:00
udp.c [NET]: Detect hardware rx checksum faults correctly 2005-11-10 13:01:24 -08:00
xfrm4_input.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
xfrm4_output.c [IPSEC]: Add XFRM_STATE_NOPMTUDISC flag 2005-06-20 13:21:43 -07:00
xfrm4_policy.c [IPSEC]: Store idev entries 2005-05-03 16:27:10 -07:00
xfrm4_state.c [IPV4]: possible cleanups 2005-08-29 15:33:20 -07:00
xfrm4_tunnel.c [NET]: Make ipip/ip6_tunnel independant of XFRM 2005-07-19 14:03:34 -07:00