linux/net
Jon Paul Maloy 10724cc7bb tipc: redesign connection-level flow control
There are two flow control mechanisms in TIPC; one at link level that
handles network congestion, burst control, and retransmission, and one
at connection level which' only remaining task is to prevent overflow
in the receiving socket buffer. In TIPC, the latter task has to be
solved end-to-end because messages can not be thrown away once they
have been accepted and delivered upwards from the link layer, i.e, we
can never permit the receive buffer to overflow.

Currently, this algorithm is message based. A counter in the receiving
socket keeps track of number of consumed messages, and sends a dedicated
acknowledge message back to the sender for each 256 consumed message.
A counter at the sending end keeps track of the sent, not yet
acknowledged messages, and blocks the sender if this number ever reaches
512 unacknowledged messages. When the missing acknowledge arrives, the
socket is then woken up for renewed transmission. This works well for
keeping the message flow running, as it almost never happens that a
sender socket is blocked this way.

A problem with the current mechanism is that it potentially is very
memory consuming. Since we don't distinguish between small and large
messages, we have to dimension the socket receive buffer according
to a worst-case of both. I.e., the window size must be chosen large
enough to sustain a reasonable throughput even for the smallest
messages, while we must still consider a scenario where all messages
are of maximum size. Hence, the current fix window size of 512 messages
and a maximum message size of 66k results in a receive buffer of 66 MB
when truesize(66k) = 131k is taken into account. It is possible to do
much better.

This commit introduces an algorithm where we instead use 1024-byte
blocks as base unit. This unit, always rounded upwards from the
actual message size, is used when we advertise windows as well as when
we count and acknowledge transmitted data. The advertised window is
based on the configured receive buffer size in such a way that even
the worst-case truesize/msgsize ratio always is covered. Since the
smallest possible message size (from a flow control viewpoint) now is
1024 bytes, we can safely assume this ratio to be less than four, which
is the value we are now using.

This way, we have been able to reduce the default receive buffer size
from 66 MB to 2 MB with maintained performance.

In order to keep this solution backwards compatible, we introduce a
new capability bit in the discovery protocol, and use this throughout
the message sending/reception path to always select the right unit.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-03 15:51:16 -04:00
..
6lowpan 6lowpan: move mac802154 header 2016-04-13 10:41:10 +02:00
9p net/9p: convert to new CQ API 2016-03-10 20:54:09 -05:00
802
8021q vlan: propagate gso_max_segs 2016-03-17 21:05:01 -04:00
appletalk appletalk: fix erroneous return value 2016-02-18 14:59:34 -05:00
atm
ax25 ax25: add link layer header validation function 2016-03-09 22:13:01 -05:00
batman-adv batman-adv: clarify CFG80211 dependency 2016-03-02 13:45:47 -05:00
bluetooth Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2016-04-26 13:15:56 -04:00
bridge bridge: netlink: export per-vlan stats 2016-05-02 22:27:06 -04:00
caif net: caif: fix misleading indentation 2016-03-14 13:09:50 -04:00
can sock: enable timestamping using control messages 2016-04-04 15:50:30 -04:00
ceph mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros 2016-04-04 10:41:08 -07:00
core net: rtnetlink: add linkxstats callbacks and attribute 2016-05-02 22:27:06 -04:00
dcb
dccp dccp: do not assume DCCP code is non preemptible 2016-05-02 17:02:25 -04:00
decnet decnet: Do not build routes to devices without decnet private data. 2016-04-10 23:01:30 -04:00
dns_resolver
dsa net: dsa: Provide CPU port statistics to master netdev 2016-04-28 17:16:17 -04:00
ethernet eth: Pull header from first fragment via eth_get_headlen 2016-02-24 13:58:05 -05:00
hsr NLA_BINARY misuse bug in HSR 2016-04-21 13:59:08 -04:00
ieee802154 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2016-04-26 13:15:56 -04:00
ipv4 net: relax expensive skb_unclone() in iptunnel_handle_offloads() 2016-05-03 00:22:19 -04:00
ipv6 gre6: Cleanup GREv6 transmit path, call common GRE functions 2016-05-02 19:23:32 -04:00
ipx
irda Merge 4.5-rc4 into tty-next 2016-02-14 14:36:04 -08:00
iucv af_iucv: Validate socket address length in iucv_sock_bind() 2016-01-19 14:21:08 -05:00
kcm kcm: Add receive message timeout 2016-03-09 16:36:15 -05:00
key
l2tp l2tp: use nla_put_u64_64bit() 2016-04-25 15:09:10 -04:00
l3mdev net: l3mdev: address selection should only consider devices in L3 domain 2016-02-26 14:22:26 -05:00
lapb
llc sock: tigthen lockdep checks for sock_owned_by_user 2016-04-13 22:37:20 -04:00
mac80211 cfg80211: remove enum ieee80211_band 2016-04-12 15:56:15 +02:00
mac802154 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2016-03-19 10:05:34 -07:00
mpls GSO: Add GSO type for fixed IPv4 ID 2016-04-14 16:23:40 -04:00
netfilter netfilter/ipvs: use nla_put_u64_64bit() 2016-04-25 15:09:11 -04:00
netlabel netlabel: do not initialise statics to NULL 2016-03-07 11:08:26 -05:00
netlink Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-04-23 18:51:33 -04:00
netrom
nfc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2016-03-19 10:05:34 -07:00
openvswitch ovs: align nlattr properly when needed 2016-04-26 12:00:48 -04:00
packet Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-04-23 18:51:33 -04:00
phonet sock: struct proto hash function may error 2016-02-11 03:54:14 -05:00
rds RDS: TCP: Call pskb_extract() helper function 2016-04-25 16:54:14 -04:00
rfkill rfkill: Use switch to demux userspace operations 2016-04-05 10:48:53 +02:00
rose
rxrpc net: udp: rename UDP_INC_STATS_BH() 2016-04-27 22:48:23 -04:00
sched fq_codel: add batch ability to fq_codel_drop() 2016-05-03 12:47:09 -04:00
sctp sctp: prepare for socket backlog behavior change 2016-05-02 17:02:26 -04:00
sunrpc net: udp: rename UDP_INC_STATS_BH() 2016-04-27 22:48:23 -04:00
switchdev switchdev: Adding complete operation to deferred switchdev ops 2016-04-24 14:23:32 -04:00
tipc tipc: redesign connection-level flow control 2016-05-03 15:51:16 -04:00
unix Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-02-23 00:09:14 -05:00
vmw_vsock VSOCK: constify vsock_transport structure 2016-05-03 13:03:05 -04:00
wimax
wireless wireless: use nla_put_u64_64bit() 2016-04-25 15:09:11 -04:00
x25
xfrm xfrm: align nlattr properly when needed 2016-04-23 20:13:25 -04:00
Kconfig Make DST_CACHE a silent config option 2016-03-21 22:56:38 -04:00
Makefile kcm: Kernel Connection Multiplexor module 2016-03-09 16:36:14 -05:00
compat.c
socket.c tcp: remove SKBTX_ACK_TSTAMP since it is redundant 2016-04-28 16:06:10 -04:00
sysctl_net.c