linux/include
Eric Dumazet d41a69f1d3 tcp: make tcp_sendmsg() aware of socket backlog
Large sendmsg()/write() hold socket lock for the duration of the call,
unless sk->sk_sndbuf limit is hit. This is bad because incoming packets
are parked into socket backlog for a long time.
Critical decisions like fast retransmit might be delayed.
Receivers have to maintain a big out of order queue with additional cpu
overhead, and also possible stalls in TX once windows are full.

Bidirectional flows are particularly hurt since the backlog can become
quite big if the copy from user space triggers IO (page faults).

Some applications learnt to use sendmsg() (or sendmmsg()) with small
chunks to avoid this issue.
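
For illustration, the user-space workaround mentioned above might look
like the sketch below. send_in_chunks() is a hypothetical helper, not
something this commit adds, and a suitable chunk size is left as an
assumption for the caller to choose.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not part of the commit): send len bytes in pieces
 * of at most chunk bytes, so that each send() call holds the socket lock
 * only briefly. Returns the number of bytes sent, or -1 on error.
 */
static ssize_t send_in_chunks(int fd, const void *buf, size_t len, size_t chunk)
{
	const char *p = buf;
	size_t done = 0;

	assert(chunk > 0);
	while (done < len) {
		size_t n = len - done < chunk ? len - done : chunk;
		ssize_t ret = send(fd, p + done, n, 0);

		if (ret < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry this piece */
			return -1;
		}
		done += (size_t)ret;	/* short writes are simply continued */
	}
	return (ssize_t)done;
}
```

The cost is one system call per chunk, which is the overhead the
in-kernel approach avoids.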

Kernel should know better, right?

Add a generic sk_flush_backlog() helper and use it right
before a new skb is allocated. Typically we put 64KB of payload
per skb (unless MSG_EOR is requested) and checking socket backlog
every 64KB gives good results.

As a matter of fact, tests with TSO/GSO disabled give very nice
results, as we manage to keep a small write queue and smaller
perceived rtt.

Note that sk_flush_backlog() maintains socket ownership, so it is
not equivalent to {release_sock(sk); lock_sock(sk);}. This preserves
the implicit atomicity rules that sendmsg() was giving to
(possibly buggy) applications.
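
The effect of checking the backlog before each new skb, while keeping
ownership, can be illustrated with a toy user-space model. This is only
a sketch: struct toy_sock and the toy_*() functions are invented for
the example and are not the kernel's data structures or the code added
by this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_SKB_PAYLOAD (64 * 1024)	/* payload per skb, per the text above */

/* Toy stand-in for struct sock; field names are illustrative only. */
struct toy_sock {
	bool owned;		/* true while the sender holds the socket */
	size_t backlog;		/* packets parked while the socket is owned */
	size_t processed;	/* packets handled so far */
	size_t max_backlog;	/* worst backlog depth observed */
};

/* Simulate one packet arriving while the sender owns the socket. */
static void toy_rx(struct toy_sock *sk)
{
	sk->backlog++;
	if (sk->backlog > sk->max_backlog)
		sk->max_backlog = sk->backlog;
}

/* Drain the backlog without giving up ownership; true if work was done. */
static bool toy_flush_backlog(struct toy_sock *sk)
{
	assert(sk->owned);
	if (sk->backlog == 0)
		return false;
	sk->processed += sk->backlog;
	sk->backlog = 0;
	return true;
}

/* Send len bytes; rx_per_skb packets arrive while each skb is filled. */
static void toy_sendmsg(struct toy_sock *sk, size_t len, size_t rx_per_skb,
			bool flush)
{
	sk->owned = true;
	while (len > 0) {
		size_t n = len < TOY_SKB_PAYLOAD ? len : TOY_SKB_PAYLOAD;

		if (flush)
			toy_flush_backlog(sk);	/* right before a new skb */
		for (size_t i = 0; i < rx_per_skb; i++)
			toy_rx(sk);
		len -= n;
	}
	toy_flush_backlog(sk);	/* stands in for the drain at release_sock() */
	sk->owned = false;
}
```

In this model, sending ten skbs with three packets arriving per skb
parks 30 packets when flush is false, but never more than 3 when flush
is true, and the owned flag stays set for the whole call in both cases.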

In this simple implementation, I chose to not call tcp_release_cb(),
but we might consider this later.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-02 17:02:26 -04:00
acpi Merge branches 'acpi-processor' and 'acpi-cppc' 2016-03-14 14:20:33 +01:00
asm-generic asm-generic/futex: Re-enable preemption in futex_atomic_cmpxchg_inatomic() 2016-04-21 11:06:09 +02:00
clocksource
crypto Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2016-03-17 11:33:45 -07:00
drm drm: Loongson-3 doesn't fully support wc memory 2016-04-22 10:24:11 +10:00
dt-bindings The clk changes for this release cycle are mostly dominated by 2016-03-23 06:06:45 -07:00
keys
kvm arm64: KVM: vgic-v3: Avoid accessing ICH registers 2016-03-09 04:24:04 +00:00
linux qed: add infrastructure for device self tests. 2016-05-02 00:16:39 -04:00
math-emu
media Merge branch 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm 2016-03-19 16:31:54 -07:00
memory
misc cxl: Remove cxl_get_phys_dev() kernel API 2016-03-09 23:40:02 +11:00
net tcp: make tcp_sendmsg() aware of socket backlog 2016-05-02 17:02:26 -04:00
pcmcia
ras
rdma Round two of 4.6 merge window patches 2016-03-22 15:48:44 -07:00
rxrpc rxrpc: Static arrays of strings should be const char *const[] 2016-04-11 15:34:40 -04:00
scsi Merge branch 'fixes-base' into fixes 2016-04-05 06:56:47 -04:00
soc IOMMU Updates for Linux v4.6 2016-03-22 11:57:43 -07:00
sound ALSA: hda - Fix possible race on regmap bypass flip 2016-04-21 17:59:17 +02:00
target target: add a new add_wwn_groups fabrics method 2016-03-30 20:06:44 -07:00
trace perf, bpf: minimize the size of perf_trace_() tracepoint handler 2016-04-21 13:48:20 -04:00
uapi ppp: add rtnetlink device creation support 2016-04-29 16:09:44 -04:00
video gpu: ipu-v3: ipu-dmfc: Rename ipu_dmfc_init_channel to ipu_dmfc_config_wait4eot 2016-03-31 11:24:33 +02:00
xen xen-netback: re-import canonical netif header 2016-03-13 22:08:01 -04:00
Kbuild