Go to file
Martin KaFai Lau 40a1227ea8 tcp: Avoid TCP syncookie rejected by SO_REUSEPORT socket
Although the actual cookie check "__cookie_v[46]_check()" does
not involve sk specific info, it checks whether the sk has recent
synq overflow event in "tcp_synq_no_recent_overflow()".  The
tcp_sk(sk)->rx_opt.ts_recent_stamp is updated every second
when it has sent out a syncookie (through "tcp_synq_overflow()").

The above per sk "recent synq overflow event timestamp" works well
for non SO_REUSEPORT use case.  However, it may cause random
connection request reject/discard when SO_REUSEPORT is used with
syncookie because it fails the "tcp_synq_no_recent_overflow()"
test.

When SO_REUSEPORT is used, it usually has multiple listening
socks serving TCP connection requests destinated to the same local IP:PORT.
There are cases that the TCP-ACK-COOKIE may not be received
by the same sk that sent out the syncookie.  For example,
if reuse->socks[] began with {sk0, sk1},
1) sk1 sent out syncookies and tcp_sk(sk1)->rx_opt.ts_recent_stamp
   was updated.
2) the reuse->socks[] became {sk1, sk2} later.  e.g. sk0 was first closed
   and then sk2 was added.  Here, sk2 does not have ts_recent_stamp set.
   There are other ordering that will trigger the similar situation
   below but the idea is the same.
3) When the TCP-ACK-COOKIE comes back, sk2 was selected.
   "tcp_synq_no_recent_overflow(sk2)" returns true. In this case,
   all syncookies sent by sk1 will be handled (and rejected)
   by sk2 while sk1 is still alive.

The userspace may create and remove listening SO_REUSEPORT sockets
as it sees fit.  E.g. Adding new thread (and SO_REUSEPORT sock) to handle
incoming requests, old process stopping and new process starting...etc.
With or without SO_ATTACH_REUSEPORT_[CB]BPF,
the sockets leaving and joining a reuseport group makes picking
the same sk to check the syncookie very difficult (if not impossible).

The later patches will allow bpf prog more flexibility in deciding
where a sk should be located in a bpf map and selecting a particular
SO_REUSEPORT sock as it sees fit.  e.g. Without closing any sock,
replace the whole bpf reuseport_array in one map_update() by using
map-in-map.  Getting the syncookie check working smoothly across
socks in the same "reuse->socks[]" is important.

A partial solution is to set the newly added sk's ts_recent_stamp
to the max ts_recent_stamp of a reuseport group but that will require
to iterate through reuse->socks[]  OR
pessimistically set it to "now - TCP_SYNCOOKIE_VALID" when a sk is
joining a reuseport group.  However, neither of them will solve the
existing sk getting moved around the reuse->socks[] and that
sk may not have ts_recent_stamp updated, unlikely under continuous
synflood but not impossible.

This patch opts to treat the reuseport group as a whole when
considering the last synq overflow timestamp since
they are serving the same IP:PORT from the userspace
(and BPF program) perspective.

"synq_overflow_ts" is added to "struct sock_reuseport".
The tcp_synq_overflow() and tcp_synq_no_recent_overflow()
will update/check reuse->synq_overflow_ts if the sk is
in a reuseport group.  Similar to the reuseport decision in
__inet_lookup_listener(), both sk->sk_reuseport and
sk->sk_reuseport_cb are tested for SO_REUSEPORT usage.
Update on "synq_overflow_ts" happens at roughly once
every second.

A synflood test was done with a 16 rx-queues and 16 reuseport sockets.
No meaningful performance change is observed.  Before and
after the change is ~9Mpps in IPv4.

Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-08-11 01:58:45 +02:00
Documentation dt-bindings: net: stmmac: Add the bindings documentation for XGMAC2. 2018-08-09 11:16:28 -07:00
LICENSES LICENSES: Add Linux-OpenIB license text 2018-04-27 16:41:53 -06:00
arch Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net 2018-08-09 11:52:36 -07:00
block Partially revert "block: fail op_is_write() requests to read-only partitions" 2018-08-04 18:19:55 -07:00
certs certs/blacklist: fix const confusion 2018-06-26 09:43:03 -07:00
crypto net: simplify sock_poll_wait 2018-07-30 09:10:25 -07:00
drivers veth: Support per queue XDP ring 2018-08-10 16:12:21 +02:00
firmware kbuild: remove all dummy assignments to obj- 2017-11-18 11:46:06 +09:00
fs Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net 2018-08-05 13:04:31 -07:00
include tcp: Avoid TCP syncookie rejected by SO_REUSEPORT socket 2018-08-11 01:58:45 +02:00
init Kbuild fixes for v4.18 2018-06-30 13:05:30 -07:00
ipc Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net 2018-08-05 13:04:31 -07:00
kernel bpf: btf: add pretty print for hash/lru_hash maps 2018-08-10 20:54:07 +02:00
lib Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net 2018-08-02 10:55:32 -07:00
mm ipc/shm.c add ->pagesize function to shm_vm_ops 2018-08-02 16:03:40 -07:00
net tcp: Avoid TCP syncookie rejected by SO_REUSEPORT socket 2018-08-11 01:58:45 +02:00
samples samples/bpf: xdp_redirect_cpu load balance like Suricata 2018-08-10 16:07:49 +02:00
scripts Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-07-18 19:32:54 -07:00
security net: sched: introduce chain object to uapi 2018-07-23 20:44:12 -07:00
sound ALSA: hda/realtek - Yet another Clevo P950 quirk entry 2018-07-18 12:17:46 +02:00
tools tools/bpf: add bpffs pretty print btf test for hash/lru_hash maps 2018-08-10 20:54:07 +02:00
usr kbuild: rename built-in.o to built-in.a 2018-03-26 02:01:19 +09:00
virt Miscellaneous bugfixes, plus a small patchlet related to Spectre v2. 2018-07-18 11:08:44 -07:00
.clang-format clang-format: add configuration file 2018-04-11 10:28:35 -07:00
.cocciconfig scripts: add Linux .cocciconfig for coccinelle 2016-07-22 12:13:39 +02:00
.get_maintainer.ignore
.gitattributes .gitattributes: set git diff driver for C source code files 2016-10-07 18:46:30 -07:00
.gitignore Kbuild updates for v4.17 (2nd) 2018-04-15 17:21:30 -07:00
.mailmap Merge branch 'asoc-4.17' into asoc-4.18 for compress dependencies 2018-04-26 12:24:28 +01:00
COPYING COPYING: use the new text with points to the license files 2018-03-23 12:41:45 -06:00
CREDITS MAINTAINERS/CREDITS: Drop METAG ARCHITECTURE 2018-03-05 16:34:24 +00:00
Kbuild Kbuild updates for v4.15 2017-11-17 17:45:29 -08:00
Kconfig kconfig: add basic helper macros to scripts/Kconfig.include 2018-05-29 03:31:19 +09:00
MAINTAINERS Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net 2018-08-05 13:04:31 -07:00
Makefile Linux 4.18-rc8 2018-08-05 12:37:41 -07:00
README Docs: Added a pointer to the formatted docs to README 2018-03-21 09:02:53 -06:00

README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.
See Documentation/00-INDEX for a list of what is contained in each file.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.