From 32b5a2c9950b9284000059d752f7afa164deb15e Mon Sep 17 00:00:00 2001 From: Maxim Mikityanskiy Date: Tue, 7 May 2019 20:28:15 +0300 Subject: [PATCH 001/145] wireless: Skip directory when generating certificates Commit 715a12334764 ("wireless: don't write C files on failures") drops the `test -f $$f` check. The list of targets contains the CONFIG_CFG80211_EXTRA_REGDB_KEYDIR directory itself, and this check used to filter it out. After the check was removed, the extra keydir option no longer works, failing with the following message: od: 'standard input': read error: Is a directory This commit restores the check to make extra keydir work again. Fixes: 715a12334764 ("wireless: don't write C files on failures") Signed-off-by: Maxim Mikityanskiy Signed-off-by: Johannes Berg --- net/wireless/Makefile | 1 + 1 file changed, 1 insertion(+) diff --git a/net/wireless/Makefile b/net/wireless/Makefile index 72a224ce8e0a..2eee93985ab0 100644 --- a/net/wireless/Makefile +++ b/net/wireless/Makefile @@ -39,6 +39,7 @@ $(obj)/extra-certs.c: $(CONFIG_CFG80211_EXTRA_REGDB_KEYDIR:"%"=%) \ @(set -e; \ allf=""; \ for f in $^ ; do \ + test -f $$f || continue;\ # similar to hexdump -v -e '1/1 "0x%.2x," "\n"' \ thisf=$$(od -An -v -tx1 < $$f | \ sed -e 's/ /\n/g' | \ From 221fb7268d67c0867a93f23586bd53c3c3969eee Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Mon, 20 May 2019 14:22:25 -0700 Subject: [PATCH 002/145] Documentation/networking: fix af_xdp.rst Sphinx warnings MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Fix Sphinx warnings in Documentation/networking/af_xdp.rst by adding indentation: Documentation/networking/af_xdp.rst:319: WARNING: Literal block expected; none found. Documentation/networking/af_xdp.rst:326: WARNING: Literal block expected; none found. Fixes: 0f4a9b7d4ecb ("xsk: add FAQ to facilitate for first time users") Signed-off-by: Randy Dunlap Cc: Magnus Karlsson Cc: Daniel Borkmann Acked-by: Björn Töpel Signed-off-by: Daniel Borkmann --- Documentation/networking/af_xdp.rst | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/Documentation/networking/af_xdp.rst b/Documentation/networking/af_xdp.rst index e14d7d40fc75..50bccbf68308 100644 --- a/Documentation/networking/af_xdp.rst +++ b/Documentation/networking/af_xdp.rst @@ -316,16 +316,16 @@ A: When a netdev of a physical NIC is initialized, Linux usually all the traffic, you can force the netdev to only have 1 queue, queue id 0, and then bind to queue 0. You can use ethtool to do this:: - sudo ethtool -L combined 1 + sudo ethtool -L combined 1 If you want to only see part of the traffic, you can program the NIC through ethtool to filter out your traffic to a single queue id that you can bind your XDP socket to. Here is one example in which UDP traffic to and from port 4242 are sent to queue 2:: - sudo ethtool -N rx-flow-hash udp4 fn - sudo ethtool -N flow-type udp4 src-port 4242 dst-port \ - 4242 action 2 + sudo ethtool -N rx-flow-hash udp4 fn + sudo ethtool -N flow-type udp4 src-port 4242 dst-port \ + 4242 action 2 A number of other ways are possible all up to the capabilitites of the NIC you have. From 9b28ae243ef3b13d8a88b5451d025475c75ebdef Mon Sep 17 00:00:00 2001 From: Lorenz Bauer Date: Tue, 21 May 2019 08:52:38 +0100 Subject: [PATCH 003/145] bpf: fix out-of-bounds read in __bpf_skc_lookup __bpf_skc_lookup takes a socket tuple and the length of the tuple as an argument. Based on the length, it decides which address family to pass to the helper function sk_lookup. In case of AF_INET6, it fails to verify that the length of the tuple is long enough. sk_lookup may therefore access data past the end of the tuple. Fixes: 6acc9b432e67 ("bpf: Add helper to retrieve socket in BPF") Signed-off-by: Lorenz Bauer Signed-off-by: Daniel Borkmann --- net/core/filter.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/net/core/filter.c b/net/core/filter.c index 55bfc941d17a..76f1d99640e2 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -5304,7 +5304,13 @@ __bpf_skc_lookup(struct sk_buff *skb, struct bpf_sock_tuple *tuple, u32 len, struct net *net; int sdif; - family = len == sizeof(tuple->ipv4) ? AF_INET : AF_INET6; + if (len == sizeof(tuple->ipv4)) + family = AF_INET; + else if (len == sizeof(tuple->ipv6)) + family = AF_INET6; + else + return NULL; + if (unlikely(family == AF_UNSPEC || flags || !((s32)netns_id < 0 || netns_id <= S32_MAX))) goto out; From f7355a6c049711ecfbeed573e43d48bee7acb83a Mon Sep 17 00:00:00 2001 From: Martin KaFai Lau Date: Fri, 17 May 2019 14:21:17 -0700 Subject: [PATCH 004/145] bpf: Check sk_fullsock() before returning from bpf_sk_lookup() The BPF_FUNC_sk_lookup_xxx helpers return RET_PTR_TO_SOCKET_OR_NULL. Meaning a fullsock ptr and its fullsock's fields in bpf_sock can be accessed, e.g. type, protocol, mark and priority. Some new helper, like bpf_sk_storage_get(), also expects ARG_PTR_TO_SOCKET is a fullsock. bpf_sk_lookup() currently calls sk_to_full_sk() before returning. However, the ptr returned from sk_to_full_sk() is not guaranteed to be a fullsock. For example, it cannot get a fullsock if sk is in TCP_TIME_WAIT. This patch checks for sk_fullsock() before returning. If it is not a fullsock, sock_gen_put() is called if needed and then returns NULL. Fixes: 6acc9b432e67 ("bpf: Add helper to retrieve socket in BPF") Cc: Joe Stringer Signed-off-by: Martin KaFai Lau Acked-by: Joe Stringer Signed-off-by: Daniel Borkmann --- net/core/filter.c | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-) diff --git a/net/core/filter.c b/net/core/filter.c index 76f1d99640e2..fdcc504d2dec 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -5343,8 +5343,14 @@ __bpf_sk_lookup(struct sk_buff *skb, struct bpf_sock_tuple *tuple, u32 len, struct sock *sk = __bpf_skc_lookup(skb, tuple, len, caller_net, ifindex, proto, netns_id, flags); - if (sk) + if (sk) { sk = sk_to_full_sk(sk); + if (!sk_fullsock(sk)) { + if (!sock_flag(sk, SOCK_RCU_FREE)) + sock_gen_put(sk); + return NULL; + } + } return sk; } @@ -5375,8 +5381,14 @@ bpf_sk_lookup(struct sk_buff *skb, struct bpf_sock_tuple *tuple, u32 len, struct sock *sk = bpf_skc_lookup(skb, tuple, len, proto, netns_id, flags); - if (sk) + if (sk) { sk = sk_to_full_sk(sk); + if (!sk_fullsock(sk)) { + if (!sock_flag(sk, SOCK_RCU_FREE)) + sock_gen_put(sk); + return NULL; + } + } return sk; } From f7c2d64bac1be2ff32f8e4f500c6e5429c1003e0 Mon Sep 17 00:00:00 2001 From: Chang-Hsien Tsai Date: Sun, 19 May 2019 09:05:44 +0000 Subject: [PATCH 005/145] samples, bpf: fix to change the buffer size for read() If the trace for read is larger than 4096, the return value sz will be 4096. This results in off-by-one error on buf: static char buf[4096]; ssize_t sz; sz = read(trace_fd, buf, sizeof(buf)); if (sz > 0) { buf[sz] = 0; puts(buf); } Signed-off-by: Chang-Hsien Tsai Signed-off-by: Daniel Borkmann --- samples/bpf/bpf_load.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c index eae7b635343d..6e87cc831e84 100644 --- a/samples/bpf/bpf_load.c +++ b/samples/bpf/bpf_load.c @@ -678,7 +678,7 @@ void read_trace_pipe(void) static char buf[4096]; ssize_t sz; - sz = read(trace_fd, buf, sizeof(buf)); + sz = read(trace_fd, buf, sizeof(buf) - 1); if (sz > 0) { buf[sz] = 0; puts(buf); From a195cefff49f60054998333e81ee95170ce8bf92 Mon Sep 17 00:00:00 2001 From: Matteo Croce Date: Mon, 20 May 2019 23:49:38 +0200 Subject: [PATCH 006/145] samples, bpf: suppress compiler warning MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit GCC 9 fails to calculate the size of local constant strings and produces a false positive: samples/bpf/task_fd_query_user.c: In function ‘test_debug_fs_uprobe’: samples/bpf/task_fd_query_user.c:242:67: warning: ‘%s’ directive output may be truncated writing up to 255 bytes into a region of size 215 [-Wformat-truncation=] 242 | snprintf(buf, sizeof(buf), "/sys/kernel/debug/tracing/events/%ss/%s/id", | ^~ 243 | event_type, event_alias); | ~~~~~~~~~~~ samples/bpf/task_fd_query_user.c:242:2: note: ‘snprintf’ output between 45 and 300 bytes into a destination of size 256 242 | snprintf(buf, sizeof(buf), "/sys/kernel/debug/tracing/events/%ss/%s/id", | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 243 | event_type, event_alias); | ~~~~~~~~~~~~~~~~~~~~~~~~ Workaround this by lowering the buffer size to a reasonable value. Related GCC Bugzilla: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83431 Signed-off-by: Matteo Croce Signed-off-by: Daniel Borkmann --- samples/bpf/task_fd_query_user.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/samples/bpf/task_fd_query_user.c b/samples/bpf/task_fd_query_user.c index aff2b4ae914e..e39938058223 100644 --- a/samples/bpf/task_fd_query_user.c +++ b/samples/bpf/task_fd_query_user.c @@ -216,7 +216,7 @@ static int test_debug_fs_uprobe(char *binary_path, long offset, bool is_return) { const char *event_type = "uprobe"; struct perf_event_attr attr = {}; - char buf[256], event_alias[256]; + char buf[256], event_alias[sizeof("test_1234567890")]; __u64 probe_offset, probe_addr; __u32 len, prog_id, fd_type; int err, res, kfd, efd; From fe121ee531d1362810bfd30f38a1b88b1d3d376c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Bj=C3=B6rn=20T=C3=B6pel?= Date: Tue, 21 May 2019 15:46:22 +0200 Subject: [PATCH 007/145] bpf, riscv: clear target register high 32-bits for and/or/xor on ALU32 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit When using 32-bit subregisters (ALU32), the RISC-V JIT would not clear the high 32-bits of the target register and therefore generate incorrect code. E.g., in the following code: $ cat test.c unsigned int f(unsigned long long a, unsigned int b) { return (unsigned int)a & b; } $ clang-9 -target bpf -O2 -emit-llvm -S test.c -o - | \ llc-9 -mattr=+alu32 -mcpu=v3 .text .file "test.c" .globl f .p2align 3 .type f,@function f: r0 = r1 w0 &= w2 exit .Lfunc_end0: .size f, .Lfunc_end0-f The JIT would not clear the high 32-bits of r0 after the and-operation, which in this case might give an incorrect return value. After this patch, that is not the case, and the upper 32-bits are cleared. Reported-by: Jiong Wang Fixes: 2353ecc6f91f ("bpf, riscv: add BPF JIT for RV64G") Signed-off-by: Björn Töpel Signed-off-by: Daniel Borkmann --- arch/riscv/net/bpf_jit_comp.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/arch/riscv/net/bpf_jit_comp.c b/arch/riscv/net/bpf_jit_comp.c index 80b12aa5e10d..e5c8d675bd6e 100644 --- a/arch/riscv/net/bpf_jit_comp.c +++ b/arch/riscv/net/bpf_jit_comp.c @@ -759,14 +759,20 @@ static int emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx, case BPF_ALU | BPF_AND | BPF_X: case BPF_ALU64 | BPF_AND | BPF_X: emit(rv_and(rd, rd, rs), ctx); + if (!is64) + emit_zext_32(rd, ctx); break; case BPF_ALU | BPF_OR | BPF_X: case BPF_ALU64 | BPF_OR | BPF_X: emit(rv_or(rd, rd, rs), ctx); + if (!is64) + emit_zext_32(rd, ctx); break; case BPF_ALU | BPF_XOR | BPF_X: case BPF_ALU64 | BPF_XOR | BPF_X: emit(rv_xor(rd, rd, rs), ctx); + if (!is64) + emit_zext_32(rd, ctx); break; case BPF_ALU | BPF_MUL | BPF_X: case BPF_ALU64 | BPF_MUL | BPF_X: From 00d8304553de529802430799d2b2566fd32859d4 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Bj=C3=B6rn=20T=C3=B6pel?= Date: Wed, 22 May 2019 11:23:23 +0200 Subject: [PATCH 008/145] selftests: bpf: add zero extend checks for ALU32 and/or/xor MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add three tests to test_verifier/basic_instr that make sure that the high 32-bits of the destination register is cleared after an ALU32 and/or/xor. Signed-off-by: Björn Töpel Acked-by: Yonghong Song Signed-off-by: Daniel Borkmann --- .../selftests/bpf/verifier/basic_instr.c | 39 +++++++++++++++++++ 1 file changed, 39 insertions(+) diff --git a/tools/testing/selftests/bpf/verifier/basic_instr.c b/tools/testing/selftests/bpf/verifier/basic_instr.c index ed91a7b9a456..4d844089938e 100644 --- a/tools/testing/selftests/bpf/verifier/basic_instr.c +++ b/tools/testing/selftests/bpf/verifier/basic_instr.c @@ -132,3 +132,42 @@ .prog_type = BPF_PROG_TYPE_SCHED_CLS, .result = ACCEPT, }, +{ + "and32 reg zero extend check", + .insns = { + BPF_MOV64_IMM(BPF_REG_0, -1), + BPF_MOV64_IMM(BPF_REG_2, -2), + BPF_ALU32_REG(BPF_AND, BPF_REG_0, BPF_REG_2), + BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32), + BPF_EXIT_INSN(), + }, + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + .result = ACCEPT, + .retval = 0, +}, +{ + "or32 reg zero extend check", + .insns = { + BPF_MOV64_IMM(BPF_REG_0, -1), + BPF_MOV64_IMM(BPF_REG_2, -2), + BPF_ALU32_REG(BPF_OR, BPF_REG_0, BPF_REG_2), + BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32), + BPF_EXIT_INSN(), + }, + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + .result = ACCEPT, + .retval = 0, +}, +{ + "xor32 reg zero extend check", + .insns = { + BPF_MOV64_IMM(BPF_REG_0, -1), + BPF_MOV64_IMM(BPF_REG_2, 0), + BPF_ALU32_REG(BPF_XOR, BPF_REG_0, BPF_REG_2), + BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32), + BPF_EXIT_INSN(), + }, + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + .result = ACCEPT, + .retval = 0, +}, From 186bcc3dcd106dc52d706117f912054b616666e1 Mon Sep 17 00:00:00 2001 From: Jakub Sitnicki Date: Wed, 22 May 2019 12:01:42 +0200 Subject: [PATCH 009/145] bpf: sockmap, restore sk_write_space when psock gets dropped Once psock gets unlinked from its sock (sk_psock_drop), user-space can still trigger a call to sk->sk_write_space by setting TCP_NOTSENT_LOWAT socket option. This causes a null-ptr-deref because we try to read psock->saved_write_space from sk_psock_write_space: ================================================================== BUG: KASAN: null-ptr-deref in sk_psock_write_space+0x69/0x80 Read of size 8 at addr 00000000000001a0 by task sockmap-echo/131 CPU: 0 PID: 131 Comm: sockmap-echo Not tainted 5.2.0-rc1-00094-gf49aa1de9836 #81 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20180724_192412-buildhw-07.phx2.fedoraproject.org-1.fc29 04/01/2014 Call Trace: ? sk_psock_write_space+0x69/0x80 __kasan_report.cold.2+0x5/0x3f ? sk_psock_write_space+0x69/0x80 kasan_report+0xe/0x20 sk_psock_write_space+0x69/0x80 tcp_setsockopt+0x69a/0xfc0 ? tcp_shutdown+0x70/0x70 ? fsnotify+0x5b0/0x5f0 ? remove_wait_queue+0x90/0x90 ? __fget_light+0xa5/0xf0 __sys_setsockopt+0xe6/0x180 ? sockfd_lookup_light+0xb0/0xb0 ? vfs_write+0x195/0x210 ? ksys_write+0xc9/0x150 ? __x64_sys_read+0x50/0x50 ? __bpf_trace_x86_fpu+0x10/0x10 __x64_sys_setsockopt+0x61/0x70 do_syscall_64+0xc5/0x520 ? vmacache_find+0xc0/0x110 ? syscall_return_slowpath+0x110/0x110 ? handle_mm_fault+0xb4/0x110 ? entry_SYSCALL_64_after_hwframe+0x3e/0xbe ? trace_hardirqs_off_caller+0x4b/0x120 ? trace_hardirqs_off_thunk+0x1a/0x3a entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7f2e5e7cdcce Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b1 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 49 89 ca b8 36 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8a 11 0c 00 f7 d8 64 89 01 48 RSP: 002b:00007ffed011b778 EFLAGS: 00000206 ORIG_RAX: 0000000000000036 RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f2e5e7cdcce RDX: 0000000000000019 RSI: 0000000000000006 RDI: 0000000000000007 RBP: 00007ffed011b790 R08: 0000000000000004 R09: 00007f2e5e84ee80 R10: 00007ffed011b788 R11: 0000000000000206 R12: 00007ffed011b78c R13: 00007ffed011b788 R14: 0000000000000007 R15: 0000000000000068 ================================================================== Restore the saved sk_write_space callback when psock is being dropped to fix the crash. Signed-off-by: Jakub Sitnicki Signed-off-by: Daniel Borkmann --- include/linux/skmsg.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h index 178a3933a71b..50ced8aba9db 100644 --- a/include/linux/skmsg.h +++ b/include/linux/skmsg.h @@ -351,6 +351,8 @@ static inline void sk_psock_update_proto(struct sock *sk, static inline void sk_psock_restore_proto(struct sock *sk, struct sk_psock *psock) { + sk->sk_write_space = psock->saved_write_space; + if (psock->sk_proto) { sk->sk_prot = psock->sk_proto; psock->sk_proto = NULL; From 79c92ca42b5a3e0ea172ea2ce8df8e125af237da Mon Sep 17 00:00:00 2001 From: Yu Wang Date: Fri, 10 May 2019 17:04:52 +0800 Subject: [PATCH 010/145] mac80211: handle deauthentication/disassociation from TDLS peer When receiving a deauthentication/disassociation frame from a TDLS peer, a station should not disconnect the current AP, but only disable the current TDLS link if it's enabled. Without this change, a TDLS issue can be reproduced by following the steps as below: 1. STA-1 and STA-2 are connected to AP, bidirection traffic is running between STA-1 and STA-2. 2. Set up TDLS link between STA-1 and STA-2, stay for a while, then teardown TDLS link. 3. Repeat step #2 and monitor the connection between STA and AP. During the test, one STA may send a deauthentication/disassociation frame to another, after TDLS teardown, with reason code 6/7, which means: Class 2/3 frame received from nonassociated STA. On receive this frame, the receiver STA will disconnect the current AP and then reconnect. It's not a expected behavior, purpose of this frame should be disabling the TDLS link, not the link with AP. Cc: stable@vger.kernel.org Signed-off-by: Yu Wang Signed-off-by: Johannes Berg --- net/mac80211/ieee80211_i.h | 3 +++ net/mac80211/mlme.c | 12 +++++++++++- net/mac80211/tdls.c | 23 +++++++++++++++++++++++ 3 files changed, 37 insertions(+), 1 deletion(-) diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h index 073a8235ae1b..a8af4aafa117 100644 --- a/net/mac80211/ieee80211_i.h +++ b/net/mac80211/ieee80211_i.h @@ -2225,6 +2225,9 @@ void ieee80211_tdls_cancel_channel_switch(struct wiphy *wiphy, const u8 *addr); void ieee80211_teardown_tdls_peers(struct ieee80211_sub_if_data *sdata); void ieee80211_tdls_chsw_work(struct work_struct *wk); +void ieee80211_tdls_handle_disconnect(struct ieee80211_sub_if_data *sdata, + const u8 *peer, u16 reason); +const char *ieee80211_get_reason_code_string(u16 reason_code); extern const struct ethtool_ops ieee80211_ethtool_ops; diff --git a/net/mac80211/mlme.c b/net/mac80211/mlme.c index b7a9fe3d5fcb..383b0df100e4 100644 --- a/net/mac80211/mlme.c +++ b/net/mac80211/mlme.c @@ -2963,7 +2963,7 @@ static void ieee80211_rx_mgmt_auth(struct ieee80211_sub_if_data *sdata, #define case_WLAN(type) \ case WLAN_REASON_##type: return #type -static const char *ieee80211_get_reason_code_string(u16 reason_code) +const char *ieee80211_get_reason_code_string(u16 reason_code) { switch (reason_code) { case_WLAN(UNSPECIFIED); @@ -3028,6 +3028,11 @@ static void ieee80211_rx_mgmt_deauth(struct ieee80211_sub_if_data *sdata, if (len < 24 + 2) return; + if (!ether_addr_equal(mgmt->bssid, mgmt->sa)) { + ieee80211_tdls_handle_disconnect(sdata, mgmt->sa, reason_code); + return; + } + if (ifmgd->associated && ether_addr_equal(mgmt->bssid, ifmgd->associated->bssid)) { const u8 *bssid = ifmgd->associated->bssid; @@ -3077,6 +3082,11 @@ static void ieee80211_rx_mgmt_disassoc(struct ieee80211_sub_if_data *sdata, reason_code = le16_to_cpu(mgmt->u.disassoc.reason_code); + if (!ether_addr_equal(mgmt->bssid, mgmt->sa)) { + ieee80211_tdls_handle_disconnect(sdata, mgmt->sa, reason_code); + return; + } + sdata_info(sdata, "disassociated from %pM (Reason: %u=%s)\n", mgmt->sa, reason_code, ieee80211_get_reason_code_string(reason_code)); diff --git a/net/mac80211/tdls.c b/net/mac80211/tdls.c index 24c37f91ca46..ba8fe48952d9 100644 --- a/net/mac80211/tdls.c +++ b/net/mac80211/tdls.c @@ -1994,3 +1994,26 @@ void ieee80211_tdls_chsw_work(struct work_struct *wk) } rtnl_unlock(); } + +void ieee80211_tdls_handle_disconnect(struct ieee80211_sub_if_data *sdata, + const u8 *peer, u16 reason) +{ + struct ieee80211_sta *sta; + + rcu_read_lock(); + sta = ieee80211_find_sta(&sdata->vif, peer); + if (!sta || !sta->tdls) { + rcu_read_unlock(); + return; + } + rcu_read_unlock(); + + tdls_dbg(sdata, "disconnected from TDLS peer %pM (Reason: %u=%s)\n", + peer, reason, + ieee80211_get_reason_code_string(reason)); + + ieee80211_tdls_oper_request(&sdata->vif, peer, + NL80211_TDLS_TEARDOWN, + WLAN_REASON_TDLS_TEARDOWN_UNREACHABLE, + GFP_ATOMIC); +} From 818e9dfa2c145c7b0d241c5c419f4b897a1af564 Mon Sep 17 00:00:00 2001 From: YueHaibing Date: Mon, 29 Apr 2019 14:07:54 +0000 Subject: [PATCH 011/145] mac80211: remove set but not used variable 'old' Fixes gcc '-Wunused-but-set-variable' warning: net/mac80211/key.c: In function 'ieee80211_set_tx_key': net/mac80211/key.c:271:24: warning: variable 'old' set but not used [-Wunused-but-set-variable] It is not used since introduction in commit 96fc6efb9ad9 ("mac80211: IEEE 802.11 Extended Key ID support") Signed-off-by: YueHaibing Signed-off-by: Johannes Berg --- net/mac80211/key.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/net/mac80211/key.c b/net/mac80211/key.c index 20bf9db7a388..89f09a09efdb 100644 --- a/net/mac80211/key.c +++ b/net/mac80211/key.c @@ -268,11 +268,9 @@ int ieee80211_set_tx_key(struct ieee80211_key *key) { struct sta_info *sta = key->sta; struct ieee80211_local *local = key->local; - struct ieee80211_key *old; assert_key_lock(local); - old = key_mtx_dereference(local, sta->ptk[sta->ptk_idx]); sta->ptk_idx = key->conf.keyidx; ieee80211_check_fast_xmit(sta); From 25d16d124a5e249e947c0487678b61dcff25cf8b Mon Sep 17 00:00:00 2001 From: John Crispin Date: Thu, 23 May 2019 10:27:24 +0200 Subject: [PATCH 012/145] mac80211: fix rate reporting inside cfg80211_calculate_bitrate_he() The reported rate is not scaled down correctly. After applying this patch, the function will behave just like the v/ht equivalents. Signed-off-by: Shashidhar Lakkavalli Signed-off-by: John Crispin Signed-off-by: Johannes Berg --- net/wireless/util.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/wireless/util.c b/net/wireless/util.c index cf63b635afc0..b9d8ceb21327 100644 --- a/net/wireless/util.c +++ b/net/wireless/util.c @@ -1246,7 +1246,7 @@ static u32 cfg80211_calculate_bitrate_he(struct rate_info *rate) if (rate->he_dcm) result /= 2; - return result; + return result / 10000; } u32 cfg80211_calculate_bitrate(struct rate_info *rate) From 85a55ff2cf6b73aab01732cc827f645563b7f3d1 Mon Sep 17 00:00:00 2001 From: "Gustavo A. R. Silva" Date: Mon, 29 Apr 2019 13:19:50 -0500 Subject: [PATCH 013/145] mac80211_hwsim: mark expected switch fall-through MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. This patch fixes the following warning: drivers/net/wireless/mac80211_hwsim.c: In function ‘init_mac80211_hwsim’: drivers/net/wireless/mac80211_hwsim.c:3853:21: warning: this statement may fall through [-Wimplicit-fallthrough=] param.reg_strict = true; ~~~~~~~~~~~~~~~~~^~~~~~ drivers/net/wireless/mac80211_hwsim.c:3854:3: note: here case HWSIM_REGTEST_DRIVER_REG_ALL: ^~~~ Warning level 3 was used: -Wimplicit-fallthrough=3 This patch is part of the ongoing efforts to enable -Wimplicit-fallthrough. Signed-off-by: Gustavo A. R. Silva Signed-off-by: Johannes Berg --- drivers/net/wireless/mac80211_hwsim.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/wireless/mac80211_hwsim.c b/drivers/net/wireless/mac80211_hwsim.c index 60ca13e0f15b..b5274d1f30fa 100644 --- a/drivers/net/wireless/mac80211_hwsim.c +++ b/drivers/net/wireless/mac80211_hwsim.c @@ -3851,6 +3851,7 @@ static int __init init_mac80211_hwsim(void) break; case HWSIM_REGTEST_STRICT_ALL: param.reg_strict = true; + /* fall through */ case HWSIM_REGTEST_DRIVER_REG_ALL: param.reg_alpha2 = hwsim_alpha2s[0]; break; From 33d915d9e8ce811d8958915ccd18d71a66c7c495 Mon Sep 17 00:00:00 2001 From: Manikanta Pubbisetty Date: Wed, 8 May 2019 14:55:33 +0530 Subject: [PATCH 014/145] {nl,mac}80211: allow 4addr AP operation on crypto controlled devices As per the current design, in the case of sw crypto controlled devices, it is the device which advertises the support for AP/VLAN iftype based on it's ability to tranmsit packets encrypted in software (In VLAN functionality, group traffic generated for a specific VLAN group is always encrypted in software). Commit db3bdcb9c3ff ("mac80211: allow AP_VLAN operation on crypto controlled devices") has introduced this change. Since 4addr AP operation also uses AP/VLAN iftype, this conditional way of advertising AP/VLAN support has broken 4addr AP mode operation on crypto controlled devices which do not support VLAN functionality. In the case of ath10k driver, not all firmwares have support for VLAN functionality but all can support 4addr AP operation. Because AP/VLAN support is not advertised for these devices, 4addr AP operations are also blocked. Fix this by allowing 4addr operation on devices which do not support AP/VLAN iftype but can support 4addr AP operation (decision is based on the wiphy flag WIPHY_FLAG_4ADDR_AP). Cc: stable@vger.kernel.org Fixes: db3bdcb9c3ff ("mac80211: allow AP_VLAN operation on crypto controlled devices") Signed-off-by: Manikanta Pubbisetty Signed-off-by: Johannes Berg --- include/net/cfg80211.h | 3 ++- net/mac80211/util.c | 4 +++- net/wireless/core.c | 6 +++++- net/wireless/nl80211.c | 8 ++++++-- 4 files changed, 16 insertions(+), 5 deletions(-) diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h index 87dae868707e..948139690a58 100644 --- a/include/net/cfg80211.h +++ b/include/net/cfg80211.h @@ -3839,7 +3839,8 @@ struct cfg80211_ops { * on wiphy_new(), but can be changed by the driver if it has a good * reason to override the default * @WIPHY_FLAG_4ADDR_AP: supports 4addr mode even on AP (with a single station - * on a VLAN interface) + * on a VLAN interface). This flag also serves an extra purpose of + * supporting 4ADDR AP mode on devices which do not support AP/VLAN iftype. * @WIPHY_FLAG_4ADDR_STATION: supports 4addr mode even as a station * @WIPHY_FLAG_CONTROL_PORT_PROTOCOL: This device supports setting the * control port protocol ethertype. The device also honours the diff --git a/net/mac80211/util.c b/net/mac80211/util.c index cba4633cd6cf..1c8384f81526 100644 --- a/net/mac80211/util.c +++ b/net/mac80211/util.c @@ -3795,7 +3795,9 @@ int ieee80211_check_combinations(struct ieee80211_sub_if_data *sdata, } /* Always allow software iftypes */ - if (local->hw.wiphy->software_iftypes & BIT(iftype)) { + if (local->hw.wiphy->software_iftypes & BIT(iftype) || + (iftype == NL80211_IFTYPE_AP_VLAN && + local->hw.wiphy->flags & WIPHY_FLAG_4ADDR_AP)) { if (radar_detect) return -EINVAL; return 0; diff --git a/net/wireless/core.c b/net/wireless/core.c index b36ad8efb5e5..4e83892f1ac2 100644 --- a/net/wireless/core.c +++ b/net/wireless/core.c @@ -1396,8 +1396,12 @@ static int cfg80211_netdev_notifier_call(struct notifier_block *nb, } break; case NETDEV_PRE_UP: - if (!(wdev->wiphy->interface_modes & BIT(wdev->iftype))) + if (!(wdev->wiphy->interface_modes & BIT(wdev->iftype)) && + !(wdev->iftype == NL80211_IFTYPE_AP_VLAN && + rdev->wiphy.flags & WIPHY_FLAG_4ADDR_AP && + wdev->use_4addr)) return notifier_from_errno(-EOPNOTSUPP); + if (rfkill_blocked(rdev->rfkill)) return notifier_from_errno(-ERFKILL); break; diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c index fffe4b371e23..4b3c5281ca14 100644 --- a/net/wireless/nl80211.c +++ b/net/wireless/nl80211.c @@ -3419,8 +3419,7 @@ static int nl80211_new_interface(struct sk_buff *skb, struct genl_info *info) if (info->attrs[NL80211_ATTR_IFTYPE]) type = nla_get_u32(info->attrs[NL80211_ATTR_IFTYPE]); - if (!rdev->ops->add_virtual_intf || - !(rdev->wiphy.interface_modes & (1 << type))) + if (!rdev->ops->add_virtual_intf) return -EOPNOTSUPP; if ((type == NL80211_IFTYPE_P2P_DEVICE || type == NL80211_IFTYPE_NAN || @@ -3439,6 +3438,11 @@ static int nl80211_new_interface(struct sk_buff *skb, struct genl_info *info) return err; } + if (!(rdev->wiphy.interface_modes & (1 << type)) && + !(type == NL80211_IFTYPE_AP_VLAN && params.use_4addr && + rdev->wiphy.flags & WIPHY_FLAG_4ADDR_AP)) + return -EOPNOTSUPP; + err = nl80211_parse_mon_options(rdev, type, info, ¶ms); if (err < 0) return err; From bd95e678e0f6e18351ecdc147ca819145db9ed7b Mon Sep 17 00:00:00 2001 From: John Fastabend Date: Fri, 24 May 2019 08:01:00 -0700 Subject: [PATCH 015/145] bpf: sockmap, fix use after free from sleep in psock backlog workqueue Backlog work for psock (sk_psock_backlog) might sleep while waiting for memory to free up when sending packets. However, while sleeping the socket may be closed and removed from the map by the user space side. This breaks an assumption in sk_stream_wait_memory, which expects the wait queue to be still there when it wakes up resulting in a use-after-free shown below. To fix his mark sendmsg as MSG_DONTWAIT to avoid the sleep altogether. We already set the flag for the sendpage case but we missed the case were sendmsg is used. Sockmap is currently the only user of skb_send_sock_locked() so only the sockmap paths should be impacted. ================================================================== BUG: KASAN: use-after-free in remove_wait_queue+0x31/0x70 Write of size 8 at addr ffff888069a0c4e8 by task kworker/0:2/110 CPU: 0 PID: 110 Comm: kworker/0:2 Not tainted 5.0.0-rc2-00335-g28f9d1a3d4fe-dirty #14 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014 Workqueue: events sk_psock_backlog Call Trace: print_address_description+0x6e/0x2b0 ? remove_wait_queue+0x31/0x70 kasan_report+0xfd/0x177 ? remove_wait_queue+0x31/0x70 ? remove_wait_queue+0x31/0x70 remove_wait_queue+0x31/0x70 sk_stream_wait_memory+0x4dd/0x5f0 ? sk_stream_wait_close+0x1b0/0x1b0 ? wait_woken+0xc0/0xc0 ? tcp_current_mss+0xc5/0x110 tcp_sendmsg_locked+0x634/0x15d0 ? tcp_set_state+0x2e0/0x2e0 ? __kasan_slab_free+0x1d1/0x230 ? kmem_cache_free+0x70/0x140 ? sk_psock_backlog+0x40c/0x4b0 ? process_one_work+0x40b/0x660 ? worker_thread+0x82/0x680 ? kthread+0x1b9/0x1e0 ? ret_from_fork+0x1f/0x30 ? check_preempt_curr+0xaf/0x130 ? iov_iter_kvec+0x5f/0x70 ? kernel_sendmsg_locked+0xa0/0xe0 skb_send_sock_locked+0x273/0x3c0 ? skb_splice_bits+0x180/0x180 ? start_thread+0xe0/0xe0 ? update_min_vruntime.constprop.27+0x88/0xc0 sk_psock_backlog+0xb3/0x4b0 ? strscpy+0xbf/0x1e0 process_one_work+0x40b/0x660 worker_thread+0x82/0x680 ? process_one_work+0x660/0x660 kthread+0x1b9/0x1e0 ? __kthread_create_on_node+0x250/0x250 ret_from_fork+0x1f/0x30 Fixes: 20bf50de3028c ("skbuff: Function to send an skbuf on a socket") Reported-by: Jakub Sitnicki Tested-by: Jakub Sitnicki Signed-off-by: John Fastabend Signed-off-by: Daniel Borkmann --- net/core/skbuff.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/core/skbuff.c b/net/core/skbuff.c index e89be6282693..4a7c656b195b 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -2337,6 +2337,7 @@ do_frag_list: kv.iov_base = skb->data + offset; kv.iov_len = slen; memset(&msg, 0, sizeof(msg)); + msg.msg_flags = MSG_DONTWAIT; ret = kernel_sendmsg_locked(sk, &msg, &kv, 1, slen); if (ret <= 0) From a71fd9dac23613d96ba3c05619a8ef4fd6cdf9b9 Mon Sep 17 00:00:00 2001 From: Jouni Malinen Date: Tue, 28 May 2019 01:46:43 +0300 Subject: [PATCH 016/145] mac80211: Do not use stack memory with scatterlist for GMAC ieee80211_aes_gmac() uses the mic argument directly in sg_set_buf() and that does not allow use of stack memory (e.g., BUG_ON() is hit in sg_set_buf() with CONFIG_DEBUG_SG). BIP GMAC TX side is fine for this since it can use the skb data buffer, but the RX side was using a stack variable for deriving the local MIC value to compare against the received one. Fix this by allocating heap memory for the mic buffer. This was found with hwsim test case ap_cipher_bip_gmac_128 hitting that BUG_ON() and kernel panic. Cc: stable@vger.kernel.org Signed-off-by: Jouni Malinen Signed-off-by: Johannes Berg --- net/mac80211/wpa.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/net/mac80211/wpa.c b/net/mac80211/wpa.c index 58d0b258b684..5dd48f0a4b1b 100644 --- a/net/mac80211/wpa.c +++ b/net/mac80211/wpa.c @@ -1175,7 +1175,7 @@ ieee80211_crypto_aes_gmac_decrypt(struct ieee80211_rx_data *rx) struct ieee80211_rx_status *status = IEEE80211_SKB_RXCB(skb); struct ieee80211_key *key = rx->key; struct ieee80211_mmie_16 *mmie; - u8 aad[GMAC_AAD_LEN], mic[GMAC_MIC_LEN], ipn[6], nonce[GMAC_NONCE_LEN]; + u8 aad[GMAC_AAD_LEN], *mic, ipn[6], nonce[GMAC_NONCE_LEN]; struct ieee80211_hdr *hdr = (struct ieee80211_hdr *)skb->data; if (!ieee80211_is_mgmt(hdr->frame_control)) @@ -1206,13 +1206,18 @@ ieee80211_crypto_aes_gmac_decrypt(struct ieee80211_rx_data *rx) memcpy(nonce, hdr->addr2, ETH_ALEN); memcpy(nonce + ETH_ALEN, ipn, 6); + mic = kmalloc(GMAC_MIC_LEN, GFP_ATOMIC); + if (!mic) + return RX_DROP_UNUSABLE; if (ieee80211_aes_gmac(key->u.aes_gmac.tfm, aad, nonce, skb->data + 24, skb->len - 24, mic) < 0 || crypto_memneq(mic, mmie->mic, sizeof(mmie->mic))) { key->u.aes_gmac.icverrors++; + kfree(mic); return RX_DROP_UNUSABLE; } + kfree(mic); } memcpy(key->u.aes_gmac.rx_pn, ipn, 6); From f77bf4863dc2218362f4227d56af4a5f3f08830c Mon Sep 17 00:00:00 2001 From: Andy Strohman Date: Fri, 24 May 2019 23:27:29 -0700 Subject: [PATCH 017/145] nl80211: fix station_info pertid memory leak When dumping stations, memory allocated for station_info's pertid member will leak if the nl80211 header cannot be added to the sk_buff due to insufficient tail room. I noticed this leak in the kmalloc-2048 cache. Cc: stable@vger.kernel.org Fixes: 8689c051a201 ("cfg80211: dynamically allocate per-tid stats for station info") Signed-off-by: Andy Strohman Signed-off-by: Johannes Berg --- net/wireless/nl80211.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c index 4b3c5281ca14..140d24e5718f 100644 --- a/net/wireless/nl80211.c +++ b/net/wireless/nl80211.c @@ -4859,8 +4859,10 @@ static int nl80211_send_station(struct sk_buff *msg, u32 cmd, u32 portid, struct nlattr *sinfoattr, *bss_param; hdr = nl80211hdr_put(msg, portid, seq, flags, cmd); - if (!hdr) + if (!hdr) { + cfg80211_sinfo_release_content(sinfo); return -1; + } if (nla_put_u32(msg, NL80211_ATTR_IFINDEX, dev->ifindex) || nla_put(msg, NL80211_ATTR_MAC, ETH_ALEN, mac_addr) || From 551842446ed695641a00782cd118cbb064a416a1 Mon Sep 17 00:00:00 2001 From: Thomas Pedersen Date: Fri, 24 May 2019 21:16:24 -0700 Subject: [PATCH 018/145] mac80211: mesh: fix RCU warning ifmsh->csa is an RCU-protected pointer. The writer context in ieee80211_mesh_finish_csa() is already mutually exclusive with wdev->sdata.mtx, but the RCU checker did not know this. Use rcu_dereference_protected() to avoid a warning. fixes the following warning: [ 12.519089] ============================= [ 12.520042] WARNING: suspicious RCU usage [ 12.520652] 5.1.0-rc7-wt+ #16 Tainted: G W [ 12.521409] ----------------------------- [ 12.521972] net/mac80211/mesh.c:1223 suspicious rcu_dereference_check() usage! [ 12.522928] other info that might help us debug this: [ 12.523984] rcu_scheduler_active = 2, debug_locks = 1 [ 12.524855] 5 locks held by kworker/u8:2/152: [ 12.525438] #0: 00000000057be08c ((wq_completion)phy0){+.+.}, at: process_one_work+0x1a2/0x620 [ 12.526607] #1: 0000000059c6b07a ((work_completion)(&sdata->csa_finalize_work)){+.+.}, at: process_one_work+0x1a2/0x620 [ 12.528001] #2: 00000000f184ba7d (&wdev->mtx){+.+.}, at: ieee80211_csa_finalize_work+0x2f/0x90 [ 12.529116] #3: 00000000831a1f54 (&local->mtx){+.+.}, at: ieee80211_csa_finalize_work+0x47/0x90 [ 12.530233] #4: 00000000fd06f988 (&local->chanctx_mtx){+.+.}, at: ieee80211_csa_finalize_work+0x51/0x90 Signed-off-by: Thomas Pedersen Signed-off-by: Johannes Berg --- net/mac80211/mesh.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/net/mac80211/mesh.c b/net/mac80211/mesh.c index 766e5e5bab8a..d5aba5029cb0 100644 --- a/net/mac80211/mesh.c +++ b/net/mac80211/mesh.c @@ -1220,7 +1220,8 @@ int ieee80211_mesh_finish_csa(struct ieee80211_sub_if_data *sdata) ifmsh->chsw_ttl = 0; /* Remove the CSA and MCSP elements from the beacon */ - tmp_csa_settings = rcu_dereference(ifmsh->csa); + tmp_csa_settings = rcu_dereference_protected(ifmsh->csa, + lockdep_is_held(&sdata->wdev.mtx)); RCU_INIT_POINTER(ifmsh->csa, NULL); if (tmp_csa_settings) kfree_rcu(tmp_csa_settings, rcu_head); @@ -1242,6 +1243,8 @@ int ieee80211_mesh_csa_beacon(struct ieee80211_sub_if_data *sdata, struct mesh_csa_settings *tmp_csa_settings; int ret = 0; + lockdep_assert_held(&sdata->wdev.mtx); + tmp_csa_settings = kmalloc(sizeof(*tmp_csa_settings), GFP_ATOMIC); if (!tmp_csa_settings) From 8a03447dd311da2ad2df74dcf730a1a15f673379 Mon Sep 17 00:00:00 2001 From: Stanislaw Gruszka Date: Mon, 6 May 2019 09:39:17 +0200 Subject: [PATCH 019/145] rtw88: fix subscript above array bounds compiler warning MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit My compiler complains about: drivers/net/wireless/realtek/rtw88/phy.c: In function ‘rtw_phy_rf_power_2_rssi’: drivers/net/wireless/realtek/rtw88/phy.c:430:26: warning: array subscript is above array bounds [-Warray-bounds] linear = db_invert_table[i][j]; According to comment power_db should be in range 1 ~ 96 . To fix add check for boundaries before access the array. Signed-off-by: Stanislaw Gruszka Acked-by: Yan-Hsuan Chuang Signed-off-by: Kalle Valo --- drivers/net/wireless/realtek/rtw88/phy.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/net/wireless/realtek/rtw88/phy.c b/drivers/net/wireless/realtek/rtw88/phy.c index 4381b360b5b5..8284a7797398 100644 --- a/drivers/net/wireless/realtek/rtw88/phy.c +++ b/drivers/net/wireless/realtek/rtw88/phy.c @@ -423,6 +423,11 @@ static u64 rtw_phy_db_2_linear(u8 power_db) u8 i, j; u64 linear; + if (power_db > 96) + power_db = 96; + else if (power_db < 1) + return 1; + /* 1dB ~ 96dB */ i = (power_db - 1) >> 3; j = (power_db - 1) - (i << 3); From a24bad74737f4c8814e0669d38dba5f2ddb86514 Mon Sep 17 00:00:00 2001 From: Yan-Hsuan Chuang Date: Tue, 7 May 2019 10:28:18 +0800 Subject: [PATCH 020/145] rtw88: fix unassigned rssi_level in rtw_sta_info The new rssi_level should be stored in si, otherwise the rssi_level will never be updated and get a wrong RA mask, which is calculated by the rssi level If a wrong RA mask is chosen, the firmware will pick some *bad rates*. The most hurtful scene will be in *noisy environment*, such as office or public area with many APs and users. The latency would be high and the overall throughput would be only half or less. Tested in 2.4G in office area, with this patch the throughput increased from such as "1x Mbps -> 4x Mbps". Signed-off-by: Yan-Hsuan Chuang Signed-off-by: Kalle Valo --- drivers/net/wireless/realtek/rtw88/phy.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/wireless/realtek/rtw88/phy.c b/drivers/net/wireless/realtek/rtw88/phy.c index 8284a7797398..77b8c02b5ac6 100644 --- a/drivers/net/wireless/realtek/rtw88/phy.c +++ b/drivers/net/wireless/realtek/rtw88/phy.c @@ -144,10 +144,10 @@ static void rtw_phy_stat_rssi_iter(void *data, struct ieee80211_sta *sta) struct rtw_phy_stat_iter_data *iter_data = data; struct rtw_dev *rtwdev = iter_data->rtwdev; struct rtw_sta_info *si = (struct rtw_sta_info *)sta->drv_priv; - u8 rssi, rssi_level; + u8 rssi; rssi = ewma_rssi_read(&si->avg_rssi); - rssi_level = rtw_phy_get_rssi_level(si->rssi_level, rssi); + si->rssi_level = rtw_phy_get_rssi_level(si->rssi_level, rssi); rtw_fw_send_rssi_info(rtwdev, si); From f57b5d85ed5865f0cd0a6dc4726c995b9e57e28a Mon Sep 17 00:00:00 2001 From: Nathan Chancellor Date: Thu, 23 May 2019 08:30:08 -0700 Subject: [PATCH 021/145] rsi: Properly initialize data in rsi_sdio_ta_reset When building with -Wuninitialized, Clang warns: drivers/net/wireless/rsi/rsi_91x_sdio.c:940:43: warning: variable 'data' is uninitialized when used here [-Wuninitialized] put_unaligned_le32(TA_HOLD_THREAD_VALUE, data); ^~~~ drivers/net/wireless/rsi/rsi_91x_sdio.c:930:10: note: initialize the variable 'data' to silence this warning u8 *data; ^ = NULL 1 warning generated. Using Clang's suggestion of initializing data to NULL wouldn't work out because data will be dereferenced by put_unaligned_le32. Use kzalloc to properly initialize data, which matches a couple of other places in this driver. Fixes: e5a1ecc97e5f ("rsi: add firmware loading for 9116 device") Link: https://github.com/ClangBuiltLinux/linux/issues/464 Signed-off-by: Nathan Chancellor Signed-off-by: Kalle Valo --- drivers/net/wireless/rsi/rsi_91x_sdio.c | 21 ++++++++++++++------- 1 file changed, 14 insertions(+), 7 deletions(-) diff --git a/drivers/net/wireless/rsi/rsi_91x_sdio.c b/drivers/net/wireless/rsi/rsi_91x_sdio.c index f9c67ed473d1..b42cd50b837e 100644 --- a/drivers/net/wireless/rsi/rsi_91x_sdio.c +++ b/drivers/net/wireless/rsi/rsi_91x_sdio.c @@ -929,11 +929,15 @@ static int rsi_sdio_ta_reset(struct rsi_hw *adapter) u32 addr; u8 *data; + data = kzalloc(RSI_9116_REG_SIZE, GFP_KERNEL); + if (!data) + return -ENOMEM; + status = rsi_sdio_master_access_msword(adapter, TA_BASE_ADDR); if (status < 0) { rsi_dbg(ERR_ZONE, "Unable to set ms word to common reg\n"); - return status; + goto err; } rsi_dbg(INIT_ZONE, "%s: Bring TA out of reset\n", __func__); @@ -944,7 +948,7 @@ static int rsi_sdio_ta_reset(struct rsi_hw *adapter) RSI_9116_REG_SIZE); if (status < 0) { rsi_dbg(ERR_ZONE, "Unable to hold TA threads\n"); - return status; + goto err; } put_unaligned_le32(TA_SOFT_RST_CLR, data); @@ -954,7 +958,7 @@ static int rsi_sdio_ta_reset(struct rsi_hw *adapter) RSI_9116_REG_SIZE); if (status < 0) { rsi_dbg(ERR_ZONE, "Unable to get TA out of reset\n"); - return status; + goto err; } put_unaligned_le32(TA_PC_ZERO, data); @@ -964,7 +968,8 @@ static int rsi_sdio_ta_reset(struct rsi_hw *adapter) RSI_9116_REG_SIZE); if (status < 0) { rsi_dbg(ERR_ZONE, "Unable to Reset TA PC value\n"); - return -EINVAL; + status = -EINVAL; + goto err; } put_unaligned_le32(TA_RELEASE_THREAD_VALUE, data); @@ -974,17 +979,19 @@ static int rsi_sdio_ta_reset(struct rsi_hw *adapter) RSI_9116_REG_SIZE); if (status < 0) { rsi_dbg(ERR_ZONE, "Unable to release TA threads\n"); - return status; + goto err; } status = rsi_sdio_master_access_msword(adapter, MISC_CFG_BASE_ADDR); if (status < 0) { rsi_dbg(ERR_ZONE, "Unable to set ms word to common reg\n"); - return status; + goto err; } rsi_dbg(INIT_ZONE, "***** TA Reset done *****\n"); - return 0; +err: + kfree(data); + return status; } static struct rsi_host_intf_ops sdio_host_intf_ops = { From 5b0efb4d670c8b53b25c166967efd2a02b309e05 Mon Sep 17 00:00:00 2001 From: Stanislaw Gruszka Date: Fri, 3 May 2019 14:29:07 +0200 Subject: [PATCH 022/145] rtw88: avoid circular locking between local->iflist_mtx and rtwdev->mutex Remove circular lock dependency by using atomic version of interfaces iterate in watch_dog_work(), hence avoid taking local->iflist_mtx (rtw_vif_watch_dog_iter() only update some data, it can be called from atomic context). Fixes below LOCKDEP warning: [ 1157.219415] ====================================================== [ 1157.225772] [ INFO: possible circular locking dependency detected ] [ 1157.232150] 3.10.0-1043.el7.sgruszka1.x86_64.debug #1 Not tainted [ 1157.238346] ------------------------------------------------------- [ 1157.244635] kworker/u4:2/14490 is trying to acquire lock: [ 1157.250194] (&rtwdev->mutex){+.+.+.}, at: [] rtw_ops_config+0x2b/0x90 [rtw88] [ 1157.259151] but task is already holding lock: [ 1157.265085] (&local->iflist_mtx){+.+...}, at: [] ieee80211_mgd_probe_ap.part.28+0xca/0x160 [mac80211] [ 1157.276169] which lock already depends on the new lock. [ 1157.284488] the existing dependency chain (in reverse order) is: [ 1157.292101] -> #2 (&local->iflist_mtx){+.+...}: [ 1157.296919] [] lock_acquire+0x99/0x1e0 [ 1157.302955] [] mutex_lock_nested+0x93/0x410 [ 1157.309416] [] ieee80211_iterate_interfaces+0x2f/0x60 [mac80211] [ 1157.317730] [] rtw_watch_dog_work+0xcb/0x130 [rtw88] [ 1157.325003] [] process_one_work+0x22c/0x720 [ 1157.331481] [] worker_thread+0x126/0x3b0 [ 1157.337589] [] kthread+0xef/0x100 [ 1157.343260] [] ret_from_fork_nospec_end+0x0/0x39 [ 1157.350091] -> #1 ((&(&rtwdev->watch_dog_work)->work)){+.+...}: [ 1157.356314] [] lock_acquire+0x99/0x1e0 [ 1157.362427] [] flush_work+0x5b/0x310 [ 1157.368287] [] __cancel_work_timer+0xae/0x170 [ 1157.374940] [] cancel_delayed_work_sync+0x13/0x20 [ 1157.381930] [] rtw_core_stop+0x29/0x50 [rtw88] [ 1157.388679] [] rtw_enter_ips+0x16/0x20 [rtw88] [ 1157.395428] [] rtw_ops_config+0x42/0x90 [rtw88] [ 1157.402173] [] ieee80211_hw_config+0xc3/0x680 [mac80211] [ 1157.409854] [] ieee80211_do_open+0x69b/0x9c0 [mac80211] [ 1157.417418] [] ieee80211_open+0x69/0x70 [mac80211] [ 1157.424496] [] __dev_open+0xe2/0x160 [ 1157.430356] [] __dev_change_flags+0xa3/0x180 [ 1157.436922] [] dev_change_flags+0x29/0x60 [ 1157.443224] [] devinet_ioctl+0x794/0x890 [ 1157.449331] [] inet_ioctl+0x75/0xa0 [ 1157.455087] [] sock_do_ioctl+0x2b/0x60 [ 1157.461178] [] sock_ioctl+0x233/0x310 [ 1157.467109] [] do_vfs_ioctl+0x410/0x6c0 [ 1157.473233] [] SyS_ioctl+0xa1/0xc0 [ 1157.478914] [] system_call_fastpath+0x25/0x2a [ 1157.485569] -> #0 (&rtwdev->mutex){+.+.+.}: [ 1157.490022] [] __lock_acquire+0xec1/0x1630 [ 1157.496305] [] lock_acquire+0x99/0x1e0 [ 1157.502413] [] mutex_lock_nested+0x93/0x410 [ 1157.508890] [] rtw_ops_config+0x2b/0x90 [rtw88] [ 1157.515724] [] ieee80211_hw_config+0xc3/0x680 [mac80211] [ 1157.523370] [] ieee80211_recalc_ps.part.27+0x9a/0x180 [mac80211] [ 1157.531685] [] ieee80211_mgd_probe_ap.part.28+0x115/0x160 [mac80211] [ 1157.540353] [] ieee80211_beacon_connection_loss_work+0x4d/0x80 [mac80211] [ 1157.549513] [] process_one_work+0x22c/0x720 [ 1157.555886] [] worker_thread+0x126/0x3b0 [ 1157.562170] [] kthread+0xef/0x100 [ 1157.567765] [] ret_from_fork_nospec_end+0x0/0x39 [ 1157.574579] other info that might help us debug this: [ 1157.582788] Chain exists of: &rtwdev->mutex --> (&(&rtwdev->watch_dog_work)->work) --> &local->iflist_mtx [ 1157.593024] Possible unsafe locking scenario: [ 1157.599046] CPU0 CPU1 [ 1157.603653] ---- ---- [ 1157.608258] lock(&local->iflist_mtx); [ 1157.612180] lock((&(&rtwdev->watch_dog_work)->work)); [ 1157.620074] lock(&local->iflist_mtx); [ 1157.626555] lock(&rtwdev->mutex); [ 1157.630124] *** DEADLOCK *** [ 1157.636148] 4 locks held by kworker/u4:2/14490: [ 1157.640755] #0: (%s#6){.+.+.+}, at: [] process_one_work+0x1ba/0x720 [ 1157.648965] #1: ((&ifmgd->beacon_connection_loss_work)){+.+.+.}, at: [] process_one_work+0x1ba/0x720 [ 1157.659950] #2: (&wdev->mtx){+.+.+.}, at: [] ieee80211_mgd_probe_ap.part.28+0x25/0x160 [mac80211] [ 1157.670901] #3: (&local->iflist_mtx){+.+...}, at: [] ieee80211_mgd_probe_ap.part.28+0xca/0x160 [mac80211] [ 1157.682466] Fixes: e3037485c68e ("rtw88: new Realtek 802.11ac driver") Signed-off-by: Stanislaw Gruszka Acked-by: Yan-Hsuan Chuang Signed-off-by: Kalle Valo --- drivers/net/wireless/realtek/rtw88/main.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/wireless/realtek/rtw88/main.c b/drivers/net/wireless/realtek/rtw88/main.c index f447361f7573..b2dac4609138 100644 --- a/drivers/net/wireless/realtek/rtw88/main.c +++ b/drivers/net/wireless/realtek/rtw88/main.c @@ -162,7 +162,8 @@ static void rtw_watch_dog_work(struct work_struct *work) rtwdev->stats.tx_cnt = 0; rtwdev->stats.rx_cnt = 0; - rtw_iterate_vifs(rtwdev, rtw_vif_watch_dog_iter, &data); + /* use atomic version to avoid taking local->iflist_mtx mutex */ + rtw_iterate_vifs_atomic(rtwdev, rtw_vif_watch_dog_iter, &data); /* fw supports only one station associated to enter lps, if there are * more than two stations associated to the AP, then we can not enter From 6aca09771db4277a78853d6ac680d8d5f0d915e3 Mon Sep 17 00:00:00 2001 From: YueHaibing Date: Sat, 4 May 2019 18:32:24 +0800 Subject: [PATCH 023/145] rtw88: Make some symbols static Fix sparse warnings: drivers/net/wireless/realtek/rtw88/phy.c:851:4: warning: symbol 'rtw_cck_size' was not declared. Should it be static? drivers/net/wireless/realtek/rtw88/phy.c:852:4: warning: symbol 'rtw_ofdm_size' was not declared. Should it be static? drivers/net/wireless/realtek/rtw88/phy.c:853:4: warning: symbol 'rtw_ht_1s_size' was not declared. Should it be static? drivers/net/wireless/realtek/rtw88/phy.c:854:4: warning: symbol 'rtw_ht_2s_size' was not declared. Should it be static? drivers/net/wireless/realtek/rtw88/phy.c:855:4: warning: symbol 'rtw_vht_1s_size' was not declared. Should it be static? drivers/net/wireless/realtek/rtw88/phy.c:856:4: warning: symbol 'rtw_vht_2s_size' was not declared. Should it be static? drivers/net/wireless/realtek/rtw88/fw.c:11:6: warning: symbol 'rtw_fw_c2h_cmd_handle_ext' was not declared. Should it be static? drivers/net/wireless/realtek/rtw88/fw.c:50:6: warning: symbol 'rtw_fw_send_h2c_command' was not declared. Should it be static? Reported-by: Hulk Robot Signed-off-by: YueHaibing Signed-off-by: Kalle Valo --- drivers/net/wireless/realtek/rtw88/fw.c | 6 ++++-- drivers/net/wireless/realtek/rtw88/phy.c | 13 +++++++------ 2 files changed, 11 insertions(+), 8 deletions(-) diff --git a/drivers/net/wireless/realtek/rtw88/fw.c b/drivers/net/wireless/realtek/rtw88/fw.c index cf4265cda224..628477971213 100644 --- a/drivers/net/wireless/realtek/rtw88/fw.c +++ b/drivers/net/wireless/realtek/rtw88/fw.c @@ -8,7 +8,8 @@ #include "reg.h" #include "debug.h" -void rtw_fw_c2h_cmd_handle_ext(struct rtw_dev *rtwdev, struct sk_buff *skb) +static void rtw_fw_c2h_cmd_handle_ext(struct rtw_dev *rtwdev, + struct sk_buff *skb) { struct rtw_c2h_cmd *c2h; u8 sub_cmd_id; @@ -47,7 +48,8 @@ void rtw_fw_c2h_cmd_handle(struct rtw_dev *rtwdev, struct sk_buff *skb) } } -void rtw_fw_send_h2c_command(struct rtw_dev *rtwdev, u8 *h2c) +static void rtw_fw_send_h2c_command(struct rtw_dev *rtwdev, + u8 *h2c) { u8 box; u8 box_state; diff --git a/drivers/net/wireless/realtek/rtw88/phy.c b/drivers/net/wireless/realtek/rtw88/phy.c index 77b8c02b5ac6..404d89432c96 100644 --- a/drivers/net/wireless/realtek/rtw88/phy.c +++ b/drivers/net/wireless/realtek/rtw88/phy.c @@ -853,12 +853,13 @@ u8 rtw_vht_2s_rates[] = { DESC_RATEVHT2SS_MCS6, DESC_RATEVHT2SS_MCS7, DESC_RATEVHT2SS_MCS8, DESC_RATEVHT2SS_MCS9 }; -u8 rtw_cck_size = ARRAY_SIZE(rtw_cck_rates); -u8 rtw_ofdm_size = ARRAY_SIZE(rtw_ofdm_rates); -u8 rtw_ht_1s_size = ARRAY_SIZE(rtw_ht_1s_rates); -u8 rtw_ht_2s_size = ARRAY_SIZE(rtw_ht_2s_rates); -u8 rtw_vht_1s_size = ARRAY_SIZE(rtw_vht_1s_rates); -u8 rtw_vht_2s_size = ARRAY_SIZE(rtw_vht_2s_rates); + +static u8 rtw_cck_size = ARRAY_SIZE(rtw_cck_rates); +static u8 rtw_ofdm_size = ARRAY_SIZE(rtw_ofdm_rates); +static u8 rtw_ht_1s_size = ARRAY_SIZE(rtw_ht_1s_rates); +static u8 rtw_ht_2s_size = ARRAY_SIZE(rtw_ht_2s_rates); +static u8 rtw_vht_1s_size = ARRAY_SIZE(rtw_vht_1s_rates); +static u8 rtw_vht_2s_size = ARRAY_SIZE(rtw_vht_2s_rates); u8 *rtw_rate_section[RTW_RATE_SECTION_MAX] = { rtw_cck_rates, rtw_ofdm_rates, rtw_ht_1s_rates, rtw_ht_2s_rates, From 0112fa557c3bb3a002bc85760dc3761d737264d3 Mon Sep 17 00:00:00 2001 From: Pradeep Kumar Chitrapu Date: Tue, 28 May 2019 16:36:16 -0700 Subject: [PATCH 024/145] mac80211: free peer keys before vif down in mesh freeing peer keys after vif down is resulting in peer key uninstall to fail due to interface lookup failure. so fix that. Signed-off-by: Pradeep Kumar Chitrapu Signed-off-by: Johannes Berg --- net/mac80211/mesh.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/mac80211/mesh.c b/net/mac80211/mesh.c index d5aba5029cb0..fe44f0d98de0 100644 --- a/net/mac80211/mesh.c +++ b/net/mac80211/mesh.c @@ -929,6 +929,7 @@ void ieee80211_stop_mesh(struct ieee80211_sub_if_data *sdata) /* flush STAs and mpaths on this iface */ sta_info_flush(sdata); + ieee80211_free_keys(sdata, true); mesh_path_flush_by_iface(sdata); /* stop the beacon */ From 180aa422ef2701bd466bb9ade1923a17adfc6299 Mon Sep 17 00:00:00 2001 From: Johannes Berg Date: Tue, 28 May 2019 14:19:07 +0200 Subject: [PATCH 025/145] nl80211: fill all policy .type entries For old commands, it's fine to have .type = NLA_UNSPEC and it behaves the same as NLA_MIN_LEN. However, for new commands with strict validation this is no longer true, and for policy export to userspace these are also ignored. Fix up the remaining ones that don't have a type. Signed-off-by: Johannes Berg --- net/wireless/nl80211.c | 87 ++++++++++++++++++++++++++++++++---------- 1 file changed, 66 insertions(+), 21 deletions(-) diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c index 140d24e5718f..e3c0805af415 100644 --- a/net/wireless/nl80211.c +++ b/net/wireless/nl80211.c @@ -303,8 +303,11 @@ const struct nla_policy nl80211_policy[NUM_NL80211_ATTR] = { [NL80211_ATTR_IFINDEX] = { .type = NLA_U32 }, [NL80211_ATTR_IFNAME] = { .type = NLA_NUL_STRING, .len = IFNAMSIZ-1 }, - [NL80211_ATTR_MAC] = { .len = ETH_ALEN }, - [NL80211_ATTR_PREV_BSSID] = { .len = ETH_ALEN }, + [NL80211_ATTR_MAC] = { .type = NLA_EXACT_LEN_WARN, .len = ETH_ALEN }, + [NL80211_ATTR_PREV_BSSID] = { + .type = NLA_EXACT_LEN_WARN, + .len = ETH_ALEN + }, [NL80211_ATTR_KEY] = { .type = NLA_NESTED, }, [NL80211_ATTR_KEY_DATA] = { .type = NLA_BINARY, @@ -355,7 +358,10 @@ const struct nla_policy nl80211_policy[NUM_NL80211_ATTR] = { [NL80211_ATTR_MESH_CONFIG] = { .type = NLA_NESTED }, [NL80211_ATTR_SUPPORT_MESH_AUTH] = { .type = NLA_FLAG }, - [NL80211_ATTR_HT_CAPABILITY] = { .len = NL80211_HT_CAPABILITY_LEN }, + [NL80211_ATTR_HT_CAPABILITY] = { + .type = NLA_EXACT_LEN_WARN, + .len = NL80211_HT_CAPABILITY_LEN + }, [NL80211_ATTR_MGMT_SUBTYPE] = { .type = NLA_U8 }, [NL80211_ATTR_IE] = NLA_POLICY_VALIDATE_FN(NLA_BINARY, @@ -385,7 +391,10 @@ const struct nla_policy nl80211_policy[NUM_NL80211_ATTR] = { [NL80211_ATTR_WPA_VERSIONS] = { .type = NLA_U32 }, [NL80211_ATTR_PID] = { .type = NLA_U32 }, [NL80211_ATTR_4ADDR] = { .type = NLA_U8 }, - [NL80211_ATTR_PMKID] = { .len = WLAN_PMKID_LEN }, + [NL80211_ATTR_PMKID] = { + .type = NLA_EXACT_LEN_WARN, + .len = WLAN_PMKID_LEN + }, [NL80211_ATTR_DURATION] = { .type = NLA_U32 }, [NL80211_ATTR_COOKIE] = { .type = NLA_U64 }, [NL80211_ATTR_TX_RATES] = { .type = NLA_NESTED }, @@ -447,7 +456,10 @@ const struct nla_policy nl80211_policy[NUM_NL80211_ATTR] = { [NL80211_ATTR_WDEV] = { .type = NLA_U64 }, [NL80211_ATTR_USER_REG_HINT_TYPE] = { .type = NLA_U32 }, [NL80211_ATTR_AUTH_DATA] = { .type = NLA_BINARY, }, - [NL80211_ATTR_VHT_CAPABILITY] = { .len = NL80211_VHT_CAPABILITY_LEN }, + [NL80211_ATTR_VHT_CAPABILITY] = { + .type = NLA_EXACT_LEN_WARN, + .len = NL80211_VHT_CAPABILITY_LEN + }, [NL80211_ATTR_SCAN_FLAGS] = { .type = NLA_U32 }, [NL80211_ATTR_P2P_CTWINDOW] = NLA_POLICY_MAX(NLA_U8, 127), [NL80211_ATTR_P2P_OPPPS] = NLA_POLICY_MAX(NLA_U8, 1), @@ -483,7 +495,10 @@ const struct nla_policy nl80211_policy[NUM_NL80211_ATTR] = { [NL80211_ATTR_VENDOR_DATA] = { .type = NLA_BINARY }, [NL80211_ATTR_QOS_MAP] = { .type = NLA_BINARY, .len = IEEE80211_QOS_MAP_LEN_MAX }, - [NL80211_ATTR_MAC_HINT] = { .len = ETH_ALEN }, + [NL80211_ATTR_MAC_HINT] = { + .type = NLA_EXACT_LEN_WARN, + .len = ETH_ALEN + }, [NL80211_ATTR_WIPHY_FREQ_HINT] = { .type = NLA_U32 }, [NL80211_ATTR_TDLS_PEER_CAPABILITY] = { .type = NLA_U32 }, [NL80211_ATTR_SOCKET_OWNER] = { .type = NLA_FLAG }, @@ -494,7 +509,10 @@ const struct nla_policy nl80211_policy[NUM_NL80211_ATTR] = { NLA_POLICY_MAX(NLA_U8, IEEE80211_NUM_UPS - 1), [NL80211_ATTR_ADMITTED_TIME] = { .type = NLA_U16 }, [NL80211_ATTR_SMPS_MODE] = { .type = NLA_U8 }, - [NL80211_ATTR_MAC_MASK] = { .len = ETH_ALEN }, + [NL80211_ATTR_MAC_MASK] = { + .type = NLA_EXACT_LEN_WARN, + .len = ETH_ALEN + }, [NL80211_ATTR_WIPHY_SELF_MANAGED_REG] = { .type = NLA_FLAG }, [NL80211_ATTR_NETNS_FD] = { .type = NLA_U32 }, [NL80211_ATTR_SCHED_SCAN_DELAY] = { .type = NLA_U32 }, @@ -506,15 +524,21 @@ const struct nla_policy nl80211_policy[NUM_NL80211_ATTR] = { [NL80211_ATTR_MU_MIMO_GROUP_DATA] = { .len = VHT_MUMIMO_GROUPS_DATA_LEN }, - [NL80211_ATTR_MU_MIMO_FOLLOW_MAC_ADDR] = { .len = ETH_ALEN }, + [NL80211_ATTR_MU_MIMO_FOLLOW_MAC_ADDR] = { + .type = NLA_EXACT_LEN_WARN, + .len = ETH_ALEN + }, [NL80211_ATTR_NAN_MASTER_PREF] = NLA_POLICY_MIN(NLA_U8, 1), [NL80211_ATTR_BANDS] = { .type = NLA_U32 }, [NL80211_ATTR_NAN_FUNC] = { .type = NLA_NESTED }, [NL80211_ATTR_FILS_KEK] = { .type = NLA_BINARY, .len = FILS_MAX_KEK_LEN }, - [NL80211_ATTR_FILS_NONCES] = { .len = 2 * FILS_NONCE_LEN }, + [NL80211_ATTR_FILS_NONCES] = { + .type = NLA_EXACT_LEN_WARN, + .len = 2 * FILS_NONCE_LEN + }, [NL80211_ATTR_MULTICAST_TO_UNICAST_ENABLED] = { .type = NLA_FLAG, }, - [NL80211_ATTR_BSSID] = { .len = ETH_ALEN }, + [NL80211_ATTR_BSSID] = { .type = NLA_EXACT_LEN_WARN, .len = ETH_ALEN }, [NL80211_ATTR_SCHED_SCAN_RELATIVE_RSSI] = { .type = NLA_S8 }, [NL80211_ATTR_SCHED_SCAN_RSSI_ADJUST] = { .len = sizeof(struct nl80211_bss_select_rssi_adjust) @@ -527,7 +551,7 @@ const struct nla_policy nl80211_policy[NUM_NL80211_ATTR] = { [NL80211_ATTR_FILS_ERP_NEXT_SEQ_NUM] = { .type = NLA_U16 }, [NL80211_ATTR_FILS_ERP_RRK] = { .type = NLA_BINARY, .len = FILS_ERP_MAX_RRK_LEN }, - [NL80211_ATTR_FILS_CACHE_ID] = { .len = 2 }, + [NL80211_ATTR_FILS_CACHE_ID] = { .type = NLA_EXACT_LEN_WARN, .len = 2 }, [NL80211_ATTR_PMK] = { .type = NLA_BINARY, .len = PMK_MAX_LEN }, [NL80211_ATTR_SCHED_SCAN_MULTI] = { .type = NLA_FLAG }, [NL80211_ATTR_EXTERNAL_AUTH_SUPPORT] = { .type = NLA_FLAG }, @@ -588,10 +612,13 @@ static const struct nla_policy nl80211_wowlan_tcp_policy[NUM_NL80211_WOWLAN_TCP] = { [NL80211_WOWLAN_TCP_SRC_IPV4] = { .type = NLA_U32 }, [NL80211_WOWLAN_TCP_DST_IPV4] = { .type = NLA_U32 }, - [NL80211_WOWLAN_TCP_DST_MAC] = { .len = ETH_ALEN }, + [NL80211_WOWLAN_TCP_DST_MAC] = { + .type = NLA_EXACT_LEN_WARN, + .len = ETH_ALEN + }, [NL80211_WOWLAN_TCP_SRC_PORT] = { .type = NLA_U16 }, [NL80211_WOWLAN_TCP_DST_PORT] = { .type = NLA_U16 }, - [NL80211_WOWLAN_TCP_DATA_PAYLOAD] = { .len = 1 }, + [NL80211_WOWLAN_TCP_DATA_PAYLOAD] = { .type = NLA_MIN_LEN, .len = 1 }, [NL80211_WOWLAN_TCP_DATA_PAYLOAD_SEQ] = { .len = sizeof(struct nl80211_wowlan_tcp_data_seq) }, @@ -599,8 +626,8 @@ nl80211_wowlan_tcp_policy[NUM_NL80211_WOWLAN_TCP] = { .len = sizeof(struct nl80211_wowlan_tcp_data_token) }, [NL80211_WOWLAN_TCP_DATA_INTERVAL] = { .type = NLA_U32 }, - [NL80211_WOWLAN_TCP_WAKE_PAYLOAD] = { .len = 1 }, - [NL80211_WOWLAN_TCP_WAKE_MASK] = { .len = 1 }, + [NL80211_WOWLAN_TCP_WAKE_PAYLOAD] = { .type = NLA_MIN_LEN, .len = 1 }, + [NL80211_WOWLAN_TCP_WAKE_MASK] = { .type = NLA_MIN_LEN, .len = 1 }, }; #endif /* CONFIG_PM */ @@ -618,9 +645,18 @@ nl80211_coalesce_policy[NUM_NL80211_ATTR_COALESCE_RULE] = { /* policy for GTK rekey offload attributes */ static const struct nla_policy nl80211_rekey_policy[NUM_NL80211_REKEY_DATA] = { - [NL80211_REKEY_DATA_KEK] = { .len = NL80211_KEK_LEN }, - [NL80211_REKEY_DATA_KCK] = { .len = NL80211_KCK_LEN }, - [NL80211_REKEY_DATA_REPLAY_CTR] = { .len = NL80211_REPLAY_CTR_LEN }, + [NL80211_REKEY_DATA_KEK] = { + .type = NLA_EXACT_LEN_WARN, + .len = NL80211_KEK_LEN, + }, + [NL80211_REKEY_DATA_KCK] = { + .type = NLA_EXACT_LEN_WARN, + .len = NL80211_KCK_LEN, + }, + [NL80211_REKEY_DATA_REPLAY_CTR] = { + .type = NLA_EXACT_LEN_WARN, + .len = NL80211_REPLAY_CTR_LEN + }, }; static const struct nla_policy @@ -634,7 +670,10 @@ static const struct nla_policy nl80211_match_policy[NL80211_SCHED_SCAN_MATCH_ATTR_MAX + 1] = { [NL80211_SCHED_SCAN_MATCH_ATTR_SSID] = { .type = NLA_BINARY, .len = IEEE80211_MAX_SSID_LEN }, - [NL80211_SCHED_SCAN_MATCH_ATTR_BSSID] = { .len = ETH_ALEN }, + [NL80211_SCHED_SCAN_MATCH_ATTR_BSSID] = { + .type = NLA_EXACT_LEN_WARN, + .len = ETH_ALEN + }, [NL80211_SCHED_SCAN_MATCH_ATTR_RSSI] = { .type = NLA_U32 }, [NL80211_SCHED_SCAN_MATCH_PER_BAND_RSSI] = NLA_POLICY_NESTED(nl80211_match_band_rssi_policy), @@ -666,7 +705,10 @@ nl80211_nan_func_policy[NL80211_NAN_FUNC_ATTR_MAX + 1] = { [NL80211_NAN_FUNC_SUBSCRIBE_ACTIVE] = { .type = NLA_FLAG }, [NL80211_NAN_FUNC_FOLLOW_UP_ID] = { .type = NLA_U8 }, [NL80211_NAN_FUNC_FOLLOW_UP_REQ_ID] = { .type = NLA_U8 }, - [NL80211_NAN_FUNC_FOLLOW_UP_DEST] = { .len = ETH_ALEN }, + [NL80211_NAN_FUNC_FOLLOW_UP_DEST] = { + .type = NLA_EXACT_LEN_WARN, + .len = ETH_ALEN + }, [NL80211_NAN_FUNC_CLOSE_RANGE] = { .type = NLA_FLAG }, [NL80211_NAN_FUNC_TTL] = { .type = NLA_U32 }, [NL80211_NAN_FUNC_SERVICE_INFO] = { .type = NLA_BINARY, @@ -4060,7 +4102,10 @@ static const struct nla_policy nl80211_txattr_policy[NL80211_TXRATE_MAX + 1] = { .len = NL80211_MAX_SUPP_RATES }, [NL80211_TXRATE_HT] = { .type = NLA_BINARY, .len = NL80211_MAX_SUPP_HT_RATES }, - [NL80211_TXRATE_VHT] = { .len = sizeof(struct nl80211_txrate_vht)}, + [NL80211_TXRATE_VHT] = { + .type = NLA_EXACT_LEN_WARN, + .len = sizeof(struct nl80211_txrate_vht), + }, [NL80211_TXRATE_GI] = { .type = NLA_U8 }, }; From 9e084bb9805226a3e53e9e28dcd9f3a8d4af35c8 Mon Sep 17 00:00:00 2001 From: Jiong Wang Date: Wed, 29 May 2019 10:57:08 +0100 Subject: [PATCH 026/145] selftests: bpf: move sub-register zero extension checks into subreg.c It is better to centralize all sub-register zero extension checks into an independent file. This patch takes the first step to move existing sub-register zero extension checks into subreg.c. Acked-by: Jakub Kicinski Reviewed-by: Quentin Monnet Signed-off-by: Jiong Wang Signed-off-by: Daniel Borkmann --- .../selftests/bpf/verifier/basic_instr.c | 39 ------------------- tools/testing/selftests/bpf/verifier/subreg.c | 39 +++++++++++++++++++ 2 files changed, 39 insertions(+), 39 deletions(-) create mode 100644 tools/testing/selftests/bpf/verifier/subreg.c diff --git a/tools/testing/selftests/bpf/verifier/basic_instr.c b/tools/testing/selftests/bpf/verifier/basic_instr.c index 4d844089938e..ed91a7b9a456 100644 --- a/tools/testing/selftests/bpf/verifier/basic_instr.c +++ b/tools/testing/selftests/bpf/verifier/basic_instr.c @@ -132,42 +132,3 @@ .prog_type = BPF_PROG_TYPE_SCHED_CLS, .result = ACCEPT, }, -{ - "and32 reg zero extend check", - .insns = { - BPF_MOV64_IMM(BPF_REG_0, -1), - BPF_MOV64_IMM(BPF_REG_2, -2), - BPF_ALU32_REG(BPF_AND, BPF_REG_0, BPF_REG_2), - BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32), - BPF_EXIT_INSN(), - }, - .prog_type = BPF_PROG_TYPE_SCHED_CLS, - .result = ACCEPT, - .retval = 0, -}, -{ - "or32 reg zero extend check", - .insns = { - BPF_MOV64_IMM(BPF_REG_0, -1), - BPF_MOV64_IMM(BPF_REG_2, -2), - BPF_ALU32_REG(BPF_OR, BPF_REG_0, BPF_REG_2), - BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32), - BPF_EXIT_INSN(), - }, - .prog_type = BPF_PROG_TYPE_SCHED_CLS, - .result = ACCEPT, - .retval = 0, -}, -{ - "xor32 reg zero extend check", - .insns = { - BPF_MOV64_IMM(BPF_REG_0, -1), - BPF_MOV64_IMM(BPF_REG_2, 0), - BPF_ALU32_REG(BPF_XOR, BPF_REG_0, BPF_REG_2), - BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32), - BPF_EXIT_INSN(), - }, - .prog_type = BPF_PROG_TYPE_SCHED_CLS, - .result = ACCEPT, - .retval = 0, -}, diff --git a/tools/testing/selftests/bpf/verifier/subreg.c b/tools/testing/selftests/bpf/verifier/subreg.c new file mode 100644 index 000000000000..edeca3bea35e --- /dev/null +++ b/tools/testing/selftests/bpf/verifier/subreg.c @@ -0,0 +1,39 @@ +{ + "or32 reg zero extend check", + .insns = { + BPF_MOV64_IMM(BPF_REG_0, -1), + BPF_MOV64_IMM(BPF_REG_2, -2), + BPF_ALU32_REG(BPF_OR, BPF_REG_0, BPF_REG_2), + BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32), + BPF_EXIT_INSN(), + }, + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + .result = ACCEPT, + .retval = 0, +}, +{ + "and32 reg zero extend check", + .insns = { + BPF_MOV64_IMM(BPF_REG_0, -1), + BPF_MOV64_IMM(BPF_REG_2, -2), + BPF_ALU32_REG(BPF_AND, BPF_REG_0, BPF_REG_2), + BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32), + BPF_EXIT_INSN(), + }, + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + .result = ACCEPT, + .retval = 0, +}, +{ + "xor32 reg zero extend check", + .insns = { + BPF_MOV64_IMM(BPF_REG_0, -1), + BPF_MOV64_IMM(BPF_REG_2, 0), + BPF_ALU32_REG(BPF_XOR, BPF_REG_0, BPF_REG_2), + BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32), + BPF_EXIT_INSN(), + }, + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + .result = ACCEPT, + .retval = 0, +}, From c25d60c12534ada10312a43f1c584fa7384aa2e7 Mon Sep 17 00:00:00 2001 From: Jiong Wang Date: Wed, 29 May 2019 10:57:09 +0100 Subject: [PATCH 027/145] selftests: bpf: complete sub-register zero extension checks eBPF ISA specification requires high 32-bit cleared when only low 32-bit sub-register is written. JIT back-ends must guarantee this semantics when doing code-gen. This patch complete unit tests for all of those insns that could be visible to JIT back-ends and defining sub-registers, if JIT back-ends failed to guarantee the mentioned semantics, these unit tests will fail. Acked-by: Jakub Kicinski Reviewed-by: Quentin Monnet Signed-off-by: Jiong Wang Signed-off-by: Daniel Borkmann --- tools/testing/selftests/bpf/verifier/subreg.c | 520 +++++++++++++++++- 1 file changed, 507 insertions(+), 13 deletions(-) diff --git a/tools/testing/selftests/bpf/verifier/subreg.c b/tools/testing/selftests/bpf/verifier/subreg.c index edeca3bea35e..4c4133c80440 100644 --- a/tools/testing/selftests/bpf/verifier/subreg.c +++ b/tools/testing/selftests/bpf/verifier/subreg.c @@ -1,39 +1,533 @@ +/* This file contains sub-register zero extension checks for insns defining + * sub-registers, meaning: + * - All insns under BPF_ALU class. Their BPF_ALU32 variants or narrow width + * forms (BPF_END) could define sub-registers. + * - Narrow direct loads, BPF_B/H/W | BPF_LDX. + * - BPF_LD is not exposed to JIT back-ends, so no need for testing. + * + * "get_prandom_u32" is used to initialize low 32-bit of some registers to + * prevent potential optimizations done by verifier or JIT back-ends which could + * optimize register back into constant when range info shows one register is a + * constant. + */ { - "or32 reg zero extend check", + "add32 reg zero extend check", .insns = { - BPF_MOV64_IMM(BPF_REG_0, -1), - BPF_MOV64_IMM(BPF_REG_2, -2), - BPF_ALU32_REG(BPF_OR, BPF_REG_0, BPF_REG_2), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), + BPF_LD_IMM64(BPF_REG_0, 0x100000000ULL), + BPF_ALU32_REG(BPF_ADD, BPF_REG_0, BPF_REG_1), BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32), BPF_EXIT_INSN(), }, - .prog_type = BPF_PROG_TYPE_SCHED_CLS, + .result = ACCEPT, + .retval = 0, +}, +{ + "add32 imm zero extend check", + .insns = { + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32), + BPF_LD_IMM64(BPF_REG_1, 0x1000000000ULL), + BPF_ALU64_REG(BPF_OR, BPF_REG_0, BPF_REG_1), + /* An insn could have no effect on the low 32-bit, for example: + * a = a + 0 + * a = a | 0 + * a = a & -1 + * But, they should still zero high 32-bit. + */ + BPF_ALU32_IMM(BPF_ADD, BPF_REG_0, 0), + BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32), + BPF_MOV64_REG(BPF_REG_6, BPF_REG_0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32), + BPF_LD_IMM64(BPF_REG_1, 0x1000000000ULL), + BPF_ALU64_REG(BPF_OR, BPF_REG_0, BPF_REG_1), + BPF_ALU32_IMM(BPF_ADD, BPF_REG_0, -2), + BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32), + BPF_ALU64_REG(BPF_OR, BPF_REG_0, BPF_REG_6), + BPF_EXIT_INSN(), + }, + .result = ACCEPT, + .retval = 0, +}, +{ + "sub32 reg zero extend check", + .insns = { + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), + BPF_LD_IMM64(BPF_REG_0, 0x1ffffffffULL), + BPF_ALU32_REG(BPF_SUB, BPF_REG_0, BPF_REG_1), + BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32), + BPF_EXIT_INSN(), + }, + .result = ACCEPT, + .retval = 0, +}, +{ + "sub32 imm zero extend check", + .insns = { + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32), + BPF_LD_IMM64(BPF_REG_1, 0x1000000000ULL), + BPF_ALU64_REG(BPF_OR, BPF_REG_0, BPF_REG_1), + BPF_ALU32_IMM(BPF_SUB, BPF_REG_0, 0), + BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32), + BPF_MOV64_REG(BPF_REG_6, BPF_REG_0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32), + BPF_LD_IMM64(BPF_REG_1, 0x1000000000ULL), + BPF_ALU64_REG(BPF_OR, BPF_REG_0, BPF_REG_1), + BPF_ALU32_IMM(BPF_SUB, BPF_REG_0, 1), + BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32), + BPF_ALU64_REG(BPF_OR, BPF_REG_0, BPF_REG_6), + BPF_EXIT_INSN(), + }, + .result = ACCEPT, + .retval = 0, +}, +{ + "mul32 reg zero extend check", + .insns = { + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), + BPF_LD_IMM64(BPF_REG_0, 0x100000001ULL), + BPF_ALU32_REG(BPF_MUL, BPF_REG_0, BPF_REG_1), + BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32), + BPF_EXIT_INSN(), + }, + .result = ACCEPT, + .retval = 0, +}, +{ + "mul32 imm zero extend check", + .insns = { + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32), + BPF_LD_IMM64(BPF_REG_1, 0x1000000000ULL), + BPF_ALU64_REG(BPF_OR, BPF_REG_0, BPF_REG_1), + BPF_ALU32_IMM(BPF_MUL, BPF_REG_0, 1), + BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32), + BPF_MOV64_REG(BPF_REG_6, BPF_REG_0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32), + BPF_LD_IMM64(BPF_REG_1, 0x1000000000ULL), + BPF_ALU64_REG(BPF_OR, BPF_REG_0, BPF_REG_1), + BPF_ALU32_IMM(BPF_MUL, BPF_REG_0, -1), + BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32), + BPF_ALU64_REG(BPF_OR, BPF_REG_0, BPF_REG_6), + BPF_EXIT_INSN(), + }, + .result = ACCEPT, + .retval = 0, +}, +{ + "div32 reg zero extend check", + .insns = { + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), + BPF_MOV64_IMM(BPF_REG_0, -1), + BPF_ALU32_REG(BPF_DIV, BPF_REG_0, BPF_REG_1), + BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32), + BPF_EXIT_INSN(), + }, + .result = ACCEPT, + .retval = 0, +}, +{ + "div32 imm zero extend check", + .insns = { + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32), + BPF_LD_IMM64(BPF_REG_1, 0x1000000000ULL), + BPF_ALU64_REG(BPF_OR, BPF_REG_0, BPF_REG_1), + BPF_ALU32_IMM(BPF_DIV, BPF_REG_0, 1), + BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32), + BPF_MOV64_REG(BPF_REG_6, BPF_REG_0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32), + BPF_LD_IMM64(BPF_REG_1, 0x1000000000ULL), + BPF_ALU64_REG(BPF_OR, BPF_REG_0, BPF_REG_1), + BPF_ALU32_IMM(BPF_DIV, BPF_REG_0, 2), + BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32), + BPF_ALU64_REG(BPF_OR, BPF_REG_0, BPF_REG_6), + BPF_EXIT_INSN(), + }, + .result = ACCEPT, + .retval = 0, +}, +{ + "or32 reg zero extend check", + .insns = { + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), + BPF_LD_IMM64(BPF_REG_0, 0x100000001ULL), + BPF_ALU32_REG(BPF_OR, BPF_REG_0, BPF_REG_1), + BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32), + BPF_EXIT_INSN(), + }, + .result = ACCEPT, + .retval = 0, +}, +{ + "or32 imm zero extend check", + .insns = { + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32), + BPF_LD_IMM64(BPF_REG_1, 0x1000000000ULL), + BPF_ALU64_REG(BPF_OR, BPF_REG_0, BPF_REG_1), + BPF_ALU32_IMM(BPF_OR, BPF_REG_0, 0), + BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32), + BPF_MOV64_REG(BPF_REG_6, BPF_REG_0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32), + BPF_LD_IMM64(BPF_REG_1, 0x1000000000ULL), + BPF_ALU64_REG(BPF_OR, BPF_REG_0, BPF_REG_1), + BPF_ALU32_IMM(BPF_OR, BPF_REG_0, 1), + BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32), + BPF_ALU64_REG(BPF_OR, BPF_REG_0, BPF_REG_6), + BPF_EXIT_INSN(), + }, .result = ACCEPT, .retval = 0, }, { "and32 reg zero extend check", .insns = { - BPF_MOV64_IMM(BPF_REG_0, -1), - BPF_MOV64_IMM(BPF_REG_2, -2), - BPF_ALU32_REG(BPF_AND, BPF_REG_0, BPF_REG_2), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32), + BPF_LD_IMM64(BPF_REG_1, 0x100000000ULL), + BPF_ALU64_REG(BPF_OR, BPF_REG_1, BPF_REG_0), + BPF_LD_IMM64(BPF_REG_0, 0x1ffffffffULL), + BPF_ALU32_REG(BPF_AND, BPF_REG_0, BPF_REG_1), BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32), BPF_EXIT_INSN(), }, - .prog_type = BPF_PROG_TYPE_SCHED_CLS, + .result = ACCEPT, + .retval = 0, +}, +{ + "and32 imm zero extend check", + .insns = { + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32), + BPF_LD_IMM64(BPF_REG_1, 0x1000000000ULL), + BPF_ALU64_REG(BPF_OR, BPF_REG_0, BPF_REG_1), + BPF_ALU32_IMM(BPF_AND, BPF_REG_0, -1), + BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32), + BPF_MOV64_REG(BPF_REG_6, BPF_REG_0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32), + BPF_LD_IMM64(BPF_REG_1, 0x1000000000ULL), + BPF_ALU64_REG(BPF_OR, BPF_REG_0, BPF_REG_1), + BPF_ALU32_IMM(BPF_AND, BPF_REG_0, -2), + BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32), + BPF_ALU64_REG(BPF_OR, BPF_REG_0, BPF_REG_6), + BPF_EXIT_INSN(), + }, + .result = ACCEPT, + .retval = 0, +}, +{ + "lsh32 reg zero extend check", + .insns = { + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32), + BPF_LD_IMM64(BPF_REG_1, 0x100000000ULL), + BPF_ALU64_REG(BPF_OR, BPF_REG_0, BPF_REG_1), + BPF_MOV64_IMM(BPF_REG_1, 1), + BPF_ALU32_REG(BPF_LSH, BPF_REG_0, BPF_REG_1), + BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32), + BPF_EXIT_INSN(), + }, + .result = ACCEPT, + .retval = 0, +}, +{ + "lsh32 imm zero extend check", + .insns = { + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32), + BPF_LD_IMM64(BPF_REG_1, 0x1000000000ULL), + BPF_ALU64_REG(BPF_OR, BPF_REG_0, BPF_REG_1), + BPF_ALU32_IMM(BPF_LSH, BPF_REG_0, 0), + BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32), + BPF_MOV64_REG(BPF_REG_6, BPF_REG_0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32), + BPF_LD_IMM64(BPF_REG_1, 0x1000000000ULL), + BPF_ALU64_REG(BPF_OR, BPF_REG_0, BPF_REG_1), + BPF_ALU32_IMM(BPF_LSH, BPF_REG_0, 1), + BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32), + BPF_ALU64_REG(BPF_OR, BPF_REG_0, BPF_REG_6), + BPF_EXIT_INSN(), + }, + .result = ACCEPT, + .retval = 0, +}, +{ + "rsh32 reg zero extend check", + .insns = { + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32), + BPF_LD_IMM64(BPF_REG_1, 0x1000000000ULL), + BPF_ALU64_REG(BPF_OR, BPF_REG_0, BPF_REG_1), + BPF_MOV64_IMM(BPF_REG_1, 1), + BPF_ALU32_REG(BPF_RSH, BPF_REG_0, BPF_REG_1), + BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32), + BPF_EXIT_INSN(), + }, + .result = ACCEPT, + .retval = 0, +}, +{ + "rsh32 imm zero extend check", + .insns = { + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32), + BPF_LD_IMM64(BPF_REG_1, 0x1000000000ULL), + BPF_ALU64_REG(BPF_OR, BPF_REG_0, BPF_REG_1), + BPF_ALU32_IMM(BPF_RSH, BPF_REG_0, 0), + BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32), + BPF_MOV64_REG(BPF_REG_6, BPF_REG_0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32), + BPF_LD_IMM64(BPF_REG_1, 0x1000000000ULL), + BPF_ALU64_REG(BPF_OR, BPF_REG_0, BPF_REG_1), + BPF_ALU32_IMM(BPF_RSH, BPF_REG_0, 1), + BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32), + BPF_ALU64_REG(BPF_OR, BPF_REG_0, BPF_REG_6), + BPF_EXIT_INSN(), + }, + .result = ACCEPT, + .retval = 0, +}, +{ + "neg32 reg zero extend check", + .insns = { + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32), + BPF_LD_IMM64(BPF_REG_1, 0x1000000000ULL), + BPF_ALU64_REG(BPF_OR, BPF_REG_0, BPF_REG_1), + BPF_ALU32_IMM(BPF_NEG, BPF_REG_0, 0), + BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32), + BPF_EXIT_INSN(), + }, + .result = ACCEPT, + .retval = 0, +}, +{ + "mod32 reg zero extend check", + .insns = { + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), + BPF_MOV64_IMM(BPF_REG_0, -1), + BPF_ALU32_REG(BPF_MOD, BPF_REG_0, BPF_REG_1), + BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32), + BPF_EXIT_INSN(), + }, + .result = ACCEPT, + .retval = 0, +}, +{ + "mod32 imm zero extend check", + .insns = { + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32), + BPF_LD_IMM64(BPF_REG_1, 0x1000000000ULL), + BPF_ALU64_REG(BPF_OR, BPF_REG_0, BPF_REG_1), + BPF_ALU32_IMM(BPF_MOD, BPF_REG_0, 1), + BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32), + BPF_MOV64_REG(BPF_REG_6, BPF_REG_0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32), + BPF_LD_IMM64(BPF_REG_1, 0x1000000000ULL), + BPF_ALU64_REG(BPF_OR, BPF_REG_0, BPF_REG_1), + BPF_ALU32_IMM(BPF_MOD, BPF_REG_0, 2), + BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32), + BPF_ALU64_REG(BPF_OR, BPF_REG_0, BPF_REG_6), + BPF_EXIT_INSN(), + }, .result = ACCEPT, .retval = 0, }, { "xor32 reg zero extend check", .insns = { - BPF_MOV64_IMM(BPF_REG_0, -1), - BPF_MOV64_IMM(BPF_REG_2, 0), - BPF_ALU32_REG(BPF_XOR, BPF_REG_0, BPF_REG_2), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), + BPF_LD_IMM64(BPF_REG_0, 0x100000000ULL), + BPF_ALU32_REG(BPF_XOR, BPF_REG_0, BPF_REG_1), + BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32), + BPF_EXIT_INSN(), + }, + .result = ACCEPT, + .retval = 0, +}, +{ + "xor32 imm zero extend check", + .insns = { + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32), + BPF_LD_IMM64(BPF_REG_1, 0x1000000000ULL), + BPF_ALU64_REG(BPF_OR, BPF_REG_0, BPF_REG_1), + BPF_ALU32_IMM(BPF_XOR, BPF_REG_0, 1), + BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32), + BPF_EXIT_INSN(), + }, + .result = ACCEPT, + .retval = 0, +}, +{ + "mov32 reg zero extend check", + .insns = { + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32), + BPF_LD_IMM64(BPF_REG_1, 0x100000000ULL), + BPF_ALU64_REG(BPF_OR, BPF_REG_1, BPF_REG_0), + BPF_LD_IMM64(BPF_REG_0, 0x100000000ULL), + BPF_MOV32_REG(BPF_REG_0, BPF_REG_1), + BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32), + BPF_EXIT_INSN(), + }, + .result = ACCEPT, + .retval = 0, +}, +{ + "mov32 imm zero extend check", + .insns = { + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32), + BPF_LD_IMM64(BPF_REG_1, 0x1000000000ULL), + BPF_ALU64_REG(BPF_OR, BPF_REG_0, BPF_REG_1), + BPF_MOV32_IMM(BPF_REG_0, 0), + BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32), + BPF_MOV64_REG(BPF_REG_6, BPF_REG_0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32), + BPF_LD_IMM64(BPF_REG_1, 0x1000000000ULL), + BPF_ALU64_REG(BPF_OR, BPF_REG_0, BPF_REG_1), + BPF_MOV32_IMM(BPF_REG_0, 1), + BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32), + BPF_ALU64_REG(BPF_OR, BPF_REG_0, BPF_REG_6), + BPF_EXIT_INSN(), + }, + .result = ACCEPT, + .retval = 0, +}, +{ + "arsh32 reg zero extend check", + .insns = { + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32), + BPF_LD_IMM64(BPF_REG_1, 0x1000000000ULL), + BPF_ALU64_REG(BPF_OR, BPF_REG_0, BPF_REG_1), + BPF_MOV64_IMM(BPF_REG_1, 1), + BPF_ALU32_REG(BPF_ARSH, BPF_REG_0, BPF_REG_1), + BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32), + BPF_EXIT_INSN(), + }, + .result = ACCEPT, + .retval = 0, +}, +{ + "arsh32 imm zero extend check", + .insns = { + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32), + BPF_LD_IMM64(BPF_REG_1, 0x1000000000ULL), + BPF_ALU64_REG(BPF_OR, BPF_REG_0, BPF_REG_1), + BPF_ALU32_IMM(BPF_ARSH, BPF_REG_0, 0), + BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32), + BPF_MOV64_REG(BPF_REG_6, BPF_REG_0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32), + BPF_LD_IMM64(BPF_REG_1, 0x1000000000ULL), + BPF_ALU64_REG(BPF_OR, BPF_REG_0, BPF_REG_1), + BPF_ALU32_IMM(BPF_ARSH, BPF_REG_0, 1), + BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32), + BPF_ALU64_REG(BPF_OR, BPF_REG_0, BPF_REG_6), + BPF_EXIT_INSN(), + }, + .result = ACCEPT, + .retval = 0, +}, +{ + "end16 (to_le) reg zero extend check", + .insns = { + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32), + BPF_MOV64_REG(BPF_REG_6, BPF_REG_0), + BPF_ALU64_IMM(BPF_LSH, BPF_REG_6, 32), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32), + BPF_ALU64_REG(BPF_OR, BPF_REG_0, BPF_REG_6), + BPF_ENDIAN(BPF_TO_LE, BPF_REG_0, 16), + BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32), + BPF_EXIT_INSN(), + }, + .result = ACCEPT, + .retval = 0, +}, +{ + "end32 (to_le) reg zero extend check", + .insns = { + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32), + BPF_MOV64_REG(BPF_REG_6, BPF_REG_0), + BPF_ALU64_IMM(BPF_LSH, BPF_REG_6, 32), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32), + BPF_ALU64_REG(BPF_OR, BPF_REG_0, BPF_REG_6), + BPF_ENDIAN(BPF_TO_LE, BPF_REG_0, 32), + BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32), + BPF_EXIT_INSN(), + }, + .result = ACCEPT, + .retval = 0, +}, +{ + "end16 (to_be) reg zero extend check", + .insns = { + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32), + BPF_MOV64_REG(BPF_REG_6, BPF_REG_0), + BPF_ALU64_IMM(BPF_LSH, BPF_REG_6, 32), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32), + BPF_ALU64_REG(BPF_OR, BPF_REG_0, BPF_REG_6), + BPF_ENDIAN(BPF_TO_BE, BPF_REG_0, 16), + BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32), + BPF_EXIT_INSN(), + }, + .result = ACCEPT, + .retval = 0, +}, +{ + "end32 (to_be) reg zero extend check", + .insns = { + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32), + BPF_MOV64_REG(BPF_REG_6, BPF_REG_0), + BPF_ALU64_IMM(BPF_LSH, BPF_REG_6, 32), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32), + BPF_ALU64_REG(BPF_OR, BPF_REG_0, BPF_REG_6), + BPF_ENDIAN(BPF_TO_BE, BPF_REG_0, 32), + BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32), + BPF_EXIT_INSN(), + }, + .result = ACCEPT, + .retval = 0, +}, +{ + "ldx_b zero extend check", + .insns = { + BPF_MOV64_REG(BPF_REG_6, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_6, -4), + BPF_ST_MEM(BPF_W, BPF_REG_6, 0, 0xfaceb00c), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32), + BPF_LD_IMM64(BPF_REG_1, 0x1000000000ULL), + BPF_ALU64_REG(BPF_OR, BPF_REG_0, BPF_REG_1), + BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_6, 0), + BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32), + BPF_EXIT_INSN(), + }, + .result = ACCEPT, + .retval = 0, +}, +{ + "ldx_h zero extend check", + .insns = { + BPF_MOV64_REG(BPF_REG_6, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_6, -4), + BPF_ST_MEM(BPF_W, BPF_REG_6, 0, 0xfaceb00c), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32), + BPF_LD_IMM64(BPF_REG_1, 0x1000000000ULL), + BPF_ALU64_REG(BPF_OR, BPF_REG_0, BPF_REG_1), + BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_6, 0), + BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32), + BPF_EXIT_INSN(), + }, + .result = ACCEPT, + .retval = 0, +}, +{ + "ldx_w zero extend check", + .insns = { + BPF_MOV64_REG(BPF_REG_6, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_6, -4), + BPF_ST_MEM(BPF_W, BPF_REG_6, 0, 0xfaceb00c), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32), + BPF_LD_IMM64(BPF_REG_1, 0x1000000000ULL), + BPF_ALU64_REG(BPF_OR, BPF_REG_0, BPF_REG_1), + BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_6, 0), BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32), BPF_EXIT_INSN(), }, - .prog_type = BPF_PROG_TYPE_SCHED_CLS, .result = ACCEPT, .retval = 0, }, From 5fac1718e706d94a7078addee9075cfaea00ca8c Mon Sep 17 00:00:00 2001 From: Alakesh Haloi Date: Tue, 28 May 2019 19:02:18 +0000 Subject: [PATCH 028/145] selftests: bpf: fix compiler warning in flow_dissector test MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add missing header file following compiler warning: prog_tests/flow_dissector.c: In function ‘tx_tap’: prog_tests/flow_dissector.c:175:9: warning: implicit declaration of function ‘writev’; did you mean ‘write’? [-Wimplicit-function-declaration] return writev(fd, iov, ARRAY_SIZE(iov)); ^~~~~~ write Fixes: 0905beec9f52 ("selftests/bpf: run flow dissector tests in skb-less mode") Signed-off-by: Alakesh Haloi Acked-by: Song Liu Signed-off-by: Daniel Borkmann --- tools/testing/selftests/bpf/prog_tests/flow_dissector.c | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/testing/selftests/bpf/prog_tests/flow_dissector.c b/tools/testing/selftests/bpf/prog_tests/flow_dissector.c index fbd1d88a6095..c938283ac232 100644 --- a/tools/testing/selftests/bpf/prog_tests/flow_dissector.c +++ b/tools/testing/selftests/bpf/prog_tests/flow_dissector.c @@ -3,6 +3,7 @@ #include #include #include +#include #define CHECK_FLOW_KEYS(desc, got, expected) \ CHECK_ATTR(memcmp(&got, &expected, sizeof(got)) != 0, \ From 13ec7f10b87f5fc04c4ccbd491c94c7980236a74 Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Wed, 29 May 2019 14:52:19 +0200 Subject: [PATCH 029/145] mwifiex: Fix possible buffer overflows at parsing bss descriptor mwifiex_update_bss_desc_with_ie() calls memcpy() unconditionally in a couple places without checking the destination size. Since the source is given from user-space, this may trigger a heap buffer overflow. Fix it by putting the length check before performing memcpy(). This fix addresses CVE-2019-3846. Reported-by: huangwen Signed-off-by: Takashi Iwai Signed-off-by: Kalle Valo --- drivers/net/wireless/marvell/mwifiex/scan.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/net/wireless/marvell/mwifiex/scan.c b/drivers/net/wireless/marvell/mwifiex/scan.c index 935778ec9a1b..64ab6fe78c0d 100644 --- a/drivers/net/wireless/marvell/mwifiex/scan.c +++ b/drivers/net/wireless/marvell/mwifiex/scan.c @@ -1247,6 +1247,8 @@ int mwifiex_update_bss_desc_with_ie(struct mwifiex_adapter *adapter, } switch (element_id) { case WLAN_EID_SSID: + if (element_len > IEEE80211_MAX_SSID_LEN) + return -EINVAL; bss_entry->ssid.ssid_len = element_len; memcpy(bss_entry->ssid.ssid, (current_ptr + 2), element_len); @@ -1256,6 +1258,8 @@ int mwifiex_update_bss_desc_with_ie(struct mwifiex_adapter *adapter, break; case WLAN_EID_SUPP_RATES: + if (element_len > MWIFIEX_SUPPORTED_RATES) + return -EINVAL; memcpy(bss_entry->data_rates, current_ptr + 2, element_len); memcpy(bss_entry->supported_rates, current_ptr + 2, From 685c9b7750bfacd6fc1db50d86579980593b7869 Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Wed, 29 May 2019 14:52:20 +0200 Subject: [PATCH 030/145] mwifiex: Abort at too short BSS descriptor element Currently mwifiex_update_bss_desc_with_ie() implicitly assumes that the source descriptor entries contain the enough size for each type and performs copying without checking the source size. This may lead to read over boundary. Fix this by putting the source size check in appropriate places. Signed-off-by: Takashi Iwai Signed-off-by: Kalle Valo --- drivers/net/wireless/marvell/mwifiex/scan.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/drivers/net/wireless/marvell/mwifiex/scan.c b/drivers/net/wireless/marvell/mwifiex/scan.c index 64ab6fe78c0d..c269a0de9413 100644 --- a/drivers/net/wireless/marvell/mwifiex/scan.c +++ b/drivers/net/wireless/marvell/mwifiex/scan.c @@ -1269,6 +1269,8 @@ int mwifiex_update_bss_desc_with_ie(struct mwifiex_adapter *adapter, break; case WLAN_EID_FH_PARAMS: + if (element_len + 2 < sizeof(*fh_param_set)) + return -EINVAL; fh_param_set = (struct ieee_types_fh_param_set *) current_ptr; memcpy(&bss_entry->phy_param_set.fh_param_set, @@ -1277,6 +1279,8 @@ int mwifiex_update_bss_desc_with_ie(struct mwifiex_adapter *adapter, break; case WLAN_EID_DS_PARAMS: + if (element_len + 2 < sizeof(*ds_param_set)) + return -EINVAL; ds_param_set = (struct ieee_types_ds_param_set *) current_ptr; @@ -1288,6 +1292,8 @@ int mwifiex_update_bss_desc_with_ie(struct mwifiex_adapter *adapter, break; case WLAN_EID_CF_PARAMS: + if (element_len + 2 < sizeof(*cf_param_set)) + return -EINVAL; cf_param_set = (struct ieee_types_cf_param_set *) current_ptr; memcpy(&bss_entry->ss_param_set.cf_param_set, @@ -1296,6 +1302,8 @@ int mwifiex_update_bss_desc_with_ie(struct mwifiex_adapter *adapter, break; case WLAN_EID_IBSS_PARAMS: + if (element_len + 2 < sizeof(*ibss_param_set)) + return -EINVAL; ibss_param_set = (struct ieee_types_ibss_param_set *) current_ptr; @@ -1305,10 +1313,14 @@ int mwifiex_update_bss_desc_with_ie(struct mwifiex_adapter *adapter, break; case WLAN_EID_ERP_INFO: + if (!element_len) + return -EINVAL; bss_entry->erp_flags = *(current_ptr + 2); break; case WLAN_EID_PWR_CONSTRAINT: + if (!element_len) + return -EINVAL; bss_entry->local_constraint = *(current_ptr + 2); bss_entry->sensed_11h = true; break; @@ -1349,6 +1361,9 @@ int mwifiex_update_bss_desc_with_ie(struct mwifiex_adapter *adapter, break; case WLAN_EID_VENDOR_SPECIFIC: + if (element_len + 2 < sizeof(vendor_ie->vend_hdr)) + return -EINVAL; + vendor_ie = (struct ieee_types_vendor_specific *) current_ptr; From cfd4921049269ee6765b4a1cb820b95d0df5dda5 Mon Sep 17 00:00:00 2001 From: Michal Rostecki Date: Wed, 29 May 2019 20:31:09 +0200 Subject: [PATCH 031/145] libbpf: Return btf_fd for load_sk_storage_btf Before this change, function load_sk_storage_btf expected that libbpf__probe_raw_btf was returning a BTF descriptor, but in fact it was returning an information about whether the probe was successful (0 or 1). load_sk_storage_btf was using that value as an argument of the close function, which was resulting in closing stdout and thus terminating the process which called that function. That bug was visible in bpftool. `bpftool feature` subcommand was always exiting too early (because of closed stdout) and it didn't display all requested probes. `bpftool -j feature` or `bpftool -p feature` were not returning a valid json object. This change renames the libbpf__probe_raw_btf function to libbpf__load_raw_btf, which now returns a BTF descriptor, as expected in load_sk_storage_btf. v2: - Fix typo in the commit message. v3: - Simplify BTF descriptor handling in bpf_object__probe_btf_* functions. - Rename libbpf__probe_raw_btf function to libbpf__load_raw_btf and return a BTF descriptor. v4: - Fix typo in the commit message. Fixes: d7c4b3980c18 ("libbpf: detect supported kernel BTF features and sanitize BTF") Signed-off-by: Michal Rostecki Acked-by: Andrii Nakryiko Acked-by: Song Liu Signed-off-by: Alexei Starovoitov --- tools/lib/bpf/libbpf.c | 28 ++++++++++++++++------------ tools/lib/bpf/libbpf_internal.h | 4 ++-- tools/lib/bpf/libbpf_probes.c | 13 ++++--------- 3 files changed, 22 insertions(+), 23 deletions(-) diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c index 197b574406b3..5d046cc7b207 100644 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -1645,14 +1645,16 @@ static int bpf_object__probe_btf_func(struct bpf_object *obj) /* FUNC x */ /* [3] */ BTF_TYPE_ENC(5, BTF_INFO_ENC(BTF_KIND_FUNC, 0, 0), 2), }; - int res; + int btf_fd; - res = libbpf__probe_raw_btf((char *)types, sizeof(types), - strs, sizeof(strs)); - if (res < 0) - return res; - if (res > 0) + btf_fd = libbpf__load_raw_btf((char *)types, sizeof(types), + strs, sizeof(strs)); + if (btf_fd >= 0) { obj->caps.btf_func = 1; + close(btf_fd); + return 1; + } + return 0; } @@ -1670,14 +1672,16 @@ static int bpf_object__probe_btf_datasec(struct bpf_object *obj) BTF_TYPE_ENC(3, BTF_INFO_ENC(BTF_KIND_DATASEC, 0, 1), 4), BTF_VAR_SECINFO_ENC(2, 0, 4), }; - int res; + int btf_fd; - res = libbpf__probe_raw_btf((char *)types, sizeof(types), - strs, sizeof(strs)); - if (res < 0) - return res; - if (res > 0) + btf_fd = libbpf__load_raw_btf((char *)types, sizeof(types), + strs, sizeof(strs)); + if (btf_fd >= 0) { obj->caps.btf_datasec = 1; + close(btf_fd); + return 1; + } + return 0; } diff --git a/tools/lib/bpf/libbpf_internal.h b/tools/lib/bpf/libbpf_internal.h index f3025b4d90e1..dfab8012185c 100644 --- a/tools/lib/bpf/libbpf_internal.h +++ b/tools/lib/bpf/libbpf_internal.h @@ -34,7 +34,7 @@ do { \ #define pr_info(fmt, ...) __pr(LIBBPF_INFO, fmt, ##__VA_ARGS__) #define pr_debug(fmt, ...) __pr(LIBBPF_DEBUG, fmt, ##__VA_ARGS__) -int libbpf__probe_raw_btf(const char *raw_types, size_t types_len, - const char *str_sec, size_t str_len); +int libbpf__load_raw_btf(const char *raw_types, size_t types_len, + const char *str_sec, size_t str_len); #endif /* __LIBBPF_LIBBPF_INTERNAL_H */ diff --git a/tools/lib/bpf/libbpf_probes.c b/tools/lib/bpf/libbpf_probes.c index 5e2aa83f637a..6635a31a7a16 100644 --- a/tools/lib/bpf/libbpf_probes.c +++ b/tools/lib/bpf/libbpf_probes.c @@ -133,8 +133,8 @@ bool bpf_probe_prog_type(enum bpf_prog_type prog_type, __u32 ifindex) return errno != EINVAL && errno != EOPNOTSUPP; } -int libbpf__probe_raw_btf(const char *raw_types, size_t types_len, - const char *str_sec, size_t str_len) +int libbpf__load_raw_btf(const char *raw_types, size_t types_len, + const char *str_sec, size_t str_len) { struct btf_header hdr = { .magic = BTF_MAGIC, @@ -157,14 +157,9 @@ int libbpf__probe_raw_btf(const char *raw_types, size_t types_len, memcpy(raw_btf + hdr.hdr_len + hdr.type_len, str_sec, hdr.str_len); btf_fd = bpf_load_btf(raw_btf, btf_len, NULL, 0, false); - if (btf_fd < 0) { - free(raw_btf); - return 0; - } - close(btf_fd); free(raw_btf); - return 1; + return btf_fd; } static int load_sk_storage_btf(void) @@ -190,7 +185,7 @@ static int load_sk_storage_btf(void) BTF_MEMBER_ENC(23, 2, 32),/* struct bpf_spin_lock l; */ }; - return libbpf__probe_raw_btf((char *)types, sizeof(types), + return libbpf__load_raw_btf((char *)types, sizeof(types), strs, sizeof(strs)); } From 1e692f09e091bf5c8b38384f297d6dae5dbf0f12 Mon Sep 17 00:00:00 2001 From: Luke Nelson Date: Thu, 30 May 2019 15:29:22 -0700 Subject: [PATCH 032/145] bpf, riscv: clear high 32 bits for ALU32 add/sub/neg/lsh/rsh/arsh MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit In BPF, 32-bit ALU operations should zero-extend their results into the 64-bit registers. The current BPF JIT on RISC-V emits incorrect instructions that perform sign extension only (e.g., addw, subw) on 32-bit add, sub, lsh, rsh, arsh, and neg. This behavior diverges from the interpreter and JITs for other architectures. This patch fixes the bugs by performing zero extension on the destination register of 32-bit ALU operations. Fixes: 2353ecc6f91f ("bpf, riscv: add BPF JIT for RV64G") Cc: Xi Wang Signed-off-by: Luke Nelson Acked-by: Song Liu Acked-by: Björn Töpel Reviewed-by: Palmer Dabbelt Signed-off-by: Alexei Starovoitov --- arch/riscv/net/bpf_jit_comp.c | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/arch/riscv/net/bpf_jit_comp.c b/arch/riscv/net/bpf_jit_comp.c index e5c8d675bd6e..426d5c33ea90 100644 --- a/arch/riscv/net/bpf_jit_comp.c +++ b/arch/riscv/net/bpf_jit_comp.c @@ -751,10 +751,14 @@ static int emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx, case BPF_ALU | BPF_ADD | BPF_X: case BPF_ALU64 | BPF_ADD | BPF_X: emit(is64 ? rv_add(rd, rd, rs) : rv_addw(rd, rd, rs), ctx); + if (!is64) + emit_zext_32(rd, ctx); break; case BPF_ALU | BPF_SUB | BPF_X: case BPF_ALU64 | BPF_SUB | BPF_X: emit(is64 ? rv_sub(rd, rd, rs) : rv_subw(rd, rd, rs), ctx); + if (!is64) + emit_zext_32(rd, ctx); break; case BPF_ALU | BPF_AND | BPF_X: case BPF_ALU64 | BPF_AND | BPF_X: @@ -795,14 +799,20 @@ static int emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx, case BPF_ALU | BPF_LSH | BPF_X: case BPF_ALU64 | BPF_LSH | BPF_X: emit(is64 ? rv_sll(rd, rd, rs) : rv_sllw(rd, rd, rs), ctx); + if (!is64) + emit_zext_32(rd, ctx); break; case BPF_ALU | BPF_RSH | BPF_X: case BPF_ALU64 | BPF_RSH | BPF_X: emit(is64 ? rv_srl(rd, rd, rs) : rv_srlw(rd, rd, rs), ctx); + if (!is64) + emit_zext_32(rd, ctx); break; case BPF_ALU | BPF_ARSH | BPF_X: case BPF_ALU64 | BPF_ARSH | BPF_X: emit(is64 ? rv_sra(rd, rd, rs) : rv_sraw(rd, rd, rs), ctx); + if (!is64) + emit_zext_32(rd, ctx); break; /* dst = -dst */ @@ -810,6 +820,8 @@ static int emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx, case BPF_ALU64 | BPF_NEG: emit(is64 ? rv_sub(rd, RV_REG_ZERO, rd) : rv_subw(rd, RV_REG_ZERO, rd), ctx); + if (!is64) + emit_zext_32(rd, ctx); break; /* dst = BSWAP##imm(dst) */ @@ -964,14 +976,20 @@ out_be: case BPF_ALU | BPF_LSH | BPF_K: case BPF_ALU64 | BPF_LSH | BPF_K: emit(is64 ? rv_slli(rd, rd, imm) : rv_slliw(rd, rd, imm), ctx); + if (!is64) + emit_zext_32(rd, ctx); break; case BPF_ALU | BPF_RSH | BPF_K: case BPF_ALU64 | BPF_RSH | BPF_K: emit(is64 ? rv_srli(rd, rd, imm) : rv_srliw(rd, rd, imm), ctx); + if (!is64) + emit_zext_32(rd, ctx); break; case BPF_ALU | BPF_ARSH | BPF_K: case BPF_ALU64 | BPF_ARSH | BPF_K: emit(is64 ? rv_srai(rd, rd, imm) : rv_sraiw(rd, rd, imm), ctx); + if (!is64) + emit_zext_32(rd, ctx); break; /* JUMP off */ From 23f57bfac7c283746ffba5caf4046b152074b2d9 Mon Sep 17 00:00:00 2001 From: Johannes Berg Date: Wed, 29 May 2019 16:39:49 +0300 Subject: [PATCH 033/145] iwlwifi: mvm: remove d3_sram debugfs file This debugfs file is really old, and cannot work properly since the unified image support. Rather than trying to make it work, which is difficult now due to multiple images (LMAC/UMAC etc.) just remove it - we no longer need it since we properly do a FW coredump even in D3 cases. Signed-off-by: Johannes Berg Signed-off-by: Luca Coelho Signed-off-by: Kalle Valo --- drivers/net/wireless/intel/iwlwifi/mvm/d3.c | 22 ------- .../net/wireless/intel/iwlwifi/mvm/debugfs.c | 57 ------------------- drivers/net/wireless/intel/iwlwifi/mvm/mvm.h | 2 - drivers/net/wireless/intel/iwlwifi/mvm/ops.c | 3 - 4 files changed, 84 deletions(-) diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/d3.c b/drivers/net/wireless/intel/iwlwifi/mvm/d3.c index 60f5d337f16d..e7e68fb2bd29 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/d3.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/d3.c @@ -1972,26 +1972,6 @@ out: } } -static void iwl_mvm_read_d3_sram(struct iwl_mvm *mvm) -{ -#ifdef CONFIG_IWLWIFI_DEBUGFS - const struct fw_img *img = &mvm->fw->img[IWL_UCODE_WOWLAN]; - u32 len = img->sec[IWL_UCODE_SECTION_DATA].len; - u32 offs = img->sec[IWL_UCODE_SECTION_DATA].offset; - - if (!mvm->store_d3_resume_sram) - return; - - if (!mvm->d3_resume_sram) { - mvm->d3_resume_sram = kzalloc(len, GFP_KERNEL); - if (!mvm->d3_resume_sram) - return; - } - - iwl_trans_read_mem_bytes(mvm->trans, offs, mvm->d3_resume_sram, len); -#endif -} - static void iwl_mvm_d3_disconnect_iter(void *data, u8 *mac, struct ieee80211_vif *vif) { @@ -2054,8 +2034,6 @@ static int __iwl_mvm_resume(struct iwl_mvm *mvm, bool test) } iwl_fw_dbg_read_d3_debug_data(&mvm->fwrt); - /* query SRAM first in case we want event logging */ - iwl_mvm_read_d3_sram(mvm); if (iwl_mvm_check_rt_status(mvm, vif)) { set_bit(STATUS_FW_ERROR, &mvm->trans->status); diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/debugfs.c b/drivers/net/wireless/intel/iwlwifi/mvm/debugfs.c index d4ff6b44de2c..5b1bb76c5d28 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/debugfs.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/debugfs.c @@ -1557,59 +1557,6 @@ static ssize_t iwl_dbgfs_bcast_filters_macs_write(struct iwl_mvm *mvm, } #endif -#ifdef CONFIG_PM_SLEEP -static ssize_t iwl_dbgfs_d3_sram_write(struct iwl_mvm *mvm, char *buf, - size_t count, loff_t *ppos) -{ - int store; - - if (sscanf(buf, "%d", &store) != 1) - return -EINVAL; - - mvm->store_d3_resume_sram = store; - - return count; -} - -static ssize_t iwl_dbgfs_d3_sram_read(struct file *file, char __user *user_buf, - size_t count, loff_t *ppos) -{ - struct iwl_mvm *mvm = file->private_data; - const struct fw_img *img; - int ofs, len, pos = 0; - size_t bufsz, ret; - char *buf; - u8 *ptr = mvm->d3_resume_sram; - - img = &mvm->fw->img[IWL_UCODE_WOWLAN]; - len = img->sec[IWL_UCODE_SECTION_DATA].len; - - bufsz = len * 4 + 256; - buf = kzalloc(bufsz, GFP_KERNEL); - if (!buf) - return -ENOMEM; - - pos += scnprintf(buf, bufsz, "D3 SRAM capture: %sabled\n", - mvm->store_d3_resume_sram ? "en" : "dis"); - - if (ptr) { - for (ofs = 0; ofs < len; ofs += 16) { - pos += scnprintf(buf + pos, bufsz - pos, - "0x%.4x %16ph\n", ofs, ptr + ofs); - } - } else { - pos += scnprintf(buf + pos, bufsz - pos, - "(no data captured)\n"); - } - - ret = simple_read_from_buffer(user_buf, count, ppos, buf, pos); - - kfree(buf); - - return ret; -} -#endif - #define PRINT_MVM_REF(ref) do { \ if (mvm->refs[ref]) \ pos += scnprintf(buf + pos, bufsz - pos, \ @@ -1940,9 +1887,6 @@ MVM_DEBUGFS_READ_WRITE_FILE_OPS(bcast_filters, 256); MVM_DEBUGFS_READ_WRITE_FILE_OPS(bcast_filters_macs, 256); #endif -#ifdef CONFIG_PM_SLEEP -MVM_DEBUGFS_READ_WRITE_FILE_OPS(d3_sram, 8); -#endif #ifdef CONFIG_ACPI MVM_DEBUGFS_READ_FILE_OPS(sar_geo_profile); #endif @@ -2159,7 +2103,6 @@ void iwl_mvm_dbgfs_register(struct iwl_mvm *mvm, struct dentry *dbgfs_dir) #endif #ifdef CONFIG_PM_SLEEP - MVM_DEBUGFS_ADD_FILE(d3_sram, mvm->debugfs_dir, 0600); MVM_DEBUGFS_ADD_FILE(d3_test, mvm->debugfs_dir, 0400); debugfs_create_bool("d3_wake_sysassert", 0600, mvm->debugfs_dir, &mvm->d3_wake_sysassert); diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/mvm.h b/drivers/net/wireless/intel/iwlwifi/mvm/mvm.h index 8dc2a9850bc5..7b829a5be773 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/mvm.h +++ b/drivers/net/wireless/intel/iwlwifi/mvm/mvm.h @@ -1039,8 +1039,6 @@ struct iwl_mvm { #ifdef CONFIG_IWLWIFI_DEBUGFS bool d3_wake_sysassert; bool d3_test_active; - bool store_d3_resume_sram; - void *d3_resume_sram; u32 d3_test_pme_ptr; struct ieee80211_vif *keep_vif; u32 last_netdetect_scans; /* no. of scans in the last net-detect wake */ diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/ops.c b/drivers/net/wireless/intel/iwlwifi/mvm/ops.c index acd2fda12466..004de67f9157 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/ops.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/ops.c @@ -918,9 +918,6 @@ static void iwl_op_mode_mvm_stop(struct iwl_op_mode *op_mode) kfree(mvm->error_recovery_buf); mvm->error_recovery_buf = NULL; -#if defined(CONFIG_PM_SLEEP) && defined(CONFIG_IWLWIFI_DEBUGFS) - kfree(mvm->d3_resume_sram); -#endif iwl_trans_op_mode_leave(mvm->trans); iwl_phy_db_free(mvm->phy_db); From b3500b472c880b5abe90ffd5c4a25aa736f906ad Mon Sep 17 00:00:00 2001 From: Emmanuel Grumbach Date: Wed, 29 May 2019 16:39:50 +0300 Subject: [PATCH 034/145] iwlwifi: fix load in rfkill flow for unified firmware When we have a single image (same firmware image for INIT and OPERATIONAL), we couldn't load the driver and register to the stack if we had hardware RF-Kill asserted. Fix this. This required a few changes: 1) Run the firmware as part of the INIT phase even if its ucode_type is not IWL_UCODE_INIT. 2) Send the commands that are sent to the unified image in INIT flow even in RF-Kill. 3) Don't ask the transport to stop the hardware upon RF-Kill interrupt if the RF-Kill is asserted. 4) Allow the RF-Kill interrupt to take us out of L1A so that the RF-Kill interrupt will be received by the host (to enable the radio). Signed-off-by: Emmanuel Grumbach Signed-off-by: Luca Coelho Signed-off-by: Kalle Valo --- drivers/net/wireless/intel/iwlwifi/mvm/fw.c | 23 ++++++++++++++----- .../net/wireless/intel/iwlwifi/mvm/mac80211.c | 2 +- drivers/net/wireless/intel/iwlwifi/mvm/mvm.h | 2 +- drivers/net/wireless/intel/iwlwifi/mvm/ops.c | 17 ++++++++++---- .../wireless/intel/iwlwifi/pcie/internal.h | 2 +- 5 files changed, 33 insertions(+), 13 deletions(-) diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c index ab68b5d53ec9..153717587aeb 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c @@ -311,6 +311,8 @@ static int iwl_mvm_load_ucode_wait_alive(struct iwl_mvm *mvm, int ret; enum iwl_ucode_type old_type = mvm->fwrt.cur_fw_img; static const u16 alive_cmd[] = { MVM_ALIVE }; + bool run_in_rfkill = + ucode_type == IWL_UCODE_INIT || iwl_mvm_has_unified_ucode(mvm); if (ucode_type == IWL_UCODE_REGULAR && iwl_fw_dbg_conf_usniffer(mvm->fw, FW_DBG_START_FROM_ALIVE) && @@ -328,7 +330,12 @@ static int iwl_mvm_load_ucode_wait_alive(struct iwl_mvm *mvm, alive_cmd, ARRAY_SIZE(alive_cmd), iwl_alive_fn, &alive_data); - ret = iwl_trans_start_fw(mvm->trans, fw, ucode_type == IWL_UCODE_INIT); + /* + * We want to load the INIT firmware even in RFKILL + * For the unified firmware case, the ucode_type is not + * INIT, but we still need to run it. + */ + ret = iwl_trans_start_fw(mvm->trans, fw, run_in_rfkill); if (ret) { iwl_fw_set_current_image(&mvm->fwrt, old_type); iwl_remove_notification(&mvm->notif_wait, &alive_wait); @@ -433,7 +440,8 @@ static int iwl_run_unified_mvm_ucode(struct iwl_mvm *mvm, bool read_nvm) * commands */ ret = iwl_mvm_send_cmd_pdu(mvm, WIDE_ID(SYSTEM_GROUP, - INIT_EXTENDED_CFG_CMD), 0, + INIT_EXTENDED_CFG_CMD), + CMD_SEND_IN_RFKILL, sizeof(init_cfg), &init_cfg); if (ret) { IWL_ERR(mvm, "Failed to run init config command: %d\n", @@ -457,7 +465,8 @@ static int iwl_run_unified_mvm_ucode(struct iwl_mvm *mvm, bool read_nvm) } ret = iwl_mvm_send_cmd_pdu(mvm, WIDE_ID(REGULATORY_AND_NVM_GROUP, - NVM_ACCESS_COMPLETE), 0, + NVM_ACCESS_COMPLETE), + CMD_SEND_IN_RFKILL, sizeof(nvm_complete), &nvm_complete); if (ret) { IWL_ERR(mvm, "Failed to run complete NVM access: %d\n", @@ -482,6 +491,8 @@ static int iwl_run_unified_mvm_ucode(struct iwl_mvm *mvm, bool read_nvm) } } + mvm->rfkill_safe_init_done = true; + return 0; error: @@ -526,7 +537,7 @@ int iwl_run_init_mvm_ucode(struct iwl_mvm *mvm, bool read_nvm) lockdep_assert_held(&mvm->mutex); - if (WARN_ON_ONCE(mvm->calibrating)) + if (WARN_ON_ONCE(mvm->rfkill_safe_init_done)) return 0; iwl_init_notification_wait(&mvm->notif_wait, @@ -576,7 +587,7 @@ int iwl_run_init_mvm_ucode(struct iwl_mvm *mvm, bool read_nvm) goto remove_notif; } - mvm->calibrating = true; + mvm->rfkill_safe_init_done = true; /* Send TX valid antennas before triggering calibrations */ ret = iwl_send_tx_ant_cfg(mvm, iwl_mvm_get_valid_tx_ant(mvm)); @@ -612,7 +623,7 @@ int iwl_run_init_mvm_ucode(struct iwl_mvm *mvm, bool read_nvm) remove_notif: iwl_remove_notification(&mvm->notif_wait, &calib_wait); out: - mvm->calibrating = false; + mvm->rfkill_safe_init_done = false; if (iwlmvm_mod_params.init_dbg && !mvm->nvm_data) { /* we want to debug INIT and we have no NVM - fake */ mvm->nvm_data = kzalloc(sizeof(struct iwl_nvm_data) + diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c b/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c index 5c52469288be..fdbabca0280e 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c @@ -1209,7 +1209,7 @@ static void iwl_mvm_restart_cleanup(struct iwl_mvm *mvm) mvm->scan_status = 0; mvm->ps_disabled = false; - mvm->calibrating = false; + mvm->rfkill_safe_init_done = false; /* just in case one was running */ iwl_mvm_cleanup_roc_te(mvm); diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/mvm.h b/drivers/net/wireless/intel/iwlwifi/mvm/mvm.h index 7b829a5be773..02efcf2189c4 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/mvm.h +++ b/drivers/net/wireless/intel/iwlwifi/mvm/mvm.h @@ -880,7 +880,7 @@ struct iwl_mvm { struct iwl_mvm_vif *bf_allowed_vif; bool hw_registered; - bool calibrating; + bool rfkill_safe_init_done; bool support_umac_log; u32 ampdu_ref; diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/ops.c b/drivers/net/wireless/intel/iwlwifi/mvm/ops.c index 004de67f9157..fad3bf563712 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/ops.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/ops.c @@ -1209,7 +1209,8 @@ void iwl_mvm_set_hw_ctkill_state(struct iwl_mvm *mvm, bool state) static bool iwl_mvm_set_hw_rfkill_state(struct iwl_op_mode *op_mode, bool state) { struct iwl_mvm *mvm = IWL_OP_MODE_GET_MVM(op_mode); - bool calibrating = READ_ONCE(mvm->calibrating); + bool rfkill_safe_init_done = READ_ONCE(mvm->rfkill_safe_init_done); + bool unified = iwl_mvm_has_unified_ucode(mvm); if (state) set_bit(IWL_MVM_STATUS_HW_RFKILL, &mvm->status); @@ -1218,15 +1219,23 @@ static bool iwl_mvm_set_hw_rfkill_state(struct iwl_op_mode *op_mode, bool state) iwl_mvm_set_rfkill_state(mvm); - /* iwl_run_init_mvm_ucode is waiting for results, abort it */ - if (calibrating) + /* iwl_run_init_mvm_ucode is waiting for results, abort it. */ + if (rfkill_safe_init_done) iwl_abort_notification_waits(&mvm->notif_wait); + /* + * Don't ask the transport to stop the firmware. We'll do it + * after cfg80211 takes us down. + */ + if (unified) + return false; + /* * Stop the device if we run OPERATIONAL firmware or if we are in the * middle of the calibrations. */ - return state && (mvm->fwrt.cur_fw_img != IWL_UCODE_INIT || calibrating); + return state && (mvm->fwrt.cur_fw_img != IWL_UCODE_INIT || + rfkill_safe_init_done); } static void iwl_mvm_free_skb(struct iwl_op_mode *op_mode, struct sk_buff *skb) diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/internal.h b/drivers/net/wireless/intel/iwlwifi/pcie/internal.h index b513037dc066..85973dd57234 100644 --- a/drivers/net/wireless/intel/iwlwifi/pcie/internal.h +++ b/drivers/net/wireless/intel/iwlwifi/pcie/internal.h @@ -928,7 +928,7 @@ static inline void iwl_enable_rfkill_int(struct iwl_trans *trans) MSIX_HW_INT_CAUSES_REG_RF_KILL); } - if (trans->cfg->device_family == IWL_DEVICE_FAMILY_9000) { + if (trans->cfg->device_family >= IWL_DEVICE_FAMILY_9000) { /* * On 9000-series devices this bit isn't enabled by default, so * when we power down the device we need set the bit to allow it From 44f61b5c832c4085fcf476484efeaeef96dcfb8b Mon Sep 17 00:00:00 2001 From: Shahar S Matityahu Date: Wed, 29 May 2019 16:39:51 +0300 Subject: [PATCH 035/145] iwlwifi: clear persistence bit according to device family The driver attempts to clear persistence bit on any device familiy even though only 9000 and 22000 families require it. Clear the bit only on the relevant device families. Each HW has different address to the write protection register. Use the right register for each HW Signed-off-by: Shahar S Matityahu Fixes: 8954e1eb2270 ("iwlwifi: trans: Clear persistence bit when starting the FW") Signed-off-by: Luca Coelho Signed-off-by: Kalle Valo --- drivers/net/wireless/intel/iwlwifi/iwl-prph.h | 7 ++- .../net/wireless/intel/iwlwifi/pcie/trans.c | 48 +++++++++++++------ 2 files changed, 40 insertions(+), 15 deletions(-) diff --git a/drivers/net/wireless/intel/iwlwifi/iwl-prph.h b/drivers/net/wireless/intel/iwlwifi/iwl-prph.h index 8e6a0c363c0d..925f308764bf 100644 --- a/drivers/net/wireless/intel/iwlwifi/iwl-prph.h +++ b/drivers/net/wireless/intel/iwlwifi/iwl-prph.h @@ -408,7 +408,12 @@ enum aux_misc_master1_en { #define AUX_MISC_MASTER1_SMPHR_STATUS 0xA20800 #define RSA_ENABLE 0xA24B08 #define PREG_AUX_BUS_WPROT_0 0xA04CC0 -#define PREG_PRPH_WPROT_0 0xA04CE0 + +/* device family 9000 WPROT register */ +#define PREG_PRPH_WPROT_9000 0xA04CE0 +/* device family 22000 WPROT register */ +#define PREG_PRPH_WPROT_22000 0xA04D00 + #define SB_CPU_1_STATUS 0xA01E30 #define SB_CPU_2_STATUS 0xA01E34 #define UMAG_SB_CPU_1_STATUS 0xA038C0 diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c index 803fcbac4152..e9d1075d91db 100644 --- a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c +++ b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c @@ -1698,10 +1698,40 @@ static int iwl_pcie_init_msix_handler(struct pci_dev *pdev, return 0; } +static int iwl_trans_pcie_clear_persistence_bit(struct iwl_trans *trans) +{ + u32 hpm, wprot; + + switch (trans->cfg->device_family) { + case IWL_DEVICE_FAMILY_9000: + wprot = PREG_PRPH_WPROT_9000; + break; + case IWL_DEVICE_FAMILY_22000: + wprot = PREG_PRPH_WPROT_22000; + break; + default: + return 0; + } + + hpm = iwl_read_umac_prph_no_grab(trans, HPM_DEBUG); + if (hpm != 0xa5a5a5a0 && (hpm & PERSISTENCE_BIT)) { + u32 wprot_val = iwl_read_umac_prph_no_grab(trans, wprot); + + if (wprot_val & PREG_WFPM_ACCESS) { + IWL_ERR(trans, + "Error, can not clear persistence bit\n"); + return -EPERM; + } + iwl_write_umac_prph_no_grab(trans, HPM_DEBUG, + hpm & ~PERSISTENCE_BIT); + } + + return 0; +} + static int _iwl_trans_pcie_start_hw(struct iwl_trans *trans, bool low_power) { struct iwl_trans_pcie *trans_pcie = IWL_TRANS_GET_PCIE_TRANS(trans); - u32 hpm; int err; lockdep_assert_held(&trans_pcie->mutex); @@ -1712,19 +1742,9 @@ static int _iwl_trans_pcie_start_hw(struct iwl_trans *trans, bool low_power) return err; } - hpm = iwl_read_umac_prph_no_grab(trans, HPM_DEBUG); - if (hpm != 0xa5a5a5a0 && (hpm & PERSISTENCE_BIT)) { - int wfpm_val = iwl_read_umac_prph_no_grab(trans, - PREG_PRPH_WPROT_0); - - if (wfpm_val & PREG_WFPM_ACCESS) { - IWL_ERR(trans, - "Error, can not clear persistence bit\n"); - return -EPERM; - } - iwl_write_umac_prph_no_grab(trans, HPM_DEBUG, - hpm & ~PERSISTENCE_BIT); - } + err = iwl_trans_pcie_clear_persistence_bit(trans); + if (err) + return err; iwl_trans_pcie_sw_reset(trans); From cc5470df4495049170d49466415680ee3c2a9a42 Mon Sep 17 00:00:00 2001 From: Shahar S Matityahu Date: Wed, 29 May 2019 16:39:52 +0300 Subject: [PATCH 036/145] iwlwifi: print fseq info upon fw assert Read fseq info from FW registers and print it upon fw assert. The print is needed since the fseq version coming from the TLV might not be the actual version that is used. Signed-off-by: Shahar S Matityahu Signed-off-by: Luca Coelho Signed-off-by: Kalle Valo --- drivers/net/wireless/intel/iwlwifi/fw/dbg.c | 39 +++++++++++++++++++ drivers/net/wireless/intel/iwlwifi/fw/dbg.h | 2 + drivers/net/wireless/intel/iwlwifi/iwl-prph.h | 15 ++++++- .../net/wireless/intel/iwlwifi/mvm/utils.c | 2 + .../net/wireless/intel/iwlwifi/pcie/trans.c | 3 +- 5 files changed, 59 insertions(+), 2 deletions(-) diff --git a/drivers/net/wireless/intel/iwlwifi/fw/dbg.c b/drivers/net/wireless/intel/iwlwifi/fw/dbg.c index 5f52e40a2903..33d7bc5500db 100644 --- a/drivers/net/wireless/intel/iwlwifi/fw/dbg.c +++ b/drivers/net/wireless/intel/iwlwifi/fw/dbg.c @@ -2747,3 +2747,42 @@ void iwl_fw_dbg_periodic_trig_handler(struct timer_list *t) jiffies + msecs_to_jiffies(collect_interval)); } } + +#define FSEQ_REG(x) { .addr = (x), .str = #x, } + +void iwl_fw_error_print_fseq_regs(struct iwl_fw_runtime *fwrt) +{ + struct iwl_trans *trans = fwrt->trans; + unsigned long flags; + int i; + struct { + u32 addr; + const char *str; + } fseq_regs[] = { + FSEQ_REG(FSEQ_ERROR_CODE), + FSEQ_REG(FSEQ_TOP_INIT_VERSION), + FSEQ_REG(FSEQ_CNVIO_INIT_VERSION), + FSEQ_REG(FSEQ_OTP_VERSION), + FSEQ_REG(FSEQ_TOP_CONTENT_VERSION), + FSEQ_REG(FSEQ_ALIVE_TOKEN), + FSEQ_REG(FSEQ_CNVI_ID), + FSEQ_REG(FSEQ_CNVR_ID), + FSEQ_REG(CNVI_AUX_MISC_CHIP), + FSEQ_REG(CNVR_AUX_MISC_CHIP), + FSEQ_REG(CNVR_SCU_SD_REGS_SD_REG_DIG_DCDC_VTRIM), + FSEQ_REG(CNVR_SCU_SD_REGS_SD_REG_ACTIVE_VDIG_MIRROR), + }; + + if (!iwl_trans_grab_nic_access(trans, &flags)) + return; + + IWL_ERR(fwrt, "Fseq Registers:\n"); + + for (i = 0; i < ARRAY_SIZE(fseq_regs); i++) + IWL_ERR(fwrt, "0x%08X | %s\n", + iwl_read_prph_no_grab(trans, fseq_regs[i].addr), + fseq_regs[i].str); + + iwl_trans_release_nic_access(trans, &flags); +} +IWL_EXPORT_SYMBOL(iwl_fw_error_print_fseq_regs); diff --git a/drivers/net/wireless/intel/iwlwifi/fw/dbg.h b/drivers/net/wireless/intel/iwlwifi/fw/dbg.h index 2a9e560a906b..fd0ad220e961 100644 --- a/drivers/net/wireless/intel/iwlwifi/fw/dbg.h +++ b/drivers/net/wireless/intel/iwlwifi/fw/dbg.h @@ -471,4 +471,6 @@ static inline void iwl_fw_error_collect(struct iwl_fw_runtime *fwrt) } void iwl_fw_dbg_periodic_trig_handler(struct timer_list *t); + +void iwl_fw_error_print_fseq_regs(struct iwl_fw_runtime *fwrt); #endif /* __iwl_fw_dbg_h__ */ diff --git a/drivers/net/wireless/intel/iwlwifi/iwl-prph.h b/drivers/net/wireless/intel/iwlwifi/iwl-prph.h index 925f308764bf..8d930bfe0727 100644 --- a/drivers/net/wireless/intel/iwlwifi/iwl-prph.h +++ b/drivers/net/wireless/intel/iwlwifi/iwl-prph.h @@ -395,7 +395,11 @@ enum { WFPM_AUX_CTL_AUX_IF_MAC_OWNER_MSK = 0x80000000, }; -#define AUX_MISC_REG 0xA200B0 +#define CNVI_AUX_MISC_CHIP 0xA200B0 +#define CNVR_AUX_MISC_CHIP 0xA2B800 +#define CNVR_SCU_SD_REGS_SD_REG_DIG_DCDC_VTRIM 0xA29890 +#define CNVR_SCU_SD_REGS_SD_REG_ACTIVE_VDIG_MIRROR 0xA29938 + enum { HW_STEP_LOCATION_BITS = 24, }; @@ -447,4 +451,13 @@ enum { #define UREG_DOORBELL_TO_ISR6 0xA05C04 #define UREG_DOORBELL_TO_ISR6_NMI_BIT BIT(0) + +#define FSEQ_ERROR_CODE 0xA340C8 +#define FSEQ_TOP_INIT_VERSION 0xA34038 +#define FSEQ_CNVIO_INIT_VERSION 0xA3403C +#define FSEQ_OTP_VERSION 0xA340FC +#define FSEQ_TOP_CONTENT_VERSION 0xA340F4 +#define FSEQ_ALIVE_TOKEN 0xA340F0 +#define FSEQ_CNVI_ID 0xA3408C +#define FSEQ_CNVR_ID 0xA34090 #endif /* __iwl_prph_h__ */ diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/utils.c b/drivers/net/wireless/intel/iwlwifi/mvm/utils.c index b9914efc55c4..cc56ab88fb43 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/utils.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/utils.c @@ -596,6 +596,8 @@ void iwl_mvm_dump_nic_error_log(struct iwl_mvm *mvm) iwl_mvm_dump_lmac_error_log(mvm, 1); iwl_mvm_dump_umac_error_log(mvm); + + iwl_fw_error_print_fseq_regs(&mvm->fwrt); } int iwl_mvm_reconfig_scd(struct iwl_mvm *mvm, int queue, int fifo, int sta_id, diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c index e9d1075d91db..21da18af0155 100644 --- a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c +++ b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c @@ -3546,7 +3546,8 @@ struct iwl_trans *iwl_trans_pcie_alloc(struct pci_dev *pdev, hw_step |= ENABLE_WFPM; iwl_write_umac_prph_no_grab(trans, WFPM_CTRL_REG, hw_step); - hw_step = iwl_read_prph_no_grab(trans, AUX_MISC_REG); + hw_step = iwl_read_prph_no_grab(trans, + CNVI_AUX_MISC_CHIP); hw_step = (hw_step >> HW_STEP_LOCATION_BITS) & 0xF; if (hw_step == 0x3) trans->hw_rev = (trans->hw_rev & 0xFFFFFFF3) | From b17dc0632a17fbfe66b34ee7c24e1cc10cfc503e Mon Sep 17 00:00:00 2001 From: Matt Chen Date: Wed, 29 May 2019 16:39:53 +0300 Subject: [PATCH 037/145] iwlwifi: fix AX201 killer sku loading firmware issue When try to bring up the AX201 2 killer sku, we run into: [81261.392463] iwlwifi 0000:01:00.0: loaded firmware version 46.8c20f243.0 op_mode iwlmvm [81261.407407] iwlwifi 0000:01:00.0: Detected Intel(R) Dual Band Wireless AX 22000, REV=0x340 [81262.424778] iwlwifi 0000:01:00.0: Collecting data: trigger 16 fired. [81262.673359] iwlwifi 0000:01:00.0: Start IWL Error Log Dump: [81262.673365] iwlwifi 0000:01:00.0: Status: 0x00000000, count: -906373681 [81262.673368] iwlwifi 0000:01:00.0: Loaded firmware version: 46.8c20f243.0 [81262.673371] iwlwifi 0000:01:00.0: 0x507C015D | ADVANCED_SYSASSERT Fix this issue by adding 2 more cfg to avoid modifying the original cfg configuration. Signed-off-by: Matt Chen Signed-off-by: Luca Coelho Signed-off-by: Kalle Valo --- drivers/net/wireless/intel/iwlwifi/pcie/trans.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c index 21da18af0155..dfa1bed124aa 100644 --- a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c +++ b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c @@ -3598,7 +3598,9 @@ struct iwl_trans *iwl_trans_pcie_alloc(struct pci_dev *pdev, } } else if (CSR_HW_RF_ID_TYPE_CHIP_ID(trans->hw_rf_id) == CSR_HW_RF_ID_TYPE_CHIP_ID(CSR_HW_RF_ID_TYPE_HR) && - (trans->cfg != &iwl_ax200_cfg_cc || + ((trans->cfg != &iwl_ax200_cfg_cc && + trans->cfg != &killer1650x_2ax_cfg && + trans->cfg != &killer1650w_2ax_cfg) || trans->hw_rev == CSR_HW_REV_TYPE_QNJ_B0)) { u32 hw_status; From a8627176b0de7ba3f4524f641ddff4abf23ae4e4 Mon Sep 17 00:00:00 2001 From: Jia-Ju Bai Date: Wed, 29 May 2019 16:39:54 +0300 Subject: [PATCH 038/145] iwlwifi: Fix double-free problems in iwl_req_fw_callback() In the error handling code of iwl_req_fw_callback(), iwl_dealloc_ucode() is called to free data. In iwl_drv_stop(), iwl_dealloc_ucode() is called again, which can cause double-free problems. To fix this bug, the call to iwl_dealloc_ucode() in iwl_req_fw_callback() is deleted. This bug is found by a runtime fuzzing tool named FIZZER written by us. Signed-off-by: Jia-Ju Bai Signed-off-by: Luca Coelho Signed-off-by: Kalle Valo --- drivers/net/wireless/intel/iwlwifi/iwl-drv.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/net/wireless/intel/iwlwifi/iwl-drv.c b/drivers/net/wireless/intel/iwlwifi/iwl-drv.c index 852d3cbfc719..fba242284507 100644 --- a/drivers/net/wireless/intel/iwlwifi/iwl-drv.c +++ b/drivers/net/wireless/intel/iwlwifi/iwl-drv.c @@ -1597,7 +1597,6 @@ static void iwl_req_fw_callback(const struct firmware *ucode_raw, void *context) goto free; out_free_fw: - iwl_dealloc_ucode(drv); release_firmware(ucode_raw); out_unbind: complete(&drv->request_firmware_complete); From 5f4d55d5791a8b7150dbaba239e92719ae0f94d4 Mon Sep 17 00:00:00 2001 From: Lior Cohen Date: Wed, 29 May 2019 16:39:55 +0300 Subject: [PATCH 039/145] iwlwifi: mvm: change TLC config cmd sent by rs to be async The TLC_MNG_CONFIG sync cmd sent by the rs leads to a kernel warning of sleeping while in rcu read-side critical section. The fix is to change the command to be ASYNC (not blocking for the response anymore). Signed-off-by: Lior Cohen Signed-off-by: Luca Coelho Signed-off-by: Kalle Valo --- drivers/net/wireless/intel/iwlwifi/mvm/rs-fw.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/rs-fw.c b/drivers/net/wireless/intel/iwlwifi/mvm/rs-fw.c index 659e21b2d4e7..be62f499c595 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/rs-fw.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/rs-fw.c @@ -441,7 +441,8 @@ void rs_fw_rate_init(struct iwl_mvm *mvm, struct ieee80211_sta *sta, */ sta->max_amsdu_len = max_amsdu_len; - ret = iwl_mvm_send_cmd_pdu(mvm, cmd_id, 0, sizeof(cfg_cmd), &cfg_cmd); + ret = iwl_mvm_send_cmd_pdu(mvm, cmd_id, CMD_ASYNC, sizeof(cfg_cmd), + &cfg_cmd); if (ret) IWL_ERR(mvm, "Failed to send rate scale config (%d)\n", ret); } From 69ae4f6aac1578575126319d3f55550e7e440449 Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Fri, 31 May 2019 15:18:41 +0200 Subject: [PATCH 040/145] mwifiex: Fix heap overflow in mwifiex_uap_parse_tail_ies() A few places in mwifiex_uap_parse_tail_ies() perform memcpy() unconditionally, which may lead to either buffer overflow or read over boundary. This patch addresses the issues by checking the read size and the destination size at each place more properly. Along with the fixes, the patch cleans up the code slightly by introducing a temporary variable for the token size, and unifies the error path with the standard goto statement. Reported-by: huangwen Signed-off-by: Takashi Iwai Signed-off-by: Kalle Valo --- drivers/net/wireless/marvell/mwifiex/ie.c | 47 +++++++++++++++-------- 1 file changed, 31 insertions(+), 16 deletions(-) diff --git a/drivers/net/wireless/marvell/mwifiex/ie.c b/drivers/net/wireless/marvell/mwifiex/ie.c index 6845eb57b39a..653d347a9a19 100644 --- a/drivers/net/wireless/marvell/mwifiex/ie.c +++ b/drivers/net/wireless/marvell/mwifiex/ie.c @@ -329,6 +329,8 @@ static int mwifiex_uap_parse_tail_ies(struct mwifiex_private *priv, struct ieee80211_vendor_ie *vendorhdr; u16 gen_idx = MWIFIEX_AUTO_IDX_MASK, ie_len = 0; int left_len, parsed_len = 0; + unsigned int token_len; + int err = 0; if (!info->tail || !info->tail_len) return 0; @@ -344,6 +346,12 @@ static int mwifiex_uap_parse_tail_ies(struct mwifiex_private *priv, */ while (left_len > sizeof(struct ieee_types_header)) { hdr = (void *)(info->tail + parsed_len); + token_len = hdr->len + sizeof(struct ieee_types_header); + if (token_len > left_len) { + err = -EINVAL; + goto out; + } + switch (hdr->element_id) { case WLAN_EID_SSID: case WLAN_EID_SUPP_RATES: @@ -361,17 +369,20 @@ static int mwifiex_uap_parse_tail_ies(struct mwifiex_private *priv, if (cfg80211_find_vendor_ie(WLAN_OUI_MICROSOFT, WLAN_OUI_TYPE_MICROSOFT_WMM, (const u8 *)hdr, - hdr->len + sizeof(struct ieee_types_header))) + token_len)) break; /* fall through */ default: - memcpy(gen_ie->ie_buffer + ie_len, hdr, - hdr->len + sizeof(struct ieee_types_header)); - ie_len += hdr->len + sizeof(struct ieee_types_header); + if (ie_len + token_len > IEEE_MAX_IE_SIZE) { + err = -EINVAL; + goto out; + } + memcpy(gen_ie->ie_buffer + ie_len, hdr, token_len); + ie_len += token_len; break; } - left_len -= hdr->len + sizeof(struct ieee_types_header); - parsed_len += hdr->len + sizeof(struct ieee_types_header); + left_len -= token_len; + parsed_len += token_len; } /* parse only WPA vendor IE from tail, WMM IE is configured by @@ -381,15 +392,17 @@ static int mwifiex_uap_parse_tail_ies(struct mwifiex_private *priv, WLAN_OUI_TYPE_MICROSOFT_WPA, info->tail, info->tail_len); if (vendorhdr) { - memcpy(gen_ie->ie_buffer + ie_len, vendorhdr, - vendorhdr->len + sizeof(struct ieee_types_header)); - ie_len += vendorhdr->len + sizeof(struct ieee_types_header); + token_len = vendorhdr->len + sizeof(struct ieee_types_header); + if (ie_len + token_len > IEEE_MAX_IE_SIZE) { + err = -EINVAL; + goto out; + } + memcpy(gen_ie->ie_buffer + ie_len, vendorhdr, token_len); + ie_len += token_len; } - if (!ie_len) { - kfree(gen_ie); - return 0; - } + if (!ie_len) + goto out; gen_ie->ie_index = cpu_to_le16(gen_idx); gen_ie->mgmt_subtype_mask = cpu_to_le16(MGMT_MASK_BEACON | @@ -399,13 +412,15 @@ static int mwifiex_uap_parse_tail_ies(struct mwifiex_private *priv, if (mwifiex_update_uap_custom_ie(priv, gen_ie, &gen_idx, NULL, NULL, NULL, NULL)) { - kfree(gen_ie); - return -1; + err = -EINVAL; + goto out; } priv->gen_idx = gen_idx; + + out: kfree(gen_ie); - return 0; + return err; } /* This function parses different IEs-head & tail IEs, beacon IEs, From 4ac30c4b3659efac031818c418beb51e630d512d Mon Sep 17 00:00:00 2001 From: Martin KaFai Lau Date: Fri, 31 May 2019 15:29:11 -0700 Subject: [PATCH 041/145] bpf: udp: ipv6: Avoid running reuseport's bpf_prog from __udp6_lib_err __udp6_lib_err() may be called when handling icmpv6 message. For example, the icmpv6 toobig(type=2). __udp6_lib_lookup() is then called which may call reuseport_select_sock(). reuseport_select_sock() will call into a bpf_prog (if there is one). reuseport_select_sock() is expecting the skb->data pointing to the transport header (udphdr in this case). For example, run_bpf_filter() is pulling the transport header. However, in the __udp6_lib_err() path, the skb->data is pointing to the ipv6hdr instead of the udphdr. One option is to pull and push the ipv6hdr in __udp6_lib_err(). Instead of doing this, this patch follows how the original commit 538950a1b752 ("soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF") was done in IPv4, which has passed a NULL skb pointer to reuseport_select_sock(). Fixes: 538950a1b752 ("soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF") Cc: Craig Gallek Signed-off-by: Martin KaFai Lau Acked-by: Song Liu Acked-by: Craig Gallek Signed-off-by: Alexei Starovoitov --- net/ipv6/udp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c index 07fa579dfb96..133e6370f89c 100644 --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -515,7 +515,7 @@ int __udp6_lib_err(struct sk_buff *skb, struct inet6_skb_parm *opt, struct net *net = dev_net(skb->dev); sk = __udp6_lib_lookup(net, daddr, uh->dest, saddr, uh->source, - inet6_iif(skb), inet6_sdif(skb), udptable, skb); + inet6_iif(skb), inet6_sdif(skb), udptable, NULL); if (!sk) { /* No socket for error: try tunnels before discarding */ sk = ERR_PTR(-ENOENT); From 257a525fe2e49584842c504a92c27097407f778f Mon Sep 17 00:00:00 2001 From: Martin KaFai Lau Date: Fri, 31 May 2019 15:29:13 -0700 Subject: [PATCH 042/145] bpf: udp: Avoid calling reuseport's bpf_prog from udp_gro When the commit a6024562ffd7 ("udp: Add GRO functions to UDP socket") added udp[46]_lib_lookup_skb to the udp_gro code path, it broke the reuseport_select_sock() assumption that skb->data is pointing to the transport header. This patch follows an earlier __udp6_lib_err() fix by passing a NULL skb to avoid calling the reuseport's bpf_prog. Fixes: a6024562ffd7 ("udp: Add GRO functions to UDP socket") Cc: Tom Herbert Signed-off-by: Martin KaFai Lau Acked-by: Song Liu Signed-off-by: Alexei Starovoitov --- net/ipv4/udp.c | 6 +++++- net/ipv6/udp.c | 2 +- 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index 8fb250ed53d4..85db0e3d7f3f 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -503,7 +503,11 @@ static inline struct sock *__udp4_lib_lookup_skb(struct sk_buff *skb, struct sock *udp4_lib_lookup_skb(struct sk_buff *skb, __be16 sport, __be16 dport) { - return __udp4_lib_lookup_skb(skb, sport, dport, &udp_table); + const struct iphdr *iph = ip_hdr(skb); + + return __udp4_lib_lookup(dev_net(skb->dev), iph->saddr, sport, + iph->daddr, dport, inet_iif(skb), + inet_sdif(skb), &udp_table, NULL); } EXPORT_SYMBOL_GPL(udp4_lib_lookup_skb); diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c index 133e6370f89c..4e52c37bb836 100644 --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -243,7 +243,7 @@ struct sock *udp6_lib_lookup_skb(struct sk_buff *skb, return __udp6_lib_lookup(dev_net(skb->dev), &iph->saddr, sport, &iph->daddr, dport, inet6_iif(skb), - inet6_sdif(skb), &udp_table, skb); + inet6_sdif(skb), &udp_table, NULL); } EXPORT_SYMBOL_GPL(udp6_lib_lookup_skb); From 25a7991c84f6cf68cd1ea2ed3ba5674fb9e8c8be Mon Sep 17 00:00:00 2001 From: Hangbin Liu Date: Tue, 4 Jun 2019 10:35:05 +0800 Subject: [PATCH 043/145] selftests/bpf: move test_lirc_mode2_user to TEST_GEN_PROGS_EXTENDED test_lirc_mode2_user is included in test_lirc_mode2.sh test and should not be run directly. Fixes: 6bdd533cee9a ("bpf: add selftest for lirc_mode2 type program") Signed-off-by: Hangbin Liu Signed-off-by: Daniel Borkmann --- tools/testing/selftests/bpf/Makefile | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile index 66f2dca1dee1..e36356e2377e 100644 --- a/tools/testing/selftests/bpf/Makefile +++ b/tools/testing/selftests/bpf/Makefile @@ -21,8 +21,8 @@ LDLIBS += -lcap -lelf -lrt -lpthread # Order correspond to 'make run_tests' order TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test_progs \ test_align test_verifier_log test_dev_cgroup test_tcpbpf_user \ - test_sock test_btf test_sockmap test_lirc_mode2_user get_cgroup_id_user \ - test_socket_cookie test_cgroup_storage test_select_reuseport test_section_names \ + test_sock test_btf test_sockmap get_cgroup_id_user test_socket_cookie \ + test_cgroup_storage test_select_reuseport test_section_names \ test_netcnt test_tcpnotify_user test_sock_fields test_sysctl BPF_OBJ_FILES = $(patsubst %.c,%.o, $(notdir $(wildcard progs/*.c))) @@ -63,7 +63,8 @@ TEST_PROGS_EXTENDED := with_addr.sh \ # Compile but not part of 'make run_tests' TEST_GEN_PROGS_EXTENDED = test_libbpf_open test_sock_addr test_skb_cgroup_id_user \ - flow_dissector_load test_flow_dissector test_tcp_check_syncookie_user + flow_dissector_load test_flow_dissector test_tcp_check_syncookie_user \ + test_lirc_mode2_user include ../lib.mk From 1884c066579a7a274dd981a4d9639ca63db66a23 Mon Sep 17 00:00:00 2001 From: Krzesimir Nowak Date: Wed, 5 Jun 2019 21:17:06 +0200 Subject: [PATCH 044/145] tools: bpftool: Fix JSON output when lookup fails In commit 9a5ab8bf1d6d ("tools: bpftool: turn err() and info() macros into functions") one case of error reporting was special cased, so it could report a lookup error for a specific key when dumping the map element. What the code forgot to do is to wrap the key and value keys into a JSON object, so an example output of pretty JSON dump of a sockhash map (which does not support looking up its values) is: [ "key": ["0x0a","0x41","0x00","0x02","0x1f","0x78","0x00","0x00" ], "value": { "error": "Operation not supported" }, "key": ["0x0a","0x41","0x00","0x02","0x1f","0x78","0x00","0x01" ], "value": { "error": "Operation not supported" } ] Note the key-value pairs inside the toplevel array. They should be wrapped inside a JSON object, otherwise it is an invalid JSON. This commit fixes this, so the output now is: [{ "key": ["0x0a","0x41","0x00","0x02","0x1f","0x78","0x00","0x00" ], "value": { "error": "Operation not supported" } },{ "key": ["0x0a","0x41","0x00","0x02","0x1f","0x78","0x00","0x01" ], "value": { "error": "Operation not supported" } } ] Fixes: 9a5ab8bf1d6d ("tools: bpftool: turn err() and info() macros into functions") Cc: Quentin Monnet Signed-off-by: Krzesimir Nowak Acked-by: Andrii Nakryiko Signed-off-by: Daniel Borkmann --- tools/bpf/bpftool/map.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c index 3ec82904ccec..5da5a7311f13 100644 --- a/tools/bpf/bpftool/map.c +++ b/tools/bpf/bpftool/map.c @@ -716,12 +716,14 @@ static int dump_map_elem(int fd, void *key, void *value, return 0; if (json_output) { + jsonw_start_object(json_wtr); jsonw_name(json_wtr, "key"); print_hex_data_json(key, map_info->key_size); jsonw_name(json_wtr, "value"); jsonw_start_object(json_wtr); jsonw_string_field(json_wtr, "error", strerror(lookup_errno)); jsonw_end_object(json_wtr); + jsonw_end_object(json_wtr); } else { const char *msg = NULL; From 983695fa676568fc0fe5ddd995c7267aabc24632 Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Fri, 7 Jun 2019 01:48:57 +0200 Subject: [PATCH 045/145] bpf: fix unconnected udp hooks Intention of cgroup bind/connect/sendmsg BPF hooks is to act transparently to applications as also stated in original motivation in 7828f20e3779 ("Merge branch 'bpf-cgroup-bind-connect'"). When recently integrating the latter two hooks into Cilium to enable host based load-balancing with Kubernetes, I ran into the issue that pods couldn't start up as DNS got broken. Kubernetes typically sets up DNS as a service and is thus subject to load-balancing. Upon further debugging, it turns out that the cgroupv2 sendmsg BPF hooks API is currently insufficient and thus not usable as-is for standard applications shipped with most distros. To break down the issue we ran into with a simple example: # cat /etc/resolv.conf nameserver 147.75.207.207 nameserver 147.75.207.208 For the purpose of a simple test, we set up above IPs as service IPs and transparently redirect traffic to a different DNS backend server for that node: # cilium service list ID Frontend Backend 1 147.75.207.207:53 1 => 8.8.8.8:53 2 147.75.207.208:53 1 => 8.8.8.8:53 The attached BPF program is basically selecting one of the backends if the service IP/port matches on the cgroup hook. DNS breaks here, because the hooks are not transparent enough to applications which have built-in msg_name address checks: # nslookup 1.1.1.1 ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.207#53 ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.208#53 ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.207#53 [...] ;; connection timed out; no servers could be reached # dig 1.1.1.1 ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.207#53 ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.208#53 ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.207#53 [...] ; <<>> DiG 9.11.3-1ubuntu1.7-Ubuntu <<>> 1.1.1.1 ;; global options: +cmd ;; connection timed out; no servers could be reached For comparison, if none of the service IPs is used, and we tell nslookup to use 8.8.8.8 directly it works just fine, of course: # nslookup 1.1.1.1 8.8.8.8 1.1.1.1.in-addr.arpa name = one.one.one.one. In order to fix this and thus act more transparent to the application, this needs reverse translation on recvmsg() side. A minimal fix for this API is to add similar recvmsg() hooks behind the BPF cgroups static key such that the program can track state and replace the current sockaddr_in{,6} with the original service IP. From BPF side, this basically tracks the service tuple plus socket cookie in an LRU map where the reverse NAT can then be retrieved via map value as one example. Side-note: the BPF cgroups static key should be converted to a per-hook static key in future. Same example after this fix: # cilium service list ID Frontend Backend 1 147.75.207.207:53 1 => 8.8.8.8:53 2 147.75.207.208:53 1 => 8.8.8.8:53 Lookups work fine now: # nslookup 1.1.1.1 1.1.1.1.in-addr.arpa name = one.one.one.one. Authoritative answers can be found from: # dig 1.1.1.1 ; <<>> DiG 9.11.3-1ubuntu1.7-Ubuntu <<>> 1.1.1.1 ;; global options: +cmd ;; Got answer: ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 51550 ;; flags: qr rd ra ad; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1 ;; OPT PSEUDOSECTION: ; EDNS: version: 0, flags:; udp: 512 ;; QUESTION SECTION: ;1.1.1.1. IN A ;; AUTHORITY SECTION: . 23426 IN SOA a.root-servers.net. nstld.verisign-grs.com. 2019052001 1800 900 604800 86400 ;; Query time: 17 msec ;; SERVER: 147.75.207.207#53(147.75.207.207) ;; WHEN: Tue May 21 12:59:38 UTC 2019 ;; MSG SIZE rcvd: 111 And from an actual packet level it shows that we're using the back end server when talking via 147.75.207.20{7,8} front end: # tcpdump -i any udp [...] 12:59:52.698732 IP foo.42011 > google-public-dns-a.google.com.domain: 18803+ PTR? 1.1.1.1.in-addr.arpa. (38) 12:59:52.698735 IP foo.42011 > google-public-dns-a.google.com.domain: 18803+ PTR? 1.1.1.1.in-addr.arpa. (38) 12:59:52.701208 IP google-public-dns-a.google.com.domain > foo.42011: 18803 1/0/0 PTR one.one.one.one. (67) 12:59:52.701208 IP google-public-dns-a.google.com.domain > foo.42011: 18803 1/0/0 PTR one.one.one.one. (67) [...] In order to be flexible and to have same semantics as in sendmsg BPF programs, we only allow return codes in [1,1] range. In the sendmsg case the program is called if msg->msg_name is present which can be the case in both, connected and unconnected UDP. The former only relies on the sockaddr_in{,6} passed via connect(2) if passed msg->msg_name was NULL. Therefore, on recvmsg side, we act in similar way to call into the BPF program whenever a non-NULL msg->msg_name was passed independent of sk->sk_state being TCP_ESTABLISHED or not. Note that for TCP case, the msg->msg_name is ignored in the regular recvmsg path and therefore not relevant. For the case of ip{,v6}_recv_error() paths, picked up via MSG_ERRQUEUE, the hook is not called. This is intentional as it aligns with the same semantics as in case of TCP cgroup BPF hooks right now. This might be better addressed in future through a different bpf_attach_type such that this case can be distinguished from the regular recvmsg paths, for example. Fixes: 1cedee13d25a ("bpf: Hooks for sys_sendmsg") Signed-off-by: Daniel Borkmann Acked-by: Andrey Ignatov Acked-by: Martin KaFai Lau Acked-by: Martynas Pumputis Signed-off-by: Alexei Starovoitov --- include/linux/bpf-cgroup.h | 8 ++++++++ include/uapi/linux/bpf.h | 2 ++ kernel/bpf/syscall.c | 8 ++++++++ kernel/bpf/verifier.c | 12 ++++++++---- net/core/filter.c | 2 ++ net/ipv4/udp.c | 4 ++++ net/ipv6/udp.c | 4 ++++ 7 files changed, 36 insertions(+), 4 deletions(-) diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h index cb3c6b3b89c8..a7f7a98ec39d 100644 --- a/include/linux/bpf-cgroup.h +++ b/include/linux/bpf-cgroup.h @@ -238,6 +238,12 @@ int bpf_percpu_cgroup_storage_update(struct bpf_map *map, void *key, #define BPF_CGROUP_RUN_PROG_UDP6_SENDMSG_LOCK(sk, uaddr, t_ctx) \ BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, BPF_CGROUP_UDP6_SENDMSG, t_ctx) +#define BPF_CGROUP_RUN_PROG_UDP4_RECVMSG_LOCK(sk, uaddr) \ + BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, BPF_CGROUP_UDP4_RECVMSG, NULL) + +#define BPF_CGROUP_RUN_PROG_UDP6_RECVMSG_LOCK(sk, uaddr) \ + BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, BPF_CGROUP_UDP6_RECVMSG, NULL) + #define BPF_CGROUP_RUN_PROG_SOCK_OPS(sock_ops) \ ({ \ int __ret = 0; \ @@ -339,6 +345,8 @@ static inline int bpf_percpu_cgroup_storage_update(struct bpf_map *map, #define BPF_CGROUP_RUN_PROG_INET6_CONNECT_LOCK(sk, uaddr) ({ 0; }) #define BPF_CGROUP_RUN_PROG_UDP4_SENDMSG_LOCK(sk, uaddr, t_ctx) ({ 0; }) #define BPF_CGROUP_RUN_PROG_UDP6_SENDMSG_LOCK(sk, uaddr, t_ctx) ({ 0; }) +#define BPF_CGROUP_RUN_PROG_UDP4_RECVMSG_LOCK(sk, uaddr) ({ 0; }) +#define BPF_CGROUP_RUN_PROG_UDP6_RECVMSG_LOCK(sk, uaddr) ({ 0; }) #define BPF_CGROUP_RUN_PROG_SOCK_OPS(sock_ops) ({ 0; }) #define BPF_CGROUP_RUN_PROG_DEVICE_CGROUP(type,major,minor,access) ({ 0; }) #define BPF_CGROUP_RUN_PROG_SYSCTL(head,table,write,buf,count,pos,nbuf) ({ 0; }) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 63e0cf66f01a..e4114a7e4451 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -192,6 +192,8 @@ enum bpf_attach_type { BPF_LIRC_MODE2, BPF_FLOW_DISSECTOR, BPF_CGROUP_SYSCTL, + BPF_CGROUP_UDP4_RECVMSG, + BPF_CGROUP_UDP6_RECVMSG, __MAX_BPF_ATTACH_TYPE }; diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index cb5440b02e82..e8ba3a153691 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -1581,6 +1581,8 @@ bpf_prog_load_check_attach_type(enum bpf_prog_type prog_type, case BPF_CGROUP_INET6_CONNECT: case BPF_CGROUP_UDP4_SENDMSG: case BPF_CGROUP_UDP6_SENDMSG: + case BPF_CGROUP_UDP4_RECVMSG: + case BPF_CGROUP_UDP6_RECVMSG: return 0; default: return -EINVAL; @@ -1875,6 +1877,8 @@ static int bpf_prog_attach(const union bpf_attr *attr) case BPF_CGROUP_INET6_CONNECT: case BPF_CGROUP_UDP4_SENDMSG: case BPF_CGROUP_UDP6_SENDMSG: + case BPF_CGROUP_UDP4_RECVMSG: + case BPF_CGROUP_UDP6_RECVMSG: ptype = BPF_PROG_TYPE_CGROUP_SOCK_ADDR; break; case BPF_CGROUP_SOCK_OPS: @@ -1960,6 +1964,8 @@ static int bpf_prog_detach(const union bpf_attr *attr) case BPF_CGROUP_INET6_CONNECT: case BPF_CGROUP_UDP4_SENDMSG: case BPF_CGROUP_UDP6_SENDMSG: + case BPF_CGROUP_UDP4_RECVMSG: + case BPF_CGROUP_UDP6_RECVMSG: ptype = BPF_PROG_TYPE_CGROUP_SOCK_ADDR; break; case BPF_CGROUP_SOCK_OPS: @@ -2011,6 +2017,8 @@ static int bpf_prog_query(const union bpf_attr *attr, case BPF_CGROUP_INET6_CONNECT: case BPF_CGROUP_UDP4_SENDMSG: case BPF_CGROUP_UDP6_SENDMSG: + case BPF_CGROUP_UDP4_RECVMSG: + case BPF_CGROUP_UDP6_RECVMSG: case BPF_CGROUP_SOCK_OPS: case BPF_CGROUP_DEVICE: case BPF_CGROUP_SYSCTL: diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 95f9354495ad..d2c8a6677ac4 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -5361,9 +5361,12 @@ static int check_return_code(struct bpf_verifier_env *env) struct tnum range = tnum_range(0, 1); switch (env->prog->type) { + case BPF_PROG_TYPE_CGROUP_SOCK_ADDR: + if (env->prog->expected_attach_type == BPF_CGROUP_UDP4_RECVMSG || + env->prog->expected_attach_type == BPF_CGROUP_UDP6_RECVMSG) + range = tnum_range(1, 1); case BPF_PROG_TYPE_CGROUP_SKB: case BPF_PROG_TYPE_CGROUP_SOCK: - case BPF_PROG_TYPE_CGROUP_SOCK_ADDR: case BPF_PROG_TYPE_SOCK_OPS: case BPF_PROG_TYPE_CGROUP_DEVICE: case BPF_PROG_TYPE_CGROUP_SYSCTL: @@ -5380,16 +5383,17 @@ static int check_return_code(struct bpf_verifier_env *env) } if (!tnum_in(range, reg->var_off)) { + char tn_buf[48]; + verbose(env, "At program exit the register R0 "); if (!tnum_is_unknown(reg->var_off)) { - char tn_buf[48]; - tnum_strn(tn_buf, sizeof(tn_buf), reg->var_off); verbose(env, "has value %s", tn_buf); } else { verbose(env, "has unknown scalar value"); } - verbose(env, " should have been 0 or 1\n"); + tnum_strn(tn_buf, sizeof(tn_buf), range); + verbose(env, " should have been in %s\n", tn_buf); return -EINVAL; } return 0; diff --git a/net/core/filter.c b/net/core/filter.c index fdcc504d2dec..2814d785c110 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -6748,6 +6748,7 @@ static bool sock_addr_is_valid_access(int off, int size, case BPF_CGROUP_INET4_BIND: case BPF_CGROUP_INET4_CONNECT: case BPF_CGROUP_UDP4_SENDMSG: + case BPF_CGROUP_UDP4_RECVMSG: break; default: return false; @@ -6758,6 +6759,7 @@ static bool sock_addr_is_valid_access(int off, int size, case BPF_CGROUP_INET6_BIND: case BPF_CGROUP_INET6_CONNECT: case BPF_CGROUP_UDP6_SENDMSG: + case BPF_CGROUP_UDP6_RECVMSG: break; default: return false; diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index 85db0e3d7f3f..2d862823cbb7 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -1783,6 +1783,10 @@ try_again: sin->sin_addr.s_addr = ip_hdr(skb)->saddr; memset(sin->sin_zero, 0, sizeof(sin->sin_zero)); *addr_len = sizeof(*sin); + + if (cgroup_bpf_enabled) + BPF_CGROUP_RUN_PROG_UDP4_RECVMSG_LOCK(sk, + (struct sockaddr *)sin); } if (udp_sk(sk)->gro_enabled) diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c index 4e52c37bb836..15570d7b9b61 100644 --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -369,6 +369,10 @@ try_again: inet6_iif(skb)); } *addr_len = sizeof(*sin6); + + if (cgroup_bpf_enabled) + BPF_CGROUP_RUN_PROG_UDP6_RECVMSG_LOCK(sk, + (struct sockaddr *)sin6); } if (udp_sk(sk)->gro_enabled) From 3dbc6adac1f3b83fd4c39899c747da7b417e3ffc Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Fri, 7 Jun 2019 01:48:58 +0200 Subject: [PATCH 046/145] bpf: sync tooling uapi header Sync BPF uapi header in order to pull in BPF_CGROUP_UDP{4,6}_RECVMSG attach types. This is done and preferred as an extra patch in order to ease sync of libbpf. Signed-off-by: Daniel Borkmann Acked-by: Andrey Ignatov Acked-by: Martin KaFai Lau Signed-off-by: Alexei Starovoitov --- tools/include/uapi/linux/bpf.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 63e0cf66f01a..e4114a7e4451 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -192,6 +192,8 @@ enum bpf_attach_type { BPF_LIRC_MODE2, BPF_FLOW_DISSECTOR, BPF_CGROUP_SYSCTL, + BPF_CGROUP_UDP4_RECVMSG, + BPF_CGROUP_UDP6_RECVMSG, __MAX_BPF_ATTACH_TYPE }; From 9bb59ac1f6c362f14b58187bc56e737780c52c19 Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Fri, 7 Jun 2019 01:48:59 +0200 Subject: [PATCH 047/145] bpf, libbpf: enable recvmsg attach types Another trivial patch to libbpf in order to enable identifying and attaching programs to BPF_CGROUP_UDP{4,6}_RECVMSG by section name. Signed-off-by: Daniel Borkmann Signed-off-by: Alexei Starovoitov --- tools/lib/bpf/libbpf.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c index 5d046cc7b207..151f7ac1882e 100644 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -3210,6 +3210,10 @@ static const struct { BPF_CGROUP_UDP4_SENDMSG), BPF_EAPROG_SEC("cgroup/sendmsg6", BPF_PROG_TYPE_CGROUP_SOCK_ADDR, BPF_CGROUP_UDP6_SENDMSG), + BPF_EAPROG_SEC("cgroup/recvmsg4", BPF_PROG_TYPE_CGROUP_SOCK_ADDR, + BPF_CGROUP_UDP4_RECVMSG), + BPF_EAPROG_SEC("cgroup/recvmsg6", BPF_PROG_TYPE_CGROUP_SOCK_ADDR, + BPF_CGROUP_UDP6_RECVMSG), BPF_EAPROG_SEC("cgroup/sysctl", BPF_PROG_TYPE_CGROUP_SYSCTL, BPF_CGROUP_SYSCTL), }; From 000aa1250d572171807b47fb9cd3fadfbcc36ad0 Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Fri, 7 Jun 2019 01:49:00 +0200 Subject: [PATCH 048/145] bpf, bpftool: enable recvmsg attach types Trivial patch to bpftool in order to complete enabling attaching programs to BPF_CGROUP_UDP{4,6}_RECVMSG. Signed-off-by: Daniel Borkmann Acked-by: Andrey Ignatov Acked-by: Martin KaFai Lau Signed-off-by: Alexei Starovoitov --- tools/bpf/bpftool/Documentation/bpftool-cgroup.rst | 6 +++++- tools/bpf/bpftool/Documentation/bpftool-prog.rst | 2 +- tools/bpf/bpftool/bash-completion/bpftool | 5 +++-- tools/bpf/bpftool/cgroup.c | 5 ++++- tools/bpf/bpftool/prog.c | 3 ++- 5 files changed, 15 insertions(+), 6 deletions(-) diff --git a/tools/bpf/bpftool/Documentation/bpftool-cgroup.rst b/tools/bpf/bpftool/Documentation/bpftool-cgroup.rst index ac26876389c2..e744b3e4e56a 100644 --- a/tools/bpf/bpftool/Documentation/bpftool-cgroup.rst +++ b/tools/bpf/bpftool/Documentation/bpftool-cgroup.rst @@ -29,7 +29,7 @@ CGROUP COMMANDS | *PROG* := { **id** *PROG_ID* | **pinned** *FILE* | **tag** *PROG_TAG* } | *ATTACH_TYPE* := { **ingress** | **egress** | **sock_create** | **sock_ops** | **device** | | **bind4** | **bind6** | **post_bind4** | **post_bind6** | **connect4** | **connect6** | -| **sendmsg4** | **sendmsg6** | **sysctl** } +| **sendmsg4** | **sendmsg6** | **recvmsg4** | **recvmsg6** | **sysctl** } | *ATTACH_FLAGS* := { **multi** | **override** } DESCRIPTION @@ -86,6 +86,10 @@ DESCRIPTION unconnected udp4 socket (since 4.18); **sendmsg6** call to sendto(2), sendmsg(2), sendmmsg(2) for an unconnected udp6 socket (since 4.18); + **recvmsg4** call to recvfrom(2), recvmsg(2), recvmmsg(2) for + an unconnected udp4 socket (since 5.2); + **recvmsg6** call to recvfrom(2), recvmsg(2), recvmmsg(2) for + an unconnected udp6 socket (since 5.2); **sysctl** sysctl access (since 5.2). **bpftool cgroup detach** *CGROUP* *ATTACH_TYPE* *PROG* diff --git a/tools/bpf/bpftool/Documentation/bpftool-prog.rst b/tools/bpf/bpftool/Documentation/bpftool-prog.rst index e8118544d118..018ecef8dc13 100644 --- a/tools/bpf/bpftool/Documentation/bpftool-prog.rst +++ b/tools/bpf/bpftool/Documentation/bpftool-prog.rst @@ -40,7 +40,7 @@ PROG COMMANDS | **lwt_seg6local** | **sockops** | **sk_skb** | **sk_msg** | **lirc_mode2** | | **cgroup/bind4** | **cgroup/bind6** | **cgroup/post_bind4** | **cgroup/post_bind6** | | **cgroup/connect4** | **cgroup/connect6** | **cgroup/sendmsg4** | **cgroup/sendmsg6** | -| **cgroup/sysctl** +| **cgroup/recvmsg4** | **cgroup/recvmsg6** | **cgroup/sysctl** | } | *ATTACH_TYPE* := { | **msg_verdict** | **stream_verdict** | **stream_parser** | **flow_dissector** diff --git a/tools/bpf/bpftool/bash-completion/bpftool b/tools/bpf/bpftool/bash-completion/bpftool index 50e402a5a9c8..4300adf6e5ab 100644 --- a/tools/bpf/bpftool/bash-completion/bpftool +++ b/tools/bpf/bpftool/bash-completion/bpftool @@ -371,6 +371,7 @@ _bpftool() lirc_mode2 cgroup/bind4 cgroup/bind6 \ cgroup/connect4 cgroup/connect6 \ cgroup/sendmsg4 cgroup/sendmsg6 \ + cgroup/recvmsg4 cgroup/recvmsg6 \ cgroup/post_bind4 cgroup/post_bind6 \ cgroup/sysctl" -- \ "$cur" ) ) @@ -666,7 +667,7 @@ _bpftool() attach|detach) local ATTACH_TYPES='ingress egress sock_create sock_ops \ device bind4 bind6 post_bind4 post_bind6 connect4 \ - connect6 sendmsg4 sendmsg6 sysctl' + connect6 sendmsg4 sendmsg6 recvmsg4 recvmsg6 sysctl' local ATTACH_FLAGS='multi override' local PROG_TYPE='id pinned tag' case $prev in @@ -676,7 +677,7 @@ _bpftool() ;; ingress|egress|sock_create|sock_ops|device|bind4|bind6|\ post_bind4|post_bind6|connect4|connect6|sendmsg4|\ - sendmsg6|sysctl) + sendmsg6|recvmsg4|recvmsg6|sysctl) COMPREPLY=( $( compgen -W "$PROG_TYPE" -- \ "$cur" ) ) return 0 diff --git a/tools/bpf/bpftool/cgroup.c b/tools/bpf/bpftool/cgroup.c index 7e22f115c8c1..73ec8ea33fb4 100644 --- a/tools/bpf/bpftool/cgroup.c +++ b/tools/bpf/bpftool/cgroup.c @@ -25,7 +25,8 @@ " ATTACH_TYPE := { ingress | egress | sock_create |\n" \ " sock_ops | device | bind4 | bind6 |\n" \ " post_bind4 | post_bind6 | connect4 |\n" \ - " connect6 | sendmsg4 | sendmsg6 | sysctl }" + " connect6 | sendmsg4 | sendmsg6 |\n" \ + " recvmsg4 | recvmsg6 | sysctl }" static const char * const attach_type_strings[] = { [BPF_CGROUP_INET_INGRESS] = "ingress", @@ -42,6 +43,8 @@ static const char * const attach_type_strings[] = { [BPF_CGROUP_UDP4_SENDMSG] = "sendmsg4", [BPF_CGROUP_UDP6_SENDMSG] = "sendmsg6", [BPF_CGROUP_SYSCTL] = "sysctl", + [BPF_CGROUP_UDP4_RECVMSG] = "recvmsg4", + [BPF_CGROUP_UDP6_RECVMSG] = "recvmsg6", [__MAX_BPF_ATTACH_TYPE] = NULL, }; diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c index 26336bad0442..7a4e21a31523 100644 --- a/tools/bpf/bpftool/prog.c +++ b/tools/bpf/bpftool/prog.c @@ -1063,7 +1063,8 @@ static int do_help(int argc, char **argv) " sk_reuseport | flow_dissector | cgroup/sysctl |\n" " cgroup/bind4 | cgroup/bind6 | cgroup/post_bind4 |\n" " cgroup/post_bind6 | cgroup/connect4 | cgroup/connect6 |\n" - " cgroup/sendmsg4 | cgroup/sendmsg6 }\n" + " cgroup/sendmsg4 | cgroup/sendmsg6 | cgroup/recvmsg4 |\n" + " cgroup/recvmsg6 }\n" " ATTACH_TYPE := { msg_verdict | stream_verdict | stream_parser |\n" " flow_dissector }\n" " " HELP_SPEC_OPTIONS "\n" From 1812291e7661673cc29f52b38d9cc39540dee08e Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Fri, 7 Jun 2019 01:49:01 +0200 Subject: [PATCH 049/145] bpf: more msg_name rewrite tests to test_sock_addr Extend test_sock_addr for recvmsg test cases, bigger parts of the sendmsg code can be reused for this. Below are the strace view of the recvmsg rewrites; the sendmsg side does not have a BPF prog connected to it for the context of this test: IPv4 test case: [pid 4846] bpf(BPF_PROG_ATTACH, {target_fd=3, attach_bpf_fd=4, attach_type=0x13 /* BPF_??? */, attach_flags=BPF_F_ALLOW_OVERRIDE}, 112) = 0 [pid 4846] socket(AF_INET, SOCK_DGRAM, IPPROTO_IP) = 5 [pid 4846] bind(5, {sa_family=AF_INET, sin_port=htons(4444), sin_addr=inet_addr("127.0.0.1")}, 128) = 0 [pid 4846] socket(AF_INET, SOCK_DGRAM, IPPROTO_IP) = 6 [pid 4846] sendmsg(6, {msg_name={sa_family=AF_INET, sin_port=htons(4444), sin_addr=inet_addr("127.0.0.1")}, msg_namelen=128, msg_iov=[{iov_base="a", iov_len=1}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 1 [pid 4846] select(6, [5], NULL, NULL, {tv_sec=2, tv_usec=0}) = 1 (in [5], left {tv_sec=1, tv_usec=999995}) [pid 4846] recvmsg(5, {msg_name={sa_family=AF_INET, sin_port=htons(4040), sin_addr=inet_addr("192.168.1.254")}, msg_namelen=128->16, msg_iov=[{iov_base="a", iov_len=64}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 1 [pid 4846] close(6) = 0 [pid 4846] close(5) = 0 [pid 4846] bpf(BPF_PROG_DETACH, {target_fd=3, attach_type=0x13 /* BPF_??? */}, 112) = 0 IPv6 test case: [pid 4846] bpf(BPF_PROG_ATTACH, {target_fd=3, attach_bpf_fd=4, attach_type=0x14 /* BPF_??? */, attach_flags=BPF_F_ALLOW_OVERRIDE}, 112) = 0 [pid 4846] socket(AF_INET6, SOCK_DGRAM, IPPROTO_IP) = 5 [pid 4846] bind(5, {sa_family=AF_INET6, sin6_port=htons(6666), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, 128) = 0 [pid 4846] socket(AF_INET6, SOCK_DGRAM, IPPROTO_IP) = 6 [pid 4846] sendmsg(6, {msg_name={sa_family=AF_INET6, sin6_port=htons(6666), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, msg_namelen=128, msg_iov=[{iov_base="a", iov_len=1}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 1 [pid 4846] select(6, [5], NULL, NULL, {tv_sec=2, tv_usec=0}) = 1 (in [5], left {tv_sec=1, tv_usec=999996}) [pid 4846] recvmsg(5, {msg_name={sa_family=AF_INET6, sin6_port=htons(6060), inet_pton(AF_INET6, "face:b00c:1234:5678::abcd", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, msg_namelen=128->28, msg_iov=[{iov_base="a", iov_len=64}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 1 [pid 4846] close(6) = 0 [pid 4846] close(5) = 0 [pid 4846] bpf(BPF_PROG_DETACH, {target_fd=3, attach_type=0x14 /* BPF_??? */}, 112) = 0 test_sock_addr run w/o strace view: # ./test_sock_addr.sh [...] Test case: recvmsg4: return code ok .. [PASS] Test case: recvmsg4: return code !ok .. [PASS] Test case: recvmsg6: return code ok .. [PASS] Test case: recvmsg6: return code !ok .. [PASS] Test case: recvmsg4: rewrite IP & port (asm) .. [PASS] Test case: recvmsg6: rewrite IP & port (asm) .. [PASS] [...] Signed-off-by: Daniel Borkmann Acked-by: Andrey Ignatov Acked-by: Martin KaFai Lau Signed-off-by: Alexei Starovoitov --- tools/testing/selftests/bpf/test_sock_addr.c | 213 +++++++++++++++++-- 1 file changed, 197 insertions(+), 16 deletions(-) diff --git a/tools/testing/selftests/bpf/test_sock_addr.c b/tools/testing/selftests/bpf/test_sock_addr.c index 3f110eaaf29c..4ecde2392327 100644 --- a/tools/testing/selftests/bpf/test_sock_addr.c +++ b/tools/testing/selftests/bpf/test_sock_addr.c @@ -76,6 +76,7 @@ struct sock_addr_test { enum { LOAD_REJECT, ATTACH_REJECT, + ATTACH_OKAY, SYSCALL_EPERM, SYSCALL_ENOTSUPP, SUCCESS, @@ -88,9 +89,13 @@ static int connect4_prog_load(const struct sock_addr_test *test); static int connect6_prog_load(const struct sock_addr_test *test); static int sendmsg_allow_prog_load(const struct sock_addr_test *test); static int sendmsg_deny_prog_load(const struct sock_addr_test *test); +static int recvmsg_allow_prog_load(const struct sock_addr_test *test); +static int recvmsg_deny_prog_load(const struct sock_addr_test *test); static int sendmsg4_rw_asm_prog_load(const struct sock_addr_test *test); +static int recvmsg4_rw_asm_prog_load(const struct sock_addr_test *test); static int sendmsg4_rw_c_prog_load(const struct sock_addr_test *test); static int sendmsg6_rw_asm_prog_load(const struct sock_addr_test *test); +static int recvmsg6_rw_asm_prog_load(const struct sock_addr_test *test); static int sendmsg6_rw_c_prog_load(const struct sock_addr_test *test); static int sendmsg6_rw_v4mapped_prog_load(const struct sock_addr_test *test); static int sendmsg6_rw_wildcard_prog_load(const struct sock_addr_test *test); @@ -507,6 +512,92 @@ static struct sock_addr_test tests[] = { SRC6_REWRITE_IP, SYSCALL_EPERM, }, + + /* recvmsg */ + { + "recvmsg4: return code ok", + recvmsg_allow_prog_load, + BPF_CGROUP_UDP4_RECVMSG, + BPF_CGROUP_UDP4_RECVMSG, + AF_INET, + SOCK_DGRAM, + NULL, + 0, + NULL, + 0, + NULL, + ATTACH_OKAY, + }, + { + "recvmsg4: return code !ok", + recvmsg_deny_prog_load, + BPF_CGROUP_UDP4_RECVMSG, + BPF_CGROUP_UDP4_RECVMSG, + AF_INET, + SOCK_DGRAM, + NULL, + 0, + NULL, + 0, + NULL, + LOAD_REJECT, + }, + { + "recvmsg6: return code ok", + recvmsg_allow_prog_load, + BPF_CGROUP_UDP6_RECVMSG, + BPF_CGROUP_UDP6_RECVMSG, + AF_INET6, + SOCK_DGRAM, + NULL, + 0, + NULL, + 0, + NULL, + ATTACH_OKAY, + }, + { + "recvmsg6: return code !ok", + recvmsg_deny_prog_load, + BPF_CGROUP_UDP6_RECVMSG, + BPF_CGROUP_UDP6_RECVMSG, + AF_INET6, + SOCK_DGRAM, + NULL, + 0, + NULL, + 0, + NULL, + LOAD_REJECT, + }, + { + "recvmsg4: rewrite IP & port (asm)", + recvmsg4_rw_asm_prog_load, + BPF_CGROUP_UDP4_RECVMSG, + BPF_CGROUP_UDP4_RECVMSG, + AF_INET, + SOCK_DGRAM, + SERV4_REWRITE_IP, + SERV4_REWRITE_PORT, + SERV4_REWRITE_IP, + SERV4_REWRITE_PORT, + SERV4_IP, + SUCCESS, + }, + { + "recvmsg6: rewrite IP & port (asm)", + recvmsg6_rw_asm_prog_load, + BPF_CGROUP_UDP6_RECVMSG, + BPF_CGROUP_UDP6_RECVMSG, + AF_INET6, + SOCK_DGRAM, + SERV6_REWRITE_IP, + SERV6_REWRITE_PORT, + SERV6_REWRITE_IP, + SERV6_REWRITE_PORT, + SERV6_IP, + SUCCESS, + }, }; static int mk_sockaddr(int domain, const char *ip, unsigned short port, @@ -765,8 +856,8 @@ static int connect6_prog_load(const struct sock_addr_test *test) return load_path(test, CONNECT6_PROG_PATH); } -static int sendmsg_ret_only_prog_load(const struct sock_addr_test *test, - int32_t rc) +static int xmsg_ret_only_prog_load(const struct sock_addr_test *test, + int32_t rc) { struct bpf_insn insns[] = { /* return rc */ @@ -778,12 +869,22 @@ static int sendmsg_ret_only_prog_load(const struct sock_addr_test *test, static int sendmsg_allow_prog_load(const struct sock_addr_test *test) { - return sendmsg_ret_only_prog_load(test, /*rc*/ 1); + return xmsg_ret_only_prog_load(test, /*rc*/ 1); } static int sendmsg_deny_prog_load(const struct sock_addr_test *test) { - return sendmsg_ret_only_prog_load(test, /*rc*/ 0); + return xmsg_ret_only_prog_load(test, /*rc*/ 0); +} + +static int recvmsg_allow_prog_load(const struct sock_addr_test *test) +{ + return xmsg_ret_only_prog_load(test, /*rc*/ 1); +} + +static int recvmsg_deny_prog_load(const struct sock_addr_test *test) +{ + return xmsg_ret_only_prog_load(test, /*rc*/ 0); } static int sendmsg4_rw_asm_prog_load(const struct sock_addr_test *test) @@ -838,6 +939,47 @@ static int sendmsg4_rw_asm_prog_load(const struct sock_addr_test *test) return load_insns(test, insns, sizeof(insns) / sizeof(struct bpf_insn)); } +static int recvmsg4_rw_asm_prog_load(const struct sock_addr_test *test) +{ + struct sockaddr_in src4_rw_addr; + + if (mk_sockaddr(AF_INET, SERV4_IP, SERV4_PORT, + (struct sockaddr *)&src4_rw_addr, + sizeof(src4_rw_addr)) == -1) + return -1; + + struct bpf_insn insns[] = { + BPF_MOV64_REG(BPF_REG_6, BPF_REG_1), + + /* if (sk.family == AF_INET && */ + BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_6, + offsetof(struct bpf_sock_addr, family)), + BPF_JMP_IMM(BPF_JNE, BPF_REG_7, AF_INET, 6), + + /* sk.type == SOCK_DGRAM) { */ + BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_6, + offsetof(struct bpf_sock_addr, type)), + BPF_JMP_IMM(BPF_JNE, BPF_REG_7, SOCK_DGRAM, 4), + + /* user_ip4 = src4_rw_addr.sin_addr */ + BPF_MOV32_IMM(BPF_REG_7, src4_rw_addr.sin_addr.s_addr), + BPF_STX_MEM(BPF_W, BPF_REG_6, BPF_REG_7, + offsetof(struct bpf_sock_addr, user_ip4)), + + /* user_port = src4_rw_addr.sin_port */ + BPF_MOV32_IMM(BPF_REG_7, src4_rw_addr.sin_port), + BPF_STX_MEM(BPF_W, BPF_REG_6, BPF_REG_7, + offsetof(struct bpf_sock_addr, user_port)), + /* } */ + + /* return 1 */ + BPF_MOV64_IMM(BPF_REG_0, 1), + BPF_EXIT_INSN(), + }; + + return load_insns(test, insns, sizeof(insns) / sizeof(struct bpf_insn)); +} + static int sendmsg4_rw_c_prog_load(const struct sock_addr_test *test) { return load_path(test, SENDMSG4_PROG_PATH); @@ -901,6 +1043,39 @@ static int sendmsg6_rw_asm_prog_load(const struct sock_addr_test *test) return sendmsg6_rw_dst_asm_prog_load(test, SERV6_REWRITE_IP); } +static int recvmsg6_rw_asm_prog_load(const struct sock_addr_test *test) +{ + struct sockaddr_in6 src6_rw_addr; + + if (mk_sockaddr(AF_INET6, SERV6_IP, SERV6_PORT, + (struct sockaddr *)&src6_rw_addr, + sizeof(src6_rw_addr)) == -1) + return -1; + + struct bpf_insn insns[] = { + BPF_MOV64_REG(BPF_REG_6, BPF_REG_1), + + /* if (sk.family == AF_INET6) { */ + BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_6, + offsetof(struct bpf_sock_addr, family)), + BPF_JMP_IMM(BPF_JNE, BPF_REG_7, AF_INET6, 10), + + STORE_IPV6(user_ip6, src6_rw_addr.sin6_addr.s6_addr32), + + /* user_port = dst6_rw_addr.sin6_port */ + BPF_MOV32_IMM(BPF_REG_7, src6_rw_addr.sin6_port), + BPF_STX_MEM(BPF_W, BPF_REG_6, BPF_REG_7, + offsetof(struct bpf_sock_addr, user_port)), + /* } */ + + /* return 1 */ + BPF_MOV64_IMM(BPF_REG_0, 1), + BPF_EXIT_INSN(), + }; + + return load_insns(test, insns, sizeof(insns) / sizeof(struct bpf_insn)); +} + static int sendmsg6_rw_v4mapped_prog_load(const struct sock_addr_test *test) { return sendmsg6_rw_dst_asm_prog_load(test, SERV6_V4MAPPED_IP); @@ -1282,13 +1457,13 @@ out: return err; } -static int run_sendmsg_test_case(const struct sock_addr_test *test) +static int run_xmsg_test_case(const struct sock_addr_test *test, int max_cmsg) { socklen_t addr_len = sizeof(struct sockaddr_storage); - struct sockaddr_storage expected_src_addr; - struct sockaddr_storage requested_addr; struct sockaddr_storage expected_addr; - struct sockaddr_storage real_src_addr; + struct sockaddr_storage server_addr; + struct sockaddr_storage sendmsg_addr; + struct sockaddr_storage recvmsg_addr; int clientfd = -1; int servfd = -1; int set_cmsg; @@ -1297,20 +1472,19 @@ static int run_sendmsg_test_case(const struct sock_addr_test *test) if (test->type != SOCK_DGRAM) goto err; - if (init_addrs(test, &requested_addr, &expected_addr, - &expected_src_addr)) + if (init_addrs(test, &sendmsg_addr, &server_addr, &expected_addr)) goto err; /* Prepare server to sendmsg to */ - servfd = start_server(test->type, &expected_addr, addr_len); + servfd = start_server(test->type, &server_addr, addr_len); if (servfd == -1) goto err; - for (set_cmsg = 0; set_cmsg <= 1; ++set_cmsg) { + for (set_cmsg = 0; set_cmsg <= max_cmsg; ++set_cmsg) { if (clientfd >= 0) close(clientfd); - clientfd = sendmsg_to_server(test->type, &requested_addr, + clientfd = sendmsg_to_server(test->type, &sendmsg_addr, addr_len, set_cmsg, /*flags*/0, &err); if (err) @@ -1330,10 +1504,10 @@ static int run_sendmsg_test_case(const struct sock_addr_test *test) * specific packet may differ from the one used by default and * returned by getsockname(2). */ - if (recvmsg_from_client(servfd, &real_src_addr) == -1) + if (recvmsg_from_client(servfd, &recvmsg_addr) == -1) goto err; - if (cmp_addr(&real_src_addr, &expected_src_addr, /*cmp_port*/0)) + if (cmp_addr(&recvmsg_addr, &expected_addr, /*cmp_port*/0)) goto err; } @@ -1366,6 +1540,9 @@ static int run_test_case(int cgfd, const struct sock_addr_test *test) goto out; } else if (test->expected_result == ATTACH_REJECT || err) { goto err; + } else if (test->expected_result == ATTACH_OKAY) { + err = 0; + goto out; } switch (test->attach_type) { @@ -1379,7 +1556,11 @@ static int run_test_case(int cgfd, const struct sock_addr_test *test) break; case BPF_CGROUP_UDP4_SENDMSG: case BPF_CGROUP_UDP6_SENDMSG: - err = run_sendmsg_test_case(test); + err = run_xmsg_test_case(test, 1); + break; + case BPF_CGROUP_UDP4_RECVMSG: + case BPF_CGROUP_UDP6_RECVMSG: + err = run_xmsg_test_case(test, 0); break; default: goto err; From b714560f7b38de9f03b8670890ba130d4cc5604e Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Fri, 7 Jun 2019 01:49:02 +0200 Subject: [PATCH 050/145] bpf: expand section tests for test_section_names Add cgroup/recvmsg{4,6} to test_section_names as well. Test run output: # ./test_section_names libbpf: failed to guess program type based on ELF section name 'InvAliD' libbpf: supported section(type) names are: [...] libbpf: failed to guess attach type based on ELF section name 'InvAliD' libbpf: attachable section(type) names are: [...] libbpf: failed to guess program type based on ELF section name 'cgroup' libbpf: supported section(type) names are: [...] libbpf: failed to guess attach type based on ELF section name 'cgroup' libbpf: attachable section(type) names are: [...] Summary: 38 PASSED, 0 FAILED Signed-off-by: Daniel Borkmann Signed-off-by: Alexei Starovoitov --- tools/testing/selftests/bpf/test_section_names.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/tools/testing/selftests/bpf/test_section_names.c b/tools/testing/selftests/bpf/test_section_names.c index bebd4fbca1f4..dee2f2eceb0f 100644 --- a/tools/testing/selftests/bpf/test_section_names.c +++ b/tools/testing/selftests/bpf/test_section_names.c @@ -119,6 +119,16 @@ static struct sec_name_test tests[] = { {0, BPF_PROG_TYPE_CGROUP_SOCK_ADDR, BPF_CGROUP_UDP6_SENDMSG}, {0, BPF_CGROUP_UDP6_SENDMSG}, }, + { + "cgroup/recvmsg4", + {0, BPF_PROG_TYPE_CGROUP_SOCK_ADDR, BPF_CGROUP_UDP4_RECVMSG}, + {0, BPF_CGROUP_UDP4_RECVMSG}, + }, + { + "cgroup/recvmsg6", + {0, BPF_PROG_TYPE_CGROUP_SOCK_ADDR, BPF_CGROUP_UDP6_RECVMSG}, + {0, BPF_CGROUP_UDP6_RECVMSG}, + }, { "cgroup/sysctl", {0, BPF_PROG_TYPE_CGROUP_SYSCTL, BPF_CGROUP_SYSCTL}, From 0ed89d777dd6ed3110f1c64c898f84b4ce685e57 Mon Sep 17 00:00:00 2001 From: Alexander Dahl Date: Tue, 22 Jan 2019 14:55:42 +0100 Subject: [PATCH 051/145] can: usb: Kconfig: Remove duplicate menu entry This seems to have slipped in by accident when sorting the entries. Fixes: ffbdd9172ee2f53020f763574b4cdad8d9760a4f Signed-off-by: Alexander Dahl Signed-off-by: Marc Kleine-Budde --- drivers/net/can/usb/Kconfig | 6 ------ 1 file changed, 6 deletions(-) diff --git a/drivers/net/can/usb/Kconfig b/drivers/net/can/usb/Kconfig index ac3522b77303..4b3d0ddcda79 100644 --- a/drivers/net/can/usb/Kconfig +++ b/drivers/net/can/usb/Kconfig @@ -102,12 +102,6 @@ config CAN_PEAK_USB (see also http://www.peak-system.com). -config CAN_MCBA_USB - tristate "Microchip CAN BUS Analyzer interface" - ---help--- - This driver supports the CAN BUS Analyzer interface - from Microchip (http://www.microchip.com/development-tools/). - config CAN_UCAN tristate "Theobroma Systems UCAN interface" ---help--- From 247e5356a709eb49a0d95ff2a7f07dac05c8252c Mon Sep 17 00:00:00 2001 From: Joakim Zhang Date: Thu, 31 Jan 2019 09:37:22 +0000 Subject: [PATCH 052/145] can: flexcan: fix timeout when set small bitrate Current we can meet timeout issue when setting a small bitrate like 10000 as follows on i.MX6UL EVK board (ipg clock = 66MHZ, per clock = 30MHZ): | root@imx6ul7d:~# ip link set can0 up type can bitrate 10000 A link change request failed with some changes committed already. Interface can0 may have been left with an inconsistent configuration, please check. | RTNETLINK answers: Connection timed out It is caused by calling of flexcan_chip_unfreeze() timeout. Originally the code is using usleep_range(10, 20) for unfreeze operation, but the patch (8badd65 can: flexcan: avoid calling usleep_range from interrupt context) changed it into udelay(10) which is only a half delay of before, there're also some other delay changes. After double to FLEXCAN_TIMEOUT_US to 100 can fix the issue. Meanwhile, Rasmus Villemoes reported that even with a timeout of 100, flexcan_probe() fails on the MPC8309, which requires a value of at least 140 to work reliably. 250 works for everyone. Signed-off-by: Joakim Zhang Reviewed-by: Dong Aisheng Cc: linux-stable Signed-off-by: Marc Kleine-Budde --- drivers/net/can/flexcan.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/can/flexcan.c b/drivers/net/can/flexcan.c index 1c66fb2ad76b..f97c628eb2ad 100644 --- a/drivers/net/can/flexcan.c +++ b/drivers/net/can/flexcan.c @@ -166,7 +166,7 @@ #define FLEXCAN_MB_CNT_LENGTH(x) (((x) & 0xf) << 16) #define FLEXCAN_MB_CNT_TIMESTAMP(x) ((x) & 0xffff) -#define FLEXCAN_TIMEOUT_US (50) +#define FLEXCAN_TIMEOUT_US (250) /* FLEXCAN hardware feature flags * From 904044dd8fff43e289c11a2f90fa532e946a1d8b Mon Sep 17 00:00:00 2001 From: Anssi Hannula Date: Tue, 11 Sep 2018 14:47:46 +0300 Subject: [PATCH 053/145] can: xilinx_can: use correct bittiming_const for CAN FD core Commit 9e5f1b273e6a ("can: xilinx_can: add support for Xilinx CAN FD core") added a new can_bittiming_const structure for CAN FD cores that support larger values for tseg1, tseg2, and sjw than previous Xilinx CAN cores, but the commit did not actually take that into use. Fix that. Tested with CAN FD core on a ZynqMP board. Fixes: 9e5f1b273e6a ("can: xilinx_can: add support for Xilinx CAN FD core") Reported-by: Shubhrajyoti Datta Signed-off-by: Anssi Hannula Cc: Michal Simek Reviewed-by: Shubhrajyoti Datta Cc: linux-stable Signed-off-by: Marc Kleine-Budde --- drivers/net/can/xilinx_can.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/can/xilinx_can.c b/drivers/net/can/xilinx_can.c index f2024404b8d6..63203ff452b5 100644 --- a/drivers/net/can/xilinx_can.c +++ b/drivers/net/can/xilinx_can.c @@ -1435,7 +1435,7 @@ static const struct xcan_devtype_data xcan_canfd_data = { XCAN_FLAG_RXMNF | XCAN_FLAG_TX_MAILBOXES | XCAN_FLAG_RX_FIFO_MULTI, - .bittiming_const = &xcan_bittiming_const, + .bittiming_const = &xcan_bittiming_const_canfd, .btr_ts2_shift = XCAN_BTR_TS2_SHIFT_CANFD, .btr_sjw_shift = XCAN_BTR_SJW_SHIFT_CANFD, .bus_clk_name = "s_axi_aclk", From 0df82dcd55832a99363ab7f9fab954fcacdac3ae Mon Sep 17 00:00:00 2001 From: Sean Nyekjaer Date: Tue, 7 May 2019 11:34:37 +0200 Subject: [PATCH 054/145] dt-bindings: can: mcp251x: add mcp25625 support Fully compatible with mcp2515, the mcp25625 have integrated transceiver. This patch add the mcp25625 to the device tree bindings documentation. Signed-off-by: Sean Nyekjaer Signed-off-by: Marc Kleine-Budde --- Documentation/devicetree/bindings/net/can/microchip,mcp251x.txt | 1 + 1 file changed, 1 insertion(+) diff --git a/Documentation/devicetree/bindings/net/can/microchip,mcp251x.txt b/Documentation/devicetree/bindings/net/can/microchip,mcp251x.txt index 188c8bd4eb67..5a0111d4de58 100644 --- a/Documentation/devicetree/bindings/net/can/microchip,mcp251x.txt +++ b/Documentation/devicetree/bindings/net/can/microchip,mcp251x.txt @@ -4,6 +4,7 @@ Required properties: - compatible: Should be one of the following: - "microchip,mcp2510" for MCP2510. - "microchip,mcp2515" for MCP2515. + - "microchip,mcp25625" for MCP25625. - reg: SPI chip select. - clocks: The clock feeding the CAN controller. - interrupts: Should contain IRQ line for the CAN controller. From 35b7fa4d07c43ad79b88e6462119e7140eae955c Mon Sep 17 00:00:00 2001 From: Sean Nyekjaer Date: Tue, 7 May 2019 11:34:36 +0200 Subject: [PATCH 055/145] can: mcp251x: add support for mcp25625 Fully compatible with mcp2515, the mcp25625 have integrated transceiver. This patch adds support for the mcp25625 to the existing mcp251x driver. Signed-off-by: Sean Nyekjaer Signed-off-by: Marc Kleine-Budde --- drivers/net/can/spi/Kconfig | 5 +++-- drivers/net/can/spi/mcp251x.c | 25 ++++++++++++++++--------- 2 files changed, 19 insertions(+), 11 deletions(-) diff --git a/drivers/net/can/spi/Kconfig b/drivers/net/can/spi/Kconfig index 2e7e535e9237..1c50788055cb 100644 --- a/drivers/net/can/spi/Kconfig +++ b/drivers/net/can/spi/Kconfig @@ -9,9 +9,10 @@ config CAN_HI311X Driver for the Holt HI311x SPI CAN controllers. config CAN_MCP251X - tristate "Microchip MCP251x SPI CAN controllers" + tristate "Microchip MCP251x and MCP25625 SPI CAN controllers" depends on HAS_DMA ---help--- - Driver for the Microchip MCP251x SPI CAN controllers. + Driver for the Microchip MCP251x and MCP25625 SPI CAN + controllers. endmenu diff --git a/drivers/net/can/spi/mcp251x.c b/drivers/net/can/spi/mcp251x.c index e90817608645..da64e71a62ee 100644 --- a/drivers/net/can/spi/mcp251x.c +++ b/drivers/net/can/spi/mcp251x.c @@ -1,5 +1,5 @@ /* - * CAN bus driver for Microchip 251x CAN Controller with SPI Interface + * CAN bus driver for Microchip 251x/25625 CAN Controller with SPI Interface * * MCP2510 support and bug fixes by Christian Pellegrin * @@ -41,7 +41,7 @@ * static struct spi_board_info spi_board_info[] = { * { * .modalias = "mcp2510", - * // or "mcp2515" depending on your controller + * // "mcp2515" or "mcp25625" depending on your controller * .platform_data = &mcp251x_info, * .irq = IRQ_EINT13, * .max_speed_hz = 2*1000*1000, @@ -238,6 +238,7 @@ static const struct can_bittiming_const mcp251x_bittiming_const = { enum mcp251x_model { CAN_MCP251X_MCP2510 = 0x2510, CAN_MCP251X_MCP2515 = 0x2515, + CAN_MCP251X_MCP25625 = 0x25625, }; struct mcp251x_priv { @@ -280,7 +281,6 @@ static inline int mcp251x_is_##_model(struct spi_device *spi) \ } MCP251X_IS(2510); -MCP251X_IS(2515); static void mcp251x_clean(struct net_device *net) { @@ -639,7 +639,7 @@ static int mcp251x_hw_reset(struct spi_device *spi) /* Wait for oscillator startup timer after reset */ mdelay(MCP251X_OST_DELAY_MS); - + reg = mcp251x_read_reg(spi, CANSTAT); if ((reg & CANCTRL_REQOP_MASK) != CANCTRL_REQOP_CONF) return -ENODEV; @@ -820,9 +820,8 @@ static irqreturn_t mcp251x_can_ist(int irq, void *dev_id) /* receive buffer 0 */ if (intf & CANINTF_RX0IF) { mcp251x_hw_rx(spi, 0); - /* - * Free one buffer ASAP - * (The MCP2515 does this automatically.) + /* Free one buffer ASAP + * (The MCP2515/25625 does this automatically.) */ if (mcp251x_is_2510(spi)) mcp251x_write_bits(spi, CANINTF, CANINTF_RX0IF, 0x00); @@ -831,7 +830,7 @@ static irqreturn_t mcp251x_can_ist(int irq, void *dev_id) /* receive buffer 1 */ if (intf & CANINTF_RX1IF) { mcp251x_hw_rx(spi, 1); - /* the MCP2515 does this automatically */ + /* The MCP2515/25625 does this automatically. */ if (mcp251x_is_2510(spi)) clear_intf |= CANINTF_RX1IF; } @@ -1006,6 +1005,10 @@ static const struct of_device_id mcp251x_of_match[] = { .compatible = "microchip,mcp2515", .data = (void *)CAN_MCP251X_MCP2515, }, + { + .compatible = "microchip,mcp25625", + .data = (void *)CAN_MCP251X_MCP25625, + }, { } }; MODULE_DEVICE_TABLE(of, mcp251x_of_match); @@ -1019,6 +1022,10 @@ static const struct spi_device_id mcp251x_id_table[] = { .name = "mcp2515", .driver_data = (kernel_ulong_t)CAN_MCP251X_MCP2515, }, + { + .name = "mcp25625", + .driver_data = (kernel_ulong_t)CAN_MCP251X_MCP25625, + }, { } }; MODULE_DEVICE_TABLE(spi, mcp251x_id_table); @@ -1259,5 +1266,5 @@ module_spi_driver(mcp251x_can_driver); MODULE_AUTHOR("Chris Elston , " "Christian Pellegrin "); -MODULE_DESCRIPTION("Microchip 251x CAN driver"); +MODULE_DESCRIPTION("Microchip 251x/25625 CAN driver"); MODULE_LICENSE("GPL v2"); From 3e82f2f34c930a2a0a9e69fdc2de2f2f1388b442 Mon Sep 17 00:00:00 2001 From: Eugen Hristev Date: Mon, 4 Mar 2019 14:44:13 +0000 Subject: [PATCH 056/145] can: m_can: implement errata "Needless activation of MRAF irq" During frame reception while the MCAN is in Error Passive state and the Receive Error Counter has thevalue MCAN_ECR.REC = 127, it may happen that MCAN_IR.MRAF is set although there was no Message RAM access failure. If MCAN_IR.MRAF is enabled, an interrupt to the Host CPU is generated. Work around: The Message RAM Access Failure interrupt routine needs to check whether MCAN_ECR.RP = '1' and MCAN_ECR.REC = '127'. In this case, reset MCAN_IR.MRAF. No further action is required. This affects versions older than 3.2.0 Errata explained on Sama5d2 SoC which includes this hardware block: http://ww1.microchip.com/downloads/en/DeviceDoc/SAMA5D2-Family-Silicon-Errata-and-Data-Sheet-Clarification-DS80000803B.pdf chapter 6.2 Reproducibility: If 2 devices with m_can are connected back to back, configuring different bitrate on them will lead to interrupt storm on the receiving side, with error "Message RAM access failure occurred". Another way is to have a bad hardware connection. Bad wire connection can lead to this issue as well. This patch fixes the issue according to provided workaround. Signed-off-by: Eugen Hristev Reviewed-by: Ludovic Desroches Signed-off-by: Marc Kleine-Budde --- drivers/net/can/m_can/m_can.c | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) diff --git a/drivers/net/can/m_can/m_can.c b/drivers/net/can/m_can/m_can.c index 9b449400376b..deb274a19ba0 100644 --- a/drivers/net/can/m_can/m_can.c +++ b/drivers/net/can/m_can/m_can.c @@ -822,6 +822,27 @@ static int m_can_poll(struct napi_struct *napi, int quota) if (!irqstatus) goto end; + /* Errata workaround for issue "Needless activation of MRAF irq" + * During frame reception while the MCAN is in Error Passive state + * and the Receive Error Counter has the value MCAN_ECR.REC = 127, + * it may happen that MCAN_IR.MRAF is set although there was no + * Message RAM access failure. + * If MCAN_IR.MRAF is enabled, an interrupt to the Host CPU is generated + * The Message RAM Access Failure interrupt routine needs to check + * whether MCAN_ECR.RP = ’1’ and MCAN_ECR.REC = 127. + * In this case, reset MCAN_IR.MRAF. No further action is required. + */ + if ((priv->version <= 31) && (irqstatus & IR_MRAF) && + (m_can_read(priv, M_CAN_ECR) & ECR_RP)) { + struct can_berr_counter bec; + + __m_can_get_berr_counter(dev, &bec); + if (bec.rxerr == 127) { + m_can_write(priv, M_CAN_IR, IR_MRAF); + irqstatus &= ~IR_MRAF; + } + } + psr = m_can_read(priv, M_CAN_PSR); if (irqstatus & IR_ERR_STATE) work_done += m_can_handle_state_errors(dev, psr); From c5a3aed1cd3152429348ee1fe5cdcca65fe901ce Mon Sep 17 00:00:00 2001 From: YueHaibing Date: Thu, 16 May 2019 22:36:26 +0800 Subject: [PATCH 057/145] can: af_can: Fix error path of can_init() This patch add error path for can_init() to avoid possible crash if some error occurs. Fixes: 0d66548a10cb ("[CAN]: Add PF_CAN core module") Signed-off-by: YueHaibing Acked-by: Oliver Hartkopp Signed-off-by: Marc Kleine-Budde --- net/can/af_can.c | 24 +++++++++++++++++++++--- 1 file changed, 21 insertions(+), 3 deletions(-) diff --git a/net/can/af_can.c b/net/can/af_can.c index e8fd5dc1780a..743470680127 100644 --- a/net/can/af_can.c +++ b/net/can/af_can.c @@ -952,6 +952,8 @@ static struct pernet_operations can_pernet_ops __read_mostly = { static __init int can_init(void) { + int err; + /* check for correct padding to be able to use the structs similarly */ BUILD_BUG_ON(offsetof(struct can_frame, can_dlc) != offsetof(struct canfd_frame, len) || @@ -965,15 +967,31 @@ static __init int can_init(void) if (!rcv_cache) return -ENOMEM; - register_pernet_subsys(&can_pernet_ops); + err = register_pernet_subsys(&can_pernet_ops); + if (err) + goto out_pernet; /* protocol register */ - sock_register(&can_family_ops); - register_netdevice_notifier(&can_netdev_notifier); + err = sock_register(&can_family_ops); + if (err) + goto out_sock; + err = register_netdevice_notifier(&can_netdev_notifier); + if (err) + goto out_notifier; + dev_add_pack(&can_packet); dev_add_pack(&canfd_packet); return 0; + +out_notifier: + sock_unregister(PF_CAN); +out_sock: + unregister_pernet_subsys(&can_pernet_ops); +out_pernet: + kmem_cache_destroy(rcv_cache); + + return err; } static __exit void can_exit(void) From eb503004a7e563d543c9cb869907156de7efe720 Mon Sep 17 00:00:00 2001 From: Fabio Estevam Date: Tue, 4 Jun 2019 15:49:42 -0300 Subject: [PATCH 058/145] can: flexcan: Remove unneeded registration message Currently the following message is observed when the flexcan driver is probed: flexcan 2090000.flexcan: device registered (reg_base=(ptrval), irq=23) The reason for printing 'ptrval' is explained at Documentation/core-api/printk-formats.rst: "Pointers printed without a specifier extension (i.e unadorned %p) are hashed to prevent leaking information about the kernel memory layout. This has the added benefit of providing a unique identifier. On 64-bit machines the first 32 bits are zeroed. The kernel will print ``(ptrval)`` until it gathers enough entropy." Instead of passing %pK, which can print the correct address, simply remove the entire message as it is not really that useful. Signed-off-by: Fabio Estevam Signed-off-by: Marc Kleine-Budde --- drivers/net/can/flexcan.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/drivers/net/can/flexcan.c b/drivers/net/can/flexcan.c index f97c628eb2ad..f2fe344593d5 100644 --- a/drivers/net/can/flexcan.c +++ b/drivers/net/can/flexcan.c @@ -1583,9 +1583,6 @@ static int flexcan_probe(struct platform_device *pdev) dev_dbg(&pdev->dev, "failed to setup stop-mode\n"); } - dev_info(&pdev->dev, "device registered (reg_base=%p, irq=%d)\n", - priv->regs, dev->irq); - return 0; failed_register: From fd704bd5ee749d560e86c4f1fd2ef486d8abf7cf Mon Sep 17 00:00:00 2001 From: Willem de Bruijn Date: Fri, 7 Jun 2019 16:46:07 -0400 Subject: [PATCH 059/145] can: purge socket error queue on sock destruct CAN supports software tx timestamps as of the below commit. Purge any queued timestamp packets on socket destroy. Fixes: 51f31cabe3ce ("ip: support for TX timestamps on UDP and RAW sockets") Reported-by: syzbot+a90604060cb40f5bdd16@syzkaller.appspotmail.com Signed-off-by: Willem de Bruijn Cc: linux-stable Signed-off-by: Marc Kleine-Budde --- net/can/af_can.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/can/af_can.c b/net/can/af_can.c index 743470680127..80281ef2ccbd 100644 --- a/net/can/af_can.c +++ b/net/can/af_can.c @@ -99,6 +99,7 @@ EXPORT_SYMBOL(can_ioctl); static void can_sock_destruct(struct sock *sk) { skb_queue_purge(&sk->sk_receive_queue); + skb_queue_purge(&sk->sk_error_queue); } static const struct can_proto *can_get_proto(int protocol) From 6a6fabbfa3e8c656ff906ae999fb6856410fa4cd Mon Sep 17 00:00:00 2001 From: Edward Srouji Date: Thu, 23 May 2019 19:45:38 +0300 Subject: [PATCH 060/145] net/mlx5: Update pci error handler entries and command translation Add missing entries for create/destroy UCTX and UMEM commands. This could get us wrong "unknown FW command" error in flows where we unbind the device or reset the driver. Also the translation of these commands from opcodes to string was missing. Fixes: 6e3722baac04 ("IB/mlx5: Use the correct commands for UMEM and UCTX allocation") Signed-off-by: Edward Srouji Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c index d2ab8cd8ad9f..e94686c42000 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c @@ -441,6 +441,10 @@ static int mlx5_internal_err_ret_value(struct mlx5_core_dev *dev, u16 op, case MLX5_CMD_OP_CREATE_GENERAL_OBJECT: case MLX5_CMD_OP_MODIFY_GENERAL_OBJECT: case MLX5_CMD_OP_QUERY_GENERAL_OBJECT: + case MLX5_CMD_OP_CREATE_UCTX: + case MLX5_CMD_OP_DESTROY_UCTX: + case MLX5_CMD_OP_CREATE_UMEM: + case MLX5_CMD_OP_DESTROY_UMEM: case MLX5_CMD_OP_ALLOC_MEMIC: *status = MLX5_DRIVER_STATUS_ABORTED; *synd = MLX5_DRIVER_SYND; @@ -629,6 +633,10 @@ const char *mlx5_command_str(int command) MLX5_COMMAND_STR_CASE(ALLOC_MEMIC); MLX5_COMMAND_STR_CASE(DEALLOC_MEMIC); MLX5_COMMAND_STR_CASE(QUERY_HOST_PARAMS); + MLX5_COMMAND_STR_CASE(CREATE_UCTX); + MLX5_COMMAND_STR_CASE(DESTROY_UCTX); + MLX5_COMMAND_STR_CASE(CREATE_UMEM); + MLX5_COMMAND_STR_CASE(DESTROY_UMEM); default: return "unknown command opcode"; } } From dd80857bf388abd0c64dd3aa4fbf7d407deba819 Mon Sep 17 00:00:00 2001 From: Alaa Hleihel Date: Sun, 19 May 2019 11:11:49 +0300 Subject: [PATCH 061/145] net/mlx5: Avoid reloading already removed devices Prior to reloading a device we must first verify that it was not already removed. Otherwise, the attempt to remove the device will do nothing, and in that case we will end up proceeding with adding an new device that no one was expecting to remove, leaving behind used resources such as EQs that causes a failure to destroy comp EQs and syndrome (0x30f433). Fix that by making sure that we try to remove and add a device (based on a protocol) only if the device is already added. Fixes: c5447c70594b ("net/mlx5: E-Switch, Reload IB interface when switching devlink modes") Signed-off-by: Alaa Hleihel Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/dev.c | 25 +++++++++++++++++-- 1 file changed, 23 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/dev.c b/drivers/net/ethernet/mellanox/mlx5/core/dev.c index ebc046fa97d3..f6b1da99e6c2 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/dev.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/dev.c @@ -248,11 +248,32 @@ void mlx5_unregister_interface(struct mlx5_interface *intf) } EXPORT_SYMBOL(mlx5_unregister_interface); +/* Must be called with intf_mutex held */ +static bool mlx5_has_added_dev_by_protocol(struct mlx5_core_dev *mdev, int protocol) +{ + struct mlx5_device_context *dev_ctx; + struct mlx5_interface *intf; + bool found = false; + + list_for_each_entry(intf, &intf_list, list) { + if (intf->protocol == protocol) { + dev_ctx = mlx5_get_device(intf, &mdev->priv); + if (dev_ctx && test_bit(MLX5_INTERFACE_ADDED, &dev_ctx->state)) + found = true; + break; + } + } + + return found; +} + void mlx5_reload_interface(struct mlx5_core_dev *mdev, int protocol) { mutex_lock(&mlx5_intf_mutex); - mlx5_remove_dev_by_protocol(mdev, protocol); - mlx5_add_dev_by_protocol(mdev, protocol); + if (mlx5_has_added_dev_by_protocol(mdev, protocol)) { + mlx5_remove_dev_by_protocol(mdev, protocol); + mlx5_add_dev_by_protocol(mdev, protocol); + } mutex_unlock(&mlx5_intf_mutex); } From d3cbd4254df881777e2efb68ee10ede0d9dc0647 Mon Sep 17 00:00:00 2001 From: Chris Mi Date: Thu, 16 May 2019 17:36:43 +0800 Subject: [PATCH 062/145] net/mlx5e: Add ndo_set_feature for uplink representor After we have a dedicated uplink representor, the new netdev ops doesn't support ndo_set_feature. Because of that, we can't change some features, eg. rxvlan. Now add it back. In this patch, I also do a cleanup for the features flag handling, eg. remove duplicate NETIF_F_HW_TC flag setting. Fixes: aec002f6f82c ("net/mlx5e: Uninstantiate esw manager vport netdev on switchdev mode") Signed-off-by: Chris Mi Reviewed-by: Roi Dayan Reviewed-by: Vlad Buslov Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/en.h | 1 + drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 3 +-- drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 10 ++++++---- 3 files changed, 8 insertions(+), 6 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index 3a183d690e23..ab027f57725c 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ -1112,6 +1112,7 @@ void mlx5e_del_vxlan_port(struct net_device *netdev, struct udp_tunnel_info *ti) netdev_features_t mlx5e_features_check(struct sk_buff *skb, struct net_device *netdev, netdev_features_t features); +int mlx5e_set_features(struct net_device *netdev, netdev_features_t features); #ifdef CONFIG_MLX5_ESWITCH int mlx5e_set_vf_mac(struct net_device *dev, int vf, u8 *mac); int mlx5e_set_vf_rate(struct net_device *dev, int vf, int min_tx_rate, int max_tx_rate); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index c65cefd84eda..cd490ae330d8 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -3635,8 +3635,7 @@ static int mlx5e_handle_feature(struct net_device *netdev, return 0; } -static int mlx5e_set_features(struct net_device *netdev, - netdev_features_t features) +int mlx5e_set_features(struct net_device *netdev, netdev_features_t features) { netdev_features_t oper_features = netdev->features; int err = 0; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c index 9aea9c5b2ce8..2f406b161bcf 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c @@ -1351,6 +1351,7 @@ static const struct net_device_ops mlx5e_netdev_ops_uplink_rep = { .ndo_get_vf_stats = mlx5e_get_vf_stats, .ndo_set_vf_vlan = mlx5e_uplink_rep_set_vf_vlan, .ndo_get_port_parent_id = mlx5e_rep_get_port_parent_id, + .ndo_set_features = mlx5e_set_features, }; bool mlx5e_eswitch_rep(struct net_device *netdev) @@ -1425,10 +1426,9 @@ static void mlx5e_build_rep_netdev(struct net_device *netdev) netdev->watchdog_timeo = 15 * HZ; + netdev->features |= NETIF_F_NETNS_LOCAL; - netdev->features |= NETIF_F_HW_TC | NETIF_F_NETNS_LOCAL; - netdev->hw_features |= NETIF_F_HW_TC; - + netdev->hw_features |= NETIF_F_HW_TC; netdev->hw_features |= NETIF_F_SG; netdev->hw_features |= NETIF_F_IP_CSUM; netdev->hw_features |= NETIF_F_IPV6_CSUM; @@ -1437,7 +1437,9 @@ static void mlx5e_build_rep_netdev(struct net_device *netdev) netdev->hw_features |= NETIF_F_TSO6; netdev->hw_features |= NETIF_F_RXCSUM; - if (rep->vport != MLX5_VPORT_UPLINK) + if (rep->vport == MLX5_VPORT_UPLINK) + netdev->hw_features |= NETIF_F_HW_VLAN_CTAG_RX; + else netdev->features |= NETIF_F_VLAN_CHALLENGED; netdev->features |= netdev->hw_features; From 57c70d8740f740498a52f9c0c0d7295829b944de Mon Sep 17 00:00:00 2001 From: Shay Agroskin Date: Sun, 28 Apr 2019 10:14:23 +0300 Subject: [PATCH 063/145] net/mlx5e: Replace reciprocal_scale in TX select queue function The TX queue index returned by the fallback function ranges between [0,NUM CHANNELS - 1] if QoS isn't set and [0, (NUM CHANNELS)*(NUM TCs) -1] otherwise. Our HW uses different TC mapping than the fallback function (which is denoted as 'up', user priority) so we only need to extract a channel number out of the returned value. Since (NUM CHANNELS)*(NUM TCs) is a relatively small number, using reciprocal scale almost always returns zero. We instead access the 'txq2sq' table to extract the sq (and with it the channel number) associated with the tx queue, thus getting a more evenly distributed channel number. Perf: Rx/Tx side with Intel(R) Xeon(R) Silver 4108 CPU @ 1.80GHz and ConnectX-5. Used 'iperf' UDP traffic, 10 threads, and priority 5. Before: 0.566Mpps After: 2.37Mpps As expected, releasing the existing bottleneck of steering all traffic to TX queue zero significantly improves transmission rates. Fixes: 7ccdd0841b30 ("net/mlx5e: Fix select queue callback") Signed-off-by: Shay Agroskin Reviewed-by: Tariq Toukan Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/en.h | 1 + drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 1 + drivers/net/ethernet/mellanox/mlx5/core/en_tx.c | 12 ++++++------ 3 files changed, 8 insertions(+), 6 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index ab027f57725c..cc6797e24571 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ -385,6 +385,7 @@ struct mlx5e_txqsq { /* control path */ struct mlx5_wq_ctrl wq_ctrl; struct mlx5e_channel *channel; + int ch_ix; int txq_ix; u32 rate_limit; struct work_struct recover_work; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index cd490ae330d8..564692227c16 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -1082,6 +1082,7 @@ static int mlx5e_alloc_txqsq(struct mlx5e_channel *c, sq->clock = &mdev->clock; sq->mkey_be = c->mkey_be; sq->channel = c; + sq->ch_ix = c->ix; sq->txq_ix = txq_ix; sq->uar_map = mdev->mlx5e_res.bfreg.map; sq->min_inline_mode = params->tx_min_inline_mode; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c index 195a7d903cec..701e5dc75bb0 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c @@ -113,13 +113,13 @@ static inline int mlx5e_get_dscp_up(struct mlx5e_priv *priv, struct sk_buff *skb u16 mlx5e_select_queue(struct net_device *dev, struct sk_buff *skb, struct net_device *sb_dev) { - int channel_ix = netdev_pick_tx(dev, skb, NULL); + int txq_ix = netdev_pick_tx(dev, skb, NULL); struct mlx5e_priv *priv = netdev_priv(dev); u16 num_channels; int up = 0; if (!netdev_get_num_tc(dev)) - return channel_ix; + return txq_ix; #ifdef CONFIG_MLX5_CORE_EN_DCB if (priv->dcbx_dp.trust_state == MLX5_QPTS_TRUST_DSCP) @@ -129,14 +129,14 @@ u16 mlx5e_select_queue(struct net_device *dev, struct sk_buff *skb, if (skb_vlan_tag_present(skb)) up = skb_vlan_tag_get_prio(skb); - /* channel_ix can be larger than num_channels since + /* txq_ix can be larger than num_channels since * dev->num_real_tx_queues = num_channels * num_tc */ num_channels = priv->channels.params.num_channels; - if (channel_ix >= num_channels) - channel_ix = reciprocal_scale(channel_ix, num_channels); + if (txq_ix >= num_channels) + txq_ix = priv->txq2sq[txq_ix]->ch_ix; - return priv->channel_tc2txq[channel_ix][up]; + return priv->channel_tc2txq[txq_ix][up]; } static inline int mlx5e_skb_l2_header_offset(struct sk_buff *skb) From b83c0730167c7ea6c03bffceefb86ae710ab30e2 Mon Sep 17 00:00:00 2001 From: Raed Salem Date: Sun, 2 Jun 2019 12:04:08 +0300 Subject: [PATCH 064/145] net/mlx5e: Fix source port matching in fdb peer flow rule The cited commit changed the initialization placement of the eswitch attributes so it is done prior to parse tc actions function call, including among others the in_rep and in_mdev fields which are mistakenly reassigned inside the parse actions function. This breaks the source port matching criteria of the peer redirect rule. Fix by removing the now redundant reassignment of the already initialized fields. Fixes: 988ab9c7363a ("net/mlx5e: Introduce mlx5e_flow_esw_attr_init() helper") Signed-off-by: Raed Salem Reviewed-by: Roi Dayan Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c index 31cd02f11499..e40c60d1631f 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c @@ -2812,9 +2812,6 @@ static int parse_tc_fdb_actions(struct mlx5e_priv *priv, if (!flow_action_has_entries(flow_action)) return -EINVAL; - attr->in_rep = rpriv->rep; - attr->in_mdev = priv->mdev; - flow_action_for_each(i, act, flow_action) { switch (act->id) { case FLOW_ACTION_DROP: From 47c9d2c99ddeecca61c97618857b70fc7901658b Mon Sep 17 00:00:00 2001 From: Alaa Hleihel Date: Sun, 26 May 2019 11:56:27 +0300 Subject: [PATCH 065/145] net/mlx5e: Avoid detaching non-existing netdev under switchdev mode After introducing dedicated uplink representor, the netdev instance set over the esw manager vport (PF) became no longer in use, so it was removed in the cited commit once we're on switchdev mode. However, the mlx5e_detach function was not updated accordingly, and it still tries to detach a non-existing netdev, causing a kernel crash. This patch fixes this issue. Fixes: aec002f6f82c ("net/mlx5e: Uninstantiate esw manager vport netdev on switchdev mode") Signed-off-by: Alaa Hleihel Reviewed-by: Roi Dayan Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index 564692227c16..a8e8350b38aa 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -5108,6 +5108,11 @@ static void mlx5e_detach(struct mlx5_core_dev *mdev, void *vpriv) struct mlx5e_priv *priv = vpriv; struct net_device *netdev = priv->netdev; +#ifdef CONFIG_MLX5_ESWITCH + if (MLX5_ESWITCH_MANAGER(mdev) && vpriv == mdev) + return; +#endif + if (!netif_device_present(netdev)) return; From 45e7d4c0c1727d362012a62eb57254ea71a2d591 Mon Sep 17 00:00:00 2001 From: Eli Britstein Date: Sun, 2 Jun 2019 13:47:59 +0000 Subject: [PATCH 066/145] net/mlx5e: Support tagged tunnel over bond Stacked devices like bond interface may have a VLAN device on top of them. Detect lag state correctly under this condition, and return the correct routed net device, according to it the encap header is built. Fixes: e32ee6c78efa ("net/mlx5e: Support tunnel encap over tagged Ethernet") Signed-off-by: Eli Britstein Reviewed-by: Roi Dayan Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.c index fe5d4d7f15ed..231e7cdfc6f7 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.c @@ -11,24 +11,25 @@ static int get_route_and_out_devs(struct mlx5e_priv *priv, struct net_device **route_dev, struct net_device **out_dev) { + struct net_device *uplink_dev, *uplink_upper, *real_dev; struct mlx5_eswitch *esw = priv->mdev->priv.eswitch; - struct net_device *uplink_dev, *uplink_upper; bool dst_is_lag_dev; + real_dev = is_vlan_dev(dev) ? vlan_dev_real_dev(dev) : dev; uplink_dev = mlx5_eswitch_uplink_get_proto_dev(esw, REP_ETH); uplink_upper = netdev_master_upper_dev_get(uplink_dev); dst_is_lag_dev = (uplink_upper && netif_is_lag_master(uplink_upper) && - dev == uplink_upper && + real_dev == uplink_upper && mlx5_lag_is_sriov(priv->mdev)); /* if the egress device isn't on the same HW e-switch or * it's a LAG device, use the uplink */ - if (!netdev_port_same_parent_id(priv->netdev, dev) || + if (!netdev_port_same_parent_id(priv->netdev, real_dev) || dst_is_lag_dev) { - *route_dev = uplink_dev; - *out_dev = *route_dev; + *route_dev = dev; + *out_dev = uplink_dev; } else { *route_dev = dev; if (is_vlan_dev(*route_dev)) From c3fee640bcf52c34a25b767f2b0eda82e97a1f3b Mon Sep 17 00:00:00 2001 From: Enrico Weigelt Date: Thu, 6 Jun 2019 16:43:17 +0200 Subject: [PATCH 067/145] net: ipv4: fib_semantics: fix uninitialized variable fix an uninitialized variable: CC net/ipv4/fib_semantics.o net/ipv4/fib_semantics.c: In function 'fib_check_nh_v4_gw': net/ipv4/fib_semantics.c:1027:12: warning: 'err' may be used uninitialized in this function [-Wmaybe-uninitialized] if (!tbl || err) { ^~ Signed-off-by: Enrico Weigelt Signed-off-by: David S. Miller --- net/ipv4/fib_semantics.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c index b80410673915..bfa49a88d03a 100644 --- a/net/ipv4/fib_semantics.c +++ b/net/ipv4/fib_semantics.c @@ -964,7 +964,7 @@ static int fib_check_nh_v4_gw(struct net *net, struct fib_nh *nh, u32 table, { struct net_device *dev; struct fib_result res; - int err; + int err = 0; if (nh->fib_nh_flags & RTNH_F_ONLINK) { unsigned int addr_type; From 65a3c497c0e965a552008db8bc2653f62bc925a1 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Thu, 6 Jun 2019 14:32:34 -0700 Subject: [PATCH 068/145] ipv6: flowlabel: fl6_sock_lookup() must use atomic_inc_not_zero Before taking a refcount, make sure the object is not already scheduled for deletion. Same fix is needed in ipv6_flowlabel_opt() Fixes: 18367681a10b ("ipv6 flowlabel: Convert np->ipv6_fl_list to RCU.") Signed-off-by: Eric Dumazet Cc: Willem de Bruijn Signed-off-by: David S. Miller --- net/ipv6/ip6_flowlabel.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/net/ipv6/ip6_flowlabel.c b/net/ipv6/ip6_flowlabel.c index 2f3eb7dc45da..545e339b8c4f 100644 --- a/net/ipv6/ip6_flowlabel.c +++ b/net/ipv6/ip6_flowlabel.c @@ -250,9 +250,9 @@ struct ip6_flowlabel *fl6_sock_lookup(struct sock *sk, __be32 label) rcu_read_lock_bh(); for_each_sk_fl_rcu(np, sfl) { struct ip6_flowlabel *fl = sfl->fl; - if (fl->label == label) { + + if (fl->label == label && atomic_inc_not_zero(&fl->users)) { fl->lastuse = jiffies; - atomic_inc(&fl->users); rcu_read_unlock_bh(); return fl; } @@ -618,7 +618,8 @@ int ipv6_flowlabel_opt(struct sock *sk, char __user *optval, int optlen) goto done; } fl1 = sfl->fl; - atomic_inc(&fl1->users); + if (!atomic_inc_not_zero(&fl1->users)) + fl1 = NULL; break; } } From a9520543b123bbd7275a0ab8d0375a5412683b41 Mon Sep 17 00:00:00 2001 From: Michael Schmitz Date: Fri, 7 Jun 2019 17:37:34 +1200 Subject: [PATCH 069/145] net: phy: rename Asix Electronics PHY driver [Resent to net instead of net-next - may clash with Anders Roxell's patch series addressing duplicate module names] Commit 31dd83b96641 ("net-next: phy: new Asix Electronics PHY driver") introduced a new PHY driver drivers/net/phy/asix.c that causes a module name conflict with a pre-existiting driver (drivers/net/usb/asix.c). The PHY driver is used by the X-Surf 100 ethernet card driver, and loaded by that driver via its PHY ID. A rename of the driver looks unproblematic. Rename PHY driver to ax88796b.c in order to resolve name conflict. Signed-off-by: Michael Schmitz Tested-by: Michael Schmitz Fixes: 31dd83b96641 ("net-next: phy: new Asix Electronics PHY driver") Reviewed-by: Andrew Lunn Signed-off-by: David S. Miller --- drivers/net/ethernet/8390/Kconfig | 2 +- drivers/net/phy/Kconfig | 2 +- drivers/net/phy/Makefile | 2 +- drivers/net/phy/{asix.c => ax88796b.c} | 0 4 files changed, 3 insertions(+), 3 deletions(-) rename drivers/net/phy/{asix.c => ax88796b.c} (100%) diff --git a/drivers/net/ethernet/8390/Kconfig b/drivers/net/ethernet/8390/Kconfig index bb09319feedf..2a3e2450968e 100644 --- a/drivers/net/ethernet/8390/Kconfig +++ b/drivers/net/ethernet/8390/Kconfig @@ -50,7 +50,7 @@ config XSURF100 tristate "Amiga XSurf 100 AX88796/NE2000 clone support" depends on ZORRO select AX88796 - select ASIX_PHY + select AX88796B_PHY help This driver is for the Individual Computers X-Surf 100 Ethernet card (based on the Asix AX88796 chip). If you have such a card, diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig index f99f27800fdb..1d406c6df790 100644 --- a/drivers/net/phy/Kconfig +++ b/drivers/net/phy/Kconfig @@ -254,7 +254,7 @@ config AQUANTIA_PHY ---help--- Currently supports the Aquantia AQ1202, AQ2104, AQR105, AQR405 -config ASIX_PHY +config AX88796B_PHY tristate "Asix PHYs" help Currently supports the Asix Electronics PHY found in the X-Surf 100 diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile index 27d7f9f3b0de..5b5c8669499e 100644 --- a/drivers/net/phy/Makefile +++ b/drivers/net/phy/Makefile @@ -52,7 +52,7 @@ ifdef CONFIG_HWMON aquantia-objs += aquantia_hwmon.o endif obj-$(CONFIG_AQUANTIA_PHY) += aquantia.o -obj-$(CONFIG_ASIX_PHY) += asix.o +obj-$(CONFIG_AX88796B_PHY) += ax88796b.o obj-$(CONFIG_AT803X_PHY) += at803x.o obj-$(CONFIG_BCM63XX_PHY) += bcm63xx.o obj-$(CONFIG_BCM7XXX_PHY) += bcm7xxx.o diff --git a/drivers/net/phy/asix.c b/drivers/net/phy/ax88796b.c similarity index 100% rename from drivers/net/phy/asix.c rename to drivers/net/phy/ax88796b.c From 2f3f7d1fa0d1039b24a55d127ed190f196fc3e79 Mon Sep 17 00:00:00 2001 From: George Wilkie Date: Fri, 7 Jun 2019 11:49:41 +0100 Subject: [PATCH 070/145] mpls: fix warning with multi-label encap If you configure a route with multiple labels, e.g. ip route add 10.10.3.0/24 encap mpls 16/100 via 10.10.2.2 dev ens4 A warning is logged: kernel: [ 130.561819] netlink: 'ip': attribute type 1 has an invalid length. This happens because mpls_iptunnel_policy has set the type of MPLS_IPTUNNEL_DST to fixed size NLA_U32. Change it to a minimum size. nla_get_labels() does the remaining validation. Fixes: e3e4712ec096 ("mpls: ip tunnel support") Signed-off-by: George Wilkie Reviewed-by: David Ahern Signed-off-by: David S. Miller --- net/mpls/mpls_iptunnel.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/mpls/mpls_iptunnel.c b/net/mpls/mpls_iptunnel.c index 500596130760..d25e91d7bdc1 100644 --- a/net/mpls/mpls_iptunnel.c +++ b/net/mpls/mpls_iptunnel.c @@ -23,7 +23,7 @@ #include "internal.h" static const struct nla_policy mpls_iptunnel_policy[MPLS_IPTUNNEL_MAX + 1] = { - [MPLS_IPTUNNEL_DST] = { .type = NLA_U32 }, + [MPLS_IPTUNNEL_DST] = { .len = sizeof(u32) }, [MPLS_IPTUNNEL_TTL] = { .type = NLA_U8 }, }; From 1f94608b0ce141be5286dde31270590bdf35b86a Mon Sep 17 00:00:00 2001 From: Thomas Falcon Date: Fri, 7 Jun 2019 16:03:53 -0500 Subject: [PATCH 071/145] ibmvnic: Do not close unopened driver during reset Check driver state before halting it during a reset. If the driver is not running, do nothing. Otherwise, a request to deactivate a down link can cause an error and the reset will fail. Signed-off-by: Thomas Falcon Signed-off-by: David S. Miller --- drivers/net/ethernet/ibm/ibmvnic.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index 3da392bfd659..bc2a91205eec 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -1745,7 +1745,8 @@ static int do_reset(struct ibmvnic_adapter *adapter, ibmvnic_cleanup(netdev); - if (adapter->reset_reason != VNIC_RESET_MOBILITY && + if (reset_state == VNIC_OPEN && + adapter->reset_reason != VNIC_RESET_MOBILITY && adapter->reset_reason != VNIC_RESET_FAILOVER) { rc = __ibmvnic_close(netdev); if (rc) From be32a24372cf162e825332da1a7ccef058d4f20b Mon Sep 17 00:00:00 2001 From: Thomas Falcon Date: Fri, 7 Jun 2019 16:03:54 -0500 Subject: [PATCH 072/145] ibmvnic: Refresh device multicast list after reset It was observed that multicast packets were no longer received after a device reset. The fix is to resend the current multicast list to the backing device after recovery. Signed-off-by: Thomas Falcon Signed-off-by: David S. Miller --- drivers/net/ethernet/ibm/ibmvnic.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index bc2a91205eec..9e9f4096db58 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -1845,6 +1845,9 @@ static int do_reset(struct ibmvnic_adapter *adapter, return 0; } + /* refresh device's multicast list */ + ibmvnic_set_multi(netdev); + /* kick napi */ for (i = 0; i < adapter->req_rx_queues; i++) napi_schedule(&adapter->napi[i]); From 7c940b1a5291e5069d561f5b8f0e51db6b7a259a Mon Sep 17 00:00:00 2001 From: Thomas Falcon Date: Fri, 7 Jun 2019 16:03:55 -0500 Subject: [PATCH 073/145] ibmvnic: Fix unchecked return codes of memory allocations The return values for these memory allocations are unchecked, which may cause an oops if the driver does not handle them after a failure. Fix by checking the function's return code. Signed-off-by: Thomas Falcon Signed-off-by: David S. Miller --- drivers/net/ethernet/ibm/ibmvnic.c | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index 9e9f4096db58..3da680073265 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -428,9 +428,10 @@ static int reset_rx_pools(struct ibmvnic_adapter *adapter) if (rx_pool->buff_size != be64_to_cpu(size_array[i])) { free_long_term_buff(adapter, &rx_pool->long_term_buff); rx_pool->buff_size = be64_to_cpu(size_array[i]); - alloc_long_term_buff(adapter, &rx_pool->long_term_buff, - rx_pool->size * - rx_pool->buff_size); + rc = alloc_long_term_buff(adapter, + &rx_pool->long_term_buff, + rx_pool->size * + rx_pool->buff_size); } else { rc = reset_long_term_buff(adapter, &rx_pool->long_term_buff); @@ -696,9 +697,9 @@ static int init_tx_pools(struct net_device *netdev) return rc; } - init_one_tx_pool(netdev, &adapter->tso_pool[i], - IBMVNIC_TSO_BUFS, - IBMVNIC_TSO_BUF_SZ); + rc = init_one_tx_pool(netdev, &adapter->tso_pool[i], + IBMVNIC_TSO_BUFS, + IBMVNIC_TSO_BUF_SZ); if (rc) { release_tx_pools(adapter); return rc; From c1a9d65954c68e13a6adc0225b0d38188fff68ca Mon Sep 17 00:00:00 2001 From: Matteo Croce Date: Sat, 8 Jun 2019 14:50:19 +0200 Subject: [PATCH 074/145] mpls: fix af_mpls dependencies MPLS routing code relies on sysctl to work, so let it select PROC_SYSCTL. Reported-by: Randy Dunlap Suggested-by: David Ahern Signed-off-by: Matteo Croce Signed-off-by: David S. Miller --- net/mpls/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/net/mpls/Kconfig b/net/mpls/Kconfig index d9391beea980..2b802a48d5a6 100644 --- a/net/mpls/Kconfig +++ b/net/mpls/Kconfig @@ -26,6 +26,7 @@ config NET_MPLS_GSO config MPLS_ROUTING tristate "MPLS: routing support" depends on NET_IP_TUNNEL || NET_IP_TUNNEL=n + select PROC_SYSCTL ---help--- Add support for forwarding of mpls packets. From fcc2202a9d6e4578aca1af4f1954f61dc986ef74 Mon Sep 17 00:00:00 2001 From: Yuchung Cheng Date: Fri, 7 Jun 2019 18:26:33 -0700 Subject: [PATCH 075/145] tcp: fix undo spurious SYNACK in passive Fast Open Commit 794200d66273 ("tcp: undo cwnd on Fast Open spurious SYNACK retransmit") may cause tcp_fastretrans_alert() to warn about pending retransmission in Open state. This is triggered when the Fast Open server both sends data and has spurious SYNACK retransmission during the handshake, and the data packets were lost or reordered. The root cause is a bit complicated: (1) Upon receiving SYN-data: a full socket is created with snd_una = ISN + 1 by tcp_create_openreq_child() (2) On SYNACK timeout the server/sender enters CA_Loss state. (3) Upon receiving the final ACK to complete the handshake, sender does not mark FLAG_SND_UNA_ADVANCED since (1) Sender then calls tcp_process_loss since state is CA_loss by (2) (4) tcp_process_loss() does not invoke undo operations but instead mark REXMIT_LOST to force retransmission (5) tcp_rcv_synrecv_state_fastopen() calls tcp_try_undo_loss(). It changes state to CA_Open but has positive tp->retrans_out (6) Next ACK triggers the WARN_ON in tcp_fastretrans_alert() The step that goes wrong is (4) where the undo operation should have been invoked because the ACK successfully acknowledged the SYN sequence. This fixes that by specifically checking undo when the SYN-ACK sequence is acknowledged. Then after tcp_process_loss() the state would be further adjusted based in tcp_fastretrans_alert() to avoid triggering the warning in (6). Fixes: 794200d66273 ("tcp: undo cwnd on Fast Open spurious SYNACK retransmit") Signed-off-by: Yuchung Cheng Signed-off-by: Neal Cardwell Signed-off-by: David S. Miller --- net/ipv4/tcp_input.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 08a477e74cf3..38dfc308c0fb 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -2648,7 +2648,7 @@ static void tcp_process_loss(struct sock *sk, int flag, int num_dupack, struct tcp_sock *tp = tcp_sk(sk); bool recovered = !before(tp->snd_una, tp->high_seq); - if ((flag & FLAG_SND_UNA_ADVANCED) && + if ((flag & FLAG_SND_UNA_ADVANCED || tp->fastopen_rsk) && tcp_try_undo_loss(sk, false)) return; From dce5ccccd1231c6eaec5ede80bce85f2ae536826 Mon Sep 17 00:00:00 2001 From: John Hurley Date: Sat, 8 Jun 2019 17:48:03 -0700 Subject: [PATCH 076/145] nfp: ensure skb network header is set for packet redirect Packets received at the NFP driver may be redirected to egress of another netdev (e.g. in the case of OvS internal ports). On the egress path, some processes, like TC egress hooks, may expect the network header offset field in the skb to be correctly set. If this is not the case there is potential for abnormal behaviour and even the triggering of BUG() calls. Set the skb network header field before the mac header pull when doing a packet redirect. Fixes: 27f54b582567 ("nfp: allow fallback packets from non-reprs") Signed-off-by: John Hurley Reviewed-by: Jakub Kicinski Signed-off-by: David S. Miller --- drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c index b82b684f52ce..36a3bd30cfd9 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c +++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c @@ -1867,6 +1867,7 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, int budget) napi_gro_receive(&rx_ring->r_vec->napi, skb); } else { skb->dev = netdev; + skb_reset_network_header(skb); __skb_push(skb, ETH_HLEN); dev_queue_xmit(skb); } From da2577fdd0932ea4eefe73903f1130ee366767d2 Mon Sep 17 00:00:00 2001 From: Jonathan Lemon Date: Sat, 8 Jun 2019 12:54:19 -0700 Subject: [PATCH 077/145] bpf: lpm_trie: check left child of last leftmost node for NULL If the leftmost parent node of the tree has does not have a child on the left side, then trie_get_next_key (and bpftool map dump) will not look at the child on the right. This leads to the traversal missing elements. Lookup is not affected. Update selftest to handle this case. Reproducer: bpftool map create /sys/fs/bpf/lpm type lpm_trie key 6 \ value 1 entries 256 name test_lpm flags 1 bpftool map update pinned /sys/fs/bpf/lpm key 8 0 0 0 0 0 value 1 bpftool map update pinned /sys/fs/bpf/lpm key 16 0 0 0 0 128 value 2 bpftool map dump pinned /sys/fs/bpf/lpm Returns only 1 element. (2 expected) Fixes: b471f2f1de8b ("bpf: implement MAP_GET_NEXT_KEY command for LPM_TRIE") Signed-off-by: Jonathan Lemon Acked-by: Martin KaFai Lau Signed-off-by: Daniel Borkmann --- kernel/bpf/lpm_trie.c | 9 +++-- tools/testing/selftests/bpf/test_lpm_map.c | 41 ++++++++++++++++++++-- 2 files changed, 45 insertions(+), 5 deletions(-) diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c index e61630c2e50b..864e2a496376 100644 --- a/kernel/bpf/lpm_trie.c +++ b/kernel/bpf/lpm_trie.c @@ -716,9 +716,14 @@ find_leftmost: * have exact two children, so this function will never return NULL. */ for (node = search_root; node;) { - if (!(node->flags & LPM_TREE_NODE_FLAG_IM)) + if (node->flags & LPM_TREE_NODE_FLAG_IM) { + node = rcu_dereference(node->child[0]); + } else { next_node = node; - node = rcu_dereference(node->child[0]); + node = rcu_dereference(node->child[0]); + if (!node) + node = rcu_dereference(next_node->child[1]); + } } do_copy: next_key->prefixlen = next_node->prefixlen; diff --git a/tools/testing/selftests/bpf/test_lpm_map.c b/tools/testing/selftests/bpf/test_lpm_map.c index 02d7c871862a..006be3963977 100644 --- a/tools/testing/selftests/bpf/test_lpm_map.c +++ b/tools/testing/selftests/bpf/test_lpm_map.c @@ -573,13 +573,13 @@ static void test_lpm_get_next_key(void) /* add one more element (total two) */ key_p->prefixlen = 24; - inet_pton(AF_INET, "192.168.0.0", key_p->data); + inet_pton(AF_INET, "192.168.128.0", key_p->data); assert(bpf_map_update_elem(map_fd, key_p, &value, 0) == 0); memset(key_p, 0, key_size); assert(bpf_map_get_next_key(map_fd, NULL, key_p) == 0); assert(key_p->prefixlen == 24 && key_p->data[0] == 192 && - key_p->data[1] == 168 && key_p->data[2] == 0); + key_p->data[1] == 168 && key_p->data[2] == 128); memset(next_key_p, 0, key_size); assert(bpf_map_get_next_key(map_fd, key_p, next_key_p) == 0); @@ -592,7 +592,7 @@ static void test_lpm_get_next_key(void) /* Add one more element (total three) */ key_p->prefixlen = 24; - inet_pton(AF_INET, "192.168.128.0", key_p->data); + inet_pton(AF_INET, "192.168.0.0", key_p->data); assert(bpf_map_update_elem(map_fd, key_p, &value, 0) == 0); memset(key_p, 0, key_size); @@ -643,6 +643,41 @@ static void test_lpm_get_next_key(void) assert(bpf_map_get_next_key(map_fd, key_p, next_key_p) == -1 && errno == ENOENT); + /* Add one more element (total five) */ + key_p->prefixlen = 28; + inet_pton(AF_INET, "192.168.1.128", key_p->data); + assert(bpf_map_update_elem(map_fd, key_p, &value, 0) == 0); + + memset(key_p, 0, key_size); + assert(bpf_map_get_next_key(map_fd, NULL, key_p) == 0); + assert(key_p->prefixlen == 24 && key_p->data[0] == 192 && + key_p->data[1] == 168 && key_p->data[2] == 0); + + memset(next_key_p, 0, key_size); + assert(bpf_map_get_next_key(map_fd, key_p, next_key_p) == 0); + assert(next_key_p->prefixlen == 28 && next_key_p->data[0] == 192 && + next_key_p->data[1] == 168 && next_key_p->data[2] == 1 && + next_key_p->data[3] == 128); + + memcpy(key_p, next_key_p, key_size); + assert(bpf_map_get_next_key(map_fd, key_p, next_key_p) == 0); + assert(next_key_p->prefixlen == 24 && next_key_p->data[0] == 192 && + next_key_p->data[1] == 168 && next_key_p->data[2] == 1); + + memcpy(key_p, next_key_p, key_size); + assert(bpf_map_get_next_key(map_fd, key_p, next_key_p) == 0); + assert(next_key_p->prefixlen == 24 && next_key_p->data[0] == 192 && + next_key_p->data[1] == 168 && next_key_p->data[2] == 128); + + memcpy(key_p, next_key_p, key_size); + assert(bpf_map_get_next_key(map_fd, key_p, next_key_p) == 0); + assert(next_key_p->prefixlen == 16 && next_key_p->data[0] == 192 && + next_key_p->data[1] == 168); + + memcpy(key_p, next_key_p, key_size); + assert(bpf_map_get_next_key(map_fd, key_p, next_key_p) == -1 && + errno == ENOENT); + /* no exact matching key should return the first one in post order */ key_p->prefixlen = 22; inet_pton(AF_INET, "192.168.1.0", key_p->data); From 522924b583082f51b8a2406624a2f27c22119b20 Mon Sep 17 00:00:00 2001 From: Willem de Bruijn Date: Fri, 7 Jun 2019 17:57:48 -0400 Subject: [PATCH 078/145] net: correct udp zerocopy refcnt also when zerocopy only on append The below patch fixes an incorrect zerocopy refcnt increment when appending with MSG_MORE to an existing zerocopy udp skb. send(.., MSG_ZEROCOPY | MSG_MORE); // refcnt 1 send(.., MSG_ZEROCOPY | MSG_MORE); // refcnt still 1 (bar frags) But it missed that zerocopy need not be passed at the first send. The right test whether the uarg is newly allocated and thus has extra refcnt 1 is not !skb, but !skb_zcopy. send(.., MSG_MORE); // send(.., MSG_ZEROCOPY); // refcnt 1 Fixes: 100f6d8e09905 ("net: correct zerocopy refcnt with udp MSG_MORE") Reported-by: syzbot Signed-off-by: Willem de Bruijn Signed-off-by: David S. Miller --- net/ipv4/ip_output.c | 2 +- net/ipv6/ip6_output.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index 8c9189a41b13..16f9159234a2 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -918,7 +918,7 @@ static int __ip_append_data(struct sock *sk, uarg = sock_zerocopy_realloc(sk, length, skb_zcopy(skb)); if (!uarg) return -ENOBUFS; - extra_uref = !skb; /* only extra ref if !MSG_MORE */ + extra_uref = !skb_zcopy(skb); /* only ref on new uarg */ if (rt->dst.dev->features & NETIF_F_SG && csummode == CHECKSUM_PARTIAL) { paged = true; diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index 934c88f128ab..834475717110 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -1340,7 +1340,7 @@ emsgsize: uarg = sock_zerocopy_realloc(sk, length, skb_zcopy(skb)); if (!uarg) return -ENOBUFS; - extra_uref = !skb; /* only extra ref if !MSG_MORE */ + extra_uref = !skb_zcopy(skb); /* only ref on new uarg */ if (rt->dst.dev->features & NETIF_F_SG && csummode == CHECKSUM_PARTIAL) { paged = true; From 309b66970ee2abf721ecd0876a48940fa0b99a35 Mon Sep 17 00:00:00 2001 From: Taehee Yoo Date: Sun, 9 Jun 2019 23:26:21 +0900 Subject: [PATCH 079/145] net: openvswitch: do not free vport if register_netdevice() is failed. In order to create an internal vport, internal_dev_create() is used and that calls register_netdevice() internally. If register_netdevice() fails, it calls dev->priv_destructor() to free private data of netdev. actually, a private data of this is a vport. Hence internal_dev_create() should not free and use a vport after failure of register_netdevice(). Test command ovs-dpctl add-dp bonding_masters Splat looks like: [ 1035.667767] kasan: GPF could be caused by NULL-ptr deref or user memory access [ 1035.675958] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI [ 1035.676916] CPU: 1 PID: 1028 Comm: ovs-vswitchd Tainted: G B 5.2.0-rc3+ #240 [ 1035.676916] RIP: 0010:internal_dev_create+0x2e5/0x4e0 [openvswitch] [ 1035.676916] Code: 48 c1 ea 03 80 3c 02 00 0f 85 9f 01 00 00 4c 8b 23 48 b8 00 00 00 00 00 fc ff df 49 8d bc 24 60 05 00 00 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 86 01 00 00 49 8b bc 24 60 05 00 00 e8 e4 68 f4 [ 1035.713720] RSP: 0018:ffff88810dcb7578 EFLAGS: 00010206 [ 1035.713720] RAX: dffffc0000000000 RBX: ffff88810d13fe08 RCX: ffffffff84297704 [ 1035.713720] RDX: 00000000000000ac RSI: 0000000000000000 RDI: 0000000000000560 [ 1035.713720] RBP: 00000000ffffffef R08: fffffbfff0d3b881 R09: fffffbfff0d3b881 [ 1035.713720] R10: 0000000000000001 R11: fffffbfff0d3b880 R12: 0000000000000000 [ 1035.768776] R13: 0000607ee460b900 R14: ffff88810dcb7690 R15: ffff88810dcb7698 [ 1035.777709] FS: 00007f02095fc980(0000) GS:ffff88811b400000(0000) knlGS:0000000000000000 [ 1035.777709] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1035.777709] CR2: 00007ffdf01d2f28 CR3: 0000000108258000 CR4: 00000000001006e0 [ 1035.777709] Call Trace: [ 1035.777709] ovs_vport_add+0x267/0x4f0 [openvswitch] [ 1035.777709] new_vport+0x15/0x1e0 [openvswitch] [ 1035.777709] ovs_vport_cmd_new+0x567/0xd10 [openvswitch] [ 1035.777709] ? ovs_dp_cmd_dump+0x490/0x490 [openvswitch] [ 1035.777709] ? __kmalloc+0x131/0x2e0 [ 1035.777709] ? genl_family_rcv_msg+0xa54/0x1030 [ 1035.777709] genl_family_rcv_msg+0x63a/0x1030 [ 1035.777709] ? genl_unregister_family+0x630/0x630 [ 1035.841681] ? debug_show_all_locks+0x2d0/0x2d0 [ ... ] Fixes: cf124db566e6 ("net: Fix inconsistent teardown and release of private netdev state.") Signed-off-by: Taehee Yoo Reviewed-by: Greg Rose Signed-off-by: David S. Miller --- net/openvswitch/vport-internal_dev.c | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-) diff --git a/net/openvswitch/vport-internal_dev.c b/net/openvswitch/vport-internal_dev.c index 26f71cbf7527..5993405c25c1 100644 --- a/net/openvswitch/vport-internal_dev.c +++ b/net/openvswitch/vport-internal_dev.c @@ -170,7 +170,9 @@ static struct vport *internal_dev_create(const struct vport_parms *parms) { struct vport *vport; struct internal_dev *internal_dev; + struct net_device *dev; int err; + bool free_vport = true; vport = ovs_vport_alloc(0, &ovs_internal_vport_ops, parms); if (IS_ERR(vport)) { @@ -178,8 +180,9 @@ static struct vport *internal_dev_create(const struct vport_parms *parms) goto error; } - vport->dev = alloc_netdev(sizeof(struct internal_dev), - parms->name, NET_NAME_USER, do_setup); + dev = alloc_netdev(sizeof(struct internal_dev), + parms->name, NET_NAME_USER, do_setup); + vport->dev = dev; if (!vport->dev) { err = -ENOMEM; goto error_free_vport; @@ -200,8 +203,10 @@ static struct vport *internal_dev_create(const struct vport_parms *parms) rtnl_lock(); err = register_netdevice(vport->dev); - if (err) + if (err) { + free_vport = false; goto error_unlock; + } dev_set_promiscuity(vport->dev, 1); rtnl_unlock(); @@ -211,11 +216,12 @@ static struct vport *internal_dev_create(const struct vport_parms *parms) error_unlock: rtnl_unlock(); - free_percpu(vport->dev->tstats); + free_percpu(dev->tstats); error_free_netdev: - free_netdev(vport->dev); + free_netdev(dev); error_free_vport: - ovs_vport_free(vport); + if (free_vport) + ovs_vport_free(vport); error: return ERR_PTR(err); } From 8399a6930d12f5965230f4ff058228a4cc80c0b9 Mon Sep 17 00:00:00 2001 From: Stefano Brivio Date: Tue, 11 Jun 2019 00:27:05 +0200 Subject: [PATCH 080/145] vxlan: Don't assume linear buffers in error handler In commit c3a43b9fec8a ("vxlan: ICMP error lookup handler") I wrongly assumed buffers from icmp_socket_deliver() would be linear. This is not the case: icmp_socket_deliver() only guarantees we have 8 bytes of linear data. Eric fixed this same issue for fou and fou6 in commits 26fc181e6cac ("fou, fou6: do not assume linear skbs") and 5355ed6388e2 ("fou, fou6: avoid uninit-value in gue_err() and gue6_err()"). Use pskb_may_pull() instead of checking skb->len, and take into account the fact we later access the VXLAN header with udp_hdr(), so we also need to sum skb_transport_header() here. Reported-by: Guillaume Nault Fixes: c3a43b9fec8a ("vxlan: ICMP error lookup handler") Signed-off-by: Stefano Brivio Signed-off-by: David S. Miller --- drivers/net/vxlan.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c index 5994d5415a03..4c9bc29fe3d5 100644 --- a/drivers/net/vxlan.c +++ b/drivers/net/vxlan.c @@ -1766,7 +1766,7 @@ static int vxlan_err_lookup(struct sock *sk, struct sk_buff *skb) struct vxlanhdr *hdr; __be32 vni; - if (skb->len < VXLAN_HLEN) + if (!pskb_may_pull(skb, skb_transport_offset(skb) + VXLAN_HLEN)) return -EINVAL; hdr = vxlan_hdr(skb); From eccc73a6b2cb6c04bfbc40a0769f3c428dfba232 Mon Sep 17 00:00:00 2001 From: Stefano Brivio Date: Tue, 11 Jun 2019 00:27:06 +0200 Subject: [PATCH 081/145] geneve: Don't assume linear buffers in error handler In commit a07966447f39 ("geneve: ICMP error lookup handler") I wrongly assumed buffers from icmp_socket_deliver() would be linear. This is not the case: icmp_socket_deliver() only guarantees we have 8 bytes of linear data. Eric fixed this same issue for fou and fou6 in commits 26fc181e6cac ("fou, fou6: do not assume linear skbs") and 5355ed6388e2 ("fou, fou6: avoid uninit-value in gue_err() and gue6_err()"). Use pskb_may_pull() instead of checking skb->len, and take into account the fact we later access the GENEVE header with udp_hdr(), so we also need to sum skb_transport_header() here. Reported-by: Guillaume Nault Fixes: a07966447f39 ("geneve: ICMP error lookup handler") Signed-off-by: Stefano Brivio Signed-off-by: David S. Miller --- drivers/net/geneve.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c index 98d1a45c0606..25770122c219 100644 --- a/drivers/net/geneve.c +++ b/drivers/net/geneve.c @@ -395,7 +395,7 @@ static int geneve_udp_encap_err_lookup(struct sock *sk, struct sk_buff *skb) u8 zero_vni[3] = { 0 }; u8 *vni = zero_vni; - if (skb->len < GENEVE_BASE_HLEN) + if (!pskb_may_pull(skb, skb_transport_offset(skb) + GENEVE_BASE_HLEN)) return -EINVAL; geneveh = geneve_hdr(skb); From f12dd75959b0138f94da8ddcf43f2f3cf277216d Mon Sep 17 00:00:00 2001 From: Martin KaFai Lau Date: Tue, 11 Jun 2019 14:45:57 -0700 Subject: [PATCH 082/145] bpf: net: Set sk_bpf_storage back to NULL for cloned sk The cloned sk should not carry its parent-listener's sk_bpf_storage. This patch fixes it by setting it back to NULL. Fixes: 6ac99e8f23d4 ("bpf: Introduce bpf sk local storage") Signed-off-by: Martin KaFai Lau Acked-by: Andrii Nakryiko Signed-off-by: Daniel Borkmann --- net/core/sock.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/net/core/sock.c b/net/core/sock.c index 2b3701958486..d90fd04622e5 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -1850,6 +1850,9 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority) goto out; } RCU_INIT_POINTER(newsk->sk_reuseport_cb, NULL); +#ifdef CONFIG_BPF_SYSCALL + RCU_INIT_POINTER(newsk->sk_bpf_storage, NULL); +#endif newsk->sk_err = 0; newsk->sk_err_soft = 0; From 01d76b5317003e019ace561a9b775f51aafdfdc4 Mon Sep 17 00:00:00 2001 From: Ilya Maximets Date: Fri, 7 Jun 2019 20:27:32 +0300 Subject: [PATCH 083/145] xdp: check device pointer before clearing We should not call 'ndo_bpf()' or 'dev_put()' with NULL argument. Fixes: c9b47cc1fabc ("xsk: fix bug when trying to use both copy and zero-copy on one queue id") Signed-off-by: Ilya Maximets Acked-by: Jonathan Lemon Signed-off-by: Daniel Borkmann --- net/xdp/xdp_umem.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c index 2b18223e7eb8..9c6de4f114f8 100644 --- a/net/xdp/xdp_umem.c +++ b/net/xdp/xdp_umem.c @@ -143,6 +143,9 @@ static void xdp_umem_clear_dev(struct xdp_umem *umem) struct netdev_bpf bpf; int err; + if (!umem->dev) + return; + if (umem->zc) { bpf.command = XDP_SETUP_XSK_UMEM; bpf.xsk.umem = NULL; @@ -156,11 +159,9 @@ static void xdp_umem_clear_dev(struct xdp_umem *umem) WARN(1, "failed to disable umem!\n"); } - if (umem->dev) { - rtnl_lock(); - xdp_clear_umem_at_qid(umem->dev, umem->queue_id); - rtnl_unlock(); - } + rtnl_lock(); + xdp_clear_umem_at_qid(umem->dev, umem->queue_id); + rtnl_unlock(); if (umem->zc) { dev_put(umem->dev); From ec66854c832c95f612756192d42ac1158593800f Mon Sep 17 00:00:00 2001 From: Matteo Croce Date: Wed, 12 Jun 2019 11:50:37 +0200 Subject: [PATCH 084/145] mpls: fix af_mpls dependencies for real Randy reported that selecting MPLS_ROUTING without PROC_FS breaks the build, because since commit c1a9d65954c6 ("mpls: fix af_mpls dependencies"), MPLS_ROUTING selects PROC_SYSCTL, but Kconfig's select doesn't recursively handle dependencies. Change the select into a dependency. Fixes: c1a9d65954c6 ("mpls: fix af_mpls dependencies") Reported-by: Randy Dunlap Signed-off-by: Matteo Croce Signed-off-by: David S. Miller --- net/mpls/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/mpls/Kconfig b/net/mpls/Kconfig index 2b802a48d5a6..d1ad69b7942a 100644 --- a/net/mpls/Kconfig +++ b/net/mpls/Kconfig @@ -26,7 +26,7 @@ config NET_MPLS_GSO config MPLS_ROUTING tristate "MPLS: routing support" depends on NET_IP_TUNNEL || NET_IP_TUNNEL=n - select PROC_SYSCTL + depends on PROC_SYSCTL ---help--- Add support for forwarding of mpls packets. From bb2e05e0c8dcc2c93a97410ce362da998b83155f Mon Sep 17 00:00:00 2001 From: Masanari Iida Date: Wed, 12 Jun 2019 21:29:34 +0900 Subject: [PATCH 085/145] linux-next: DOC: RDS: Fix a typo in rds.txt This patch fixes a spelling typo in rds.txt Signed-off-by: Masanari Iida Acked-by: Santosh Shilimkar Signed-off-by: David S. Miller --- Documentation/networking/rds.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/networking/rds.txt b/Documentation/networking/rds.txt index 0235ae69af2a..f2a0147c933d 100644 --- a/Documentation/networking/rds.txt +++ b/Documentation/networking/rds.txt @@ -389,7 +389,7 @@ Multipath RDS (mprds) a common (to all paths) part, and a per-path struct rds_conn_path. All I/O workqs and reconnect threads are driven from the rds_conn_path. Transports such as TCP that are multipath capable may then set up a - TPC socket per rds_conn_path, and this is managed by the transport via + TCP socket per rds_conn_path, and this is managed by the transport via the transport privatee cp_transport_data pointer. Transports announce themselves as multipath capable by setting the From f0d2ca1531377e7da888913e277eefac05a59b6f Mon Sep 17 00:00:00 2001 From: Maxime Chevallier Date: Wed, 12 Jun 2019 17:18:38 +0200 Subject: [PATCH 086/145] net: ethtool: Allow matching on vlan DEI bit MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Using ethtool, users can specify a classification action matching on the full vlan tag, which includes the DEI bit (also previously called CFI). However, when converting the ethool_flow_spec to a flow_rule, we use dissector keys to represent the matching patterns. Since the vlan dissector key doesn't include the DEI bit, this information was silently discarded when translating the ethtool flow spec in to a flow_rule. This commit adds the DEI bit into the vlan dissector key, and allows propagating the information to the driver when parsing the ethtool flow spec. Fixes: eca4205f9ec3 ("ethtool: add ethtool_rx_flow_spec to flow_rule structure translator") Reported-by: Michał Mirosław Signed-off-by: Maxime Chevallier Signed-off-by: David S. Miller --- include/net/flow_dissector.h | 1 + net/core/ethtool.c | 5 +++++ 2 files changed, 6 insertions(+) diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h index 7c5a8d9a8d2a..dfabc0503446 100644 --- a/include/net/flow_dissector.h +++ b/include/net/flow_dissector.h @@ -46,6 +46,7 @@ struct flow_dissector_key_tags { struct flow_dissector_key_vlan { u16 vlan_id:12, + vlan_dei:1, vlan_priority:3; __be16 vlan_tpid; }; diff --git a/net/core/ethtool.c b/net/core/ethtool.c index d08b1e19ce9c..4d1011b2e24f 100644 --- a/net/core/ethtool.c +++ b/net/core/ethtool.c @@ -3020,6 +3020,11 @@ ethtool_rx_flow_rule_create(const struct ethtool_rx_flow_spec_input *input) match->mask.vlan.vlan_id = ntohs(ext_m_spec->vlan_tci) & 0x0fff; + match->key.vlan.vlan_dei = + !!(ext_h_spec->vlan_tci & htons(0x1000)); + match->mask.vlan.vlan_dei = + !!(ext_m_spec->vlan_tci & htons(0x1000)); + match->key.vlan.vlan_priority = (ntohs(ext_h_spec->vlan_tci) & 0xe000) >> 13; match->mask.vlan.vlan_priority = From e1ae5c2ea4783b1fd87be250f9fcc9d9e1a6ba3f Mon Sep 17 00:00:00 2001 From: Stephen Suryaputra Date: Mon, 10 Jun 2019 10:32:50 -0400 Subject: [PATCH 087/145] vrf: Increment Icmp6InMsgs on the original netdev Get the ingress interface and increment ICMP counters based on that instead of skb->dev when the the dev is a VRF device. This is a follow up on the following message: https://www.spinics.net/lists/netdev/msg560268.html v2: Avoid changing skb->dev since it has unintended effect for local delivery (David Ahern). Signed-off-by: Stephen Suryaputra Reviewed-by: David Ahern Signed-off-by: David S. Miller --- include/net/addrconf.h | 16 ++++++++++++++++ net/ipv6/icmp.c | 17 +++++++++++------ net/ipv6/reassembly.c | 4 ++-- 3 files changed, 29 insertions(+), 8 deletions(-) diff --git a/include/net/addrconf.h b/include/net/addrconf.h index 2f67ae854ff0..becdad576859 100644 --- a/include/net/addrconf.h +++ b/include/net/addrconf.h @@ -309,6 +309,22 @@ static inline struct inet6_dev *__in6_dev_get(const struct net_device *dev) return rcu_dereference_rtnl(dev->ip6_ptr); } +/** + * __in6_dev_stats_get - get inet6_dev pointer for stats + * @dev: network device + * @skb: skb for original incoming interface if neeeded + * + * Caller must hold rcu_read_lock or RTNL, because this function + * does not take a reference on the inet6_dev. + */ +static inline struct inet6_dev *__in6_dev_stats_get(const struct net_device *dev, + const struct sk_buff *skb) +{ + if (netif_is_l3_master(dev)) + dev = dev_get_by_index_rcu(dev_net(dev), inet6_iif(skb)); + return __in6_dev_get(dev); +} + /** * __in6_dev_get_safely - get inet6_dev pointer from netdevice * @dev: network device diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c index bafdd04a768d..375b4b4f9bf5 100644 --- a/net/ipv6/icmp.c +++ b/net/ipv6/icmp.c @@ -393,23 +393,28 @@ relookup_failed: return ERR_PTR(err); } -static int icmp6_iif(const struct sk_buff *skb) +static struct net_device *icmp6_dev(const struct sk_buff *skb) { - int iif = skb->dev->ifindex; + struct net_device *dev = skb->dev; /* for local traffic to local address, skb dev is the loopback * device. Check if there is a dst attached to the skb and if so * get the real device index. Same is needed for replies to a link * local address on a device enslaved to an L3 master device */ - if (unlikely(iif == LOOPBACK_IFINDEX || netif_is_l3_master(skb->dev))) { + if (unlikely(dev->ifindex == LOOPBACK_IFINDEX || netif_is_l3_master(skb->dev))) { const struct rt6_info *rt6 = skb_rt6_info(skb); if (rt6) - iif = rt6->rt6i_idev->dev->ifindex; + dev = rt6->rt6i_idev->dev; } - return iif; + return dev; +} + +static int icmp6_iif(const struct sk_buff *skb) +{ + return icmp6_dev(skb)->ifindex; } /* @@ -810,7 +815,7 @@ out: static int icmpv6_rcv(struct sk_buff *skb) { struct net *net = dev_net(skb->dev); - struct net_device *dev = skb->dev; + struct net_device *dev = icmp6_dev(skb); struct inet6_dev *idev = __in6_dev_get(dev); const struct in6_addr *saddr, *daddr; struct icmp6hdr *hdr; diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c index 22369694b2fb..b2b2c0c38b87 100644 --- a/net/ipv6/reassembly.c +++ b/net/ipv6/reassembly.c @@ -298,7 +298,7 @@ static int ip6_frag_reasm(struct frag_queue *fq, struct sk_buff *skb, skb_network_header_len(skb)); rcu_read_lock(); - __IP6_INC_STATS(net, __in6_dev_get(dev), IPSTATS_MIB_REASMOKS); + __IP6_INC_STATS(net, __in6_dev_stats_get(dev, skb), IPSTATS_MIB_REASMOKS); rcu_read_unlock(); fq->q.rb_fragments = RB_ROOT; fq->q.fragments_tail = NULL; @@ -312,7 +312,7 @@ out_oom: net_dbg_ratelimited("ip6_frag_reasm: no memory for reassembly\n"); out_fail: rcu_read_lock(); - __IP6_INC_STATS(net, __in6_dev_get(dev), IPSTATS_MIB_REASMFAILS); + __IP6_INC_STATS(net, __in6_dev_stats_get(dev, skb), IPSTATS_MIB_REASMFAILS); rcu_read_unlock(); inet_frag_kill(&fq->q); return -1; From 648ee6cea7dde4a5cdf817e5d964fd60b22006a4 Mon Sep 17 00:00:00 2001 From: John Fastabend Date: Wed, 12 Jun 2019 17:23:57 +0000 Subject: [PATCH 088/145] net: tls, correctly account for copied bytes with multiple sk_msgs tls_sw_do_sendpage needs to return the total number of bytes sent regardless of how many sk_msgs are allocated. Unfortunately, copied (the value we return up the stack) is zero'd before each new sk_msg is allocated so we only return the copied size of the last sk_msg used. The caller (splice, etc.) of sendpage will then believe only part of its data was sent and send the missing chunks again. However, because the data actually was sent the receiver will get multiple copies of the same data. To reproduce this do multiple sendfile calls with a length close to the max record size. This will in turn call splice/sendpage, sendpage may use multiple sk_msg in this case and then returns the incorrect number of bytes. This will cause splice to resend creating duplicate data on the receiver. Andre created a C program that can easily generate this case so we will push a similar selftest for this to bpf-next shortly. The fix is to _not_ zero the copied field so that the total sent bytes is returned. Reported-by: Steinar H. Gunderson Reported-by: Andre Tomt Tested-by: Andre Tomt Fixes: d829e9c4112b ("tls: convert to generic sk_msg interface") Signed-off-by: John Fastabend Signed-off-by: David S. Miller --- net/tls/tls_sw.c | 1 - 1 file changed, 1 deletion(-) diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index 960494f437ac..455a782c7658 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -1143,7 +1143,6 @@ static int tls_sw_do_sendpage(struct sock *sk, struct page *page, full_record = false; record_room = TLS_MAX_PAYLOAD_SIZE - msg_pl->sg.size; - copied = 0; copy = size; if (copy >= record_room) { copy = record_room; From ee02c26993262c96cc39d0f831f7589b10a44fd7 Mon Sep 17 00:00:00 2001 From: Ido Schimmel Date: Tue, 11 Jun 2019 10:19:40 +0300 Subject: [PATCH 089/145] mlxsw: spectrum: Use different seeds for ECMP and LAG hash The same hash function and seed are used for both ECMP and LAG hash. Therefore, when a LAG device is used as a nexthop device as part of an ECMP group, hash polarization can occur and all the traffic will be hashed to a single LAG slave. Fix this by using a different seed for the LAG hash. Fixes: fa73989f2697 ("mlxsw: spectrum: Use a stable ECMP/LAG seed") Signed-off-by: Ido Schimmel Reported-by: Alex Veber Tested-by: Alex Veber Acked-by: Jiri Pirko Signed-off-by: David S. Miller --- drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c index dfe6b44baf63..23204356ad88 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c @@ -4280,13 +4280,16 @@ static void mlxsw_sp_traps_fini(struct mlxsw_sp *mlxsw_sp) } } +#define MLXSW_SP_LAG_SEED_INIT 0xcafecafe + static int mlxsw_sp_lag_init(struct mlxsw_sp *mlxsw_sp) { char slcr_pl[MLXSW_REG_SLCR_LEN]; u32 seed; int err; - seed = jhash(mlxsw_sp->base_mac, sizeof(mlxsw_sp->base_mac), 0); + seed = jhash(mlxsw_sp->base_mac, sizeof(mlxsw_sp->base_mac), + MLXSW_SP_LAG_SEED_INIT); mlxsw_reg_slcr_pack(slcr_pl, MLXSW_REG_SLCR_LAG_HASH_SMAC | MLXSW_REG_SLCR_LAG_HASH_DMAC | MLXSW_REG_SLCR_LAG_HASH_ETHERTYPE | From 83d5782681cc12b3d485a83cb34c46b2445f510c Mon Sep 17 00:00:00 2001 From: Ido Schimmel Date: Tue, 11 Jun 2019 10:19:41 +0300 Subject: [PATCH 090/145] mlxsw: spectrum_router: Refresh nexthop neighbour when it becomes dead The driver tries to periodically refresh neighbours that are used to reach nexthops. This is done by periodically calling neigh_event_send(). However, if the neighbour becomes dead, there is nothing we can do to return it to a connected state and the above function call is basically a NOP. This results in the nexthop never being written to the device's adjacency table and therefore never used to forward packets. Fix this by dropping our reference from the dead neighbour and associating the nexthop with a new neigbhour which we will try to refresh. Fixes: a7ff87acd995 ("mlxsw: spectrum_router: Implement next-hop routing") Signed-off-by: Ido Schimmel Reported-by: Alex Veber Tested-by: Alex Veber Acked-by: Jiri Pirko Signed-off-by: David S. Miller --- .../ethernet/mellanox/mlxsw/spectrum_router.c | 73 ++++++++++++++++++- 1 file changed, 70 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c index 1cda8a248b12..ef554739dd54 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c @@ -2363,7 +2363,7 @@ static void mlxsw_sp_router_probe_unresolved_nexthops(struct work_struct *work) static void mlxsw_sp_nexthop_neigh_update(struct mlxsw_sp *mlxsw_sp, struct mlxsw_sp_neigh_entry *neigh_entry, - bool removing); + bool removing, bool dead); static enum mlxsw_reg_rauht_op mlxsw_sp_rauht_op(bool adding) { @@ -2507,7 +2507,8 @@ static void mlxsw_sp_router_neigh_event_work(struct work_struct *work) memcpy(neigh_entry->ha, ha, ETH_ALEN); mlxsw_sp_neigh_entry_update(mlxsw_sp, neigh_entry, entry_connected); - mlxsw_sp_nexthop_neigh_update(mlxsw_sp, neigh_entry, !entry_connected); + mlxsw_sp_nexthop_neigh_update(mlxsw_sp, neigh_entry, !entry_connected, + dead); if (!neigh_entry->connected && list_empty(&neigh_entry->nexthop_list)) mlxsw_sp_neigh_entry_destroy(mlxsw_sp, neigh_entry); @@ -3472,13 +3473,79 @@ static void __mlxsw_sp_nexthop_neigh_update(struct mlxsw_sp_nexthop *nh, nh->update = 1; } +static int +mlxsw_sp_nexthop_dead_neigh_replace(struct mlxsw_sp *mlxsw_sp, + struct mlxsw_sp_neigh_entry *neigh_entry) +{ + struct neighbour *n, *old_n = neigh_entry->key.n; + struct mlxsw_sp_nexthop *nh; + bool entry_connected; + u8 nud_state, dead; + int err; + + nh = list_first_entry(&neigh_entry->nexthop_list, + struct mlxsw_sp_nexthop, neigh_list_node); + + n = neigh_lookup(nh->nh_grp->neigh_tbl, &nh->gw_addr, nh->rif->dev); + if (!n) { + n = neigh_create(nh->nh_grp->neigh_tbl, &nh->gw_addr, + nh->rif->dev); + if (IS_ERR(n)) + return PTR_ERR(n); + neigh_event_send(n, NULL); + } + + mlxsw_sp_neigh_entry_remove(mlxsw_sp, neigh_entry); + neigh_entry->key.n = n; + err = mlxsw_sp_neigh_entry_insert(mlxsw_sp, neigh_entry); + if (err) + goto err_neigh_entry_insert; + + read_lock_bh(&n->lock); + nud_state = n->nud_state; + dead = n->dead; + read_unlock_bh(&n->lock); + entry_connected = nud_state & NUD_VALID && !dead; + + list_for_each_entry(nh, &neigh_entry->nexthop_list, + neigh_list_node) { + neigh_release(old_n); + neigh_clone(n); + __mlxsw_sp_nexthop_neigh_update(nh, !entry_connected); + mlxsw_sp_nexthop_group_refresh(mlxsw_sp, nh->nh_grp); + } + + neigh_release(n); + + return 0; + +err_neigh_entry_insert: + neigh_entry->key.n = old_n; + mlxsw_sp_neigh_entry_insert(mlxsw_sp, neigh_entry); + neigh_release(n); + return err; +} + static void mlxsw_sp_nexthop_neigh_update(struct mlxsw_sp *mlxsw_sp, struct mlxsw_sp_neigh_entry *neigh_entry, - bool removing) + bool removing, bool dead) { struct mlxsw_sp_nexthop *nh; + if (list_empty(&neigh_entry->nexthop_list)) + return; + + if (dead) { + int err; + + err = mlxsw_sp_nexthop_dead_neigh_replace(mlxsw_sp, + neigh_entry); + if (err) + dev_err(mlxsw_sp->bus_info->dev, "Failed to replace dead neigh\n"); + return; + } + list_for_each_entry(nh, &neigh_entry->nexthop_list, neigh_list_node) { __mlxsw_sp_nexthop_neigh_update(nh, removing); From 45a69b70f54854a043530cc0f86fc30d77bc9620 Mon Sep 17 00:00:00 2001 From: Ido Schimmel Date: Tue, 11 Jun 2019 10:19:42 +0300 Subject: [PATCH 091/145] selftests: mlxsw: Test nexthop offload indication Test that IPv4 and IPv6 nexthops are correctly marked with offload indication in response to neighbour events. Signed-off-by: Ido Schimmel Signed-off-by: David S. Miller --- .../selftests/drivers/net/mlxsw/rtnetlink.sh | 47 +++++++++++++++++++ 1 file changed, 47 insertions(+) diff --git a/tools/testing/selftests/drivers/net/mlxsw/rtnetlink.sh b/tools/testing/selftests/drivers/net/mlxsw/rtnetlink.sh index 1c30f302a1e7..5c39e5f6a480 100755 --- a/tools/testing/selftests/drivers/net/mlxsw/rtnetlink.sh +++ b/tools/testing/selftests/drivers/net/mlxsw/rtnetlink.sh @@ -28,6 +28,7 @@ ALL_TESTS=" vlan_interface_uppers_test bridge_extern_learn_test neigh_offload_test + nexthop_offload_test devlink_reload_test " NUM_NETIFS=2 @@ -607,6 +608,52 @@ neigh_offload_test() ip -4 address del 192.0.2.1/24 dev $swp1 } +nexthop_offload_test() +{ + # Test that IPv4 and IPv6 nexthops are marked as offloaded + RET=0 + + sysctl_set net.ipv6.conf.$swp2.keep_addr_on_down 1 + simple_if_init $swp1 192.0.2.1/24 2001:db8:1::1/64 + simple_if_init $swp2 192.0.2.2/24 2001:db8:1::2/64 + setup_wait + + ip -4 route add 198.51.100.0/24 vrf v$swp1 \ + nexthop via 192.0.2.2 dev $swp1 + ip -6 route add 2001:db8:2::/64 vrf v$swp1 \ + nexthop via 2001:db8:1::2 dev $swp1 + + ip -4 route show 198.51.100.0/24 vrf v$swp1 | grep -q offload + check_err $? "ipv4 nexthop not marked as offloaded when should" + ip -6 route show 2001:db8:2::/64 vrf v$swp1 | grep -q offload + check_err $? "ipv6 nexthop not marked as offloaded when should" + + ip link set dev $swp2 down + sleep 1 + + ip -4 route show 198.51.100.0/24 vrf v$swp1 | grep -q offload + check_fail $? "ipv4 nexthop marked as offloaded when should not" + ip -6 route show 2001:db8:2::/64 vrf v$swp1 | grep -q offload + check_fail $? "ipv6 nexthop marked as offloaded when should not" + + ip link set dev $swp2 up + setup_wait + + ip -4 route show 198.51.100.0/24 vrf v$swp1 | grep -q offload + check_err $? "ipv4 nexthop not marked as offloaded after neigh add" + ip -6 route show 2001:db8:2::/64 vrf v$swp1 | grep -q offload + check_err $? "ipv6 nexthop not marked as offloaded after neigh add" + + log_test "nexthop offload indication" + + ip -6 route del 2001:db8:2::/64 vrf v$swp1 + ip -4 route del 198.51.100.0/24 vrf v$swp1 + + simple_if_fini $swp2 192.0.2.2/24 2001:db8:1::2/64 + simple_if_fini $swp1 192.0.2.1/24 2001:db8:1::1/64 + sysctl_restore net.ipv6.conf.$swp2.keep_addr_on_down +} + devlink_reload_test() { # Test that after executing all the above configuration tests, a From e49f9adffb28c5acbaea46ddfe2d17b1373c2473 Mon Sep 17 00:00:00 2001 From: Jiri Pirko Date: Tue, 11 Jun 2019 10:19:43 +0300 Subject: [PATCH 092/145] mlxsw: spectrum_flower: Fix TOS matching The TOS value was not extracted correctly. Fix it. Fixes: 87996f91f739 ("mlxsw: spectrum_flower: Add support for ip tos") Reported-by: Alexander Petrovskiy Signed-off-by: Jiri Pirko Signed-off-by: Ido Schimmel Signed-off-by: David S. Miller --- drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c index 15f804453cd6..96b23c856f4d 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c @@ -247,8 +247,8 @@ static int mlxsw_sp_flower_parse_ip(struct mlxsw_sp *mlxsw_sp, match.mask->tos & 0x3); mlxsw_sp_acl_rulei_keymask_u32(rulei, MLXSW_AFK_ELEMENT_IP_DSCP, - match.key->tos >> 6, - match.mask->tos >> 6); + match.key->tos >> 2, + match.mask->tos >> 2); return 0; } From 0b0c0098348f61e864d85269614bce15f6bbfde7 Mon Sep 17 00:00:00 2001 From: Jiri Pirko Date: Tue, 11 Jun 2019 10:19:44 +0300 Subject: [PATCH 093/145] selftests: tc_flower: Add TOS matching test Signed-off-by: Jiri Pirko Signed-off-by: Ido Schimmel Signed-off-by: David S. Miller --- .../selftests/net/forwarding/tc_flower.sh | 36 ++++++++++++++++++- 1 file changed, 35 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/net/forwarding/tc_flower.sh b/tools/testing/selftests/net/forwarding/tc_flower.sh index 29bcfa84aec7..124803eea4a9 100755 --- a/tools/testing/selftests/net/forwarding/tc_flower.sh +++ b/tools/testing/selftests/net/forwarding/tc_flower.sh @@ -2,7 +2,8 @@ # SPDX-License-Identifier: GPL-2.0 ALL_TESTS="match_dst_mac_test match_src_mac_test match_dst_ip_test \ - match_src_ip_test match_ip_flags_test match_pcp_test match_vlan_test" + match_src_ip_test match_ip_flags_test match_pcp_test match_vlan_test \ + match_ip_tos_test" NUM_NETIFS=2 source tc_common.sh source lib.sh @@ -276,6 +277,39 @@ match_vlan_test() log_test "VLAN match ($tcflags)" } +match_ip_tos_test() +{ + RET=0 + + tc filter add dev $h2 ingress protocol ip pref 1 handle 101 flower \ + $tcflags dst_ip 192.0.2.2 ip_tos 0x20 action drop + tc filter add dev $h2 ingress protocol ip pref 2 handle 102 flower \ + $tcflags dst_ip 192.0.2.2 ip_tos 0x18 action drop + + $MZ $h1 -c 1 -p 64 -a $h1mac -b $h2mac -A 192.0.2.1 -B 192.0.2.2 \ + -t ip tos=18 -q + + tc_check_packets "dev $h2 ingress" 101 1 + check_fail $? "Matched on a wrong filter (0x18)" + + tc_check_packets "dev $h2 ingress" 102 1 + check_err $? "Did not match on correct filter (0x18)" + + $MZ $h1 -c 1 -p 64 -a $h1mac -b $h2mac -A 192.0.2.1 -B 192.0.2.2 \ + -t ip tos=20 -q + + tc_check_packets "dev $h2 ingress" 102 2 + check_fail $? "Matched on a wrong filter (0x20)" + + tc_check_packets "dev $h2 ingress" 101 1 + check_err $? "Did not match on correct filter (0x20)" + + tc filter del dev $h2 ingress protocol ip pref 2 handle 102 flower + tc filter del dev $h2 ingress protocol ip pref 1 handle 101 flower + + log_test "ip_tos match ($tcflags)" +} + setup_prepare() { h1=${NETIFS[p1]} From e891ce1dd2a54c41a3b70d822cd832735c03b892 Mon Sep 17 00:00:00 2001 From: Petr Machata Date: Tue, 11 Jun 2019 10:19:45 +0300 Subject: [PATCH 094/145] mlxsw: spectrum_buffers: Reduce pool size on Spectrum-2 Due to an issue on Spectrum-2, in front-panel ports split four ways, 2 out of 32 port buffers cannot be used. To work around this, the next FW release will mark them as unused, and will report correspondingly lower total shared buffer size. mlxsw will pick up the new value through a query to cap_total_buffer_size resource. However the initial size for shared buffer pool 0 is hard-coded and therefore needs to be updated. Thus reduce the pool size by 2.7 MiB (which corresponds to 2/32 of the total size of 42 MiB), and round down to the whole number of cells. Fixes: fe099bf682ab ("mlxsw: spectrum_buffers: Add Spectrum-2 shared buffer configuration") Signed-off-by: Petr Machata Acked-by: Jiri Pirko Signed-off-by: Ido Schimmel Signed-off-by: David S. Miller --- drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c index 8512dd49e420..1537f70bc26d 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c @@ -437,8 +437,8 @@ static const struct mlxsw_sp_sb_pr mlxsw_sp1_sb_prs[] = { MLXSW_SP1_SB_PR_CPU_SIZE, true, false), }; -#define MLXSW_SP2_SB_PR_INGRESS_SIZE 40960000 -#define MLXSW_SP2_SB_PR_EGRESS_SIZE 40960000 +#define MLXSW_SP2_SB_PR_INGRESS_SIZE 38128752 +#define MLXSW_SP2_SB_PR_EGRESS_SIZE 38128752 #define MLXSW_SP2_SB_PR_CPU_SIZE (256 * 1000) /* Order according to mlxsw_sp2_sb_pool_dess */ From 4b14cc313f076c37b646cee06a85f0db59cf216c Mon Sep 17 00:00:00 2001 From: Ido Schimmel Date: Tue, 11 Jun 2019 10:19:46 +0300 Subject: [PATCH 095/145] mlxsw: spectrum: Disallow prio-tagged packets when PVID is removed When PVID is removed from a bridge port, the Linux bridge drops both untagged and prio-tagged packets. Align mlxsw with this behavior. Fixes: 148f472da5db ("mlxsw: reg: Add the Switch Port Acceptable Frame Types register") Acked-by: Jiri Pirko Signed-off-by: Ido Schimmel Signed-off-by: David S. Miller --- drivers/net/ethernet/mellanox/mlxsw/reg.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mellanox/mlxsw/reg.h b/drivers/net/ethernet/mellanox/mlxsw/reg.h index e8002bfc1e8f..7ed63ed657c7 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/reg.h +++ b/drivers/net/ethernet/mellanox/mlxsw/reg.h @@ -997,7 +997,7 @@ static inline void mlxsw_reg_spaft_pack(char *payload, u8 local_port, MLXSW_REG_ZERO(spaft, payload); mlxsw_reg_spaft_local_port_set(payload, local_port); mlxsw_reg_spaft_allow_untagged_set(payload, allow_untagged); - mlxsw_reg_spaft_allow_prio_tagged_set(payload, true); + mlxsw_reg_spaft_allow_prio_tagged_set(payload, allow_untagged); mlxsw_reg_spaft_allow_tagged_set(payload, true); } From 46b0090a6636cf34c0e856f15dd03e15ba4cdda6 Mon Sep 17 00:00:00 2001 From: Maxime Chevallier Date: Tue, 11 Jun 2019 11:51:42 +0200 Subject: [PATCH 096/145] net: mvpp2: prs: Fix parser range for VID filtering VID filtering is implemented in the Header Parser, with one range of 11 vids being assigned for each no-loopback port. Make sure we use the per-port range when looking for existing entries in the Parser. Since we used a global range instead of a per-port one, this causes VIDs to be removed from the whitelist from all ports of the same PPv2 instance. Fixes: 56beda3db602 ("net: mvpp2: Add hardware offloading for VLAN filtering") Suggested-by: Yuri Chipchev Signed-off-by: Maxime Chevallier Signed-off-by: David S. Miller --- drivers/net/ethernet/marvell/mvpp2/mvpp2_prs.c | 17 ++++++++--------- 1 file changed, 8 insertions(+), 9 deletions(-) diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_prs.c b/drivers/net/ethernet/marvell/mvpp2/mvpp2_prs.c index 392fd895f278..e0da4db3bf56 100644 --- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_prs.c +++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_prs.c @@ -1905,8 +1905,7 @@ static int mvpp2_prs_ip6_init(struct mvpp2 *priv) } /* Find tcam entry with matched pair */ -static int mvpp2_prs_vid_range_find(struct mvpp2 *priv, int pmap, u16 vid, - u16 mask) +static int mvpp2_prs_vid_range_find(struct mvpp2_port *port, u16 vid, u16 mask) { unsigned char byte[2], enable[2]; struct mvpp2_prs_entry pe; @@ -1914,13 +1913,13 @@ static int mvpp2_prs_vid_range_find(struct mvpp2 *priv, int pmap, u16 vid, int tid; /* Go through the all entries with MVPP2_PRS_LU_VID */ - for (tid = MVPP2_PE_VID_FILT_RANGE_START; - tid <= MVPP2_PE_VID_FILT_RANGE_END; tid++) { - if (!priv->prs_shadow[tid].valid || - priv->prs_shadow[tid].lu != MVPP2_PRS_LU_VID) + for (tid = MVPP2_PRS_VID_PORT_FIRST(port->id); + tid <= MVPP2_PRS_VID_PORT_LAST(port->id); tid++) { + if (!port->priv->prs_shadow[tid].valid || + port->priv->prs_shadow[tid].lu != MVPP2_PRS_LU_VID) continue; - mvpp2_prs_init_from_hw(priv, &pe, tid); + mvpp2_prs_init_from_hw(port->priv, &pe, tid); mvpp2_prs_tcam_data_byte_get(&pe, 2, &byte[0], &enable[0]); mvpp2_prs_tcam_data_byte_get(&pe, 3, &byte[1], &enable[1]); @@ -1950,7 +1949,7 @@ int mvpp2_prs_vid_entry_add(struct mvpp2_port *port, u16 vid) memset(&pe, 0, sizeof(pe)); /* Scan TCAM and see if entry with this already exist */ - tid = mvpp2_prs_vid_range_find(priv, (1 << port->id), vid, mask); + tid = mvpp2_prs_vid_range_find(port, vid, mask); reg_val = mvpp2_read(priv, MVPP2_MH_REG(port->id)); if (reg_val & MVPP2_DSA_EXTENDED) @@ -2008,7 +2007,7 @@ void mvpp2_prs_vid_entry_remove(struct mvpp2_port *port, u16 vid) int tid; /* Scan TCAM and see if entry with this already exist */ - tid = mvpp2_prs_vid_range_find(priv, (1 << port->id), vid, 0xfff); + tid = mvpp2_prs_vid_range_find(port, vid, 0xfff); /* No such entry */ if (tid < 0) From 6b7a3430c163455cf8a514d636bda52b04654972 Mon Sep 17 00:00:00 2001 From: Maxime Chevallier Date: Tue, 11 Jun 2019 11:51:43 +0200 Subject: [PATCH 097/145] net: mvpp2: prs: Use the correct helpers when removing all VID filters When removing all VID filters, the mvpp2_prs_vid_entry_remove would be called with the TCAM id incorrectly used as a VID, causing the wrong TCAM entries to be invalidated. Fix this by directly invalidating entries in the VID range. Fixes: 56beda3db602 ("net: mvpp2: Add hardware offloading for VLAN filtering") Suggested-by: Yuri Chipchev Signed-off-by: Maxime Chevallier Signed-off-by: David S. Miller --- drivers/net/ethernet/marvell/mvpp2/mvpp2_prs.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_prs.c b/drivers/net/ethernet/marvell/mvpp2/mvpp2_prs.c index e0da4db3bf56..ae2240074d8e 100644 --- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_prs.c +++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_prs.c @@ -2025,8 +2025,10 @@ void mvpp2_prs_vid_remove_all(struct mvpp2_port *port) for (tid = MVPP2_PRS_VID_PORT_FIRST(port->id); tid <= MVPP2_PRS_VID_PORT_LAST(port->id); tid++) { - if (priv->prs_shadow[tid].valid) - mvpp2_prs_vid_entry_remove(port, tid); + if (priv->prs_shadow[tid].valid) { + mvpp2_prs_hw_inv(priv, tid); + priv->prs_shadow[tid].valid = false; + } } } From b1d6c15b9d824a58c5415673f374fac19e8eccdf Mon Sep 17 00:00:00 2001 From: Martynas Pumputis Date: Wed, 12 Jun 2019 18:05:40 +0200 Subject: [PATCH 098/145] bpf: simplify definition of BPF_FIB_LOOKUP related flags Previously, the BPF_FIB_LOOKUP_{DIRECT,OUTPUT} flags in the BPF UAPI were defined with the help of BIT macro. This had the following issues: - In order to use any of the flags, a user was required to depend on . - No other flag in bpf.h uses the macro, so it seems that an unwritten convention is to use (1 << (nr)) to define BPF-related flags. Fixes: 87f5fc7e48dd ("bpf: Provide helper to do forwarding lookups in kernel FIB table") Signed-off-by: Martynas Pumputis Acked-by: Andrii Nakryiko Signed-off-by: Daniel Borkmann --- include/uapi/linux/bpf.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index e4114a7e4451..a8b823c30b43 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -3378,8 +3378,8 @@ struct bpf_raw_tracepoint_args { /* DIRECT: Skip the FIB rules and go to FIB table associated with device * OUTPUT: Do lookup from egress perspective; default is ingress */ -#define BPF_FIB_LOOKUP_DIRECT BIT(0) -#define BPF_FIB_LOOKUP_OUTPUT BIT(1) +#define BPF_FIB_LOOKUP_DIRECT (1U << 0) +#define BPF_FIB_LOOKUP_OUTPUT (1U << 1) enum { BPF_FIB_LKUP_RET_SUCCESS, /* lookup successful */ From 0e265747491cb182429adfbef7d002164566f25c Mon Sep 17 00:00:00 2001 From: Martynas Pumputis Date: Wed, 12 Jun 2019 18:05:41 +0200 Subject: [PATCH 099/145] bpf: sync BPF_FIB_LOOKUP flag changes with BPF uapi Sync the changes to the flags made in "bpf: simplify definition of BPF_FIB_LOOKUP related flags" with the BPF UAPI headers. Doing in a separate commit to ease syncing of github/libbpf. Signed-off-by: Martynas Pumputis Acked-by: Andrii Nakryiko Signed-off-by: Daniel Borkmann --- tools/include/uapi/linux/bpf.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index e4114a7e4451..a8b823c30b43 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -3378,8 +3378,8 @@ struct bpf_raw_tracepoint_args { /* DIRECT: Skip the FIB rules and go to FIB table associated with device * OUTPUT: Do lookup from egress perspective; default is ingress */ -#define BPF_FIB_LOOKUP_DIRECT BIT(0) -#define BPF_FIB_LOOKUP_OUTPUT BIT(1) +#define BPF_FIB_LOOKUP_DIRECT (1U << 0) +#define BPF_FIB_LOOKUP_OUTPUT (1U << 1) enum { BPF_FIB_LKUP_RET_SUCCESS, /* lookup successful */ From 3e0682695199bad51dd898fe064d1564637ff77a Mon Sep 17 00:00:00 2001 From: "Naveen N. Rao" Date: Thu, 13 Jun 2019 00:21:39 +0530 Subject: [PATCH 100/145] bpf: fix div64 overflow tests to properly detect errors If the result of the division is LLONG_MIN, current tests do not detect the error since the return value is truncated to a 32-bit value and ends up being 0. Signed-off-by: Naveen N. Rao Signed-off-by: Daniel Borkmann --- .../testing/selftests/bpf/verifier/div_overflow.c | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/tools/testing/selftests/bpf/verifier/div_overflow.c b/tools/testing/selftests/bpf/verifier/div_overflow.c index bd3f38dbe796..acab4f00819f 100644 --- a/tools/testing/selftests/bpf/verifier/div_overflow.c +++ b/tools/testing/selftests/bpf/verifier/div_overflow.c @@ -29,8 +29,11 @@ "DIV64 overflow, check 1", .insns = { BPF_MOV64_IMM(BPF_REG_1, -1), - BPF_LD_IMM64(BPF_REG_0, LLONG_MIN), - BPF_ALU64_REG(BPF_DIV, BPF_REG_0, BPF_REG_1), + BPF_LD_IMM64(BPF_REG_2, LLONG_MIN), + BPF_ALU64_REG(BPF_DIV, BPF_REG_2, BPF_REG_1), + BPF_MOV32_IMM(BPF_REG_0, 0), + BPF_JMP_REG(BPF_JEQ, BPF_REG_0, BPF_REG_2, 1), + BPF_MOV32_IMM(BPF_REG_0, 1), BPF_EXIT_INSN(), }, .prog_type = BPF_PROG_TYPE_SCHED_CLS, @@ -40,8 +43,11 @@ { "DIV64 overflow, check 2", .insns = { - BPF_LD_IMM64(BPF_REG_0, LLONG_MIN), - BPF_ALU64_IMM(BPF_DIV, BPF_REG_0, -1), + BPF_LD_IMM64(BPF_REG_1, LLONG_MIN), + BPF_ALU64_IMM(BPF_DIV, BPF_REG_1, -1), + BPF_MOV32_IMM(BPF_REG_0, 0), + BPF_JMP_REG(BPF_JEQ, BPF_REG_0, BPF_REG_1, 1), + BPF_MOV32_IMM(BPF_REG_0, 1), BPF_EXIT_INSN(), }, .prog_type = BPF_PROG_TYPE_SCHED_CLS, From 758f2046ea040773ae8ea7f72dd3bbd8fa984501 Mon Sep 17 00:00:00 2001 From: "Naveen N. Rao" Date: Thu, 13 Jun 2019 00:21:40 +0530 Subject: [PATCH 101/145] powerpc/bpf: use unsigned division instruction for 64-bit operations BPF_ALU64 div/mod operations are currently using signed division, unlike BPF_ALU32 operations. Fix the same. DIV64 and MOD64 overflow tests pass with this fix. Fixes: 156d0e290e969c ("powerpc/ebpf/jit: Implement JIT compiler for extended BPF") Cc: stable@vger.kernel.org # v4.8+ Signed-off-by: Naveen N. Rao Signed-off-by: Daniel Borkmann --- arch/powerpc/include/asm/ppc-opcode.h | 1 + arch/powerpc/net/bpf_jit.h | 2 +- arch/powerpc/net/bpf_jit_comp64.c | 8 ++++---- 3 files changed, 6 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h index 493c5c943acd..2291daf39cd1 100644 --- a/arch/powerpc/include/asm/ppc-opcode.h +++ b/arch/powerpc/include/asm/ppc-opcode.h @@ -338,6 +338,7 @@ #define PPC_INST_MADDLD 0x10000033 #define PPC_INST_DIVWU 0x7c000396 #define PPC_INST_DIVD 0x7c0003d2 +#define PPC_INST_DIVDU 0x7c000392 #define PPC_INST_RLWINM 0x54000000 #define PPC_INST_RLWINM_DOT 0x54000001 #define PPC_INST_RLWIMI 0x50000000 diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h index dcac37745b05..1e932898d430 100644 --- a/arch/powerpc/net/bpf_jit.h +++ b/arch/powerpc/net/bpf_jit.h @@ -116,7 +116,7 @@ ___PPC_RA(a) | IMM_L(i)) #define PPC_DIVWU(d, a, b) EMIT(PPC_INST_DIVWU | ___PPC_RT(d) | \ ___PPC_RA(a) | ___PPC_RB(b)) -#define PPC_DIVD(d, a, b) EMIT(PPC_INST_DIVD | ___PPC_RT(d) | \ +#define PPC_DIVDU(d, a, b) EMIT(PPC_INST_DIVDU | ___PPC_RT(d) | \ ___PPC_RA(a) | ___PPC_RB(b)) #define PPC_AND(d, a, b) EMIT(PPC_INST_AND | ___PPC_RA(d) | \ ___PPC_RS(a) | ___PPC_RB(b)) diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c index 21a1dcd4b156..e3fedeffe40f 100644 --- a/arch/powerpc/net/bpf_jit_comp64.c +++ b/arch/powerpc/net/bpf_jit_comp64.c @@ -399,12 +399,12 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, case BPF_ALU64 | BPF_DIV | BPF_X: /* dst /= src */ case BPF_ALU64 | BPF_MOD | BPF_X: /* dst %= src */ if (BPF_OP(code) == BPF_MOD) { - PPC_DIVD(b2p[TMP_REG_1], dst_reg, src_reg); + PPC_DIVDU(b2p[TMP_REG_1], dst_reg, src_reg); PPC_MULD(b2p[TMP_REG_1], src_reg, b2p[TMP_REG_1]); PPC_SUB(dst_reg, dst_reg, b2p[TMP_REG_1]); } else - PPC_DIVD(dst_reg, dst_reg, src_reg); + PPC_DIVDU(dst_reg, dst_reg, src_reg); break; case BPF_ALU | BPF_MOD | BPF_K: /* (u32) dst %= (u32) imm */ case BPF_ALU | BPF_DIV | BPF_K: /* (u32) dst /= (u32) imm */ @@ -432,7 +432,7 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, break; case BPF_ALU64: if (BPF_OP(code) == BPF_MOD) { - PPC_DIVD(b2p[TMP_REG_2], dst_reg, + PPC_DIVDU(b2p[TMP_REG_2], dst_reg, b2p[TMP_REG_1]); PPC_MULD(b2p[TMP_REG_1], b2p[TMP_REG_1], @@ -440,7 +440,7 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, PPC_SUB(dst_reg, dst_reg, b2p[TMP_REG_1]); } else - PPC_DIVD(dst_reg, dst_reg, + PPC_DIVDU(dst_reg, dst_reg, b2p[TMP_REG_1]); break; } From 588f7d39b3592a36fb7702ae3b8bdd9be4621e2f Mon Sep 17 00:00:00 2001 From: Johannes Berg Date: Wed, 13 Feb 2019 15:13:30 +0100 Subject: [PATCH 102/145] mac80211: drop robust management frames from unknown TA When receiving a robust management frame, drop it if we don't have rx->sta since then we don't have a security association and thus couldn't possibly validate the frame. Cc: stable@vger.kernel.org Signed-off-by: Johannes Berg --- net/mac80211/rx.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c index 25577ede2986..fd3740000e87 100644 --- a/net/mac80211/rx.c +++ b/net/mac80211/rx.c @@ -3831,6 +3831,8 @@ static bool ieee80211_accept_frame(struct ieee80211_rx_data *rx) case NL80211_IFTYPE_STATION: if (!bssid && !sdata->u.mgd.use_4addr) return false; + if (ieee80211_is_robust_mgmt_frame(skb) && !rx->sta) + return false; if (multicast) return true; return ether_addr_equal(sdata->vif.addr, hdr->addr1); From 563572340173865a9a356e6bb02579e6998a876d Mon Sep 17 00:00:00 2001 From: Yibo Zhao Date: Fri, 14 Jun 2019 19:01:52 +0800 Subject: [PATCH 103/145] mac80211: only warn once on chanctx_conf being NULL In multiple SSID cases, it takes time to prepare every AP interface to be ready in initializing phase. If a sta already knows everything it needs to join one of the APs and sends authentication to the AP which is not fully prepared at this point of time, AP's channel context could be NULL. As a result, warning message occurs. Even worse, if the AP is under attack via tools such as MDK3 and massive authentication requests are received in a very short time, console will be hung due to kernel warning messages. WARN_ON_ONCE() could be a better way for indicating warning messages without duplicate messages to flood the console. Johannes: We still need to address the underlying problem, but we don't really have a good handle on it yet. Suppress the worst side-effects for now. Signed-off-by: Zhi Chen Signed-off-by: Yibo Zhao [johannes: add note, change subject] Signed-off-by: Johannes Berg --- net/mac80211/ieee80211_i.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h index a8af4aafa117..682d0ab1bf89 100644 --- a/net/mac80211/ieee80211_i.h +++ b/net/mac80211/ieee80211_i.h @@ -1435,7 +1435,7 @@ ieee80211_get_sband(struct ieee80211_sub_if_data *sdata) rcu_read_lock(); chanctx_conf = rcu_dereference(sdata->vif.chanctx_conf); - if (WARN_ON(!chanctx_conf)) { + if (WARN_ON_ONCE(!chanctx_conf)) { rcu_read_unlock(); return NULL; } From ebb3ca3b4477bbc118976c77fe1913507df718ec Mon Sep 17 00:00:00 2001 From: Luca Coelho Date: Wed, 29 May 2019 15:25:29 +0300 Subject: [PATCH 104/145] cfg80211: use BIT_ULL in cfg80211_parse_mbssid_data() The seen_indices variable is u64 and in other parts of the code we assume mbssid_index_ie[2] can be up to 45, so we should use the 64-bit versions of BIT, namely, BIT_ULL(). Reported-by: Dan Carpented Signed-off-by: Luca Coelho Signed-off-by: Johannes Berg --- net/wireless/scan.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/wireless/scan.c b/net/wireless/scan.c index c04f5451f89b..aa571d727903 100644 --- a/net/wireless/scan.c +++ b/net/wireless/scan.c @@ -1601,12 +1601,12 @@ static void cfg80211_parse_mbssid_data(struct wiphy *wiphy, continue; } - if (seen_indices & BIT(mbssid_index_ie[2])) + if (seen_indices & BIT_ULL(mbssid_index_ie[2])) /* We don't support legacy split of a profile */ net_dbg_ratelimited("Partial info for BSSID index %d\n", mbssid_index_ie[2]); - seen_indices |= BIT(mbssid_index_ie[2]); + seen_indices |= BIT_ULL(mbssid_index_ie[2]); non_tx_data->bssid_index = mbssid_index_ie[2]; non_tx_data->max_bssid_indicator = elem->data[0]; From f8891461a277ec0afc493fd30cd975a38048a038 Mon Sep 17 00:00:00 2001 From: Naftali Goldstein Date: Wed, 29 May 2019 15:25:30 +0300 Subject: [PATCH 105/145] mac80211: do not start any work during reconfigure flow It is not a good idea to try to perform any work (e.g. send an auth frame) during reconfigure flow. Prevent this from happening, and at the end of the reconfigure flow requeue all the works. Signed-off-by: Naftali Goldstein Signed-off-by: Luca Coelho Signed-off-by: Johannes Berg --- net/mac80211/ieee80211_i.h | 7 +++++++ net/mac80211/util.c | 4 ++++ 2 files changed, 11 insertions(+) diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h index 682d0ab1bf89..a86fcae279a6 100644 --- a/net/mac80211/ieee80211_i.h +++ b/net/mac80211/ieee80211_i.h @@ -2037,6 +2037,13 @@ void __ieee80211_flush_queues(struct ieee80211_local *local, static inline bool ieee80211_can_run_worker(struct ieee80211_local *local) { + /* + * It's unsafe to try to do any work during reconfigure flow. + * When the flow ends the work will be requeued. + */ + if (local->in_reconfig) + return false; + /* * If quiescing is set, we are racing with __ieee80211_suspend. * __ieee80211_suspend flushes the workers after setting quiescing, diff --git a/net/mac80211/util.c b/net/mac80211/util.c index 1c8384f81526..e2edc2a3cc8b 100644 --- a/net/mac80211/util.c +++ b/net/mac80211/util.c @@ -2480,6 +2480,10 @@ int ieee80211_reconfig(struct ieee80211_local *local) mutex_lock(&local->mtx); ieee80211_start_next_roc(local); mutex_unlock(&local->mtx); + + /* Requeue all works */ + list_for_each_entry(sdata, &local->interfaces, list) + ieee80211_queue_work(&local->hw, &sdata->work); } ieee80211_wake_queues_by_reason(hw, IEEE80211_MAX_QUEUE_MAP, From 1a473d6092d5d182914bea854ce0b21e6d12519d Mon Sep 17 00:00:00 2001 From: Mordechay Goodstein Date: Wed, 29 May 2019 15:25:31 +0300 Subject: [PATCH 106/145] cfg80211: util: fix bit count off by one The bits of Rx MCS Map in VHT capability were enumerated with index transform - index i -> (i + 1) bit => nss i. BUG! while it should be - index i -> (i + 1) bit => (i + 1) nss. The bug was exposed in commit a53b2a0b1245 ("iwlwifi: mvm: implement VHT extended NSS support in rs.c"), where iwlwifi started using the function. Signed-off-by: Mordechay Goodstein Fixes: b0aa75f0b1b2 ("ieee80211: add new VHT capability fields/parsing") Signed-off-by: Luca Coelho Signed-off-by: Johannes Berg --- net/wireless/util.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/wireless/util.c b/net/wireless/util.c index b9d8ceb21327..1c39d6a2e850 100644 --- a/net/wireless/util.c +++ b/net/wireless/util.c @@ -1998,7 +1998,7 @@ int ieee80211_get_vht_max_nss(struct ieee80211_vht_cap *cap, continue; if (supp >= mcs_encoding) { - max_vht_nss = i; + max_vht_nss = i + 1; break; } } From 4f488fbca2a86cc7714a128952eead92cac279ab Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Mon, 10 Jun 2019 13:02:19 -0700 Subject: [PATCH 107/145] cfg80211: fix memory leak of wiphy device name In wiphy_new_nm(), if an error occurs after dev_set_name() and device_initialize() have already been called, it's necessary to call put_device() (via wiphy_free()) to avoid a memory leak. Reported-by: syzbot+7fddca22578bc67c3fe4@syzkaller.appspotmail.com Fixes: 1f87f7d3a3b4 ("cfg80211: add rfkill support") Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers Signed-off-by: Johannes Berg --- net/wireless/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/wireless/core.c b/net/wireless/core.c index 4e83892f1ac2..c58acca09301 100644 --- a/net/wireless/core.c +++ b/net/wireless/core.c @@ -513,7 +513,7 @@ use_default_name: &rdev->rfkill_ops, rdev); if (!rdev->rfkill) { - kfree(rdev); + wiphy_free(&rdev->wiphy); return NULL; } From b65842025335711e2a0259feb4dbadb0c9ffb6d9 Mon Sep 17 00:00:00 2001 From: Avraham Stern Date: Wed, 29 May 2019 15:25:28 +0300 Subject: [PATCH 108/145] cfg80211: report measurement start TSF correctly Instead of reporting the AP's TSF, host time was reported. Fix it. Signed-off-by: Avraham Stern Signed-off-by: Luca Coelho Signed-off-by: Johannes Berg --- net/wireless/pmsr.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/wireless/pmsr.c b/net/wireless/pmsr.c index 1b190475359a..c09fbf09549d 100644 --- a/net/wireless/pmsr.c +++ b/net/wireless/pmsr.c @@ -1,6 +1,6 @@ /* SPDX-License-Identifier: GPL-2.0 */ /* - * Copyright (C) 2018 Intel Corporation + * Copyright (C) 2018 - 2019 Intel Corporation */ #ifndef __PMSR_H #define __PMSR_H @@ -448,7 +448,7 @@ static int nl80211_pmsr_send_result(struct sk_buff *msg, if (res->ap_tsf_valid && nla_put_u64_64bit(msg, NL80211_PMSR_RESP_ATTR_AP_TSF, - res->host_time, NL80211_PMSR_RESP_ATTR_PAD)) + res->ap_tsf, NL80211_PMSR_RESP_ATTR_PAD)) goto error; if (res->final && nla_put_flag(msg, NL80211_PMSR_RESP_ATTR_FINAL)) From 385097a3675749cbc9e97c085c0e5dfe4269ca51 Mon Sep 17 00:00:00 2001 From: Young Xiao <92siuyang@gmail.com> Date: Fri, 14 Jun 2019 15:13:02 +0800 Subject: [PATCH 109/145] nfc: Ensure presence of required attributes in the deactivate_target handler Check that the NFC_ATTR_TARGET_INDEX attributes (in addition to NFC_ATTR_DEVICE_INDEX) are provided by the netlink client prior to accessing them. This prevents potential unhandled NULL pointer dereference exceptions which can be triggered by malicious user-mode programs, if they omit one or both of these attributes. Signed-off-by: Young Xiao <92siuyang@gmail.com> Signed-off-by: David S. Miller --- net/nfc/netlink.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/nfc/netlink.c b/net/nfc/netlink.c index 1180b3e58a0a..ea64c90b14e8 100644 --- a/net/nfc/netlink.c +++ b/net/nfc/netlink.c @@ -911,7 +911,8 @@ static int nfc_genl_deactivate_target(struct sk_buff *skb, u32 device_idx, target_idx; int rc; - if (!info->attrs[NFC_ATTR_DEVICE_INDEX]) + if (!info->attrs[NFC_ATTR_DEVICE_INDEX] || + !info->attrs[NFC_ATTR_TARGET_INDEX]) return -EINVAL; device_idx = nla_get_u32(info->attrs[NFC_ATTR_DEVICE_INDEX]); From 4add700968c7761acba88e70a0aa3f44e5ad359d Mon Sep 17 00:00:00 2001 From: Russell King - ARM Linux admin Date: Fri, 14 Jun 2019 11:37:49 +0100 Subject: [PATCH 110/145] net: phylink: further mac_config documentation improvements While reviewing the DPAA2 work, it has become apparent that we need better documentation about which members of the phylink link state structure are valid in the mac_config call. Improve this documentation. Signed-off-by: Russell King Signed-off-by: David S. Miller --- include/linux/phylink.h | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/include/linux/phylink.h b/include/linux/phylink.h index 6411c624f63a..2d2e55dfea94 100644 --- a/include/linux/phylink.h +++ b/include/linux/phylink.h @@ -123,11 +123,20 @@ int mac_link_state(struct net_device *ndev, * @mode: one of %MLO_AN_FIXED, %MLO_AN_PHY, %MLO_AN_INBAND. * @state: a pointer to a &struct phylink_link_state. * + * Note - not all members of @state are valid. In particular, + * @state->lp_advertising, @state->link, @state->an_complete are never + * guaranteed to be correct, and so any mac_config() implementation must + * never reference these fields. + * * The action performed depends on the currently selected mode: * * %MLO_AN_FIXED, %MLO_AN_PHY: * Configure the specified @state->speed, @state->duplex and - * @state->pause (%MLO_PAUSE_TX / %MLO_PAUSE_RX) mode. + * @state->pause (%MLO_PAUSE_TX / %MLO_PAUSE_RX) modes over a link + * specified by @state->interface. @state->advertising may be used, + * but is not required. Other members of @state must be ignored. + * + * Valid state members: interface, speed, duplex, pause, advertising. * * %MLO_AN_INBAND: * place the link in an inband negotiation mode (such as 802.3z @@ -150,6 +159,8 @@ int mac_link_state(struct net_device *ndev, * responsible for reading the configuration word and configuring * itself accordingly. * + * Valid state members: interface, an_enabled, pause, advertising. + * * Implementations are expected to update the MAC to reflect the * requested settings - i.o.w., if nothing has changed between two * calls, no action is expected. If only flow control settings have From d4dd153d551634683fccf8881f606fa9f3dfa1ef Mon Sep 17 00:00:00 2001 From: Toshiaki Makita Date: Fri, 14 Jun 2019 17:20:13 +0900 Subject: [PATCH 111/145] bpf, devmap: Fix premature entry free on destroying map dev_map_free() waits for flush_needed bitmap to be empty in order to ensure all flush operations have completed before freeing its entries. However the corresponding clear_bit() was called before using the entries, so the entries could be used after free. All access to the entries needs to be done before clearing the bit. It seems commit a5e2da6e9787 ("bpf: netdev is never null in __dev_map_flush") accidentally changed the clear_bit() and memory access order. Note that the problem happens only in __dev_map_flush(), not in dev_map_flush_old(). dev_map_flush_old() is called only after nulling out the corresponding netdev_map entry, so dev_map_free() never frees the entry thus no such race happens there. Fixes: a5e2da6e9787 ("bpf: netdev is never null in __dev_map_flush") Signed-off-by: Toshiaki Makita Signed-off-by: Daniel Borkmann --- kernel/bpf/devmap.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c index 1e525d70f833..e001fb1a96b1 100644 --- a/kernel/bpf/devmap.c +++ b/kernel/bpf/devmap.c @@ -291,10 +291,10 @@ void __dev_map_flush(struct bpf_map *map) if (unlikely(!dev)) continue; - __clear_bit(bit, bitmap); - bq = this_cpu_ptr(dev->bulkq); bq_xmit_all(dev, bq, XDP_XMIT_FLUSH, true); + + __clear_bit(bit, bitmap); } } From edabf4d9dd905acd60048ea1579943801e3a4876 Mon Sep 17 00:00:00 2001 From: Toshiaki Makita Date: Fri, 14 Jun 2019 17:20:14 +0900 Subject: [PATCH 112/145] bpf, devmap: Add missing bulk queue free dev_map_free() forgot to free bulk queue when freeing its entries. Fixes: 5d053f9da431 ("bpf: devmap prepare xdp frames for bulking") Signed-off-by: Toshiaki Makita Acked-by: Jesper Dangaard Brouer Signed-off-by: Daniel Borkmann --- kernel/bpf/devmap.c | 1 + 1 file changed, 1 insertion(+) diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c index e001fb1a96b1..a126d95d12de 100644 --- a/kernel/bpf/devmap.c +++ b/kernel/bpf/devmap.c @@ -186,6 +186,7 @@ static void dev_map_free(struct bpf_map *map) if (!dev) continue; + free_percpu(dev->bulkq); dev_put(dev->dev); kfree(dev); } From 86723c8640633bee4b4588d3c7784ee7a0032f65 Mon Sep 17 00:00:00 2001 From: Toshiaki Makita Date: Fri, 14 Jun 2019 17:20:15 +0900 Subject: [PATCH 113/145] bpf, devmap: Add missing RCU read lock on flush .ndo_xdp_xmit() assumes it is called under RCU. For example virtio_net uses RCU to detect it has setup the resources for tx. The assumption accidentally broke when introducing bulk queue in devmap. Fixes: 5d053f9da431 ("bpf: devmap prepare xdp frames for bulking") Reported-by: David Ahern Signed-off-by: Toshiaki Makita Signed-off-by: Daniel Borkmann --- kernel/bpf/devmap.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c index a126d95d12de..1defea4b2755 100644 --- a/kernel/bpf/devmap.c +++ b/kernel/bpf/devmap.c @@ -282,6 +282,7 @@ void __dev_map_flush(struct bpf_map *map) unsigned long *bitmap = this_cpu_ptr(dtab->flush_needed); u32 bit; + rcu_read_lock(); for_each_set_bit(bit, bitmap, map->max_entries) { struct bpf_dtab_netdev *dev = READ_ONCE(dtab->netdev_map[bit]); struct xdp_bulk_queue *bq; @@ -297,6 +298,7 @@ void __dev_map_flush(struct bpf_map *map) __clear_bit(bit, bitmap); } + rcu_read_unlock(); } /* rcu_read_lock (from syscall and BPF contexts) ensures that if a delete and/or @@ -389,6 +391,7 @@ static void dev_map_flush_old(struct bpf_dtab_netdev *dev) int cpu; + rcu_read_lock(); for_each_online_cpu(cpu) { bitmap = per_cpu_ptr(dev->dtab->flush_needed, cpu); __clear_bit(dev->bit, bitmap); @@ -396,6 +399,7 @@ static void dev_map_flush_old(struct bpf_dtab_netdev *dev) bq = per_cpu_ptr(dev->bulkq, cpu); bq_xmit_all(dev, bq, XDP_XMIT_FLUSH, false); } + rcu_read_unlock(); } } From fe8d9571dc50232b569242fac7ea6332a654f186 Mon Sep 17 00:00:00 2001 From: Alexei Starovoitov Date: Fri, 14 Jun 2019 15:43:28 -0700 Subject: [PATCH 114/145] bpf, x64: fix stack layout of JITed bpf code Since commit 177366bf7ceb the %rbp stopped pointing to %rbp of the previous stack frame. That broke frame pointer based stack unwinding. This commit is a partial revert of it. Note that the location of tail_call_cnt is fixed, since the verifier enforces MAX_BPF_STACK stack size for programs with tail calls. Fixes: 177366bf7ceb ("bpf: change x86 JITed program stack layout") Signed-off-by: Alexei Starovoitov --- arch/x86/net/bpf_jit_comp.c | 74 +++++++++++-------------------------- 1 file changed, 21 insertions(+), 53 deletions(-) diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c index afabf597c855..d88bc0935886 100644 --- a/arch/x86/net/bpf_jit_comp.c +++ b/arch/x86/net/bpf_jit_comp.c @@ -190,9 +190,7 @@ struct jit_context { #define BPF_MAX_INSN_SIZE 128 #define BPF_INSN_SAFETY 64 -#define AUX_STACK_SPACE 40 /* Space for RBX, R13, R14, R15, tailcnt */ - -#define PROLOGUE_SIZE 37 +#define PROLOGUE_SIZE 20 /* * Emit x86-64 prologue code for BPF program and check its size. @@ -203,44 +201,19 @@ static void emit_prologue(u8 **pprog, u32 stack_depth, bool ebpf_from_cbpf) u8 *prog = *pprog; int cnt = 0; - /* push rbp */ - EMIT1(0x55); - - /* mov rbp,rsp */ - EMIT3(0x48, 0x89, 0xE5); - - /* sub rsp, rounded_stack_depth + AUX_STACK_SPACE */ - EMIT3_off32(0x48, 0x81, 0xEC, - round_up(stack_depth, 8) + AUX_STACK_SPACE); - - /* sub rbp, AUX_STACK_SPACE */ - EMIT4(0x48, 0x83, 0xED, AUX_STACK_SPACE); - - /* mov qword ptr [rbp+0],rbx */ - EMIT4(0x48, 0x89, 0x5D, 0); - /* mov qword ptr [rbp+8],r13 */ - EMIT4(0x4C, 0x89, 0x6D, 8); - /* mov qword ptr [rbp+16],r14 */ - EMIT4(0x4C, 0x89, 0x75, 16); - /* mov qword ptr [rbp+24],r15 */ - EMIT4(0x4C, 0x89, 0x7D, 24); - + EMIT1(0x55); /* push rbp */ + EMIT3(0x48, 0x89, 0xE5); /* mov rbp, rsp */ + /* sub rsp, rounded_stack_depth */ + EMIT3_off32(0x48, 0x81, 0xEC, round_up(stack_depth, 8)); + EMIT1(0x53); /* push rbx */ + EMIT2(0x41, 0x55); /* push r13 */ + EMIT2(0x41, 0x56); /* push r14 */ + EMIT2(0x41, 0x57); /* push r15 */ if (!ebpf_from_cbpf) { - /* - * Clear the tail call counter (tail_call_cnt): for eBPF tail - * calls we need to reset the counter to 0. It's done in two - * instructions, resetting RAX register to 0, and moving it - * to the counter location. - */ - - /* xor eax, eax */ - EMIT2(0x31, 0xc0); - /* mov qword ptr [rbp+32], rax */ - EMIT4(0x48, 0x89, 0x45, 32); - + /* zero init tail_call_cnt */ + EMIT2(0x6a, 0x00); BUILD_BUG_ON(cnt != PROLOGUE_SIZE); } - *pprog = prog; } @@ -285,13 +258,13 @@ static void emit_bpf_tail_call(u8 **pprog) * if (tail_call_cnt > MAX_TAIL_CALL_CNT) * goto out; */ - EMIT2_off32(0x8B, 0x85, 36); /* mov eax, dword ptr [rbp + 36] */ + EMIT2_off32(0x8B, 0x85, -36 - MAX_BPF_STACK); /* mov eax, dword ptr [rbp - 548] */ EMIT3(0x83, 0xF8, MAX_TAIL_CALL_CNT); /* cmp eax, MAX_TAIL_CALL_CNT */ #define OFFSET2 (30 + RETPOLINE_RAX_BPF_JIT_SIZE) EMIT2(X86_JA, OFFSET2); /* ja out */ label2 = cnt; EMIT3(0x83, 0xC0, 0x01); /* add eax, 1 */ - EMIT2_off32(0x89, 0x85, 36); /* mov dword ptr [rbp + 36], eax */ + EMIT2_off32(0x89, 0x85, -36 - MAX_BPF_STACK); /* mov dword ptr [rbp -548], eax */ /* prog = array->ptrs[index]; */ EMIT4_off32(0x48, 0x8B, 0x84, 0xD6, /* mov rax, [rsi + rdx * 8 + offsetof(...)] */ @@ -1040,19 +1013,14 @@ emit_jmp: seen_exit = true; /* Update cleanup_addr */ ctx->cleanup_addr = proglen; - /* mov rbx, qword ptr [rbp+0] */ - EMIT4(0x48, 0x8B, 0x5D, 0); - /* mov r13, qword ptr [rbp+8] */ - EMIT4(0x4C, 0x8B, 0x6D, 8); - /* mov r14, qword ptr [rbp+16] */ - EMIT4(0x4C, 0x8B, 0x75, 16); - /* mov r15, qword ptr [rbp+24] */ - EMIT4(0x4C, 0x8B, 0x7D, 24); - - /* add rbp, AUX_STACK_SPACE */ - EMIT4(0x48, 0x83, 0xC5, AUX_STACK_SPACE); - EMIT1(0xC9); /* leave */ - EMIT1(0xC3); /* ret */ + if (!bpf_prog_was_classic(bpf_prog)) + EMIT1(0x5B); /* get rid of tail_call_cnt */ + EMIT2(0x41, 0x5F); /* pop r15 */ + EMIT2(0x41, 0x5E); /* pop r14 */ + EMIT2(0x41, 0x5D); /* pop r13 */ + EMIT1(0x5B); /* pop rbx */ + EMIT1(0xC9); /* leave */ + EMIT1(0xC3); /* ret */ break; default: From 61356088ace1866a847a727d4d40da7bf00b67fc Mon Sep 17 00:00:00 2001 From: Reinhard Speyerer Date: Wed, 12 Jun 2019 19:02:13 +0200 Subject: [PATCH 115/145] qmi_wwan: add support for QMAP padding in the RX path The QMAP code in the qmi_wwan driver is based on the CodeAurora GobiNet driver which does not process QMAP padding in the RX path correctly. Add support for QMAP padding to qmimux_rx_fixup() according to the description of the rmnet driver. Fixes: c6adf77953bc ("net: usb: qmi_wwan: add qmap mux protocol support") Cc: Daniele Palmas Signed-off-by: Reinhard Speyerer Signed-off-by: David S. Miller --- drivers/net/usb/qmi_wwan.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c index d9a6699abe59..fd3d078a1923 100644 --- a/drivers/net/usb/qmi_wwan.c +++ b/drivers/net/usb/qmi_wwan.c @@ -153,7 +153,7 @@ static bool qmimux_has_slaves(struct usbnet *dev) static int qmimux_rx_fixup(struct usbnet *dev, struct sk_buff *skb) { - unsigned int len, offset = 0; + unsigned int len, offset = 0, pad_len, pkt_len; struct qmimux_hdr *hdr; struct net_device *net; struct sk_buff *skbn; @@ -171,10 +171,16 @@ static int qmimux_rx_fixup(struct usbnet *dev, struct sk_buff *skb) if (hdr->pad & 0x80) goto skip; + /* extract padding length and check for valid length info */ + pad_len = hdr->pad & 0x3f; + if (len == 0 || pad_len >= len) + goto skip; + pkt_len = len - pad_len; + net = qmimux_find_dev(dev, hdr->mux_id); if (!net) goto skip; - skbn = netdev_alloc_skb(net, len); + skbn = netdev_alloc_skb(net, pkt_len); if (!skbn) return 0; skbn->dev = net; @@ -191,7 +197,7 @@ static int qmimux_rx_fixup(struct usbnet *dev, struct sk_buff *skb) goto skip; } - skb_put_data(skbn, skb->data + offset + qmimux_hdr_sz, len); + skb_put_data(skbn, skb->data + offset + qmimux_hdr_sz, pkt_len); if (netif_rx(skbn) != NET_RX_SUCCESS) return 0; From 44f82312fe9113bab6642f4d0eab6b1b7902b6e1 Mon Sep 17 00:00:00 2001 From: Reinhard Speyerer Date: Wed, 12 Jun 2019 19:02:46 +0200 Subject: [PATCH 116/145] qmi_wwan: add network device usage statistics for qmimux devices Add proper network device usage statistics for qmimux devices instead of reporting all-zero values for them. Fixes: c6adf77953bc ("net: usb: qmi_wwan: add qmap mux protocol support") Cc: Daniele Palmas Signed-off-by: Reinhard Speyerer Signed-off-by: David S. Miller --- drivers/net/usb/qmi_wwan.c | 76 +++++++++++++++++++++++++++++++++++--- 1 file changed, 71 insertions(+), 5 deletions(-) diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c index fd3d078a1923..b0a96459621f 100644 --- a/drivers/net/usb/qmi_wwan.c +++ b/drivers/net/usb/qmi_wwan.c @@ -22,6 +22,7 @@ #include #include #include +#include /* This driver supports wwan (3G/LTE/?) devices using a vendor * specific management protocol called Qualcomm MSM Interface (QMI) - @@ -75,6 +76,7 @@ struct qmimux_hdr { struct qmimux_priv { struct net_device *real_dev; u8 mux_id; + struct pcpu_sw_netstats __percpu *stats64; }; static int qmimux_open(struct net_device *dev) @@ -101,19 +103,65 @@ static netdev_tx_t qmimux_start_xmit(struct sk_buff *skb, struct net_device *dev struct qmimux_priv *priv = netdev_priv(dev); unsigned int len = skb->len; struct qmimux_hdr *hdr; + netdev_tx_t ret; hdr = skb_push(skb, sizeof(struct qmimux_hdr)); hdr->pad = 0; hdr->mux_id = priv->mux_id; hdr->pkt_len = cpu_to_be16(len); skb->dev = priv->real_dev; - return dev_queue_xmit(skb); + ret = dev_queue_xmit(skb); + + if (likely(ret == NET_XMIT_SUCCESS || ret == NET_XMIT_CN)) { + struct pcpu_sw_netstats *stats64 = this_cpu_ptr(priv->stats64); + + u64_stats_update_begin(&stats64->syncp); + stats64->tx_packets++; + stats64->tx_bytes += len; + u64_stats_update_end(&stats64->syncp); + } else { + dev->stats.tx_dropped++; + } + + return ret; +} + +static void qmimux_get_stats64(struct net_device *net, + struct rtnl_link_stats64 *stats) +{ + struct qmimux_priv *priv = netdev_priv(net); + unsigned int start; + int cpu; + + netdev_stats_to_stats64(stats, &net->stats); + + for_each_possible_cpu(cpu) { + struct pcpu_sw_netstats *stats64; + u64 rx_packets, rx_bytes; + u64 tx_packets, tx_bytes; + + stats64 = per_cpu_ptr(priv->stats64, cpu); + + do { + start = u64_stats_fetch_begin_irq(&stats64->syncp); + rx_packets = stats64->rx_packets; + rx_bytes = stats64->rx_bytes; + tx_packets = stats64->tx_packets; + tx_bytes = stats64->tx_bytes; + } while (u64_stats_fetch_retry_irq(&stats64->syncp, start)); + + stats->rx_packets += rx_packets; + stats->rx_bytes += rx_bytes; + stats->tx_packets += tx_packets; + stats->tx_bytes += tx_bytes; + } } static const struct net_device_ops qmimux_netdev_ops = { - .ndo_open = qmimux_open, - .ndo_stop = qmimux_stop, - .ndo_start_xmit = qmimux_start_xmit, + .ndo_open = qmimux_open, + .ndo_stop = qmimux_stop, + .ndo_start_xmit = qmimux_start_xmit, + .ndo_get_stats64 = qmimux_get_stats64, }; static void qmimux_setup(struct net_device *dev) @@ -198,8 +246,19 @@ static int qmimux_rx_fixup(struct usbnet *dev, struct sk_buff *skb) } skb_put_data(skbn, skb->data + offset + qmimux_hdr_sz, pkt_len); - if (netif_rx(skbn) != NET_RX_SUCCESS) + if (netif_rx(skbn) != NET_RX_SUCCESS) { + net->stats.rx_errors++; return 0; + } else { + struct pcpu_sw_netstats *stats64; + struct qmimux_priv *priv = netdev_priv(net); + + stats64 = this_cpu_ptr(priv->stats64); + u64_stats_update_begin(&stats64->syncp); + stats64->rx_packets++; + stats64->rx_bytes += pkt_len; + u64_stats_update_end(&stats64->syncp); + } skip: offset += len + qmimux_hdr_sz; @@ -223,6 +282,12 @@ static int qmimux_register_device(struct net_device *real_dev, u8 mux_id) priv->mux_id = mux_id; priv->real_dev = real_dev; + priv->stats64 = netdev_alloc_pcpu_stats(struct pcpu_sw_netstats); + if (!priv->stats64) { + err = -ENOBUFS; + goto out_free_newdev; + } + err = register_netdevice(new_dev); if (err < 0) goto out_free_newdev; @@ -252,6 +317,7 @@ static void qmimux_unregister_device(struct net_device *dev) struct qmimux_priv *priv = netdev_priv(dev); struct net_device *real_dev = priv->real_dev; + free_percpu(priv->stats64); netdev_upper_dev_unlink(real_dev, dev); unregister_netdevice(dev); From a8fdde1cb830e560208af42b6c10750137f53eb3 Mon Sep 17 00:00:00 2001 From: Reinhard Speyerer Date: Wed, 12 Jun 2019 19:03:15 +0200 Subject: [PATCH 117/145] qmi_wwan: avoid RCU stalls on device disconnect when in QMAP mode Switch qmimux_unregister_device() and qmi_wwan_disconnect() to use unregister_netdevice_queue() and unregister_netdevice_many() instead of unregister_netdevice(). This avoids RCU stalls which have been observed on device disconnect in certain setups otherwise. Fixes: c6adf77953bc ("net: usb: qmi_wwan: add qmap mux protocol support") Cc: Daniele Palmas Signed-off-by: Reinhard Speyerer Signed-off-by: David S. Miller --- drivers/net/usb/qmi_wwan.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c index b0a96459621f..c6fbc2a2a785 100644 --- a/drivers/net/usb/qmi_wwan.c +++ b/drivers/net/usb/qmi_wwan.c @@ -312,14 +312,15 @@ out_free_newdev: return err; } -static void qmimux_unregister_device(struct net_device *dev) +static void qmimux_unregister_device(struct net_device *dev, + struct list_head *head) { struct qmimux_priv *priv = netdev_priv(dev); struct net_device *real_dev = priv->real_dev; free_percpu(priv->stats64); netdev_upper_dev_unlink(real_dev, dev); - unregister_netdevice(dev); + unregister_netdevice_queue(dev, head); /* Get rid of the reference to real_dev */ dev_put(real_dev); @@ -490,7 +491,7 @@ static ssize_t del_mux_store(struct device *d, struct device_attribute *attr, c ret = -EINVAL; goto err; } - qmimux_unregister_device(del_dev); + qmimux_unregister_device(del_dev, NULL); if (!qmimux_has_slaves(dev)) info->flags &= ~QMI_WWAN_FLAG_MUX; @@ -1500,6 +1501,7 @@ static void qmi_wwan_disconnect(struct usb_interface *intf) struct qmi_wwan_state *info; struct list_head *iter; struct net_device *ldev; + LIST_HEAD(list); /* called twice if separate control and data intf */ if (!dev) @@ -1512,8 +1514,9 @@ static void qmi_wwan_disconnect(struct usb_interface *intf) } rcu_read_lock(); netdev_for_each_upper_dev_rcu(dev->net, ldev, iter) - qmimux_unregister_device(ldev); + qmimux_unregister_device(ldev, &list); rcu_read_unlock(); + unregister_netdevice_many(&list); rtnl_unlock(); info->flags &= ~QMI_WWAN_FLAG_MUX; } From 36815b416fa48766ac5a98e4b2dc3ebc5887222e Mon Sep 17 00:00:00 2001 From: Reinhard Speyerer Date: Wed, 12 Jun 2019 19:03:50 +0200 Subject: [PATCH 118/145] qmi_wwan: extend permitted QMAP mux_id value range Permit mux_id values up to 254 to be used in qmimux_register_device() for compatibility with ip(8) and the rmnet driver. Fixes: c6adf77953bc ("net: usb: qmi_wwan: add qmap mux protocol support") Cc: Daniele Palmas Signed-off-by: Reinhard Speyerer Signed-off-by: David S. Miller --- Documentation/ABI/testing/sysfs-class-net-qmi | 4 ++-- drivers/net/usb/qmi_wwan.c | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/Documentation/ABI/testing/sysfs-class-net-qmi b/Documentation/ABI/testing/sysfs-class-net-qmi index 7122d6264c49..c310db4ccbc2 100644 --- a/Documentation/ABI/testing/sysfs-class-net-qmi +++ b/Documentation/ABI/testing/sysfs-class-net-qmi @@ -29,7 +29,7 @@ Contact: Bjørn Mork Description: Unsigned integer. - Write a number ranging from 1 to 127 to add a qmap mux + Write a number ranging from 1 to 254 to add a qmap mux based network device, supported by recent Qualcomm based modems. @@ -46,5 +46,5 @@ Contact: Bjørn Mork Description: Unsigned integer. - Write a number ranging from 1 to 127 to delete a previously + Write a number ranging from 1 to 254 to delete a previously created qmap mux based network device. diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c index c6fbc2a2a785..780c10ee359b 100644 --- a/drivers/net/usb/qmi_wwan.c +++ b/drivers/net/usb/qmi_wwan.c @@ -429,8 +429,8 @@ static ssize_t add_mux_store(struct device *d, struct device_attribute *attr, c if (kstrtou8(buf, 0, &mux_id)) return -EINVAL; - /* mux_id [1 - 0x7f] range empirically found */ - if (mux_id < 1 || mux_id > 0x7f) + /* mux_id [1 - 254] for compatibility with ip(8) and the rmnet driver */ + if (mux_id < 1 || mux_id > 254) return -EINVAL; if (!rtnl_trylock()) From 6bb9e376c2a4cc5120c3bf5fd3048b9a0a6ec1f8 Mon Sep 17 00:00:00 2001 From: Robert Hancock Date: Wed, 12 Jun 2019 14:33:32 -0600 Subject: [PATCH 119/145] net: dsa: microchip: Don't try to read stats for unused ports If some of the switch ports were not listed in the device tree, due to being unused, the ksz_mib_read_work function ended up accessing a NULL dp->slave pointer and causing an oops. Skip checking statistics for any unused ports. Fixes: 7c6ff470aa867f53 ("net: dsa: microchip: add MIB counter reading support") Signed-off-by: Robert Hancock Reviewed-by: Vivien Didelot Reviewed-by: Andrew Lunn Reviewed-by: Florian Fainelli Signed-off-by: David S. Miller --- drivers/net/dsa/microchip/ksz_common.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/net/dsa/microchip/ksz_common.c b/drivers/net/dsa/microchip/ksz_common.c index 39dace8e3512..f46086fa9064 100644 --- a/drivers/net/dsa/microchip/ksz_common.c +++ b/drivers/net/dsa/microchip/ksz_common.c @@ -83,6 +83,9 @@ static void ksz_mib_read_work(struct work_struct *work) int i; for (i = 0; i < dev->mib_port_cnt; i++) { + if (dsa_is_unused_port(dev->ds, i)) + continue; + p = &dev->ports[i]; mib = &p->mib; mutex_lock(&mib->cnt_mutex); From ce950f1050cece5e406a5cde723c69bba60e1b26 Mon Sep 17 00:00:00 2001 From: Neil Horman Date: Thu, 13 Jun 2019 06:35:59 -0400 Subject: [PATCH 120/145] sctp: Free cookie before we memdup a new one Based on comments from Xin, even after fixes for our recent syzbot report of cookie memory leaks, its possible to get a resend of an INIT chunk which would lead to us leaking cookie memory. To ensure that we don't leak cookie memory, free any previously allocated cookie first. Change notes v1->v2 update subsystem tag in subject (davem) repeat kfree check for peer_random and peer_hmacs (xin) v2->v3 net->sctp also free peer_chunks v3->v4 fix subject tags v4->v5 remove cut line Signed-off-by: Neil Horman Reported-by: syzbot+f7e9153b037eac9b1df8@syzkaller.appspotmail.com CC: Marcelo Ricardo Leitner CC: Xin Long CC: "David S. Miller" CC: netdev@vger.kernel.org Acked-by: Marcelo Ricardo Leitner Signed-off-by: David S. Miller --- net/sctp/sm_make_chunk.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c index f17908f5c4f3..9b0e5b0d701a 100644 --- a/net/sctp/sm_make_chunk.c +++ b/net/sctp/sm_make_chunk.c @@ -2583,6 +2583,8 @@ do_addr_param: case SCTP_PARAM_STATE_COOKIE: asoc->peer.cookie_len = ntohs(param.p->length) - sizeof(struct sctp_paramhdr); + if (asoc->peer.cookie) + kfree(asoc->peer.cookie); asoc->peer.cookie = kmemdup(param.cookie->body, asoc->peer.cookie_len, gfp); if (!asoc->peer.cookie) retval = 0; @@ -2647,6 +2649,8 @@ do_addr_param: goto fall_through; /* Save peer's random parameter */ + if (asoc->peer.peer_random) + kfree(asoc->peer.peer_random); asoc->peer.peer_random = kmemdup(param.p, ntohs(param.p->length), gfp); if (!asoc->peer.peer_random) { @@ -2660,6 +2664,8 @@ do_addr_param: goto fall_through; /* Save peer's HMAC list */ + if (asoc->peer.peer_hmacs) + kfree(asoc->peer.peer_hmacs); asoc->peer.peer_hmacs = kmemdup(param.p, ntohs(param.p->length), gfp); if (!asoc->peer.peer_hmacs) { @@ -2675,6 +2681,8 @@ do_addr_param: if (!ep->auth_enable) goto fall_through; + if (asoc->peer.peer_chunks) + kfree(asoc->peer.peer_chunks); asoc->peer.peer_chunks = kmemdup(param.p, ntohs(param.p->length), gfp); if (!asoc->peer.peer_chunks) From f0c03ee0ec664e07e0ec1ead7091cbe53f0f321c Mon Sep 17 00:00:00 2001 From: Anders Roxell Date: Thu, 13 Jun 2019 13:35:03 +0200 Subject: [PATCH 121/145] net: dsa: fix warning same module names When building with CONFIG_NET_DSA_REALTEK_SMI and CONFIG_REALTEK_PHY enabled as loadable modules, we see the following warning: warning: same module names found: drivers/net/phy/realtek.ko drivers/net/dsa/realtek.ko Rework so the driver name is realtek-smi instead of realtek. Reviewed-by: Linus Walleij Reviewed-by: Andrew Lunn Signed-off-by: Anders Roxell Signed-off-by: David S. Miller --- drivers/net/dsa/Makefile | 4 ++-- drivers/net/dsa/{realtek-smi.c => realtek-smi-core.c} | 2 +- drivers/net/dsa/{realtek-smi.h => realtek-smi-core.h} | 0 drivers/net/dsa/rtl8366.c | 2 +- drivers/net/dsa/rtl8366rb.c | 2 +- 5 files changed, 5 insertions(+), 5 deletions(-) rename drivers/net/dsa/{realtek-smi.c => realtek-smi-core.c} (99%) rename drivers/net/dsa/{realtek-smi.h => realtek-smi-core.h} (100%) diff --git a/drivers/net/dsa/Makefile b/drivers/net/dsa/Makefile index fefb6aaa82ba..d99dc6de0006 100644 --- a/drivers/net/dsa/Makefile +++ b/drivers/net/dsa/Makefile @@ -9,8 +9,8 @@ obj-$(CONFIG_NET_DSA_LANTIQ_GSWIP) += lantiq_gswip.o obj-$(CONFIG_NET_DSA_MT7530) += mt7530.o obj-$(CONFIG_NET_DSA_MV88E6060) += mv88e6060.o obj-$(CONFIG_NET_DSA_QCA8K) += qca8k.o -obj-$(CONFIG_NET_DSA_REALTEK_SMI) += realtek.o -realtek-objs := realtek-smi.o rtl8366.o rtl8366rb.o +obj-$(CONFIG_NET_DSA_REALTEK_SMI) += realtek-smi.o +realtek-smi-objs := realtek-smi-core.o rtl8366.o rtl8366rb.o obj-$(CONFIG_NET_DSA_SMSC_LAN9303) += lan9303-core.o obj-$(CONFIG_NET_DSA_SMSC_LAN9303_I2C) += lan9303_i2c.o obj-$(CONFIG_NET_DSA_SMSC_LAN9303_MDIO) += lan9303_mdio.o diff --git a/drivers/net/dsa/realtek-smi.c b/drivers/net/dsa/realtek-smi-core.c similarity index 99% rename from drivers/net/dsa/realtek-smi.c rename to drivers/net/dsa/realtek-smi-core.c index ad41ec63cc9f..dc0509c02d29 100644 --- a/drivers/net/dsa/realtek-smi.c +++ b/drivers/net/dsa/realtek-smi-core.c @@ -40,7 +40,7 @@ #include #include -#include "realtek-smi.h" +#include "realtek-smi-core.h" #define REALTEK_SMI_ACK_RETRY_COUNT 5 #define REALTEK_SMI_HW_STOP_DELAY 25 /* msecs */ diff --git a/drivers/net/dsa/realtek-smi.h b/drivers/net/dsa/realtek-smi-core.h similarity index 100% rename from drivers/net/dsa/realtek-smi.h rename to drivers/net/dsa/realtek-smi-core.h diff --git a/drivers/net/dsa/rtl8366.c b/drivers/net/dsa/rtl8366.c index 6dedd43442cc..fe5d976b0b10 100644 --- a/drivers/net/dsa/rtl8366.c +++ b/drivers/net/dsa/rtl8366.c @@ -11,7 +11,7 @@ #include #include -#include "realtek-smi.h" +#include "realtek-smi-core.h" int rtl8366_mc_is_used(struct realtek_smi *smi, int mc_index, int *used) { diff --git a/drivers/net/dsa/rtl8366rb.c b/drivers/net/dsa/rtl8366rb.c index 40b3974970c6..a268085ffad2 100644 --- a/drivers/net/dsa/rtl8366rb.c +++ b/drivers/net/dsa/rtl8366rb.c @@ -20,7 +20,7 @@ #include #include -#include "realtek-smi.h" +#include "realtek-smi-core.h" #define RTL8366RB_PORT_NUM_CPU 5 #define RTL8366RB_NUM_PORTS 6 From 99815f5031db36414178f45e3009fb5f0e219dd4 Mon Sep 17 00:00:00 2001 From: Vlad Buslov Date: Thu, 13 Jun 2019 17:54:04 +0300 Subject: [PATCH 122/145] net: sched: flower: don't call synchronize_rcu() on mask creation Current flower mask creating code assumes that temporary mask that is used when inserting new filter is stack allocated. To prevent race condition with data patch synchronize_rcu() is called every time fl_create_new_mask() replaces temporary stack allocated mask. As reported by Jiri, this increases runtime of creating 20000 flower classifiers from 4 seconds to 163 seconds. However, this design is no longer necessary since temporary mask was converted to be dynamically allocated by commit 2cddd2014782 ("net/sched: cls_flower: allocate mask dynamically in fl_change()"). Remove synchronize_rcu() calls from mask creation code. Instead, refactor fl_change() to always deallocate temporary mask with rcu grace period. Fixes: 195c234d15c9 ("net: sched: flower: handle concurrent mask insertion") Reported-by: Jiri Pirko Signed-off-by: Vlad Buslov Tested-by: Jiri Pirko Acked-by: Jiri Pirko Signed-off-by: David S. Miller --- net/sched/cls_flower.c | 34 ++++++++++++++++++---------------- 1 file changed, 18 insertions(+), 16 deletions(-) diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c index c388372df0e2..eedd5786c084 100644 --- a/net/sched/cls_flower.c +++ b/net/sched/cls_flower.c @@ -320,10 +320,13 @@ static int fl_init(struct tcf_proto *tp) return rhashtable_init(&head->ht, &mask_ht_params); } -static void fl_mask_free(struct fl_flow_mask *mask) +static void fl_mask_free(struct fl_flow_mask *mask, bool mask_init_done) { - WARN_ON(!list_empty(&mask->filters)); - rhashtable_destroy(&mask->ht); + /* temporary masks don't have their filters list and ht initialized */ + if (mask_init_done) { + WARN_ON(!list_empty(&mask->filters)); + rhashtable_destroy(&mask->ht); + } kfree(mask); } @@ -332,7 +335,15 @@ static void fl_mask_free_work(struct work_struct *work) struct fl_flow_mask *mask = container_of(to_rcu_work(work), struct fl_flow_mask, rwork); - fl_mask_free(mask); + fl_mask_free(mask, true); +} + +static void fl_uninit_mask_free_work(struct work_struct *work) +{ + struct fl_flow_mask *mask = container_of(to_rcu_work(work), + struct fl_flow_mask, rwork); + + fl_mask_free(mask, false); } static bool fl_mask_put(struct cls_fl_head *head, struct fl_flow_mask *mask) @@ -1346,9 +1357,6 @@ static struct fl_flow_mask *fl_create_new_mask(struct cls_fl_head *head, if (err) goto errout_destroy; - /* Wait until any potential concurrent users of mask are finished */ - synchronize_rcu(); - spin_lock(&head->masks_lock); list_add_tail_rcu(&newmask->list, &head->masks); spin_unlock(&head->masks_lock); @@ -1375,11 +1383,7 @@ static int fl_check_assign_mask(struct cls_fl_head *head, /* Insert mask as temporary node to prevent concurrent creation of mask * with same key. Any concurrent lookups with same key will return - * -EAGAIN because mask's refcnt is zero. It is safe to insert - * stack-allocated 'mask' to masks hash table because we call - * synchronize_rcu() before returning from this function (either in case - * of error or after replacing it with heap-allocated mask in - * fl_create_new_mask()). + * -EAGAIN because mask's refcnt is zero. */ fnew->mask = rhashtable_lookup_get_insert_fast(&head->ht, &mask->ht_node, @@ -1414,8 +1418,6 @@ static int fl_check_assign_mask(struct cls_fl_head *head, errout_cleanup: rhashtable_remove_fast(&head->ht, &mask->ht_node, mask_ht_params); - /* Wait until any potential concurrent users of mask are finished */ - synchronize_rcu(); return ret; } @@ -1644,7 +1646,7 @@ static int fl_change(struct net *net, struct sk_buff *in_skb, *arg = fnew; kfree(tb); - kfree(mask); + tcf_queue_work(&mask->rwork, fl_uninit_mask_free_work); return 0; errout_ht: @@ -1664,7 +1666,7 @@ errout: errout_tb: kfree(tb); errout_mask_alloc: - kfree(mask); + tcf_queue_work(&mask->rwork, fl_uninit_mask_free_work); errout_fold: if (fold) __fl_put(fold); From 9a33629ba6b26caebd73e3c581ba1e6068c696a7 Mon Sep 17 00:00:00 2001 From: Haiyang Zhang Date: Thu, 13 Jun 2019 21:06:53 +0000 Subject: [PATCH 123/145] hv_netvsc: Set probe mode to sync For better consistency of synthetic NIC names, we set the probe mode to PROBE_FORCE_SYNCHRONOUS. So the names can be aligned with the vmbus channel offer sequence. Fixes: af0a5646cb8d ("use the new async probing feature for the hyperv drivers") Signed-off-by: Haiyang Zhang Signed-off-by: David S. Miller --- drivers/net/hyperv/netvsc_drv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c index 03ea5a7ed3a4..afdcc5664ea6 100644 --- a/drivers/net/hyperv/netvsc_drv.c +++ b/drivers/net/hyperv/netvsc_drv.c @@ -2407,7 +2407,7 @@ static struct hv_driver netvsc_drv = { .probe = netvsc_probe, .remove = netvsc_remove, .driver = { - .probe_type = PROBE_PREFER_ASYNCHRONOUS, + .probe_type = PROBE_FORCE_SYNCHRONOUS, }, }; From a8e11e5c5611a9f70470aebeb2c1dd6132f609d7 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 14 Jun 2019 16:22:18 -0700 Subject: [PATCH 124/145] sysctl: define proc_do_static_key() Convert proc_dointvec_minmax_bpf_stats() into a more generic helper, since we are going to use jump labels more often. Note that sysctl_bpf_stats_enabled is removed, since it is no longer needed/used. Signed-off-by: Eric Dumazet Acked-by: Alexei Starovoitov Signed-off-by: David S. Miller --- include/linux/bpf.h | 1 - include/linux/sysctl.h | 3 +++ kernel/bpf/core.c | 1 - kernel/sysctl.c | 44 ++++++++++++++++++++++-------------------- 4 files changed, 26 insertions(+), 23 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 5df8e9e2a393..b92ef9f73e42 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -600,7 +600,6 @@ void bpf_map_area_free(void *base); void bpf_map_init_from_attr(struct bpf_map *map, union bpf_attr *attr); extern int sysctl_unprivileged_bpf_disabled; -extern int sysctl_bpf_stats_enabled; int bpf_map_new_fd(struct bpf_map *map, int flags); int bpf_prog_new_fd(struct bpf_prog *prog); diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h index b769ecfcc3bd..aadd310769d0 100644 --- a/include/linux/sysctl.h +++ b/include/linux/sysctl.h @@ -63,6 +63,9 @@ extern int proc_doulongvec_ms_jiffies_minmax(struct ctl_table *table, int, void __user *, size_t *, loff_t *); extern int proc_do_large_bitmap(struct ctl_table *, int, void __user *, size_t *, loff_t *); +extern int proc_do_static_key(struct ctl_table *table, int write, + void __user *buffer, size_t *lenp, + loff_t *ppos); /* * Register a set of sysctl names by calling register_sysctl_table diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index 7c473f208a10..080e2bb644cc 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -2097,7 +2097,6 @@ int __weak skb_copy_bits(const struct sk_buff *skb, int offset, void *to, DEFINE_STATIC_KEY_FALSE(bpf_stats_enabled_key); EXPORT_SYMBOL(bpf_stats_enabled_key); -int sysctl_bpf_stats_enabled __read_mostly; /* All definitions of tracepoints related to BPF. */ #define CREATE_TRACE_POINTS diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 7d1008be6173..1beca96fb625 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -230,11 +230,6 @@ static int proc_dostring_coredump(struct ctl_table *table, int write, #endif static int proc_dopipe_max_size(struct ctl_table *table, int write, void __user *buffer, size_t *lenp, loff_t *ppos); -#ifdef CONFIG_BPF_SYSCALL -static int proc_dointvec_minmax_bpf_stats(struct ctl_table *table, int write, - void __user *buffer, size_t *lenp, - loff_t *ppos); -#endif #ifdef CONFIG_MAGIC_SYSRQ /* Note: sysrq code uses its own private copy */ @@ -1253,12 +1248,10 @@ static struct ctl_table kern_table[] = { }, { .procname = "bpf_stats_enabled", - .data = &sysctl_bpf_stats_enabled, - .maxlen = sizeof(sysctl_bpf_stats_enabled), + .data = &bpf_stats_enabled_key.key, + .maxlen = sizeof(bpf_stats_enabled_key), .mode = 0644, - .proc_handler = proc_dointvec_minmax_bpf_stats, - .extra1 = &zero, - .extra2 = &one, + .proc_handler = proc_do_static_key, }, #endif #if defined(CONFIG_TREE_RCU) || defined(CONFIG_PREEMPT_RCU) @@ -3374,26 +3367,35 @@ int proc_do_large_bitmap(struct ctl_table *table, int write, #endif /* CONFIG_PROC_SYSCTL */ -#if defined(CONFIG_BPF_SYSCALL) && defined(CONFIG_SYSCTL) -static int proc_dointvec_minmax_bpf_stats(struct ctl_table *table, int write, - void __user *buffer, size_t *lenp, - loff_t *ppos) +#if defined(CONFIG_SYSCTL) +int proc_do_static_key(struct ctl_table *table, int write, + void __user *buffer, size_t *lenp, + loff_t *ppos) { - int ret, bpf_stats = *(int *)table->data; - struct ctl_table tmp = *table; + struct static_key *key = (struct static_key *)table->data; + static DEFINE_MUTEX(static_key_mutex); + int val, ret; + struct ctl_table tmp = { + .data = &val, + .maxlen = sizeof(val), + .mode = table->mode, + .extra1 = &zero, + .extra2 = &one, + }; if (write && !capable(CAP_SYS_ADMIN)) return -EPERM; - tmp.data = &bpf_stats; + mutex_lock(&static_key_mutex); + val = static_key_enabled(key); ret = proc_dointvec_minmax(&tmp, write, buffer, lenp, ppos); if (write && !ret) { - *(int *)table->data = bpf_stats; - if (bpf_stats) - static_branch_enable(&bpf_stats_enabled_key); + if (val) + static_key_enable(key); else - static_branch_disable(&bpf_stats_enabled_key); + static_key_disable(key); } + mutex_unlock(&static_key_mutex); return ret; } #endif From ede61ca474a0348b975d9824565b66c7595461de Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 14 Jun 2019 16:22:19 -0700 Subject: [PATCH 125/145] tcp: add tcp_rx_skb_cache sysctl Instead of relying on rps_needed, it is safer to use a separate static key, since we do not want to enable TCP rx_skb_cache by default. This feature can cause huge increase of memory usage on hosts with millions of sockets. Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller --- Documentation/networking/ip-sysctl.txt | 8 ++++++++ include/net/sock.h | 6 ++---- net/ipv4/sysctl_net_ipv4.c | 9 +++++++++ 3 files changed, 19 insertions(+), 4 deletions(-) diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index 14fe93049d28..288aa264ac26 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -772,6 +772,14 @@ tcp_challenge_ack_limit - INTEGER in RFC 5961 (Improving TCP's Robustness to Blind In-Window Attacks) Default: 100 +tcp_rx_skb_cache - BOOLEAN + Controls a per TCP socket cache of one skb, that might help + performance of some workloads. This might be dangerous + on systems with a lot of TCP sockets, since it increases + memory usage. + + Default: 0 (disabled) + UDP variables: udp_l3mdev_accept - BOOLEAN diff --git a/include/net/sock.h b/include/net/sock.h index e9d769c04637..b02645e2dfad 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -2433,13 +2433,11 @@ static inline void skb_setup_tx_timestamp(struct sk_buff *skb, __u16 tsflags) * This routine must be called with interrupts disabled or with the socket * locked so that the sk_buff queue operation is ok. */ +DECLARE_STATIC_KEY_FALSE(tcp_rx_skb_cache_key); static inline void sk_eat_skb(struct sock *sk, struct sk_buff *skb) { __skb_unlink(skb, &sk->sk_receive_queue); - if ( -#ifdef CONFIG_RPS - !static_branch_unlikely(&rps_needed) && -#endif + if (static_branch_unlikely(&tcp_rx_skb_cache_key) && !sk->sk_rx_skb_cache) { sk->sk_rx_skb_cache = skb; skb_orphan(skb); diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c index 875867b64d6a..886b58d31351 100644 --- a/net/ipv4/sysctl_net_ipv4.c +++ b/net/ipv4/sysctl_net_ipv4.c @@ -51,6 +51,9 @@ static int comp_sack_nr_max = 255; static u32 u32_max_div_HZ = UINT_MAX / HZ; static int one_day_secs = 24 * 3600; +DEFINE_STATIC_KEY_FALSE(tcp_rx_skb_cache_key); +EXPORT_SYMBOL(tcp_rx_skb_cache_key); + /* obsolete */ static int sysctl_tcp_low_latency __read_mostly; @@ -559,6 +562,12 @@ static struct ctl_table ipv4_table[] = { .extra1 = &sysctl_fib_sync_mem_min, .extra2 = &sysctl_fib_sync_mem_max, }, + { + .procname = "tcp_rx_skb_cache", + .data = &tcp_rx_skb_cache_key.key, + .mode = 0644, + .proc_handler = proc_do_static_key, + }, { } }; From 0b7d7f6b22084a3156f267c85303908a8f4c9a08 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 14 Jun 2019 16:22:20 -0700 Subject: [PATCH 126/145] tcp: add tcp_tx_skb_cache sysctl Feng Tang reported a performance regression after introduction of per TCP socket tx/rx caches, for TCP over loopback (netperf) There is high chance the regression is caused by a change on how well the 32 KB per-thread page (current->task_frag) can be recycled, and lack of pcp caches for order-3 pages. I could not reproduce the regression myself, cpus all being spinning on the mm spinlocks for page allocs/freeing, regardless of enabling or disabling the per tcp socket caches. It seems best to disable the feature by default, and let admins enabling it. MM layer either needs to provide scalable order-3 pages allocations, or could attempt a trylock on zone->lock if the caller only attempts to get a high-order page and is able to fallback to order-0 ones in case of pressure. Tests run on a 56 cores host (112 hyper threads) - 35.49% netperf [kernel.vmlinux] [k] queued_spin_lock_slowpath - 35.49% queued_spin_lock_slowpath - 18.18% get_page_from_freelist - __alloc_pages_nodemask - 18.18% alloc_pages_current skb_page_frag_refill sk_page_frag_refill tcp_sendmsg_locked tcp_sendmsg inet_sendmsg sock_sendmsg __sys_sendto __x64_sys_sendto do_syscall_64 entry_SYSCALL_64_after_hwframe __libc_send + 17.31% __free_pages_ok + 31.43% swapper [kernel.vmlinux] [k] intel_idle + 9.12% netperf [kernel.vmlinux] [k] copy_user_enhanced_fast_string + 6.53% netserver [kernel.vmlinux] [k] copy_user_enhanced_fast_string + 0.69% netserver [kernel.vmlinux] [k] queued_spin_lock_slowpath + 0.68% netperf [kernel.vmlinux] [k] skb_release_data + 0.52% netperf [kernel.vmlinux] [k] tcp_sendmsg_locked 0.46% netperf [kernel.vmlinux] [k] _raw_spin_lock_irqsave Fixes: 472c2e07eef0 ("tcp: add one skb cache for tx") Signed-off-by: Eric Dumazet Reported-by: Feng Tang Signed-off-by: David S. Miller --- include/net/sock.h | 4 +++- net/ipv4/sysctl_net_ipv4.c | 8 ++++++++ 2 files changed, 11 insertions(+), 1 deletion(-) diff --git a/include/net/sock.h b/include/net/sock.h index b02645e2dfad..7d7f4ce63bb2 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1463,12 +1463,14 @@ static inline void sk_mem_uncharge(struct sock *sk, int size) __sk_mem_reclaim(sk, 1 << 20); } +DECLARE_STATIC_KEY_FALSE(tcp_tx_skb_cache_key); static inline void sk_wmem_free_skb(struct sock *sk, struct sk_buff *skb) { sock_set_flag(sk, SOCK_QUEUE_SHRUNK); sk->sk_wmem_queued -= skb->truesize; sk_mem_uncharge(sk, skb->truesize); - if (!sk->sk_tx_skb_cache && !skb_cloned(skb)) { + if (static_branch_unlikely(&tcp_tx_skb_cache_key) && + !sk->sk_tx_skb_cache && !skb_cloned(skb)) { skb_zcopy_clear(skb, true); sk->sk_tx_skb_cache = skb; return; diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c index 886b58d31351..08a428a7b274 100644 --- a/net/ipv4/sysctl_net_ipv4.c +++ b/net/ipv4/sysctl_net_ipv4.c @@ -54,6 +54,8 @@ static int one_day_secs = 24 * 3600; DEFINE_STATIC_KEY_FALSE(tcp_rx_skb_cache_key); EXPORT_SYMBOL(tcp_rx_skb_cache_key); +DEFINE_STATIC_KEY_FALSE(tcp_tx_skb_cache_key); + /* obsolete */ static int sysctl_tcp_low_latency __read_mostly; @@ -568,6 +570,12 @@ static struct ctl_table ipv4_table[] = { .mode = 0644, .proc_handler = proc_do_static_key, }, + { + .procname = "tcp_tx_skb_cache", + .data = &tcp_tx_skb_cache_key.key, + .mode = 0644, + .proc_handler = proc_do_static_key, + }, { } }; From ce27ec60648d8e066227cb2f58b1d3d4f7253d08 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 14 Jun 2019 16:22:21 -0700 Subject: [PATCH 127/145] net: add high_order_alloc_disable sysctl/static key >From linux-3.7, (commit 5640f7685831 "net: use a per task frag allocator") TCP sendmsg() has preferred using order-3 allocations. While it gives good results for most cases, we had reports that heavy uses of TCP over loopback were hitting a spinlock contention in page allocations/freeing. This commits adds a sysctl so that admins can opt-in for order-0 allocations. Hopefully mm layer might optimize order-3 allocations in the future since it could give us a nice boost (see 8 lines of following benchmark) The following benchmark shows a win when more than 8 TCP_STREAM threads are running (56 x86 cores server in my tests) for thr in {1..30} do sysctl -wq net.core.high_order_alloc_disable=0 T0=`./super_netperf $thr -H 127.0.0.1 -l 15` sysctl -wq net.core.high_order_alloc_disable=1 T1=`./super_netperf $thr -H 127.0.0.1 -l 15` echo $thr:$T0:$T1 done 1: 49979: 37267 2: 98745: 76286 3: 141088: 110051 4: 177414: 144772 5: 197587: 173563 6: 215377: 208448 7: 241061: 234087 8: 267155: 263373 9: 295069: 297402 10: 312393: 335213 11: 340462: 368778 12: 371366: 403954 13: 412344: 443713 14: 426617: 473580 15: 474418: 507861 16: 503261: 538539 17: 522331: 563096 18: 532409: 567084 19: 550824: 605240 20: 525493: 641988 21: 564574: 665843 22: 567349: 690868 23: 583846: 710917 24: 588715: 736306 25: 603212: 763494 26: 604083: 792654 27: 602241: 796450 28: 604291: 797993 29: 611610: 833249 30: 577356: 841062 Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller --- include/net/sock.h | 2 ++ net/core/sock.c | 4 +++- net/core/sysctl_net_core.c | 7 +++++++ 3 files changed, 12 insertions(+), 1 deletion(-) diff --git a/include/net/sock.h b/include/net/sock.h index 7d7f4ce63bb2..6cbc16136357 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -2534,6 +2534,8 @@ extern int sysctl_optmem_max; extern __u32 sysctl_wmem_default; extern __u32 sysctl_rmem_default; +DECLARE_STATIC_KEY_FALSE(net_high_order_alloc_disable_key); + static inline int sk_get_wmem0(const struct sock *sk, const struct proto *proto) { /* Does this proto have per netns sysctl_wmem ? */ diff --git a/net/core/sock.c b/net/core/sock.c index 2b3701958486..7f49392579a5 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -2320,6 +2320,7 @@ static void sk_leave_memory_pressure(struct sock *sk) /* On 32bit arches, an skb frag is limited to 2^15 */ #define SKB_FRAG_PAGE_ORDER get_order(32768) +DEFINE_STATIC_KEY_FALSE(net_high_order_alloc_disable_key); /** * skb_page_frag_refill - check that a page_frag contains enough room @@ -2344,7 +2345,8 @@ bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t gfp) } pfrag->offset = 0; - if (SKB_FRAG_PAGE_ORDER) { + if (SKB_FRAG_PAGE_ORDER && + !static_branch_unlikely(&net_high_order_alloc_disable_key)) { /* Avoid direct reclaim but allow kswapd to wake */ pfrag->page = alloc_pages((gfp & ~__GFP_DIRECT_RECLAIM) | __GFP_COMP | __GFP_NOWARN | diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c index 1a2685694abd..f9204719aeee 100644 --- a/net/core/sysctl_net_core.c +++ b/net/core/sysctl_net_core.c @@ -562,6 +562,13 @@ static struct ctl_table net_core_table[] = { .extra1 = &zero, .extra2 = &two, }, + { + .procname = "high_order_alloc_disable", + .data = &net_high_order_alloc_disable_key.key, + .maxlen = sizeof(net_high_order_alloc_disable_key), + .mode = 0644, + .proc_handler = proc_do_static_key, + }, { } }; From ef7bfa84725d891bbdb88707ed55b2cbf94942bb Mon Sep 17 00:00:00 2001 From: Ioana Ciornei Date: Thu, 13 Jun 2019 09:37:51 +0300 Subject: [PATCH 128/145] net: phylink: set the autoneg state in phylink_phy_change The phy_state field of phylink should carry only valid information especially when this can be passed to the .mac_config callback. Update the an_enabled field with the autoneg state in the phylink_phy_change function. Fixes: 9525ae83959b ("phylink: add phylink infrastructure") Signed-off-by: Ioana Ciornei Signed-off-by: David S. Miller --- drivers/net/phy/phylink.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c index 4c0616ba314d..c638e13fbf81 100644 --- a/drivers/net/phy/phylink.c +++ b/drivers/net/phy/phylink.c @@ -635,6 +635,7 @@ static void phylink_phy_change(struct phy_device *phydev, bool up, pl->phy_state.pause |= MLO_PAUSE_ASYM; pl->phy_state.interface = phydev->interface; pl->phy_state.link = up; + pl->phy_state.an_enabled = phydev->autoneg; mutex_unlock(&pl->state_mutex); phylink_run_resolve(pl); From 760c80b70bed2cd01630e8595d1bbde910339f31 Mon Sep 17 00:00:00 2001 From: Linus Walleij Date: Fri, 14 Jun 2019 00:25:20 +0200 Subject: [PATCH 129/145] net: dsa: rtl8366: Fix up VLAN filtering We get this regression when using RTL8366RB as part of a bridge with OpenWrt: WARNING: CPU: 0 PID: 1347 at net/switchdev/switchdev.c:291 switchdev_port_attr_set_now+0x80/0xa4 lan0: Commit of attribute (id=7) failed. (...) realtek-smi switch lan0: failed to initialize vlan filtering on this port This is because it is trying to disable VLAN filtering on VLAN0, as we have forgot to add 1 to the port number to get the right VLAN in rtl8366_vlan_filtering(): when we initialize the VLAN we associate VLAN1 with port 0, VLAN2 with port 1 etc, so we need to add 1 to the port offset. Fixes: d8652956cf37 ("net: dsa: realtek-smi: Add Realtek SMI driver") Signed-off-by: Linus Walleij Signed-off-by: David S. Miller --- drivers/net/dsa/rtl8366.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/net/dsa/rtl8366.c b/drivers/net/dsa/rtl8366.c index fe5d976b0b10..ca3d17e43ed8 100644 --- a/drivers/net/dsa/rtl8366.c +++ b/drivers/net/dsa/rtl8366.c @@ -307,7 +307,8 @@ int rtl8366_vlan_filtering(struct dsa_switch *ds, int port, bool vlan_filtering) struct rtl8366_vlan_4k vlan4k; int ret; - if (!smi->ops->is_vlan_valid(smi, port)) + /* Use VLAN nr port + 1 since VLAN0 is not valid */ + if (!smi->ops->is_vlan_valid(smi, port + 1)) return -EINVAL; dev_info(smi->dev, "%s filtering on port %d\n", @@ -318,12 +319,12 @@ int rtl8366_vlan_filtering(struct dsa_switch *ds, int port, bool vlan_filtering) * The hardware support filter ID (FID) 0..7, I have no clue how to * support this in the driver when the callback only says on/off. */ - ret = smi->ops->get_vlan_4k(smi, port, &vlan4k); + ret = smi->ops->get_vlan_4k(smi, port + 1, &vlan4k); if (ret) return ret; /* Just set the filter to FID 1 for now then */ - ret = rtl8366_set_vlan(smi, port, + ret = rtl8366_set_vlan(smi, port + 1, vlan4k.member, vlan4k.untag, 1); From 42f5cda5eaf4396a939ae9bb43bb8d1d09c1b15c Mon Sep 17 00:00:00 2001 From: Stephen Barber Date: Fri, 14 Jun 2019 23:42:37 -0700 Subject: [PATCH 130/145] vsock/virtio: set SOCK_DONE on peer shutdown Set the SOCK_DONE flag to match the TCP_CLOSING state when a peer has shut down and there is nothing left to read. This fixes the following bug: 1) Peer sends SHUTDOWN(RDWR). 2) Socket enters TCP_CLOSING but SOCK_DONE is not set. 3) read() returns -ENOTCONN until close() is called, then returns 0. Signed-off-by: Stephen Barber Signed-off-by: David S. Miller --- net/vmw_vsock/virtio_transport_common.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c index f3f3d06cb6d8..e30f53728725 100644 --- a/net/vmw_vsock/virtio_transport_common.c +++ b/net/vmw_vsock/virtio_transport_common.c @@ -871,8 +871,10 @@ virtio_transport_recv_connected(struct sock *sk, if (le32_to_cpu(pkt->hdr.flags) & VIRTIO_VSOCK_SHUTDOWN_SEND) vsk->peer_shutdown |= SEND_SHUTDOWN; if (vsk->peer_shutdown == SHUTDOWN_MASK && - vsock_stream_has_data(vsk) <= 0) + vsock_stream_has_data(vsk) <= 0) { + sock_set_flag(sk, SOCK_DONE); sk->sk_state = TCP_CLOSING; + } if (le32_to_cpu(pkt->hdr.flags)) sk->sk_state_change(sk); break; From 85749218e3a6a1ab4fb3c698394f40f07b80be5e Mon Sep 17 00:00:00 2001 From: Arthur Fabre Date: Sat, 15 Jun 2019 14:36:27 -0700 Subject: [PATCH 131/145] bpf: Fix out of bounds memory access in bpf_sk_storage bpf_sk_storage maps use multiple spin locks to reduce contention. The number of locks to use is determined by the number of possible CPUs. With only 1 possible CPU, bucket_log == 0, and 2^0 = 1 locks are used. When updating elements, the correct lock is determined with hash_ptr(). Calling hash_ptr() with 0 bits is undefined behavior, as it does: x >> (64 - bits) Using the value results in an out of bounds memory access. In my case, this manifested itself as a page fault when raw_spin_lock_bh() is called later, when running the self tests: ./tools/testing/selftests/bpf/test_verifier 773 775 [ 16.366342] BUG: unable to handle page fault for address: ffff8fe7a66f93f8 Force the minimum number of locks to two. Signed-off-by: Arthur Fabre Fixes: 6ac99e8f23d4 ("bpf: Introduce bpf sk local storage") Acked-by: Andrii Nakryiko Signed-off-by: Alexei Starovoitov --- net/core/bpf_sk_storage.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c index cc9597a87770..d1c4e1f3be2c 100644 --- a/net/core/bpf_sk_storage.c +++ b/net/core/bpf_sk_storage.c @@ -633,7 +633,8 @@ static struct bpf_map *bpf_sk_storage_map_alloc(union bpf_attr *attr) return ERR_PTR(-ENOMEM); bpf_map_init_from_attr(&smap->map, attr); - smap->bucket_log = ilog2(roundup_pow_of_two(num_possible_cpus())); + /* Use at least 2 buckets, select_bucket() is undefined behavior with 1 bucket */ + smap->bucket_log = max_t(u32, 1, ilog2(roundup_pow_of_two(num_possible_cpus()))); nbuckets = 1U << smap->bucket_log; smap->buckets = kvcalloc(sizeof(*smap->buckets), nbuckets, GFP_USER | __GFP_NOWARN); From 9594dc3c7e71b9f52bee1d7852eb3d4e3aea9e99 Mon Sep 17 00:00:00 2001 From: Matt Mullins Date: Tue, 11 Jun 2019 14:53:04 -0700 Subject: [PATCH 132/145] bpf: fix nested bpf tracepoints with per-cpu data BPF_PROG_TYPE_RAW_TRACEPOINTs can be executed nested on the same CPU, as they do not increment bpf_prog_active while executing. This enables three levels of nesting, to support - a kprobe or raw tp or perf event, - another one of the above that irq context happens to call, and - another one in nmi context (at most one of which may be a kprobe or perf event). Fixes: 20b9d7ac4852 ("bpf: avoid excessive stack usage for perf_sample_data") Signed-off-by: Matt Mullins Acked-by: Andrii Nakryiko Acked-by: Daniel Borkmann Signed-off-by: Alexei Starovoitov --- kernel/trace/bpf_trace.c | 100 ++++++++++++++++++++++++++++++++------- 1 file changed, 84 insertions(+), 16 deletions(-) diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c index f92d6ad5e080..1c9a4745e596 100644 --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -410,8 +410,6 @@ static const struct bpf_func_proto bpf_perf_event_read_value_proto = { .arg4_type = ARG_CONST_SIZE, }; -static DEFINE_PER_CPU(struct perf_sample_data, bpf_trace_sd); - static __always_inline u64 __bpf_perf_event_output(struct pt_regs *regs, struct bpf_map *map, u64 flags, struct perf_sample_data *sd) @@ -442,24 +440,50 @@ __bpf_perf_event_output(struct pt_regs *regs, struct bpf_map *map, return perf_event_output(event, sd, regs); } +/* + * Support executing tracepoints in normal, irq, and nmi context that each call + * bpf_perf_event_output + */ +struct bpf_trace_sample_data { + struct perf_sample_data sds[3]; +}; + +static DEFINE_PER_CPU(struct bpf_trace_sample_data, bpf_trace_sds); +static DEFINE_PER_CPU(int, bpf_trace_nest_level); BPF_CALL_5(bpf_perf_event_output, struct pt_regs *, regs, struct bpf_map *, map, u64, flags, void *, data, u64, size) { - struct perf_sample_data *sd = this_cpu_ptr(&bpf_trace_sd); + struct bpf_trace_sample_data *sds = this_cpu_ptr(&bpf_trace_sds); + int nest_level = this_cpu_inc_return(bpf_trace_nest_level); struct perf_raw_record raw = { .frag = { .size = size, .data = data, }, }; + struct perf_sample_data *sd; + int err; - if (unlikely(flags & ~(BPF_F_INDEX_MASK))) - return -EINVAL; + if (WARN_ON_ONCE(nest_level > ARRAY_SIZE(sds->sds))) { + err = -EBUSY; + goto out; + } + + sd = &sds->sds[nest_level - 1]; + + if (unlikely(flags & ~(BPF_F_INDEX_MASK))) { + err = -EINVAL; + goto out; + } perf_sample_data_init(sd, 0, 0); sd->raw = &raw; - return __bpf_perf_event_output(regs, map, flags, sd); + err = __bpf_perf_event_output(regs, map, flags, sd); + +out: + this_cpu_dec(bpf_trace_nest_level); + return err; } static const struct bpf_func_proto bpf_perf_event_output_proto = { @@ -822,16 +846,48 @@ pe_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) /* * bpf_raw_tp_regs are separate from bpf_pt_regs used from skb/xdp * to avoid potential recursive reuse issue when/if tracepoints are added - * inside bpf_*_event_output, bpf_get_stackid and/or bpf_get_stack + * inside bpf_*_event_output, bpf_get_stackid and/or bpf_get_stack. + * + * Since raw tracepoints run despite bpf_prog_active, support concurrent usage + * in normal, irq, and nmi context. */ -static DEFINE_PER_CPU(struct pt_regs, bpf_raw_tp_regs); +struct bpf_raw_tp_regs { + struct pt_regs regs[3]; +}; +static DEFINE_PER_CPU(struct bpf_raw_tp_regs, bpf_raw_tp_regs); +static DEFINE_PER_CPU(int, bpf_raw_tp_nest_level); +static struct pt_regs *get_bpf_raw_tp_regs(void) +{ + struct bpf_raw_tp_regs *tp_regs = this_cpu_ptr(&bpf_raw_tp_regs); + int nest_level = this_cpu_inc_return(bpf_raw_tp_nest_level); + + if (WARN_ON_ONCE(nest_level > ARRAY_SIZE(tp_regs->regs))) { + this_cpu_dec(bpf_raw_tp_nest_level); + return ERR_PTR(-EBUSY); + } + + return &tp_regs->regs[nest_level - 1]; +} + +static void put_bpf_raw_tp_regs(void) +{ + this_cpu_dec(bpf_raw_tp_nest_level); +} + BPF_CALL_5(bpf_perf_event_output_raw_tp, struct bpf_raw_tracepoint_args *, args, struct bpf_map *, map, u64, flags, void *, data, u64, size) { - struct pt_regs *regs = this_cpu_ptr(&bpf_raw_tp_regs); + struct pt_regs *regs = get_bpf_raw_tp_regs(); + int ret; + + if (IS_ERR(regs)) + return PTR_ERR(regs); perf_fetch_caller_regs(regs); - return ____bpf_perf_event_output(regs, map, flags, data, size); + ret = ____bpf_perf_event_output(regs, map, flags, data, size); + + put_bpf_raw_tp_regs(); + return ret; } static const struct bpf_func_proto bpf_perf_event_output_proto_raw_tp = { @@ -848,12 +904,18 @@ static const struct bpf_func_proto bpf_perf_event_output_proto_raw_tp = { BPF_CALL_3(bpf_get_stackid_raw_tp, struct bpf_raw_tracepoint_args *, args, struct bpf_map *, map, u64, flags) { - struct pt_regs *regs = this_cpu_ptr(&bpf_raw_tp_regs); + struct pt_regs *regs = get_bpf_raw_tp_regs(); + int ret; + + if (IS_ERR(regs)) + return PTR_ERR(regs); perf_fetch_caller_regs(regs); /* similar to bpf_perf_event_output_tp, but pt_regs fetched differently */ - return bpf_get_stackid((unsigned long) regs, (unsigned long) map, - flags, 0, 0); + ret = bpf_get_stackid((unsigned long) regs, (unsigned long) map, + flags, 0, 0); + put_bpf_raw_tp_regs(); + return ret; } static const struct bpf_func_proto bpf_get_stackid_proto_raw_tp = { @@ -868,11 +930,17 @@ static const struct bpf_func_proto bpf_get_stackid_proto_raw_tp = { BPF_CALL_4(bpf_get_stack_raw_tp, struct bpf_raw_tracepoint_args *, args, void *, buf, u32, size, u64, flags) { - struct pt_regs *regs = this_cpu_ptr(&bpf_raw_tp_regs); + struct pt_regs *regs = get_bpf_raw_tp_regs(); + int ret; + + if (IS_ERR(regs)) + return PTR_ERR(regs); perf_fetch_caller_regs(regs); - return bpf_get_stack((unsigned long) regs, (unsigned long) buf, - (unsigned long) size, flags, 0); + ret = bpf_get_stack((unsigned long) regs, (unsigned long) buf, + (unsigned long) size, flags, 0); + put_bpf_raw_tp_regs(); + return ret; } static const struct bpf_func_proto bpf_get_stack_proto_raw_tp = { From 5db2e7c7917f40236a021959893121c4e496f609 Mon Sep 17 00:00:00 2001 From: "David S. Miller" Date: Sat, 15 Jun 2019 18:10:30 -0700 Subject: [PATCH 133/145] Revert "net: phylink: set the autoneg state in phylink_phy_change" This reverts commit ef7bfa84725d891bbdb88707ed55b2cbf94942bb. Russell King espressed some strong opposition to this change, explaining that this is trying to make phylink behave outside of how it has been designed. Signed-off-by: David S. Miller --- drivers/net/phy/phylink.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c index c638e13fbf81..4c0616ba314d 100644 --- a/drivers/net/phy/phylink.c +++ b/drivers/net/phy/phylink.c @@ -635,7 +635,6 @@ static void phylink_phy_change(struct phy_device *phydev, bool up, pl->phy_state.pause |= MLO_PAUSE_ASYM; pl->phy_state.interface = phydev->interface; pl->phy_state.link = up; - pl->phy_state.an_enabled = phydev->autoneg; mutex_unlock(&pl->state_mutex); phylink_run_resolve(pl); From 3b4929f65b0d8249f19a50245cd88ed1a2f78cff Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 17 May 2019 17:17:22 -0700 Subject: [PATCH 134/145] tcp: limit payload size of sacked skbs Jonathan Looney reported that TCP can trigger the following crash in tcp_shifted_skb() : BUG_ON(tcp_skb_pcount(skb) < pcount); This can happen if the remote peer has advertized the smallest MSS that linux TCP accepts : 48 An skb can hold 17 fragments, and each fragment can hold 32KB on x86, or 64KB on PowerPC. This means that the 16bit witdh of TCP_SKB_CB(skb)->tcp_gso_segs can overflow. Note that tcp_sendmsg() builds skbs with less than 64KB of payload, so this problem needs SACK to be enabled. SACK blocks allow TCP to coalesce multiple skbs in the retransmit queue, thus filling the 17 fragments to maximal capacity. CVE-2019-11477 -- u16 overflow of TCP_SKB_CB(skb)->tcp_gso_segs Fixes: 832d11c5cd07 ("tcp: Try to restore large SKBs while SACK processing") Signed-off-by: Eric Dumazet Reported-by: Jonathan Looney Acked-by: Neal Cardwell Reviewed-by: Tyler Hicks Cc: Yuchung Cheng Cc: Bruce Curtis Cc: Jonathan Lemon Signed-off-by: David S. Miller --- include/linux/tcp.h | 4 ++++ include/net/tcp.h | 2 ++ net/ipv4/tcp.c | 1 + net/ipv4/tcp_input.c | 26 ++++++++++++++++++++------ net/ipv4/tcp_output.c | 6 +++--- 5 files changed, 30 insertions(+), 9 deletions(-) diff --git a/include/linux/tcp.h b/include/linux/tcp.h index 711361af9ce0..9a478a0cd3a2 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -484,4 +484,8 @@ static inline u16 tcp_mss_clamp(const struct tcp_sock *tp, u16 mss) return (user_mss && user_mss < mss) ? user_mss : mss; } + +int tcp_skb_shift(struct sk_buff *to, struct sk_buff *from, int pcount, + int shiftlen); + #endif /* _LINUX_TCP_H */ diff --git a/include/net/tcp.h b/include/net/tcp.h index ac2f53fbfa6b..582c0caa9811 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -51,6 +51,8 @@ void tcp_time_wait(struct sock *sk, int state, int timeo); #define MAX_TCP_HEADER (128 + MAX_HEADER) #define MAX_TCP_OPTION_SPACE 40 +#define TCP_MIN_SND_MSS 48 +#define TCP_MIN_GSO_SIZE (TCP_MIN_SND_MSS - MAX_TCP_OPTION_SPACE) /* * Never offer a window over 32767 without using window scaling. Some diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index f12d500ec85c..79666ef8c2e2 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -3868,6 +3868,7 @@ void __init tcp_init(void) unsigned long limit; unsigned int i; + BUILD_BUG_ON(TCP_MIN_SND_MSS <= MAX_TCP_OPTION_SPACE); BUILD_BUG_ON(sizeof(struct tcp_skb_cb) > FIELD_SIZEOF(struct sk_buff, cb)); diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 38dfc308c0fb..d95ee40df6c2 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -1302,7 +1302,7 @@ static bool tcp_shifted_skb(struct sock *sk, struct sk_buff *prev, TCP_SKB_CB(skb)->seq += shifted; tcp_skb_pcount_add(prev, pcount); - BUG_ON(tcp_skb_pcount(skb) < pcount); + WARN_ON_ONCE(tcp_skb_pcount(skb) < pcount); tcp_skb_pcount_add(skb, -pcount); /* When we're adding to gso_segs == 1, gso_size will be zero, @@ -1368,6 +1368,21 @@ static int skb_can_shift(const struct sk_buff *skb) return !skb_headlen(skb) && skb_is_nonlinear(skb); } +int tcp_skb_shift(struct sk_buff *to, struct sk_buff *from, + int pcount, int shiftlen) +{ + /* TCP min gso_size is 8 bytes (TCP_MIN_GSO_SIZE) + * Since TCP_SKB_CB(skb)->tcp_gso_segs is 16 bits, we need + * to make sure not storing more than 65535 * 8 bytes per skb, + * even if current MSS is bigger. + */ + if (unlikely(to->len + shiftlen >= 65535 * TCP_MIN_GSO_SIZE)) + return 0; + if (unlikely(tcp_skb_pcount(to) + pcount > 65535)) + return 0; + return skb_shift(to, from, shiftlen); +} + /* Try collapsing SACK blocks spanning across multiple skbs to a single * skb. */ @@ -1473,7 +1488,7 @@ static struct sk_buff *tcp_shift_skb_data(struct sock *sk, struct sk_buff *skb, if (!after(TCP_SKB_CB(skb)->seq + len, tp->snd_una)) goto fallback; - if (!skb_shift(prev, skb, len)) + if (!tcp_skb_shift(prev, skb, pcount, len)) goto fallback; if (!tcp_shifted_skb(sk, prev, skb, state, pcount, len, mss, dup_sack)) goto out; @@ -1491,11 +1506,10 @@ static struct sk_buff *tcp_shift_skb_data(struct sock *sk, struct sk_buff *skb, goto out; len = skb->len; - if (skb_shift(prev, skb, len)) { - pcount += tcp_skb_pcount(skb); - tcp_shifted_skb(sk, prev, skb, state, tcp_skb_pcount(skb), + pcount = tcp_skb_pcount(skb); + if (tcp_skb_shift(prev, skb, pcount, len)) + tcp_shifted_skb(sk, prev, skb, state, pcount, len, mss, 0); - } out: return prev; diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index f429e856e263..b8e3bbb85211 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -1454,8 +1454,8 @@ static inline int __tcp_mtu_to_mss(struct sock *sk, int pmtu) mss_now -= icsk->icsk_ext_hdr_len; /* Then reserve room for full set of TCP options and 8 bytes of data */ - if (mss_now < 48) - mss_now = 48; + if (mss_now < TCP_MIN_SND_MSS) + mss_now = TCP_MIN_SND_MSS; return mss_now; } @@ -2747,7 +2747,7 @@ static bool tcp_collapse_retrans(struct sock *sk, struct sk_buff *skb) if (next_skb_size <= skb_availroom(skb)) skb_copy_bits(next_skb, 0, skb_put(skb, next_skb_size), next_skb_size); - else if (!skb_shift(skb, next_skb, next_skb_size)) + else if (!tcp_skb_shift(skb, next_skb, 1, next_skb_size)) return false; } tcp_highest_sack_replace(sk, next_skb, skb); From f070ef2ac66716357066b683fb0baf55f8191a2e Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Sat, 18 May 2019 05:12:05 -0700 Subject: [PATCH 135/145] tcp: tcp_fragment() should apply sane memory limits Jonathan Looney reported that a malicious peer can force a sender to fragment its retransmit queue into tiny skbs, inflating memory usage and/or overflow 32bit counters. TCP allows an application to queue up to sk_sndbuf bytes, so we need to give some allowance for non malicious splitting of retransmit queue. A new SNMP counter is added to monitor how many times TCP did not allow to split an skb if the allowance was exceeded. Note that this counter might increase in the case applications use SO_SNDBUF socket option to lower sk_sndbuf. CVE-2019-11478 : tcp_fragment, prevent fragmenting a packet when the socket is already using more than half the allowed space Signed-off-by: Eric Dumazet Reported-by: Jonathan Looney Acked-by: Neal Cardwell Acked-by: Yuchung Cheng Reviewed-by: Tyler Hicks Cc: Bruce Curtis Cc: Jonathan Lemon Signed-off-by: David S. Miller --- include/uapi/linux/snmp.h | 1 + net/ipv4/proc.c | 1 + net/ipv4/tcp_output.c | 5 +++++ 3 files changed, 7 insertions(+) diff --git a/include/uapi/linux/snmp.h b/include/uapi/linux/snmp.h index 86dc24a96c90..fd42c1316d3d 100644 --- a/include/uapi/linux/snmp.h +++ b/include/uapi/linux/snmp.h @@ -283,6 +283,7 @@ enum LINUX_MIB_TCPACKCOMPRESSED, /* TCPAckCompressed */ LINUX_MIB_TCPZEROWINDOWDROP, /* TCPZeroWindowDrop */ LINUX_MIB_TCPRCVQDROP, /* TCPRcvQDrop */ + LINUX_MIB_TCPWQUEUETOOBIG, /* TCPWqueueTooBig */ __LINUX_MIB_MAX }; diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c index 4370f4246e86..073273b751f8 100644 --- a/net/ipv4/proc.c +++ b/net/ipv4/proc.c @@ -287,6 +287,7 @@ static const struct snmp_mib snmp4_net_list[] = { SNMP_MIB_ITEM("TCPAckCompressed", LINUX_MIB_TCPACKCOMPRESSED), SNMP_MIB_ITEM("TCPZeroWindowDrop", LINUX_MIB_TCPZEROWINDOWDROP), SNMP_MIB_ITEM("TCPRcvQDrop", LINUX_MIB_TCPRCVQDROP), + SNMP_MIB_ITEM("TCPWqueueTooBig", LINUX_MIB_TCPWQUEUETOOBIG), SNMP_MIB_SENTINEL }; diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index b8e3bbb85211..1bb1c46b4aba 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -1296,6 +1296,11 @@ int tcp_fragment(struct sock *sk, enum tcp_queue tcp_queue, if (nsize < 0) nsize = 0; + if (unlikely((sk->sk_wmem_queued >> 1) > sk->sk_sndbuf)) { + NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPWQUEUETOOBIG); + return -ENOMEM; + } + if (skb_unclone(skb, gfp)) return -ENOMEM; From 5f3e2bf008c2221478101ee72f5cb4654b9fc363 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Thu, 6 Jun 2019 09:15:31 -0700 Subject: [PATCH 136/145] tcp: add tcp_min_snd_mss sysctl Some TCP peers announce a very small MSS option in their SYN and/or SYN/ACK messages. This forces the stack to send packets with a very high network/cpu overhead. Linux has enforced a minimal value of 48. Since this value includes the size of TCP options, and that the options can consume up to 40 bytes, this means that each segment can include only 8 bytes of payload. In some cases, it can be useful to increase the minimal value to a saner value. We still let the default to 48 (TCP_MIN_SND_MSS), for compatibility reasons. Note that TCP_MAXSEG socket option enforces a minimal value of (TCP_MIN_MSS). David Miller increased this minimal value in commit c39508d6f118 ("tcp: Make TCP_MAXSEG minimum more correct.") from 64 to 88. We might in the future merge TCP_MIN_SND_MSS and TCP_MIN_MSS. CVE-2019-11479 -- tcp mss hardcoded to 48 Signed-off-by: Eric Dumazet Suggested-by: Jonathan Looney Acked-by: Neal Cardwell Cc: Yuchung Cheng Cc: Tyler Hicks Cc: Bruce Curtis Cc: Jonathan Lemon Signed-off-by: David S. Miller --- Documentation/networking/ip-sysctl.txt | 8 ++++++++ include/net/netns/ipv4.h | 1 + net/ipv4/sysctl_net_ipv4.c | 11 +++++++++++ net/ipv4/tcp_ipv4.c | 1 + net/ipv4/tcp_output.c | 3 +-- 5 files changed, 22 insertions(+), 2 deletions(-) diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index 288aa264ac26..22f6b8b1110a 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -255,6 +255,14 @@ tcp_base_mss - INTEGER Path MTU discovery (MTU probing). If MTU probing is enabled, this is the initial MSS used by the connection. +tcp_min_snd_mss - INTEGER + TCP SYN and SYNACK messages usually advertise an ADVMSS option, + as described in RFC 1122 and RFC 6691. + If this ADVMSS option is smaller than tcp_min_snd_mss, + it is silently capped to tcp_min_snd_mss. + + Default : 48 (at least 8 bytes of payload per segment) + tcp_congestion_control - STRING Set the congestion control algorithm to be used for new connections. The algorithm "reno" is always available, but diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h index 7698460a3dd1..623cfbb7b8dc 100644 --- a/include/net/netns/ipv4.h +++ b/include/net/netns/ipv4.h @@ -117,6 +117,7 @@ struct netns_ipv4 { #endif int sysctl_tcp_mtu_probing; int sysctl_tcp_base_mss; + int sysctl_tcp_min_snd_mss; int sysctl_tcp_probe_threshold; u32 sysctl_tcp_probe_interval; diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c index 08a428a7b274..9e5257251163 100644 --- a/net/ipv4/sysctl_net_ipv4.c +++ b/net/ipv4/sysctl_net_ipv4.c @@ -39,6 +39,8 @@ static int ip_local_port_range_min[] = { 1, 1 }; static int ip_local_port_range_max[] = { 65535, 65535 }; static int tcp_adv_win_scale_min = -31; static int tcp_adv_win_scale_max = 31; +static int tcp_min_snd_mss_min = TCP_MIN_SND_MSS; +static int tcp_min_snd_mss_max = 65535; static int ip_privileged_port_min; static int ip_privileged_port_max = 65535; static int ip_ttl_min = 1; @@ -774,6 +776,15 @@ static struct ctl_table ipv4_net_table[] = { .mode = 0644, .proc_handler = proc_dointvec, }, + { + .procname = "tcp_min_snd_mss", + .data = &init_net.ipv4.sysctl_tcp_min_snd_mss, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_dointvec_minmax, + .extra1 = &tcp_min_snd_mss_min, + .extra2 = &tcp_min_snd_mss_max, + }, { .procname = "tcp_probe_threshold", .data = &init_net.ipv4.sysctl_tcp_probe_threshold, diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index bc86f9735f45..cfa81190a1b1 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -2628,6 +2628,7 @@ static int __net_init tcp_sk_init(struct net *net) net->ipv4.sysctl_tcp_ecn_fallback = 1; net->ipv4.sysctl_tcp_base_mss = TCP_BASE_MSS; + net->ipv4.sysctl_tcp_min_snd_mss = TCP_MIN_SND_MSS; net->ipv4.sysctl_tcp_probe_threshold = TCP_PROBE_THRESHOLD; net->ipv4.sysctl_tcp_probe_interval = TCP_PROBE_INTERVAL; diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 1bb1c46b4aba..00c01a01b547 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -1459,8 +1459,7 @@ static inline int __tcp_mtu_to_mss(struct sock *sk, int pmtu) mss_now -= icsk->icsk_ext_hdr_len; /* Then reserve room for full set of TCP options and 8 bytes of data */ - if (mss_now < TCP_MIN_SND_MSS) - mss_now = TCP_MIN_SND_MSS; + mss_now = max(mss_now, sock_net(sk)->ipv4.sysctl_tcp_min_snd_mss); return mss_now; } From 967c05aee439e6e5d7d805e195b3a20ef5c433d6 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Sat, 8 Jun 2019 10:22:49 -0700 Subject: [PATCH 137/145] tcp: enforce tcp_min_snd_mss in tcp_mtu_probing() If mtu probing is enabled tcp_mtu_probing() could very well end up with a too small MSS. Use the new sysctl tcp_min_snd_mss to make sure MSS search is performed in an acceptable range. CVE-2019-11479 -- tcp mss hardcoded to 48 Signed-off-by: Eric Dumazet Reported-by: Jonathan Lemon Cc: Jonathan Looney Acked-by: Neal Cardwell Cc: Yuchung Cheng Cc: Tyler Hicks Cc: Bruce Curtis Signed-off-by: David S. Miller --- net/ipv4/tcp_timer.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c index 5bad937ce779..c801cd37cc2a 100644 --- a/net/ipv4/tcp_timer.c +++ b/net/ipv4/tcp_timer.c @@ -155,6 +155,7 @@ static void tcp_mtu_probing(struct inet_connection_sock *icsk, struct sock *sk) mss = tcp_mtu_to_mss(sk, icsk->icsk_mtup.search_low) >> 1; mss = min(net->ipv4.sysctl_tcp_base_mss, mss); mss = max(mss, 68 - tcp_sk(sk)->tcp_header_len); + mss = max(mss, net->ipv4.sysctl_tcp_min_snd_mss); icsk->icsk_mtup.search_low = tcp_mss_to_mtu(sk, mss); } tcp_sync_mss(sk, icsk->icsk_pmtu_cookie); From 36b2f61a42c29add695f3bd192ce44d5c113c1eb Mon Sep 17 00:00:00 2001 From: Govindarajulu Varadarajan Date: Fri, 14 Jun 2019 06:13:54 -0700 Subject: [PATCH 138/145] net: handle 802.1P vlan 0 packets properly When stack receives pkt: [802.1P vlan 0][802.1AD vlan 100][IPv4], vlan_do_receive() returns false if it does not find vlan_dev. Later __netif_receive_skb_core() fails to find packet type handler for skb->protocol 801.1AD and drops the packet. 801.1P header with vlan id 0 should be handled as untagged packets. This patch fixes it by checking if vlan_id is 0 and processes next vlan header. Signed-off-by: Govindarajulu Varadarajan Signed-off-by: David S. Miller --- net/core/dev.c | 30 +++++++++++++++++++++++++++++- 1 file changed, 29 insertions(+), 1 deletion(-) diff --git a/net/core/dev.c b/net/core/dev.c index eb7fb6daa1ef..d6edd218babd 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4923,8 +4923,36 @@ skip_classify: } if (unlikely(skb_vlan_tag_present(skb))) { - if (skb_vlan_tag_get_id(skb)) +check_vlan_id: + if (skb_vlan_tag_get_id(skb)) { + /* Vlan id is non 0 and vlan_do_receive() above couldn't + * find vlan device. + */ skb->pkt_type = PACKET_OTHERHOST; + } else if (skb->protocol == cpu_to_be16(ETH_P_8021Q) || + skb->protocol == cpu_to_be16(ETH_P_8021AD)) { + /* Outer header is 802.1P with vlan 0, inner header is + * 802.1Q or 802.1AD and vlan_do_receive() above could + * not find vlan dev for vlan id 0. + */ + __vlan_hwaccel_clear_tag(skb); + skb = skb_vlan_untag(skb); + if (unlikely(!skb)) + goto out; + if (vlan_do_receive(&skb)) + /* After stripping off 802.1P header with vlan 0 + * vlan dev is found for inner header. + */ + goto another_round; + else if (unlikely(!skb)) + goto out; + else + /* We have stripped outer 802.1P vlan 0 header. + * But could not find vlan dev. + * check again for vlan id to set OTHERHOST. + */ + goto check_vlan_id; + } /* Note: we might in the future use prio bits * and set skb->priority like in vlan_do_receive() * For the time being, just ignore Priority Code Point From 718f4a2537089ea41903bf357071306163bc7c04 Mon Sep 17 00:00:00 2001 From: Ivan Vecera Date: Fri, 14 Jun 2019 17:48:36 +0200 Subject: [PATCH 139/145] be2net: Fix number of Rx queues used for flow hashing Number of Rx queues used for flow hashing returned by the driver is incorrect and this bug prevents user to use the last Rx queue in indirection table. Let's say we have a NIC with 6 combined queues: [root@sm-03 ~]# ethtool -l enp4s0f0 Channel parameters for enp4s0f0: Pre-set maximums: RX: 5 TX: 5 Other: 0 Combined: 6 Current hardware settings: RX: 0 TX: 0 Other: 0 Combined: 6 Default indirection table maps all (6) queues equally but the driver reports only 5 rings available. [root@sm-03 ~]# ethtool -x enp4s0f0 RX flow hash indirection table for enp4s0f0 with 5 RX ring(s): 0: 0 1 2 3 4 5 0 1 8: 2 3 4 5 0 1 2 3 16: 4 5 0 1 2 3 4 5 24: 0 1 2 3 4 5 0 1 ... Now change indirection table somehow: [root@sm-03 ~]# ethtool -X enp4s0f0 weight 1 1 [root@sm-03 ~]# ethtool -x enp4s0f0 RX flow hash indirection table for enp4s0f0 with 6 RX ring(s): 0: 0 0 0 0 0 0 0 0 ... 64: 1 1 1 1 1 1 1 1 ... Now it is not possible to change mapping back to equal (default) state: [root@sm-03 ~]# ethtool -X enp4s0f0 equal 6 Cannot set RX flow hash configuration: Invalid argument Fixes: 594ad54a2c3b ("be2net: Add support for setting and getting rx flow hash options") Reported-by: Tianhao Signed-off-by: Ivan Vecera Signed-off-by: David S. Miller --- drivers/net/ethernet/emulex/benet/be_ethtool.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/emulex/benet/be_ethtool.c b/drivers/net/ethernet/emulex/benet/be_ethtool.c index 4c218341c51b..6e635debc7fd 100644 --- a/drivers/net/ethernet/emulex/benet/be_ethtool.c +++ b/drivers/net/ethernet/emulex/benet/be_ethtool.c @@ -1105,7 +1105,7 @@ static int be_get_rxnfc(struct net_device *netdev, struct ethtool_rxnfc *cmd, cmd->data = be_get_rss_hash_opts(adapter, cmd->flow_type); break; case ETHTOOL_GRXRINGS: - cmd->data = adapter->num_rx_qs - 1; + cmd->data = adapter->num_rx_qs; break; default: return -EINVAL; From d424a2afd7da136a98e9485bfd6c5d5506bd77f8 Mon Sep 17 00:00:00 2001 From: Dexuan Cui Date: Sat, 15 Jun 2019 05:00:57 +0000 Subject: [PATCH 140/145] hv_sock: Suppress bogus "may be used uninitialized" warnings MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit gcc 8.2.0 may report these bogus warnings under some condition: warning: ‘vnew’ may be used uninitialized in this function warning: ‘hvs_new’ may be used uninitialized in this function Actually, the 2 pointers are only initialized and used if the variable "conn_from_host" is true. The code is not buggy here. Signed-off-by: Dexuan Cui Signed-off-by: David S. Miller --- net/vmw_vsock/hyperv_transport.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c index 982a8dc49e03..e4801c7261b4 100644 --- a/net/vmw_vsock/hyperv_transport.c +++ b/net/vmw_vsock/hyperv_transport.c @@ -344,8 +344,8 @@ static void hvs_open_connection(struct vmbus_channel *chan) struct sockaddr_vm addr; struct sock *sk, *new = NULL; - struct vsock_sock *vnew; - struct hvsock *hvs, *hvs_new; + struct vsock_sock *vnew = NULL; + struct hvsock *hvs, *hvs_new = NULL; int ret; if_type = &chan->offermsg.offer.if_type; From 2e05fcae83c41eb2df10558338dc600dc783af47 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Sat, 15 Jun 2019 13:19:55 -0700 Subject: [PATCH 141/145] tcp: fix compile error if !CONFIG_SYSCTL tcp_tx_skb_cache_key and tcp_rx_skb_cache_key must be available even if CONFIG_SYSCTL is not set. Fixes: 0b7d7f6b2208 ("tcp: add tcp_tx_skb_cache sysctl") Fixes: ede61ca474a0 ("tcp: add tcp_rx_skb_cache sysctl") Signed-off-by: Eric Dumazet Reported-by: Willem de Bruijn Signed-off-by: David S. Miller --- net/ipv4/sysctl_net_ipv4.c | 5 ----- net/ipv4/tcp.c | 5 +++++ 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c index 08a428a7b274..fa213bd8e233 100644 --- a/net/ipv4/sysctl_net_ipv4.c +++ b/net/ipv4/sysctl_net_ipv4.c @@ -51,11 +51,6 @@ static int comp_sack_nr_max = 255; static u32 u32_max_div_HZ = UINT_MAX / HZ; static int one_day_secs = 24 * 3600; -DEFINE_STATIC_KEY_FALSE(tcp_rx_skb_cache_key); -EXPORT_SYMBOL(tcp_rx_skb_cache_key); - -DEFINE_STATIC_KEY_FALSE(tcp_tx_skb_cache_key); - /* obsolete */ static int sysctl_tcp_low_latency __read_mostly; diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index f12d500ec85c..f448a288d158 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -317,6 +317,11 @@ struct tcp_splice_state { unsigned long tcp_memory_pressure __read_mostly; EXPORT_SYMBOL_GPL(tcp_memory_pressure); +DEFINE_STATIC_KEY_FALSE(tcp_rx_skb_cache_key); +EXPORT_SYMBOL(tcp_rx_skb_cache_key); + +DEFINE_STATIC_KEY_FALSE(tcp_tx_skb_cache_key); + void tcp_enter_memory_pressure(struct sock *sk) { unsigned long val; From f3e92cb8e2eb8c27d109e6fd73d3a69a8c09e288 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Sat, 15 Jun 2019 16:28:48 -0700 Subject: [PATCH 142/145] neigh: fix use-after-free read in pneigh_get_next Nine years ago, I added RCU handling to neighbours, not pneighbours. (pneigh are not commonly used) Unfortunately I missed that /proc dump operations would use a common entry and exit point : neigh_seq_start() and neigh_seq_stop() We need to read_lock(tbl->lock) or risk use-after-free while iterating the pneigh structures. We might later convert pneigh to RCU and revert this patch. sysbot reported : BUG: KASAN: use-after-free in pneigh_get_next.isra.0+0x24b/0x280 net/core/neighbour.c:3158 Read of size 8 at addr ffff888097f2a700 by task syz-executor.0/9825 CPU: 1 PID: 9825 Comm: syz-executor.0 Not tainted 5.2.0-rc4+ #32 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x172/0x1f0 lib/dump_stack.c:113 print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188 __kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317 kasan_report+0x12/0x20 mm/kasan/common.c:614 __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132 pneigh_get_next.isra.0+0x24b/0x280 net/core/neighbour.c:3158 neigh_seq_next+0xdb/0x210 net/core/neighbour.c:3240 seq_read+0x9cf/0x1110 fs/seq_file.c:258 proc_reg_read+0x1fc/0x2c0 fs/proc/inode.c:221 do_loop_readv_writev fs/read_write.c:714 [inline] do_loop_readv_writev fs/read_write.c:701 [inline] do_iter_read+0x4a4/0x660 fs/read_write.c:935 vfs_readv+0xf0/0x160 fs/read_write.c:997 kernel_readv fs/splice.c:359 [inline] default_file_splice_read+0x475/0x890 fs/splice.c:414 do_splice_to+0x127/0x180 fs/splice.c:877 splice_direct_to_actor+0x2d2/0x970 fs/splice.c:954 do_splice_direct+0x1da/0x2a0 fs/splice.c:1063 do_sendfile+0x597/0xd00 fs/read_write.c:1464 __do_sys_sendfile64 fs/read_write.c:1525 [inline] __se_sys_sendfile64 fs/read_write.c:1511 [inline] __x64_sys_sendfile64+0x1dd/0x220 fs/read_write.c:1511 do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x4592c9 Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007f4aab51dc78 EFLAGS: 00000246 ORIG_RAX: 0000000000000028 RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00000000004592c9 RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000005 RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000080000000 R11: 0000000000000246 R12: 00007f4aab51e6d4 R13: 00000000004c689d R14: 00000000004db828 R15: 00000000ffffffff Allocated by task 9827: save_stack+0x23/0x90 mm/kasan/common.c:71 set_track mm/kasan/common.c:79 [inline] __kasan_kmalloc mm/kasan/common.c:489 [inline] __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462 kasan_kmalloc+0x9/0x10 mm/kasan/common.c:503 __do_kmalloc mm/slab.c:3660 [inline] __kmalloc+0x15c/0x740 mm/slab.c:3669 kmalloc include/linux/slab.h:552 [inline] pneigh_lookup+0x19c/0x4a0 net/core/neighbour.c:731 arp_req_set_public net/ipv4/arp.c:1010 [inline] arp_req_set+0x613/0x720 net/ipv4/arp.c:1026 arp_ioctl+0x652/0x7f0 net/ipv4/arp.c:1226 inet_ioctl+0x2a0/0x340 net/ipv4/af_inet.c:926 sock_do_ioctl+0xd8/0x2f0 net/socket.c:1043 sock_ioctl+0x3ed/0x780 net/socket.c:1194 vfs_ioctl fs/ioctl.c:46 [inline] file_ioctl fs/ioctl.c:509 [inline] do_vfs_ioctl+0xd5f/0x1380 fs/ioctl.c:696 ksys_ioctl+0xab/0xd0 fs/ioctl.c:713 __do_sys_ioctl fs/ioctl.c:720 [inline] __se_sys_ioctl fs/ioctl.c:718 [inline] __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718 do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 9824: save_stack+0x23/0x90 mm/kasan/common.c:71 set_track mm/kasan/common.c:79 [inline] __kasan_slab_free+0x102/0x150 mm/kasan/common.c:451 kasan_slab_free+0xe/0x10 mm/kasan/common.c:459 __cache_free mm/slab.c:3432 [inline] kfree+0xcf/0x220 mm/slab.c:3755 pneigh_ifdown_and_unlock net/core/neighbour.c:812 [inline] __neigh_ifdown+0x236/0x2f0 net/core/neighbour.c:356 neigh_ifdown+0x20/0x30 net/core/neighbour.c:372 arp_ifdown+0x1d/0x21 net/ipv4/arp.c:1274 inetdev_destroy net/ipv4/devinet.c:319 [inline] inetdev_event+0xa14/0x11f0 net/ipv4/devinet.c:1544 notifier_call_chain+0xc2/0x230 kernel/notifier.c:95 __raw_notifier_call_chain kernel/notifier.c:396 [inline] raw_notifier_call_chain+0x2e/0x40 kernel/notifier.c:403 call_netdevice_notifiers_info+0x3f/0x90 net/core/dev.c:1749 call_netdevice_notifiers_extack net/core/dev.c:1761 [inline] call_netdevice_notifiers net/core/dev.c:1775 [inline] rollback_registered_many+0x9b9/0xfc0 net/core/dev.c:8178 rollback_registered+0x109/0x1d0 net/core/dev.c:8220 unregister_netdevice_queue net/core/dev.c:9267 [inline] unregister_netdevice_queue+0x1ee/0x2c0 net/core/dev.c:9260 unregister_netdevice include/linux/netdevice.h:2631 [inline] __tun_detach+0xd8a/0x1040 drivers/net/tun.c:724 tun_detach drivers/net/tun.c:741 [inline] tun_chr_close+0xe0/0x180 drivers/net/tun.c:3451 __fput+0x2ff/0x890 fs/file_table.c:280 ____fput+0x16/0x20 fs/file_table.c:313 task_work_run+0x145/0x1c0 kernel/task_work.c:113 tracehook_notify_resume include/linux/tracehook.h:185 [inline] exit_to_usermode_loop+0x273/0x2c0 arch/x86/entry/common.c:168 prepare_exit_to_usermode arch/x86/entry/common.c:199 [inline] syscall_return_slowpath arch/x86/entry/common.c:279 [inline] do_syscall_64+0x58e/0x680 arch/x86/entry/common.c:304 entry_SYSCALL_64_after_hwframe+0x49/0xbe The buggy address belongs to the object at ffff888097f2a700 which belongs to the cache kmalloc-64 of size 64 The buggy address is located 0 bytes inside of 64-byte region [ffff888097f2a700, ffff888097f2a740) The buggy address belongs to the page: page:ffffea00025fca80 refcount:1 mapcount:0 mapping:ffff8880aa400340 index:0x0 flags: 0x1fffc0000000200(slab) raw: 01fffc0000000200 ffffea000250d548 ffffea00025726c8 ffff8880aa400340 raw: 0000000000000000 ffff888097f2a000 0000000100000020 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888097f2a600: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc ffff888097f2a680: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc >ffff888097f2a700: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc ^ ffff888097f2a780: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc ffff888097f2a800: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc Fixes: 767e97e1e0db ("neigh: RCU conversion of struct neighbour") Signed-off-by: Eric Dumazet Reported-by: syzbot Signed-off-by: David S. Miller --- net/core/neighbour.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/net/core/neighbour.c b/net/core/neighbour.c index 0e2c07355463..9e7fc929bc50 100644 --- a/net/core/neighbour.c +++ b/net/core/neighbour.c @@ -3203,6 +3203,7 @@ static void *neigh_get_idx_any(struct seq_file *seq, loff_t *pos) } void *neigh_seq_start(struct seq_file *seq, loff_t *pos, struct neigh_table *tbl, unsigned int neigh_seq_flags) + __acquires(tbl->lock) __acquires(rcu_bh) { struct neigh_seq_state *state = seq->private; @@ -3213,6 +3214,7 @@ void *neigh_seq_start(struct seq_file *seq, loff_t *pos, struct neigh_table *tbl rcu_read_lock_bh(); state->nht = rcu_dereference_bh(tbl->nht); + read_lock(&tbl->lock); return *pos ? neigh_get_idx_any(seq, pos) : SEQ_START_TOKEN; } @@ -3246,8 +3248,13 @@ out: EXPORT_SYMBOL(neigh_seq_next); void neigh_seq_stop(struct seq_file *seq, void *v) + __releases(tbl->lock) __releases(rcu_bh) { + struct neigh_seq_state *state = seq->private; + struct neigh_table *tbl = state->tbl; + + read_unlock(&tbl->lock); rcu_read_unlock_bh(); } EXPORT_SYMBOL(neigh_seq_stop); From d4d5d8e83c9616aeef28a2869cea49cc3fb35526 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Sat, 15 Jun 2019 16:40:52 -0700 Subject: [PATCH 143/145] ax25: fix inconsistent lock state in ax25_destroy_timer Before thread in process context uses bh_lock_sock() we must disable bh. sysbot reported : WARNING: inconsistent lock state 5.2.0-rc3+ #32 Not tainted inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. blkid/26581 [HC0[0]:SC1[1]:HE1:SE0] takes: 00000000e0da85ee (slock-AF_AX25){+.?.}, at: spin_lock include/linux/spinlock.h:338 [inline] 00000000e0da85ee (slock-AF_AX25){+.?.}, at: ax25_destroy_timer+0x53/0xc0 net/ax25/af_ax25.c:275 {SOFTIRQ-ON-W} state was registered at: lock_acquire+0x16f/0x3f0 kernel/locking/lockdep.c:4303 __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline] _raw_spin_lock+0x2f/0x40 kernel/locking/spinlock.c:151 spin_lock include/linux/spinlock.h:338 [inline] ax25_rt_autobind+0x3ca/0x720 net/ax25/ax25_route.c:429 ax25_connect.cold+0x30/0xa4 net/ax25/af_ax25.c:1221 __sys_connect+0x264/0x330 net/socket.c:1834 __do_sys_connect net/socket.c:1845 [inline] __se_sys_connect net/socket.c:1842 [inline] __x64_sys_connect+0x73/0xb0 net/socket.c:1842 do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301 entry_SYSCALL_64_after_hwframe+0x49/0xbe irq event stamp: 2272 hardirqs last enabled at (2272): [] trace_hardirqs_on_thunk+0x1a/0x1c hardirqs last disabled at (2271): [] trace_hardirqs_off_thunk+0x1a/0x1c softirqs last enabled at (1522): [] __do_softirq+0x654/0x94c kernel/softirq.c:320 softirqs last disabled at (2267): [] invoke_softirq kernel/softirq.c:374 [inline] softirqs last disabled at (2267): [] irq_exit+0x180/0x1d0 kernel/softirq.c:414 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(slock-AF_AX25); lock(slock-AF_AX25); *** DEADLOCK *** 1 lock held by blkid/26581: #0: 0000000010fd154d ((&ax25->dtimer)){+.-.}, at: lockdep_copy_map include/linux/lockdep.h:175 [inline] #0: 0000000010fd154d ((&ax25->dtimer)){+.-.}, at: call_timer_fn+0xe0/0x720 kernel/time/timer.c:1312 stack backtrace: CPU: 1 PID: 26581 Comm: blkid Not tainted 5.2.0-rc3+ #32 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x172/0x1f0 lib/dump_stack.c:113 print_usage_bug.cold+0x393/0x4a2 kernel/locking/lockdep.c:2935 valid_state kernel/locking/lockdep.c:2948 [inline] mark_lock_irq kernel/locking/lockdep.c:3138 [inline] mark_lock+0xd46/0x1370 kernel/locking/lockdep.c:3513 mark_irqflags kernel/locking/lockdep.c:3391 [inline] __lock_acquire+0x159f/0x5490 kernel/locking/lockdep.c:3745 lock_acquire+0x16f/0x3f0 kernel/locking/lockdep.c:4303 __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline] _raw_spin_lock+0x2f/0x40 kernel/locking/spinlock.c:151 spin_lock include/linux/spinlock.h:338 [inline] ax25_destroy_timer+0x53/0xc0 net/ax25/af_ax25.c:275 call_timer_fn+0x193/0x720 kernel/time/timer.c:1322 expire_timers kernel/time/timer.c:1366 [inline] __run_timers kernel/time/timer.c:1685 [inline] __run_timers kernel/time/timer.c:1653 [inline] run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698 __do_softirq+0x25c/0x94c kernel/softirq.c:293 invoke_softirq kernel/softirq.c:374 [inline] irq_exit+0x180/0x1d0 kernel/softirq.c:414 exiting_irq arch/x86/include/asm/apic.h:536 [inline] smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806 RIP: 0033:0x7f858d5c3232 Code: 8b 61 08 48 8b 84 24 d8 00 00 00 4c 89 44 24 28 48 8b ac 24 d0 00 00 00 4c 8b b4 24 e8 00 00 00 48 89 7c 24 68 48 89 4c 24 78 <48> 89 44 24 58 8b 84 24 e0 00 00 00 89 84 24 84 00 00 00 8b 84 24 RSP: 002b:00007ffcaf0cf5c0 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13 RAX: 00007f858d7d27a8 RBX: 00007f858d7d8820 RCX: 00007f858d3940d8 RDX: 00007ffcaf0cf798 RSI: 00000000f5e616f3 RDI: 00007f858d394fee RBP: 0000000000000000 R08: 00007ffcaf0cf780 R09: 00007f858d7db480 R10: 0000000000000000 R11: 0000000009691a75 R12: 0000000000000005 R13: 00000000f5e616f3 R14: 0000000000000000 R15: 00007ffcaf0cf798 Signed-off-by: Eric Dumazet Reported-by: syzbot Signed-off-by: David S. Miller --- net/ax25/ax25_route.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/ax25/ax25_route.c b/net/ax25/ax25_route.c index 09fdd0aac4b9..b40e0bce67ea 100644 --- a/net/ax25/ax25_route.c +++ b/net/ax25/ax25_route.c @@ -426,9 +426,11 @@ int ax25_rt_autobind(ax25_cb *ax25, ax25_address *addr) } if (ax25->sk != NULL) { + local_bh_disable(); bh_lock_sock(ax25->sk); sock_reset_flag(ax25->sk, SOCK_ZAPPED); bh_unlock_sock(ax25->sk); + local_bh_enable(); } put: From 5cf02612b33f104fe1015b2dfaf1758ad3675588 Mon Sep 17 00:00:00 2001 From: Xin Long Date: Sun, 16 Jun 2019 17:24:07 +0800 Subject: [PATCH 144/145] tipc: purge deferredq list for each grp member in tipc_group_delete Syzbot reported a memleak caused by grp members' deferredq list not purged when the grp is be deleted. The issue occurs when more(msg_grp_bc_seqno(hdr), m->bc_rcv_nxt) in tipc_group_filter_msg() and the skb will stay in deferredq. So fix it by calling __skb_queue_purge for each member's deferredq in tipc_group_delete() when a tipc sk leaves the grp. Fixes: b87a5ea31c93 ("tipc: guarantee group unicast doesn't bypass group broadcast") Reported-by: syzbot+78fbe679c8ca8d264a8d@syzkaller.appspotmail.com Signed-off-by: Xin Long Acked-by: Ying Xue Signed-off-by: David S. Miller --- net/tipc/group.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/tipc/group.c b/net/tipc/group.c index 992be6113676..5f98d38bcf08 100644 --- a/net/tipc/group.c +++ b/net/tipc/group.c @@ -218,6 +218,7 @@ void tipc_group_delete(struct net *net, struct tipc_group *grp) rbtree_postorder_for_each_entry_safe(m, tmp, tree, tree_node) { tipc_group_proto_xmit(grp, m, GRP_LEAVE_MSG, &xmitq); + __skb_queue_purge(&m->deferredq); list_del(&m->list); kfree(m); } From 6be8e297f9bcea666ea85ac7a6cd9d52d6deaf92 Mon Sep 17 00:00:00 2001 From: Jeremy Sowden Date: Sun, 16 Jun 2019 16:54:37 +0100 Subject: [PATCH 145/145] lapb: fixed leak of control-blocks. lapb_register calls lapb_create_cb, which initializes the control- block's ref-count to one, and __lapb_insert_cb, which increments it when adding the new block to the list of blocks. lapb_unregister calls __lapb_remove_cb, which decrements the ref-count when removing control-block from the list of blocks, and calls lapb_put itself to decrement the ref-count before returning. However, lapb_unregister also calls __lapb_devtostruct to look up the right control-block for the given net_device, and __lapb_devtostruct also bumps the ref-count, which means that when lapb_unregister returns the ref-count is still 1 and the control-block is leaked. Call lapb_put after __lapb_devtostruct to fix leak. Reported-by: syzbot+afb980676c836b4a0afa@syzkaller.appspotmail.com Signed-off-by: Jeremy Sowden Signed-off-by: David S. Miller --- net/lapb/lapb_iface.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/lapb/lapb_iface.c b/net/lapb/lapb_iface.c index 03f0cd872dce..5d2d1f746b91 100644 --- a/net/lapb/lapb_iface.c +++ b/net/lapb/lapb_iface.c @@ -177,6 +177,7 @@ int lapb_unregister(struct net_device *dev) lapb = __lapb_devtostruct(dev); if (!lapb) goto out; + lapb_put(lapb); lapb_stop_t1timer(lapb); lapb_stop_t2timer(lapb);