From 032d4a4802209fe25ce44deb2c002dccf663925f Mon Sep 17 00:00:00 2001 From: "Gustavo A. R. Silva" Date: Thu, 19 Mar 2020 16:32:26 -0500 Subject: [PATCH 001/331] hv: hyperv_vmbus.h: Replace zero-length array with flexible-array member The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. 
Silva Signed-off-by: Wei Liu --- drivers/hv/hyperv_vmbus.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h index f5fa3b3c9baf..70b30e223a57 100644 --- a/drivers/hv/hyperv_vmbus.h +++ b/drivers/hv/hyperv_vmbus.h @@ -292,7 +292,7 @@ struct vmbus_msginfo { struct list_head msglist_entry; /* The message itself */ - unsigned char msg[0]; + unsigned char msg[]; }; From 7d32e69310d67e6b04af04f26193f79dfc2f05c7 Mon Sep 17 00:00:00 2001 From: Slava Bacherikov Date: Thu, 2 Apr 2020 23:41:39 +0300 Subject: [PATCH 002/331] kbuild, btf: Fix dependencies for DEBUG_INFO_BTF Currently turning on DEBUG_INFO_SPLIT when DEBUG_INFO_BTF is also enabled will produce an invalid BTF file, since the gen_btf function in the link-vmlinux.sh script doesn't handle *.dwo files. Enabling DEBUG_INFO_REDUCED will also produce an invalid BTF file, and using GCC_PLUGIN_RANDSTRUCT with BTF makes no sense. Fixes: e83b9f55448a ("kbuild: add ability to generate BTF type info for vmlinux") Reported-by: Jann Horn Reported-by: Liu Yiding Signed-off-by: Slava Bacherikov Signed-off-by: Daniel Borkmann Reviewed-by: Kees Cook Acked-by: KP Singh Acked-by: Andrii Nakryiko Link: https://lore.kernel.org/bpf/20200402204138.408021-1-slava@bacher09.org --- lib/Kconfig.debug | 2 ++ 1 file changed, 2 insertions(+) diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index a85a6a423bf4..4cb4671b1d9e 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -241,6 +241,8 @@ config DEBUG_INFO_DWARF4 config DEBUG_INFO_BTF bool "Generate BTF typeinfo" depends on DEBUG_INFO + depends on !DEBUG_INFO_SPLIT && !DEBUG_INFO_REDUCED + depends on !GCC_PLUGIN_RANDSTRUCT || COMPILE_TEST help Generate deduplicated BTF type information from DWARF debug info. 
Turning this on expects presence of pahole tool, which will convert From 250e778fe1635b237d9f52c5d9df202cf23413d6 Mon Sep 17 00:00:00 2001 From: Colin Ian King Date: Tue, 31 Mar 2020 11:00:30 +0100 Subject: [PATCH 003/331] bpf: Fix spelling mistake "arithmatic" -> "arithmetic" in test_verifier There are a couple of spelling mistakes in two literal strings, fix them. Signed-off-by: Colin Ian King Signed-off-by: Daniel Borkmann Link: https://lore.kernel.org/bpf/20200331100030.41372-1-colin.king@canonical.com --- tools/testing/selftests/bpf/verifier/bounds.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/bpf/verifier/bounds.c b/tools/testing/selftests/bpf/verifier/bounds.c index 4d0d09574bf4..a253a064e6e0 100644 --- a/tools/testing/selftests/bpf/verifier/bounds.c +++ b/tools/testing/selftests/bpf/verifier/bounds.c @@ -501,7 +501,7 @@ .result = REJECT }, { - "bounds check mixed 32bit and 64bit arithmatic. test1", + "bounds check mixed 32bit and 64bit arithmetic. test1", .insns = { BPF_MOV64_IMM(BPF_REG_0, 0), BPF_MOV64_IMM(BPF_REG_1, -1), @@ -520,7 +520,7 @@ .result = ACCEPT }, { - "bounds check mixed 32bit and 64bit arithmatic. test2", + "bounds check mixed 32bit and 64bit arithmetic. test2", .insns = { BPF_MOV64_IMM(BPF_REG_0, 0), BPF_MOV64_IMM(BPF_REG_1, -1), From 93bbb2555b65e582be9daebb752e1b8e7380da20 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Bj=C3=B6rn=20T=C3=B6pel?= Date: Tue, 31 Mar 2020 12:10:46 +0200 Subject: [PATCH 004/331] riscv, bpf: Remove BPF JIT for nommu builds MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The BPF JIT fails to build for kernels configured to !MMU. Without an MMU, the BPF JIT does not make much sense, therefore this patch disables the JIT for nommu builds. 
This was reported by the kbuild test robot: All errors (new ones prefixed by >>): arch/riscv/net/bpf_jit_comp64.c: In function 'bpf_jit_alloc_exec': >> arch/riscv/net/bpf_jit_comp64.c:1094:47: error: 'BPF_JIT_REGION_START' undeclared (first use in this function) 1094 | return __vmalloc_node_range(size, PAGE_SIZE, BPF_JIT_REGION_START, | ^~~~~~~~~~~~~~~~~~~~ arch/riscv/net/bpf_jit_comp64.c:1094:47: note: each undeclared identifier is reported only once for each function it appears in >> arch/riscv/net/bpf_jit_comp64.c:1095:9: error: 'BPF_JIT_REGION_END' undeclared (first use in this function) 1095 | BPF_JIT_REGION_END, GFP_KERNEL, | ^~~~~~~~~~~~~~~~~~ arch/riscv/net/bpf_jit_comp64.c:1098:1: warning: control reaches end of non-void function [-Wreturn-type] 1098 | } | ^ Reported-by: kbuild test robot Signed-off-by: Björn Töpel Signed-off-by: Daniel Borkmann Acked-by: Luke Nelson Link: https://lore.kernel.org/bpf/20200331101046.23252-1-bjorn.topel@gmail.com --- arch/riscv/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig index 8672e77a5b7a..bd35ac72fe24 100644 --- a/arch/riscv/Kconfig +++ b/arch/riscv/Kconfig @@ -55,7 +55,7 @@ config RISCV select ARCH_HAS_PTE_SPECIAL select ARCH_HAS_MMIOWB select ARCH_HAS_DEBUG_VIRTUAL - select HAVE_EBPF_JIT + select HAVE_EBPF_JIT if MMU select EDAC_SUPPORT select ARCH_HAS_GIGANTIC_PAGE select ARCH_WANT_HUGE_PMD_SHARE if 64BIT From 7a1ca97269ee197ea967de2c9412d8e7e2274ee6 Mon Sep 17 00:00:00 2001 From: Jakub Sitnicki Date: Thu, 2 Apr 2020 14:55:24 +0200 Subject: [PATCH 005/331] net, sk_msg: Don't use RCU_INIT_POINTER on sk_user_data sparse reports an error due to use of RCU_INIT_POINTER helper to assign to sk_user_data pointer, which is not tagged with __rcu: net/core/sock.c:1875:25: error: incompatible types in comparison expression (different address spaces): net/core/sock.c:1875:25: void [noderef] * net/core/sock.c:1875:25: void * ... and rightfully so. 
sk_user_data is not always treated as a pointer to RCU-protected data. When it is used to point at an RCU-protected object, we access it with __sk_user_data to inform sparse about it. In this case, when the child socket does not inherit sk_user_data from the parent, there is no reason to treat it as an RCU-protected pointer. Use a regular assignment to clear the pointer value. Fixes: f1ff5ce2cd5e ("net, sk_msg: Clear sk_user_data pointer on clone if tagged") Reported-by: kbuild test robot Signed-off-by: Jakub Sitnicki Signed-off-by: Daniel Borkmann Link: https://lore.kernel.org/bpf/20200402125524.851439-1-jakub@cloudflare.com --- net/core/sock.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/core/sock.c b/net/core/sock.c index da32d9b6d09f..0510826bf860 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -1872,7 +1872,7 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority) * as not suitable for copying when cloning. */ if (sk_user_data_is_nocopy(newsk)) - RCU_INIT_POINTER(newsk->sk_user_data, NULL); + newsk->sk_user_data = NULL; newsk->sk_err = 0; newsk->sk_err_soft = 0; From 5222d69642a09261222fb9703761a029db16cadf Mon Sep 17 00:00:00 2001 From: KP Singh Date: Thu, 2 Apr 2020 22:07:51 +0200 Subject: [PATCH 006/331] bpf, lsm: Fix the file_mprotect LSM test. The test was previously using an mprotect on the heap memory allocated using malloc and was expecting the allocation to be always using sbrk(2). This is, however, not always true and in certain conditions malloc may end up using anonymous mmaps for heap allocations. This means that the following condition that is used in the "lsm/file_mprotect" program is not sufficient to detect all mprotect calls done on heap memory: is_heap = (vma->vm_start >= vma->vm_mm->start_brk && vma->vm_end <= vma->vm_mm->brk); The test is updated to use an mprotect on memory allocated on the stack. 
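The address arithmetic behind a stack-based mprotect can be sketched in plain C. mprotect(2) needs a page-aligned address, so the test has to pick an aligned page that lies entirely inside its stack buffer. The helpers below are a hypothetical illustration (the names next_page_addr and page_fits_in_buffer are not from the patch) and assume page_size is a power of two.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: return the start of the
 * page that follows the page containing addr. Assumes page_size is a
 * power of two.
 */
static uintptr_t next_page_addr(uintptr_t addr, uintptr_t page_size)
{
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	return (addr + page_size) & ~(page_size - 1);
}

/*
 * Check that the page chosen for a buffer of 3 * page_size bytes
 * starting at buf lies entirely inside that buffer.
 */
static int page_fits_in_buffer(uintptr_t buf, uintptr_t page_size)
{
	uintptr_t page = next_page_addr(buf, page_size);

	return page >= buf && page + page_size <= buf + 3 * page_size;
}
```

With 4096-byte pages, 0x1001 maps to 0x2000, and an already aligned 0x1000 also maps to 0x2000. The chosen page therefore starts at most one page past the buffer start, so a three-page buffer always contains it.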
While this would result in the splitting of the vma, this happens only after the security_file_mprotect hook. So, the condition used in the BPF program holds true. Fixes: 03e54f100d57 ("bpf: lsm: Add selftests for BPF_PROG_TYPE_LSM") Reported-by: Alexei Starovoitov Signed-off-by: KP Singh Signed-off-by: Alexei Starovoitov Link: https://lore.kernel.org/bpf/20200402200751.26372-1-kpsingh@chromium.org --- .../selftests/bpf/prog_tests/test_lsm.c | 18 +++++++++--------- tools/testing/selftests/bpf/progs/lsm.c | 8 ++++---- 2 files changed, 13 insertions(+), 13 deletions(-) diff --git a/tools/testing/selftests/bpf/prog_tests/test_lsm.c b/tools/testing/selftests/bpf/prog_tests/test_lsm.c index 1e4c258de09d..b17eb2045c1d 100644 --- a/tools/testing/selftests/bpf/prog_tests/test_lsm.c +++ b/tools/testing/selftests/bpf/prog_tests/test_lsm.c @@ -15,7 +15,10 @@ char *CMD_ARGS[] = {"true", NULL}; -int heap_mprotect(void) +#define GET_PAGE_ADDR(ADDR, PAGE_SIZE) \ + (char *)(((unsigned long) (ADDR + PAGE_SIZE)) & ~(PAGE_SIZE-1)) + +int stack_mprotect(void) { void *buf; long sz; @@ -25,12 +28,9 @@ int heap_mprotect(void) if (sz < 0) return sz; - buf = memalign(sz, 2 * sz); - if (buf == NULL) - return -ENOMEM; - - ret = mprotect(buf, sz, PROT_READ | PROT_WRITE | PROT_EXEC); - free(buf); + buf = alloca(sz * 3); + ret = mprotect(GET_PAGE_ADDR(buf, sz), sz, + PROT_READ | PROT_WRITE | PROT_EXEC); return ret; } @@ -73,8 +73,8 @@ void test_test_lsm(void) skel->bss->monitored_pid = getpid(); - err = heap_mprotect(); - if (CHECK(errno != EPERM, "heap_mprotect", "want errno=EPERM, got %d\n", + err = stack_mprotect(); + if (CHECK(errno != EPERM, "stack_mprotect", "want err=EPERM, got %d\n", errno)) goto close_prog; diff --git a/tools/testing/selftests/bpf/progs/lsm.c b/tools/testing/selftests/bpf/progs/lsm.c index a4e3c223028d..b4598d4bc4f7 100644 --- a/tools/testing/selftests/bpf/progs/lsm.c +++ b/tools/testing/selftests/bpf/progs/lsm.c @@ -23,12 +23,12 @@ int BPF_PROG(test_int_hook, struct 
vm_area_struct *vma, return ret; __u32 pid = bpf_get_current_pid_tgid() >> 32; - int is_heap = 0; + int is_stack = 0; - is_heap = (vma->vm_start >= vma->vm_mm->start_brk && - vma->vm_end <= vma->vm_mm->brk); + is_stack = (vma->vm_start <= vma->vm_mm->start_stack && + vma->vm_end >= vma->vm_mm->start_stack); - if (is_heap && monitored_pid == pid) { + if (is_stack && monitored_pid == pid) { mprotect_count++; ret = -EPERM; } From 3e0d3776501a215358e3d91045b74f35a7eeaad9 Mon Sep 17 00:00:00 2001 From: YueHaibing Date: Fri, 3 Apr 2020 16:28:45 +0800 Subject: [PATCH 007/331] hv_debugfs: Make hv_debug_root static Fix sparse warning: drivers/hv/hv_debugfs.c:14:15: warning: symbol 'hv_debug_root' was not declared. Should it be static? Reported-by: Hulk Robot Signed-off-by: YueHaibing Reviewed-by: Michael Kelley Link: https://lore.kernel.org/r/20200403082845.22740-1-yuehaibing@huawei.com Signed-off-by: Wei Liu --- drivers/hv/hv_debugfs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/hv/hv_debugfs.c b/drivers/hv/hv_debugfs.c index 8a2878573582..ccf752b6659a 100644 --- a/drivers/hv/hv_debugfs.c +++ b/drivers/hv/hv_debugfs.c @@ -11,7 +11,7 @@ #include "hyperv_vmbus.h" -struct dentry *hv_debug_root; +static struct dentry *hv_debug_root; static int hv_debugfs_delay_get(void *data, u64 *val) { From bf37da98c51825c90432d340e135cced37a7460d Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Thu, 12 Mar 2020 16:55:07 -0700 Subject: [PATCH 008/331] rcu: Don't acquire lock in NMI handler in rcu_nmi_enter_common() The rcu_nmi_enter_common() function can be invoked both in interrupt and NMI handlers. If it is invoked from process context (as opposed to userspace or idle context) on a nohz_full CPU, it might acquire the CPU's leaf rcu_node structure's ->lock. Because this lock is held only with interrupts disabled, this is safe from an interrupt handler, but doing so from an NMI handler can result in self-deadlock. 
This commit therefore adds "irq" to the "if" condition so as to only acquire the ->lock from irq handlers or process context, never from an NMI handler. Fixes: 5b14557b073c ("rcu: Avoid tick_dep_set_cpu() misordering") Reported-by: Thomas Gleixner Signed-off-by: Paul E. McKenney Reviewed-by: Joel Fernandes (Google) Cc: # 5.5.x --- kernel/rcu/tree.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 550193a9ce76..2c17859233db 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -825,7 +825,7 @@ static __always_inline void rcu_nmi_enter_common(bool irq) rcu_cleanup_after_idle(); incby = 1; - } else if (tick_nohz_full_cpu(rdp->cpu) && + } else if (irq && tick_nohz_full_cpu(rdp->cpu) && rdp->dynticks_nmi_nesting == DYNTICK_IRQ_NONIDLE && READ_ONCE(rdp->rcu_urgent_qs) && !READ_ONCE(rdp->rcu_forced_tick)) { From 72239f2795fab9a58633bd0399698ff7581534a3 Mon Sep 17 00:00:00 2001 From: Stefano Brivio Date: Wed, 1 Apr 2020 17:14:38 +0200 Subject: [PATCH 009/331] netfilter: nft_set_rbtree: Drop spurious condition for overlap detection on insertion Case a1. for overlap detection in __nft_rbtree_insert() is not a valid one: start-after-start is not needed to detect any type of interval overlap and it actually results in a false positive if, while descending the tree, this is the only step we hit after starting from the root. This introduced a regression, as reported by Pablo, in Python test cases ip/ip.t and ip/numgen.t: ip/ip.t: ERROR: line 124: add rule ip test-ip4 input ip hdrlength vmap { 0-4 : drop, 5 : accept, 6 : continue } counter: This rule should not have failed. ip/numgen.t: ERROR: line 7: add rule ip test-ip4 pre dnat to numgen inc mod 10 map { 0-5 : 192.168.10.100, 6-9 : 192.168.20.200}: This rule should not have failed. Drop case a1. and renumber others, so that they are a bit clearer. 
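To see why start-after-start says nothing about overlap on its own, consider a simplified model that is not the rbtree code: closed integer intervals, with the struct and the helper name intervals_overlap chosen here purely for illustration. Two intervals overlap only if each one starts no later than the other ends.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model for illustration only: a closed interval [start, end]. */
struct interval {
	unsigned int start;
	unsigned int end;
};

/* Two closed intervals overlap iff each starts no later than the other ends. */
static bool intervals_overlap(struct interval a, struct interval b)
{
	assert(a.start <= a.end && b.start <= b.end);
	return a.start <= b.end && b.start <= a.end;
}
```

With the existing interval 0-4 from the ip/ip.t rule, the new element 5 starts after the existing start but does not overlap, whereas a new interval 3-6 does. Start-after-start holds in both cases, so it cannot tell them apart.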
In order for these diagrams to be readily understandable, a bigger rework is probably needed, such as an ASCII art of the actual rbtree (instead of a flattened version). Shell script test sets/0044interval_overlap_0 should cover all possible cases for false negatives, so I consider that test case still sufficient after this change. v2: Fix comments for cases a3. and b3. Reported-by: Pablo Neira Ayuso Fixes: 7c84d41416d8 ("netfilter: nft_set_rbtree: Detect partial overlaps on insertion") Signed-off-by: Stefano Brivio Signed-off-by: Pablo Neira Ayuso --- net/netfilter/nft_set_rbtree.c | 23 +++++++++++------------ 1 file changed, 11 insertions(+), 12 deletions(-) diff --git a/net/netfilter/nft_set_rbtree.c b/net/netfilter/nft_set_rbtree.c index 3a5552e14f75..3ffef454d469 100644 --- a/net/netfilter/nft_set_rbtree.c +++ b/net/netfilter/nft_set_rbtree.c @@ -218,27 +218,26 @@ static int __nft_rbtree_insert(const struct net *net, const struct nft_set *set, /* Detect overlaps as we descend the tree. Set the flag in these cases: * - * a1. |__ _ _? >|__ _ _ (insert start after existing start) - * a2. _ _ __>| ?_ _ __| (insert end before existing end) - * a3. _ _ ___| ?_ _ _>| (insert end after existing end) - * a4. >|__ _ _ _ _ __| (insert start before existing end) + * a1. _ _ __>| ?_ _ __| (insert end before existing end) + * a2. _ _ ___| ?_ _ _>| (insert end after existing end) + * a3. _ _ ___? >|_ _ __| (insert start before existing end) * * and clear it later on, as we eventually reach the points indicated by * '?' above, in the cases described below. We'll always meet these * later, locally, due to tree ordering, and overlaps for the intervals * that are the closest together are always evaluated last. * - * b1. |__ _ _! >|__ _ _ (insert start after existing end) - * b2. _ _ __>| !_ _ __| (insert end before existing start) - * b3. !_____>| (insert end after existing start) + * b1. _ _ __>| !_ _ __| (insert end before existing start) + * b2. 
_ _ ___| !_ _ _>| (insert end after existing start) + * b3. _ _ ___! >|_ _ __| (insert start after existing end) * - * Case a4. resolves to b1.: + * Case a3. resolves to b3.: * - if the inserted start element is the leftmost, because the '0' * element in the tree serves as end element * - otherwise, if an existing end is found. Note that end elements are * always inserted after corresponding start elements. * - * For a new, rightmost pair of elements, we'll hit cases b1. and b3., + * For a new, rightmost pair of elements, we'll hit cases b3. and b2., * in that order. * * The flag is also cleared in two special cases: @@ -262,9 +261,9 @@ static int __nft_rbtree_insert(const struct net *net, const struct nft_set *set, p = &parent->rb_left; if (nft_rbtree_interval_start(new)) { - overlap = nft_rbtree_interval_start(rbe) && - nft_set_elem_active(&rbe->ext, - genmask); + if (nft_rbtree_interval_end(rbe) && + nft_set_elem_active(&rbe->ext, genmask)) + overlap = false; } else { overlap = nft_rbtree_interval_end(rbe) && nft_set_elem_active(&rbe->ext, From a26c1e49c8e97922edc8d7e23683384729d09f77 Mon Sep 17 00:00:00 2001 From: Pablo Neira Ayuso Date: Tue, 31 Mar 2020 23:02:59 +0200 Subject: [PATCH 010/331] netfilter: nf_tables: do not update stateful expressions if lookup is inverted Initialize set lookup matching element to NULL. Otherwise, the NFT_LOOKUP_F_INV flag reverses the matching logic and it leads to dereferencing an uninitialized pointer to the matching element. Make sure the element data area and stateful expression are accessed only if there is a matching set element. This patch undoes 24791b9aa1ab ("netfilter: nft_set_bitmap: initialize set element extension in lookups") which is not required anymore. 
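The failure mode and the fix can be modelled in userspace. The sketch below is not the netfilter code: set_lookup, lookup_eval and the struct names are hypothetical stand-ins. The lookup fills in ext only on a match, so with inverted logic the caller can continue without a matching element and has to check ext before using it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a set element extension. */
struct elem_ext {
	int data;
};

/* Hypothetical set holding a single element. */
struct set {
	int key;
	struct elem_ext ext;
};

/* Like a set lookup: writes *ext only when the key matches. */
static bool set_lookup(const struct set *set, int key,
		       const struct elem_ext **ext)
{
	if (set->key != key)
		return false;
	*ext = &set->ext;
	return true;
}

/*
 * Returns true if evaluation continues (no "break" verdict). *out is
 * written only when a matching element exists.
 */
static bool lookup_eval(const struct set *set, int key, bool invert, int *out)
{
	const struct elem_ext *ext = NULL;	/* stays NULL without a match */
	bool found;

	assert(out != NULL);
	found = set_lookup(set, key, &ext) ^ invert;
	if (!found)
		return false;

	if (ext)	/* an inverted lookup gets here with ext == NULL */
		*out = ext->data;

	return true;
}
```

Without the NULL initialisation, the inverted case that misses the set would reach the data copy with an indeterminate ext. With it, the guard simply skips the element accesses.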
Fixes: 339706bc21c1 ("netfilter: nft_lookup: update element stateful expression") Signed-off-by: Pablo Neira Ayuso --- include/net/netfilter/nf_tables.h | 2 +- net/netfilter/nft_lookup.c | 12 +++++++----- net/netfilter/nft_set_bitmap.c | 1 - 3 files changed, 8 insertions(+), 7 deletions(-) diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h index 6eb627b3c99b..4ff7c81e6717 100644 --- a/include/net/netfilter/nf_tables.h +++ b/include/net/netfilter/nf_tables.h @@ -901,7 +901,7 @@ static inline void nft_set_elem_update_expr(const struct nft_set_ext *ext, { struct nft_expr *expr; - if (nft_set_ext_exists(ext, NFT_SET_EXT_EXPR)) { + if (__nft_set_ext_exists(ext, NFT_SET_EXT_EXPR)) { expr = nft_set_ext_expr(ext); expr->ops->eval(expr, regs, pkt); } diff --git a/net/netfilter/nft_lookup.c b/net/netfilter/nft_lookup.c index 1e70359d633c..f1363b8aabba 100644 --- a/net/netfilter/nft_lookup.c +++ b/net/netfilter/nft_lookup.c @@ -29,7 +29,7 @@ void nft_lookup_eval(const struct nft_expr *expr, { const struct nft_lookup *priv = nft_expr_priv(expr); const struct nft_set *set = priv->set; - const struct nft_set_ext *ext; + const struct nft_set_ext *ext = NULL; bool found; found = set->ops->lookup(nft_net(pkt), set, &regs->data[priv->sreg], @@ -39,11 +39,13 @@ void nft_lookup_eval(const struct nft_expr *expr, return; } - if (set->flags & NFT_SET_MAP) - nft_data_copy(&regs->data[priv->dreg], - nft_set_ext_data(ext), set->dlen); + if (ext) { + if (set->flags & NFT_SET_MAP) + nft_data_copy(&regs->data[priv->dreg], + nft_set_ext_data(ext), set->dlen); - nft_set_elem_update_expr(ext, regs, pkt); + nft_set_elem_update_expr(ext, regs, pkt); + } } static const struct nla_policy nft_lookup_policy[NFTA_LOOKUP_MAX + 1] = { diff --git a/net/netfilter/nft_set_bitmap.c b/net/netfilter/nft_set_bitmap.c index 32f0fc8be3a4..2a81ea421819 100644 --- a/net/netfilter/nft_set_bitmap.c +++ b/net/netfilter/nft_set_bitmap.c @@ -81,7 +81,6 @@ static bool nft_bitmap_lookup(const struct 
net *net, const struct nft_set *set, u32 idx, off; nft_bitmap_location(set, key, &idx, &off); - *ext = NULL; return nft_bitmap_active(priv->bitmap, idx, off, genmask); } From bc9fe6143de5df8fb36cf1532b48fecf35868571 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Maciej=20=C5=BBenczykowski?= Date: Tue, 31 Mar 2020 09:35:59 -0700 Subject: [PATCH 011/331] netfilter: xt_IDLETIMER: target v1 - match Android layout MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Android has long had an extension to IDLETIMER to send netlink messages to userspace, see: https://android.googlesource.com/kernel/common/+/refs/heads/android-mainline/include/uapi/linux/netfilter/xt_IDLETIMER.h#42 Note: this is idletimer target rev 1, there is no rev 0 in the Android common kernel sources, see registration at: https://android.googlesource.com/kernel/common/+/refs/heads/android-mainline/net/netfilter/xt_IDLETIMER.c#483 When we compare that to upstream's new idletimer target rev 1: https://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git/tree/include/uapi/linux/netfilter/xt_IDLETIMER.h#n46 We immediately notice that these two rev 1 structs are the same size and layout, and that while timer_type and send_nl_msg are differently named and serve a different purpose, they're at the same offset. This makes them impossible to tell apart - and thus one cannot know in a mixed Android/vanilla environment whether one means timer_type or send_nl_msg. Since this is iptables/netfilter uapi it introduces a problem between iptables (vanilla vs Android) userspace and kernel (vanilla vs Android) if the two don't match each other. Additionally when at some point in the future Android picks up 5.7+ it's not at all clear how to resolve the resulting merge conflict. Furthermore, since upgrading the kernel on old Android phones is pretty much impossible there does not seem to be an easy way out of this predicament. 
The only thing I've been able to come up with is some super disgusting kernel version >= 5.7 check in the iptables binary to flip between different struct layouts. By adding a dummy field to the vanilla Linux kernel header file we can force the two structs to be compatible with each other. Long term I think I would like to deprecate send_nl_msg out of Android entirely, but I haven't quite been able to figure out exactly how we depend on it. It seems to be very similar to sysfs notifications but with some extra info. Currently it's actually always enabled whenever Android uses the IDLETIMER target, so we could also probably entirely remove it from the uapi in favour of just always enabling it, but again we can't upgrade old kernels already in the field. (Also note that this doesn't change the structure's size, as it is simply fitting into the pre-existing padding, and that since 5.7 hasn't been released yet, there's still time to make this uapi visible change) Cc: Manoj Basapathi Cc: Subash Abhinov Kasiviswanathan Signed-off-by: Maciej Żenczykowski Signed-off-by: Pablo Neira Ayuso --- include/uapi/linux/netfilter/xt_IDLETIMER.h | 1 + net/netfilter/xt_IDLETIMER.c | 3 +++ 2 files changed, 4 insertions(+) diff --git a/include/uapi/linux/netfilter/xt_IDLETIMER.h b/include/uapi/linux/netfilter/xt_IDLETIMER.h index 434e6506abaa..49ddcdc61c09 100644 --- a/include/uapi/linux/netfilter/xt_IDLETIMER.h +++ b/include/uapi/linux/netfilter/xt_IDLETIMER.h @@ -48,6 +48,7 @@ struct idletimer_tg_info_v1 { char label[MAX_IDLETIMER_LABEL_SIZE]; + __u8 send_nl_msg; /* unused: for compatibility with Android */ __u8 timer_type; /* for kernel module internal use only */ diff --git a/net/netfilter/xt_IDLETIMER.c b/net/netfilter/xt_IDLETIMER.c index 75bd0e5dd312..7b2f359bfce4 100644 --- a/net/netfilter/xt_IDLETIMER.c +++ b/net/netfilter/xt_IDLETIMER.c @@ -346,6 +346,9 @@ static int idletimer_tg_checkentry_v1(const struct xt_tgchk_param *par) pr_debug("checkentry targinfo%s\n", info->label); 
+ if (info->send_nl_msg) + return -EOPNOTSUPP; + ret = idletimer_tg_helper((struct idletimer_tg_info *)info); if(ret < 0) { From 7fb6f78df7003234d8df4f90aeecc432d7d0c804 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 1 Apr 2020 10:37:16 -0700 Subject: [PATCH 012/331] netfilter: nf_tables: do not leave dangling pointer in nf_tables_set_alloc_name If nf_tables_set_alloc_name() frees set->name, we better clear set->name to avoid a future use-after-free or invalid-free. BUG: KASAN: double-free or invalid-free in nf_tables_newset+0x1ed6/0x2560 net/netfilter/nf_tables_api.c:4148 CPU: 0 PID: 28233 Comm: syz-executor.0 Not tainted 5.6.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x188/0x20d lib/dump_stack.c:118 print_address_description.constprop.0.cold+0xd3/0x315 mm/kasan/report.c:374 kasan_report_invalid_free+0x61/0xa0 mm/kasan/report.c:468 __kasan_slab_free+0x129/0x140 mm/kasan/common.c:455 __cache_free mm/slab.c:3426 [inline] kfree+0x109/0x2b0 mm/slab.c:3757 nf_tables_newset+0x1ed6/0x2560 net/netfilter/nf_tables_api.c:4148 nfnetlink_rcv_batch+0x83a/0x1610 net/netfilter/nfnetlink.c:433 nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:543 [inline] nfnetlink_rcv+0x3af/0x420 net/netfilter/nfnetlink.c:561 netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline] netlink_unicast+0x537/0x740 net/netlink/af_netlink.c:1329 netlink_sendmsg+0x882/0xe10 net/netlink/af_netlink.c:1918 sock_sendmsg_nosec net/socket.c:652 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:672 ____sys_sendmsg+0x6b9/0x7d0 net/socket.c:2345 ___sys_sendmsg+0x100/0x170 net/socket.c:2399 __sys_sendmsg+0xec/0x1b0 net/socket.c:2432 do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x45c849 Code: ad b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 
3d 01 f0 ff ff 0f 83 7b b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007fe5ca21dc78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007fe5ca21e6d4 RCX: 000000000045c849 RDX: 0000000000000000 RSI: 0000000020000c40 RDI: 0000000000000003 RBP: 000000000076bf00 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff R13: 000000000000095b R14: 00000000004cc0e9 R15: 000000000076bf0c Allocated by task 28233: save_stack+0x1b/0x80 mm/kasan/common.c:72 set_track mm/kasan/common.c:80 [inline] __kasan_kmalloc mm/kasan/common.c:515 [inline] __kasan_kmalloc.constprop.0+0xbf/0xd0 mm/kasan/common.c:488 __do_kmalloc mm/slab.c:3656 [inline] __kmalloc_track_caller+0x159/0x790 mm/slab.c:3671 kvasprintf+0xb5/0x150 lib/kasprintf.c:25 kasprintf+0xbb/0xf0 lib/kasprintf.c:59 nf_tables_set_alloc_name net/netfilter/nf_tables_api.c:3536 [inline] nf_tables_newset+0x1543/0x2560 net/netfilter/nf_tables_api.c:4088 nfnetlink_rcv_batch+0x83a/0x1610 net/netfilter/nfnetlink.c:433 nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:543 [inline] nfnetlink_rcv+0x3af/0x420 net/netfilter/nfnetlink.c:561 netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline] netlink_unicast+0x537/0x740 net/netlink/af_netlink.c:1329 netlink_sendmsg+0x882/0xe10 net/netlink/af_netlink.c:1918 sock_sendmsg_nosec net/socket.c:652 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:672 ____sys_sendmsg+0x6b9/0x7d0 net/socket.c:2345 ___sys_sendmsg+0x100/0x170 net/socket.c:2399 __sys_sendmsg+0xec/0x1b0 net/socket.c:2432 do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 28233: save_stack+0x1b/0x80 mm/kasan/common.c:72 set_track mm/kasan/common.c:80 [inline] kasan_set_free_info mm/kasan/common.c:337 [inline] __kasan_slab_free+0xf7/0x140 mm/kasan/common.c:476 __cache_free mm/slab.c:3426 [inline] kfree+0x109/0x2b0 mm/slab.c:3757 nf_tables_set_alloc_name net/netfilter/nf_tables_api.c:3544 [inline] 
nf_tables_newset+0x1f73/0x2560 net/netfilter/nf_tables_api.c:4088 nfnetlink_rcv_batch+0x83a/0x1610 net/netfilter/nfnetlink.c:433 nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:543 [inline] nfnetlink_rcv+0x3af/0x420 net/netfilter/nfnetlink.c:561 netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline] netlink_unicast+0x537/0x740 net/netlink/af_netlink.c:1329 netlink_sendmsg+0x882/0xe10 net/netlink/af_netlink.c:1918 sock_sendmsg_nosec net/socket.c:652 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:672 ____sys_sendmsg+0x6b9/0x7d0 net/socket.c:2345 ___sys_sendmsg+0x100/0x170 net/socket.c:2399 __sys_sendmsg+0xec/0x1b0 net/socket.c:2432 do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x49/0xbe The buggy address belongs to the object at ffff8880a6032d00 which belongs to the cache kmalloc-32 of size 32 The buggy address is located 0 bytes inside of 32-byte region [ffff8880a6032d00, ffff8880a6032d20) The buggy address belongs to the page: page:ffffea0002980c80 refcount:1 mapcount:0 mapping:ffff8880aa0001c0 index:0xffff8880a6032fc1 flags: 0xfffe0000000200(slab) raw: 00fffe0000000200 ffffea0002a3be88 ffffea00029b1908 ffff8880aa0001c0 raw: ffff8880a6032fc1 ffff8880a6032000 000000010000003e 0000000000000000 page dumped because: kasan: bad access detected Fixes: 65038428b2c6 ("netfilter: nf_tables: allow to specify stateful expression in set definition") Signed-off-by: Eric Dumazet Reported-by: syzbot Signed-off-by: Pablo Neira Ayuso --- net/netfilter/nf_tables_api.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index d0ab5ffa1e2c..f91e96d8de05 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -3542,6 +3542,7 @@ cont: continue; if (!strcmp(set->name, i->name)) { kfree(set->name); + set->name = NULL; return -ENFILE; } } From b135fc0801b671c50de103572b819bcd41603613 Mon Sep 17 00:00:00 2001 From: Amol Grover Date: Sun, 16 Feb 2020 22:56:54 
+0530 Subject: [PATCH 013/331] netfilter: ipset: Pass lockdep expression to RCU lists ip_set_type_list is traversed using list_for_each_entry_rcu outside an RCU read-side critical section but under the protection of ip_set_type_mutex. Hence, add corresponding lockdep expression to silence false-positive warnings, and harden RCU lists. Signed-off-by: Amol Grover Signed-off-by: Pablo Neira Ayuso --- net/netfilter/ipset/ip_set_core.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c index 8dd17589217d..340cb955af25 100644 --- a/net/netfilter/ipset/ip_set_core.c +++ b/net/netfilter/ipset/ip_set_core.c @@ -86,7 +86,8 @@ find_set_type(const char *name, u8 family, u8 revision) { struct ip_set_type *type; - list_for_each_entry_rcu(type, &ip_set_type_list, list) + list_for_each_entry_rcu(type, &ip_set_type_list, list, + lockdep_is_held(&ip_set_type_mutex)) if (STRNCMP(type->name, name) && (type->family == family || type->family == NFPROTO_UNSPEC) && From 5bf8e6096c7390f8f2c4d5394b5e49823adb004e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Rafa=C5=82=20Mi=C5=82ecki?= Date: Fri, 27 Mar 2020 14:03:07 +0100 Subject: [PATCH 014/331] brcmfmac: add stub for monitor interface xmit MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit According to the struct net_device_ops documentation .ndo_start_xmit is "Required; cannot be NULL.". 
Missing it may crash kernel easily: [ 341.216709] Unable to handle kernel NULL pointer dereference at virtual address 00000000 [ 341.224836] pgd = 26088755 [ 341.227544] [00000000] *pgd=00000000 [ 341.231135] Internal error: Oops: 80000007 [#1] SMP ARM [ 341.236367] Modules linked in: pppoe ppp_async iptable_nat brcmfmac xt_state xt_nat xt_conntrack xt_REDIRECT xt_MASQU [ 341.304689] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.4.24 #0 [ 341.310621] Hardware name: BCM5301X [ 341.314116] PC is at 0x0 [ 341.316664] LR is at dev_hard_start_xmit+0x8c/0x11c [ 341.321546] pc : [<00000000>] lr : [] psr: 60000113 [ 341.327821] sp : c0801c30 ip : c610cf00 fp : c08048e4 [ 341.333051] r10: c073a63a r9 : c08044dc r8 : c6c04e00 [ 341.338283] r7 : 00000000 r6 : c60f5000 r5 : 00000000 r4 : c6a9c3c0 [ 341.344820] r3 : 00000000 r2 : bf25a13c r1 : c60f5000 r0 : c6a9c3c0 [ 341.351358] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none [ 341.358504] Control: 10c5387d Table: 0611c04a DAC: 00000051 [ 341.364257] Process swapper/0 (pid: 0, stack limit = 0xc68ed0ca) [ 341.370271] Stack: (0xc0801c30 to 0xc0802000) [ 341.374633] 1c20: c6e7d480 c0802d00 c60f5050 c0801c6c [ 341.382825] 1c40: c60f5000 c6a9c3c0 c6f90000 c6f9005c c6c04e00 c60f5000 00000000 c6f9005c [ 341.391015] 1c60: 00000000 c04a033c 00f90200 00000010 c6a9c3c0 c6a9c3c0 c6f90000 00000000 [ 341.399205] 1c80: 00000000 00000000 00000000 c046a7ac c6f9005c 00000001 fffffff4 00000000 [ 341.407395] 1ca0: c6f90200 00000000 c60f5000 c0479550 00000000 c6f90200 c6a9c3c0 16000000 [ 341.415586] 1cc0: 0000001c 6f4ad52f c6197040 b6df9387 36000000 c0520404 c073a80c c6a9c3c0 [ 341.423777] 1ce0: 00000000 c6d643c0 c6a9c3c0 c0800024 00000001 00000001 c6d643c8 c6a9c3c0 [ 341.431967] 1d00: c081b9c0 c7abca80 c610c840 c081b9c0 0000001c 00400000 c6bc5e6c c0522fb4 [ 341.440157] 1d20: c6d64400 00000004 c6bc5e0a 00000000 c60f5000 c7abca80 c081b9c0 c0522f54 [ 341.448348] 1d40: c6a9c3c0 c7abca80 c0803e48 c0549c94 c610c828 0000000a c0801d74 00000003 
[ 341.456538] 1d60: c6ec8f0a 00000000 c60f5000 c7abca80 c081b9c0 c0548520 0000000a 00000000 [ 341.464728] 1d80: 00000000 003a0000 00000000 00000000 00000000 00000000 00000000 00000000 [ 341.472919] 1da0: 000002ff 00000000 00000000 16000000 00000000 00000000 00000000 00000000 [ 341.481110] 1dc0: 00000000 0000008f 00000000 00000000 00000000 2d132a69 c6bc5e40 00000000 [ 341.489300] 1de0: c6bc5e40 c6a9c3c0 00000000 c6ec8e50 00000001 c054b070 00000001 00000000 [ 341.497490] 1e00: c0807200 c6bc5e00 00000000 ffffe000 00000100 c054aea4 00000000 00000000 [ 341.505681] 1e20: 00000122 00400000 c0802d00 c0172e80 6f56a70e ffffffff 6f56a70e c7eb9cc0 [ 341.513871] 1e40: c7eb82c0 00000000 c0801e60 c017309c 00000000 00000000 07780000 c07382c0 [ 341.522061] 1e60: 00000000 c7eb9cc0 c0739cc0 c0803f74 c0801e70 c0801e70 c0801ea4 c013d380 [ 341.530253] 1e80: 00000000 000000a0 00000001 c0802084 c0802080 40000001 ffffe000 00000100 [ 341.538443] 1ea0: c0802080 c01021e8 c8803100 10c5387d 00000000 c07341f0 c0739880 0000000a [ 341.546633] 1ec0: c0734180 00001017 c0802d00 c062aa98 00200002 c062aa60 c8803100 c073984c [ 341.554823] 1ee0: 00000000 00000001 00000000 c7810000 c8803100 10c5387d 00000000 c011c188 [ 341.563014] 1f00: c073984c c015f0f8 c0804244 c0815ae4 c880210c c8802100 c0801f40 c037c584 [ 341.571204] 1f20: c01035f8 60000013 ffffffff c0801f74 c080afd4 c0800000 10c5387d c0101a8c [ 341.579395] 1f40: 00000000 004ac9dc c7eba4b4 c010ee60 ffffe000 c0803e68 c0803ea8 00000001 [ 341.587587] 1f60: c080afd4 c062ca20 10c5387d 00000000 00000000 c0801f90 c01035f4 c01035f8 [ 341.595776] 1f80: 60000013 ffffffff 00000051 00000000 ffffe000 c013ff50 000000ce c0803e40 [ 341.603967] 1fa0: c082216c 00000000 00000001 c072ba38 10c5387d c0140214 c0822184 c0700df8 [ 341.612157] 1fc0: ffffffff ffffffff 00000000 c070058c c072ba38 2d162e71 00000000 c0700330 [ 341.620348] 1fe0: 00000051 10c0387d 000000ff 00a521d0 413fc090 00000000 00000000 00000000 [ 341.628558] [] (dev_hard_start_xmit) from [] 
(sch_direct_xmit+0xe4/0x2bc) [ 341.637106] [] (sch_direct_xmit) from [] (__dev_queue_xmit+0x6a4/0x72c) [ 341.645481] [] (__dev_queue_xmit) from [] (ip6_finish_output2+0x18c/0x434) [ 341.654112] [] (ip6_finish_output2) from [] (ip6_output+0x5c/0xd0) [ 341.662053] [] (ip6_output) from [] (mld_sendpack+0x1a0/0x1a8) [ 341.669640] [] (mld_sendpack) from [] (mld_ifc_timer_expire+0x1cc/0x2e4) [ 341.678111] [] (mld_ifc_timer_expire) from [] (call_timer_fn.constprop.3+0x24/0x98) [ 341.687527] [] (call_timer_fn.constprop.3) from [] (run_timer_softirq+0x1a8/0x1e4) [ 341.696860] [] (run_timer_softirq) from [] (__do_softirq+0x120/0x2b0) [ 341.705066] [] (__do_softirq) from [] (irq_exit+0x78/0x84) [ 341.712317] [] (irq_exit) from [] (__handle_domain_irq+0x60/0xb4) [ 341.720179] [] (__handle_domain_irq) from [] (gic_handle_irq+0x4c/0x90) [ 341.728549] [] (gic_handle_irq) from [] (__irq_svc+0x6c/0x90) Fixes: 20f2c5fa3af0 ("brcmfmac: add initial support for monitor mode") Signed-off-by: Rafał Miłecki Signed-off-by: Kalle Valo Link: https://lore.kernel.org/r/20200327130307.26477-1-zajec5@gmail.com --- drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c index 23627c953a5e..436f501be937 100644 --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c @@ -729,9 +729,18 @@ static int brcmf_net_mon_stop(struct net_device *ndev) return err; } +static netdev_tx_t brcmf_net_mon_start_xmit(struct sk_buff *skb, + struct net_device *ndev) +{ + dev_kfree_skb_any(skb); + + return NETDEV_TX_OK; +} + static const struct net_device_ops brcmf_netdev_ops_mon = { .ndo_open = brcmf_net_mon_open, .ndo_stop = brcmf_net_mon_stop, + .ndo_start_xmit = brcmf_net_mon_start_xmit, }; int brcmf_net_mon_attach(struct brcmf_if *ifp) From 
c9be1a642a7b9ec021e3f32e084dc781b3e5216d Mon Sep 17 00:00:00 2001 From: YueHaibing Date: Fri, 3 Apr 2020 16:34:14 +0800 Subject: [PATCH 015/331] ath11k: fix compiler warnings without CONFIG_THERMAL drivers/net/wireless/ath/ath11k/thermal.h:45:1: warning: no return statement in function returning non-void [-Wreturn-type] drivers/net/wireless/ath/ath11k/core.c:416:28: error: passing argument 1 of 'ath11k_thermal_unregister' from incompatible pointer type [-Werror=incompatible-pointer-types] Add missing return 0 in ath11k_thermal_set_throttling, and fix ath11k_thermal_unregister param type. Fixes: 2a63bbca06b2 ("ath11k: add thermal cooling device support") Signed-off-by: YueHaibing Signed-off-by: Kalle Valo Link: https://lore.kernel.org/r/20200403083414.31392-1-yuehaibing@huawei.com --- drivers/net/wireless/ath/ath11k/thermal.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/wireless/ath/ath11k/thermal.h b/drivers/net/wireless/ath/ath11k/thermal.h index 459b8d49c184..f9af55f3682d 100644 --- a/drivers/net/wireless/ath/ath11k/thermal.h +++ b/drivers/net/wireless/ath/ath11k/thermal.h @@ -36,12 +36,13 @@ static inline int ath11k_thermal_register(struct ath11k_base *sc) return 0; } -static inline void ath11k_thermal_unregister(struct ath11k *ar) +static inline void ath11k_thermal_unregister(struct ath11k_base *sc) { } static inline int ath11k_thermal_set_throttling(struct ath11k *ar, u32 throttle_state) { + return 0; } static inline void ath11k_thermal_event_temperature(struct ath11k *ar, From db5c97f02373917efe2c218ebf8e3d8b19e343b6 Mon Sep 17 00:00:00 2001 From: Li RongQing Date: Thu, 2 Apr 2020 15:52:10 +0800 Subject: [PATCH 016/331] xsk: Fix out of boundary write in __xsk_rcv_memcpy MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit first_len is the remainder of the first page we're copying to. If more than first_len bytes are copied in the first memcpy, the write runs past the page boundary.
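The split can be sketched in userspace C as follows. This is only an illustration under stated assumptions, not the kernel code: SKETCH_PAGE_SIZE, sketch_first_len() and sketch_split_copy() are hypothetical names, and total stands in for len + metalen.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed page size, for illustration only */

/* Number of bytes left in the page that contains addr. */
static uint64_t sketch_first_len(uint64_t addr)
{
	uint64_t page_start = addr & ~((uint64_t)SKETCH_PAGE_SIZE - 1);

	return SKETCH_PAGE_SIZE - (addr - page_start);
}

/*
 * Copy total bytes from from_buf into two non-contiguous pages.
 * Exactly first_len bytes fill the tail of the first page; the
 * remaining total - first_len bytes go to the start of the next page.
 * The caller must guarantee first_len <= total.
 */
static void sketch_split_copy(void *to_buf, void *next_pg,
			      const void *from_buf,
			      uint64_t first_len, uint64_t total)
{
	memcpy(to_buf, from_buf, first_len);
	memcpy(next_pg, (const char *)from_buf + first_len,
	       total - first_len);
}
```

The first copy is bounded by first_len alone, so any extra bytes (the metadata length in the patch) can only ever land in the second page.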
Fixes: c05cd3645814 ("xsk: add support to allow unaligned chunk placement") Signed-off-by: Li RongQing Signed-off-by: Daniel Borkmann Acked-by: Jonathan Lemon Acked-by: Björn Töpel Link: https://lore.kernel.org/bpf/1585813930-19712-1-git-send-email-lirongqing@baidu.com --- net/xdp/xsk.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index 356f90e4522b..c350108aa38d 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -131,8 +131,9 @@ static void __xsk_rcv_memcpy(struct xdp_umem *umem, u64 addr, void *from_buf, u64 page_start = addr & ~(PAGE_SIZE - 1); u64 first_len = PAGE_SIZE - (addr - page_start); - memcpy(to_buf, from_buf, first_len + metalen); - memcpy(next_pg_addr, from_buf + first_len, len - first_len); + memcpy(to_buf, from_buf, first_len); + memcpy(next_pg_addr, from_buf + first_len, + len + metalen - first_len); return; } From 4734b0fefbbf98f8c119eb8344efa19dac82cd2c Mon Sep 17 00:00:00 2001 From: Jeremy Cline Date: Sat, 4 Apr 2020 01:14:30 -0400 Subject: [PATCH 017/331] libbpf: Initialize *nl_pid so gcc 10 is happy Builds of Fedora's kernel-tools package started to fail with "may be used uninitialized" warnings for nl_pid in bpf_set_link_xdp_fd() and bpf_get_link_xdp_info() on the s390 architecture. Although libbpf_netlink_open() always returns a negative number when it does not set *nl_pid, the compiler does not determine this and thus believes the variable might be used uninitialized. Assuage gcc's fears by explicitly initializing nl_pid. 
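A reduced sketch of the pattern, assuming a hypothetical sketch_open() that stands in for libbpf_netlink_open() and made-up return values:

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for libbpf_netlink_open(): it writes *nl_pid
 * only on success and returns a negative value otherwise.
 */
static int sketch_open(int should_fail, uint32_t *nl_pid)
{
	if (should_fail)
		return -1;
	*nl_pid = 1234;	/* made-up port id */
	return 3;	/* made-up socket descriptor */
}

static int64_t sketch_query(int should_fail)
{
	uint32_t nl_pid = 0;	/* explicit initializer, as in the patch */
	int sock;

	sock = sketch_open(should_fail, &nl_pid);
	if (sock < 0)
		return sock;	/* nl_pid is never read on this path */

	return nl_pid;
}
```

Without the initializer the code behaves the same, since nl_pid is only read after a successful open; the initializer just removes the need for the compiler to prove that.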
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1807781 Signed-off-by: Jeremy Cline Signed-off-by: Daniel Borkmann Acked-by: Andrii Nakryiko Link: https://lore.kernel.org/bpf/20200404051430.698058-1-jcline@redhat.com --- tools/lib/bpf/netlink.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c index 18b5319025e1..9a14694176de 100644 --- a/tools/lib/bpf/netlink.c +++ b/tools/lib/bpf/netlink.c @@ -142,7 +142,7 @@ static int __bpf_set_link_xdp_fd_replace(int ifindex, int fd, int old_fd, struct ifinfomsg ifinfo; char attrbuf[64]; } req; - __u32 nl_pid; + __u32 nl_pid = 0; sock = libbpf_netlink_open(&nl_pid); if (sock < 0) @@ -288,7 +288,7 @@ int bpf_get_link_xdp_info(int ifindex, struct xdp_link_info *info, { struct xdp_id_md xdp_id = {}; int sock, ret; - __u32 nl_pid; + __u32 nl_pid = 0; __u32 mask; if (flags & ~XDP_FLAGS_MASK || !info_size) From 0ac16296ffc638f5163f9aeeeb1fe447268e449f Mon Sep 17 00:00:00 2001 From: Qiujun Huang Date: Fri, 3 Apr 2020 16:07:34 +0800 Subject: [PATCH 018/331] bpf: Fix a typo "inacitve" -> "inactive" There is a typo in struct bpf_lru_list's next_inactive_rotation description, thus fix s/inacitve/inactive/. 
Signed-off-by: Qiujun Huang Signed-off-by: Daniel Borkmann Link: https://lore.kernel.org/bpf/1585901254-30377-1-git-send-email-hqjagain@gmail.com --- kernel/bpf/bpf_lru_list.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/bpf/bpf_lru_list.h b/kernel/bpf/bpf_lru_list.h index f02504640e18..6b12f06ee18c 100644 --- a/kernel/bpf/bpf_lru_list.h +++ b/kernel/bpf/bpf_lru_list.h @@ -30,7 +30,7 @@ struct bpf_lru_node { struct bpf_lru_list { struct list_head lists[NR_BPF_LRU_LIST_T]; unsigned int counts[NR_BPF_LRU_LIST_COUNT]; - /* The next inacitve list rotation starts from here */ + /* The next inactive list rotation starts from here */ struct list_head *next_inactive_rotation; raw_spinlock_t lock ____cacheline_aligned_in_smp; From d9583cdf2f38d0f526d9a8c8564dd2e35e649bc7 Mon Sep 17 00:00:00 2001 From: Pablo Neira Ayuso Date: Tue, 7 Apr 2020 14:10:11 +0200 Subject: [PATCH 019/331] netfilter: nf_tables: report EOPNOTSUPP on unsupported flags/object type EINVAL should be used for malformed netlink messages. A new userspace utility running on an old kernel can easily get EINVAL when exercising new set features, which is misleading.
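The distinction can be sketched as follows. The flag names, values and sketch_check_request() are hypothetical and are not the nf_tables API; only the choice of error codes mirrors the patch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits known to this sketch's "kernel". */
enum sketch_set_flags {
	SKETCH_SET_MAP    = 0x08,
	SKETCH_SET_OBJECT = 0x40,
};
#define SKETCH_SET_KNOWN (SKETCH_SET_MAP | SKETCH_SET_OBJECT)

/* flags_attr is NULL when the sender left the attribute out. */
static int sketch_check_request(const uint32_t *flags_attr)
{
	/* Mandatory attribute missing: the message itself is malformed. */
	if (!flags_attr)
		return -EINVAL;

	/* Well-formed, but asks for bits this version does not implement. */
	if (*flags_attr & ~(uint32_t)SKETCH_SET_KNOWN)
		return -EOPNOTSUPP;

	return 0;
}
```

With this split, userspace can tell "my request is broken" apart from "this kernel is too old for the feature" and fall back or report accordingly.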
Fixes: 8aeff920dcc9 ("netfilter: nf_tables: add stateful object reference to set elements") Signed-off-by: Pablo Neira Ayuso --- net/netfilter/nf_tables_api.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index f91e96d8de05..21cbde6ecee3 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -3963,7 +3963,7 @@ static int nf_tables_newset(struct net *net, struct sock *nlsk, NFT_SET_INTERVAL | NFT_SET_TIMEOUT | NFT_SET_MAP | NFT_SET_EVAL | NFT_SET_OBJECT)) - return -EINVAL; + return -EOPNOTSUPP; /* Only one of these operations is supported */ if ((flags & (NFT_SET_MAP | NFT_SET_OBJECT)) == (NFT_SET_MAP | NFT_SET_OBJECT)) @@ -4001,7 +4001,7 @@ static int nf_tables_newset(struct net *net, struct sock *nlsk, objtype = ntohl(nla_get_be32(nla[NFTA_SET_OBJ_TYPE])); if (objtype == NFT_OBJECT_UNSPEC || objtype > NFT_OBJECT_MAX) - return -EINVAL; + return -EOPNOTSUPP; } else if (flags & NFT_SET_OBJECT) return -EINVAL; else From ef516e8625ddea90b3a0313f3a0b0baa83db7ac2 Mon Sep 17 00:00:00 2001 From: Pablo Neira Ayuso Date: Tue, 7 Apr 2020 14:10:38 +0200 Subject: [PATCH 020/331] netfilter: nf_tables: reintroduce the NFT_SET_CONCAT flag Stefano originally proposed introducing this flag. With it, new binaries on old kernels hit EOPNOTSUPP when defining a set with ranges in a concatenation.
Fixes: f3a2181e16f1 ("netfilter: nf_tables: Support for sets with multiple ranged fields") Reviewed-by: Stefano Brivio Signed-off-by: Pablo Neira Ayuso --- include/uapi/linux/netfilter/nf_tables.h | 2 ++ net/netfilter/nf_tables_api.c | 2 +- 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h index 30f2a87270dc..4565456c0ef4 100644 --- a/include/uapi/linux/netfilter/nf_tables.h +++ b/include/uapi/linux/netfilter/nf_tables.h @@ -276,6 +276,7 @@ enum nft_rule_compat_attributes { * @NFT_SET_TIMEOUT: set uses timeouts * @NFT_SET_EVAL: set can be updated from the evaluation path * @NFT_SET_OBJECT: set contains stateful objects + * @NFT_SET_CONCAT: set contains a concatenation */ enum nft_set_flags { NFT_SET_ANONYMOUS = 0x1, @@ -285,6 +286,7 @@ enum nft_set_flags { NFT_SET_TIMEOUT = 0x10, NFT_SET_EVAL = 0x20, NFT_SET_OBJECT = 0x40, + NFT_SET_CONCAT = 0x80, }; /** diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 21cbde6ecee3..9adfbc7e8ae7 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -3962,7 +3962,7 @@ static int nf_tables_newset(struct net *net, struct sock *nlsk, if (flags & ~(NFT_SET_ANONYMOUS | NFT_SET_CONSTANT | NFT_SET_INTERVAL | NFT_SET_TIMEOUT | NFT_SET_MAP | NFT_SET_EVAL | - NFT_SET_OBJECT)) + NFT_SET_OBJECT | NFT_SET_CONCAT)) return -EOPNOTSUPP; /* Only one of these operations is supported */ if ((flags & (NFT_SET_MAP | NFT_SET_OBJECT)) == From 489553dd13a88d8a882db10622ba8b9b58582ce4 Mon Sep 17 00:00:00 2001 From: Luke Nelson Date: Mon, 6 Apr 2020 22:16:04 +0000 Subject: [PATCH 021/331] riscv, bpf: Fix offset range checking for auipc+jalr on RV64 The existing code in emit_call on RV64 checks that the PC-relative offset to the function fits in 32 bits before calling emit_jump_and_link to emit an auipc+jalr pair. 
However, this check is incorrect because offsets in the range [2^31 - 2^11, 2^31 - 1] cannot be encoded using auipc+jalr on RV64 (see discussion [1]). The RISC-V spec has recently been updated to reflect this fact [2, 3]. This patch fixes the problem by moving the check on the offset into emit_jump_and_link and modifying it to the correct range of encodable offsets, which is [-2^31 - 2^11, 2^31 - 2^11). This also enforces the check on the offset to other uses of emit_jump_and_link (e.g., BPF_JA) as well. Currently, this bug is unlikely to be triggered, because the memory region from which JITed images are allocated is close enough to kernel text for the offsets to not become too large; and because the bounds on BPF program size are small enough. This patch prevents this problem from becoming an issue if either of these change. [1]: https://groups.google.com/a/groups.riscv.org/forum/#!topic/isa-dev/bwWFhBnnZFQ [2]: https://github.com/riscv/riscv-isa-manual/commit/b1e42e09ac55116dbf9de5e4fb326a5a90e4a993 [3]: https://github.com/riscv/riscv-isa-manual/commit/4c1b2066ebd2965a422e41eb262d0a208a7fea07 Signed-off-by: Luke Nelson Signed-off-by: Daniel Borkmann Link: https://lore.kernel.org/bpf/20200406221604.18547-1-luke.r.nels@gmail.com --- arch/riscv/net/bpf_jit_comp64.c | 49 +++++++++++++++++++++------------ 1 file changed, 32 insertions(+), 17 deletions(-) diff --git a/arch/riscv/net/bpf_jit_comp64.c b/arch/riscv/net/bpf_jit_comp64.c index cc1985d8750a..d208a9fd6c52 100644 --- a/arch/riscv/net/bpf_jit_comp64.c +++ b/arch/riscv/net/bpf_jit_comp64.c @@ -110,6 +110,16 @@ static bool is_32b_int(s64 val) return -(1L << 31) <= val && val < (1L << 31); } +static bool in_auipc_jalr_range(s64 val) +{ + /* + * auipc+jalr can reach any signed PC-relative offset in the range + * [-2^31 - 2^11, 2^31 - 2^11). 
+ */ + return (-(1L << 31) - (1L << 11)) <= val && + val < ((1L << 31) - (1L << 11)); +} + static void emit_imm(u8 rd, s64 val, struct rv_jit_context *ctx) { /* Note that the immediate from the add is sign-extended, @@ -380,20 +390,24 @@ static void emit_sext_32_rd(u8 *rd, struct rv_jit_context *ctx) *rd = RV_REG_T2; } -static void emit_jump_and_link(u8 rd, s64 rvoff, bool force_jalr, - struct rv_jit_context *ctx) +static int emit_jump_and_link(u8 rd, s64 rvoff, bool force_jalr, + struct rv_jit_context *ctx) { s64 upper, lower; if (rvoff && is_21b_int(rvoff) && !force_jalr) { emit(rv_jal(rd, rvoff >> 1), ctx); - return; + return 0; + } else if (in_auipc_jalr_range(rvoff)) { + upper = (rvoff + (1 << 11)) >> 12; + lower = rvoff & 0xfff; + emit(rv_auipc(RV_REG_T1, upper), ctx); + emit(rv_jalr(rd, RV_REG_T1, lower), ctx); + return 0; } - upper = (rvoff + (1 << 11)) >> 12; - lower = rvoff & 0xfff; - emit(rv_auipc(RV_REG_T1, upper), ctx); - emit(rv_jalr(rd, RV_REG_T1, lower), ctx); + pr_err("bpf-jit: target offset 0x%llx is out of range\n", rvoff); + return -ERANGE; } static bool is_signed_bpf_cond(u8 cond) @@ -407,18 +421,16 @@ static int emit_call(bool fixed, u64 addr, struct rv_jit_context *ctx) s64 off = 0; u64 ip; u8 rd; + int ret; if (addr && ctx->insns) { ip = (u64)(long)(ctx->insns + ctx->ninsns); off = addr - ip; - if (!is_32b_int(off)) { - pr_err("bpf-jit: target call addr %pK is out of range\n", - (void *)addr); - return -ERANGE; - } } - emit_jump_and_link(RV_REG_RA, off, !fixed, ctx); + ret = emit_jump_and_link(RV_REG_RA, off, !fixed, ctx); + if (ret) + return ret; rd = bpf_to_rv_reg(BPF_REG_0, ctx); emit(rv_addi(rd, RV_REG_A0, 0), ctx); return 0; @@ -429,7 +441,7 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx, { bool is64 = BPF_CLASS(insn->code) == BPF_ALU64 || BPF_CLASS(insn->code) == BPF_JMP; - int s, e, rvoff, i = insn - ctx->prog->insnsi; + int s, e, rvoff, ret, i = insn - ctx->prog->insnsi; struct bpf_prog_aux *aux = 
ctx->prog->aux; u8 rd = -1, rs = -1, code = insn->code; s16 off = insn->off; @@ -699,7 +711,9 @@ out_be: /* JUMP off */ case BPF_JMP | BPF_JA: rvoff = rv_offset(i, off, ctx); - emit_jump_and_link(RV_REG_ZERO, rvoff, false, ctx); + ret = emit_jump_and_link(RV_REG_ZERO, rvoff, false, ctx); + if (ret) + return ret; break; /* IF (dst COND src) JUMP off */ @@ -801,7 +815,6 @@ out_be: case BPF_JMP | BPF_CALL: { bool fixed; - int ret; u64 addr; mark_call(ctx); @@ -826,7 +839,9 @@ out_be: break; rvoff = epilogue_offset(ctx); - emit_jump_and_link(RV_REG_ZERO, rvoff, false, ctx); + ret = emit_jump_and_link(RV_REG_ZERO, rvoff, false, ctx); + if (ret) + return ret; break; /* dst = imm64 */ From f07cbad29741407ace2a9688548fa93d9cb38df3 Mon Sep 17 00:00:00 2001 From: Andrey Ignatov Date: Mon, 6 Apr 2020 22:09:45 -0700 Subject: [PATCH 022/331] libbpf: Fix bpf_get_link_xdp_id flags handling MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Currently if one of XDP_FLAGS_{DRV,HW,SKB}_MODE flags is passed to bpf_get_link_xdp_id() and there is a single XDP program attached to ifindex, that program's id will be returned by bpf_get_link_xdp_id() in prog_id argument no matter what mode the program is attached in, i.e. flags argument is not taken into account. For example, if there is a single program attached with XDP_FLAGS_SKB_MODE but user calls bpf_get_link_xdp_id() with flags = XDP_FLAGS_DRV_MODE, that skb program will be returned. Fix it by returning info->prog_id only if user didn't specify flags. If flags is specified then return corresponding mode-specific-field from struct xdp_link_info. The initial error was introduced in commit 50db9f073188 ("libbpf: Add a support for getting xdp prog id on ifindex") and then refactored in 473f4e133a12 so 473f4e133a12 is used in the Fixes tag. 
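The intended lookup can be sketched as below. The struct, the constants and their values are hypothetical stand-ins for struct xdp_link_info and the XDP_FLAGS_* / XDP_ATTACHED_* definitions, and the HW/SKB branches and the final return 0 are assumed here by analogy with the DRV branch.

```c
#include <stdint.h>

/* Hypothetical stand-ins; values are illustrative only. */
enum { SKETCH_ATTACHED_SKB = 2, SKETCH_ATTACHED_MULTI = 4 };
enum {
	SKETCH_FLAGS_SKB_MODE = 1u << 1,
	SKETCH_FLAGS_DRV_MODE = 1u << 2,
	SKETCH_FLAGS_HW_MODE  = 1u << 3,
};

struct sketch_link_info {
	uint32_t prog_id;	/* id of the single attached program */
	uint32_t drv_prog_id;
	uint32_t hw_prog_id;
	uint32_t skb_prog_id;
	uint8_t attach_mode;
};

static uint32_t sketch_get_xdp_id(const struct sketch_link_info *info,
				  uint32_t flags)
{
	/* No mode requested and only one program: return it as-is. */
	if (info->attach_mode != SKETCH_ATTACHED_MULTI && !flags)
		return info->prog_id;
	/* A mode was requested: answer from the mode-specific field. */
	if (flags & SKETCH_FLAGS_DRV_MODE)
		return info->drv_prog_id;
	if (flags & SKETCH_FLAGS_HW_MODE)
		return info->hw_prog_id;
	if (flags & SKETCH_FLAGS_SKB_MODE)
		return info->skb_prog_id;

	return 0;
}
```

For a single program attached in skb mode, a query with no flags or with the skb flag yields its id, while a query with the drv flag yields 0 instead of the skb program's id.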
Fixes: 473f4e133a12 ("libbpf: Add bpf_get_link_xdp_info() function to get more XDP information") Signed-off-by: Andrey Ignatov Signed-off-by: Daniel Borkmann Acked-by: Toke Høiland-Jørgensen Link: https://lore.kernel.org/bpf/0e9e30490b44b447bb2bebc69c7135e7fe7e4e40.1586236080.git.rdna@fb.com --- tools/lib/bpf/netlink.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c index 9a14694176de..0b709fd10bba 100644 --- a/tools/lib/bpf/netlink.c +++ b/tools/lib/bpf/netlink.c @@ -321,7 +321,7 @@ int bpf_get_link_xdp_info(int ifindex, struct xdp_link_info *info, static __u32 get_xdp_id(struct xdp_link_info *info, __u32 flags) { - if (info->attach_mode != XDP_ATTACHED_MULTI) + if (info->attach_mode != XDP_ATTACHED_MULTI && !flags) return info->prog_id; if (flags & XDP_FLAGS_DRV_MODE) return info->drv_prog_id; From eb203f4b89c1a1a779d9781e49b568d2a712abc6 Mon Sep 17 00:00:00 2001 From: Andrey Ignatov Date: Mon, 6 Apr 2020 22:09:46 -0700 Subject: [PATCH 023/331] selftests/bpf: Add test for bpf_get_link_xdp_id Add xdp_info selftest that makes sure that bpf_get_link_xdp_id returns valid prog_id for different input modes: * w/ and w/o flags when no program is attached; * w/ and w/o flags when one program is attached. 
Signed-off-by: Andrey Ignatov Signed-off-by: Daniel Borkmann Link: https://lore.kernel.org/bpf/2a9a6d1ce33b91ccc1aa3de6dba2d309f2062811.1586236080.git.rdna@fb.com --- .../selftests/bpf/prog_tests/xdp_info.c | 68 +++++++++++++++++++ 1 file changed, 68 insertions(+) create mode 100644 tools/testing/selftests/bpf/prog_tests/xdp_info.c diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_info.c b/tools/testing/selftests/bpf/prog_tests/xdp_info.c new file mode 100644 index 000000000000..d2d7a283d72f --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/xdp_info.c @@ -0,0 +1,68 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include + +#define IFINDEX_LO 1 + +void test_xdp_info(void) +{ + __u32 len = sizeof(struct bpf_prog_info), duration = 0, prog_id; + const char *file = "./xdp_dummy.o"; + struct bpf_prog_info info = {}; + struct bpf_object *obj; + int err, prog_fd; + + /* Get prog_id for XDP_ATTACHED_NONE mode */ + + err = bpf_get_link_xdp_id(IFINDEX_LO, &prog_id, 0); + if (CHECK(err, "get_xdp_none", "errno=%d\n", errno)) + return; + if (CHECK(prog_id, "prog_id_none", "unexpected prog_id=%u\n", prog_id)) + return; + + err = bpf_get_link_xdp_id(IFINDEX_LO, &prog_id, XDP_FLAGS_SKB_MODE); + if (CHECK(err, "get_xdp_none_skb", "errno=%d\n", errno)) + return; + if (CHECK(prog_id, "prog_id_none_skb", "unexpected prog_id=%u\n", + prog_id)) + return; + + /* Setup prog */ + + err = bpf_prog_load(file, BPF_PROG_TYPE_XDP, &obj, &prog_fd); + if (CHECK_FAIL(err)) + return; + + err = bpf_obj_get_info_by_fd(prog_fd, &info, &len); + if (CHECK(err, "get_prog_info", "errno=%d\n", errno)) + goto out_close; + + err = bpf_set_link_xdp_fd(IFINDEX_LO, prog_fd, XDP_FLAGS_SKB_MODE); + if (CHECK(err, "set_xdp_skb", "errno=%d\n", errno)) + goto out_close; + + /* Get prog_id for single prog mode */ + + err = bpf_get_link_xdp_id(IFINDEX_LO, &prog_id, 0); + if (CHECK(err, "get_xdp", "errno=%d\n", errno)) + goto out; + if (CHECK(prog_id != info.id, "prog_id", "prog_id not available\n")) 
+ goto out; + + err = bpf_get_link_xdp_id(IFINDEX_LO, &prog_id, XDP_FLAGS_SKB_MODE); + if (CHECK(err, "get_xdp_skb", "errno=%d\n", errno)) + goto out; + if (CHECK(prog_id != info.id, "prog_id_skb", "prog_id not available\n")) + goto out; + + err = bpf_get_link_xdp_id(IFINDEX_LO, &prog_id, XDP_FLAGS_DRV_MODE); + if (CHECK(err, "get_xdp_drv", "errno=%d\n", errno)) + goto out; + if (CHECK(prog_id, "prog_id_drv", "unexpected prog_id=%u\n", prog_id)) + goto out; + +out: + bpf_set_link_xdp_fd(IFINDEX_LO, -1, 0); +out_close: + bpf_object__close(obj); +} From 045065f06f938d3171b3ffacb34453421a32c1e3 Mon Sep 17 00:00:00 2001 From: Lothar Rubusch Date: Tue, 7 Apr 2020 22:55:25 +0000 Subject: [PATCH 024/331] net: sock.h: fix skb_steal_sock() kernel-doc Fix warnings related to kernel-doc notation, and wording in function description. Signed-off-by: Lothar Rubusch Acked-by: Randy Dunlap Tested-by: Randy Dunlap Signed-off-by: David S. Miller --- include/net/sock.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/include/net/sock.h b/include/net/sock.h index 6d84784d33fa..3e8c6d4b4b59 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -2553,9 +2553,9 @@ sk_is_refcounted(struct sock *sk) } /** - * skb_steal_sock - * @skb to steal the socket from - * @refcounted is set to true if the socket is reference-counted + * skb_steal_sock - steal a socket from an sk_buff + * @skb: sk_buff to steal the socket from + * @refcounted: is set to true if the socket is reference-counted */ static inline struct sock * skb_steal_sock(struct sk_buff *skb, bool *refcounted) From da722186f6549d752ea5b5fbc18111833c81a133 Mon Sep 17 00:00:00 2001 From: Martin Fuzzey Date: Thu, 2 Apr 2020 15:51:27 +0200 Subject: [PATCH 025/331] net: fec: set GPR bit on suspend by DT configuration. On some SoCs, such as the i.MX6, it is necessary to set a bit in the SoC level GPR register before suspending for wake on lan to work. 
The fec platform callback sleep_mode_enable was intended to allow this but the platform implementation was NAK'd back in 2015 [1] This means that, currently, wake on lan is broken on mainline for the i.MX6 at least. So implement the required bit setting in the fec driver by itself by adding a new optional DT property indicating the GPR register and adding the offset and bit information to the driver. [1] https://www.spinics.net/lists/netdev/msg310922.html Signed-off-by: Martin Fuzzey Signed-off-by: Fugang Duan Signed-off-by: David S. Miller --- drivers/net/ethernet/freescale/fec.h | 7 + drivers/net/ethernet/freescale/fec_main.c | 149 +++++++++++++++++----- 2 files changed, 127 insertions(+), 29 deletions(-) diff --git a/drivers/net/ethernet/freescale/fec.h b/drivers/net/ethernet/freescale/fec.h index bd898f5b4da5..e74dd1f86bba 100644 --- a/drivers/net/ethernet/freescale/fec.h +++ b/drivers/net/ethernet/freescale/fec.h @@ -488,6 +488,12 @@ struct fec_enet_priv_rx_q { struct sk_buff *rx_skbuff[RX_RING_SIZE]; }; +struct fec_stop_mode_gpr { + struct regmap *gpr; + u8 reg; + u8 bit; +}; + /* The FEC buffer descriptors track the ring buffers. The rx_bd_base and * tx_bd_base always point to the base of the buffer descriptors. The * cur_rx and cur_tx point to the currently available buffer. 
@@ -562,6 +568,7 @@ struct fec_enet_private { int hwts_tx_en; struct delayed_work time_keep; struct regulator *reg_phy; + struct fec_stop_mode_gpr stop_gpr; unsigned int tx_align; unsigned int rx_align; diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c index c1c267b61647..dc6f8763a5d4 100644 --- a/drivers/net/ethernet/freescale/fec_main.c +++ b/drivers/net/ethernet/freescale/fec_main.c @@ -62,6 +62,8 @@ #include #include #include +#include +#include #include #include @@ -84,6 +86,56 @@ static void fec_enet_itr_coal_init(struct net_device *ndev); #define FEC_ENET_OPD_V 0xFFF0 #define FEC_MDIO_PM_TIMEOUT 100 /* ms */ +struct fec_devinfo { + u32 quirks; + u8 stop_gpr_reg; + u8 stop_gpr_bit; +}; + +static const struct fec_devinfo fec_imx25_info = { + .quirks = FEC_QUIRK_USE_GASKET | FEC_QUIRK_MIB_CLEAR | + FEC_QUIRK_HAS_FRREG, +}; + +static const struct fec_devinfo fec_imx27_info = { + .quirks = FEC_QUIRK_MIB_CLEAR | FEC_QUIRK_HAS_FRREG, +}; + +static const struct fec_devinfo fec_imx28_info = { + .quirks = FEC_QUIRK_ENET_MAC | FEC_QUIRK_SWAP_FRAME | + FEC_QUIRK_SINGLE_MDIO | FEC_QUIRK_HAS_RACC | + FEC_QUIRK_HAS_FRREG, +}; + +static const struct fec_devinfo fec_imx6q_info = { + .quirks = FEC_QUIRK_ENET_MAC | FEC_QUIRK_HAS_GBIT | + FEC_QUIRK_HAS_BUFDESC_EX | FEC_QUIRK_HAS_CSUM | + FEC_QUIRK_HAS_VLAN | FEC_QUIRK_ERR006358 | + FEC_QUIRK_HAS_RACC, + .stop_gpr_reg = 0x34, + .stop_gpr_bit = 27, +}; + +static const struct fec_devinfo fec_mvf600_info = { + .quirks = FEC_QUIRK_ENET_MAC | FEC_QUIRK_HAS_RACC, +}; + +static const struct fec_devinfo fec_imx6x_info = { + .quirks = FEC_QUIRK_ENET_MAC | FEC_QUIRK_HAS_GBIT | + FEC_QUIRK_HAS_BUFDESC_EX | FEC_QUIRK_HAS_CSUM | + FEC_QUIRK_HAS_VLAN | FEC_QUIRK_HAS_AVB | + FEC_QUIRK_ERR007885 | FEC_QUIRK_BUG_CAPTURE | + FEC_QUIRK_HAS_RACC | FEC_QUIRK_HAS_COALESCE, +}; + +static const struct fec_devinfo fec_imx6ul_info = { + .quirks = FEC_QUIRK_ENET_MAC | FEC_QUIRK_HAS_GBIT | + 
FEC_QUIRK_HAS_BUFDESC_EX | FEC_QUIRK_HAS_CSUM | + FEC_QUIRK_HAS_VLAN | FEC_QUIRK_ERR007885 | + FEC_QUIRK_BUG_CAPTURE | FEC_QUIRK_HAS_RACC | + FEC_QUIRK_HAS_COALESCE, +}; + static struct platform_device_id fec_devtype[] = { { /* keep it for coldfire */ @@ -91,39 +143,25 @@ static struct platform_device_id fec_devtype[] = { .driver_data = 0, }, { .name = "imx25-fec", - .driver_data = FEC_QUIRK_USE_GASKET | FEC_QUIRK_MIB_CLEAR | - FEC_QUIRK_HAS_FRREG, + .driver_data = (kernel_ulong_t)&fec_imx25_info, }, { .name = "imx27-fec", - .driver_data = FEC_QUIRK_MIB_CLEAR | FEC_QUIRK_HAS_FRREG, + .driver_data = (kernel_ulong_t)&fec_imx27_info, }, { .name = "imx28-fec", - .driver_data = FEC_QUIRK_ENET_MAC | FEC_QUIRK_SWAP_FRAME | - FEC_QUIRK_SINGLE_MDIO | FEC_QUIRK_HAS_RACC | - FEC_QUIRK_HAS_FRREG, + .driver_data = (kernel_ulong_t)&fec_imx28_info, }, { .name = "imx6q-fec", - .driver_data = FEC_QUIRK_ENET_MAC | FEC_QUIRK_HAS_GBIT | - FEC_QUIRK_HAS_BUFDESC_EX | FEC_QUIRK_HAS_CSUM | - FEC_QUIRK_HAS_VLAN | FEC_QUIRK_ERR006358 | - FEC_QUIRK_HAS_RACC, + .driver_data = (kernel_ulong_t)&fec_imx6q_info, }, { .name = "mvf600-fec", - .driver_data = FEC_QUIRK_ENET_MAC | FEC_QUIRK_HAS_RACC, + .driver_data = (kernel_ulong_t)&fec_mvf600_info, }, { .name = "imx6sx-fec", - .driver_data = FEC_QUIRK_ENET_MAC | FEC_QUIRK_HAS_GBIT | - FEC_QUIRK_HAS_BUFDESC_EX | FEC_QUIRK_HAS_CSUM | - FEC_QUIRK_HAS_VLAN | FEC_QUIRK_HAS_AVB | - FEC_QUIRK_ERR007885 | FEC_QUIRK_BUG_CAPTURE | - FEC_QUIRK_HAS_RACC | FEC_QUIRK_HAS_COALESCE, + .driver_data = (kernel_ulong_t)&fec_imx6x_info, }, { .name = "imx6ul-fec", - .driver_data = FEC_QUIRK_ENET_MAC | FEC_QUIRK_HAS_GBIT | - FEC_QUIRK_HAS_BUFDESC_EX | FEC_QUIRK_HAS_CSUM | - FEC_QUIRK_HAS_VLAN | FEC_QUIRK_ERR007885 | - FEC_QUIRK_BUG_CAPTURE | FEC_QUIRK_HAS_RACC | - FEC_QUIRK_HAS_COALESCE, + .driver_data = (kernel_ulong_t)&fec_imx6ul_info, }, { /* sentinel */ } @@ -1092,11 +1130,28 @@ fec_restart(struct net_device *ndev) } +static void fec_enet_stop_mode(struct 
fec_enet_private *fep, bool enabled) +{ + struct fec_platform_data *pdata = fep->pdev->dev.platform_data; + struct fec_stop_mode_gpr *stop_gpr = &fep->stop_gpr; + + if (stop_gpr->gpr) { + if (enabled) + regmap_update_bits(stop_gpr->gpr, stop_gpr->reg, + BIT(stop_gpr->bit), + BIT(stop_gpr->bit)); + else + regmap_update_bits(stop_gpr->gpr, stop_gpr->reg, + BIT(stop_gpr->bit), 0); + } else if (pdata && pdata->sleep_mode_enable) { + pdata->sleep_mode_enable(enabled); + } +} + static void fec_stop(struct net_device *ndev) { struct fec_enet_private *fep = netdev_priv(ndev); - struct fec_platform_data *pdata = fep->pdev->dev.platform_data; u32 rmii_mode = readl(fep->hwp + FEC_R_CNTRL) & (1 << 8); u32 val; @@ -1125,9 +1180,7 @@ fec_stop(struct net_device *ndev) val = readl(fep->hwp + FEC_ECNTRL); val |= (FEC_ECR_MAGICEN | FEC_ECR_SLEEP); writel(val, fep->hwp + FEC_ECNTRL); - - if (pdata && pdata->sleep_mode_enable) - pdata->sleep_mode_enable(true); + fec_enet_stop_mode(fep, true); } writel(fep->phy_speed, fep->hwp + FEC_MII_SPEED); @@ -3398,6 +3451,37 @@ static int fec_enet_get_irq_cnt(struct platform_device *pdev) return irq_cnt; } +static int fec_enet_init_stop_mode(struct fec_enet_private *fep, + struct fec_devinfo *dev_info, + struct device_node *np) +{ + struct device_node *gpr_np; + int ret = 0; + + if (!dev_info) + return 0; + + gpr_np = of_parse_phandle(np, "gpr", 0); + if (!gpr_np) + return 0; + + fep->stop_gpr.gpr = syscon_node_to_regmap(gpr_np); + if (IS_ERR(fep->stop_gpr.gpr)) { + dev_err(&fep->pdev->dev, "could not find gpr regmap\n"); + ret = PTR_ERR(fep->stop_gpr.gpr); + fep->stop_gpr.gpr = NULL; + goto out; + } + + fep->stop_gpr.reg = dev_info->stop_gpr_reg; + fep->stop_gpr.bit = dev_info->stop_gpr_bit; + +out: + of_node_put(gpr_np); + + return ret; +} + static int fec_probe(struct platform_device *pdev) { @@ -3413,6 +3497,7 @@ fec_probe(struct platform_device *pdev) int num_rx_qs; char irq_name[8]; int irq_cnt; + struct fec_devinfo *dev_info; 
fec_enet_get_queue_num(pdev, &num_tx_qs, &num_rx_qs); @@ -3430,7 +3515,9 @@ fec_probe(struct platform_device *pdev) of_id = of_match_device(fec_dt_ids, &pdev->dev); if (of_id) pdev->id_entry = of_id->data; - fep->quirks = pdev->id_entry->driver_data; + dev_info = (struct fec_devinfo *)pdev->id_entry->driver_data; + if (dev_info) + fep->quirks = dev_info->quirks; fep->netdev = ndev; fep->num_rx_queues = num_rx_qs; @@ -3464,6 +3551,10 @@ fec_probe(struct platform_device *pdev) if (of_get_property(np, "fsl,magic-packet", NULL)) fep->wol_flag |= FEC_WOL_HAS_MAGIC_PACKET; + ret = fec_enet_init_stop_mode(fep, dev_info, np); + if (ret) + goto failed_stop_mode; + phy_node = of_parse_phandle(np, "phy-handle", 0); if (!phy_node && of_phy_is_fixed_link(np)) { ret = of_phy_register_fixed_link(np); @@ -3632,6 +3723,7 @@ failed_clk: if (of_phy_is_fixed_link(np)) of_phy_deregister_fixed_link(np); of_node_put(phy_node); +failed_stop_mode: failed_phy: dev_id--; failed_ioremap: @@ -3709,7 +3801,6 @@ static int __maybe_unused fec_resume(struct device *dev) { struct net_device *ndev = dev_get_drvdata(dev); struct fec_enet_private *fep = netdev_priv(ndev); - struct fec_platform_data *pdata = fep->pdev->dev.platform_data; int ret; int val; @@ -3727,8 +3818,8 @@ static int __maybe_unused fec_resume(struct device *dev) goto failed_clk; } if (fep->wol_flag & FEC_WOL_FLAG_ENABLE) { - if (pdata && pdata->sleep_mode_enable) - pdata->sleep_mode_enable(false); + fec_enet_stop_mode(fep, false); + val = readl(fep->hwp + FEC_ECNTRL); val &= ~(FEC_ECR_MAGICEN | FEC_ECR_SLEEP); writel(val, fep->hwp + FEC_ECNTRL); From 4141f1a40fc0789f6fd4330e171e1edf155426aa Mon Sep 17 00:00:00 2001 From: Martin Fuzzey Date: Thu, 2 Apr 2020 15:51:28 +0200 Subject: [PATCH 026/331] ARM: dts: imx6: Use gpc for FEC interrupt controller to fix wake on LAN. In order to wake from suspend by ethernet magic packets the GPC must be used as intc does not have wakeup functionality. 
But the FEC DT node currently uses "interrupts-extended", specifying intc, thus breaking WoL. This problem is probably fallout from the stacked domain conversion, as intc used to chain to GPC. So replace "interrupts-extended" with "interrupts" to use the default parent, which is GPC. Fixes: b923ff6af0d5 ("ARM: imx6: convert GPC to stacked domains") Signed-off-by: Martin Fuzzey Signed-off-by: David S. Miller --- arch/arm/boot/dts/imx6qdl.dtsi | 5 ++--- arch/arm/boot/dts/imx6qp.dtsi | 1 - 2 files changed, 2 insertions(+), 4 deletions(-) diff --git a/arch/arm/boot/dts/imx6qdl.dtsi b/arch/arm/boot/dts/imx6qdl.dtsi index 47982889d774..0da4cc29effd 100644 --- a/arch/arm/boot/dts/imx6qdl.dtsi +++ b/arch/arm/boot/dts/imx6qdl.dtsi @@ -1039,9 +1039,8 @@ compatible = "fsl,imx6q-fec"; reg = <0x02188000 0x4000>; interrupt-names = "int0", "pps"; - interrupts-extended = - <&intc 0 118 IRQ_TYPE_LEVEL_HIGH>, - <&intc 0 119 IRQ_TYPE_LEVEL_HIGH>; + interrupts = <0 118 IRQ_TYPE_LEVEL_HIGH>, + <0 119 IRQ_TYPE_LEVEL_HIGH>; clocks = <&clks IMX6QDL_CLK_ENET>, <&clks IMX6QDL_CLK_ENET>, <&clks IMX6QDL_CLK_ENET_REF>; diff --git a/arch/arm/boot/dts/imx6qp.dtsi b/arch/arm/boot/dts/imx6qp.dtsi index 93b89dc1f53b..b310f13a53f2 100644 --- a/arch/arm/boot/dts/imx6qp.dtsi +++ b/arch/arm/boot/dts/imx6qp.dtsi @@ -77,7 +77,6 @@ }; &fec { - /delete-property/interrupts-extended; interrupts = <0 118 IRQ_TYPE_LEVEL_HIGH>, <0 119 IRQ_TYPE_LEVEL_HIGH>; }; From 70f268588a8c4e7596d9031afcbf2e78cf9c757d Mon Sep 17 00:00:00 2001 From: Martin Fuzzey Date: Thu, 2 Apr 2020 15:51:29 +0200 Subject: [PATCH 027/331] dt-bindings: fec: document the new gpr property. This property allows the gpr register bit to be defined for wake on lan support. Signed-off-by: Martin Fuzzey Reviewed-by: Fugang Duan Acked-by: Rob Herring Signed-off-by: David S.
Miller --- Documentation/devicetree/bindings/net/fsl-fec.txt | 2 ++ 1 file changed, 2 insertions(+) diff --git a/Documentation/devicetree/bindings/net/fsl-fec.txt b/Documentation/devicetree/bindings/net/fsl-fec.txt index 5b88fae0307d..ff8b0f211aa1 100644 --- a/Documentation/devicetree/bindings/net/fsl-fec.txt +++ b/Documentation/devicetree/bindings/net/fsl-fec.txt @@ -22,6 +22,8 @@ Optional properties: - fsl,err006687-workaround-present: If present indicates that the system has the hardware workaround for ERR006687 applied and does not need a software workaround. +- gpr: phandle of SoC general purpose register mode. Required for wake on LAN + on some SoCs -interrupt-names: names of the interrupts listed in interrupts property in the same order. The defaults if not specified are __Number of interrupts__ __Default__ From be8ae92f5c25f0896969bbc049e9844f9dcd53f1 Mon Sep 17 00:00:00 2001 From: Martin Fuzzey Date: Thu, 2 Apr 2020 15:51:30 +0200 Subject: [PATCH 028/331] ARM: dts: imx6: add fec gpr property. This is required for wake on lan on i.MX6 Signed-off-by: Martin Fuzzey Reviewed-by: Fugang Duan Signed-off-by: David S. Miller --- arch/arm/boot/dts/imx6qdl.dtsi | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/arm/boot/dts/imx6qdl.dtsi b/arch/arm/boot/dts/imx6qdl.dtsi index 0da4cc29effd..98da446aa0f2 100644 --- a/arch/arm/boot/dts/imx6qdl.dtsi +++ b/arch/arm/boot/dts/imx6qdl.dtsi @@ -1045,6 +1045,7 @@ <&clks IMX6QDL_CLK_ENET>, <&clks IMX6QDL_CLK_ENET_REF>; clock-names = "ipg", "ahb", "ptp"; + gpr = <&gpr>; status = "disabled"; }; From b93cfb9cd3af3adc9ba4854f178d5300f7544d3e Mon Sep 17 00:00:00 2001 From: Tim Stallard Date: Fri, 3 Apr 2020 21:22:57 +0100 Subject: [PATCH 029/331] net: icmp6: do not select saddr from iif when route has prefsrc set Since commit fac6fce9bdb5 ("net: icmp6: provide input address for traceroute6") ICMPv6 errors have source addresses from the ingress interface. 
However, this overrides source address selection in cases where it is influenced by preferred source addresses set on routes. This can result in ICMP errors being lost to upstream BCP38 filters when the wrong source addresses are used, breaking path MTU discovery and traceroute. This patch makes the modified source address selection take place only when the route used has no prefsrc set. It can be tested with: ip link add v1 type veth peer name v2 ip netns add test ip netns exec test ip link set lo up ip link set v2 netns test ip link set v1 up ip netns exec test ip link set v2 up ip addr add 2001:db8::1/64 dev v1 nodad ip addr add 2001:db8::3 dev v1 nodad ip netns exec test ip addr add 2001:db8::2/64 dev v2 nodad ip netns exec test ip route add unreachable 2001:db8:1::1 ip netns exec test ip addr add 2001:db8:100::1 dev lo ip netns exec test ip route add 2001:db8::1 dev v2 src 2001:db8:100::1 ip route add 2001:db8:1000::1 via 2001:db8::2 traceroute6 -s 2001:db8::1 2001:db8:1000::1 traceroute6 -s 2001:db8::3 2001:db8:1000::1 ip netns delete test Output before: $ traceroute6 -s 2001:db8::1 2001:db8:1000::1 traceroute to 2001:db8:1000::1 (2001:db8:1000::1), 30 hops max, 80 byte packets 1 2001:db8::2 (2001:db8::2) 0.843 ms !N 0.396 ms !N 0.257 ms !N $ traceroute6 -s 2001:db8::3 2001:db8:1000::1 traceroute to 2001:db8:1000::1 (2001:db8:1000::1), 30 hops max, 80 byte packets 1 2001:db8::2 (2001:db8::2) 0.772 ms !N 0.257 ms !N 0.357 ms !N After: $ traceroute6 -s 2001:db8::1 2001:db8:1000::1 traceroute to 2001:db8:1000::1 (2001:db8:1000::1), 30 hops max, 80 byte packets 1 2001:db8:100::1 (2001:db8:100::1) 8.885 ms !N 0.310 ms !N 0.174 ms !N $ traceroute6 -s 2001:db8::3 2001:db8:1000::1 traceroute to 2001:db8:1000::1 (2001:db8:1000::1), 30 hops max, 80 byte packets 1 2001:db8::2 (2001:db8::2) 1.403 ms !N 0.205 ms !N 0.313 ms !N Fixes: fac6fce9bdb5 ("net: icmp6: provide input address for traceroute6") Signed-off-by: Tim Stallard Signed-off-by: David S.
Miller --- net/ipv6/icmp.c | 21 ++++++++++++++++++++- 1 file changed, 20 insertions(+), 1 deletion(-) diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c index 2688f3e82165..fc5000370030 100644 --- a/net/ipv6/icmp.c +++ b/net/ipv6/icmp.c @@ -229,6 +229,25 @@ static bool icmpv6_xrlim_allow(struct sock *sk, u8 type, return res; } +static bool icmpv6_rt_has_prefsrc(struct sock *sk, u8 type, + struct flowi6 *fl6) +{ + struct net *net = sock_net(sk); + struct dst_entry *dst; + bool res = false; + + dst = ip6_route_output(net, sk, fl6); + if (!dst->error) { + struct rt6_info *rt = (struct rt6_info *)dst; + struct in6_addr prefsrc; + + rt6_get_prefsrc(rt, &prefsrc); + res = !ipv6_addr_any(&prefsrc); + } + dst_release(dst); + return res; +} + /* * an inline helper for the "simple" if statement below * checks if parameter problem report is caused by an @@ -527,7 +546,7 @@ static void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info, saddr = force_saddr; if (saddr) { fl6.saddr = *saddr; - } else { + } else if (!icmpv6_rt_has_prefsrc(sk, type, &fl6)) { /* select a more meaningful saddr from input if */ struct net_device *in_netdev; From 03e2a984b6165621f287fadf5f4b5cd8b58dcaba Mon Sep 17 00:00:00 2001 From: Tim Stallard Date: Fri, 3 Apr 2020 21:26:21 +0100 Subject: [PATCH 030/331] net: ipv6: do not consider routes via gateways for anycast address check The behaviour for what is considered an anycast address changed in commit 45e4fd26683c ("ipv6: Only create RTF_CACHE routes after encountering pmtu exception"). This now considers the first address in a subnet where there is a route via a gateway to be an anycast address. This breaks path MTU discovery and traceroutes when a host in a remote network uses the address at the start of a prefix (eg 2600:: advertised as 2600::/48 in the DFZ) as ICMP errors will not be sent to anycast addresses. 
This patch excludes any routes with a gateway, or via point to point links, like the behaviour previously from rt6_is_gw_or_nonexthop in net/ipv6/route.c. This can be tested with: ip link add v1 type veth peer name v2 ip netns add test ip netns exec test ip link set lo up ip link set v2 netns test ip link set v1 up ip netns exec test ip link set v2 up ip addr add 2001:db8::1/64 dev v1 nodad ip addr add 2001:db8:100:: dev lo nodad ip netns exec test ip addr add 2001:db8::2/64 dev v2 nodad ip netns exec test ip route add unreachable 2001:db8:1::1 ip netns exec test ip route add 2001:db8:100::/64 via 2001:db8::1 ip netns exec test sysctl net.ipv6.conf.all.forwarding=1 ip route add 2001:db8:1::1 via 2001:db8::2 ping -I 2001:db8::1 2001:db8:1::1 -c1 ping -I 2001:db8:100:: 2001:db8:1::1 -c1 ip addr delete 2001:db8:100:: dev lo ip netns delete test Currently the first ping will get back a destination unreachable ICMP error, but the second will never get a response, with "icmp6_send: acast source" logged. After this patch, both get destination unreachable ICMP replies. Fixes: 45e4fd26683c ("ipv6: Only create RTF_CACHE routes after encountering pmtu exception") Signed-off-by: Tim Stallard Signed-off-by: David S. 
Miller --- include/net/ip6_route.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h index f7543c095b33..9947eb1e9eb6 100644 --- a/include/net/ip6_route.h +++ b/include/net/ip6_route.h @@ -254,6 +254,7 @@ static inline bool ipv6_anycast_destination(const struct dst_entry *dst, return rt->rt6i_flags & RTF_ANYCAST || (rt->rt6i_dst.plen < 127 && + !(rt->rt6i_flags & (RTF_GATEWAY | RTF_NONEXTHOP)) && ipv6_addr_equal(&rt->rt6i_dst.addr, daddr)); } From 84d2f7b708c374a15a2abe092a74e0e47d018286 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ren=C3=A9=20van=20Dorst?= Date: Mon, 6 Apr 2020 05:42:53 +0800 Subject: [PATCH 031/331] net: dsa: mt7530: move mt7623 settings out off the mt7530 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Moving the mt7623 logic out of mt7530 is required to keep the hardware settings consistent after phylink was introduced to the mtk driver. Fixes: ca366d6c889b ("net: dsa: mt7530: Convert to PHYLINK API") Reviewed-by: Sean Wang Tested-by: Sean Wang Signed-off-by: René van Dorst Tested-by: Frank Wunderlich Signed-off-by: David S.
Miller --- drivers/net/dsa/mt7530.c | 85 ---------------------------------------- drivers/net/dsa/mt7530.h | 10 ----- 2 files changed, 95 deletions(-) diff --git a/drivers/net/dsa/mt7530.c b/drivers/net/dsa/mt7530.c index 2d0d91db0ddb..84391c8a0e16 100644 --- a/drivers/net/dsa/mt7530.c +++ b/drivers/net/dsa/mt7530.c @@ -66,58 +66,6 @@ static const struct mt7530_mib_desc mt7530_mib[] = { MIB_DESC(1, 0xb8, "RxArlDrop"), }; -static int -mt7623_trgmii_write(struct mt7530_priv *priv, u32 reg, u32 val) -{ - int ret; - - ret = regmap_write(priv->ethernet, TRGMII_BASE(reg), val); - if (ret < 0) - dev_err(priv->dev, - "failed to priv write register\n"); - return ret; -} - -static u32 -mt7623_trgmii_read(struct mt7530_priv *priv, u32 reg) -{ - int ret; - u32 val; - - ret = regmap_read(priv->ethernet, TRGMII_BASE(reg), &val); - if (ret < 0) { - dev_err(priv->dev, - "failed to priv read register\n"); - return ret; - } - - return val; -} - -static void -mt7623_trgmii_rmw(struct mt7530_priv *priv, u32 reg, - u32 mask, u32 set) -{ - u32 val; - - val = mt7623_trgmii_read(priv, reg); - val &= ~mask; - val |= set; - mt7623_trgmii_write(priv, reg, val); -} - -static void -mt7623_trgmii_set(struct mt7530_priv *priv, u32 reg, u32 val) -{ - mt7623_trgmii_rmw(priv, reg, 0, val); -} - -static void -mt7623_trgmii_clear(struct mt7530_priv *priv, u32 reg, u32 val) -{ - mt7623_trgmii_rmw(priv, reg, val, 0); -} - static int core_read_mmd_indirect(struct mt7530_priv *priv, int prtad, int devad) { @@ -530,27 +478,6 @@ mt7530_pad_clk_setup(struct dsa_switch *ds, int mode) for (i = 0 ; i < NUM_TRGMII_CTRL; i++) mt7530_rmw(priv, MT7530_TRGMII_RD(i), RD_TAP_MASK, RD_TAP(16)); - else - if (priv->id != ID_MT7621) - mt7623_trgmii_set(priv, GSW_INTF_MODE, - INTF_MODE_TRGMII); - - return 0; -} - -static int -mt7623_pad_clk_setup(struct dsa_switch *ds) -{ - struct mt7530_priv *priv = ds->priv; - int i; - - for (i = 0 ; i < NUM_TRGMII_CTRL; i++) - mt7623_trgmii_write(priv, GSW_TRGMII_TD_ODT(i), - 
TD_DM_DRVP(8) | TD_DM_DRVN(8)); - - mt7623_trgmii_set(priv, GSW_TRGMII_RCK_CTRL, RX_RST | RXC_DQSISEL); - mt7623_trgmii_clear(priv, GSW_TRGMII_RCK_CTRL, RX_RST); - return 0; } @@ -1303,10 +1230,6 @@ mt7530_setup(struct dsa_switch *ds) dn = dsa_to_port(ds, MT7530_CPU_PORT)->master->dev.of_node->parent; if (priv->id == ID_MT7530) { - priv->ethernet = syscon_node_to_regmap(dn); - if (IS_ERR(priv->ethernet)) - return PTR_ERR(priv->ethernet); - regulator_set_voltage(priv->core_pwr, 1000000, 1000000); ret = regulator_enable(priv->core_pwr); if (ret < 0) { @@ -1468,14 +1391,6 @@ static void mt7530_phylink_mac_config(struct dsa_switch *ds, int port, /* Setup TX circuit incluing relevant PAD and driving */ mt7530_pad_clk_setup(ds, state->interface); - if (priv->id == ID_MT7530) { - /* Setup RX circuit, relevant PAD and driving on the - * host which must be placed after the setup on the - * device side is all finished. - */ - mt7623_pad_clk_setup(ds); - } - priv->p6_interface = state->interface; break; default: diff --git a/drivers/net/dsa/mt7530.h b/drivers/net/dsa/mt7530.h index ef9b52f3152b..4aef6024441b 100644 --- a/drivers/net/dsa/mt7530.h +++ b/drivers/net/dsa/mt7530.h @@ -277,7 +277,6 @@ enum mt7530_vlan_port_attr { /* Registers for TRGMII on the both side */ #define MT7530_TRGMII_RCK_CTRL 0x7a00 -#define GSW_TRGMII_RCK_CTRL 0x300 #define RX_RST BIT(31) #define RXC_DQSISEL BIT(30) #define DQSI1_TAP_MASK (0x7f << 8) @@ -286,31 +285,24 @@ enum mt7530_vlan_port_attr { #define DQSI0_TAP(x) ((x) & 0x7f) #define MT7530_TRGMII_RCK_RTT 0x7a04 -#define GSW_TRGMII_RCK_RTT 0x304 #define DQS1_GATE BIT(31) #define DQS0_GATE BIT(30) #define MT7530_TRGMII_RD(x) (0x7a10 + (x) * 8) -#define GSW_TRGMII_RD(x) (0x310 + (x) * 8) #define BSLIP_EN BIT(31) #define EDGE_CHK BIT(30) #define RD_TAP_MASK 0x7f #define RD_TAP(x) ((x) & 0x7f) -#define GSW_TRGMII_TXCTRL 0x340 #define MT7530_TRGMII_TXCTRL 0x7a40 #define TRAIN_TXEN BIT(31) #define TXC_INV BIT(30) #define TX_RST BIT(28) #define 
MT7530_TRGMII_TD_ODT(i) (0x7a54 + 8 * (i)) -#define GSW_TRGMII_TD_ODT(i) (0x354 + 8 * (i)) #define TD_DM_DRVP(x) ((x) & 0xf) #define TD_DM_DRVN(x) (((x) & 0xf) << 4) -#define GSW_INTF_MODE 0x390 -#define INTF_MODE_TRGMII BIT(1) - #define MT7530_TRGMII_TCK_CTRL 0x7a78 #define TCK_TAP(x) (((x) & 0xf) << 8) @@ -443,7 +435,6 @@ static const char *p5_intf_modes(unsigned int p5_interface) * @ds: The pointer to the dsa core structure * @bus: The bus used for the device and built-in PHY * @rstc: The pointer to reset control used by MCM - * @ethernet: The regmap used for access TRGMII-based registers * @core_pwr: The power supplied into the core * @io_pwr: The power supplied into the I/O * @reset: The descriptor for GPIO line tied to its reset pin @@ -460,7 +451,6 @@ struct mt7530_priv { struct dsa_switch *ds; struct mii_bus *bus; struct reset_control *rstc; - struct regmap *ethernet; struct regulator *core_pwr; struct regulator *io_pwr; struct gpio_desc *reset; From a5d75538295b06bc6ade1b9da07b9bee57d1c677 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ren=C3=A9=20van=20Dorst?= Date: Mon, 6 Apr 2020 05:42:54 +0800 Subject: [PATCH 032/331] net: ethernet: mediatek: move mt7623 settings out off the mt7530 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Moving the mt7623 logic out of mt7530 is required to keep the hardware settings consistent after phylink was introduced to the mtk driver. Fixes: b8fc9f30821e ("net: ethernet: mediatek: Add basic PHYLINK support") Reviewed-by: Sean Wang Tested-by: Sean Wang Signed-off-by: René van Dorst Signed-off-by: David S.
Miller --- drivers/net/ethernet/mediatek/mtk_eth_soc.c | 24 ++++++++++++++++++++- drivers/net/ethernet/mediatek/mtk_eth_soc.h | 8 +++++++ 2 files changed, 31 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c index 8d28f90acfe7..09047109d0da 100644 --- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c +++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c @@ -65,6 +65,17 @@ u32 mtk_r32(struct mtk_eth *eth, unsigned reg) return __raw_readl(eth->base + reg); } +u32 mtk_m32(struct mtk_eth *eth, u32 mask, u32 set, unsigned reg) +{ + u32 val; + + val = mtk_r32(eth, reg); + val &= ~mask; + val |= set; + mtk_w32(eth, val, reg); + return reg; +} + static int mtk_mdio_busy_wait(struct mtk_eth *eth) { unsigned long t_start = jiffies; @@ -193,7 +204,7 @@ static void mtk_mac_config(struct phylink_config *config, unsigned int mode, struct mtk_mac *mac = container_of(config, struct mtk_mac, phylink_config); struct mtk_eth *eth = mac->hw; - u32 mcr_cur, mcr_new, sid; + u32 mcr_cur, mcr_new, sid, i; int val, ge_mode, err; /* MT76x8 has no hardware settings between for the MAC */ @@ -255,6 +266,17 @@ static void mtk_mac_config(struct phylink_config *config, unsigned int mode, PHY_INTERFACE_MODE_TRGMII) mtk_gmac0_rgmii_adjust(mac->hw, state->speed); + + /* mt7623_pad_clk_setup */ + for (i = 0 ; i < NUM_TRGMII_CTRL; i++) + mtk_w32(mac->hw, + TD_DM_DRVP(8) | TD_DM_DRVN(8), + TRGMII_TD_ODT(i)); + + /* Assert/release MT7623 RXC reset */ + mtk_m32(mac->hw, 0, RXC_RST | RXC_DQSISEL, + TRGMII_RCK_CTRL); + mtk_m32(mac->hw, RXC_RST, 0, TRGMII_RCK_CTRL); } } diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.h b/drivers/net/ethernet/mediatek/mtk_eth_soc.h index 85830fe14a1b..454cfcd465fd 100644 --- a/drivers/net/ethernet/mediatek/mtk_eth_soc.h +++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.h @@ -352,10 +352,13 @@ #define DQSI0(x) ((x << 0) & GENMASK(6, 0)) #define DQSI1(x) ((x << 8) & GENMASK(14, 8)) #define 
RXCTL_DMWTLAT(x) ((x << 16) & GENMASK(18, 16)) +#define RXC_RST BIT(31) #define RXC_DQSISEL BIT(30) #define RCK_CTRL_RGMII_1000 (RXC_DQSISEL | RXCTL_DMWTLAT(2) | DQSI1(16)) #define RCK_CTRL_RGMII_10_100 RXCTL_DMWTLAT(2) +#define NUM_TRGMII_CTRL 5 + /* TRGMII RXC control register */ #define TRGMII_TCK_CTRL 0x10340 #define TXCTL_DMWTLAT(x) ((x << 16) & GENMASK(18, 16)) @@ -363,6 +366,11 @@ #define TCK_CTRL_RGMII_1000 TXCTL_DMWTLAT(2) #define TCK_CTRL_RGMII_10_100 (TXC_INV | TXCTL_DMWTLAT(2)) +/* TRGMII TX Drive Strength */ +#define TRGMII_TD_ODT(i) (0x10354 + 8 * (i)) +#define TD_DM_DRVP(x) ((x) & 0xf) +#define TD_DM_DRVN(x) (((x) & 0xf) << 4) + /* TRGMII Interface mode register */ #define INTF_MODE 0x10390 #define TRGMII_INTF_DIS BIT(0) From a4837980fd9fa4c70a821d11831698901baef56b Mon Sep 17 00:00:00 2001 From: Konstantin Khlebnikov Date: Mon, 6 Apr 2020 14:39:32 +0300 Subject: [PATCH 033/331] net: revert default NAPI poll timeout to 2 jiffies For HZ < 1000, the 2000us timeout rounds up to 1 jiffy, but it expires at an unpredictable point because the next timer interrupt could arrive shortly after the softirq starts. For the commonly used CONFIG_HZ=1000 nothing changes. Fixes: 7acf8a1e8a28 ("Replace 2 jiffies with sysctl netdev_budget_usecs to enable softirq tuning") Reported-by: Dmitry Yakunin Signed-off-by: Konstantin Khlebnikov Signed-off-by: David S.
Miller --- net/core/dev.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/core/dev.c b/net/core/dev.c index 9c9e763bfe0e..df8097b8e286 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4140,7 +4140,8 @@ EXPORT_SYMBOL(netdev_max_backlog); int netdev_tstamp_prequeue __read_mostly = 1; int netdev_budget __read_mostly = 300; -unsigned int __read_mostly netdev_budget_usecs = 2000; +/* Must be at least 2 jiffes to guarantee 1 jiffy timeout */ +unsigned int __read_mostly netdev_budget_usecs = 2 * USEC_PER_SEC / HZ; int weight_p __read_mostly = 64; /* old backlog weight */ int dev_weight_rx_bias __read_mostly = 1; /* bias for backlog weight */ int dev_weight_tx_bias __read_mostly = 1; /* bias for output_queue quota */ From a080da6ac7fa13282f1be8705cc67ceacd999ac3 Mon Sep 17 00:00:00 2001 From: Paul Blakey Date: Mon, 6 Apr 2020 18:36:56 +0300 Subject: [PATCH 034/331] net: sched: Fix setting last executed chain on skb extension After the driver sets the missed chain on the tc skb extension, it is consumed (deleted) by tcf_classify_ingress() and tc jumps to that chain. If tc now misses on this chain (either no match, or no goto action), the last executed chain remains 0, the skb extension is not re-added, and the next datapath (ovs) will start from chain 0. Fix that by setting the last executed chain to the chain read from the skb extension, so if there is a miss, we set it back. Fixes: af699626ee26 ("net: sched: Support specifying a starting chain via tc skb ext") Reviewed-by: Oz Shlomo Reviewed-by: Jiri Pirko Signed-off-by: Paul Blakey Signed-off-by: Saeed Mahameed Signed-off-by: David S.
Miller --- net/sched/cls_api.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index f6a3b969ead0..55bd1429678f 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -1667,6 +1667,7 @@ int tcf_classify_ingress(struct sk_buff *skb, skb_ext_del(skb, TC_SKB_EXT); tp = rcu_dereference_bh(fchain->filter_chain); + last_executed_chain = fchain->index; } ret = __tcf_classify(skb, tp, orig_tp, res, compat_mode, From ab74110205543a8d57eff64c6af64235aa23c09b Mon Sep 17 00:00:00 2001 From: Lothar Rubusch Date: Mon, 6 Apr 2020 21:29:20 +0000 Subject: [PATCH 035/331] Documentation: mdio_bus.c - fix warnings Fix wrong parameter description and related warnings at 'make htmldocs'. Signed-off-by: Lothar Rubusch Reviewed-by: Florian Fainelli Acked-by: Randy Dunlap Tested-by: Randy Dunlap Signed-off-by: David S. Miller --- drivers/net/phy/mdio_bus.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c index 522760c8bca6..7a4eb3f2cb74 100644 --- a/drivers/net/phy/mdio_bus.c +++ b/drivers/net/phy/mdio_bus.c @@ -464,7 +464,7 @@ static struct class mdio_bus_class = { /** * mdio_find_bus - Given the name of a mdiobus, find the mii_bus. - * @mdio_bus_np: Pointer to the mii_bus. + * @mdio_name: The name of a mdiobus. * * Returns a reference to the mii_bus, or NULL if none found. The * embedded struct device will have its reference count incremented, From 4faab8c446def7667adf1f722456c2f4c304069c Mon Sep 17 00:00:00 2001 From: Taehee Yoo Date: Tue, 7 Apr 2020 13:23:21 +0000 Subject: [PATCH 036/331] hsr: check protocol version in hsr_newlink() In the current hsr code, only protocol versions 0 and 1 are valid. But the current hsr code doesn't check the version, which is received from userspace. Test commands: ip link add dummy0 type dummy ip link add dummy1 type dummy ip link add hsr0 type hsr slave1 dummy0 slave2 dummy1 version 4 In the test commands, version 4 is invalid.
So the command should fail. After this patch, the following error is reported: "Error: hsr: Only versions 0..1 are supported." Fixes: ee1c27977284 ("net/hsr: Added support for HSR v1") Signed-off-by: Taehee Yoo Signed-off-by: David S. Miller --- net/hsr/hsr_netlink.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/net/hsr/hsr_netlink.c b/net/hsr/hsr_netlink.c index 5465a395da04..1decb25f6764 100644 --- a/net/hsr/hsr_netlink.c +++ b/net/hsr/hsr_netlink.c @@ -69,10 +69,16 @@ static int hsr_newlink(struct net *src_net, struct net_device *dev, else multicast_spec = nla_get_u8(data[IFLA_HSR_MULTICAST_SPEC]); - if (!data[IFLA_HSR_VERSION]) + if (!data[IFLA_HSR_VERSION]) { hsr_version = 0; - else + } else { hsr_version = nla_get_u8(data[IFLA_HSR_VERSION]); + if (hsr_version > 1) { + NL_SET_ERR_MSG_MOD(extack, + "Only versions 0..1 are supported"); + return -EINVAL; + } + } return hsr_dev_finalize(dev, link, multicast_spec, hsr_version, extack); } From cb9533d1c683219bc982905046c05e24bfaf4996 Mon Sep 17 00:00:00 2001 From: Roman Mashak Date: Tue, 7 Apr 2020 13:13:25 -0400 Subject: [PATCH 037/331] tc-testing: remove duplicate code in tdc.py In set_operation_mode() function remove duplicated check for args.list parameter, which is already done one line before. Signed-off-by: Roman Mashak Signed-off-by: David S.
Miller --- tools/testing/selftests/tc-testing/tdc.py | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/tools/testing/selftests/tc-testing/tdc.py b/tools/testing/selftests/tc-testing/tdc.py index e566c70e64a1..a3e43189d940 100755 --- a/tools/testing/selftests/tc-testing/tdc.py +++ b/tools/testing/selftests/tc-testing/tdc.py @@ -713,9 +713,8 @@ def set_operation_mode(pm, parser, args, remaining): exit(0) if args.list: - if args.list: - list_test_cases(alltests) - exit(0) + list_test_cases(alltests) + exit(0) if len(alltests): req_plugins = pm.get_required_plugins(alltests) From ea104a9e4d3e9ebc26fb78dac35585b142ee288b Mon Sep 17 00:00:00 2001 From: Michael Walle Date: Fri, 27 Mar 2020 17:24:50 +0100 Subject: [PATCH 038/331] watchdog: sp805: fix restart handler The restart handler is missing two things: first, the registers have to be unlocked, and second, there is no synchronization for the writel_relaxed() calls. This was tested on a custom board with the NXP LS1028A SoC. Fixes: 6c5c0d48b686c ("watchdog: sp805: add restart handler") Signed-off-by: Michael Walle Reviewed-by: Guenter Roeck Link: https://lore.kernel.org/r/20200327162450.28506-1-michael@walle.cc Signed-off-by: Guenter Roeck Signed-off-by: Wim Van Sebroeck --- drivers/watchdog/sp805_wdt.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/watchdog/sp805_wdt.c b/drivers/watchdog/sp805_wdt.c index 53e04926a7b2..190d26e2e75f 100644 --- a/drivers/watchdog/sp805_wdt.c +++ b/drivers/watchdog/sp805_wdt.c @@ -137,10 +137,14 @@ wdt_restart(struct watchdog_device *wdd, unsigned long mode, void *cmd) { struct sp805_wdt *wdt = watchdog_get_drvdata(wdd); + writel_relaxed(UNLOCK, wdt->base + WDTLOCK); writel_relaxed(0, wdt->base + WDTCONTROL); writel_relaxed(0, wdt->base + WDTLOAD); writel_relaxed(INT_ENABLE | RESET_ENABLE, wdt->base + WDTCONTROL); + /* Flush posted writes.
*/ + readl_relaxed(wdt->base + WDTLOCK); + return 0; } From 4d4225fc228e46948486d8b8207955f0c031b92e Mon Sep 17 00:00:00 2001 From: Josef Bacik Date: Thu, 2 Apr 2020 15:51:18 -0400 Subject: [PATCH 039/331] btrfs: check commit root generation in should_ignore_root Previously we would set the reloc root's last snapshot to transid - 1. However there was a problem with doing this, and we changed it to setting the last snapshot to the generation of the commit node of the fs root. This however broke should_ignore_root(). The assumption is that if we are in a generation newer than when the reloc root was created, then we would find the reloc root through normal backref lookups, and thus can ignore any fs roots we find with an old enough reloc root. Now that the last snapshot could be considerably further in the past than before, we'd end up incorrectly ignoring an fs root. Thus we'd find no nodes for the bytenr we were searching for, and we'd fail to relocate anything. We'd loop through the relocate code again and see that there was still used space in that block group, attempt to relocate those bytenrs again, fail in the same way, and just loop like this forever. This is tricky in that we have to not modify the fs root at all during this time, so we need to have a block group that has data in this fs root that is not shared by any other root, which is why this has been difficult to reproduce.
Fixes: 054570a1dc94 ("Btrfs: fix relocation incorrectly dropping data references") CC: stable@vger.kernel.org # 4.9+ Reviewed-by: Filipe Manana Signed-off-by: Josef Bacik Signed-off-by: David Sterba --- fs/btrfs/relocation.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index f65595602aa8..7e362a6935fd 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -611,8 +611,8 @@ static int should_ignore_root(struct btrfs_root *root) if (!reloc_root) return 0; - if (btrfs_root_last_snapshot(&reloc_root->root_item) == - root->fs_info->running_transaction->transid - 1) + if (btrfs_header_generation(reloc_root->commit_root) == + root->fs_info->running_transaction->transid) return 0; /* * if there is reloc tree and it was created in previous From 4fdb688c7071f8d5acca0f1f340ea276e9a61dce Mon Sep 17 00:00:00 2001 From: Filipe Manana Date: Sat, 4 Apr 2020 21:20:22 +0100 Subject: [PATCH 040/331] btrfs: fix lost i_size update after cloning inline extent When not using the NO_HOLES feature we were not marking the destination's file range as written after cloning an inline extent into it. This can lead to data loss if the current destination file size is smaller than the source file's size. Example: $ mkfs.btrfs -f -O ^no-holes /dev/sdc $ mount /dev/sdc /mnt $ echo "hello world" > /mnt/foo $ cp --reflink=always /mnt/foo /mnt/bar $ rm -f /mnt/foo $ umount /mnt $ mount /dev/sdc /mnt $ cat /mnt/bar $ $ stat -c %s /mnt/bar 0 # -> the file is empty; since we deleted foo, the data is lost forever Fix that by calling btrfs_inode_set_file_extent_range() after cloning an inline extent. A test case for fstests will follow soon.
Link: https://lore.kernel.org/linux-btrfs/20200404193846.GA432065@latitude/ Reported-by: Johannes Hirte Fixes: 9ddc959e802bf ("btrfs: use the file extent tree infrastructure") Tested-by: Johannes Hirte Signed-off-by: Filipe Manana Signed-off-by: David Sterba --- fs/btrfs/reflink.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/btrfs/reflink.c b/fs/btrfs/reflink.c index d1973141d3bb..040009d1cc31 100644 --- a/fs/btrfs/reflink.c +++ b/fs/btrfs/reflink.c @@ -264,6 +264,7 @@ copy_inline_extent: size); inode_add_bytes(dst, datal); set_bit(BTRFS_INODE_NEEDS_FULL_SYNC, &BTRFS_I(dst)->runtime_flags); + ret = btrfs_inode_set_file_extent_range(BTRFS_I(dst), 0, aligned_end); out: if (!ret && !trans) { /* From 7af597433d435b56b7c5c8260dad6f979153957b Mon Sep 17 00:00:00 2001 From: Filipe Manana Date: Tue, 7 Apr 2020 11:37:44 +0100 Subject: [PATCH 041/331] btrfs: make full fsyncs always operate on the entire file again This is a revert of commit 0a8068a3dd4294 ("btrfs: make ranged full fsyncs more efficient"), with updated comment in btrfs_sync_file. Commit 0a8068a3dd4294 ("btrfs: make ranged full fsyncs more efficient") made full fsyncs operate on the given range only as it assumed it was safe when using the NO_HOLES feature, since the hole detection was simplified some time ago and no longer was a source for races with ordered extent completion of adjacent file ranges. However it's still not safe to have a full fsync only operate on the given range, because extent maps for new extents might not be present in memory due to inode eviction or extent cloning. Consider the following example: 1) We are currently at transaction N; 2) We write to the file range [0, 1MiB); 3) Writeback finishes for the whole range and ordered extents complete, while we are still at transaction N; 4) The inode is evicted; 5) We open the file for writing, causing the inode to be loaded to memory again, which sets the 'full sync' bit on its flags. 
At this point the inode's list of modified extent maps is empty (figuring out which extents were created in the current transaction and were not yet logged by an fsync is expensive, that's why we set the 'full sync' bit when loading an inode); 6) We write to the file range [512KiB, 768KiB); 7) We do a ranged fsync (such as msync()) for file range [512KiB, 768KiB). This correctly flushes this range and logs its extent into the log tree. When the writeback started, an extent map for range [512KiB, 768KiB) was added to the inode's list of modified extents, and when the fsync() finishes logging it removes that extent map from the list of modified extent maps. This fsync also clears the 'full sync' bit; 8) We do a regular fsync() (full ranged). This fsync() ends up doing nothing because the inode's list of modified extents is empty and no other changes happened since the previous ranged fsync(), so it just returns success (0) and we end up never logging extents for the file ranges [0, 512KiB) and [768KiB, 1MiB). Another scenario where this can happen is if we replace steps 2 to 4 with cloning from another file into our test file, as that sets the 'full sync' bit in our inode's flags and does not populate its list of modified extent maps. This was causing test case generic/457 to fail sporadically when using the NO_HOLES feature, as it exercised this latter case where the inode has the 'full sync' bit set and has no extent maps in memory to represent the new extents due to extent cloning. Fix this by reverting commit 0a8068a3dd4294 ("btrfs: make ranged full fsyncs more efficient") since there is no easy way to work around it.
Fixes: 0a8068a3dd4294 ("btrfs: make ranged full fsyncs more efficient") Signed-off-by: Filipe Manana Signed-off-by: David Sterba --- fs/btrfs/file.c | 15 ++++++++ fs/btrfs/tree-log.c | 93 +++++++-------------------------------------- 2 files changed, 29 insertions(+), 79 deletions(-) diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index 8a144f9cb7ac..719e68ab552c 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -2097,6 +2097,21 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync) atomic_inc(&root->log_batch); + /* + * If the inode needs a full sync, make sure we use a full range to + * avoid log tree corruption, due to hole detection racing with ordered + * extent completion for adjacent ranges and races between logging and + * completion of ordered extents for adjancent ranges - both races + * could lead to file extent items in the log with overlapping ranges. + * Do this while holding the inode lock, to avoid races with other + * tasks. + */ + if (test_bit(BTRFS_INODE_NEEDS_FULL_SYNC, + &BTRFS_I(inode)->runtime_flags)) { + start = 0; + end = LLONG_MAX; + } + /* * Before we acquired the inode's lock, someone may have dirtied more * pages in the target range. 
We need to make sure that writeback for diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c index 58c111474ba5..ec36a7c6ba3d 100644 --- a/fs/btrfs/tree-log.c +++ b/fs/btrfs/tree-log.c @@ -96,8 +96,8 @@ enum { static int btrfs_log_inode(struct btrfs_trans_handle *trans, struct btrfs_root *root, struct btrfs_inode *inode, int inode_only, - u64 start, - u64 end, + const loff_t start, + const loff_t end, struct btrfs_log_ctx *ctx); static int link_to_fixup_dir(struct btrfs_trans_handle *trans, struct btrfs_root *root, @@ -4533,15 +4533,13 @@ static int btrfs_log_all_xattrs(struct btrfs_trans_handle *trans, static int btrfs_log_holes(struct btrfs_trans_handle *trans, struct btrfs_root *root, struct btrfs_inode *inode, - struct btrfs_path *path, - const u64 start, - const u64 end) + struct btrfs_path *path) { struct btrfs_fs_info *fs_info = root->fs_info; struct btrfs_key key; const u64 ino = btrfs_ino(inode); const u64 i_size = i_size_read(&inode->vfs_inode); - u64 prev_extent_end = start; + u64 prev_extent_end = 0; int ret; if (!btrfs_fs_incompat(fs_info, NO_HOLES) || i_size == 0) @@ -4549,21 +4547,14 @@ static int btrfs_log_holes(struct btrfs_trans_handle *trans, key.objectid = ino; key.type = BTRFS_EXTENT_DATA_KEY; - key.offset = start; + key.offset = 0; ret = btrfs_search_slot(NULL, root, &key, path, 0, 0); if (ret < 0) return ret; - if (ret > 0 && path->slots[0] > 0) { - btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0] - 1); - if (key.objectid == ino && key.type == BTRFS_EXTENT_DATA_KEY) - path->slots[0]--; - } - while (true) { struct extent_buffer *leaf = path->nodes[0]; - u64 extent_end; if (path->slots[0] >= btrfs_header_nritems(path->nodes[0])) { ret = btrfs_next_leaf(root, path); @@ -4580,18 +4571,9 @@ static int btrfs_log_holes(struct btrfs_trans_handle *trans, if (key.objectid != ino || key.type != BTRFS_EXTENT_DATA_KEY) break; - extent_end = btrfs_file_extent_end(path); - if (extent_end <= start) - goto next_slot; - /* We have a hole, log it. 
*/ if (prev_extent_end < key.offset) { - u64 hole_len; - - if (key.offset >= end) - hole_len = end - prev_extent_end; - else - hole_len = key.offset - prev_extent_end; + const u64 hole_len = key.offset - prev_extent_end; /* * Release the path to avoid deadlocks with other code @@ -4621,20 +4603,16 @@ static int btrfs_log_holes(struct btrfs_trans_handle *trans, leaf = path->nodes[0]; } - prev_extent_end = min(extent_end, end); - if (extent_end >= end) - break; -next_slot: + prev_extent_end = btrfs_file_extent_end(path); path->slots[0]++; cond_resched(); } - if (prev_extent_end < end && prev_extent_end < i_size) { + if (prev_extent_end < i_size) { u64 hole_len; btrfs_release_path(path); - hole_len = min(ALIGN(i_size, fs_info->sectorsize), end); - hole_len -= prev_extent_end; + hole_len = ALIGN(i_size - prev_extent_end, fs_info->sectorsize); ret = btrfs_insert_file_extent(trans, root->log_root, ino, prev_extent_end, 0, 0, hole_len, 0, hole_len, @@ -4971,8 +4949,6 @@ static int copy_inode_items_to_log(struct btrfs_trans_handle *trans, const u64 logged_isize, const bool recursive_logging, const int inode_only, - const u64 start, - const u64 end, struct btrfs_log_ctx *ctx, bool *need_log_inode_item) { @@ -4981,21 +4957,6 @@ static int copy_inode_items_to_log(struct btrfs_trans_handle *trans, int ins_nr = 0; int ret; - /* - * We must make sure we don't copy extent items that are entirely out of - * the range [start, end - 1]. This is not just an optimization to avoid - * copying but also needed to avoid a corruption where we end up with - * file extent items in the log tree that have overlapping ranges - this - * can happen if we race with ordered extent completion for ranges that - * are outside our target range. 
For example we copy an extent item and - * when we move to the next leaf, that extent was trimmed and a new one - * covering a subrange of it, but with a higher key, was inserted - we - * would then copy this other extent too, resulting in a log tree with - * 2 extent items that represent overlapping ranges. - * - * We can copy the entire extents at the range bondaries however, even - * if they cover an area outside the target range. That's ok. - */ while (1) { ret = btrfs_search_forward(root, min_key, path, trans->transid); if (ret < 0) @@ -5063,29 +5024,6 @@ again: goto next_slot; } - if (min_key->type == BTRFS_EXTENT_DATA_KEY) { - const u64 extent_end = btrfs_file_extent_end(path); - - if (extent_end <= start) { - if (ins_nr > 0) { - ret = copy_items(trans, inode, dst_path, - path, ins_start_slot, - ins_nr, inode_only, - logged_isize); - if (ret < 0) - return ret; - ins_nr = 0; - } - goto next_slot; - } - if (extent_end >= end) { - ins_nr++; - if (ins_nr == 1) - ins_start_slot = path->slots[0]; - break; - } - } - if (ins_nr && ins_start_slot + ins_nr == path->slots[0]) { ins_nr++; goto next_slot; @@ -5151,8 +5089,8 @@ next_key: static int btrfs_log_inode(struct btrfs_trans_handle *trans, struct btrfs_root *root, struct btrfs_inode *inode, int inode_only, - u64 start, - u64 end, + const loff_t start, + const loff_t end, struct btrfs_log_ctx *ctx) { struct btrfs_fs_info *fs_info = root->fs_info; @@ -5180,9 +5118,6 @@ static int btrfs_log_inode(struct btrfs_trans_handle *trans, return -ENOMEM; } - start = ALIGN_DOWN(start, fs_info->sectorsize); - end = ALIGN(end, fs_info->sectorsize); - min_key.objectid = ino; min_key.type = BTRFS_INODE_ITEM_KEY; min_key.offset = 0; @@ -5298,8 +5233,8 @@ static int btrfs_log_inode(struct btrfs_trans_handle *trans, err = copy_inode_items_to_log(trans, inode, &min_key, &max_key, path, dst_path, logged_isize, - recursive_logging, inode_only, - start, end, ctx, &need_log_inode_item); + recursive_logging, inode_only, ctx, + 
&need_log_inode_item); if (err) goto out_unlock; @@ -5312,7 +5247,7 @@ static int btrfs_log_inode(struct btrfs_trans_handle *trans, if (max_key.type >= BTRFS_EXTENT_DATA_KEY && !fast_search) { btrfs_release_path(path); btrfs_release_path(dst_path); - err = btrfs_log_holes(trans, root, inode, path, start, end); + err = btrfs_log_holes(trans, root, inode, path); if (err) goto out_unlock; } From d611add48b717ae34f59e0f474bfa7a7d840c4c4 Mon Sep 17 00:00:00 2001 From: Filipe Manana Date: Tue, 7 Apr 2020 11:38:49 +0100 Subject: [PATCH 042/331] btrfs: fix reclaim counter leak of space_info objects Whenever we add a ticket to a space_info object we increment the object's reclaim_size counter by the ticket's bytes, and we decrement it by the corresponding amount only when we are able to grant the requested space to the ticket. When we are not able to grant the space to a ticket, or when the ticket is removed due to a signal (e.g. an application has received SIGTERM from the terminal), we never decrement the counter by the ticket's bytes. This leak can cause the space reclaim code to later do much more work than necessary. So fix it by decrementing the counter when those two cases happen as well.
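The accounting rule described above can be illustrated with a userspace sketch. It is not the kernel code: the toy_* types and functions are hypothetical, the "queued" flag stands in for the ticket being linked on a list, and the plain assert() stands in for the kernel's ASSERT(). It only shows why a failure path that dequeues a ticket without adjusting the counter leaves reclaim_size too large.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the ticket and space_info objects. */
struct toy_ticket {
	uint64_t bytes;
	bool queued; /* models the ticket still being on a list */
	int error;
};

struct toy_space_info {
	uint64_t reclaim_size;
};

/* Adding a ticket accounts its bytes in the reclaim counter. */
void toy_add_ticket(struct toy_space_info *si, struct toy_ticket *t,
		    uint64_t bytes)
{
	t->bytes = bytes;
	t->queued = true;
	t->error = 0;
	si->reclaim_size += bytes;
}

/*
 * Single removal helper: dequeue and undo the accounting together, and
 * do nothing if the ticket was already removed by someone else.
 */
void toy_remove_ticket(struct toy_space_info *si, struct toy_ticket *t)
{
	if (t->queued) {
		t->queued = false;
		assert(si->reclaim_size >= t->bytes);
		si->reclaim_size -= t->bytes;
	}
}

/* Failure path that only dequeues: the counter keeps the bytes. */
void toy_fail_ticket_leaky(struct toy_space_info *si, struct toy_ticket *t)
{
	(void)si;
	t->queued = false;
	t->error = -ENOSPC;
}

/* Failure path that goes through the helper: the counter is restored. */
void toy_fail_ticket_fixed(struct toy_space_info *si, struct toy_ticket *t)
{
	toy_remove_ticket(si, t);
	t->error = -ENOSPC;
}
```

Routing every removal through one helper also makes a second removal of the same ticket harmless, since the counter is only touched while the ticket is still queued.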
Fixes: db161806dc5615 ("btrfs: account ticket size at add/delete time") Reviewed-by: Nikolay Borisov Signed-off-by: Filipe Manana Signed-off-by: David Sterba --- fs/btrfs/block-group.c | 1 + fs/btrfs/space-info.c | 20 ++++++++++++++------ 2 files changed, 15 insertions(+), 6 deletions(-) diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index 786849fcc319..47f66c6a7d7f 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -3370,6 +3370,7 @@ int btrfs_free_block_groups(struct btrfs_fs_info *info) space_info->bytes_reserved > 0 || space_info->bytes_may_use > 0)) btrfs_dump_space_info(info, space_info, 0, 0); + WARN_ON(space_info->reclaim_size > 0); list_del(&space_info->list); btrfs_sysfs_remove_space_info(space_info); } diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c index 8b0fe053a25d..ff17a4420358 100644 --- a/fs/btrfs/space-info.c +++ b/fs/btrfs/space-info.c @@ -361,6 +361,16 @@ int btrfs_can_overcommit(struct btrfs_fs_info *fs_info, return 0; } +static void remove_ticket(struct btrfs_space_info *space_info, + struct reserve_ticket *ticket) +{ + if (!list_empty(&ticket->list)) { + list_del_init(&ticket->list); + ASSERT(space_info->reclaim_size >= ticket->bytes); + space_info->reclaim_size -= ticket->bytes; + } +} + /* * This is for space we already have accounted in space_info->bytes_may_use, so * basically when we're returning space from block_rsv's. 
@@ -388,9 +398,7 @@ again: btrfs_space_info_update_bytes_may_use(fs_info, space_info, ticket->bytes); - list_del_init(&ticket->list); - ASSERT(space_info->reclaim_size >= ticket->bytes); - space_info->reclaim_size -= ticket->bytes; + remove_ticket(space_info, ticket); ticket->bytes = 0; space_info->tickets_id++; wake_up(&ticket->wait); @@ -899,7 +907,7 @@ static bool maybe_fail_all_tickets(struct btrfs_fs_info *fs_info, btrfs_info(fs_info, "failing ticket with %llu bytes", ticket->bytes); - list_del_init(&ticket->list); + remove_ticket(space_info, ticket); ticket->error = -ENOSPC; wake_up(&ticket->wait); @@ -1063,7 +1071,7 @@ static void wait_reserve_ticket(struct btrfs_fs_info *fs_info, * despite getting an error, resulting in a space leak * (bytes_may_use counter of our space_info). */ - list_del_init(&ticket->list); + remove_ticket(space_info, ticket); ticket->error = -EINTR; break; } @@ -1121,7 +1129,7 @@ static int handle_reserve_ticket(struct btrfs_fs_info *fs_info, * either the async reclaim job deletes the ticket from the list * or we delete it ourselves at wait_reserve_ticket(). */ - list_del_init(&ticket->list); + remove_ticket(space_info, ticket); if (!ret) ret = -ENOSPC; } From 7e4d47596b686bb2714d05f8774ada884ec8983d Mon Sep 17 00:00:00 2001 From: Shannon Nelson Date: Wed, 8 Apr 2020 09:19:11 -0700 Subject: [PATCH 043/331] ionic: replay filters after fw upgrade The NIC's filters are lost in the midst of the fw-upgrade so we need to replay them into the FW. We also remove the unused ionic_rx_filter_del() function. Fixes: c672412f6172 ("ionic: remove lifs on fw reset") Signed-off-by: Shannon Nelson Signed-off-by: David S. 
Miller --- .../net/ethernet/pensando/ionic/ionic_lif.c | 12 +++-- .../ethernet/pensando/ionic/ionic_rx_filter.c | 50 +++++++++++++++---- .../ethernet/pensando/ionic/ionic_rx_filter.h | 2 +- 3 files changed, 50 insertions(+), 14 deletions(-) diff --git a/drivers/net/ethernet/pensando/ionic/ionic_lif.c b/drivers/net/ethernet/pensando/ionic/ionic_lif.c index 4b8a76098ca3..f8f437aec027 100644 --- a/drivers/net/ethernet/pensando/ionic/ionic_lif.c +++ b/drivers/net/ethernet/pensando/ionic/ionic_lif.c @@ -2127,6 +2127,8 @@ static void ionic_lif_handle_fw_up(struct ionic_lif *lif) if (lif->registered) ionic_lif_set_netdev_info(lif); + ionic_rx_filter_replay(lif); + if (netif_running(lif->netdev)) { err = ionic_txrx_alloc(lif); if (err) @@ -2206,9 +2208,9 @@ static void ionic_lif_deinit(struct ionic_lif *lif) if (!test_bit(IONIC_LIF_F_FW_RESET, lif->state)) { cancel_work_sync(&lif->deferred.work); cancel_work_sync(&lif->tx_timeout_work); + ionic_rx_filters_deinit(lif); } - ionic_rx_filters_deinit(lif); if (lif->netdev->features & NETIF_F_RXHASH) ionic_lif_rss_deinit(lif); @@ -2421,9 +2423,11 @@ static int ionic_lif_init(struct ionic_lif *lif) if (err) goto err_out_notifyq_deinit; - err = ionic_rx_filters_init(lif); - if (err) - goto err_out_notifyq_deinit; + if (!test_bit(IONIC_LIF_F_FW_RESET, lif->state)) { + err = ionic_rx_filters_init(lif); + if (err) + goto err_out_notifyq_deinit; + } err = ionic_station_set(lif); if (err) diff --git a/drivers/net/ethernet/pensando/ionic/ionic_rx_filter.c b/drivers/net/ethernet/pensando/ionic/ionic_rx_filter.c index 7a093f148ee5..f3c7dd1596ee 100644 --- a/drivers/net/ethernet/pensando/ionic/ionic_rx_filter.c +++ b/drivers/net/ethernet/pensando/ionic/ionic_rx_filter.c @@ -17,17 +17,49 @@ void ionic_rx_filter_free(struct ionic_lif *lif, struct ionic_rx_filter *f) devm_kfree(dev, f); } -int ionic_rx_filter_del(struct ionic_lif *lif, struct ionic_rx_filter *f) +void ionic_rx_filter_replay(struct ionic_lif *lif) { - struct ionic_admin_ctx 
ctx = { - .work = COMPLETION_INITIALIZER_ONSTACK(ctx.work), - .cmd.rx_filter_del = { - .opcode = IONIC_CMD_RX_FILTER_DEL, - .filter_id = cpu_to_le32(f->filter_id), - }, - }; + struct ionic_rx_filter_add_cmd *ac; + struct ionic_admin_ctx ctx; + struct ionic_rx_filter *f; + struct hlist_head *head; + struct hlist_node *tmp; + unsigned int i; + int err = 0; - return ionic_adminq_post_wait(lif, &ctx); + ac = &ctx.cmd.rx_filter_add; + + for (i = 0; i < IONIC_RX_FILTER_HLISTS; i++) { + head = &lif->rx_filters.by_id[i]; + hlist_for_each_entry_safe(f, tmp, head, by_id) { + ctx.work = COMPLETION_INITIALIZER_ONSTACK(ctx.work); + memcpy(ac, &f->cmd, sizeof(f->cmd)); + dev_dbg(&lif->netdev->dev, "replay filter command:\n"); + dynamic_hex_dump("cmd ", DUMP_PREFIX_OFFSET, 16, 1, + &ctx.cmd, sizeof(ctx.cmd), true); + + err = ionic_adminq_post_wait(lif, &ctx); + if (err) { + switch (le16_to_cpu(ac->match)) { + case IONIC_RX_FILTER_MATCH_VLAN: + netdev_info(lif->netdev, "Replay failed - %d: vlan %d\n", + err, + le16_to_cpu(ac->vlan.vlan)); + break; + case IONIC_RX_FILTER_MATCH_MAC: + netdev_info(lif->netdev, "Replay failed - %d: mac %pM\n", + err, ac->mac.addr); + break; + case IONIC_RX_FILTER_MATCH_MAC_VLAN: + netdev_info(lif->netdev, "Replay failed - %d: vlan %d mac %pM\n", + err, + le16_to_cpu(ac->vlan.vlan), + ac->mac.addr); + break; + } + } + } + } } int ionic_rx_filters_init(struct ionic_lif *lif) diff --git a/drivers/net/ethernet/pensando/ionic/ionic_rx_filter.h b/drivers/net/ethernet/pensando/ionic/ionic_rx_filter.h index b6aec9c19918..cf8f4c0a961c 100644 --- a/drivers/net/ethernet/pensando/ionic/ionic_rx_filter.h +++ b/drivers/net/ethernet/pensando/ionic/ionic_rx_filter.h @@ -24,7 +24,7 @@ struct ionic_rx_filters { }; void ionic_rx_filter_free(struct ionic_lif *lif, struct ionic_rx_filter *f); -int ionic_rx_filter_del(struct ionic_lif *lif, struct ionic_rx_filter *f); +void ionic_rx_filter_replay(struct ionic_lif *lif); int ionic_rx_filters_init(struct ionic_lif *lif); 
void ionic_rx_filters_deinit(struct ionic_lif *lif); int ionic_rx_filter_save(struct ionic_lif *lif, u32 flow_id, u16 rxq_index, From 216902ae770e21e739cd3b530b0b3ab3c28641d3 Mon Sep 17 00:00:00 2001 From: Shannon Nelson Date: Wed, 8 Apr 2020 09:19:12 -0700 Subject: [PATCH 044/331] ionic: set station addr only if needed The code was working too hard and in some cases was trying to delete filters that weren't there, generating a potentially misleading error message. IONIC_CMD_RX_FILTER_DEL (32) failed: IONIC_RC_ENOENT (-2) Fixes: 2a654540be10 ("ionic: Add Rx filter and rx_mode ndo support") Signed-off-by: Shannon Nelson Signed-off-by: David S. Miller --- .../net/ethernet/pensando/ionic/ionic_lif.c | 32 +++++++++++-------- 1 file changed, 19 insertions(+), 13 deletions(-) diff --git a/drivers/net/ethernet/pensando/ionic/ionic_lif.c b/drivers/net/ethernet/pensando/ionic/ionic_lif.c index f8f437aec027..5acf4f46c268 100644 --- a/drivers/net/ethernet/pensando/ionic/ionic_lif.c +++ b/drivers/net/ethernet/pensando/ionic/ionic_lif.c @@ -2341,24 +2341,30 @@ static int ionic_station_set(struct ionic_lif *lif) err = ionic_adminq_post_wait(lif, &ctx); if (err) return err; - + netdev_dbg(lif->netdev, "found initial MAC addr %pM\n", + ctx.comp.lif_getattr.mac); if (is_zero_ether_addr(ctx.comp.lif_getattr.mac)) return 0; - memcpy(addr.sa_data, ctx.comp.lif_getattr.mac, netdev->addr_len); - addr.sa_family = AF_INET; - err = eth_prepare_mac_addr_change(netdev, &addr); - if (err) { - netdev_warn(lif->netdev, "ignoring bad MAC addr from NIC %pM - err %d\n", - addr.sa_data, err); - return 0; + if (!ether_addr_equal(ctx.comp.lif_getattr.mac, netdev->dev_addr)) { + memcpy(addr.sa_data, ctx.comp.lif_getattr.mac, netdev->addr_len); + addr.sa_family = AF_INET; + err = eth_prepare_mac_addr_change(netdev, &addr); + if (err) { + netdev_warn(lif->netdev, "ignoring bad MAC addr from NIC %pM - err %d\n", + addr.sa_data, err); + return 0; + } + + if (!is_zero_ether_addr(netdev->dev_addr)) { + 
netdev_dbg(lif->netdev, "deleting station MAC addr %pM\n", + netdev->dev_addr); + ionic_lif_addr(lif, netdev->dev_addr, false); + } + + eth_commit_mac_addr_change(netdev, &addr); } - netdev_dbg(lif->netdev, "deleting station MAC addr %pM\n", - netdev->dev_addr); - ionic_lif_addr(lif, netdev->dev_addr, false); - - eth_commit_mac_addr_change(netdev, &addr); netdev_dbg(lif->netdev, "adding station MAC addr %pM\n", netdev->dev_addr); ionic_lif_addr(lif, netdev->dev_addr, true); From 2abe05234f2e892728c388169631e4b99f354c86 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michael=20Wei=C3=9F?= Date: Tue, 7 Apr 2020 13:11:48 +0200 Subject: [PATCH 045/331] l2tp: Allow management of tunnels and session in user namespace MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Creation and management of L2TPv3 tunnels and sessions through netlink require CAP_NET_ADMIN. However, a process with CAP_NET_ADMIN in a non-initial user namespace gets an EPERM due to the use of the genetlink GENL_ADMIN_PERM flag. Thus, management of L2TP VPNs inside an unprivileged container won't work. We replaced the GENL_ADMIN_PERM flag with the GENL_UNS_ADMIN_PERM flag, similar to other network modules which also had this problem, e.g., openvswitch (commit 4a92602aa1cd "openvswitch: allow management from inside user namespaces") and nl80211 (commit 5617c6cd6f844 "nl80211: Allow privileged operations from user namespaces"). I tested this in the container runtime trustm3 (trustm3.github.io) and was able to create l2tp tunnels and sessions in unprivileged (user namespaced) containers using a private network namespace. For other runtimes such as docker or lxc this should work, too. Signed-off-by: Michael Weiß Signed-off-by: David S.
Miller --- net/l2tp/l2tp_netlink.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/net/l2tp/l2tp_netlink.c b/net/l2tp/l2tp_netlink.c index f5a9bdc4980c..ebb381c3f1b9 100644 --- a/net/l2tp/l2tp_netlink.c +++ b/net/l2tp/l2tp_netlink.c @@ -920,51 +920,51 @@ static const struct genl_ops l2tp_nl_ops[] = { .cmd = L2TP_CMD_TUNNEL_CREATE, .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, .doit = l2tp_nl_cmd_tunnel_create, - .flags = GENL_ADMIN_PERM, + .flags = GENL_UNS_ADMIN_PERM, }, { .cmd = L2TP_CMD_TUNNEL_DELETE, .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, .doit = l2tp_nl_cmd_tunnel_delete, - .flags = GENL_ADMIN_PERM, + .flags = GENL_UNS_ADMIN_PERM, }, { .cmd = L2TP_CMD_TUNNEL_MODIFY, .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, .doit = l2tp_nl_cmd_tunnel_modify, - .flags = GENL_ADMIN_PERM, + .flags = GENL_UNS_ADMIN_PERM, }, { .cmd = L2TP_CMD_TUNNEL_GET, .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, .doit = l2tp_nl_cmd_tunnel_get, .dumpit = l2tp_nl_cmd_tunnel_dump, - .flags = GENL_ADMIN_PERM, + .flags = GENL_UNS_ADMIN_PERM, }, { .cmd = L2TP_CMD_SESSION_CREATE, .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, .doit = l2tp_nl_cmd_session_create, - .flags = GENL_ADMIN_PERM, + .flags = GENL_UNS_ADMIN_PERM, }, { .cmd = L2TP_CMD_SESSION_DELETE, .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, .doit = l2tp_nl_cmd_session_delete, - .flags = GENL_ADMIN_PERM, + .flags = GENL_UNS_ADMIN_PERM, }, { .cmd = L2TP_CMD_SESSION_MODIFY, .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, .doit = l2tp_nl_cmd_session_modify, - .flags = GENL_ADMIN_PERM, + .flags = GENL_UNS_ADMIN_PERM, }, { .cmd = L2TP_CMD_SESSION_GET, .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, .doit = l2tp_nl_cmd_session_get, .dumpit = l2tp_nl_cmd_session_dump, - .flags = GENL_ADMIN_PERM, + .flags = GENL_UNS_ADMIN_PERM, }, }; From 
f691a25ce5e5e405156ad4091c8e660b2622b7ad Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Wed, 8 Apr 2020 20:54:43 +0200 Subject: [PATCH 046/331] net/tls: fix const assignment warning Building with some experimental patches, I came across a warning in the tls code: include/linux/compiler.h:215:30: warning: assignment discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers] 215 | *(volatile typeof(x) *)&(x) = (val); \ | ^ net/tls/tls_main.c:650:4: note: in expansion of macro 'smp_store_release' 650 | smp_store_release(&saved_tcpv4_prot, prot); This appears to be a legitimate warning about assigning a const pointer into the non-const 'saved_tcpv4_prot' global. Annotate both the ipv4 and ipv6 pointers 'const' to make the code internally consistent. Fixes: 5bb4c45d466c ("net/tls: Read sk_prot once when building tls proto ops") Signed-off-by: Arnd Bergmann Signed-off-by: David S. Miller --- net/tls/tls_main.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c index 156efce50dbd..0e989005bdc2 100644 --- a/net/tls/tls_main.c +++ b/net/tls/tls_main.c @@ -56,9 +56,9 @@ enum { TLS_NUM_PROTS, }; -static struct proto *saved_tcpv6_prot; +static const struct proto *saved_tcpv6_prot; static DEFINE_MUTEX(tcpv6_prot_mutex); -static struct proto *saved_tcpv4_prot; +static const struct proto *saved_tcpv4_prot; static DEFINE_MUTEX(tcpv4_prot_mutex); static struct proto tls_prots[TLS_NUM_PROTS][TLS_NUM_CONFIG][TLS_NUM_CONFIG]; static struct proto_ops tls_sw_proto_ops; From 8c702a53bb0a79bfa203ba21ef1caba43673c5b7 Mon Sep 17 00:00:00 2001 From: Moshe Shemesh Date: Mon, 30 Mar 2020 10:21:49 +0300 Subject: [PATCH 047/331] net/mlx5: Fix frequent ioread PCI access during recovery High frequency of PCI ioread calls during recovery flow may cause the following trace on powerpc: [ 248.670288] EEH: 2100000 reads ignored for recovering device at location=Slot1 driver=mlx5_core pci addr=0000:01:00.1 [ 248.670331] 
EEH: Might be infinite loop in mlx5_core driver [ 248.670361] CPU: 2 PID: 35247 Comm: kworker/u192:11 Kdump: loaded Tainted: G OE ------------ 4.14.0-115.14.1.el7a.ppc64le #1 [ 248.670425] Workqueue: mlx5_health0000:01:00.1 health_recover_work [mlx5_core] [ 248.670471] Call Trace: [ 248.670492] [c00020391c11b960] [c000000000c217ac] dump_stack+0xb0/0xf4 (unreliable) [ 248.670548] [c00020391c11b9a0] [c000000000045818] eeh_check_failure+0x5c8/0x630 [ 248.670631] [c00020391c11ba50] [c00000000068fce4] ioread32be+0x114/0x1c0 [ 248.670692] [c00020391c11bac0] [c00800000dd8b400] mlx5_error_sw_reset+0x160/0x510 [mlx5_core] [ 248.670752] [c00020391c11bb60] [c00800000dd75824] mlx5_disable_device+0x34/0x1d0 [mlx5_core] [ 248.670822] [c00020391c11bbe0] [c00800000dd8affc] health_recover_work+0x11c/0x3c0 [mlx5_core] [ 248.670891] [c00020391c11bc80] [c000000000164fcc] process_one_work+0x1bc/0x5f0 [ 248.670955] [c00020391c11bd20] [c000000000167f8c] worker_thread+0xac/0x6b0 [ 248.671015] [c00020391c11bdc0] [c000000000171618] kthread+0x168/0x1b0 [ 248.671067] [c00020391c11be30] [c00000000000b65c] ret_from_kernel_thread+0x5c/0x80 Reduce the PCI ioread frequency during recovery by using msleep() instead of cond_resched() Fixes: 3e5b72ac2f29 ("net/mlx5: Issue SW reset on FW assert") Signed-off-by: Moshe Shemesh Reviewed-by: Feras Daoud Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/health.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c index fa1665caac46..f99e1752d4e5 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/health.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c @@ -243,7 +243,7 @@ recover_from_sw_reset: if (mlx5_get_nic_state(dev) == MLX5_NIC_IFC_DISABLED) break; - cond_resched(); + msleep(20); } while (!time_after(jiffies, end)); if (mlx5_get_nic_state(dev) != MLX5_NIC_IFC_DISABLED) { From 
84be2fdae4f80b7388f754fe49149374a41725f2 Mon Sep 17 00:00:00 2001 From: Eli Cohen Date: Sun, 29 Mar 2020 07:10:43 +0300 Subject: [PATCH 048/331] net/mlx5: Fix condition for termination table cleanup When we destroy rules from slow path we need to avoid destroying termination tables since termination tables are never created in slow path. By doing so we avoid destroying the termination table created for the slow path. Fixes: d8a2034f152a ("net/mlx5: Don't use termination tables in slow path") Signed-off-by: Eli Cohen Reviewed-by: Oz Shlomo Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/eswitch.h | 1 - .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 12 +++--------- 2 files changed, 3 insertions(+), 10 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h index 39f42f985fbd..c1848b57f61c 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h @@ -403,7 +403,6 @@ enum { MLX5_ESW_ATTR_FLAG_VLAN_HANDLED = BIT(0), MLX5_ESW_ATTR_FLAG_SLOW_PATH = BIT(1), MLX5_ESW_ATTR_FLAG_NO_IN_PORT = BIT(2), - MLX5_ESW_ATTR_FLAG_HAIRPIN = BIT(3), }; struct mlx5_esw_flow_attr { diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c index f171eb2234b0..b2e38e0cde97 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c @@ -300,7 +300,6 @@ mlx5_eswitch_add_offloaded_rule(struct mlx5_eswitch *esw, bool split = !!(attr->split_count); struct mlx5_flow_handle *rule; struct mlx5_flow_table *fdb; - bool hairpin = false; int j, i = 0; if (esw->mode != MLX5_ESWITCH_OFFLOADS) @@ -398,21 +397,16 @@ mlx5_eswitch_add_offloaded_rule(struct mlx5_eswitch *esw, goto err_esw_get; } - if (mlx5_eswitch_termtbl_required(esw, attr, &flow_act, spec)) { + if 
(mlx5_eswitch_termtbl_required(esw, attr, &flow_act, spec)) rule = mlx5_eswitch_add_termtbl_rule(esw, fdb, spec, attr, &flow_act, dest, i); - hairpin = true; - } else { + else rule = mlx5_add_flow_rules(fdb, spec, &flow_act, dest, i); - } if (IS_ERR(rule)) goto err_add_rule; else atomic64_inc(&esw->offloads.num_flows); - if (hairpin) - attr->flags |= MLX5_ESW_ATTR_FLAG_HAIRPIN; - return rule; err_add_rule: @@ -501,7 +495,7 @@ __mlx5_eswitch_del_rule(struct mlx5_eswitch *esw, mlx5_del_flow_rules(rule); - if (attr->flags & MLX5_ESW_ATTR_FLAG_HAIRPIN) { + if (!(attr->flags & MLX5_ESW_ATTR_FLAG_SLOW_PATH)) { /* unref the term table */ for (i = 0; i < MLX5_MAX_FLOW_FWD_VPORTS; i++) { if (attr->dests[i].termtbl) From d19987ccf57501894fdd8fadc2e55e4a3dd57239 Mon Sep 17 00:00:00 2001 From: Eran Ben Elisha Date: Tue, 24 Mar 2020 15:04:26 +0200 Subject: [PATCH 049/331] net/mlx5e: Add missing release firmware call Once driver finishes flashing the firmware image, it should release it. Fixes: 9c8bca2637b8 ("mlx5: Move firmware flash implementation to devlink") Signed-off-by: Eran Ben Elisha Reviewed-by: Aya Levin Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/devlink.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c index bdeb291f6b67..e94f0c4d74a7 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c @@ -23,7 +23,10 @@ static int mlx5_devlink_flash_update(struct devlink *devlink, if (err) return err; - return mlx5_firmware_flash(dev, fw, extack); + err = mlx5_firmware_flash(dev, fw, extack); + release_firmware(fw); + + return err; } static u8 mlx5_fw_ver_major(u32 version) From 70f478ca085deec4d6c1f187f773f5827ddce7e8 Mon Sep 17 00:00:00 2001 From: Dmytro Linkin Date: Wed, 1 Apr 2020 14:41:27 +0300 Subject: [PATCH 050/331] net/mlx5e: Fix nest_level for vlan pop 
action The current value of nest_level, assigned from the net_device lower_level value, does not reflect the actual number of vlan headers that need to be popped. For example, if we have untagged ingress traffic sent over vlan devices, the driver will perform two pop actions instead of one. To fix that, calculate nest_level as the difference between the vlan device and parent device lower_levels. Fixes: f3b0a18bb6cb ("net: remove unnecessary variables and callback") Signed-off-by: Dmytro Linkin Signed-off-by: Roi Dayan Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c index 438128dde187..e3fee837c7a3 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c @@ -3558,12 +3558,13 @@ static int add_vlan_pop_action(struct mlx5e_priv *priv, struct mlx5_esw_flow_attr *attr, u32 *action) { - int nest_level = attr->parse_attr->filter_dev->lower_level; struct flow_action_entry vlan_act = { .id = FLOW_ACTION_VLAN_POP, }; - int err = 0; + int nest_level, err = 0; + nest_level = attr->parse_attr->filter_dev->lower_level - + priv->netdev->lower_level; while (nest_level--) { err = parse_tc_vlan_action(priv, &vlan_act, attr, action); if (err) From d5a3c2b640093c8a4bb5d76170a8f6c8c2eacc17 Mon Sep 17 00:00:00 2001 From: Roi Dayan Date: Sun, 29 Mar 2020 18:54:10 +0300 Subject: [PATCH 051/331] net/mlx5e: Fix missing pedit action after ct clear action With a ct clear action we should neither allocate the action in hw nor release the mod_acts parsed in advance. That will be done when handling the ct clear action.
Fixes: 1ef3018f5af3 ("net/mlx5e: CT: Support clear action") Signed-off-by: Roi Dayan Reviewed-by: Paul Blakey Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c index e3fee837c7a3..a574c588269a 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c @@ -1343,7 +1343,8 @@ mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv, if (err) return err; - if (attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR) { + if (attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR && + !(attr->ct_attr.ct_action & TCA_CT_ACT_CLEAR)) { err = mlx5e_attach_mod_hdr(priv, flow, parse_attr); dealloc_mod_hdr_actions(&parse_attr->mod_hdr_acts); if (err) From 7482d9cb5b974b7ad1a58fa8714f7a8c05b5d278 Mon Sep 17 00:00:00 2001 From: Parav Pandit Date: Fri, 3 Apr 2020 03:57:30 -0500 Subject: [PATCH 052/331] net/mlx5e: Fix pfnum in devlink port attribute The cited patch failed to extract the PCI PF number accurately for the PF and VF port flavours: it used the combined PCI device + function number (devfn). As a result, a device with a non-zero device number showed a large pfnum. Hence, use only the PCI function number; to avoid similar errors, derive pfnum one time for all port flavours.
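To see why the raw devfn gives a large pfnum, recall that devfn packs the 5-bit device (slot) number above the 3-bit function number. The sketch below mirrors that bit layout in userspace; the TOY_* macros and toy_pfnum_* helpers are hypothetical names chosen for illustration and are not the driver's code, although the layout is the one used by the kernel's PCI_DEVFN(), PCI_SLOT() and PCI_FUNC() macros.

```c
#include <assert.h>

/* Same bit layout as PCI_DEVFN/PCI_SLOT/PCI_FUNC: slot in bits 7:3, func in bits 2:0. */
#define TOY_PCI_DEVFN(slot, func) ((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define TOY_PCI_SLOT(devfn)       (((devfn) >> 3) & 0x1f)
#define TOY_PCI_FUNC(devfn)       ((devfn) & 0x07)

/* pfnum as derived before the fix for the PF and VF flavours: the raw devfn. */
unsigned int toy_pfnum_before(unsigned int devfn)
{
	return devfn;
}

/* pfnum as derived after the fix: only the function bits. */
unsigned int toy_pfnum_after(unsigned int devfn)
{
	return TOY_PCI_FUNC(devfn);
}
```

For a function at device 2, function 1 (for example 0000:01:02.1) the raw devfn is 17, so the old derivation reports pfnum 17 while the new one reports 1. For a device number of 0 both derivations agree, which is why the problem only shows up with a non-zero device number.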
Fixes: f60f315d339e ("net/mlx5e: Register devlink ports for physical link, PCI PF, VFs") Reviewed-by: Jiri Pirko Signed-off-by: Parav Pandit Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c index 2a0243e4af75..55457f268495 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c @@ -2050,29 +2050,30 @@ static int register_devlink_port(struct mlx5_core_dev *dev, struct mlx5_eswitch_rep *rep = rpriv->rep; struct netdev_phys_item_id ppid = {}; unsigned int dl_port_index = 0; + u16 pfnum; if (!is_devlink_port_supported(dev, rpriv)) return 0; mlx5e_rep_get_port_parent_id(rpriv->netdev, &ppid); + pfnum = PCI_FUNC(dev->pdev->devfn); if (rep->vport == MLX5_VPORT_UPLINK) { devlink_port_attrs_set(&rpriv->dl_port, DEVLINK_PORT_FLAVOUR_PHYSICAL, - PCI_FUNC(dev->pdev->devfn), false, 0, + pfnum, false, 0, &ppid.id[0], ppid.id_len); dl_port_index = vport_to_devlink_port_index(dev, rep->vport); } else if (rep->vport == MLX5_VPORT_PF) { devlink_port_attrs_pci_pf_set(&rpriv->dl_port, &ppid.id[0], ppid.id_len, - dev->pdev->devfn); + pfnum); dl_port_index = rep->vport; } else if (mlx5_eswitch_is_vf_vport(dev->priv.eswitch, rpriv->rep->vport)) { devlink_port_attrs_pci_vf_set(&rpriv->dl_port, &ppid.id[0], ppid.id_len, - dev->pdev->devfn, - rep->vport - 1); + pfnum, rep->vport - 1); dl_port_index = vport_to_devlink_port_index(dev, rep->vport); } From 230a1bc2470c5554a8c2bfe14774863897dc9386 Mon Sep 17 00:00:00 2001 From: Parav Pandit Date: Fri, 3 Apr 2020 02:35:46 -0500 Subject: [PATCH 053/331] net/mlx5e: Fix devlink port netdev unregistration sequence In cited commit netdevice is registered after devlink port. Unregistration flow should be mirror sequence of registration flow. 
Hence, unregister netdevice before devlink port. Fixes: 31e87b39ba9d ("net/mlx5e: Fix devlink port register sequence") Reviewed-by: Jiri Pirko Signed-off-by: Parav Pandit Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index dd7f338425eb..f02150a97ac8 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -5526,8 +5526,8 @@ static void mlx5e_remove(struct mlx5_core_dev *mdev, void *vpriv) #ifdef CONFIG_MLX5_CORE_EN_DCB mlx5e_dcbnl_delete_app(priv); #endif - mlx5e_devlink_port_unregister(priv); unregister_netdev(priv->netdev); + mlx5e_devlink_port_unregister(priv); mlx5e_detach(mdev, vpriv); mlx5e_destroy_netdev(priv); } From 9808dd0a2aeebcb72239a3b082159b0186d9ac3d Mon Sep 17 00:00:00 2001 From: Paul Blakey Date: Fri, 27 Mar 2020 12:12:31 +0300 Subject: [PATCH 054/331] net/mlx5e: CT: Use rhashtable's ct entries instead of a separate list Fixes CT entries list corruption. After allowing parallel insertion/removals in upper nf flow table layer, unprotected ct entries list can be corrupted by parallel add/del on the same flow table. CT entries list is only used while freeing a ct zone flow table to go over all the ct entries offloaded on that zone/table, and flush the table. As rhashtable already provides an api to go over all the inserted entries, fix the race by using the rhashtable iteration instead, and remove the list. 
Fixes: 7da182a998d6 ("netfilter: flowtable: Use work entry per offload command") Reviewed-by: Oz Shlomo Signed-off-by: Paul Blakey Signed-off-by: Saeed Mahameed --- .../ethernet/mellanox/mlx5/core/en/tc_ct.c | 19 +++++++------------ 1 file changed, 7 insertions(+), 12 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c index ad3e3a65d403..16416eaac39e 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c @@ -67,11 +67,9 @@ struct mlx5_ct_ft { struct nf_flowtable *nf_ft; struct mlx5_tc_ct_priv *ct_priv; struct rhashtable ct_entries_ht; - struct list_head ct_entries_list; }; struct mlx5_ct_entry { - struct list_head list; u16 zone; struct rhash_head node; struct flow_rule *flow_rule; @@ -617,8 +615,6 @@ mlx5_tc_ct_block_flow_offload_add(struct mlx5_ct_ft *ft, if (err) goto err_insert; - list_add(&entry->list, &ft->ct_entries_list); - return 0; err_insert: @@ -646,7 +642,6 @@ mlx5_tc_ct_block_flow_offload_del(struct mlx5_ct_ft *ft, WARN_ON(rhashtable_remove_fast(&ft->ct_entries_ht, &entry->node, cts_ht_params)); - list_del(&entry->list); kfree(entry); return 0; @@ -818,7 +813,6 @@ mlx5_tc_ct_add_ft_cb(struct mlx5_tc_ct_priv *ct_priv, u16 zone, ft->zone = zone; ft->nf_ft = nf_ft; ft->ct_priv = ct_priv; - INIT_LIST_HEAD(&ft->ct_entries_list); refcount_set(&ft->refcount, 1); err = rhashtable_init(&ft->ct_entries_ht, &cts_ht_params); @@ -847,12 +841,12 @@ err_init: } static void -mlx5_tc_ct_flush_ft(struct mlx5_tc_ct_priv *ct_priv, struct mlx5_ct_ft *ft) +mlx5_tc_ct_flush_ft_entry(void *ptr, void *arg) { - struct mlx5_ct_entry *entry; + struct mlx5_tc_ct_priv *ct_priv = arg; + struct mlx5_ct_entry *entry = ptr; - list_for_each_entry(entry, &ft->ct_entries_list, list) - mlx5_tc_ct_entry_del_rules(ft->ct_priv, entry); + mlx5_tc_ct_entry_del_rules(ct_priv, entry); } static void @@ -863,9 +857,10 @@ mlx5_tc_ct_del_ft_cb(struct 
mlx5_tc_ct_priv *ct_priv, struct mlx5_ct_ft *ft) nf_flow_table_offload_del_cb(ft->nf_ft, mlx5_tc_ct_block_flow_offload, ft); - mlx5_tc_ct_flush_ft(ct_priv, ft); rhashtable_remove_fast(&ct_priv->zone_ht, &ft->node, zone_params); - rhashtable_destroy(&ft->ct_entries_ht); + rhashtable_free_and_destroy(&ft->ct_entries_ht, + mlx5_tc_ct_flush_ft_entry, + ct_priv); kfree(ft); } From 8e368dc72e86ad1e1a612416f32d5ad22dca88bc Mon Sep 17 00:00:00 2001 From: Joe Stringer Date: Tue, 7 Apr 2020 20:35:40 -0700 Subject: [PATCH 055/331] bpf: Fix use of sk->sk_reuseport from sk_assign In testing, we found that for request sockets the sk->sk_reuseport field may yet be uninitialized, which caused bpf_sk_assign() to randomly succeed or return -ESOCKTNOSUPPORT when handling the forward ACK in a three-way handshake. Fix it by only applying the reuseport check for full sockets. Fixes: cf7fbe660f2d ("bpf: Add socket assign support") Signed-off-by: Joe Stringer Signed-off-by: Daniel Borkmann Acked-by: Martin KaFai Lau Link: https://lore.kernel.org/bpf/20200408033540.10339-1-joe@wand.net.nz --- net/core/filter.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/core/filter.c b/net/core/filter.c index 7628b947dbc3..7d6ceaa54d21 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -5925,7 +5925,7 @@ BPF_CALL_3(bpf_sk_assign, struct sk_buff *, skb, struct sock *, sk, u64, flags) return -EOPNOTSUPP; if (unlikely(dev_net(skb->dev) != sock_net(sk))) return -ENETUNREACH; - if (unlikely(sk->sk_reuseport)) + if (unlikely(sk_fullsock(sk) && sk->sk_reuseport)) return -ESOCKTNOSUPPORT; if (sk_is_refcounted(sk) && unlikely(!refcount_inc_not_zero(&sk->sk_refcnt))) From bb9562cf5c67813034c96afb50bd21130a504441 Mon Sep 17 00:00:00 2001 From: Luke Nelson Date: Wed, 8 Apr 2020 18:12:29 +0000 Subject: [PATCH 056/331] arm, bpf: Fix bugs with ALU64 {RSH, ARSH} BPF_K shift by 0 The current arm BPF JIT does not correctly compile RSH or ARSH when the immediate shift amount is 0. 
This causes the "rsh64 by 0 imm" and "arsh64 by 0 imm" BPF selftests to hang the kernel by reaching an instruction the verifier determines to be unreachable. The root cause is in how immediate right shifts are encoded on arm. For LSR and ASR (logical and arithmetic right shift), a bit-pattern of 00000 in the immediate encodes a shift amount of 32. When the BPF immediate is 0, the generated code shifts by 32 instead of the expected behavior (a no-op). This patch fixes the bugs by adding an additional check if the BPF immediate is 0. After the change, the above mentioned BPF selftests pass. Fixes: 39c13c204bb11 ("arm: eBPF JIT compiler") Co-developed-by: Xi Wang Signed-off-by: Xi Wang Signed-off-by: Luke Nelson Signed-off-by: Daniel Borkmann Link: https://lore.kernel.org/bpf/20200408181229.10909-1-luke.r.nels@gmail.com --- arch/arm/net/bpf_jit_32.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c index cc29869d12a3..d124f78e20ac 100644 --- a/arch/arm/net/bpf_jit_32.c +++ b/arch/arm/net/bpf_jit_32.c @@ -929,7 +929,11 @@ static inline void emit_a32_rsh_i64(const s8 dst[], rd = arm_bpf_get_reg64(dst, tmp, ctx); /* Do LSR operation */ - if (val < 32) { + if (val == 0) { + /* An immediate value of 0 encodes a shift amount of 32 + * for LSR. To shift by 0, don't do anything. + */ + } else if (val < 32) { emit(ARM_MOV_SI(tmp2[1], rd[1], SRTYPE_LSR, val), ctx); emit(ARM_ORR_SI(rd[1], tmp2[1], rd[0], SRTYPE_ASL, 32 - val), ctx); emit(ARM_MOV_SI(rd[0], rd[0], SRTYPE_LSR, val), ctx); @@ -955,7 +959,11 @@ static inline void emit_a32_arsh_i64(const s8 dst[], rd = arm_bpf_get_reg64(dst, tmp, ctx); /* Do ARSH operation */ - if (val < 32) { + if (val == 0) { + /* An immediate value of 0 encodes a shift amount of 32 + * for ASR. To shift by 0, don't do anything. 
+ */ } else if (val < 32) { emit(ARM_MOV_SI(tmp2[1], rd[1], SRTYPE_LSR, val), ctx); emit(ARM_ORR_SI(rd[1], tmp2[1], rd[0], SRTYPE_ASL, 32 - val), ctx); emit(ARM_MOV_SI(rd[0], rd[0], SRTYPE_ASR, val), ctx); From 4df933252827af69cb087e3df1294e4945a6f6c6 Mon Sep 17 00:00:00 2001 From: Xu Wang Date: Thu, 9 Apr 2020 19:20:52 +0800 Subject: [PATCH 057/331] ALSA: ctxfi: Remove unnecessary cast in kfree Remove unnecessary casts in the argument to kfree. Signed-off-by: Xu Wang Link: https://lore.kernel.org/r/20200409112052.13402-1-vulab@iscas.ac.cn Signed-off-by: Takashi Iwai --- sound/pci/ctxfi/cthw20k1.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/sound/pci/ctxfi/cthw20k1.c b/sound/pci/ctxfi/cthw20k1.c index 6e3177bcc709..015c0d676897 100644 --- a/sound/pci/ctxfi/cthw20k1.c +++ b/sound/pci/ctxfi/cthw20k1.c @@ -168,7 +168,7 @@ static int src_get_rsc_ctrl_blk(void **rblk) static int src_put_rsc_ctrl_blk(void *blk) { - kfree((struct src_rsc_ctrl_blk *)blk); + kfree(blk); return 0; } @@ -494,7 +494,7 @@ static int src_mgr_get_ctrl_blk(void **rblk) static int src_mgr_put_ctrl_blk(void *blk) { - kfree((struct src_mgr_ctrl_blk *)blk); + kfree(blk); return 0; } @@ -515,7 +515,7 @@ static int srcimp_mgr_get_ctrl_blk(void **rblk) static int srcimp_mgr_put_ctrl_blk(void *blk) { - kfree((struct srcimp_mgr_ctrl_blk *)blk); + kfree(blk); return 0; } @@ -702,7 +702,7 @@ static int amixer_rsc_get_ctrl_blk(void **rblk) static int amixer_rsc_put_ctrl_blk(void *blk) { - kfree((struct amixer_rsc_ctrl_blk *)blk); + kfree(blk); return 0; } @@ -909,7 +909,7 @@ static int dai_get_ctrl_blk(void **rblk) static int dai_put_ctrl_blk(void *blk) { - kfree((struct dai_ctrl_blk *)blk); + kfree(blk); return 0; } @@ -958,7 +958,7 @@ static int dao_get_ctrl_blk(void **rblk) static int dao_put_ctrl_blk(void *blk) { - kfree((struct dao_ctrl_blk *)blk); + kfree(blk); return 0; } @@ -1156,7 +1156,7 @@ static int daio_mgr_get_ctrl_blk(struct hw *hw, void **rblk) static int
daio_mgr_put_ctrl_blk(void *blk) { - kfree((struct daio_mgr_ctrl_blk *)blk); + kfree(blk); return 0; } From bfa5807d4db98dd58ce6b69607e8655dcdbbabbd Mon Sep 17 00:00:00 2001 From: Likun Gao Date: Thu, 9 Apr 2020 12:05:08 +0800 Subject: [PATCH 058/331] Revert "drm/amdgpu: change SH MEM alignment mode for gfx10" This reverts commit b74fb888f4927e2079be576ce6dcdbf0c420f1f8. Revert the auto alignment mode setting of the SH MEM config, as it causes the OCL Conformance Test to fail. Signed-off-by: Likun Gao Reviewed-by: Hawking Zhang Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c index d78059fd2c72..f92c158d89a1 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c @@ -279,7 +279,7 @@ static const struct soc15_reg_golden golden_settings_gc_10_1_2_nv12[] = #define DEFAULT_SH_MEM_CONFIG \ ((SH_MEM_ADDRESS_MODE_64 << SH_MEM_CONFIG__ADDRESS_MODE__SHIFT) | \ - (SH_MEM_ALIGNMENT_MODE_DWORD << SH_MEM_CONFIG__ALIGNMENT_MODE__SHIFT) | \ + (SH_MEM_ALIGNMENT_MODE_UNALIGNED << SH_MEM_CONFIG__ALIGNMENT_MODE__SHIFT) | \ (SH_MEM_RETRY_MODE_ALL << SH_MEM_CONFIG__RETRY_MODE__SHIFT) | \ (3 << SH_MEM_CONFIG__INITIAL_INST_PREFETCH__SHIFT)) From 97d9f1c43bedd400301d6f1eff54d46e8c636e47 Mon Sep 17 00:00:00 2001 From: Olaf Hering Date: Tue, 7 Apr 2020 19:27:39 +0200 Subject: [PATCH 059/331] x86: hyperv: report value of misc_features A few kernel features depend on ms_hyperv.misc_features, but unlike its siblings ->features and ->hints, the value was never reported during boot.
Signed-off-by: Olaf Hering Link: https://lore.kernel.org/r/20200407172739.31371-1-olaf@aepfle.de Signed-off-by: Wei Liu --- arch/x86/kernel/cpu/mshyperv.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c index caa032ce3fe3..53706fb56433 100644 --- a/arch/x86/kernel/cpu/mshyperv.c +++ b/arch/x86/kernel/cpu/mshyperv.c @@ -227,8 +227,8 @@ static void __init ms_hyperv_init_platform(void) ms_hyperv.misc_features = cpuid_edx(HYPERV_CPUID_FEATURES); ms_hyperv.hints = cpuid_eax(HYPERV_CPUID_ENLIGHTMENT_INFO); - pr_info("Hyper-V: features 0x%x, hints 0x%x\n", - ms_hyperv.features, ms_hyperv.hints); + pr_info("Hyper-V: features 0x%x, hints 0x%x, misc 0x%x\n", + ms_hyperv.features, ms_hyperv.hints, ms_hyperv.misc_features); ms_hyperv.max_vp_index = cpuid_eax(HYPERV_CPUID_IMPLEMENT_LIMITS); ms_hyperv.max_lp_index = cpuid_ebx(HYPERV_CPUID_IMPLEMENT_LIMITS); From e750b84dc9c5510b89d9f52695f9c2f51e73eaab Mon Sep 17 00:00:00 2001 From: Lothar Rubusch Date: Wed, 8 Apr 2020 22:09:31 +0000 Subject: [PATCH 060/331] Documentation: devlink: fix broken link warning At 'make htmldocs' the following warning is thrown: Documentation/networking/devlink/devlink-trap.rst:302: WARNING: undefined label: generic-packet-trap-groups Fixes the warning by setting the label to the specified header, within the same document. Signed-off-by: Lothar Rubusch Signed-off-by: David S. Miller --- Documentation/networking/devlink/devlink-trap.rst | 2 ++ 1 file changed, 2 insertions(+) diff --git a/Documentation/networking/devlink/devlink-trap.rst b/Documentation/networking/devlink/devlink-trap.rst index a09971c2115c..fe089acb7783 100644 --- a/Documentation/networking/devlink/devlink-trap.rst +++ b/Documentation/networking/devlink/devlink-trap.rst @@ -257,6 +257,8 @@ drivers: * :doc:`netdevsim` * :doc:`mlxsw` +.. 
_Generic-Packet-Trap-Groups: + Generic Packet Trap Groups ========================== From 6dbf02acef69b0742c238574583b3068afbd227c Mon Sep 17 00:00:00 2001 From: Wang Wenhu Date: Wed, 8 Apr 2020 19:53:53 -0700 Subject: [PATCH 061/331] net: qrtr: send msgs from local of same id as broadcast If the local node id(qrtr_local_nid) is not modified after its initialization, it equals to the broadcast node id(QRTR_NODE_BCAST). So the messages from local node should not be taken as broadcast and keep the process going to send them out anyway. The definitions are as follow: static unsigned int qrtr_local_nid = NUMA_NO_NODE; Fixes: fdf5fd397566 ("net: qrtr: Broadcast messages only from control port") Signed-off-by: Wang Wenhu Signed-off-by: David S. Miller --- net/qrtr/qrtr.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/net/qrtr/qrtr.c b/net/qrtr/qrtr.c index e22092e4a783..7ed31b5e77e4 100644 --- a/net/qrtr/qrtr.c +++ b/net/qrtr/qrtr.c @@ -906,20 +906,21 @@ static int qrtr_sendmsg(struct socket *sock, struct msghdr *msg, size_t len) node = NULL; if (addr->sq_node == QRTR_NODE_BCAST) { - enqueue_fn = qrtr_bcast_enqueue; - if (addr->sq_port != QRTR_PORT_CTRL) { + if (addr->sq_port != QRTR_PORT_CTRL && + qrtr_local_nid != QRTR_NODE_BCAST) { release_sock(sk); return -ENOTCONN; } + enqueue_fn = qrtr_bcast_enqueue; } else if (addr->sq_node == ipc->us.sq_node) { enqueue_fn = qrtr_local_enqueue; } else { - enqueue_fn = qrtr_node_enqueue; node = qrtr_node_lookup(addr->sq_node); if (!node) { release_sock(sk); return -ECONNRESET; } + enqueue_fn = qrtr_node_enqueue; } plen = (len + 3) & ~3; From 5f0224a6ddc3101ab9664a5f7a6287047934a079 Mon Sep 17 00:00:00 2001 From: Colin Ian King Date: Thu, 9 Apr 2020 14:41:26 +0100 Subject: [PATCH 062/331] net-sysfs: remove redundant assignment to variable ret The variable ret is being initialized with a value that is never read and it is being updated later with a new value. 
The initialization is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King Signed-off-by: David S. Miller --- net/core/net-sysfs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c index cf0215734ceb..4773ad6ec111 100644 --- a/net/core/net-sysfs.c +++ b/net/core/net-sysfs.c @@ -80,7 +80,7 @@ static ssize_t netdev_store(struct device *dev, struct device_attribute *attr, struct net_device *netdev = to_net_dev(dev); struct net *net = dev_net(netdev); unsigned long new; - int ret = -EINVAL; + int ret; if (!ns_capable(net->user_ns, CAP_NET_ADMIN)) return -EPERM; From 022e9d6090599c0593c78e87dc9ba98a290e6bc4 Mon Sep 17 00:00:00 2001 From: Taehee Yoo Date: Thu, 9 Apr 2020 14:08:08 +0000 Subject: [PATCH 063/331] net: macsec: fix using wrong structure in macsec_changelink() In the macsec_changelink(), "struct macsec_tx_sa tx_sc" is used to store "macsec_secy.tx_sc". But, the struct type of tx_sc is macsec_tx_sc, not macsec_tx_sa. So, the macsec_tx_sc should be used instead. Test commands: ip link add dummy0 type dummy ip link add macsec0 link dummy0 type macsec ip link set macsec0 type macsec encrypt off Splat looks like: [61119.963483][ T9335] ================================================================== [61119.964709][ T9335] BUG: KASAN: slab-out-of-bounds in macsec_changelink.part.34+0xb6/0x200 [macsec] [61119.965787][ T9335] Read of size 160 at addr ffff888020d69c68 by task ip/9335 [61119.966699][ T9335] [61119.966979][ T9335] CPU: 0 PID: 9335 Comm: ip Not tainted 5.6.0+ #503 [61119.967791][ T9335] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [61119.968914][ T9335] Call Trace: [61119.969324][ T9335] dump_stack+0x96/0xdb [61119.969809][ T9335] ? macsec_changelink.part.34+0xb6/0x200 [macsec] [61119.970554][ T9335] print_address_description.constprop.5+0x1be/0x360 [61119.971294][ T9335] ? 
macsec_changelink.part.34+0xb6/0x200 [macsec] [61119.971973][ T9335] ? macsec_changelink.part.34+0xb6/0x200 [macsec] [61119.972703][ T9335] __kasan_report+0x12a/0x170 [61119.973323][ T9335] ? macsec_changelink.part.34+0xb6/0x200 [macsec] [61119.973942][ T9335] kasan_report+0xe/0x20 [61119.974397][ T9335] check_memory_region+0x149/0x1a0 [61119.974866][ T9335] memcpy+0x1f/0x50 [61119.975209][ T9335] macsec_changelink.part.34+0xb6/0x200 [macsec] [61119.975825][ T9335] ? macsec_get_stats64+0x3e0/0x3e0 [macsec] [61119.976451][ T9335] ? kernel_text_address+0x111/0x120 [61119.976990][ T9335] ? pskb_expand_head+0x25f/0xe10 [61119.977503][ T9335] ? stack_trace_save+0x82/0xb0 [61119.977986][ T9335] ? memset+0x1f/0x40 [61119.978397][ T9335] ? __nla_validate_parse+0x98/0x1ab0 [61119.978936][ T9335] ? macsec_alloc_tfm+0x90/0x90 [macsec] [61119.979511][ T9335] ? __kasan_slab_free+0x111/0x150 [61119.980021][ T9335] ? kfree+0xce/0x2f0 [61119.980700][ T9335] ? netlink_trim+0x196/0x1f0 [61119.981420][ T9335] ? nla_memcpy+0x90/0x90 [61119.982036][ T9335] ? register_lock_class+0x19e0/0x19e0 [61119.982776][ T9335] ? memcpy+0x34/0x50 [61119.983327][ T9335] __rtnl_newlink+0x922/0x1270 [ ... ] Fixes: 3cf3227a21d1 ("net: macsec: hardware offloading infrastructure") Signed-off-by: Taehee Yoo Signed-off-by: David S. 
Miller --- drivers/net/macsec.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c index 0d580d81d910..a183250ff66a 100644 --- a/drivers/net/macsec.c +++ b/drivers/net/macsec.c @@ -3809,7 +3809,7 @@ static int macsec_changelink(struct net_device *dev, struct nlattr *tb[], struct netlink_ext_ack *extack) { struct macsec_dev *macsec = macsec_priv(dev); - struct macsec_tx_sa tx_sc; + struct macsec_tx_sc tx_sc; struct macsec_secy secy; int ret; From e228a5d05e9ee25878e9a40de96e7ceb579d4893 Mon Sep 17 00:00:00 2001 From: Ka-Cheong Poon Date: Wed, 8 Apr 2020 03:21:01 -0700 Subject: [PATCH 064/331] net/rds: Replace struct rds_mr's r_refcount with struct kref And removed rds_mr_put(). Signed-off-by: Ka-Cheong Poon Acked-by: Santosh Shilimkar Signed-off-by: David S. Miller --- net/rds/message.c | 6 +++--- net/rds/rdma.c | 28 +++++++++++++++------------- net/rds/rds.h | 9 ++------- 3 files changed, 20 insertions(+), 23 deletions(-) diff --git a/net/rds/message.c b/net/rds/message.c index 50f13f1d4ae0..bbecb8cb873e 100644 --- a/net/rds/message.c +++ b/net/rds/message.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006 Oracle. All rights reserved. + * Copyright (c) 2006, 2020 Oracle and/or its affiliates. * * This software is available to you under a choice of one of two * licenses. 
You may choose to be licensed under the terms of the GNU @@ -162,12 +162,12 @@ static void rds_message_purge(struct rds_message *rm) if (rm->rdma.op_active) rds_rdma_free_op(&rm->rdma); if (rm->rdma.op_rdma_mr) - rds_mr_put(rm->rdma.op_rdma_mr); + kref_put(&rm->rdma.op_rdma_mr->r_kref, __rds_put_mr_final); if (rm->atomic.op_active) rds_atomic_free_op(&rm->atomic); if (rm->atomic.op_rdma_mr) - rds_mr_put(rm->atomic.op_rdma_mr); + kref_put(&rm->atomic.op_rdma_mr->r_kref, __rds_put_mr_final); } void rds_message_put(struct rds_message *rm) diff --git a/net/rds/rdma.c b/net/rds/rdma.c index 585e6b3b69ce..f828b66978e4 100644 --- a/net/rds/rdma.c +++ b/net/rds/rdma.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2007, 2017 Oracle and/or its affiliates. All rights reserved. + * Copyright (c) 2007, 2020 Oracle and/or its affiliates. * * This software is available to you under a choice of one of two * licenses. You may choose to be licensed under the terms of the GNU @@ -84,7 +84,7 @@ static struct rds_mr *rds_mr_tree_walk(struct rb_root *root, u64 key, if (insert) { rb_link_node(&insert->r_rb_node, parent, p); rb_insert_color(&insert->r_rb_node, root); - refcount_inc(&insert->r_refcount); + kref_get(&insert->r_kref); } return NULL; } @@ -99,7 +99,7 @@ static void rds_destroy_mr(struct rds_mr *mr) unsigned long flags; rdsdebug("RDS: destroy mr key is %x refcnt %u\n", - mr->r_key, refcount_read(&mr->r_refcount)); + mr->r_key, kref_read(&mr->r_kref)); if (test_and_set_bit(RDS_MR_DEAD, &mr->r_state)) return; @@ -115,8 +115,10 @@ static void rds_destroy_mr(struct rds_mr *mr) mr->r_trans->free_mr(trans_private, mr->r_invalidate); } -void __rds_put_mr_final(struct rds_mr *mr) +void __rds_put_mr_final(struct kref *kref) { + struct rds_mr *mr = container_of(kref, struct rds_mr, r_kref); + rds_destroy_mr(mr); kfree(mr); } @@ -141,7 +143,7 @@ void rds_rdma_drop_keys(struct rds_sock *rs) RB_CLEAR_NODE(&mr->r_rb_node); spin_unlock_irqrestore(&rs->rs_rdma_lock, flags); rds_destroy_mr(mr); - 
rds_mr_put(mr); + kref_put(&mr->r_kref, __rds_put_mr_final); spin_lock_irqsave(&rs->rs_rdma_lock, flags); } spin_unlock_irqrestore(&rs->rs_rdma_lock, flags); @@ -242,7 +244,7 @@ static int __rds_rdma_map(struct rds_sock *rs, struct rds_get_mr_args *args, goto out; } - refcount_set(&mr->r_refcount, 1); + kref_init(&mr->r_kref); RB_CLEAR_NODE(&mr->r_rb_node); mr->r_trans = rs->rs_transport; mr->r_sock = rs; @@ -343,7 +345,7 @@ static int __rds_rdma_map(struct rds_sock *rs, struct rds_get_mr_args *args, rdsdebug("RDS: get_mr key is %x\n", mr->r_key); if (mr_ret) { - refcount_inc(&mr->r_refcount); + kref_get(&mr->r_kref); *mr_ret = mr; } @@ -351,7 +353,7 @@ static int __rds_rdma_map(struct rds_sock *rs, struct rds_get_mr_args *args, out: kfree(pages); if (mr) - rds_mr_put(mr); + kref_put(&mr->r_kref, __rds_put_mr_final); return ret; } @@ -440,7 +442,7 @@ int rds_free_mr(struct rds_sock *rs, char __user *optval, int optlen) * someone else drops their ref. */ rds_destroy_mr(mr); - rds_mr_put(mr); + kref_put(&mr->r_kref, __rds_put_mr_final); return 0; } @@ -481,7 +483,7 @@ void rds_rdma_unuse(struct rds_sock *rs, u32 r_key, int force) * trigger an async flush. 
*/ if (zot_me) { rds_destroy_mr(mr); - rds_mr_put(mr); + kref_put(&mr->r_kref, __rds_put_mr_final); } } @@ -490,7 +492,7 @@ void rds_rdma_free_op(struct rm_rdma_op *ro) unsigned int i; if (ro->op_odp_mr) { - rds_mr_put(ro->op_odp_mr); + kref_put(&ro->op_odp_mr->r_kref, __rds_put_mr_final); } else { for (i = 0; i < ro->op_nents; i++) { struct page *page = sg_page(&ro->op_sg[i]); @@ -730,7 +732,7 @@ int rds_cmsg_rdma_args(struct rds_sock *rs, struct rds_message *rm, goto out_pages; } RB_CLEAR_NODE(&local_odp_mr->r_rb_node); - refcount_set(&local_odp_mr->r_refcount, 1); + kref_init(&local_odp_mr->r_kref); local_odp_mr->r_trans = rs->rs_transport; local_odp_mr->r_sock = rs; local_odp_mr->r_trans_private = @@ -827,7 +829,7 @@ int rds_cmsg_rdma_dest(struct rds_sock *rs, struct rds_message *rm, if (!mr) err = -EINVAL; /* invalid r_key */ else - refcount_inc(&mr->r_refcount); + kref_get(&mr->r_kref); spin_unlock_irqrestore(&rs->rs_rdma_lock, flags); if (mr) { diff --git a/net/rds/rds.h b/net/rds/rds.h index e4a603523083..3cda01cfaa56 100644 --- a/net/rds/rds.h +++ b/net/rds/rds.h @@ -291,7 +291,7 @@ struct rds_incoming { struct rds_mr { struct rb_node r_rb_node; - refcount_t r_refcount; + struct kref r_kref; u32 r_key; /* A copy of the creation flags */ @@ -946,12 +946,7 @@ void rds_atomic_send_complete(struct rds_message *rm, int wc_status); int rds_cmsg_atomic(struct rds_sock *rs, struct rds_message *rm, struct cmsghdr *cmsg); -void __rds_put_mr_final(struct rds_mr *mr); -static inline void rds_mr_put(struct rds_mr *mr) -{ - if (refcount_dec_and_test(&mr->r_refcount)) - __rds_put_mr_final(mr); -} +void __rds_put_mr_final(struct kref *kref); static inline bool rds_destroy_pending(struct rds_connection *conn) { From 2fabef4f65b46b261434a27ecdce291b63de8522 Mon Sep 17 00:00:00 2001 From: Ka-Cheong Poon Date: Wed, 8 Apr 2020 03:21:02 -0700 Subject: [PATCH 065/331] net/rds: Fix MR reference counting problem In rds_free_mr(), it calls rds_destroy_mr(mr) directly. 
But this defeats the purpose of reference counting and makes MR free handling impossible. It means that holding a reference does not guarantee that it is safe to access some fields. For example, in rds_cmsg_rdma_dest(), it increases the ref count, unlocks and then calls mr->r_trans->sync_mr(). But if rds_free_mr() (and rds_destroy_mr()) is called in between (there is no lock preventing this from happening), r_trans_private is set to NULL, causing a panic. A similar issue exists in rds_rdma_unuse(). Reported-by: zerons Signed-off-by: Ka-Cheong Poon Acked-by: Santosh Shilimkar Signed-off-by: David S. Miller --- net/rds/rdma.c | 25 ++++++++++++------------- net/rds/rds.h | 8 -------- 2 files changed, 12 insertions(+), 21 deletions(-) diff --git a/net/rds/rdma.c b/net/rds/rdma.c index f828b66978e4..113e442101ce 100644 --- a/net/rds/rdma.c +++ b/net/rds/rdma.c @@ -101,9 +101,6 @@ static void rds_destroy_mr(struct rds_mr *mr) rdsdebug("RDS: destroy mr key is %x refcnt %u\n", mr->r_key, kref_read(&mr->r_kref)); - if (test_and_set_bit(RDS_MR_DEAD, &mr->r_state)) - return; - spin_lock_irqsave(&rs->rs_rdma_lock, flags); if (!RB_EMPTY_NODE(&mr->r_rb_node)) rb_erase(&mr->r_rb_node, &rs->rs_rdma_keys); @@ -142,7 +139,6 @@ void rds_rdma_drop_keys(struct rds_sock *rs) rb_erase(&mr->r_rb_node, &rs->rs_rdma_keys); RB_CLEAR_NODE(&mr->r_rb_node); spin_unlock_irqrestore(&rs->rs_rdma_lock, flags); - rds_destroy_mr(mr); kref_put(&mr->r_kref, __rds_put_mr_final); spin_lock_irqsave(&rs->rs_rdma_lock, flags); } @@ -436,12 +432,6 @@ int rds_free_mr(struct rds_sock *rs, char __user *optval, int optlen) if (!mr) return -EINVAL; - /* - * call rds_destroy_mr() ourselves so that we're sure it's done by the time - * we return. If we let rds_mr_put() do it it might not happen until - * someone else drops their ref.
- */ - rds_destroy_mr(mr); kref_put(&mr->r_kref, __rds_put_mr_final); return 0; } @@ -466,6 +456,14 @@ void rds_rdma_unuse(struct rds_sock *rs, u32 r_key, int force) return; } + /* Get a reference so that the MR won't go away before calling + * sync_mr() below. + */ + kref_get(&mr->r_kref); + + /* If it is going to be freed, remove it from the tree now so + * that no other thread can find it and free it. + */ if (mr->r_use_once || force) { rb_erase(&mr->r_rb_node, &rs->rs_rdma_keys); RB_CLEAR_NODE(&mr->r_rb_node); @@ -479,12 +477,13 @@ void rds_rdma_unuse(struct rds_sock *rs, u32 r_key, int force) if (mr->r_trans->sync_mr) mr->r_trans->sync_mr(mr->r_trans_private, DMA_FROM_DEVICE); + /* Release the reference held above. */ + kref_put(&mr->r_kref, __rds_put_mr_final); + /* If the MR was marked as invalidate, this will * trigger an async flush. */ - if (zot_me) { - rds_destroy_mr(mr); + if (zot_me) kref_put(&mr->r_kref, __rds_put_mr_final); - } } void rds_rdma_free_op(struct rm_rdma_op *ro) diff --git a/net/rds/rds.h b/net/rds/rds.h index 3cda01cfaa56..8e18cd2aec51 100644 --- a/net/rds/rds.h +++ b/net/rds/rds.h @@ -299,19 +299,11 @@ struct rds_mr { unsigned int r_invalidate:1; unsigned int r_write:1; - /* This is for RDS_MR_DEAD. - * It would be nice & consistent to make this part of the above - * bit field here, but we need to use test_and_set_bit. 
- */ - unsigned long r_state; struct rds_sock *r_sock; /* back pointer to the socket that owns us */ struct rds_transport *r_trans; void *r_trans_private; }; -/* Flags for mr->r_state */ -#define RDS_MR_DEAD 0 - static inline rds_rdma_cookie_t rds_rdma_make_cookie(u32 r_key, u32 offset) { return r_key | (((u64) offset) << 32); From 690cc86321eb9bcee371710252742fb16fe96824 Mon Sep 17 00:00:00 2001 From: Taras Chornyi Date: Thu, 9 Apr 2020 20:25:24 +0300 Subject: [PATCH 066/331] net: ipv4: devinet: Fix crash when add/del multicast IP with autojoin When CONFIG_IP_MULTICAST is not set and multicast ip is added to the device with autojoin flag or when multicast ip is deleted kernel will crash. steps to reproduce: ip addr add 224.0.0.0/32 dev eth0 ip addr del 224.0.0.0/32 dev eth0 or ip addr add 224.0.0.0/32 dev eth0 autojoin Unable to handle kernel NULL pointer dereference at virtual address 0000000000000088 pc : _raw_write_lock_irqsave+0x1e0/0x2ac lr : lock_sock_nested+0x1c/0x60 Call trace: _raw_write_lock_irqsave+0x1e0/0x2ac lock_sock_nested+0x1c/0x60 ip_mc_config.isra.28+0x50/0xe0 inet_rtm_deladdr+0x1a8/0x1f0 rtnetlink_rcv_msg+0x120/0x350 netlink_rcv_skb+0x58/0x120 rtnetlink_rcv+0x14/0x20 netlink_unicast+0x1b8/0x270 netlink_sendmsg+0x1a0/0x3b0 ____sys_sendmsg+0x248/0x290 ___sys_sendmsg+0x80/0xc0 __sys_sendmsg+0x68/0xc0 __arm64_sys_sendmsg+0x20/0x30 el0_svc_common.constprop.2+0x88/0x150 do_el0_svc+0x20/0x80 el0_sync_handler+0x118/0x190 el0_sync+0x140/0x180 Fixes: 93a714d6b53d ("multicast: Extend ip address command to enable multicast group join/leave on") Signed-off-by: Taras Chornyi Signed-off-by: Vadym Kochan Signed-off-by: David S. 
Miller --- net/ipv4/devinet.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c index 30fa42f5997d..c0dd561aa190 100644 --- a/net/ipv4/devinet.c +++ b/net/ipv4/devinet.c @@ -614,12 +614,15 @@ struct in_ifaddr *inet_ifa_byprefix(struct in_device *in_dev, __be32 prefix, return NULL; } -static int ip_mc_config(struct sock *sk, bool join, const struct in_ifaddr *ifa) +static int ip_mc_autojoin_config(struct net *net, bool join, + const struct in_ifaddr *ifa) { +#if defined(CONFIG_IP_MULTICAST) struct ip_mreqn mreq = { .imr_multiaddr.s_addr = ifa->ifa_address, .imr_ifindex = ifa->ifa_dev->dev->ifindex, }; + struct sock *sk = net->ipv4.mc_autojoin_sk; int ret; ASSERT_RTNL(); @@ -632,6 +635,9 @@ static int ip_mc_config(struct sock *sk, bool join, const struct in_ifaddr *ifa) release_sock(sk); return ret; +#else + return -EOPNOTSUPP; +#endif } static int inet_rtm_deladdr(struct sk_buff *skb, struct nlmsghdr *nlh, @@ -675,7 +681,7 @@ static int inet_rtm_deladdr(struct sk_buff *skb, struct nlmsghdr *nlh, continue; if (ipv4_is_multicast(ifa->ifa_address)) - ip_mc_config(net->ipv4.mc_autojoin_sk, false, ifa); + ip_mc_autojoin_config(net, false, ifa); __inet_del_ifa(in_dev, ifap, 1, nlh, NETLINK_CB(skb).portid); return 0; } @@ -940,8 +946,7 @@ static int inet_rtm_newaddr(struct sk_buff *skb, struct nlmsghdr *nlh, */ set_ifa_lifetime(ifa, valid_lft, prefered_lft); if (ifa->ifa_flags & IFA_F_MCAUTOJOIN) { - int ret = ip_mc_config(net->ipv4.mc_autojoin_sk, - true, ifa); + int ret = ip_mc_autojoin_config(net, true, ifa); if (ret < 0) { inet_free_ifa(ifa); From 2098c564701c0dde76063dd9c5c00a7a1f173541 Mon Sep 17 00:00:00 2001 From: Guenter Roeck Date: Sat, 4 Apr 2020 08:36:31 -0700 Subject: [PATCH 067/331] mtd: spi-nor: Compile files in controllers/ directory Commit a0900d0195d2 ("mtd: spi-nor: Prepare core / manufacturer code split") moved various files into a new directory, but did not add the new directory to 
its parent directory Makefile. The moved files no longer build, and
affected flash chips no longer instantiate.

Adding the new directory to the parent directory Makefile fixes the
problem.

Fixes: a0900d0195d2 ("mtd: spi-nor: Prepare core / manufacturer code split")
Cc: Boris Brezillon
Cc: Tudor Ambarus
Signed-off-by: Guenter Roeck
Reviewed-by: Boris Brezillon
Acked-by: Joel Stanley
Reviewed-by: Tudor Ambarus
Signed-off-by: Richard Weinberger
---
 drivers/mtd/spi-nor/Makefile | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/mtd/spi-nor/Makefile b/drivers/mtd/spi-nor/Makefile
index 7ddb742de1fe..653923896205 100644
--- a/drivers/mtd/spi-nor/Makefile
+++ b/drivers/mtd/spi-nor/Makefile
@@ -18,3 +18,5 @@ spi-nor-objs += winbond.o
 spi-nor-objs += xilinx.o
 spi-nor-objs += xmc.o
 obj-$(CONFIG_MTD_SPI_NOR) += spi-nor.o
+
+obj-$(CONFIG_MTD_SPI_NOR) += controllers/

From 8c7f0a44b4b4ef16df8f44fbaee6d1f5d1593c83 Mon Sep 17 00:00:00 2001
From: Sergei Lopatin
Date: Wed, 26 Jun 2019 14:56:59 +0500
Subject: [PATCH 068/331] drm/amd/powerplay: force the trim of the mclk dpm_levels if OD is enabled

Should prevent flicker if PP_OVERDRIVE_MASK is set.

bug: https://bugs.freedesktop.org/show_bug.cgi?id=102646
bug: https://bugs.freedesktop.org/show_bug.cgi?id=108941
bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1088
bug: https://gitlab.freedesktop.org/drm/amd/-/issues/628

Signed-off-by: Sergei Lopatin
Signed-off-by: Alex Deucher
Cc: stable@vger.kernel.org
---
 drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c b/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c
index 7740488999df..4795eb66b2b2 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c
@@ -3804,9 +3804,12 @@ static int smu7_trim_single_dpm_states(struct pp_hwmgr *hwmgr,
 {
 	uint32_t i;

+	/* force the trim if mclk_switching is disabled to prevent flicker */
+	bool force_trim = (low_limit == high_limit);
 	for (i = 0; i < dpm_table->count; i++) {
 	/*skip the trim if od is enabled*/
-		if (!hwmgr->od_enabled && (dpm_table->dpm_levels[i].value < low_limit
+		if ((!hwmgr->od_enabled || force_trim)
+			&& (dpm_table->dpm_levels[i].value < low_limit
 			|| dpm_table->dpm_levels[i].value > high_limit))
 			dpm_table->dpm_levels[i].enabled = false;
 		else

From 4f7d010fc47ee1aad84bed97ac25719658117c1d Mon Sep 17 00:00:00 2001
From: Evan Quan
Date: Tue, 24 Mar 2020 16:22:19 +0800
Subject: [PATCH 069/331] drm/amd/powerplay: unload mp1 for Arcturus RAS baco reset

This sequence is recommended by PMFW team for the baco reset with PMFW
reloaded. And it seems able to address the random failure seen on
Arcturus.

Signed-off-by: Evan Quan
Reviewed-by: Feifei Xu
Reviewed-by: John Clements
Signed-off-by: Alex Deucher
---
 drivers/gpu/drm/amd/powerplay/smu_v11_0.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/amd/powerplay/smu_v11_0.c b/drivers/gpu/drm/amd/powerplay/smu_v11_0.c
index 541c932a6005..655ba4fb05dc 100644
--- a/drivers/gpu/drm/amd/powerplay/smu_v11_0.c
+++ b/drivers/gpu/drm/amd/powerplay/smu_v11_0.c
@@ -1718,6 +1718,12 @@ int smu_v11_0_baco_set_state(struct smu_context *smu, enum smu_baco_state state)
 		if (ret)
 			goto out;

+		if (ras && ras->supported) {
+			ret = smu_send_smc_msg(smu, SMU_MSG_PrepareMp1ForUnload, NULL);
+			if (ret)
+				goto out;
+		}
+
 		/* clear vbios scratch 6 and 7 for coming asic reinit */
 		WREG32(adev->bios_scratch_reg_offset + 6, 0);
 		WREG32(adev->bios_scratch_reg_offset + 7, 0);

From 74347a99e73ae00b8385f1209aaea193c670f901 Mon Sep 17 00:00:00 2001
From: Tianyu Lan
Date: Mon, 6 Apr 2020 08:53:26 -0700
Subject: [PATCH 070/331] x86/Hyper-V: Unload vmbus channel in hv panic callback

When kdump is not configured, a Hyper-V VM might still respond to
network traffic after a kernel panic when the kernel parameter panic=0
is set. The panic CPU goes into an infinite loop with interrupts
enabled, and the VMbus driver interrupt handler still works because the
VMbus connection is unloaded only in the kdump path. The network
responses make the other end of the connection think the VM is still
functional even though it has panicked, which could affect any failover
actions that should be taken.

Fix this by unloading the VMbus connection during the panic process.
vmbus_initiate_unload() could then be called twice (e.g., by
hyperv_panic_event() and hv_crash_handler()), so reset the connection
state in vmbus_initiate_unload() to ensure the unload is done only once.
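
As a rough userspace illustration of the once-only guard described above
(a sketch, not the kernel code): C11 atomic_exchange() stands in for the
kernel's xchg(), and conn_state, unload_calls and do_unload_once() are
hypothetical names chosen for this example.

```c
#include <stdatomic.h>

enum conn_state_val { DISCONNECTED, CONNECTING, CONNECTED };

/* Hypothetical stand-ins for the connection state and the unload work. */
static _Atomic int conn_state = CONNECTED;
static int unload_calls;

/*
 * Swap the state to DISCONNECTED and inspect the previous value.
 * Only the caller that sees a previous state other than DISCONNECTED
 * performs the unload; every later caller returns early.
 */
static void do_unload_once(void)
{
	if (atomic_exchange(&conn_state, DISCONNECTED) == DISCONNECTED)
		return;

	unload_calls++;	/* the real code would send the unload message here */
}
```

Because the test and the state change happen in one atomic step, two
callers racing into the function cannot both pass the check.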

Fixes: 81b18bce48af ("Drivers: HV: Send one page worth of kmsg dump over Hyper-V during panic")
Reviewed-by: Michael Kelley
Signed-off-by: Tianyu Lan
Link: https://lore.kernel.org/r/20200406155331.2105-2-Tianyu.Lan@microsoft.com
Signed-off-by: Wei Liu
---
 drivers/hv/channel_mgmt.c |  3 +++
 drivers/hv/vmbus_drv.c    | 21 +++++++++++++--------
 2 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 0370364169c4..501c43c5851d 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -839,6 +839,9 @@ void vmbus_initiate_unload(bool crash)
 {
 	struct vmbus_channel_message_header hdr;

+	if (xchg(&vmbus_connection.conn_state, DISCONNECTED) == DISCONNECTED)
+		return;
+
 	/* Pre-Win2012R2 hosts don't support reconnect */
 	if (vmbus_proto_version < VERSION_WIN8_1)
 		return;
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 029378c27421..6478240d11ab 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -53,9 +53,12 @@ static int hyperv_panic_event(struct notifier_block *nb, unsigned long val,
 {
 	struct pt_regs *regs;

-	regs = current_pt_regs();
+	vmbus_initiate_unload(true);

-	hyperv_report_panic(regs, val);
+	if (ms_hyperv.misc_features & HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE) {
+		regs = current_pt_regs();
+		hyperv_report_panic(regs, val);
+	}
 	return NOTIFY_DONE;
 }

@@ -1391,10 +1394,16 @@ static int vmbus_bus_init(void)
 		}

 		register_die_notifier(&hyperv_die_block);
-		atomic_notifier_chain_register(&panic_notifier_list,
-					       &hyperv_panic_block);
 	}

+	/*
+	 * Always register the panic notifier because we need to unload
+	 * the VMbus channel connection to prevent any VMbus
+	 * activity after the VM panics.
+	 */
+	atomic_notifier_chain_register(&panic_notifier_list,
+			       &hyperv_panic_block);
+
 	vmbus_request_offers();

 	return 0;
@@ -2204,8 +2213,6 @@ static int vmbus_bus_suspend(struct device *dev)

 	vmbus_initiate_unload(false);

-	vmbus_connection.conn_state = DISCONNECTED;
-
 	/* Reset the event for the next resume. */
 	reinit_completion(&vmbus_connection.ready_for_resume_event);

@@ -2289,7 +2296,6 @@ static void hv_kexec_handler(void)
 {
 	hv_stimer_global_cleanup();
 	vmbus_initiate_unload(false);
-	vmbus_connection.conn_state = DISCONNECTED;
 	/* Make sure conn_state is set as hv_synic_cleanup checks for it */
 	mb();
 	cpuhp_remove_state(hyperv_cpuhp_online);
@@ -2306,7 +2312,6 @@ static void hv_crash_handler(struct pt_regs *regs)
 	 * doing the cleanup for current CPU only. This should be sufficient
 	 * for kdump.
 	 */
-	vmbus_connection.conn_state = DISCONNECTED;
 	cpu = smp_processor_id();
 	hv_stimer_cleanup(cpu);
 	hv_synic_disable_regs(cpu);

From 7f11a2cc10a4ae3a70e2c73361f4a9a33503539b Mon Sep 17 00:00:00 2001
From: Tianyu Lan
Date: Mon, 6 Apr 2020 08:53:27 -0700
Subject: [PATCH 071/331] x86/Hyper-V: Free hv_panic_page when fail to register kmsg dump

If kmsg_dump_register() fails, hv_panic_page will not be used anywhere.
So free and reset it.
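
A minimal userspace sketch of the free-and-reset pattern, assuming
calloc()/free() in place of the Hyper-V page helpers. panic_page,
setup_panic_page(), register_ok() and register_fail() are hypothetical
names used only for illustration.

```c
#include <stdlib.h>

static void *panic_page;	/* hypothetical stand-in for hv_panic_page */

/* Hypothetical registration callbacks: 0 means success. */
static int register_ok(void)   { return 0; }
static int register_fail(void) { return -5; }

static int setup_panic_page(int (*register_fn)(void))
{
	int ret;

	panic_page = calloc(1, 4096);
	if (!panic_page)
		return -1;

	ret = register_fn();
	if (ret) {
		/* Nobody will use the page: free it and clear the pointer. */
		free(panic_page);
		panic_page = NULL;
	}
	return ret;
}
```

Clearing the pointer matters as much as the free: later code that tests
the pointer then sees that no panic page is available.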

Fixes: 81b18bce48af ("Drivers: HV: Send one page worth of kmsg dump over Hyper-V during panic")
Reviewed-by: Michael Kelley
Signed-off-by: Tianyu Lan
Link: https://lore.kernel.org/r/20200406155331.2105-3-Tianyu.Lan@microsoft.com
Signed-off-by: Wei Liu
---
 drivers/hv/vmbus_drv.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 6478240d11ab..00a511f15926 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -1385,9 +1385,13 @@ static int vmbus_bus_init(void)
 			hv_panic_page = (void *)hv_alloc_hyperv_zeroed_page();
 			if (hv_panic_page) {
 				ret = kmsg_dump_register(&hv_kmsg_dumper);
-				if (ret)
+				if (ret) {
 					pr_err("Hyper-V: kmsg dump register "
 						"error 0x%x\n", ret);
+					hv_free_hyperv_page(
+					    (unsigned long)hv_panic_page);
+					hv_panic_page = NULL;
+				}
 			} else
 				pr_err("Hyper-V: panic message page memory "
 					"allocation failed");
@@ -1416,7 +1420,6 @@ err_alloc:
 	hv_remove_vmbus_irq();

 	bus_unregister(&hv_bus);
-	hv_free_hyperv_page((unsigned long)hv_panic_page);
 	unregister_sysctl_table(hv_ctl_table_hdr);
 	hv_ctl_table_hdr = NULL;
 	return ret;

From 34c51814b2b87cb2e5a98c92fe957db2ee8e27f4 Mon Sep 17 00:00:00 2001
From: Eugene Syromiatnikov
Date: Wed, 1 Apr 2020 05:26:50 +0200
Subject: [PATCH 072/331] btrfs: re-instantiate the removed BTRFS_SUBVOL_CREATE_ASYNC definition

The commit 9c1036fdb1d1ff1b ("btrfs: Remove BTRFS_SUBVOL_CREATE_ASYNC
support") breaks strace build with the kernel headers from git:

    btrfs.c: In function "btrfs_test_subvol_ioctls":
    btrfs.c:531:23: error: "BTRFS_SUBVOL_CREATE_ASYNC" undeclared (first use in this function)
       vol_args_v2.flags = BTRFS_SUBVOL_CREATE_ASYNC;

Moreover, it is improper to break UAPI; strace uses the definitions to
decode ioctls that are considered part of the public API.

Restore the macro definition and put it under "#ifndef __KERNEL__" in
order to prevent inadvertent in-kernel usage.
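
A small sketch of how the guard looks from a userspace consumer such as
strace, assuming the build does not define __KERNEL__. The helper
subvol_flag_is_async() is hypothetical and is not part of btrfs or
strace.

```c
/* Userspace builds do not define __KERNEL__, so the macro is visible. */
#ifndef __KERNEL__
/* Deprecated since 5.7 */
# define BTRFS_SUBVOL_CREATE_ASYNC	(1ULL << 0)
#endif
#define BTRFS_SUBVOL_RDONLY		(1ULL << 1)

/* Hypothetical decoder helper: report whether the deprecated bit is set. */
static int subvol_flag_is_async(unsigned long long flags)
{
	return (flags & BTRFS_SUBVOL_CREATE_ASYNC) != 0;
}
```

Compiling the same file with -D__KERNEL__ would fail on the helper,
which is the effect the guard is meant to have for in-kernel users.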

Fixes: 9c1036fdb1d1ff1b ("btrfs: Remove BTRFS_SUBVOL_CREATE_ASYNC support")
Reviewed-by: Nikolay Borisov
Signed-off-by: Eugene Syromiatnikov
Reviewed-by: David Sterba
Signed-off-by: David Sterba
---
 include/uapi/linux/btrfs.h | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index 8134924cfc17..e6b6cb0f8bc6 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -36,12 +36,10 @@ struct btrfs_ioctl_vol_args {
 #define BTRFS_DEVICE_PATH_NAME_MAX 1024
 #define BTRFS_SUBVOL_NAME_MAX 4039

-/*
- * Deprecated since 5.7:
- *
- * BTRFS_SUBVOL_CREATE_ASYNC (1ULL << 0)
- */
-
+#ifndef __KERNEL__
+/* Deprecated since 5.7 */
+# define BTRFS_SUBVOL_CREATE_ASYNC (1ULL << 0)
+#endif
 #define BTRFS_SUBVOL_RDONLY (1ULL << 1)
 #define BTRFS_SUBVOL_QGROUP_INHERIT (1ULL << 2)

From 9b038086f06be1a85d74fe8cc8e74e07db6f422e Mon Sep 17 00:00:00 2001
From: Jakub Kicinski
Date: Thu, 9 Apr 2020 14:21:58 -0700
Subject: [PATCH 073/331] docs: networking: convert DIM to RST

Convert the Dynamic Interrupt Moderation doc to RST and use the RST
features like syntax highlight, function and structure documentation,
enumerations, table of contents.
Reviewed-by: Randy Dunlap Signed-off-by: Jakub Kicinski --- Documentation/networking/index.rst | 1 + .../networking/{net_dim.txt => net_dim.rst} | 90 +++++++++---------- MAINTAINERS | 1 + 3 files changed, 45 insertions(+), 47 deletions(-) rename Documentation/networking/{net_dim.txt => net_dim.rst} (79%) diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst index 50133d9761c9..6538ede29661 100644 --- a/Documentation/networking/index.rst +++ b/Documentation/networking/index.rst @@ -22,6 +22,7 @@ Contents: z8530book msg_zerocopy failover + net_dim net_failover phy sfp-phylink diff --git a/Documentation/networking/net_dim.txt b/Documentation/networking/net_dim.rst similarity index 79% rename from Documentation/networking/net_dim.txt rename to Documentation/networking/net_dim.rst index 9bdb7d5a3ba3..1de1e3ec774b 100644 --- a/Documentation/networking/net_dim.txt +++ b/Documentation/networking/net_dim.rst @@ -1,28 +1,20 @@ +====================================================== Net DIM - Generic Network Dynamic Interrupt Moderation ====================================================== -Author: - Tal Gilboa +:Author: Tal Gilboa +.. contents:: :depth: 2 -Contents -========= - -- Assumptions -- Introduction -- The Net DIM Algorithm -- Registering a Network Device to DIM -- Example - -Part 0: Assumptions -====================== +Assumptions +=========== This document assumes the reader has basic knowledge in network drivers and in general interrupt moderation. -Part I: Introduction -====================== +Introduction +============ Dynamic Interrupt Moderation (DIM) (in networking) refers to changing the interrupt moderation configuration of a channel in order to optimize packet @@ -41,14 +33,15 @@ number of wanted packets per event. The Net DIM algorithm ascribes importance to increase bandwidth over reducing interrupt rate. 
-Part II: The Net DIM Algorithm -=============================== +Net DIM Algorithm +================= Each iteration of the Net DIM algorithm follows these steps: -1. Calculates new data sample. -2. Compares it to previous sample. -3. Makes a decision - suggests interrupt moderation configuration fields. -4. Applies a schedule work function, which applies suggested configuration. + +#. Calculates new data sample. +#. Compares it to previous sample. +#. Makes a decision - suggests interrupt moderation configuration fields. +#. Applies a schedule work function, which applies suggested configuration. The first two steps are straightforward, both the new and the previous data are supplied by the driver registered to Net DIM. The previous data is the new data @@ -89,19 +82,21 @@ manoeuvre as it may provide partial data or ignore the algorithm suggestion under some conditions. -Part III: Registering a Network Device to DIM -============================================== +Registering a Network Device to DIM +=================================== -Net DIM API exposes the main function net_dim(struct dim *dim, -struct dim_sample end_sample). This function is the entry point to the Net +Net DIM API exposes the main function net_dim(). +This function is the entry point to the Net DIM algorithm and has to be called every time the driver would like to check if it should change interrupt moderation parameters. The driver should provide two -data structures: struct dim and struct dim_sample. Struct dim +data structures: :c:type:`struct dim ` and +:c:type:`struct dim_sample `. :c:type:`struct dim ` describes the state of DIM for a specific object (RX queue, TX queue, other queues, etc.). This includes the current selected profile, previous data samples, the callback function provided by the driver and more. 
-Struct dim_sample describes a data sample, which will be compared to the -data sample stored in struct dim in order to decide on the algorithm's next +:c:type:`struct dim_sample ` describes a data sample, +which will be compared to the data sample stored in :c:type:`struct dim ` +in order to decide on the algorithm's next step. The sample should include bytes, packets and interrupts, measured by the driver. @@ -110,9 +105,10 @@ main net_dim() function. The recommended method is to call net_dim() on each interrupt. Since Net DIM has a built-in moderation and it might decide to skip iterations under certain conditions, there is no need to moderate the net_dim() calls as well. As mentioned above, the driver needs to provide an object of type -struct dim to the net_dim() function call. It is advised for each entity -using Net DIM to hold a struct dim as part of its data structure and use it -as the main Net DIM API object. The struct dim_sample should hold the latest +:c:type:`struct dim ` to the net_dim() function call. It is advised for +each entity using Net DIM to hold a :c:type:`struct dim ` as part of its +data structure and use it as the main Net DIM API object. +The :c:type:`struct dim_sample ` should hold the latest bytes, packets and interrupts count. No need to perform any calculations, just include the raw data. @@ -124,19 +120,19 @@ the data flow. After the work is done, Net DIM algorithm needs to be set to the proper state in order to move to the next iteration. -Part IV: Example -================= +Example +======= The following code demonstrates how to register a driver to Net DIM. The actual usage is not complete but it should make the outline of the usage clear. -my_driver.c: +.. 
code-block:: c -#include + #include -/* Callback for net DIM to schedule on a decision to change moderation */ -void my_driver_do_dim_work(struct work_struct *work) -{ + /* Callback for net DIM to schedule on a decision to change moderation */ + void my_driver_do_dim_work(struct work_struct *work) + { /* Get struct dim from struct work_struct */ struct dim *dim = container_of(work, struct dim, work); @@ -145,11 +141,11 @@ void my_driver_do_dim_work(struct work_struct *work) /* Signal net DIM work is done and it should move to next iteration */ dim->state = DIM_START_MEASURE; -} + } -/* My driver's interrupt handler */ -int my_driver_handle_interrupt(struct my_driver_entity *my_entity, ...) -{ + /* My driver's interrupt handler */ + int my_driver_handle_interrupt(struct my_driver_entity *my_entity, ...) + { ... /* A struct to hold current measured data */ struct dim_sample dim_sample; @@ -162,13 +158,13 @@ int my_driver_handle_interrupt(struct my_driver_entity *my_entity, ...) /* Call net DIM */ net_dim(&my_entity->dim, dim_sample); ... -} + } -/* My entity's initialization function (my_entity was already allocated) */ -int my_driver_init_my_entity(struct my_driver_entity *my_entity, ...) -{ + /* My entity's initialization function (my_entity was already allocated) */ + int my_driver_init_my_entity(struct my_driver_entity *my_entity, ...) + { ... /* Initiate struct work_struct with my driver's callback function */ INIT_WORK(&my_entity->dim.work, my_driver_do_dim_work); ... -} + } diff --git a/MAINTAINERS b/MAINTAINERS index 9271068bde63..46a3a01b54b9 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -5961,6 +5961,7 @@ M: Tal Gilboa S: Maintained F: include/linux/dim.h F: lib/dim/ +F: Documentation/networking/net_dim.rst DZ DECSTATION DZ11 SERIAL DRIVER M: "Maciej W. 
Rozycki"

From 9d8592896fd946b27c385d42f5c80b0b5254fce9 Mon Sep 17 00:00:00 2001
From: Randy Dunlap
Date: Thu, 9 Apr 2020 14:21:59 -0700
Subject: [PATCH 074/331] docs: networking: add full DIM API

Add the full net DIM API to the net_dim.rst file.

Signed-off-by: Randy Dunlap
Signed-off-by: Jakub Kicinski
---
 Documentation/networking/net_dim.rst | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/Documentation/networking/net_dim.rst b/Documentation/networking/net_dim.rst
index 1de1e3ec774b..3bed9fd95336 100644
--- a/Documentation/networking/net_dim.rst
+++ b/Documentation/networking/net_dim.rst
@@ -168,3 +168,9 @@ usage is not complete but it should make the outline of the usage clear.
 	INIT_WORK(&my_entity->dim.work, my_driver_do_dim_work);
 	...
   }
+
+Dynamic Interrupt Moderation (DIM) library API
+==============================================
+
+.. kernel-doc:: include/linux/dim.h
+    :internal:

From 4963d66b8a26c489958063abb6900ea6ed8e4836 Mon Sep 17 00:00:00 2001
From: Adam Barber
Date: Fri, 10 Apr 2020 17:00:32 +0800
Subject: [PATCH 075/331] ALSA: hda/realtek - Enable the headset mic on Asus FX505DT

On Asus FX505DT with Realtek ALC233, the headset mic is connected to pin
0x19, with default 0x411111f0.

Enable headset mic by reconfiguring the pin to an external mic
associated with the headphone on 0x21. Mic jack detection was also found
to be working.

BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=207131
Signed-off-by: Adam Barber
Cc:
Link: https://lore.kernel.org/r/20200410090032.2759-1-barberadam995@gmail.com
Signed-off-by: Takashi Iwai
---
 sound/pci/hda/patch_realtek.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c
index de2826f90d34..dc5557d79c43 100644
--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -7378,6 +7378,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = {
 	SND_PCI_QUIRK(0x1043, 0x16e3, "ASUS UX50", ALC269_FIXUP_STEREO_DMIC),
 	SND_PCI_QUIRK(0x1043, 0x17d1, "ASUS UX431FL", ALC294_FIXUP_ASUS_DUAL_SPK),
 	SND_PCI_QUIRK(0x1043, 0x18b1, "Asus MJ401TA", ALC256_FIXUP_ASUS_HEADSET_MIC),
+	SND_PCI_QUIRK(0x1043, 0x18f1, "Asus FX505DT", ALC256_FIXUP_ASUS_HEADSET_MIC),
 	SND_PCI_QUIRK(0x1043, 0x19ce, "ASUS B9450FA", ALC294_FIXUP_ASUS_HPE),
 	SND_PCI_QUIRK(0x1043, 0x1a13, "Asus G73Jw", ALC269_FIXUP_ASUS_G73JW),
 	SND_PCI_QUIRK(0x1043, 0x1a30, "ASUS X705UD", ALC256_FIXUP_ASUS_MIC),

From 73f26e526f19afb3a06b76b970a76bcac2cafd05 Mon Sep 17 00:00:00 2001
From: Tianyu Lan
Date: Mon, 6 Apr 2020 08:53:28 -0700
Subject: [PATCH 076/331] x86/Hyper-V: Trigger crash enlightenment only once during system crash.

When a guest VM panics, Hyper-V should be notified only once via the
crash synthetic MSRs. Current Linux code might write these crash MSRs
twice during a system panic:
1) hyperv_panic/die_event() calling hyperv_report_panic()
2) hv_kmsg_dump() calling hyperv_report_panic_msg()

Fix this by not calling hyperv_report_panic() if a kmsg dump has been
successfully registered. The notification will happen later via
hyperv_report_panic_msg().

Fixes: 7ed4325a44ea ("Drivers: hv: vmbus: Make panic reporting to be more useful")
Reviewed-by: Michael Kelley
Signed-off-by: Tianyu Lan
Link: https://lore.kernel.org/r/20200406155331.2105-4-Tianyu.Lan@microsoft.com
Signed-off-by: Wei Liu
---
 drivers/hv/vmbus_drv.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 00a511f15926..333dad39b1c1 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -55,7 +55,13 @@ static int hyperv_panic_event(struct notifier_block *nb, unsigned long val,

 	vmbus_initiate_unload(true);

-	if (ms_hyperv.misc_features & HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE) {
+	/*
+	 * Hyper-V should be notified only once about a panic. If we will be
+	 * doing hyperv_report_panic_msg() later with kmsg data, don't do
+	 * the notification here.
+	 */
+	if (ms_hyperv.misc_features & HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE
+	    && !hv_panic_page) {
 		regs = current_pt_regs();
 		hyperv_report_panic(regs, val);
 	}
@@ -68,7 +74,13 @@ static int hyperv_die_event(struct notifier_block *nb, unsigned long val,
 	struct die_args *die = (struct die_args *)args;
 	struct pt_regs *regs = die->regs;

-	hyperv_report_panic(regs, val);
+	/*
+	 * Hyper-V should be notified only once about a panic. If we will be
+	 * doing hyperv_report_panic_msg() later with kmsg data, don't do
+	 * the notification here.
+	 */
+	if (!hv_panic_page)
+		hyperv_report_panic(regs, val);
 	return NOTIFY_DONE;
 }

From a11589563e96bf262767294b89b25a9d44e7303b Mon Sep 17 00:00:00 2001
From: Tianyu Lan
Date: Mon, 6 Apr 2020 08:53:29 -0700
Subject: [PATCH 077/331] x86/Hyper-V: Report crash register data or kmsg before running crash kernel

We want to notify Hyper-V when a Linux guest VM crash occurs, so there
is a record of the crash even when kdump is enabled. But
crash_kexec_post_notifiers defaults to "false", so the kdump kernel
runs before the notifiers and Hyper-V never gets notified.
Fix this by always setting crash_kexec_post_notifiers to be true for
Hyper-V VMs.

Fixes: 81b18bce48af ("Drivers: HV: Send one page worth of kmsg dump over Hyper-V during panic")
Reviewed-by: Michael Kelley
Signed-off-by: Tianyu Lan
Link: https://lore.kernel.org/r/20200406155331.2105-5-Tianyu.Lan@microsoft.com
Signed-off-by: Wei Liu
---
 arch/x86/kernel/cpu/mshyperv.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 53706fb56433..ebf34c7bc8bc 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -263,6 +263,16 @@ static void __init ms_hyperv_init_platform(void)
 			cpuid_eax(HYPERV_CPUID_NESTED_FEATURES);
 	}

+	/*
+	 * Hyper-V expects to get crash register data or kmsg when
+	 * crash enlightenment is available and system crashes. Set
+	 * crash_kexec_post_notifiers to be true to make sure that
+	 * the crash enlightenment interface is called before running
+	 * the kdump kernel.
+	 */
+	if (ms_hyperv.misc_features & HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE)
+		crash_kexec_post_notifiers = true;
+
 #ifdef CONFIG_X86_LOCAL_APIC
 	if (ms_hyperv.features & HV_X64_ACCESS_FREQUENCY_MSRS &&
 	    ms_hyperv.misc_features & HV_FEATURE_FREQUENCY_MSRS_AVAILABLE) {

From 040026df7088c56ccbad28f7042308f67bde63df Mon Sep 17 00:00:00 2001
From: Tianyu Lan
Date: Mon, 6 Apr 2020 08:53:30 -0700
Subject: [PATCH 078/331] x86/Hyper-V: Report crash register data when sysctl_record_panic_msg is not set

When sysctl_record_panic_msg is not set, the panic will not be reported
to Hyper-V via hyperv_report_panic_msg(). So the crash should be
reported via hyperv_report_panic().
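
The decision described above can be modelled in userspace roughly as
follows. This is only a sketch: report_regs() and its parameters are
hypothetical names, with record_panic_msg standing in for the sysctl and
panic_page for the kmsg page pointer.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Register data is reported when the kmsg path will not run, that is
 * when the sysctl is off or no panic page was set up.
 */
static bool report_regs(int record_panic_msg, const void *panic_page)
{
	return !record_panic_msg || panic_page == NULL;
}
```

Only the combination "sysctl on and page present" leaves the report to
the kmsg path; every other combination falls back to register data.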

Fixes: 81b18bce48af ("Drivers: HV: Send one page worth of kmsg dump over Hyper-V during panic")
Reviewed-by: Michael Kelley
Signed-off-by: Tianyu Lan
Link: https://lore.kernel.org/r/20200406155331.2105-6-Tianyu.Lan@microsoft.com
Signed-off-by: Wei Liu
---
 drivers/hv/vmbus_drv.c | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 333dad39b1c1..172ceae69abb 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -48,6 +48,18 @@ static int hyperv_cpuhp_online;

 static void *hv_panic_page;

+/*
+ * Boolean to control whether to report panic messages over Hyper-V.
+ *
+ * It can be set via /proc/sys/kernel/hyperv/record_panic_msg
+ */
+static int sysctl_record_panic_msg = 1;
+
+static int hyperv_report_reg(void)
+{
+	return !sysctl_record_panic_msg || !hv_panic_page;
+}
+
 static int hyperv_panic_event(struct notifier_block *nb, unsigned long val,
 			      void *args)
 {
@@ -61,7 +73,7 @@ static int hyperv_panic_event(struct notifier_block *nb, unsigned long val,
 	 * the notification here.
 	 */
 	if (ms_hyperv.misc_features & HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE
-	    && !hv_panic_page) {
+	    && hyperv_report_reg()) {
 		regs = current_pt_regs();
 		hyperv_report_panic(regs, val);
 	}
@@ -79,7 +91,7 @@ static int hyperv_die_event(struct notifier_block *nb, unsigned long val,
 	 * doing hyperv_report_panic_msg() later with kmsg data, don't do
 	 * the notification here.
 	 */
-	if (!hv_panic_page)
+	if (hyperv_report_reg())
 		hyperv_report_panic(regs, val);
 	return NOTIFY_DONE;
 }
@@ -1267,13 +1279,6 @@ static void vmbus_isr(void)
 	add_interrupt_randomness(HYPERVISOR_CALLBACK_VECTOR, 0);
 }

-/*
- * Boolean to control whether to report panic messages over Hyper-V.
- *
- * It can be set via /proc/sys/kernel/hyperv/record_panic_msg
- */
-static int sysctl_record_panic_msg = 1;
-
 /*
  * Callback from kmsg_dump. Grab as much as possible from the end of the kmsg
  * buffer and call into Hyper-V to transfer the data.

From f3a99e761efa616028b255b4de58e9b5b87c5545 Mon Sep 17 00:00:00 2001
From: Tianyu Lan
Date: Mon, 6 Apr 2020 08:53:31 -0700
Subject: [PATCH 079/331] x86/Hyper-V: Report crash data in die() when panic_on_oops is set

When an oops happens with panic_on_oops unset, the oops thread is killed
by die() and the system continues to run. In that case, the guest should
not report crash register data to the host, since the system is still
running. Check panic_on_oops and return directly from
hyperv_report_panic() when it is called from die() and panic_on_oops is
unset.

Fixes: 7ed4325a44ea ("Drivers: hv: vmbus: Make panic reporting to be more useful")
Signed-off-by: Tianyu Lan
Reviewed-by: Michael Kelley
Link: https://lore.kernel.org/r/20200406155331.2105-7-Tianyu.Lan@microsoft.com
Signed-off-by: Wei Liu
---
 arch/x86/hyperv/hv_init.c      | 6 +++++-
 drivers/hv/vmbus_drv.c         | 5 +++--
 include/asm-generic/mshyperv.h | 2 +-
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index b0da5320bcff..624f5d9b0f79 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -20,6 +20,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -419,11 +420,14 @@ void hyperv_cleanup(void)
 }
 EXPORT_SYMBOL_GPL(hyperv_cleanup);

-void hyperv_report_panic(struct pt_regs *regs, long err)
+void hyperv_report_panic(struct pt_regs *regs, long err, bool in_die)
 {
 	static bool panic_reported;
 	u64 guest_id;

+	if (in_die && !panic_on_oops)
+		return;
+
 	/*
 	 * We prefer to report panic on 'die' chain as we have proper
 	 * registers to report, but if we miss it (e.g.
on BUG()) we need
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 172ceae69abb..a68bce4d0ddb 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -31,6 +31,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include "hyperv_vmbus.h"
@@ -75,7 +76,7 @@ static int hyperv_panic_event(struct notifier_block *nb, unsigned long val,
 	if (ms_hyperv.misc_features & HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE
 	    && hyperv_report_reg()) {
 		regs = current_pt_regs();
-		hyperv_report_panic(regs, val);
+		hyperv_report_panic(regs, val, false);
 	}
 	return NOTIFY_DONE;
 }
@@ -92,7 +93,7 @@ static int hyperv_die_event(struct notifier_block *nb, unsigned long val,
 	 * the notification here.
 	 */
 	if (hyperv_report_reg())
-		hyperv_report_panic(regs, val);
+		hyperv_report_panic(regs, val, true);
 	return NOTIFY_DONE;
 }

diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index b3f1082cc435..1c4fd950f091 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -163,7 +163,7 @@ static inline int cpumask_to_vpset(struct hv_vpset *vpset,
 	return nr_bank;
 }

-void hyperv_report_panic(struct pt_regs *regs, long err);
+void hyperv_report_panic(struct pt_regs *regs, long err, bool in_die);
 void hyperv_report_panic_msg(phys_addr_t pa, size_t size);
 bool hv_is_hyperv_initialized(void);
 bool hv_is_hibernation_supported(void);

From 3b72f84f8fb65e83e85e9be58eabcf95a40b8f46 Mon Sep 17 00:00:00 2001
From: Clemens Gruber
Date: Sat, 11 Apr 2020 18:51:25 +0200
Subject: [PATCH 080/331] net: phy: marvell: Fix pause frame negotiation

The negotiation of flow control / pause frame modes was broken since
commit fcf1f59afc67 ("net: phy: marvell: rearrange to use
genphy_read_lpa()") moved the setting of phydev->duplex below the
phy_resolve_aneg_pause call. Due to a check of DUPLEX_FULL in that
function, phydev->pause was no longer set.

Fix it by moving the parsing of the status variable before the blocks
dealing with the pause frames.
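
The ordering problem can be shown with a heavily reduced userspace
model. This is a sketch only: struct phy_model, resolve_pause() and the
two read_status_*() functions are hypothetical and merely mirror the
DUPLEX_FULL check that the text above attributes to
phy_resolve_aneg_pause().

```c
enum { DUPLEX_UNKNOWN = -1, DUPLEX_HALF = 0, DUPLEX_FULL = 1 };

/* Hypothetical, reduced model of a PHY's resolved state. */
struct phy_model {
	int duplex;
	int pause;
};

/* Pause is only taken over from the link partner for full duplex. */
static void resolve_pause(struct phy_model *phy, int lp_pause)
{
	phy->pause = (phy->duplex == DUPLEX_FULL) ? lp_pause : 0;
}

/* Parse duplex first, then resolve pause: the order the fix restores. */
static int read_status_fixed(int full_duplex, int lp_pause)
{
	struct phy_model phy = { .duplex = DUPLEX_UNKNOWN, .pause = 0 };

	phy.duplex = full_duplex ? DUPLEX_FULL : DUPLEX_HALF;
	resolve_pause(&phy, lp_pause);
	return phy.pause;
}

/* Resolve pause before duplex is known: the pause bit is lost. */
static int read_status_broken(int full_duplex, int lp_pause)
{
	struct phy_model phy = { .duplex = DUPLEX_UNKNOWN, .pause = 0 };

	resolve_pause(&phy, lp_pause);
	phy.duplex = full_duplex ? DUPLEX_FULL : DUPLEX_HALF;
	return phy.pause;
}
```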

As the Marvell 88E1510 datasheet does not specify the timing between the
link status and the "Speed and Duplex Resolved" bit, we have to force
the link down as long as the resolved bit is not set, to avoid reporting
link up before we even have valid Speed/Duplex.

Tested with a Marvell 88E1510 (RGMII to Copper/1000Base-T)

Fixes: fcf1f59afc67 ("net: phy: marvell: rearrange to use genphy_read_lpa()")
Signed-off-by: Clemens Gruber
Signed-off-by: Jakub Kicinski
---
 drivers/net/phy/marvell.c | 46 ++++++++++++++++++++-------------------
 1 file changed, 24 insertions(+), 22 deletions(-)

diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c
index 4714ca0e0d4b..7fc8e10c5f33 100644
--- a/drivers/net/phy/marvell.c
+++ b/drivers/net/phy/marvell.c
@@ -1263,6 +1263,30 @@ static int marvell_read_status_page_an(struct phy_device *phydev,
 	int lpa;
 	int err;

+	if (!(status & MII_M1011_PHY_STATUS_RESOLVED)) {
+		phydev->link = 0;
+		return 0;
+	}
+
+	if (status & MII_M1011_PHY_STATUS_FULLDUPLEX)
+		phydev->duplex = DUPLEX_FULL;
+	else
+		phydev->duplex = DUPLEX_HALF;
+
+	switch (status & MII_M1011_PHY_STATUS_SPD_MASK) {
+	case MII_M1011_PHY_STATUS_1000:
+		phydev->speed = SPEED_1000;
+		break;
+
+	case MII_M1011_PHY_STATUS_100:
+		phydev->speed = SPEED_100;
+		break;
+
+	default:
+		phydev->speed = SPEED_10;
+		break;
+	}
+
 	if (!fiber) {
 		err = genphy_read_lpa(phydev);
 		if (err < 0)
@@ -1291,28 +1315,6 @@ static int marvell_read_status_page_an(struct phy_device *phydev,
 		}
 	}

-	if (!(status & MII_M1011_PHY_STATUS_RESOLVED))
-		return 0;
-
-	if (status & MII_M1011_PHY_STATUS_FULLDUPLEX)
-		phydev->duplex = DUPLEX_FULL;
-	else
-		phydev->duplex = DUPLEX_HALF;
-
-	switch (status & MII_M1011_PHY_STATUS_SPD_MASK) {
-	case MII_M1011_PHY_STATUS_1000:
-		phydev->speed = SPEED_1000;
-		break;
-
-	case MII_M1011_PHY_STATUS_100:
-		phydev->speed = SPEED_100;
-		break;
-
-	default:
-		phydev->speed = SPEED_10;
-		break;
-	}
-
 	return 0;
 }

From 48cc42973509afac24e83d6edc23901d102872d1 Mon Sep 17 00:00:00 2001
From: Takashi Iwai Date: Sun, 12 Apr 2020 10:13:28 +0200 Subject: [PATCH 081/331] ALSA: usb-audio: Filter error from connector kctl ops, too The ignore_ctl_error option should filter the error at kctl accesses, but one path was overlooked: mixer_ctl_connector_get() returns an error from the request. This patch covers the forgotten code path and applies filter_error() properly. The locking error is still returned since this is a fatal error that has to be reported even with the ignore_ctl_error option. BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=206873 Cc: Link: https://lore.kernel.org/r/20200412081331.4742-2-tiwai@suse.de Signed-off-by: Takashi Iwai --- sound/usb/mixer.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sound/usb/mixer.c b/sound/usb/mixer.c index 721d12130d0c..d27e390dcd32 100644 --- a/sound/usb/mixer.c +++ b/sound/usb/mixer.c @@ -1457,7 +1457,7 @@ error: usb_audio_err(chip, "cannot get connectors status: req = %#x, wValue = %#x, wIndex = %#x, type = %d\n", UAC_GET_CUR, validx, idx, cval->val_type); - return ret; + return filter_error(cval, ret); } ucontrol->value.integer.value[0] = val; From 3507245b82b4362dc9721cbc328644905a3efa22 Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Sun, 12 Apr 2020 10:13:29 +0200 Subject: [PATCH 082/331] ALSA: usb-audio: Don't override ignore_ctl_error value from the map The mapping table may also contain the ignore_ctl_error flag for devices that are known to misbehave. Since the table value is always written to the card's own ignore_ctl_error flag, it overrides the value already set by the module option, which is not what the user expects. Fix the code so that it does not clear a flag that the user has set.
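A minimal sketch of the difference between the two assignments, using hypothetical stand-in structs (mixer_sketch, map_entry_sketch) rather than the real usb-audio types. It only illustrates why a plain assignment from the table can clear a flag that was already set, while OR-ing cannot:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the mixer state and a quirk-table entry. */
struct mixer_sketch {
	bool ignore_ctl_error;	/* may already be set from the module option */
};

struct map_entry_sketch {
	bool ignore_ctl_error;	/* per-device quirk flag */
};

/* Old behaviour: the table value always wins, even when it is false. */
static void apply_map_overwrite(struct mixer_sketch *m,
				const struct map_entry_sketch *e)
{
	m->ignore_ctl_error = e->ignore_ctl_error;
}

/* New behaviour: the table can only turn the flag on, never off. */
static void apply_map_or(struct mixer_sketch *m,
			 const struct map_entry_sketch *e)
{
	m->ignore_ctl_error |= e->ignore_ctl_error;
}
```

With the flag preset to true and a table entry that leaves it false, apply_map_overwrite() ends with false and apply_map_or() keeps true.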
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=206873 Cc: Link: https://lore.kernel.org/r/20200412081331.4742-3-tiwai@suse.de Signed-off-by: Takashi Iwai --- sound/usb/mixer.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sound/usb/mixer.c b/sound/usb/mixer.c index d27e390dcd32..83926b1be53b 100644 --- a/sound/usb/mixer.c +++ b/sound/usb/mixer.c @@ -3106,7 +3106,7 @@ static int snd_usb_mixer_controls(struct usb_mixer_interface *mixer) if (map->id == state.chip->usb_id) { state.map = map->map; state.selector_map = map->selector_map; - mixer->ignore_ctl_error = map->ignore_ctl_error; + mixer->ignore_ctl_error |= map->ignore_ctl_error; break; } } From 7dc3c5a0172e6c0449502103356c3628d05bc0e0 Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Sun, 12 Apr 2020 10:13:30 +0200 Subject: [PATCH 083/331] ALSA: usb-audio: Don't create jack controls for PCM terminals Some funky firmwares set the connector flag even on PCM terminals although it doesn't make sense (and even actually the firmware doesn't react properly!). Let's skip creation of jack controls in such a case. BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=206873 Cc: Link: https://lore.kernel.org/r/20200412081331.4742-4-tiwai@suse.de Signed-off-by: Takashi Iwai --- sound/usb/mixer.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/sound/usb/mixer.c b/sound/usb/mixer.c index 83926b1be53b..ab9c908a8771 100644 --- a/sound/usb/mixer.c +++ b/sound/usb/mixer.c @@ -2109,7 +2109,8 @@ static int parse_audio_input_terminal(struct mixer_build *state, int unitid, check_input_term(state, term_id, &iterm); /* Check for jack detection. 
*/ - if (uac_v2v3_control_is_readable(bmctls, control)) + if ((iterm.type & 0xff00) != 0x0100 && + uac_v2v3_control_is_readable(bmctls, control)) build_connector_control(state->mixer, &iterm, true); return 0; @@ -3149,7 +3150,8 @@ static int snd_usb_mixer_controls(struct usb_mixer_interface *mixer) if (err < 0 && err != -EINVAL) return err; - if (uac_v2v3_control_is_readable(le16_to_cpu(desc->bmControls), + if ((state.oterm.type & 0xff00) != 0x0100 && + uac_v2v3_control_is_readable(le16_to_cpu(desc->bmControls), UAC2_TE_CONNECTOR)) { build_connector_control(state.mixer, &state.oterm, false); @@ -3174,7 +3176,8 @@ static int snd_usb_mixer_controls(struct usb_mixer_interface *mixer) if (err < 0 && err != -EINVAL) return err; - if (uac_v2v3_control_is_readable(le32_to_cpu(desc->bmControls), + if ((state.oterm.type & 0xff00) != 0x0100 && + uac_v2v3_control_is_readable(le32_to_cpu(desc->bmControls), UAC3_TE_INSERTION)) { build_connector_control(state.mixer, &state.oterm, false); From 934b96594ed66b07dbc7e576d28814466df3a494 Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Sun, 12 Apr 2020 10:13:31 +0200 Subject: [PATCH 084/331] ALSA: usb-audio: Check mapping at creating connector controls, too Add the mapping check to build_connector_control() so that the device specific quirk can provide the node to skip for the badly behaving connector controls. As an example, ALC1220-VB-based codec implements the skip entry for the broken SPDIF connector detection. 
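The skip-entry lookup described above can be sketched roughly as follows. This is a reduced model, not the driver code: name_map_sketch, find_map_sketch() and is_ignored_sketch() are hypothetical names, the struct carries only three fields, and the matching rule (a control of 0 on either side matches anything) is an assumption about how the lookup behaves:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced name-map entry; the real struct has more fields. */
struct name_map_sketch {
	int id;			/* unit/terminal id, 0 terminates the table */
	const char *name;	/* NULL means "no replacement name" */
	int control;		/* 0 means "any control of this unit" */
};

/* Return the first entry matching the id (and control, if both sides give one). */
static const struct name_map_sketch *
find_map_sketch(const struct name_map_sketch *p, int id, int control)
{
	if (!p)
		return NULL;
	for (; p->id; p++) {
		if (p->id == id &&
		    (!control || !p->control || control == p->control))
			return p;
	}
	return NULL;
}

/* In this sketch, an entry without a name means "skip this control". */
static int is_ignored_sketch(const struct name_map_sketch *p)
{
	return p && !p->name;
}

/* Modelled on the table in the patch: skip terminal 18, and control 12 of unit 19. */
static const struct name_map_sketch rog_map_sketch[] = {
	{ 18, NULL, 0 },
	{ 19, NULL, 12 },
	{ 0, NULL, 0 },
};
```

A connector control for terminal 18 would be skipped here, while a terminal with no entry, or a device with no table at all, is left alone.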
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=206873 Cc: Link: https://lore.kernel.org/r/20200412081331.4742-5-tiwai@suse.de Signed-off-by: Takashi Iwai --- sound/usb/mixer.c | 18 +++++++++++------- sound/usb/mixer_maps.c | 4 +++- 2 files changed, 14 insertions(+), 8 deletions(-) diff --git a/sound/usb/mixer.c b/sound/usb/mixer.c index ab9c908a8771..e7b9040a54e6 100644 --- a/sound/usb/mixer.c +++ b/sound/usb/mixer.c @@ -1771,11 +1771,15 @@ static void get_connector_control_name(struct usb_mixer_interface *mixer, /* Build a mixer control for a UAC connector control (jack-detect) */ static void build_connector_control(struct usb_mixer_interface *mixer, + const struct usbmix_name_map *imap, struct usb_audio_term *term, bool is_input) { struct snd_kcontrol *kctl; struct usb_mixer_elem_info *cval; + if (check_ignored_ctl(find_map(imap, term->id, 0))) + return; + cval = kzalloc(sizeof(*cval), GFP_KERNEL); if (!cval) return; @@ -2111,7 +2115,7 @@ static int parse_audio_input_terminal(struct mixer_build *state, int unitid, /* Check for jack detection. 
*/ if ((iterm.type & 0xff00) != 0x0100 && uac_v2v3_control_is_readable(bmctls, control)) - build_connector_control(state->mixer, &iterm, true); + build_connector_control(state->mixer, state->map, &iterm, true); return 0; } @@ -3072,13 +3076,13 @@ static int snd_usb_mixer_controls_badd(struct usb_mixer_interface *mixer, memset(&iterm, 0, sizeof(iterm)); iterm.id = UAC3_BADD_IT_ID4; iterm.type = UAC_BIDIR_TERMINAL_HEADSET; - build_connector_control(mixer, &iterm, true); + build_connector_control(mixer, map->map, &iterm, true); /* Output Term - Insertion control */ memset(&oterm, 0, sizeof(oterm)); oterm.id = UAC3_BADD_OT_ID3; oterm.type = UAC_BIDIR_TERMINAL_HEADSET; - build_connector_control(mixer, &oterm, false); + build_connector_control(mixer, map->map, &oterm, false); } return 0; @@ -3153,8 +3157,8 @@ static int snd_usb_mixer_controls(struct usb_mixer_interface *mixer) if ((state.oterm.type & 0xff00) != 0x0100 && uac_v2v3_control_is_readable(le16_to_cpu(desc->bmControls), UAC2_TE_CONNECTOR)) { - build_connector_control(state.mixer, &state.oterm, - false); + build_connector_control(state.mixer, state.map, + &state.oterm, false); } } else { /* UAC_VERSION_3 */ struct uac3_output_terminal_descriptor *desc = p; @@ -3179,8 +3183,8 @@ static int snd_usb_mixer_controls(struct usb_mixer_interface *mixer) if ((state.oterm.type & 0xff00) != 0x0100 && uac_v2v3_control_is_readable(le32_to_cpu(desc->bmControls), UAC3_TE_INSERTION)) { - build_connector_control(state.mixer, &state.oterm, - false); + build_connector_control(state.mixer, state.map, + &state.oterm, false); } } } diff --git a/sound/usb/mixer_maps.c b/sound/usb/mixer_maps.c index 72b575c34860..b4e77000f441 100644 --- a/sound/usb/mixer_maps.c +++ b/sound/usb/mixer_maps.c @@ -360,9 +360,11 @@ static const struct usbmix_name_map corsair_virtuoso_map[] = { }; /* Some mobos shipped with a dummy HD-audio show the invalid GET_MIN/GET_MAX - * response for Input Gain Pad (id=19, control=12). Skip it. 
+ * response for Input Gain Pad (id=19, control=12) and the connector status + * for SPDIF terminal (id=18). Skip them. */ static const struct usbmix_name_map asus_rog_map[] = { + { 18, NULL }, /* OT, connector control */ { 19, NULL, 12 }, /* FU, Input Gain Pad */ {} }; From dccc587f6c07ccc734588226fdf62f685558e89f Mon Sep 17 00:00:00 2001 From: Pavel Begunkov Date: Sun, 12 Apr 2020 02:05:01 +0300 Subject: [PATCH 085/331] io_uring: remove obsolete @mm_fault If io_submit_sqes() can't grab an mm, it fails and exits right away. There is no need to remember that the failure happened. Remove @mm_fault. Signed-off-by: Pavel Begunkov Signed-off-by: Jens Axboe --- fs/io_uring.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/fs/io_uring.c b/fs/io_uring.c index 5190bfb6a665..81532479c857 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -5818,7 +5818,6 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr, struct io_submit_state state, *statep = NULL; struct io_kiocb *link = NULL; int i, submitted = 0; - bool mm_fault = false; /* if we have a backlog and couldn't flush it all, return BUSY */ if (test_bit(0, &ctx->sq_check_overflow)) { @@ -5872,8 +5871,7 @@ fail_req: } if (io_op_defs[req->opcode].needs_mm && !*mm) { - mm_fault = mm_fault || !mmget_not_zero(ctx->sqo_mm); - if (unlikely(mm_fault)) { + if (unlikely(!mmget_not_zero(ctx->sqo_mm))) { err = -EFAULT; goto fail_req; } From bf9c2f1cdcc718b6d2d41172f6ca005fe22cc7ff Mon Sep 17 00:00:00 2001 From: Pavel Begunkov Date: Sun, 12 Apr 2020 02:05:02 +0300 Subject: [PATCH 086/331] io_uring: track mm through current->mm As a preparation for extracting request init bits, remove the open-coded mm tracking from io_submit_sqes() and rely on current->mm instead. This is more convenient than passing that piece of state to other functions.
Signed-off-by: Pavel Begunkov Signed-off-by: Jens Axboe --- fs/io_uring.c | 37 ++++++++++++++++--------------------- 1 file changed, 16 insertions(+), 21 deletions(-) diff --git a/fs/io_uring.c b/fs/io_uring.c index 81532479c857..f7825d3de400 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -5812,8 +5812,7 @@ static void io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req, } static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr, - struct file *ring_file, int ring_fd, - struct mm_struct **mm, bool async) + struct file *ring_file, int ring_fd, bool async) { struct io_submit_state state, *statep = NULL; struct io_kiocb *link = NULL; @@ -5870,13 +5869,12 @@ fail_req: break; } - if (io_op_defs[req->opcode].needs_mm && !*mm) { + if (io_op_defs[req->opcode].needs_mm && !current->mm) { if (unlikely(!mmget_not_zero(ctx->sqo_mm))) { err = -EFAULT; goto fail_req; } use_mm(ctx->sqo_mm); - *mm = ctx->sqo_mm; } req->needs_fixed_file = async; @@ -5902,10 +5900,19 @@ fail_req: return submitted; } +static inline void io_sq_thread_drop_mm(struct io_ring_ctx *ctx) +{ + struct mm_struct *mm = current->mm; + + if (mm) { + unuse_mm(mm); + mmput(mm); + } +} + static int io_sq_thread(void *data) { struct io_ring_ctx *ctx = data; - struct mm_struct *cur_mm = NULL; const struct cred *old_cred; mm_segment_t old_fs; DEFINE_WAIT(wait); @@ -5946,11 +5953,7 @@ static int io_sq_thread(void *data) * adding ourselves to the waitqueue, as the unuse/drop * may sleep. */ - if (cur_mm) { - unuse_mm(cur_mm); - mmput(cur_mm); - cur_mm = NULL; - } + io_sq_thread_drop_mm(ctx); /* * We're polling. 
If we're within the defined idle @@ -6014,7 +6017,7 @@ static int io_sq_thread(void *data) } mutex_lock(&ctx->uring_lock); - ret = io_submit_sqes(ctx, to_submit, NULL, -1, &cur_mm, true); + ret = io_submit_sqes(ctx, to_submit, NULL, -1, true); mutex_unlock(&ctx->uring_lock); timeout = jiffies + ctx->sq_thread_idle; } @@ -6023,10 +6026,7 @@ static int io_sq_thread(void *data) task_work_run(); set_fs(old_fs); - if (cur_mm) { - unuse_mm(cur_mm); - mmput(cur_mm); - } + io_sq_thread_drop_mm(ctx); revert_creds(old_cred); kthread_parkme(); @@ -7507,13 +7507,8 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit, wake_up(&ctx->sqo_wait); submitted = to_submit; } else if (to_submit) { - struct mm_struct *cur_mm; - mutex_lock(&ctx->uring_lock); - /* already have mm, so io_submit_sqes() won't try to grab it */ - cur_mm = ctx->sqo_mm; - submitted = io_submit_sqes(ctx, to_submit, f.file, fd, - &cur_mm, false); + submitted = io_submit_sqes(ctx, to_submit, f.file, fd, false); mutex_unlock(&ctx->uring_lock); if (submitted != to_submit) From 1d4240cc9e7bb101dac58f30283fa24a809f5606 Mon Sep 17 00:00:00 2001 From: Pavel Begunkov Date: Sun, 12 Apr 2020 02:05:03 +0300 Subject: [PATCH 087/331] io_uring: early submission req fail code Having only one place for cleaning up a request after a link assembly/submission failure will come in handy in the future. At the least, it allows removing the duplicated cleanup sequence.
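The shape of this change can be sketched in plain C as below. Everything here is hypothetical (req_sketch, submit_one_sketch(), submit_all_sketch(), the flag mask); it is not the io_uring code, only an illustration of a helper that reports an error code and a caller that owns the single cleanup path:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical request object; only what the sketch needs. */
struct req_sketch {
	unsigned int flags;
	int completed_with;	/* error posted on failure, 0 if none */
	int cleanups;		/* how many times cleanup ran for this request */
};

#define SKETCH_VALID_FLAGS 0x3fu	/* hypothetical mask of accepted bits */

/* The helper only reports the error; it no longer cleans up by itself. */
static int submit_one_sketch(struct req_sketch *req)
{
	if (req->flags & ~SKETCH_VALID_FLAGS)
		return -EINVAL;
	return 0;
}

/* The caller owns the one failure path. Returns how many requests were consumed. */
static int submit_all_sketch(struct req_sketch *reqs, int nr)
{
	int i, err, submitted = 0;

	for (i = 0; i < nr; i++) {
		struct req_sketch *req = &reqs[i];

		/* counted even on failure: it still completes, with an error */
		submitted++;

		err = submit_one_sketch(req);
		if (err) {
			/* single place for failure handling */
			req->completed_with = err;
			req->cleanups++;
			break;
		}
	}
	return submitted;
}
```

Adding another early check to submit_one_sketch() then only needs a return statement, with no extra cleanup code next to it.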
Signed-off-by: Pavel Begunkov Signed-off-by: Jens Axboe --- fs/io_uring.c | 50 +++++++++++++++++++------------------------------- 1 file changed, 19 insertions(+), 31 deletions(-) diff --git a/fs/io_uring.c b/fs/io_uring.c index f7825d3de400..ff10bd49a619 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -5608,7 +5608,7 @@ static inline void io_queue_link_head(struct io_kiocb *req) IOSQE_IO_HARDLINK | IOSQE_ASYNC | \ IOSQE_BUFFER_SELECT) -static bool io_submit_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe, +static int io_submit_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe, struct io_submit_state *state, struct io_kiocb **link) { struct io_ring_ctx *ctx = req->ctx; @@ -5618,24 +5618,18 @@ static bool io_submit_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe, sqe_flags = READ_ONCE(sqe->flags); /* enforce forwards compatibility on users */ - if (unlikely(sqe_flags & ~SQE_VALID_FLAGS)) { - ret = -EINVAL; - goto err_req; - } + if (unlikely(sqe_flags & ~SQE_VALID_FLAGS)) + return -EINVAL; if ((sqe_flags & IOSQE_BUFFER_SELECT) && - !io_op_defs[req->opcode].buffer_select) { - ret = -EOPNOTSUPP; - goto err_req; - } + !io_op_defs[req->opcode].buffer_select) + return -EOPNOTSUPP; id = READ_ONCE(sqe->personality); if (id) { req->work.creds = idr_find(&ctx->personality_idr, id); - if (unlikely(!req->work.creds)) { - ret = -EINVAL; - goto err_req; - } + if (unlikely(!req->work.creds)) + return -EINVAL; get_cred(req->work.creds); } @@ -5646,12 +5640,8 @@ static bool io_submit_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe, fd = READ_ONCE(sqe->fd); ret = io_req_set_file(state, req, fd, sqe_flags); - if (unlikely(ret)) { -err_req: - io_cqring_add_event(req, ret); - io_double_put_req(req); - return false; - } + if (unlikely(ret)) + return ret; /* * If we already have a head request, queue this one for async @@ -5674,16 +5664,14 @@ err_req: head->flags |= REQ_F_IO_DRAIN; ctx->drain_next = 1; } - if (io_alloc_async_ctx(req)) { - ret = -EAGAIN; 
- goto err_req; - } + if (io_alloc_async_ctx(req)) + return -EAGAIN; ret = io_req_defer_prep(req, sqe); if (ret) { /* fail even hard links since we don't submit */ head->flags |= REQ_F_FAIL_LINK; - goto err_req; + return ret; } trace_io_uring_link(ctx, req, head); list_add_tail(&req->link_list, &head->link_list); @@ -5702,10 +5690,9 @@ err_req: req->flags |= REQ_F_LINK; INIT_LIST_HEAD(&req->link_list); - if (io_alloc_async_ctx(req)) { - ret = -EAGAIN; - goto err_req; - } + if (io_alloc_async_ctx(req)) + return -EAGAIN; + ret = io_req_defer_prep(req, sqe); if (ret) req->flags |= REQ_F_FAIL_LINK; @@ -5715,7 +5702,7 @@ err_req: } } - return true; + return 0; } /* @@ -5880,8 +5867,9 @@ fail_req: req->needs_fixed_file = async; trace_io_uring_submit_sqe(ctx, req->opcode, req->user_data, true, async); - if (!io_submit_sqe(req, sqe, statep, &link)) - break; + err = io_submit_sqe(req, sqe, statep, &link); + if (err) + goto fail_req; } if (unlikely(submitted != nr)) { From dea3b49c7fb09b4f6b6a574c0485ffeb9df7b69c Mon Sep 17 00:00:00 2001 From: Pavel Begunkov Date: Sun, 12 Apr 2020 02:05:04 +0300 Subject: [PATCH 088/331] io_uring: keep all sqe->flags in req->flags It's a good idea to not read sqe->flags twice, as it's prone to security bugs. Instead of passing the flags around, embed them in req->flags. This is already the case for everything except IOSQE_IO_LINK. 1. rename former REQ_F_LINK -> REQ_F_LINK_HEAD 2. introduce and copy REQ_F_LINK, which mimics IOSQE_IO_LINK And leave req_set_fail_links() using the new REQ_F_LINK, because it's more sensible.
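Why reading the flags twice is risky can be shown with a small userspace sketch. The types and names (shared_sqe_sketch, req_copy_sketch, the two SKETCH_* constants) are hypothetical, and a volatile field stands in for memory that another party may rewrite between reads; the kernel code uses READ_ONCE() on the real SQE instead:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SKETCH_SQE_VALID_FLAGS	0x3fu	/* hypothetical mask of accepted bits */
#define SKETCH_SQE_LINK		0x04u	/* hypothetical "link" bit */

/* Hypothetical entry in memory that another party can rewrite at any time. */
struct shared_sqe_sketch {
	volatile uint8_t flags;
};

struct req_copy_sketch {
	unsigned int flags;	/* private copy, trusted after validation */
};

/* Fetch once, validate the local copy, and keep only that copy. */
static int init_req_sketch(struct req_copy_sketch *req,
			   const struct shared_sqe_sketch *sqe)
{
	unsigned int sqe_flags = sqe->flags;	/* the only read of shared memory */

	if (sqe_flags & ~SKETCH_SQE_VALID_FLAGS)
		return -EINVAL;

	req->flags = sqe_flags;
	return 0;
}

/* Later decisions look at the private copy, never at the shared entry again. */
static int is_link_sketch(const struct req_copy_sketch *req)
{
	return (req->flags & SKETCH_SQE_LINK) != 0;
}
```

Once init_req_sketch() has returned, changing the shared entry has no effect on what is_link_sketch() reports, so the value that was validated is the value that is acted on.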
Signed-off-by: Pavel Begunkov Signed-off-by: Jens Axboe --- fs/io_uring.c | 23 +++++++++++++---------- 1 file changed, 13 insertions(+), 10 deletions(-) diff --git a/fs/io_uring.c b/fs/io_uring.c index ff10bd49a619..b0e1bdfe0a43 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -508,6 +508,7 @@ enum { REQ_F_FORCE_ASYNC_BIT = IOSQE_ASYNC_BIT, REQ_F_BUFFER_SELECT_BIT = IOSQE_BUFFER_SELECT_BIT, + REQ_F_LINK_HEAD_BIT, REQ_F_LINK_NEXT_BIT, REQ_F_FAIL_LINK_BIT, REQ_F_INFLIGHT_BIT, @@ -543,6 +544,8 @@ enum { /* IOSQE_BUFFER_SELECT */ REQ_F_BUFFER_SELECT = BIT(REQ_F_BUFFER_SELECT_BIT), + /* head of a link */ + REQ_F_LINK_HEAD = BIT(REQ_F_LINK_HEAD_BIT), /* already grabbed next link */ REQ_F_LINK_NEXT = BIT(REQ_F_LINK_NEXT_BIT), /* fail rest of links */ @@ -1437,7 +1440,7 @@ static bool io_link_cancel_timeout(struct io_kiocb *req) if (ret != -1) { io_cqring_fill_event(req, -ECANCELED); io_commit_cqring(ctx); - req->flags &= ~REQ_F_LINK; + req->flags &= ~REQ_F_LINK_HEAD; io_put_req(req); return true; } @@ -1473,7 +1476,7 @@ static void io_req_link_next(struct io_kiocb *req, struct io_kiocb **nxtptr) list_del_init(&req->link_list); if (!list_empty(&nxt->link_list)) - nxt->flags |= REQ_F_LINK; + nxt->flags |= REQ_F_LINK_HEAD; *nxtptr = nxt; break; } @@ -1484,7 +1487,7 @@ static void io_req_link_next(struct io_kiocb *req, struct io_kiocb **nxtptr) } /* - * Called if REQ_F_LINK is set, and we fail the head request + * Called if REQ_F_LINK_HEAD is set, and we fail the head request */ static void io_fail_links(struct io_kiocb *req) { @@ -1517,7 +1520,7 @@ static void io_fail_links(struct io_kiocb *req) static void io_req_find_next(struct io_kiocb *req, struct io_kiocb **nxt) { - if (likely(!(req->flags & REQ_F_LINK))) + if (likely(!(req->flags & REQ_F_LINK_HEAD))) return; /* @@ -1669,7 +1672,7 @@ static inline unsigned int io_sqring_entries(struct io_ring_ctx *ctx) static inline bool io_req_multi_free(struct req_batch *rb, struct io_kiocb *req) { - if ((req->flags & REQ_F_LINK) || 
io_is_fallback_req(req)) + if ((req->flags & REQ_F_LINK_HEAD) || io_is_fallback_req(req)) return false; if (!(req->flags & REQ_F_FIXED_FILE) || req->io) @@ -2562,7 +2565,7 @@ static int io_read(struct io_kiocb *req, bool force_nonblock) req->result = 0; io_size = ret; - if (req->flags & REQ_F_LINK) + if (req->flags & REQ_F_LINK_HEAD) req->result = io_size; /* @@ -2653,7 +2656,7 @@ static int io_write(struct io_kiocb *req, bool force_nonblock) req->result = 0; io_size = ret; - if (req->flags & REQ_F_LINK) + if (req->flags & REQ_F_LINK_HEAD) req->result = io_size; /* @@ -5476,7 +5479,7 @@ static struct io_kiocb *io_prep_linked_timeout(struct io_kiocb *req) { struct io_kiocb *nxt; - if (!(req->flags & REQ_F_LINK)) + if (!(req->flags & REQ_F_LINK_HEAD)) return NULL; /* for polled retry, if flag is set, we already went through here */ if (req->flags & REQ_F_POLLED) @@ -5636,7 +5639,7 @@ static int io_submit_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe, /* same numerical values with corresponding REQ_F_*, safe to copy */ req->flags |= sqe_flags & (IOSQE_IO_DRAIN | IOSQE_IO_HARDLINK | IOSQE_ASYNC | IOSQE_FIXED_FILE | - IOSQE_BUFFER_SELECT); + IOSQE_BUFFER_SELECT | IOSQE_IO_LINK); fd = READ_ONCE(sqe->fd); ret = io_req_set_file(state, req, fd, sqe_flags); @@ -5687,7 +5690,7 @@ static int io_submit_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe, req->ctx->drain_next = 0; } if (sqe_flags & (IOSQE_IO_LINK|IOSQE_IO_HARDLINK)) { - req->flags |= REQ_F_LINK; + req->flags |= REQ_F_LINK_HEAD; INIT_LIST_HEAD(&req->link_list); if (io_alloc_async_ctx(req)) From ef4ff581102a917a69877feca2e5347e2f3e458c Mon Sep 17 00:00:00 2001 From: Pavel Begunkov Date: Sun, 12 Apr 2020 02:05:05 +0300 Subject: [PATCH 089/331] io_uring: move all request init code in one place Requests initialisation is scattered across several functions, namely io_init_req(), io_submit_sqes(), io_submit_sqe(). Put it in io_init_req() for better data locality and code clarity. 
Signed-off-by: Pavel Begunkov Signed-off-by: Jens Axboe --- fs/io_uring.c | 104 +++++++++++++++++++++++++------------------------- 1 file changed, 52 insertions(+), 52 deletions(-) diff --git a/fs/io_uring.c b/fs/io_uring.c index b0e1bdfe0a43..c0cf57764329 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -5607,44 +5607,11 @@ static inline void io_queue_link_head(struct io_kiocb *req) io_queue_sqe(req, NULL); } -#define SQE_VALID_FLAGS (IOSQE_FIXED_FILE|IOSQE_IO_DRAIN|IOSQE_IO_LINK| \ - IOSQE_IO_HARDLINK | IOSQE_ASYNC | \ - IOSQE_BUFFER_SELECT) - static int io_submit_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe, struct io_submit_state *state, struct io_kiocb **link) { struct io_ring_ctx *ctx = req->ctx; - unsigned int sqe_flags; - int ret, id, fd; - - sqe_flags = READ_ONCE(sqe->flags); - - /* enforce forwards compatibility on users */ - if (unlikely(sqe_flags & ~SQE_VALID_FLAGS)) - return -EINVAL; - - if ((sqe_flags & IOSQE_BUFFER_SELECT) && - !io_op_defs[req->opcode].buffer_select) - return -EOPNOTSUPP; - - id = READ_ONCE(sqe->personality); - if (id) { - req->work.creds = idr_find(&ctx->personality_idr, id); - if (unlikely(!req->work.creds)) - return -EINVAL; - get_cred(req->work.creds); - } - - /* same numerical values with corresponding REQ_F_*, safe to copy */ - req->flags |= sqe_flags & (IOSQE_IO_DRAIN | IOSQE_IO_HARDLINK | - IOSQE_ASYNC | IOSQE_FIXED_FILE | - IOSQE_BUFFER_SELECT | IOSQE_IO_LINK); - - fd = READ_ONCE(sqe->fd); - ret = io_req_set_file(state, req, fd, sqe_flags); - if (unlikely(ret)) - return ret; + int ret; /* * If we already have a head request, queue this one for async @@ -5663,7 +5630,7 @@ static int io_submit_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe, * next after the link request. The last one is done via * drain_next flag to persist the effect across calls. 
*/ - if (sqe_flags & IOSQE_IO_DRAIN) { + if (req->flags & REQ_F_IO_DRAIN) { head->flags |= REQ_F_IO_DRAIN; ctx->drain_next = 1; } @@ -5680,16 +5647,16 @@ static int io_submit_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe, list_add_tail(&req->link_list, &head->link_list); /* last request of a link, enqueue the link */ - if (!(sqe_flags & (IOSQE_IO_LINK|IOSQE_IO_HARDLINK))) { + if (!(req->flags & (REQ_F_LINK | REQ_F_HARDLINK))) { io_queue_link_head(head); *link = NULL; } } else { if (unlikely(ctx->drain_next)) { req->flags |= REQ_F_IO_DRAIN; - req->ctx->drain_next = 0; + ctx->drain_next = 0; } - if (sqe_flags & (IOSQE_IO_LINK|IOSQE_IO_HARDLINK)) { + if (req->flags & (REQ_F_LINK | REQ_F_HARDLINK)) { req->flags |= REQ_F_LINK_HEAD; INIT_LIST_HEAD(&req->link_list); @@ -5779,9 +5746,17 @@ static inline void io_consume_sqe(struct io_ring_ctx *ctx) ctx->cached_sq_head++; } -static void io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req, - const struct io_uring_sqe *sqe) +#define SQE_VALID_FLAGS (IOSQE_FIXED_FILE|IOSQE_IO_DRAIN|IOSQE_IO_LINK| \ + IOSQE_IO_HARDLINK | IOSQE_ASYNC | \ + IOSQE_BUFFER_SELECT) + +static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req, + const struct io_uring_sqe *sqe, + struct io_submit_state *state, bool async) { + unsigned int sqe_flags; + int id, fd; + /* * All io need record the previous position, if LINK vs DARIN, * it can be used to mark the position of the first IO in the @@ -5798,7 +5773,42 @@ static void io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req, refcount_set(&req->refs, 2); req->task = NULL; req->result = 0; + req->needs_fixed_file = async; INIT_IO_WORK(&req->work, io_wq_submit_work); + + if (unlikely(req->opcode >= IORING_OP_LAST)) + return -EINVAL; + + if (io_op_defs[req->opcode].needs_mm && !current->mm) { + if (unlikely(!mmget_not_zero(ctx->sqo_mm))) + return -EFAULT; + use_mm(ctx->sqo_mm); + } + + sqe_flags = READ_ONCE(sqe->flags); + /* enforce forwards compatibility on users */ + if 
(unlikely(sqe_flags & ~SQE_VALID_FLAGS)) + return -EINVAL; + + if ((sqe_flags & IOSQE_BUFFER_SELECT) && + !io_op_defs[req->opcode].buffer_select) + return -EOPNOTSUPP; + + id = READ_ONCE(sqe->personality); + if (id) { + req->work.creds = idr_find(&ctx->personality_idr, id); + if (unlikely(!req->work.creds)) + return -EINVAL; + get_cred(req->work.creds); + } + + /* same numerical values with corresponding REQ_F_*, safe to copy */ + req->flags |= sqe_flags & (IOSQE_IO_DRAIN | IOSQE_IO_HARDLINK | + IOSQE_ASYNC | IOSQE_FIXED_FILE | + IOSQE_BUFFER_SELECT | IOSQE_IO_LINK); + + fd = READ_ONCE(sqe->fd); + return io_req_set_file(state, req, fd, sqe_flags); } static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr, @@ -5846,28 +5856,18 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr, break; } - io_init_req(ctx, req, sqe); + err = io_init_req(ctx, req, sqe, statep, async); io_consume_sqe(ctx); /* will complete beyond this point, count as submitted */ submitted++; - if (unlikely(req->opcode >= IORING_OP_LAST)) { - err = -EINVAL; + if (unlikely(err)) { fail_req: io_cqring_add_event(req, err); io_double_put_req(req); break; } - if (io_op_defs[req->opcode].needs_mm && !current->mm) { - if (unlikely(!mmget_not_zero(ctx->sqo_mm))) { - err = -EFAULT; - goto fail_req; - } - use_mm(ctx->sqo_mm); - } - - req->needs_fixed_file = async; trace_io_uring_submit_sqe(ctx, req->opcode, req->user_data, true, async); err = io_submit_sqe(req, sqe, statep, &link); From b1f573bd15fda2e19ea66a4d26fae8be1b12791d Mon Sep 17 00:00:00 2001 From: Xiaoguang Wang Date: Sun, 12 Apr 2020 14:50:54 +0800 Subject: [PATCH 090/331] io_uring: restore req->work when canceling poll request When running liburing test case 'accept', I got the warning below: CRED: Invalid credentials CRED: At include/linux/cred.h:285 CRED: Specified credentials: 00000000d02474a0 CRED: ->magic=4b, put_addr=000000005b4f46e9 CRED: ->usage=-1699227648, subscr=-25693 CRED: ->*uid = { 256,-25693,-25693,65534 } CRED: 
->*gid = { 0,-1925859360,-1789740800,-1827028688 } CRED: ->security is 00000000258c136e general protection fault, probably for non-canonical address 0xdead4ead00000000: 0000 [#1] SMP PTI CPU: 21 PID: 2037 Comm: accept Not tainted 5.6.0+ #318 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.1-0-g0551a4be2c-prebuilt.qemu-project.org 04/01/2014 RIP: 0010:dump_invalid_creds+0x16f/0x184 Code: 48 8b 83 88 00 00 00 48 3d ff 0f 00 00 76 29 48 89 c2 81 e2 00 ff ff ff 48 81 fa 00 6b 6b 6b 74 17 5b 48 c7 c7 4b b1 10 8e 5d <8b> 50 04 41 5c 8b 30 41 5d e9 67 e3 04 00 5b 5d 41 5c 41 5d c3 0f RSP: 0018:ffffacc1039dfb38 EFLAGS: 00010087 RAX: dead4ead00000000 RBX: ffff9ba39319c100 RCX: 0000000000000007 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff8e10b14b RBP: ffffffff8e108476 R08: 0000000000000000 R09: 0000000000000001 R10: 0000000000000000 R11: ffffacc1039df9e5 R12: 000000009552b900 R13: 000000009319c130 R14: ffff9ba39319c100 R15: 0000000000000246 FS: 00007f96b2bfc4c0(0000) GS:ffff9ba39f340000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000401870 CR3: 00000007db7a4000 CR4: 00000000000006e0 Call Trace: __invalid_creds+0x48/0x4a __io_req_aux_free+0x2e8/0x3b0 ? io_poll_remove_one+0x2a/0x1d0 __io_free_req+0x18/0x200 io_free_req+0x31/0x350 io_poll_remove_one+0x17f/0x1d0 io_poll_cancel.isra.80+0x6c/0x80 io_async_find_and_cancel+0x111/0x120 io_issue_sqe+0x181/0x10e0 ? __lock_acquire+0x552/0xae0 ? lock_acquire+0x8e/0x310 ? fs_reclaim_acquire.part.97+0x5/0x30 __io_queue_sqe.part.100+0xc4/0x580 ? io_submit_sqes+0x751/0xbd0 ? rcu_read_lock_sched_held+0x32/0x40 io_submit_sqes+0x9ba/0xbd0 ? __x64_sys_io_uring_enter+0x2b2/0x460 ? __x64_sys_io_uring_enter+0xaf/0x460 ? find_held_lock+0x2d/0x90 ? 
__x64_sys_io_uring_enter+0x111/0x460 __x64_sys_io_uring_enter+0x2d7/0x460 do_syscall_64+0x5a/0x230 entry_SYSCALL_64_after_hwframe+0x49/0xb3 After looking into the code, it turns out that this issue occurs because we didn't restore req->work, which is changed in io_arm_poll_handler(). req->work shares a union with the struct below: struct { struct callback_head task_work; struct hlist_node hash_node; struct async_poll *apoll; }; If we forget to restore it, the members of struct io_wq_work are invalid. Restore req->work to fix this issue. Signed-off-by: Xiaoguang Wang Get rid of the unneeded 'need_restore' variable. Signed-off-by: Jens Axboe --- fs/io_uring.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/fs/io_uring.c b/fs/io_uring.c index c0cf57764329..68a678a0056b 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -4318,11 +4318,13 @@ static bool __io_poll_remove_one(struct io_kiocb *req, static bool io_poll_remove_one(struct io_kiocb *req) { + struct async_poll *apoll = NULL; bool do_complete; if (req->opcode == IORING_OP_POLL_ADD) { do_complete = __io_poll_remove_one(req, &req->poll); } else { + apoll = req->apoll; /* non-poll requests have submit ref still */ do_complete = __io_poll_remove_one(req, &req->apoll->poll); if (do_complete) @@ -4331,6 +4333,14 @@ static bool io_poll_remove_one(struct io_kiocb *req) hash_del(&req->hash_node); + if (apoll) { + /* + * restore ->work because we need to call io_req_work_drop_env. + */ + memcpy(&req->work, &apoll->work, sizeof(req->work)); + kfree(apoll); + } + if (do_complete) { io_cqring_fill_event(req, -ECANCELED); io_commit_cqring(req->ctx); From 465aa30420bc730ad8f0fe235bc80d169e4b5831 Mon Sep 17 00:00:00 2001 From: Colin Ian King Date: Fri, 10 Apr 2020 20:11:50 +0100 Subject: [PATCH 091/331] net: neterion: remove redundant assignment to variable tmp64 The variable tmp64 is being initialized with a value that is never read and it is being updated later with a new value.
The initialization is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/neterion/s2io.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/neterion/s2io.c b/drivers/net/ethernet/neterion/s2io.c index 0ec6b8e8b549..67e62603fe3b 100644 --- a/drivers/net/ethernet/neterion/s2io.c +++ b/drivers/net/ethernet/neterion/s2io.c @@ -5155,7 +5155,7 @@ static int do_s2io_delete_unicast_mc(struct s2io_nic *sp, u64 addr) /* read mac entries from CAM */ static u64 do_s2io_read_unicast_mc(struct s2io_nic *sp, int offset) { - u64 tmp64 = 0xffffffffffff0000ULL, val64; + u64 tmp64, val64; struct XENA_dev_config __iomem *bar0 = sp->bar0; /* read mac addr */ From 2ba538989479b3ff34f66728ecfbaf3c5daf0797 Mon Sep 17 00:00:00 2001 From: Christophe JAILLET Date: Sat, 11 Apr 2020 09:30:04 +0200 Subject: [PATCH 092/331] soc: qcom: ipa: Add a missing '\n' in a log message Message logged by 'dev_xxx()' or 'pr_xxx()' should end with a '\n'. 
Fixes: a646d6ec9098 ("soc: qcom: ipa: modem and microcontroller") Signed-off-by: Christophe JAILLET Signed-off-by: Jakub Kicinski --- drivers/net/ipa/ipa_modem.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/net/ipa/ipa_modem.c b/drivers/net/ipa/ipa_modem.c index 55c9329a4b1d..ed10818dd99f 100644 --- a/drivers/net/ipa/ipa_modem.c +++ b/drivers/net/ipa/ipa_modem.c @@ -297,14 +297,13 @@ static void ipa_modem_crashed(struct ipa *ipa) ret = ipa_endpoint_modem_exception_reset_all(ipa); if (ret) - dev_err(dev, "error %d resetting exception endpoint", - ret); + dev_err(dev, "error %d resetting exception endpoint\n", ret); ipa_endpoint_modem_pause_all(ipa, false); ret = ipa_modem_stop(ipa); if (ret) - dev_err(dev, "error %d stopping modem", ret); + dev_err(dev, "error %d stopping modem\n", ret); /* Now prepare for the next modem boot */ ret = ipa_mem_zero_modem(ipa); From e6aaeafd56e33345f1d242cde33dd92614734be8 Mon Sep 17 00:00:00 2001 From: Christophe JAILLET Date: Sat, 11 Apr 2020 09:52:11 +0200 Subject: [PATCH 093/331] net: ethernet: ti: Add missing '\n' in log messages Message logged by 'dev_xxx()' or 'pr_xxx()' should end with a '\n'. 
Fixes: 93a76530316a ("net: ethernet: ti: introduce am65x/j721e gigabit eth subsystem driver") Signed-off-by: Christophe JAILLET Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/ti/am65-cpsw-nuss.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/ti/am65-cpsw-nuss.c b/drivers/net/ethernet/ti/am65-cpsw-nuss.c index f71c15c39492..2bf56733ba94 100644 --- a/drivers/net/ethernet/ti/am65-cpsw-nuss.c +++ b/drivers/net/ethernet/ti/am65-cpsw-nuss.c @@ -1372,7 +1372,7 @@ static int am65_cpsw_nuss_init_tx_chns(struct am65_cpsw_common *common) err: i = devm_add_action(dev, am65_cpsw_nuss_free_tx_chns, common); if (i) { - dev_err(dev, "failed to add free_tx_chns action %d", i); + dev_err(dev, "Failed to add free_tx_chns action %d\n", i); return i; } @@ -1481,7 +1481,7 @@ static int am65_cpsw_nuss_init_rx_chns(struct am65_cpsw_common *common) err: i = devm_add_action(dev, am65_cpsw_nuss_free_rx_chns, common); if (i) { - dev_err(dev, "failed to add free_rx_chns action %d", i); + dev_err(dev, "Failed to add free_rx_chns action %d\n", i); return i; } @@ -1691,7 +1691,7 @@ static int am65_cpsw_nuss_init_ndev_2g(struct am65_cpsw_common *common) ret = devm_add_action_or_reset(dev, am65_cpsw_pcpu_stats_free, ndev_priv->stats); if (ret) { - dev_err(dev, "failed to add percpu stat free action %d", ret); + dev_err(dev, "Failed to add percpu stat free action %d\n", ret); return ret; } From eaec2b0bd30690575c581eebffae64bfb7f684ac Mon Sep 17 00:00:00 2001 From: Zhiqiang Liu Date: Mon, 30 Mar 2020 10:18:33 +0800 Subject: [PATCH 094/331] signal: check sig before setting info in kill_pid_usb_asyncio In kill_pid_usb_asyncio, if signal is not valid, we do not need to set info struct. 
Signed-off-by: Zhiqiang Liu Acked-by: Christian Brauner Link: https://lore.kernel.org/r/f525fd08-1cf7-fb09-d20c-4359145eb940@huawei.com Signed-off-by: Christian Brauner --- kernel/signal.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/kernel/signal.c b/kernel/signal.c index e58a6c619824..3f94894d1253 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -1510,15 +1510,15 @@ int kill_pid_usb_asyncio(int sig, int errno, sigval_t addr, unsigned long flags; int ret = -EINVAL; + if (!valid_signal(sig)) + return ret; + clear_siginfo(&info); info.si_signo = sig; info.si_errno = errno; info.si_code = SI_ASYNCIO; *((sigval_t *)&info.si_pid) = addr; - if (!valid_signal(sig)) - return ret; - rcu_read_lock(); p = pid_task(pid, PIDTYPE_PID); if (!p) { From 3075afdf15b89a063f8d31c0db08a50472bb7faf Mon Sep 17 00:00:00 2001 From: Zhiqiang Liu Date: Mon, 30 Mar 2020 10:44:43 +0800 Subject: [PATCH 095/331] signal: use kill_proc_info instead of kill_pid_info in kill_something_info signal.c provides kill_proc_info, we can use it instead of kill_pid_info in kill_something_info func gracefully. Signed-off-by: Zhiqiang Liu Acked-by: Oleg Nesterov Acked-by: Christian Brauner Link: https://lore.kernel.org/r/80236965-f0b5-c888-95ff-855bdec75bb3@huawei.com Signed-off-by: Christian Brauner --- kernel/signal.c | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/kernel/signal.c b/kernel/signal.c index 3f94894d1253..713104884414 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -1557,12 +1557,8 @@ static int kill_something_info(int sig, struct kernel_siginfo *info, pid_t pid) { int ret; - if (pid > 0) { - rcu_read_lock(); - ret = kill_pid_info(sig, info, find_vpid(pid)); - rcu_read_unlock(); - return ret; - } + if (pid > 0) + return kill_proc_info(sig, info, pid); /* -INT_MIN is undefined. 
Exclude this case to avoid a UBSAN warning */ if (pid == INT_MIN) From 37d59d10a80132406279001b1fbe9eb88be5a30a Mon Sep 17 00:00:00 2001 From: Guenter Roeck Date: Wed, 1 Apr 2020 08:24:56 -0700 Subject: [PATCH 096/331] hwmon: (pmbus/isl68137) Fix up chip IDs I2C chip IDs need to reflect chip names, not chip functionality. Fixes: f621d61fd59f ("hwmon: (pmbus) add support for 2nd Gen Renesas digital multiphase") Cc: Grant Peltier Signed-off-by: Guenter Roeck --- Documentation/hwmon/isl68137.rst | 76 +++++++++++++------------- drivers/hwmon/pmbus/isl68137.c | 92 +++++++++++++++++++++++++++++--- 2 files changed, 123 insertions(+), 45 deletions(-) diff --git a/Documentation/hwmon/isl68137.rst b/Documentation/hwmon/isl68137.rst index cc4b61447b63..0e71b22047f8 100644 --- a/Documentation/hwmon/isl68137.rst +++ b/Documentation/hwmon/isl68137.rst @@ -16,7 +16,7 @@ Supported chips: * Renesas ISL68220 - Prefix: 'raa_dmpvr2_2rail' + Prefix: 'isl68220' Addresses scanned: - @@ -26,7 +26,7 @@ Supported chips: * Renesas ISL68221 - Prefix: 'raa_dmpvr2_3rail' + Prefix: 'isl68221' Addresses scanned: - @@ -36,7 +36,7 @@ Supported chips: * Renesas ISL68222 - Prefix: 'raa_dmpvr2_2rail' + Prefix: 'isl68222' Addresses scanned: - @@ -46,7 +46,7 @@ Supported chips: * Renesas ISL68223 - Prefix: 'raa_dmpvr2_2rail' + Prefix: 'isl68223' Addresses scanned: - @@ -56,7 +56,7 @@ Supported chips: * Renesas ISL68224 - Prefix: 'raa_dmpvr2_3rail' + Prefix: 'isl68224' Addresses scanned: - @@ -66,7 +66,7 @@ Supported chips: * Renesas ISL68225 - Prefix: 'raa_dmpvr2_2rail' + Prefix: 'isl68225' Addresses scanned: - @@ -76,7 +76,7 @@ Supported chips: * Renesas ISL68226 - Prefix: 'raa_dmpvr2_3rail' + Prefix: 'isl68226' Addresses scanned: - @@ -86,7 +86,7 @@ Supported chips: * Renesas ISL68227 - Prefix: 'raa_dmpvr2_1rail' + Prefix: 'isl68227' Addresses scanned: - @@ -96,7 +96,7 @@ Supported chips: * Renesas ISL68229 - Prefix: 'raa_dmpvr2_3rail' + Prefix: 'isl68229' Addresses scanned: - @@ -106,7 +106,7 @@ 
Supported chips: * Renesas ISL68233 - Prefix: 'raa_dmpvr2_2rail' + Prefix: 'isl68233' Addresses scanned: - @@ -116,7 +116,7 @@ Supported chips: * Renesas ISL68239 - Prefix: 'raa_dmpvr2_3rail' + Prefix: 'isl68239' Addresses scanned: - @@ -126,7 +126,7 @@ Supported chips: * Renesas ISL69222 - Prefix: 'raa_dmpvr2_2rail' + Prefix: 'isl69222' Addresses scanned: - @@ -136,7 +136,7 @@ Supported chips: * Renesas ISL69223 - Prefix: 'raa_dmpvr2_3rail' + Prefix: 'isl69223' Addresses scanned: - @@ -146,7 +146,7 @@ Supported chips: * Renesas ISL69224 - Prefix: 'raa_dmpvr2_2rail' + Prefix: 'isl69224' Addresses scanned: - @@ -156,7 +156,7 @@ Supported chips: * Renesas ISL69225 - Prefix: 'raa_dmpvr2_2rail' + Prefix: 'isl69225' Addresses scanned: - @@ -166,7 +166,7 @@ Supported chips: * Renesas ISL69227 - Prefix: 'raa_dmpvr2_3rail' + Prefix: 'isl69227' Addresses scanned: - @@ -176,7 +176,7 @@ Supported chips: * Renesas ISL69228 - Prefix: 'raa_dmpvr2_3rail' + Prefix: 'isl69228' Addresses scanned: - @@ -186,7 +186,7 @@ Supported chips: * Renesas ISL69234 - Prefix: 'raa_dmpvr2_2rail' + Prefix: 'isl69234' Addresses scanned: - @@ -196,7 +196,7 @@ Supported chips: * Renesas ISL69236 - Prefix: 'raa_dmpvr2_2rail' + Prefix: 'isl69236' Addresses scanned: - @@ -206,7 +206,7 @@ Supported chips: * Renesas ISL69239 - Prefix: 'raa_dmpvr2_3rail' + Prefix: 'isl69239' Addresses scanned: - @@ -216,7 +216,7 @@ Supported chips: * Renesas ISL69242 - Prefix: 'raa_dmpvr2_2rail' + Prefix: 'isl69242' Addresses scanned: - @@ -226,7 +226,7 @@ Supported chips: * Renesas ISL69243 - Prefix: 'raa_dmpvr2_1rail' + Prefix: 'isl69243' Addresses scanned: - @@ -236,7 +236,7 @@ Supported chips: * Renesas ISL69247 - Prefix: 'raa_dmpvr2_2rail' + Prefix: 'isl69247' Addresses scanned: - @@ -246,7 +246,7 @@ Supported chips: * Renesas ISL69248 - Prefix: 'raa_dmpvr2_2rail' + Prefix: 'isl69248' Addresses scanned: - @@ -256,7 +256,7 @@ Supported chips: * Renesas ISL69254 - Prefix: 'raa_dmpvr2_2rail' + Prefix: 'isl69254' 
Addresses scanned: - @@ -266,7 +266,7 @@ Supported chips: * Renesas ISL69255 - Prefix: 'raa_dmpvr2_2rail' + Prefix: 'isl69255' Addresses scanned: - @@ -276,7 +276,7 @@ Supported chips: * Renesas ISL69256 - Prefix: 'raa_dmpvr2_2rail' + Prefix: 'isl69256' Addresses scanned: - @@ -286,7 +286,7 @@ Supported chips: * Renesas ISL69259 - Prefix: 'raa_dmpvr2_2rail' + Prefix: 'isl69259' Addresses scanned: - @@ -296,7 +296,7 @@ Supported chips: * Renesas ISL69260 - Prefix: 'raa_dmpvr2_2rail' + Prefix: 'isl69260' Addresses scanned: - @@ -306,7 +306,7 @@ Supported chips: * Renesas ISL69268 - Prefix: 'raa_dmpvr2_2rail' + Prefix: 'isl69268' Addresses scanned: - @@ -316,7 +316,7 @@ Supported chips: * Renesas ISL69269 - Prefix: 'raa_dmpvr2_3rail' + Prefix: 'isl69269' Addresses scanned: - @@ -326,7 +326,7 @@ Supported chips: * Renesas ISL69298 - Prefix: 'raa_dmpvr2_2rail' + Prefix: 'isl69298' Addresses scanned: - @@ -336,7 +336,7 @@ Supported chips: * Renesas RAA228000 - Prefix: 'raa_dmpvr2_hv' + Prefix: 'raa228000' Addresses scanned: - @@ -346,7 +346,7 @@ Supported chips: * Renesas RAA228004 - Prefix: 'raa_dmpvr2_hv' + Prefix: 'raa228004' Addresses scanned: - @@ -356,7 +356,7 @@ Supported chips: * Renesas RAA228006 - Prefix: 'raa_dmpvr2_hv' + Prefix: 'raa228006' Addresses scanned: - @@ -366,7 +366,7 @@ Supported chips: * Renesas RAA228228 - Prefix: 'raa_dmpvr2_2rail' + Prefix: 'raa228228' Addresses scanned: - @@ -376,7 +376,7 @@ Supported chips: * Renesas RAA229001 - Prefix: 'raa_dmpvr2_2rail' + Prefix: 'raa229001' Addresses scanned: - @@ -386,7 +386,7 @@ Supported chips: * Renesas RAA229004 - Prefix: 'raa_dmpvr2_2rail' + Prefix: 'raa229004' Addresses scanned: - diff --git a/drivers/hwmon/pmbus/isl68137.c b/drivers/hwmon/pmbus/isl68137.c index 4d2315208bb5..0c622711ef7e 100644 --- a/drivers/hwmon/pmbus/isl68137.c +++ b/drivers/hwmon/pmbus/isl68137.c @@ -21,8 +21,50 @@ #define ISL68137_VOUT_AVS 0x30 #define RAA_DMPVR2_READ_VMON 0xc8 -enum versions { +enum chips { isl68137, + 
isl68220, + isl68221, + isl68222, + isl68223, + isl68224, + isl68225, + isl68226, + isl68227, + isl68229, + isl68233, + isl68239, + isl69222, + isl69223, + isl69224, + isl69225, + isl69227, + isl69228, + isl69234, + isl69236, + isl69239, + isl69242, + isl69243, + isl69247, + isl69248, + isl69254, + isl69255, + isl69256, + isl69259, + isl69260, + isl69268, + isl69269, + isl69298, + raa228000, + raa228004, + raa228006, + raa228228, + raa229001, + raa229004, +}; + +enum variants { + raa_dmpvr1_2rail, raa_dmpvr2_1rail, raa_dmpvr2_2rail, raa_dmpvr2_3rail, @@ -186,7 +228,7 @@ static int isl68137_probe(struct i2c_client *client, memcpy(info, &raa_dmpvr_info, sizeof(*info)); switch (id->driver_data) { - case isl68137: + case raa_dmpvr1_2rail: info->pages = 2; info->R[PSC_VOLTAGE_IN] = 3; info->func[0] &= ~PMBUS_HAVE_VMON; @@ -224,11 +266,47 @@ static int isl68137_probe(struct i2c_client *client, } static const struct i2c_device_id raa_dmpvr_id[] = { - {"isl68137", isl68137}, - {"raa_dmpvr2_1rail", raa_dmpvr2_1rail}, - {"raa_dmpvr2_2rail", raa_dmpvr2_2rail}, - {"raa_dmpvr2_3rail", raa_dmpvr2_3rail}, - {"raa_dmpvr2_hv", raa_dmpvr2_hv}, + {"isl68137", raa_dmpvr1_2rail}, + {"isl68220", raa_dmpvr2_2rail}, + {"isl68221", raa_dmpvr2_3rail}, + {"isl68222", raa_dmpvr2_2rail}, + {"isl68223", raa_dmpvr2_2rail}, + {"isl68224", raa_dmpvr2_3rail}, + {"isl68225", raa_dmpvr2_2rail}, + {"isl68226", raa_dmpvr2_3rail}, + {"isl68227", raa_dmpvr2_1rail}, + {"isl68229", raa_dmpvr2_3rail}, + {"isl68233", raa_dmpvr2_2rail}, + {"isl68239", raa_dmpvr2_3rail}, + + {"isl69222", raa_dmpvr2_2rail}, + {"isl69223", raa_dmpvr2_3rail}, + {"isl69224", raa_dmpvr2_2rail}, + {"isl69225", raa_dmpvr2_2rail}, + {"isl69227", raa_dmpvr2_3rail}, + {"isl69228", raa_dmpvr2_3rail}, + {"isl69234", raa_dmpvr2_2rail}, + {"isl69236", raa_dmpvr2_2rail}, + {"isl69239", raa_dmpvr2_3rail}, + {"isl69242", raa_dmpvr2_2rail}, + {"isl69243", raa_dmpvr2_1rail}, + {"isl69247", raa_dmpvr2_2rail}, + {"isl69248", raa_dmpvr2_2rail}, + 
{"isl69254", raa_dmpvr2_2rail}, + {"isl69255", raa_dmpvr2_2rail}, + {"isl69256", raa_dmpvr2_2rail}, + {"isl69259", raa_dmpvr2_2rail}, + {"isl69260", raa_dmpvr2_2rail}, + {"isl69268", raa_dmpvr2_2rail}, + {"isl69269", raa_dmpvr2_3rail}, + {"isl69298", raa_dmpvr2_2rail}, + + {"raa228000", raa_dmpvr2_hv}, + {"raa228004", raa_dmpvr2_hv}, + {"raa228006", raa_dmpvr2_hv}, + {"raa228228", raa_dmpvr2_2rail}, + {"raa229001", raa_dmpvr2_2rail}, + {"raa229004", raa_dmpvr2_2rail}, {} }; From 6bdf8f3efe867c5893e27431a555e41f54ed7f9a Mon Sep 17 00:00:00 2001 From: Ann T Ropea Date: Tue, 7 Apr 2020 01:55:21 +0200 Subject: [PATCH 097/331] hwmon: (drivetemp) Use drivetemp's true module name in Kconfig section The addition of the support for reading the temperature of ATA drives as per commit 5b46903d8bf3 ("hwmon: Driver for disk and solid state drives with temperature sensors") lists in the respective Kconfig section the name of the module to be optionally built as "satatemp". However, building the kernel modules with "CONFIG_SENSORS_DRIVETEMP=m", does not generate a file named "satatemp.ko". Instead, the rest of the original commit uses the term "drivetemp" and a file named "drivetemp.ko" ends up in the kernel's modules directory. This file has the right ingredients: $ strings /path/to/drivetemp.ko | grep ^description description=Hard drive temperature monitor and modprobing it produces the expected result: # drivetemp is not loaded $ sensors -u drivetemp-scsi-4-0 Specified sensor(s) not found! $ sudo modprobe drivetemp $ sensors -u drivetemp-scsi-4-0 drivetemp-scsi-4-0 Adapter: SCSI adapter temp1: temp1_input: 35.000 temp1_max: 60.000 temp1_min: 0.000 temp1_crit: 70.000 temp1_lcrit: -40.000 temp1_lowest: 20.000 temp1_highest: 36.000 Fix Kconfig by referring to the true name of the module. 
Fixes: 5b46903d8bf3 ("hwmon: Driver for disk and solid state drives with temperature sensors") Signed-off-by: Ann T Ropea Link: https://lore.kernel.org/r/20200406235521.185309-1-bedhanger@gmx.de Signed-off-by: Guenter Roeck --- drivers/hwmon/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig index 05a30832c6ba..4c62f900bf7e 100644 --- a/drivers/hwmon/Kconfig +++ b/drivers/hwmon/Kconfig @@ -412,7 +412,7 @@ config SENSORS_DRIVETEMP hard disk drives. This driver can also be built as a module. If so, the module - will be called satatemp. + will be called drivetemp. config SENSORS_DS620 tristate "Dallas Semiconductor DS620" From ed08ebb7124e90a99420bb913d602907d377d03d Mon Sep 17 00:00:00 2001 From: Guenter Roeck Date: Wed, 8 Apr 2020 20:37:30 -0700 Subject: [PATCH 098/331] hwmon: (drivetemp) Return -ENODATA for invalid temperatures MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Holger Hoffstätte observed that Samsung 850 Pro may return invalid temperatures for a short period of time after resume. Return -ENODATA to userspace if this is observed. 
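For illustration only, the validity check described above can be sketched in plain userspace C. The names below (sketch_temp_is_valid, sketch_read_temp, SKETCH_INVALID_TEMP) are hypothetical and are not the driver's own helpers; the sketch assumes that the raw status byte 0x80 marks "no valid reading" and that a valid byte is a signed temperature in degrees Celsius, reported in millidegrees.

```c
#include <errno.h>
#include <stdint.h>

/* Assumption: 0x80 in the raw status byte means "no valid temperature". */
#define SKETCH_INVALID_TEMP 0x80

/* Hypothetical stand-in for the driver's validity helper. */
static int sketch_temp_is_valid(uint8_t raw)
{
	return raw != SKETCH_INVALID_TEMP;
}

/*
 * Store the temperature in millidegrees Celsius in *val and return 0,
 * or return -ENODATA and leave *val untouched when the raw byte is
 * the invalid marker.
 */
static int sketch_read_temp(uint8_t raw, long *val)
{
	if (!sketch_temp_is_valid(raw))
		return -ENODATA;

	/* The raw byte is interpreted as a signed 8-bit value. */
	*val = (long)(int8_t)raw * 1000;
	return 0;
}
```

Returning an error instead of a bogus number lets a monitoring tool skip the sample rather than act on a temperature that was never measured.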
Fixes: 5b46903d8bf3 ("hwmon: Driver for disk and solid state drives with temperature sensors") Reported-by: Holger Hoffstätte Cc: Holger Hoffstätte Signed-off-by: Guenter Roeck --- drivers/hwmon/drivetemp.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/drivers/hwmon/drivetemp.c b/drivers/hwmon/drivetemp.c index 370d0c74eb01..9179460c2d9d 100644 --- a/drivers/hwmon/drivetemp.c +++ b/drivers/hwmon/drivetemp.c @@ -264,12 +264,18 @@ static int drivetemp_get_scttemp(struct drivetemp_data *st, u32 attr, long *val) return err; switch (attr) { case hwmon_temp_input: + if (!temp_is_valid(buf[SCT_STATUS_TEMP])) + return -ENODATA; *val = temp_from_sct(buf[SCT_STATUS_TEMP]); break; case hwmon_temp_lowest: + if (!temp_is_valid(buf[SCT_STATUS_TEMP_LOWEST])) + return -ENODATA; *val = temp_from_sct(buf[SCT_STATUS_TEMP_LOWEST]); break; case hwmon_temp_highest: + if (!temp_is_valid(buf[SCT_STATUS_TEMP_HIGHEST])) + return -ENODATA; *val = temp_from_sct(buf[SCT_STATUS_TEMP_HIGHEST]); break; default: From 0e786f328b382e6df64f31390973b81f8fb9a044 Mon Sep 17 00:00:00 2001 From: Jason Yan Date: Thu, 9 Apr 2020 16:45:02 +0800 Subject: [PATCH 099/331] hwmon: (k10temp) make some symbols static Fix the following sparse warning: drivers/hwmon/k10temp.c:189:12: warning: symbol 'k10temp_temp_label' was not declared. Should it be static? drivers/hwmon/k10temp.c:202:12: warning: symbol 'k10temp_in_label' was not declared. Should it be static? drivers/hwmon/k10temp.c:207:12: warning: symbol 'k10temp_curr_label' was not declared. Should it be static? 
Reported-by: Hulk Robot Signed-off-by: Jason Yan Link: https://lore.kernel.org/r/20200409084502.42126-1-yanaijie@huawei.com Signed-off-by: Guenter Roeck --- drivers/hwmon/k10temp.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/hwmon/k10temp.c b/drivers/hwmon/k10temp.c index 3f37d5d81fe4..9915578533bb 100644 --- a/drivers/hwmon/k10temp.c +++ b/drivers/hwmon/k10temp.c @@ -186,7 +186,7 @@ static long get_raw_temp(struct k10temp_data *data) return temp; } -const char *k10temp_temp_label[] = { +static const char *k10temp_temp_label[] = { "Tctl", "Tdie", "Tccd1", @@ -199,12 +199,12 @@ const char *k10temp_temp_label[] = { "Tccd8", }; -const char *k10temp_in_label[] = { +static const char *k10temp_in_label[] = { "Vcore", "Vsoc", }; -const char *k10temp_curr_label[] = { +static const char *k10temp_curr_label[] = { "Icore", "Isoc", }; From 88357580854aab29d27e1a443575caaedd081612 Mon Sep 17 00:00:00 2001 From: Jens Axboe Date: Sun, 12 Apr 2020 21:12:49 -0600 Subject: [PATCH 100/331] io_uring: correct O_NONBLOCK check for splice punt The splice file punt check uses file->f_mode to check for O_NONBLOCK, but it should be checking file->f_flags. This leads to punting even for files that have O_NONBLOCK set, which isn't necessary. This equates to checking for FMODE_PATH, which will never be set on the fd in question. 
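As a rough userspace analogue (not kernel code), the distinction can be seen with fcntl(): F_GETFL reports the open-file status flags, which is where O_NONBLOCK lives, and the access mode is carried separately. The helper names below are hypothetical and exist only for this sketch.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Return 1 if O_NONBLOCK is set in the descriptor's status flags,
 * 0 if it is clear, and -1 if the flags cannot be read.
 */
static int sketch_is_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL);

	if (flags < 0)
		return -1;
	return (flags & O_NONBLOCK) != 0;
}

/* Set O_NONBLOCK while preserving the other status flags. */
static int sketch_set_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL);

	if (flags < 0)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}
```

Testing O_NONBLOCK against any field other than the status flags compares bits from two unrelated namespaces, which is the kind of mistake this patch corrects.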
Fixes: 7d67af2c0134 ("io_uring: add splice(2) support") Signed-off-by: Jens Axboe --- fs/io_uring.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/io_uring.c b/fs/io_uring.c index 68a678a0056b..0d1b5d5f1251 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -2763,7 +2763,7 @@ static bool io_splice_punt(struct file *file) return false; if (!io_file_supports_async(file)) return true; - return !(file->f_mode & O_NONBLOCK); + return !(file->f_flags & O_NONBLOCK); } static int io_splice(struct io_kiocb *req, bool force_nonblock) From 3fe260e00cd0bf0be853c48fcc1e19853df615bb Mon Sep 17 00:00:00 2001 From: Gilberto Bertin Date: Fri, 10 Apr 2020 18:20:59 +0200 Subject: [PATCH 101/331] net: tun: record RX queue in skb before do_xdp_generic() This allows netif_receive_generic_xdp() to correctly determine the RX queue from which the skb is coming, so that the context passed to the XDP program will contain the correct RX queue index. Signed-off-by: Gilberto Bertin Signed-off-by: David S. 
Miller --- drivers/net/tun.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/tun.c b/drivers/net/tun.c index 07476c6510f2..44889eba1dbc 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -1888,6 +1888,7 @@ drop: skb_reset_network_header(skb); skb_probe_transport_header(skb); + skb_record_rx_queue(skb, tfile->queue_index); if (skb_xdp) { struct bpf_prog *xdp_prog; @@ -2459,6 +2460,7 @@ build: skb->protocol = eth_type_trans(skb, tun->dev); skb_reset_network_header(skb); skb_probe_transport_header(skb); + skb_record_rx_queue(skb, tfile->queue_index); if (skb_xdp) { err = do_xdp_generic(xdp_prog, skb); @@ -2470,7 +2472,6 @@ build: !tfile->detached) rxhash = __skb_get_hash_symmetric(skb); - skb_record_rx_queue(skb, tfile->queue_index); netif_receive_skb(skb); /* No need for get_cpu_ptr() here since this function is From e154659ba39a1c2be576aaa0a5bda8088d707950 Mon Sep 17 00:00:00 2001 From: Florian Westphal Date: Sat, 11 Apr 2020 21:05:01 +0200 Subject: [PATCH 102/331] mptcp: fix double-unlock in mptcp_poll mptcp_connect/28740 is trying to release lock (sk_lock-AF_INET) at: [] mptcp_poll+0xb9/0x550 but there are no more locks to release! Call Trace: lock_release+0x50f/0x750 release_sock+0x171/0x1b0 mptcp_poll+0xb9/0x550 sock_poll+0x157/0x470 ? get_net_ns+0xb0/0xb0 do_sys_poll+0x63c/0xdd0 Problem is that __mptcp_tcp_fallback() releases the mptcp socket lock, but after recent change it doesn't do this in all of its return paths. To fix this, remove the unlock from __mptcp_tcp_fallback() and always do the unlock in the caller. Also add a small comment as to why we have this __mptcp_needs_tcp_fallback(). Fixes: 0b4f33def7bbde ("mptcp: fix tcp fallback crash") Reported-by: syzbot+e56606435b7bfeea8cf5@syzkaller.appspotmail.com Signed-off-by: Florian Westphal Signed-off-by: David S. 
Miller --- net/mptcp/protocol.c | 25 +++++++++++++------------ 1 file changed, 13 insertions(+), 12 deletions(-) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 939a5045181a..9936e33ac351 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -97,12 +97,7 @@ static struct socket *__mptcp_tcp_fallback(struct mptcp_sock *msk) if (likely(!__mptcp_needs_tcp_fallback(msk))) return NULL; - if (msk->subflow) { - release_sock((struct sock *)msk); - return msk->subflow; - } - - return NULL; + return msk->subflow; } static bool __mptcp_can_create_subflow(const struct mptcp_sock *msk) @@ -734,9 +729,10 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) goto out; } +fallback: ssock = __mptcp_tcp_fallback(msk); if (unlikely(ssock)) { -fallback: + release_sock(sk); pr_debug("fallback passthrough"); ret = sock_sendmsg(ssock, msg); return ret >= 0 ? ret + copied : (copied ? copied : ret); @@ -769,8 +765,14 @@ fallback: if (ret < 0) break; if (ret == 0 && unlikely(__mptcp_needs_tcp_fallback(msk))) { + /* Can happen for passive sockets: + * 3WHS negotiated MPTCP, but first packet after is + * plain TCP (e.g. due to middlebox filtering unknown + * options). + * + * Fall back to TCP. 
+ */ release_sock(ssk); - ssock = __mptcp_tcp_fallback(msk); goto fallback; } @@ -883,6 +885,7 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, ssock = __mptcp_tcp_fallback(msk); if (unlikely(ssock)) { fallback: + release_sock(sk); pr_debug("fallback-read subflow=%p", mptcp_subflow_ctx(ssock->sk)); copied = sock_recvmsg(ssock, msg, flags); @@ -1467,12 +1470,11 @@ static int mptcp_setsockopt(struct sock *sk, int level, int optname, */ lock_sock(sk); ssock = __mptcp_tcp_fallback(msk); + release_sock(sk); if (ssock) return tcp_setsockopt(ssock->sk, level, optname, optval, optlen); - release_sock(sk); - return -EOPNOTSUPP; } @@ -1492,12 +1494,11 @@ static int mptcp_getsockopt(struct sock *sk, int level, int optname, */ lock_sock(sk); ssock = __mptcp_tcp_fallback(msk); + release_sock(sk); if (ssock) return tcp_getsockopt(ssock->sk, level, optname, optval, option); - release_sock(sk); - return -EOPNOTSUPP; } From 664d035c4707ac8643c2846d1a1d4cdf9ce89b90 Mon Sep 17 00:00:00 2001 From: Christophe JAILLET Date: Sun, 12 Apr 2020 23:20:34 +0200 Subject: [PATCH 103/331] net: mvneta: Fix a typo s/mvmeta/mvneta/ Signed-off-by: Christophe JAILLET Signed-off-by: David S. 
Miller --- drivers/net/ethernet/marvell/mvneta.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c index 5be61f73b6ab..51889770958d 100644 --- a/drivers/net/ethernet/marvell/mvneta.c +++ b/drivers/net/ethernet/marvell/mvneta.c @@ -5383,7 +5383,7 @@ static int __init mvneta_driver_init(void) { int ret; - ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, "net/mvmeta:online", + ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, "net/mvneta:online", mvneta_cpu_online, mvneta_cpu_down_prepare); if (ret < 0) From 5b69c23799ecfc279897e77f43625cc876d92765 Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Tue, 7 Apr 2020 12:29:35 +0300 Subject: [PATCH 104/331] platform/chrome: cros_ec_sensorhub: Off by one in cros_sensorhub_send_sample() The sensorhub->push_data[] array has sensorhub->sensor_num elements. It's allocated in cros_ec_sensorhub_ring_add(). So the > should be >= to prevent a read one element beyond the end of the array. 
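A minimal standalone sketch of the bound check, using a hypothetical four-element table in place of push_data[] (all names here are made up for illustration). The extra negative-index check is an addition of this sketch and is not part of the patch.

```c
#include <errno.h>

#define SKETCH_SENSOR_NUM 4

/* Hypothetical table with SKETCH_SENSOR_NUM elements: valid indices are 0..3. */
static int sketch_slots[SKETCH_SENSOR_NUM] = { 10, 11, 12, 13 };

/*
 * Look up slot 'id'. An index equal to the element count is already
 * one past the end, so the rejection test must be >=, not >.
 */
static int sketch_lookup(int id, int *out)
{
	if (id < 0 || id >= SKETCH_SENSOR_NUM)
		return -EINVAL;

	*out = sketch_slots[id];
	return 0;
}
```

With "id > SKETCH_SENSOR_NUM" the call sketch_lookup(4, ...) would pass the check and read sketch_slots[4], which does not exist.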
Fixes: 145d59baff59 ("platform/chrome: cros_ec_sensorhub: Add FIFO support") Signed-off-by: Dan Carpenter Reviewed-by: Guenter Roeck Signed-off-by: Enric Balletbo i Serra --- drivers/platform/chrome/cros_ec_sensorhub_ring.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/platform/chrome/cros_ec_sensorhub_ring.c b/drivers/platform/chrome/cros_ec_sensorhub_ring.c index 230e6cf3da2f..85e8ba782f0c 100644 --- a/drivers/platform/chrome/cros_ec_sensorhub_ring.c +++ b/drivers/platform/chrome/cros_ec_sensorhub_ring.c @@ -40,7 +40,7 @@ cros_sensorhub_send_sample(struct cros_ec_sensorhub *sensorhub, int id = sample->sensor_id; struct iio_dev *indio_dev; - if (id > sensorhub->sensor_num) + if (id >= sensorhub->sensor_num) return -EINVAL; cb = sensorhub->push_data[id].push_data_cb; From 0e4e1de5b63fa423b13593337a27fd2d2b0bcf77 Mon Sep 17 00:00:00 2001 From: Ilya Dryomov Date: Fri, 13 Mar 2020 11:20:51 +0100 Subject: [PATCH 105/331] rbd: avoid a deadlock on header_rwsem when flushing notifies rbd_unregister_watch() flushes notifies and therefore cannot be called under header_rwsem because a header update notify takes header_rwsem to synchronize with "rbd map". If mapping an image fails after the watch is established and a header update notify sneaks in, we deadlock when erroring out from rbd_dev_image_probe(). Move watch registration and unregistration out of the critical section. The only reason they were put there was to make header_rwsem management slightly more obvious. 
Fixes: 811c66887746 ("rbd: fix rbd map vs notify races") Signed-off-by: Ilya Dryomov Reviewed-by: Jason Dillaman --- drivers/block/rbd.c | 17 +++++++++++++---- 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c index 1e0a6b19ae0d..ff2377e6d12c 100644 --- a/drivers/block/rbd.c +++ b/drivers/block/rbd.c @@ -4527,6 +4527,10 @@ static void cancel_tasks_sync(struct rbd_device *rbd_dev) cancel_work_sync(&rbd_dev->unlock_work); } +/* + * header_rwsem must not be held to avoid a deadlock with + * rbd_dev_refresh() when flushing notifies. + */ static void rbd_unregister_watch(struct rbd_device *rbd_dev) { cancel_tasks_sync(rbd_dev); @@ -6907,6 +6911,9 @@ static void rbd_dev_image_release(struct rbd_device *rbd_dev) * device. If this image is the one being mapped (i.e., not a * parent), initiate a watch on its header object before using that * object to get detailed information about the rbd image. + * + * On success, returns with header_rwsem held for write if called + * with @depth == 0. 
*/ static int rbd_dev_image_probe(struct rbd_device *rbd_dev, int depth) { @@ -6936,6 +6943,9 @@ static int rbd_dev_image_probe(struct rbd_device *rbd_dev, int depth) } } + if (!depth) + down_write(&rbd_dev->header_rwsem); + ret = rbd_dev_header_info(rbd_dev); if (ret) { if (ret == -ENOENT && !need_watch) @@ -6987,6 +6997,8 @@ static int rbd_dev_image_probe(struct rbd_device *rbd_dev, int depth) err_out_probe: rbd_dev_unprobe(rbd_dev); err_out_watch: + if (!depth) + up_write(&rbd_dev->header_rwsem); if (need_watch) rbd_unregister_watch(rbd_dev); err_out_format: @@ -7050,12 +7062,9 @@ static ssize_t do_rbd_add(struct bus_type *bus, goto err_out_rbd_dev; } - down_write(&rbd_dev->header_rwsem); rc = rbd_dev_image_probe(rbd_dev, 0); - if (rc < 0) { - up_write(&rbd_dev->header_rwsem); + if (rc < 0) goto err_out_rbd_dev; - } if (rbd_dev->opts->alloc_size > rbd_dev->layout.object_size) { rbd_warn(rbd_dev, "alloc_size adjusted to %u", From 952c48b0ed18919bff7528501e9a3fff8a24f8cd Mon Sep 17 00:00:00 2001 From: Ilya Dryomov Date: Mon, 16 Mar 2020 15:52:54 +0100 Subject: [PATCH 106/331] rbd: call rbd_dev_unprobe() after unwatching and flushing notifies rbd_dev_unprobe() is supposed to undo most of rbd_dev_image_probe(), including rbd_dev_header_info(), which means that rbd_dev_header_info() isn't supposed to be called after rbd_dev_unprobe(). However, rbd_dev_image_release() calls rbd_dev_unprobe() before rbd_unregister_watch(). This is racy because a header update notify can sneak in: "rbd unmap" thread ceph-watch-notify worker rbd_dev_image_release() rbd_dev_unprobe() free and zero out header rbd_watch_cb() rbd_dev_refresh() rbd_dev_header_info() read in header The same goes for "rbd map" because rbd_dev_image_probe() calls rbd_dev_unprobe() on errors. In both cases this results in a memory leak. 
Fixes: fd22aef8b47c ("rbd: move rbd_unregister_watch() call into rbd_dev_image_release()") Signed-off-by: Ilya Dryomov Reviewed-by: Jason Dillaman --- drivers/block/rbd.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c index ff2377e6d12c..7aec8bc5df6e 100644 --- a/drivers/block/rbd.c +++ b/drivers/block/rbd.c @@ -6898,9 +6898,10 @@ static void rbd_print_dne(struct rbd_device *rbd_dev, bool is_snap) static void rbd_dev_image_release(struct rbd_device *rbd_dev) { - rbd_dev_unprobe(rbd_dev); if (rbd_dev->opts) rbd_unregister_watch(rbd_dev); + + rbd_dev_unprobe(rbd_dev); rbd_dev->image_format = 0; kfree(rbd_dev->spec->image_id); rbd_dev->spec->image_id = NULL; @@ -6950,7 +6951,7 @@ static int rbd_dev_image_probe(struct rbd_device *rbd_dev, int depth) if (ret) { if (ret == -ENOENT && !need_watch) rbd_print_dne(rbd_dev, false); - goto err_out_watch; + goto err_out_probe; } /* @@ -6995,12 +6996,11 @@ static int rbd_dev_image_probe(struct rbd_device *rbd_dev, int depth) return 0; err_out_probe: - rbd_dev_unprobe(rbd_dev); -err_out_watch: if (!depth) up_write(&rbd_dev->header_rwsem); if (need_watch) rbd_unregister_watch(rbd_dev); + rbd_dev_unprobe(rbd_dev); err_out_format: rbd_dev->image_format = 0; kfree(rbd_dev->spec->image_id); From b8776051529230f76e464d5ffc5d1cf8465576bf Mon Sep 17 00:00:00 2001 From: Ilya Dryomov Date: Mon, 16 Mar 2020 17:16:28 +0100 Subject: [PATCH 107/331] rbd: don't test rbd_dev->opts in rbd_dev_image_release() rbd_dev->opts is used to distinguish between the image that is being mapped and a parent. However, because we no longer establish watch for read-only mappings, this test is imprecise and results in unnecessary rbd_unregister_watch() calls. Make it consistent with need_watch in rbd_dev_image_probe(). 
Fixes: b9ef2b8858a0 ("rbd: don't establish watch for read-only mappings") Signed-off-by: Ilya Dryomov Reviewed-by: Jason Dillaman --- drivers/block/rbd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c index 7aec8bc5df6e..205192a5ec8f 100644 --- a/drivers/block/rbd.c +++ b/drivers/block/rbd.c @@ -6898,7 +6898,7 @@ static void rbd_print_dne(struct rbd_device *rbd_dev, bool is_snap) static void rbd_dev_image_release(struct rbd_device *rbd_dev) { - if (rbd_dev->opts) + if (!rbd_is_ro(rbd_dev)) rbd_unregister_watch(rbd_dev); rbd_dev_unprobe(rbd_dev); From 8ae0299a4b72f2f9ad2b755da91c6a2beabaee62 Mon Sep 17 00:00:00 2001 From: Ilya Dryomov Date: Tue, 17 Mar 2020 15:18:48 +0100 Subject: [PATCH 108/331] rbd: don't mess with a page vector in rbd_notify_op_lock() rbd_notify_op_lock() isn't interested in a notify reply. Instead of accepting that page vector just to free it, have watch-notify code take care of it. Signed-off-by: Ilya Dryomov Reviewed-by: Jason Dillaman --- drivers/block/rbd.c | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c index 205192a5ec8f..67d65ac785e9 100644 --- a/drivers/block/rbd.c +++ b/drivers/block/rbd.c @@ -3754,11 +3754,7 @@ static int __rbd_notify_op_lock(struct rbd_device *rbd_dev, static void rbd_notify_op_lock(struct rbd_device *rbd_dev, enum rbd_notify_op notify_op) { - struct page **reply_pages; - size_t reply_len; - - __rbd_notify_op_lock(rbd_dev, notify_op, &reply_pages, &reply_len); - ceph_release_page_vector(reply_pages, calc_pages_for(0, reply_len)); + __rbd_notify_op_lock(rbd_dev, notify_op, NULL, NULL); } static void rbd_notify_acquired_lock(struct work_struct *work) From aca48b61f963869ccbb5cf84805a7ad68bf812cd Mon Sep 17 00:00:00 2001 From: Rajendra Nayak Date: Wed, 8 Apr 2020 19:16:27 +0530 Subject: [PATCH 109/331] opp: Manage empty OPP tables with clk handle With OPP core now supporting DVFS for IO devices, 
we have instances of IO devices (same IP block) which require an OPP on some platforms/SoCs while just needing to scale the clock on some others. In order to avoid conditional code in every driver which supports such devices (to check for availability of OPPs and then deciding to do either dev_pm_opp_set_rate() or clk_set_rate()) add support to manage empty OPP tables with a clk handle. This makes dev_pm_opp_set_rate() equivalent of a clk_set_rate() for devices with just a clk and no OPPs specified, and makes dev_pm_opp_set_rate(0) bail out without throwing an error. Signed-off-by: Rajendra Nayak Signed-off-by: Viresh Kumar --- drivers/opp/core.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/drivers/opp/core.c b/drivers/opp/core.c index ba43e6a3dc0a..e4f01e7771a2 100644 --- a/drivers/opp/core.c +++ b/drivers/opp/core.c @@ -819,6 +819,8 @@ int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq) if (unlikely(!target_freq)) { if (opp_table->required_opp_tables) { ret = _set_required_opps(dev, opp_table, NULL); + } else if (!_get_opp_count(opp_table)) { + return 0; } else { dev_err(dev, "target frequency can't be 0\n"); ret = -EINVAL; @@ -849,6 +851,18 @@ int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq) goto put_opp_table; } + /* + * For IO devices which require an OPP on some platforms/SoCs + * while just needing to scale the clock on some others + * we look for empty OPP tables with just a clock handle and + * scale only the clk. 
This makes dev_pm_opp_set_rate() + * equivalent to a clk_set_rate() + */ + if (!_get_opp_count(opp_table)) { + ret = _generic_set_opp_clk_only(dev, clk, freq); + goto put_opp_table; + } + temp_freq = old_freq; old_opp = _find_freq_ceil(opp_table, &temp_freq); if (IS_ERR(old_opp)) { From c72057b56f7e24865840a6961d801a7f21d30a5f Mon Sep 17 00:00:00 2001 From: David Howells Date: Wed, 8 Apr 2020 16:13:20 +0100 Subject: [PATCH 110/331] afs: Fix missing XDR advance in xdr_decode_{AFS,YFS}FSFetchStatus() If we receive a status record that has VNOVNODE set in the abort field, xdr_decode_AFSFetchStatus() and xdr_decode_YFSFetchStatus() don't advance the XDR pointer, thereby corrupting any subsequent decodes from the same block of data. This has the potential to affect the AFS.InlineBulkStatus and YFS.InlineBulkStatus operations, but probably doesn't since the status records are extracted as individual blocks of data and the buffer pointer is reset between blocks. It does affect the YFS.RemoveFile2 operation, corrupting the volsync record - though that is not currently used. Other operations abort the entire operation rather than returning an error inline, in which case there is no decoding to be done. Fix this by unconditionally advancing the xdr pointer. 
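As a rough userspace illustration of the "always advance" rule (not the kernel code itself), the sketch below decodes a made-up two-word status record and steps the cursor past it on both the error and the success path, so that whatever follows in the buffer is read from the right offset. All toy_* names, the record layout and the TOY_VNOVNODE value are assumptions made for this example only.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_STATUS_WORDS 2	/* hypothetical record: abort_code, size */
#define TOY_VNOVNODE 102	/* placeholder abort code for this sketch only */

struct toy_status {
	uint32_t abort_code;
	uint32_t size;
	int have_error;
	int have_status;
};

/* Store a host-order value as one big-endian XDR word. */
static uint32_t toy_to_be32(uint32_t v)
{
	uint32_t w;
	unsigned char *b = (unsigned char *)&w;

	b[0] = (unsigned char)(v >> 24);
	b[1] = (unsigned char)(v >> 16);
	b[2] = (unsigned char)(v >> 8);
	b[3] = (unsigned char)v;
	return w;
}

/* Read one big-endian XDR word back into host order. */
static uint32_t toy_from_be32(uint32_t w)
{
	const unsigned char *b = (const unsigned char *)&w;

	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/*
 * Decode one status record at *bp.  An inline abort code short-circuits
 * the field decoding, but the cursor is still advanced past the record.
 */
static int toy_decode_status(const uint32_t **bp, struct toy_status *st)
{
	const uint32_t *xdr = *bp;

	st->abort_code = toy_from_be32(xdr[0]);
	if (st->abort_code != 0) {
		st->have_error = 1;
		goto advance;
	}

	st->size = toy_from_be32(xdr[1]);
	st->have_status = 1;
advance:
	*bp += TOY_STATUS_WORDS;
	return 0;
}
```

With an early "return 0" in place of the "goto advance", a trailing record (the volsync data in the case described above) would be decoded from the start of the status record instead.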
Fixes: 684b0f68cf1c ("afs: Fix AFSFetchStatus decoder to provide OpenAFS compatibility") Signed-off-by: David Howells --- fs/afs/fsclient.c | 14 +++++++++----- fs/afs/yfsclient.c | 12 ++++++++---- 2 files changed, 17 insertions(+), 9 deletions(-) diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c index 1f9c5d8e6fe5..fae73e13976a 100644 --- a/fs/afs/fsclient.c +++ b/fs/afs/fsclient.c @@ -65,6 +65,7 @@ static int xdr_decode_AFSFetchStatus(const __be32 **_bp, bool inline_error = (call->operation_ID == afs_FS_InlineBulkStatus); u64 data_version, size; u32 type, abort_code; + int ret; abort_code = ntohl(xdr->abort_code); @@ -78,7 +79,7 @@ static int xdr_decode_AFSFetchStatus(const __be32 **_bp, */ status->abort_code = abort_code; scb->have_error = true; - return 0; + goto good; } pr_warn("Unknown AFSFetchStatus version %u\n", ntohl(xdr->if_version)); @@ -87,7 +88,7 @@ static int xdr_decode_AFSFetchStatus(const __be32 **_bp, if (abort_code != 0 && inline_error) { status->abort_code = abort_code; - return 0; + goto good; } type = ntohl(xdr->type); @@ -123,13 +124,16 @@ static int xdr_decode_AFSFetchStatus(const __be32 **_bp, data_version |= (u64)ntohl(xdr->data_version_hi) << 32; status->data_version = data_version; scb->have_status = true; - +good: + ret = 0; +advance: *_bp = (const void *)*_bp + sizeof(*xdr); - return 0; + return ret; bad: xdr_dump_bad(*_bp); - return afs_protocol_error(call, -EBADMSG, afs_eproto_bad_status); + ret = afs_protocol_error(call, -EBADMSG, afs_eproto_bad_status); + goto advance; } static time64_t xdr_decode_expiry(struct afs_call *call, u32 expiry) diff --git a/fs/afs/yfsclient.c b/fs/afs/yfsclient.c index a26126ac7bf1..a0f7c3186645 100644 --- a/fs/afs/yfsclient.c +++ b/fs/afs/yfsclient.c @@ -186,13 +186,14 @@ static int xdr_decode_YFSFetchStatus(const __be32 **_bp, const struct yfs_xdr_YFSFetchStatus *xdr = (const void *)*_bp; struct afs_file_status *status = &scb->status; u32 type; + int ret; status->abort_code = ntohl(xdr->abort_code); if 
(status->abort_code != 0) { if (status->abort_code == VNOVNODE) status->nlink = 0; scb->have_error = true; - return 0; + goto good; } type = ntohl(xdr->type); @@ -220,13 +221,16 @@ static int xdr_decode_YFSFetchStatus(const __be32 **_bp, status->size = xdr_to_u64(xdr->size); status->data_version = xdr_to_u64(xdr->data_version); scb->have_status = true; - +good: + ret = 0; +advance: *_bp += xdr_size(xdr); - return 0; + return ret; bad: xdr_dump_bad(*_bp); - return afs_protocol_error(call, -EBADMSG, afs_eproto_bad_status); + ret = afs_protocol_error(call, -EBADMSG, afs_eproto_bad_status); + goto advance; } /* From 3e0d9892c0e7fa426ca6bf921cb4b543ca265714 Mon Sep 17 00:00:00 2001 From: David Howells Date: Wed, 8 Apr 2020 17:32:10 +0100 Subject: [PATCH 111/331] afs: Fix decoding of inline abort codes from version 1 status records If we're decoding an AFSFetchStatus record and we see that the version is 1 and the abort code is set and we're expecting inline errors, then we store the abort code and ignore the remaining status record (which is correct), but we don't set the flag to say we got a valid abort code. This can affect operation of YFS.RemoveFile2 when removing a file and the operation of {,Y}FS.InlineBulkStatus when prospectively constructing or updating a set of inodes during a lookup. Fix this to indicate the reception of a valid abort code. 
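To show why the missing flag matters to a caller, here is a small sketch, assuming a caller that treats "neither have_status nor have_error set" as "the server told us nothing about this file". The toy_* names and the set_flag switch are hypothetical and exist only to contrast the two behaviours.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cut-down status/callback block. */
struct toy_scb {
	uint32_t abort_code;
	bool have_status;
	bool have_error;
};

/*
 * Record an inline abort code.  set_flag == false models the behaviour
 * described above (code stored, flag forgotten); true models the fix.
 */
static void toy_store_inline_abort(struct toy_scb *scb, uint32_t abort_code,
				   bool set_flag)
{
	scb->abort_code = abort_code;
	if (set_flag)
		scb->have_error = true;
}

/* Caller-side check: true if the reply carried no usable information. */
static bool toy_needs_fallback(const struct toy_scb *scb)
{
	return !(scb->have_status || scb->have_error);
}
```

In this model the caller only skips its fallback path when one of the two flags is set, so storing the abort code without the flag makes a valid error reply look like no reply at all.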
Fixes: a38a75581e6e ("afs: Fix unlink to handle YFS.RemoveFile2 better") Signed-off-by: David Howells --- fs/afs/fsclient.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c index fae73e13976a..de4331670c84 100644 --- a/fs/afs/fsclient.c +++ b/fs/afs/fsclient.c @@ -88,6 +88,7 @@ static int xdr_decode_AFSFetchStatus(const __be32 **_bp, if (abort_code != 0 && inline_error) { status->abort_code = abort_code; + scb->have_error = true; goto good; } From b98f0ec91c42d87a70da42726b852ac8d78a3257 Mon Sep 17 00:00:00 2001 From: David Howells Date: Wed, 8 Apr 2020 20:56:20 +0100 Subject: [PATCH 112/331] afs: Fix rename operation status delivery The afs_deliver_fs_rename() and yfs_deliver_fs_rename() functions both only decode the second file status returned unless the parent directories are different - unfortunately, this means that the xdr pointer isn't advanced and the volsync record will be read incorrectly in such an instance. Fix this by always decoding the second status into the second status/callback block which wasn't being used if the dirs were the same. 
The afs_update_dentry_version() calls that update the directory data version numbers on the dentries can then unconditionally use the second status record as this will always reflect the state of the destination dir (the two records will be identical if the destination dir is the same as the source dir) Fixes: 260a980317da ("[AFS]: Add "directory write" support.") Fixes: 30062bd13e36 ("afs: Implement YFS support in the fs client") Signed-off-by: David Howells --- fs/afs/dir.c | 13 +++---------- fs/afs/fsclient.c | 12 ++++++------ fs/afs/yfsclient.c | 8 +++----- 3 files changed, 12 insertions(+), 21 deletions(-) diff --git a/fs/afs/dir.c b/fs/afs/dir.c index 5c794f4b051a..31d297e0f765 100644 --- a/fs/afs/dir.c +++ b/fs/afs/dir.c @@ -1892,7 +1892,6 @@ static int afs_rename(struct inode *old_dir, struct dentry *old_dentry, if (afs_begin_vnode_operation(&fc, orig_dvnode, key, true)) { afs_dataversion_t orig_data_version; afs_dataversion_t new_data_version; - struct afs_status_cb *new_scb = &scb[1]; orig_data_version = orig_dvnode->status.data_version + 1; @@ -1904,7 +1903,6 @@ static int afs_rename(struct inode *old_dir, struct dentry *old_dentry, new_data_version = new_dvnode->status.data_version + 1; } else { new_data_version = orig_data_version; - new_scb = &scb[0]; } while (afs_select_fileserver(&fc)) { @@ -1912,7 +1910,7 @@ static int afs_rename(struct inode *old_dir, struct dentry *old_dentry, fc.cb_break_2 = afs_calc_vnode_cb_break(new_dvnode); afs_fs_rename(&fc, old_dentry->d_name.name, new_dvnode, new_dentry->d_name.name, - &scb[0], new_scb); + &scb[0], &scb[1]); } afs_vnode_commit_status(&fc, orig_dvnode, fc.cb_break, @@ -1957,13 +1955,8 @@ static int afs_rename(struct inode *old_dir, struct dentry *old_dentry, * Note that if we ever implement RENAME_EXCHANGE, we'll have * to update both dentries with opposing dir versions. 
*/ - if (new_dvnode != orig_dvnode) { - afs_update_dentry_version(&fc, old_dentry, &scb[1]); - afs_update_dentry_version(&fc, new_dentry, &scb[1]); - } else { - afs_update_dentry_version(&fc, old_dentry, &scb[0]); - afs_update_dentry_version(&fc, new_dentry, &scb[0]); - } + afs_update_dentry_version(&fc, old_dentry, &scb[1]); + afs_update_dentry_version(&fc, new_dentry, &scb[1]); d_move(old_dentry, new_dentry); goto error_tmp; } diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c index de4331670c84..68fc46634346 100644 --- a/fs/afs/fsclient.c +++ b/fs/afs/fsclient.c @@ -986,16 +986,16 @@ static int afs_deliver_fs_rename(struct afs_call *call) if (ret < 0) return ret; - /* unmarshall the reply once we've received all of it */ + /* If the two dirs are the same, we have two copies of the same status + * report, so we just decode it twice. + */ bp = call->buffer; ret = xdr_decode_AFSFetchStatus(&bp, call, call->out_dir_scb); if (ret < 0) return ret; - if (call->out_dir_scb != call->out_scb) { - ret = xdr_decode_AFSFetchStatus(&bp, call, call->out_scb); - if (ret < 0) - return ret; - } + ret = xdr_decode_AFSFetchStatus(&bp, call, call->out_scb); + if (ret < 0) + return ret; xdr_decode_AFSVolSync(&bp, call->out_volsync); _leave(" = 0 [done]"); diff --git a/fs/afs/yfsclient.c b/fs/afs/yfsclient.c index a0f7c3186645..83b6d67325f6 100644 --- a/fs/afs/yfsclient.c +++ b/fs/afs/yfsclient.c @@ -1157,11 +1157,9 @@ static int yfs_deliver_fs_rename(struct afs_call *call) ret = xdr_decode_YFSFetchStatus(&bp, call, call->out_dir_scb); if (ret < 0) return ret; - if (call->out_dir_scb != call->out_scb) { - ret = xdr_decode_YFSFetchStatus(&bp, call, call->out_scb); - if (ret < 0) - return ret; - } + ret = xdr_decode_YFSFetchStatus(&bp, call, call->out_scb); + if (ret < 0) + return ret; xdr_decode_YFSVolSync(&bp, call->out_volsync); _leave(" = 0 [done]"); From 3efe55b09a92a59ed8214db801683cf13c9742c4 Mon Sep 17 00:00:00 2001 From: David Howells Date: Wed, 1 Apr 2020 23:32:12 +0100 
Subject: [PATCH 113/331] afs: Fix length of dump of bad YFSFetchStatus record Fix the length of the dump of a bad YFSFetchStatus record. The function was copied from the AFS version, but the YFS variant contains bigger fields and extra information, so expand the dump to match. Signed-off-by: David Howells --- fs/afs/yfsclient.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/fs/afs/yfsclient.c b/fs/afs/yfsclient.c index 83b6d67325f6..b5b45c57e1b1 100644 --- a/fs/afs/yfsclient.c +++ b/fs/afs/yfsclient.c @@ -165,15 +165,15 @@ static void xdr_dump_bad(const __be32 *bp) int i; pr_notice("YFS XDR: Bad status record\n"); - for (i = 0; i < 5 * 4 * 4; i += 16) { + for (i = 0; i < 6 * 4 * 4; i += 16) { memcpy(x, bp, 16); bp += 4; pr_notice("%03x: %08x %08x %08x %08x\n", i, ntohl(x[0]), ntohl(x[1]), ntohl(x[2]), ntohl(x[3])); } - memcpy(x, bp, 4); - pr_notice("0x50: %08x\n", ntohl(x[0])); + memcpy(x, bp, 8); + pr_notice("0x60: %08x %08x\n", ntohl(x[0]), ntohl(x[1])); } /* From 2105c2820d366b76f38e6ad61c75771881ecc532 Mon Sep 17 00:00:00 2001 From: David Howells Date: Fri, 10 Apr 2020 15:23:27 +0100 Subject: [PATCH 114/331] afs: Fix race between post-modification dir edit and readdir/d_revalidate AFS directories are retained locally as a structured file, with lookup being effected by a local search of the file contents. When a modification (such as mkdir) happens, the dir file content is modified locally rather than redownloading the directory. The directory contents are accessed in a number of ways, with a number of different locking schemes: (1) Download of contents - dvnode->validate_lock/write in afs_read_dir(). (2) Lookup and readdir - dvnode->validate_lock/read in afs_dir_iterate(), downgrading from (1) if necessary. (3) d_revalidate of child dentry - dvnode->validate_lock/read in afs_do_lookup_one() downgrading from (1) if necessary. (4) Edit of dir after modification - page locks on individual dir pages. 
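To make the locking question concrete, the single-threaded toy model below puts path (4) under the same lock as paths (1) to (3) and rechecks the directory version once the lock is held, which is the shape of the change in the diff further down. It is only a sketch: the toy_* names are hypothetical, the "lock" is a flag that asserts correct pairing rather than a real rw_semaphore, and nr_entries stands in for the cached directory contents.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct toy_dir {
	bool write_locked;	/* stand-in for validate_lock held for write */
	bool valid;		/* stand-in for AFS_VNODE_DIR_VALID */
	uint64_t data_version;	/* version of the locally cached contents */
	int nr_entries;		/* stand-in for the cached directory contents */
};

static void toy_down_write(struct toy_dir *d)
{
	assert(!d->write_locked);
	d->write_locked = true;
}

static void toy_up_write(struct toy_dir *d)
{
	assert(d->write_locked);
	d->write_locked = false;
}

/*
 * Apply a local "add entry" edit after a successful remote create.
 * expected_version is the pre-operation version plus one.  The edit is
 * skipped if the cache is invalid or a different version has been
 * downloaded in the meantime.  Returns true if the edit was applied.
 */
static bool toy_edit_after_create(struct toy_dir *d, uint64_t expected_version)
{
	bool edited = false;

	toy_down_write(d);
	if (d->valid && d->data_version == expected_version) {
		d->nr_entries++;
		edited = true;
	}
	toy_up_write(d);
	return edited;
}
```

Skipping the edit is safe in this model because a version mismatch means the cached contents already came from a newer download.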
Unfortunately, because (4) uses different locking scheme to (1) - (3), nothing protects against the page being scanned whilst the edit is underway. Even download is not safe as it doesn't lock the pages - relying instead on the validate_lock to serialise as a whole (the theory being that directory contents are treated as a block and always downloaded as a block). Fix this by write-locking dvnode->validate_lock around the edits. Care must be taken in the rename case as there may be two different dirs - but they need not be locked at the same time. In any case, once the lock is taken, the directory version must be rechecked, and the edit skipped if a later version has been downloaded by revalidation (there can't have been any local changes because the VFS holds the inode lock, but there can have been remote changes). Fixes: 63a4681ff39c ("afs: Locally edit directory data for mkdir/create/unlink/...") Signed-off-by: David Howells --- fs/afs/dir.c | 89 +++++++++++++++++++++++++++++++--------------- fs/afs/dir_silly.c | 22 ++++++++---- 2 files changed, 76 insertions(+), 35 deletions(-) diff --git a/fs/afs/dir.c b/fs/afs/dir.c index 31d297e0f765..d6278616fb88 100644 --- a/fs/afs/dir.c +++ b/fs/afs/dir.c @@ -1275,6 +1275,7 @@ static int afs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode) struct afs_fs_cursor fc; struct afs_vnode *dvnode = AFS_FS_I(dir); struct key *key; + afs_dataversion_t data_version; int ret; mode |= S_IFDIR; @@ -1295,7 +1296,7 @@ static int afs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode) ret = -ERESTARTSYS; if (afs_begin_vnode_operation(&fc, dvnode, key, true)) { - afs_dataversion_t data_version = dvnode->status.data_version + 1; + data_version = dvnode->status.data_version + 1; while (afs_select_fileserver(&fc)) { fc.cb_break = afs_calc_vnode_cb_break(dvnode); @@ -1316,10 +1317,14 @@ static int afs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode) goto error_key; } - if (ret == 0 && - 
test_bit(AFS_VNODE_DIR_VALID, &dvnode->flags)) - afs_edit_dir_add(dvnode, &dentry->d_name, &iget_data.fid, - afs_edit_dir_for_create); + if (ret == 0) { + down_write(&dvnode->validate_lock); + if (test_bit(AFS_VNODE_DIR_VALID, &dvnode->flags) && + dvnode->status.data_version == data_version) + afs_edit_dir_add(dvnode, &dentry->d_name, &iget_data.fid, + afs_edit_dir_for_create); + up_write(&dvnode->validate_lock); + } key_put(key); kfree(scb); @@ -1360,6 +1365,7 @@ static int afs_rmdir(struct inode *dir, struct dentry *dentry) struct afs_fs_cursor fc; struct afs_vnode *dvnode = AFS_FS_I(dir), *vnode = NULL; struct key *key; + afs_dataversion_t data_version; int ret; _enter("{%llx:%llu},{%pd}", @@ -1391,7 +1397,7 @@ static int afs_rmdir(struct inode *dir, struct dentry *dentry) ret = -ERESTARTSYS; if (afs_begin_vnode_operation(&fc, dvnode, key, true)) { - afs_dataversion_t data_version = dvnode->status.data_version + 1; + data_version = dvnode->status.data_version + 1; while (afs_select_fileserver(&fc)) { fc.cb_break = afs_calc_vnode_cb_break(dvnode); @@ -1404,9 +1410,12 @@ static int afs_rmdir(struct inode *dir, struct dentry *dentry) ret = afs_end_vnode_operation(&fc); if (ret == 0) { afs_dir_remove_subdir(dentry); - if (test_bit(AFS_VNODE_DIR_VALID, &dvnode->flags)) + down_write(&dvnode->validate_lock); + if (test_bit(AFS_VNODE_DIR_VALID, &dvnode->flags) && + dvnode->status.data_version == data_version) afs_edit_dir_remove(dvnode, &dentry->d_name, afs_edit_dir_for_rmdir); + up_write(&dvnode->validate_lock); } } @@ -1544,10 +1553,15 @@ static int afs_unlink(struct inode *dir, struct dentry *dentry) ret = afs_end_vnode_operation(&fc); if (ret == 0 && !(scb[1].have_status || scb[1].have_error)) ret = afs_dir_remove_link(dvnode, dentry, key); - if (ret == 0 && - test_bit(AFS_VNODE_DIR_VALID, &dvnode->flags)) - afs_edit_dir_remove(dvnode, &dentry->d_name, - afs_edit_dir_for_unlink); + + if (ret == 0) { + down_write(&dvnode->validate_lock); + if 
(test_bit(AFS_VNODE_DIR_VALID, &dvnode->flags) && + dvnode->status.data_version == data_version) + afs_edit_dir_remove(dvnode, &dentry->d_name, + afs_edit_dir_for_unlink); + up_write(&dvnode->validate_lock); + } } if (need_rehash && ret < 0 && ret != -ENOENT) @@ -1573,6 +1587,7 @@ static int afs_create(struct inode *dir, struct dentry *dentry, umode_t mode, struct afs_status_cb *scb; struct afs_vnode *dvnode = AFS_FS_I(dir); struct key *key; + afs_dataversion_t data_version; int ret; mode |= S_IFREG; @@ -1597,7 +1612,7 @@ static int afs_create(struct inode *dir, struct dentry *dentry, umode_t mode, ret = -ERESTARTSYS; if (afs_begin_vnode_operation(&fc, dvnode, key, true)) { - afs_dataversion_t data_version = dvnode->status.data_version + 1; + data_version = dvnode->status.data_version + 1; while (afs_select_fileserver(&fc)) { fc.cb_break = afs_calc_vnode_cb_break(dvnode); @@ -1618,9 +1633,12 @@ static int afs_create(struct inode *dir, struct dentry *dentry, umode_t mode, goto error_key; } - if (test_bit(AFS_VNODE_DIR_VALID, &dvnode->flags)) + down_write(&dvnode->validate_lock); + if (test_bit(AFS_VNODE_DIR_VALID, &dvnode->flags) && + dvnode->status.data_version == data_version) afs_edit_dir_add(dvnode, &dentry->d_name, &iget_data.fid, afs_edit_dir_for_create); + up_write(&dvnode->validate_lock); kfree(scb); key_put(key); @@ -1648,6 +1666,7 @@ static int afs_link(struct dentry *from, struct inode *dir, struct afs_vnode *dvnode = AFS_FS_I(dir); struct afs_vnode *vnode = AFS_FS_I(d_inode(from)); struct key *key; + afs_dataversion_t data_version; int ret; _enter("{%llx:%llu},{%llx:%llu},{%pd}", @@ -1672,7 +1691,7 @@ static int afs_link(struct dentry *from, struct inode *dir, ret = -ERESTARTSYS; if (afs_begin_vnode_operation(&fc, dvnode, key, true)) { - afs_dataversion_t data_version = dvnode->status.data_version + 1; + data_version = dvnode->status.data_version + 1; if (mutex_lock_interruptible_nested(&vnode->io_lock, 1) < 0) { afs_end_vnode_operation(&fc); @@ -1702,9 
+1721,12 @@ static int afs_link(struct dentry *from, struct inode *dir, goto error_key; } - if (test_bit(AFS_VNODE_DIR_VALID, &dvnode->flags)) + down_write(&dvnode->validate_lock); + if (test_bit(AFS_VNODE_DIR_VALID, &dvnode->flags) && + dvnode->status.data_version == data_version) afs_edit_dir_add(dvnode, &dentry->d_name, &vnode->fid, afs_edit_dir_for_link); + up_write(&dvnode->validate_lock); key_put(key); kfree(scb); @@ -1732,6 +1754,7 @@ static int afs_symlink(struct inode *dir, struct dentry *dentry, struct afs_status_cb *scb; struct afs_vnode *dvnode = AFS_FS_I(dir); struct key *key; + afs_dataversion_t data_version; int ret; _enter("{%llx:%llu},{%pd},%s", @@ -1759,7 +1782,7 @@ static int afs_symlink(struct inode *dir, struct dentry *dentry, ret = -ERESTARTSYS; if (afs_begin_vnode_operation(&fc, dvnode, key, true)) { - afs_dataversion_t data_version = dvnode->status.data_version + 1; + data_version = dvnode->status.data_version + 1; while (afs_select_fileserver(&fc)) { fc.cb_break = afs_calc_vnode_cb_break(dvnode); @@ -1780,9 +1803,12 @@ static int afs_symlink(struct inode *dir, struct dentry *dentry, goto error_key; } - if (test_bit(AFS_VNODE_DIR_VALID, &dvnode->flags)) + down_write(&dvnode->validate_lock); + if (test_bit(AFS_VNODE_DIR_VALID, &dvnode->flags) && + dvnode->status.data_version == data_version) afs_edit_dir_add(dvnode, &dentry->d_name, &iget_data.fid, afs_edit_dir_for_symlink); + up_write(&dvnode->validate_lock); key_put(key); kfree(scb); @@ -1812,6 +1838,8 @@ static int afs_rename(struct inode *old_dir, struct dentry *old_dentry, struct dentry *tmp = NULL, *rehash = NULL; struct inode *new_inode; struct key *key; + afs_dataversion_t orig_data_version; + afs_dataversion_t new_data_version; bool new_negative = d_is_negative(new_dentry); int ret; @@ -1890,9 +1918,6 @@ static int afs_rename(struct inode *old_dir, struct dentry *old_dentry, ret = -ERESTARTSYS; if (afs_begin_vnode_operation(&fc, orig_dvnode, key, true)) { - afs_dataversion_t 
orig_data_version; - afs_dataversion_t new_data_version; - orig_data_version = orig_dvnode->status.data_version + 1; if (orig_dvnode != new_dvnode) { @@ -1928,18 +1953,25 @@ static int afs_rename(struct inode *old_dir, struct dentry *old_dentry, if (ret == 0) { if (rehash) d_rehash(rehash); - if (test_bit(AFS_VNODE_DIR_VALID, &orig_dvnode->flags)) - afs_edit_dir_remove(orig_dvnode, &old_dentry->d_name, - afs_edit_dir_for_rename_0); + down_write(&orig_dvnode->validate_lock); + if (test_bit(AFS_VNODE_DIR_VALID, &orig_dvnode->flags) && + orig_dvnode->status.data_version == orig_data_version) + afs_edit_dir_remove(orig_dvnode, &old_dentry->d_name, + afs_edit_dir_for_rename_0); + if (orig_dvnode != new_dvnode) { + up_write(&orig_dvnode->validate_lock); - if (!new_negative && - test_bit(AFS_VNODE_DIR_VALID, &new_dvnode->flags)) - afs_edit_dir_remove(new_dvnode, &new_dentry->d_name, - afs_edit_dir_for_rename_1); + down_write(&new_dvnode->validate_lock); + } + if (test_bit(AFS_VNODE_DIR_VALID, &new_dvnode->flags) && + orig_dvnode->status.data_version == new_data_version) { + if (!new_negative) + afs_edit_dir_remove(new_dvnode, &new_dentry->d_name, + afs_edit_dir_for_rename_1); - if (test_bit(AFS_VNODE_DIR_VALID, &new_dvnode->flags)) afs_edit_dir_add(new_dvnode, &new_dentry->d_name, &vnode->fid, afs_edit_dir_for_rename_2); + } new_inode = d_inode(new_dentry); if (new_inode) { @@ -1958,6 +1990,7 @@ static int afs_rename(struct inode *old_dir, struct dentry *old_dentry, afs_update_dentry_version(&fc, old_dentry, &scb[1]); afs_update_dentry_version(&fc, new_dentry, &scb[1]); d_move(old_dentry, new_dentry); + up_write(&new_dvnode->validate_lock); goto error_tmp; } diff --git a/fs/afs/dir_silly.c b/fs/afs/dir_silly.c index 361088a5edb9..d94e2b7cddff 100644 --- a/fs/afs/dir_silly.c +++ b/fs/afs/dir_silly.c @@ -21,6 +21,7 @@ static int afs_do_silly_rename(struct afs_vnode *dvnode, struct afs_vnode *vnode { struct afs_fs_cursor fc; struct afs_status_cb *scb; + afs_dataversion_t 
dir_data_version; int ret = -ERESTARTSYS; _enter("%pd,%pd", old, new); @@ -31,7 +32,7 @@ static int afs_do_silly_rename(struct afs_vnode *dvnode, struct afs_vnode *vnode trace_afs_silly_rename(vnode, false); if (afs_begin_vnode_operation(&fc, dvnode, key, true)) { - afs_dataversion_t dir_data_version = dvnode->status.data_version + 1; + dir_data_version = dvnode->status.data_version + 1; while (afs_select_fileserver(&fc)) { fc.cb_break = afs_calc_vnode_cb_break(dvnode); @@ -54,12 +55,15 @@ static int afs_do_silly_rename(struct afs_vnode *dvnode, struct afs_vnode *vnode dvnode->silly_key = key_get(key); } - if (test_bit(AFS_VNODE_DIR_VALID, &dvnode->flags)) + down_write(&dvnode->validate_lock); + if (test_bit(AFS_VNODE_DIR_VALID, &dvnode->flags) && + dvnode->status.data_version == dir_data_version) { afs_edit_dir_remove(dvnode, &old->d_name, afs_edit_dir_for_silly_0); - if (test_bit(AFS_VNODE_DIR_VALID, &dvnode->flags)) afs_edit_dir_add(dvnode, &new->d_name, &vnode->fid, afs_edit_dir_for_silly_1); + } + up_write(&dvnode->validate_lock); } kfree(scb); @@ -181,10 +185,14 @@ static int afs_do_silly_unlink(struct afs_vnode *dvnode, struct afs_vnode *vnode clear_bit(AFS_VNODE_CB_PROMISED, &vnode->flags); } } - if (ret == 0 && - test_bit(AFS_VNODE_DIR_VALID, &dvnode->flags)) - afs_edit_dir_remove(dvnode, &dentry->d_name, - afs_edit_dir_for_unlink); + if (ret == 0) { + down_write(&dvnode->validate_lock); + if (test_bit(AFS_VNODE_DIR_VALID, &dvnode->flags) && + dvnode->status.data_version == dir_data_version) + afs_edit_dir_remove(dvnode, &dentry->d_name, + afs_edit_dir_for_unlink); + up_write(&dvnode->validate_lock); + } } kfree(scb); From 40fc81027f892284ce31f8b6de1e497f5b47e71f Mon Sep 17 00:00:00 2001 From: David Howells Date: Sat, 11 Apr 2020 08:50:45 +0100 Subject: [PATCH 115/331] afs: Fix afs_d_validate() to set the right directory version If a dentry's version is somewhere between invalid_before and the current directory version, we should be setting it forward to 
the current version, not backwards to the invalid_before version. Note that we're only doing this at all because dentry::d_fsdata isn't large enough on a 32-bit system. Fix this by using a separate variable for invalid_before so that we don't accidentally clobber the current dir version. Fixes: a4ff7401fbfa ("afs: Keep track of invalid-before version for dentry coherency") Signed-off-by: David Howells --- fs/afs/dir.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/fs/afs/dir.c b/fs/afs/dir.c index d6278616fb88..d1e1caa23c8b 100644 --- a/fs/afs/dir.c +++ b/fs/afs/dir.c @@ -1032,7 +1032,7 @@ static int afs_d_revalidate(struct dentry *dentry, unsigned int flags) struct dentry *parent; struct inode *inode; struct key *key; - afs_dataversion_t dir_version; + afs_dataversion_t dir_version, invalid_before; long de_version; int ret; @@ -1084,8 +1084,8 @@ static int afs_d_revalidate(struct dentry *dentry, unsigned int flags) if (de_version == (long)dir_version) goto out_valid_noupdate; - dir_version = dir->invalid_before; - if (de_version - (long)dir_version >= 0) + invalid_before = dir->invalid_before; + if (de_version - (long)invalid_before >= 0) goto out_valid; _debug("dir modified"); From 538b8471fee89eaf18f6bfbbc0576473f952b83e Mon Sep 17 00:00:00 2001 From: Christophe JAILLET Date: Sat, 11 Apr 2020 16:58:44 +0200 Subject: [PATCH 116/331] platform/chrome: cros_ec_sensorhub: Add missing '\n' in log messages Message logged by 'dev_xxx()' or 'pr_xxx()' should end with a '\n'. 
Fixes: 145d59baff59 ("platform/chrome: cros_ec_sensorhub: Add FIFO support") Signed-off-by: Christophe JAILLET Signed-off-by: Enric Balletbo i Serra --- drivers/platform/chrome/cros_ec_sensorhub_ring.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/platform/chrome/cros_ec_sensorhub_ring.c b/drivers/platform/chrome/cros_ec_sensorhub_ring.c index 85e8ba782f0c..c48e5b38a441 100644 --- a/drivers/platform/chrome/cros_ec_sensorhub_ring.c +++ b/drivers/platform/chrome/cros_ec_sensorhub_ring.c @@ -820,7 +820,7 @@ static void cros_ec_sensorhub_ring_handler(struct cros_ec_sensorhub *sensorhub) if (fifo_info->count > sensorhub->fifo_size || fifo_info->size != sensorhub->fifo_size) { dev_warn(sensorhub->dev, - "Mismatch EC data: count %d, size %d - expected %d", + "Mismatch EC data: count %d, size %d - expected %d\n", fifo_info->count, fifo_info->size, sensorhub->fifo_size); goto error; @@ -851,14 +851,14 @@ static void cros_ec_sensorhub_ring_handler(struct cros_ec_sensorhub *sensorhub) } if (number_data > fifo_info->count - i) { dev_warn(sensorhub->dev, - "Invalid EC data: too many entry received: %d, expected %d", + "Invalid EC data: too many entry received: %d, expected %d\n", number_data, fifo_info->count - i); break; } if (out + number_data > sensorhub->ring + fifo_info->count) { dev_warn(sensorhub->dev, - "Too many samples: %d (%zd data) to %d entries for expected %d entries", + "Too many samples: %d (%zd data) to %d entries for expected %d entries\n", i, out - sensorhub->ring, i + number_data, fifo_info->count); break; From 4b674b9ac852937af1f8c62f730c325fb6eadcdb Mon Sep 17 00:00:00 2001 From: Brian Foster Date: Sun, 12 Apr 2020 13:11:10 -0700 Subject: [PATCH 117/331] xfs: acquire superblock freeze protection on eofblocks scans The filesystem freeze sequence in XFS waits on any background eofblocks or cowblocks scans to complete before the filesystem is quiesced. 
At this point, the freezer has already stopped the transaction subsystem, however, which means a truncate or cowblock cancellation in progress is likely blocked in transaction allocation. This results in a deadlock between freeze and the associated scanner. Fix this problem by holding superblock write protection across calls into the block reapers. Since protection for background scans is acquired from the workqueue task context, trylock to avoid a similar deadlock between freeze and blocking on the write lock. Fixes: d6b636ebb1c9f ("xfs: halt auto-reclamation activities while rebuilding rmap") Reported-by: Paul Furtado Signed-off-by: Brian Foster Reviewed-by: Chandan Rajendra Reviewed-by: Christoph Hellwig Reviewed-by: Allison Collins Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong --- fs/xfs/xfs_icache.c | 10 ++++++++++ fs/xfs/xfs_ioctl.c | 5 ++++- 2 files changed, 14 insertions(+), 1 deletion(-) diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c index a7be7a9e5c1a..8bf1d15be3f6 100644 --- a/fs/xfs/xfs_icache.c +++ b/fs/xfs/xfs_icache.c @@ -911,7 +911,12 @@ xfs_eofblocks_worker( { struct xfs_mount *mp = container_of(to_delayed_work(work), struct xfs_mount, m_eofblocks_work); + + if (!sb_start_write_trylock(mp->m_super)) + return; xfs_icache_free_eofblocks(mp, NULL); + sb_end_write(mp->m_super); + xfs_queue_eofblocks(mp); } @@ -938,7 +943,12 @@ xfs_cowblocks_worker( { struct xfs_mount *mp = container_of(to_delayed_work(work), struct xfs_mount, m_cowblocks_work); + + if (!sb_start_write_trylock(mp->m_super)) + return; xfs_icache_free_cowblocks(mp, NULL); + sb_end_write(mp->m_super); + xfs_queue_cowblocks(mp); } diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c index cdfb3cd9a25b..309958186d33 100644 --- a/fs/xfs/xfs_ioctl.c +++ b/fs/xfs/xfs_ioctl.c @@ -2363,7 +2363,10 @@ xfs_file_ioctl( if (error) return error; - return xfs_icache_free_eofblocks(mp, &keofb); + sb_start_write(mp->m_super); + error = xfs_icache_free_eofblocks(mp, &keofb); + 
sb_end_write(mp->m_super); + return error; } default: From c142932c29e533ee892f87b44d8abc5719edceec Mon Sep 17 00:00:00 2001 From: "Darrick J. Wong" Date: Sun, 12 Apr 2020 13:11:11 -0700 Subject: [PATCH 118/331] xfs: fix partially uninitialized structure in xfs_reflink_remap_extent In the reflink extent remap function, it turns out that uirec (the block mapping corresponding only to the part of the passed-in mapping that got unmapped) was not fully initialized. Specifically, br_state was not being copied from the passed-in struct to the uirec. This could lead to unpredictable results such as the reflinked mapping being marked unwritten in the destination file. Signed-off-by: Darrick J. Wong Reviewed-by: Brian Foster --- fs/xfs/xfs_reflink.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index b0ce04ffd3cd..107bf2a2f344 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -1051,6 +1051,7 @@ xfs_reflink_remap_extent( uirec.br_startblock = irec->br_startblock + rlen; uirec.br_startoff = irec->br_startoff + rlen; uirec.br_blockcount = unmap_len - rlen; + uirec.br_state = irec->br_state; unmap_len = rlen; /* If this isn't a real mapping, we're done. */ From 25faa4bd37c10f19e4b848b9032a17a3d44c6f09 Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Mon, 13 Apr 2020 10:20:29 +0200 Subject: [PATCH 119/331] ALSA: hda: Don't release card at firmware loading error At the error path of the firmware loading error, the driver tries to release the card object and set NULL to drvdata. This may be referred badly at the possible PM action, as the driver itself is still bound and the PM callbacks read the card object. Instead, we continue the probing as if it were no option set. This is often a better choice than the forced abort, too. 
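A minimal userspace sketch of the resulting callback shape, assuming a stripped-down chip object: a missing firmware image is logged and probing continues, and the object is never freed from inside the callback. The toy_* names are hypothetical and toy_probe_continue() is only a stub.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct toy_chip {
	const void *fw;		/* optional firmware patch, may stay NULL */
	bool disabled;
	bool probed;
};

/* Stub for the rest of the probe sequence. */
static int toy_probe_continue(struct toy_chip *chip)
{
	chip->probed = true;
	return 0;
}

/*
 * Firmware-loader callback: a NULL image is reported but is not fatal,
 * and the chip object stays alive whatever the outcome.
 */
static void toy_firmware_cb(const void *fw, struct toy_chip *chip)
{
	if (fw)
		chip->fw = fw;
	else
		fprintf(stderr, "Cannot load firmware, continue without patching\n");

	if (!chip->disabled)
		toy_probe_continue(chip);
}
```

Because nothing is released here, later callbacks that look up the chip still find a live object.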
Fixes: 5cb543dba986 ("ALSA: hda - Deferred probing with request_firmware_nowait()") BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=207043 Link: https://lore.kernel.org/r/20200413082034.25166-2-tiwai@suse.de Signed-off-by: Takashi Iwai --- sound/pci/hda/hda_intel.c | 19 +++++-------------- 1 file changed, 5 insertions(+), 14 deletions(-) diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c index bd093593f8fb..a2e811375750 100644 --- a/sound/pci/hda/hda_intel.c +++ b/sound/pci/hda/hda_intel.c @@ -2027,24 +2027,15 @@ static void azx_firmware_cb(const struct firmware *fw, void *context) { struct snd_card *card = context; struct azx *chip = card->private_data; - struct pci_dev *pci = chip->pci; - if (!fw) { - dev_err(card->dev, "Cannot load firmware, aborting\n"); - goto error; - } - - chip->fw = fw; + if (fw) + chip->fw = fw; + else + dev_err(card->dev, "Cannot load firmware, continue without patching\n"); if (!chip->disabled) { /* continue probing */ - if (azx_probe_continue(chip)) - goto error; + azx_probe_continue(chip); } - return; /* OK */ - - error: - snd_card_free(card); - pci_set_drvdata(pci, NULL); } #endif From 10db5bccc390e8e4bd9fcd1fbd4f1b23f271a405 Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Mon, 13 Apr 2020 10:20:30 +0200 Subject: [PATCH 120/331] ALSA: hda: Honor PM disablement in PM freeze and thaw_noirq ops freeze_noirq and thaw_noirq need to check the PM availability like other PM ops. There are cases where the device got disabled due to the error, and the PM operation should be ignored for that. 
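The guard can be pictured with the toy model below, where a card that is not PM-ready is left alone by both callbacks. The toy_* names are hypothetical, and toy_is_pm_ready() just reads a flag; what the real azx_is_pm_ready() checks is not shown in this patch.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_power { TOY_D0, TOY_D3HOT };

struct toy_card {
	bool pm_ready;		/* false if the device got disabled by an error */
	bool is_skl;		/* stand-in for the AZX_DRIVER_SKL check */
	enum toy_power power;
};

static bool toy_is_pm_ready(const struct toy_card *card)
{
	return card->pm_ready;
}

/* Freeze: skip the power transition entirely if PM is not available. */
static int toy_freeze_noirq(struct toy_card *card)
{
	if (!toy_is_pm_ready(card))
		return 0;
	if (card->is_skl)
		card->power = TOY_D3HOT;
	return 0;
}

/* Thaw: same guard, opposite transition. */
static int toy_thaw_noirq(struct toy_card *card)
{
	if (!toy_is_pm_ready(card))
		return 0;
	if (card->is_skl)
		card->power = TOY_D0;
	return 0;
}
```

Both callbacks return 0 in the not-ready case, so a disabled device is silently skipped rather than reported as a PM failure.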
Fixes: 3e6db33aaf1d ("ALSA: hda - Set SKL+ hda controller power at freeze() and thaw()") BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=207043 Link: https://lore.kernel.org/r/20200413082034.25166-3-tiwai@suse.de Signed-off-by: Takashi Iwai --- sound/pci/hda/hda_intel.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c index a2e811375750..f41d8b7864c1 100644 --- a/sound/pci/hda/hda_intel.c +++ b/sound/pci/hda/hda_intel.c @@ -1071,6 +1071,8 @@ static int azx_freeze_noirq(struct device *dev) struct azx *chip = card->private_data; struct pci_dev *pci = to_pci_dev(dev); + if (!azx_is_pm_ready(card)) + return 0; if (chip->driver_type == AZX_DRIVER_SKL) pci_set_power_state(pci, PCI_D3hot); @@ -1083,6 +1085,8 @@ static int azx_thaw_noirq(struct device *dev) struct azx *chip = card->private_data; struct pci_dev *pci = to_pci_dev(dev); + if (!azx_is_pm_ready(card)) + return 0; if (chip->driver_type == AZX_DRIVER_SKL) pci_set_power_state(pci, PCI_D0); From 2393e7555b531a534152ffe7bfd1862cacedaacb Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Mon, 13 Apr 2020 10:20:31 +0200 Subject: [PATCH 121/331] ALSA: hda: Release resources at error in delayed probe snd-hda-intel driver handles the most of its probe task in the delayed work (either via workqueue or via firmware loader). When an error happens in the later delayed probe, we can't deregister the device itself because the probe callback already returned success and the device was bound. So, for now, we set hda->init_failed flag and make the rest untouched until the device gets really unbound. However, this leaves the device up running, keeping the resources without any use that prevents other operations. In this patch, we release the resources at first when a probe error happens in the delayed probe stage, but keeps the top-level object, so that the PM and other ops can still refer to the object itself. 
Also for simplicity, snd_hda_intel object is allocated via devm, so that we can get rid of the explicit kfree calls. BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=207043 Link: https://lore.kernel.org/r/20200413082034.25166-4-tiwai@suse.de Signed-off-by: Takashi Iwai --- sound/pci/hda/hda_intel.c | 29 ++++++++++++++++------------- sound/pci/hda/hda_intel.h | 1 + 2 files changed, 17 insertions(+), 13 deletions(-) diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c index f41d8b7864c1..692857904d49 100644 --- a/sound/pci/hda/hda_intel.c +++ b/sound/pci/hda/hda_intel.c @@ -1203,10 +1203,8 @@ static void azx_vs_set_state(struct pci_dev *pci, if (!disabled) { dev_info(chip->card->dev, "Start delayed initialization\n"); - if (azx_probe_continue(chip) < 0) { + if (azx_probe_continue(chip) < 0) dev_err(chip->card->dev, "initialization error\n"); - hda->init_failed = true; - } } } else { dev_info(chip->card->dev, "%s via vga_switcheroo\n", @@ -1339,12 +1337,15 @@ static int register_vga_switcheroo(struct azx *chip) /* * destructor */ -static int azx_free(struct azx *chip) +static void azx_free(struct azx *chip) { struct pci_dev *pci = chip->pci; struct hda_intel *hda = container_of(chip, struct hda_intel, chip); struct hdac_bus *bus = azx_bus(chip); + if (hda->freed) + return; + if (azx_has_pm_runtime(chip) && chip->running) pm_runtime_get_noresume(&pci->dev); chip->running = 0; @@ -1388,9 +1389,8 @@ static int azx_free(struct azx *chip) if (chip->driver_caps & AZX_DCAPS_I915_COMPONENT) snd_hdac_i915_exit(bus); - kfree(hda); - return 0; + hda->freed = 1; } static int azx_dev_disconnect(struct snd_device *device) @@ -1406,7 +1406,8 @@ static int azx_dev_disconnect(struct snd_device *device) static int azx_dev_free(struct snd_device *device) { - return azx_free(device->device_data); + azx_free(device->device_data); + return 0; } #ifdef SUPPORT_VGA_SWITCHEROO @@ -1773,7 +1774,7 @@ static int azx_create(struct snd_card *card, struct pci_dev *pci, if (err < 
0) return err; - hda = kzalloc(sizeof(*hda), GFP_KERNEL); + hda = devm_kzalloc(&pci->dev, sizeof(*hda), GFP_KERNEL); if (!hda) { pci_disable_device(pci); return -ENOMEM; @@ -1814,7 +1815,6 @@ static int azx_create(struct snd_card *card, struct pci_dev *pci, err = azx_bus_init(chip, model[dev]); if (err < 0) { - kfree(hda); pci_disable_device(pci); return err; } @@ -2340,13 +2340,16 @@ static int azx_probe_continue(struct azx *chip) pm_runtime_put_autosuspend(&pci->dev); out_free: - if (err < 0 || !hda->need_i915_power) + if (err < 0) { + azx_free(chip); + return err; + } + + if (!hda->need_i915_power) display_power(chip, false); - if (err < 0) - hda->init_failed = 1; complete_all(&hda->probe_wait); to_hda_bus(bus)->bus_probing = 0; - return err; + return 0; } static void azx_remove(struct pci_dev *pci) diff --git a/sound/pci/hda/hda_intel.h b/sound/pci/hda/hda_intel.h index 2acfff3da1a0..3fb119f09040 100644 --- a/sound/pci/hda/hda_intel.h +++ b/sound/pci/hda/hda_intel.h @@ -27,6 +27,7 @@ struct hda_intel { unsigned int use_vga_switcheroo:1; unsigned int vga_switcheroo_registered:1; unsigned int init_failed:1; /* delayed init failed */ + unsigned int freed:1; /* resources already released */ bool need_i915_power:1; /* the hda controller needs i915 power */ }; From 9479e75fca370a5220784f7596bf598c4dad0b9b Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Mon, 13 Apr 2020 10:20:32 +0200 Subject: [PATCH 122/331] ALSA: hda: Keep the controller initialization even if no codecs found Currently, when the HD-audio controller driver doesn't detect any codecs, it tries to abort the probe. But this abort happens at the delayed probe, i.e. the primary probe call already returned success, hence the driver is never unbound until user does so explicitly. As a result, it may leave the HD-audio device in the running state without the runtime PM. 
More badly, if the device is a HD-audio bus that is tied with a GPU, GPU cannot reach to the full power down and consumes unnecessarily much power. This patch changes the logic after no-codec situation; it continues probing without the further codec initialization but keep the controller driver running normally. BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=207043 Tested-by: Roy Spliet Link: https://lore.kernel.org/r/20200413082034.25166-5-tiwai@suse.de Signed-off-by: Takashi Iwai --- sound/pci/hda/hda_intel.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c index 692857904d49..aa0be85614b6 100644 --- a/sound/pci/hda/hda_intel.c +++ b/sound/pci/hda/hda_intel.c @@ -2009,7 +2009,7 @@ static int azx_first_init(struct azx *chip) /* codec detection */ if (!azx_bus(chip)->codec_mask) { dev_err(card->dev, "no codecs found!\n"); - return -ENODEV; + /* keep running the rest for the runtime PM */ } if (azx_acquire_irq(chip, 0) < 0) @@ -2303,9 +2303,11 @@ static int azx_probe_continue(struct azx *chip) #endif /* create codec instances */ - err = azx_probe_codecs(chip, azx_max_codecs[chip->driver_type]); - if (err < 0) - goto out_free; + if (bus->codec_mask) { + err = azx_probe_codecs(chip, azx_max_codecs[chip->driver_type]); + if (err < 0) + goto out_free; + } #ifdef CONFIG_SND_HDA_PATCH_LOADER if (chip->fw) { @@ -2319,7 +2321,7 @@ static int azx_probe_continue(struct azx *chip) #endif } #endif - if ((probe_only[dev] & 1) == 0) { + if (bus->codec_mask && !(probe_only[dev] & 1)) { err = azx_codec_configure(chip); if (err < 0) goto out_free; From c4c8dd6ef807663e42a5f04ea77cd62029eb99fa Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Mon, 13 Apr 2020 10:20:33 +0200 Subject: [PATCH 123/331] ALSA: hda: Skip controller resume if not needed The HD-audio controller does system-suspend and resume operations by directly calling its helpers __azx_runtime_suspend() and __azx_runtime_resume(). 
However, in general, we don't always have to resume the device fully at system resume; typically, if a device has been runtime-suspended, we can leave it to runtime resume. Usually, to achieve this, the driver would call the pm_runtime_force_suspend() and pm_runtime_force_resume() pair in the system suspend and resume ops.

Unfortunately, this doesn't work for the resume path in our case. To handle jack detection at system resume, a child codec device may need a (literally) forced resume even if it's been runtime-suspended, and for that, the controller device must also be resumed even if it's been suspended.

This patch is an attempt to improve the situation. It replaces the direct __azx_runtime_suspend()/_resume() calls with pm_runtime_force_suspend() and pm_runtime_force_resume(), with a slight trick as we've done for the codec side. More exactly:

- The azx_has_pm_runtime() check is dropped from azx_runtime_suspend() and azx_runtime_resume(), so that they can be properly executed from the system-suspend/resume path
- The WAKEEN handling now depends on the card's power state; it's set and cleared only for the runtime suspend
- azx_resume() checks beforehand whether any codec may need the forced resume. If the forced resume is required, it temporarily raises and drops the PM refcount to actually trigger the runtime resume.
- A new helper function, hda_codec_need_resume(), is introduced for checking whether the codec needs a forced runtime resume, and the existing code is rewritten with that.
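The refcount trick in the list above can be sketched as a small userspace model. Everything here is an assumption made for illustration: struct model_codec and struct model_dev are hypothetical, and model_force_resume() only approximates pm_runtime_force_resume() by resuming the device when a reference is held. The real runtime PM core has more conditions and may autosuspend again after the put.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical codec state; field names follow the patch loosely. */
struct model_codec {
	bool relaxed_resume;
	unsigned int jacks_used;  /* models codec->jacktbl.used */
};

/* Same condition as hda_codec_need_resume() in the patch. */
bool model_codec_need_resume(const struct model_codec *codec)
{
	return !codec->relaxed_resume && codec->jacks_used;
}

/* Hypothetical controller device state. */
struct model_dev {
	int usage_count;
	bool runtime_suspended;
};

/* Assumed simplification: only resume when a reference is held. */
void model_force_resume(struct model_dev *dev)
{
	if (dev->runtime_suspended && dev->usage_count > 0)
		dev->runtime_suspended = false;
}

/* Models the azx_resume() flow: scan codecs, then get/resume/put. */
void model_system_resume(struct model_dev *dev,
			 const struct model_codec *codecs, size_t n)
{
	bool forced_resume = false;
	size_t i;

	for (i = 0; i < n; i++) {
		if (model_codec_need_resume(&codecs[i])) {
			forced_resume = true;
			break;
		}
	}

	if (forced_resume)
		dev->usage_count++;   /* stands in for pm_runtime_get_noresume() */
	model_force_resume(dev);
	if (forced_resume)
		dev->usage_count--;   /* stands in for pm_runtime_put() */
}
```

With only relaxed codecs the modeled controller stays runtime-suspended across system resume; a single codec with jacks in use is enough to wake it.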
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=207043 Link: https://lore.kernel.org/r/20200413082034.25166-6-tiwai@suse.de Signed-off-by: Takashi Iwai --- include/sound/hda_codec.h | 5 +++++ sound/pci/hda/hda_codec.c | 2 +- sound/pci/hda/hda_intel.c | 38 +++++++++++++++++++++++++++----------- 3 files changed, 33 insertions(+), 12 deletions(-) diff --git a/include/sound/hda_codec.h b/include/sound/hda_codec.h index 3ee8036f5436..225154a4f2ed 100644 --- a/include/sound/hda_codec.h +++ b/include/sound/hda_codec.h @@ -494,6 +494,11 @@ void snd_hda_update_power_acct(struct hda_codec *codec); static inline void snd_hda_set_power_save(struct hda_bus *bus, int delay) {} #endif +static inline bool hda_codec_need_resume(struct hda_codec *codec) +{ + return !codec->relaxed_resume && codec->jacktbl.used; +} + #ifdef CONFIG_SND_HDA_PATCH_LOADER /* * patch firmware diff --git a/sound/pci/hda/hda_codec.c b/sound/pci/hda/hda_codec.c index a34a2c9f4bcf..86a632bf4d50 100644 --- a/sound/pci/hda/hda_codec.c +++ b/sound/pci/hda/hda_codec.c @@ -2951,7 +2951,7 @@ static int hda_codec_runtime_resume(struct device *dev) static int hda_codec_force_resume(struct device *dev) { struct hda_codec *codec = dev_to_hda_codec(dev); - bool forced_resume = !codec->relaxed_resume && codec->jacktbl.used; + bool forced_resume = hda_codec_need_resume(codec); int ret; /* The get/put pair below enforces the runtime resume even if the diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c index aa0be85614b6..02c6308502b1 100644 --- a/sound/pci/hda/hda_intel.c +++ b/sound/pci/hda/hda_intel.c @@ -1027,7 +1027,7 @@ static int azx_suspend(struct device *dev) chip = card->private_data; bus = azx_bus(chip); snd_power_change_state(card, SNDRV_CTL_POWER_D3hot); - __azx_runtime_suspend(chip); + pm_runtime_force_suspend(dev); if (bus->irq >= 0) { free_irq(bus->irq, chip); bus->irq = -1; @@ -1044,7 +1044,9 @@ static int azx_suspend(struct device *dev) static int azx_resume(struct device *dev) { 
struct snd_card *card = dev_get_drvdata(dev); + struct hda_codec *codec; struct azx *chip; + bool forced_resume = false; if (!azx_is_pm_ready(card)) return 0; @@ -1055,7 +1057,20 @@ static int azx_resume(struct device *dev) chip->msi = 0; if (azx_acquire_irq(chip, 1) < 0) return -EIO; - __azx_runtime_resume(chip, false); + + /* check for the forced resume */ + list_for_each_codec(codec, &chip->bus) { + if (hda_codec_need_resume(codec)) { + forced_resume = true; + break; + } + } + + if (forced_resume) + pm_runtime_get_noresume(dev); + pm_runtime_force_resume(dev); + if (forced_resume) + pm_runtime_put(dev); snd_power_change_state(card, SNDRV_CTL_POWER_D0); trace_azx_resume(chip); @@ -1102,12 +1117,12 @@ static int azx_runtime_suspend(struct device *dev) if (!azx_is_pm_ready(card)) return 0; chip = card->private_data; - if (!azx_has_pm_runtime(chip)) - return 0; /* enable controller wake up event */ - azx_writew(chip, WAKEEN, azx_readw(chip, WAKEEN) | - STATESTS_INT_MASK); + if (snd_power_get_state(card) == SNDRV_CTL_POWER_D0) { + azx_writew(chip, WAKEEN, azx_readw(chip, WAKEEN) | + STATESTS_INT_MASK); + } __azx_runtime_suspend(chip); trace_azx_runtime_suspend(chip); @@ -1118,17 +1133,18 @@ static int azx_runtime_resume(struct device *dev) { struct snd_card *card = dev_get_drvdata(dev); struct azx *chip; + bool from_rt = snd_power_get_state(card) == SNDRV_CTL_POWER_D0; if (!azx_is_pm_ready(card)) return 0; chip = card->private_data; - if (!azx_has_pm_runtime(chip)) - return 0; - __azx_runtime_resume(chip, true); + __azx_runtime_resume(chip, from_rt); /* disable controller Wake Up event*/ - azx_writew(chip, WAKEEN, azx_readw(chip, WAKEEN) & - ~STATESTS_INT_MASK); + if (from_rt) { + azx_writew(chip, WAKEEN, azx_readw(chip, WAKEEN) & + ~STATESTS_INT_MASK); + } trace_azx_runtime_resume(chip); return 0; From 3ba21113bd33d49f3c300a23fc08cf114c434995 Mon Sep 17 00:00:00 2001 From: Roy Spliet Date: Mon, 13 Apr 2020 10:20:34 +0200 Subject: [PATCH 124/331] ALSA: hda: 
Explicitly permit using autosuspend if runtime PM is supported This fixes runtime PM not working after a suspend-to-RAM cycle at least for the codec-less HDA device found on NVIDIA GPUs. BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=207043 Signed-off-by: Roy Spliet Link: https://lore.kernel.org/r/20200413082034.25166-7-tiwai@suse.de Signed-off-by: Takashi Iwai --- sound/pci/hda/hda_intel.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c index 02c6308502b1..8519051a426e 100644 --- a/sound/pci/hda/hda_intel.c +++ b/sound/pci/hda/hda_intel.c @@ -2354,8 +2354,10 @@ static int azx_probe_continue(struct azx *chip) set_default_power_save(chip); - if (azx_has_pm_runtime(chip)) + if (azx_has_pm_runtime(chip)) { + pm_runtime_use_autosuspend(&pci->dev); pm_runtime_put_autosuspend(&pci->dev); + } out_free: if (err < 0) { From 028cfb2444b94d4f394a6fa4ca46182481236e91 Mon Sep 17 00:00:00 2001 From: Evan Quan Date: Fri, 10 Apr 2020 15:38:44 +0800 Subject: [PATCH 125/331] drm/amdgpu: fix wrong vram lost counter increment V2 Vram lost counter is wrongly increased by two during baco reset. 
V2: assumed vram lost for mode1 reset on all ASICs Signed-off-by: Evan Quan Acked-by: Alex Deucher Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 20 ++++++++++++++++++-- drivers/gpu/drm/amd/amdgpu/cik.c | 2 -- drivers/gpu/drm/amd/amdgpu/nv.c | 4 ---- drivers/gpu/drm/amd/amdgpu/soc15.c | 4 ---- drivers/gpu/drm/amd/amdgpu/vi.c | 2 -- 5 files changed, 18 insertions(+), 14 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 559dc24ef436..7d35b0a366a2 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -2008,8 +2008,24 @@ static void amdgpu_device_fill_reset_magic(struct amdgpu_device *adev) */ static bool amdgpu_device_check_vram_lost(struct amdgpu_device *adev) { - return !!memcmp(adev->gart.ptr, adev->reset_magic, - AMDGPU_RESET_MAGIC_NUM); + if (memcmp(adev->gart.ptr, adev->reset_magic, + AMDGPU_RESET_MAGIC_NUM)) + return true; + + if (!adev->in_gpu_reset) + return false; + + /* + * For all ASICs with baco/mode1 reset, the VRAM is + * always assumed to be lost. 
+ */ + switch (amdgpu_asic_reset_method(adev)) { + case AMD_RESET_METHOD_BACO: + case AMD_RESET_METHOD_MODE1: + return true; + default: + return false; + } } /** diff --git a/drivers/gpu/drm/amd/amdgpu/cik.c b/drivers/gpu/drm/amd/amdgpu/cik.c index 006f21ef7ddf..62635e58e45e 100644 --- a/drivers/gpu/drm/amd/amdgpu/cik.c +++ b/drivers/gpu/drm/amd/amdgpu/cik.c @@ -1358,8 +1358,6 @@ static int cik_asic_reset(struct amdgpu_device *adev) int r; if (cik_asic_reset_method(adev) == AMD_RESET_METHOD_BACO) { - if (!adev->in_suspend) - amdgpu_inc_vram_lost(adev); r = amdgpu_dpm_baco_reset(adev); } else { r = cik_asic_pci_config_reset(adev); diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c index 033cbbca2072..52318b03c424 100644 --- a/drivers/gpu/drm/amd/amdgpu/nv.c +++ b/drivers/gpu/drm/amd/amdgpu/nv.c @@ -351,8 +351,6 @@ static int nv_asic_reset(struct amdgpu_device *adev) struct smu_context *smu = &adev->smu; if (nv_asic_reset_method(adev) == AMD_RESET_METHOD_BACO) { - if (!adev->in_suspend) - amdgpu_inc_vram_lost(adev); ret = smu_baco_enter(smu); if (ret) return ret; @@ -360,8 +358,6 @@ static int nv_asic_reset(struct amdgpu_device *adev) if (ret) return ret; } else { - if (!adev->in_suspend) - amdgpu_inc_vram_lost(adev); ret = nv_asic_mode1_reset(adev); } diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c index a40499d51c93..d42a8d8a0dea 100644 --- a/drivers/gpu/drm/amd/amdgpu/soc15.c +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c @@ -569,14 +569,10 @@ static int soc15_asic_reset(struct amdgpu_device *adev) switch (soc15_asic_reset_method(adev)) { case AMD_RESET_METHOD_BACO: - if (!adev->in_suspend) - amdgpu_inc_vram_lost(adev); return soc15_asic_baco_reset(adev); case AMD_RESET_METHOD_MODE2: return amdgpu_dpm_mode2_reset(adev); default: - if (!adev->in_suspend) - amdgpu_inc_vram_lost(adev); return soc15_asic_mode1_reset(adev); } } diff --git a/drivers/gpu/drm/amd/amdgpu/vi.c b/drivers/gpu/drm/amd/amdgpu/vi.c index 
78b35901643b..3ce10e05d0d6 100644 --- a/drivers/gpu/drm/amd/amdgpu/vi.c +++ b/drivers/gpu/drm/amd/amdgpu/vi.c @@ -765,8 +765,6 @@ static int vi_asic_reset(struct amdgpu_device *adev) int r; if (vi_asic_reset_method(adev) == AMD_RESET_METHOD_BACO) { - if (!adev->in_suspend) - amdgpu_inc_vram_lost(adev); r = amdgpu_dpm_baco_reset(adev); } else { r = vi_asic_pci_config_reset(adev); From 74ce6ce43d4fc6ce15efb21378d9ef26125c298b Mon Sep 17 00:00:00 2001 From: Jens Axboe Date: Mon, 13 Apr 2020 11:09:12 -0600 Subject: [PATCH 126/331] io_uring: check for need to re-wait in polled async handling We added this for just the regular poll requests in commit a6ba632d2c24 ("io_uring: retry poll if we got woken with non-matching mask"), we should do the same for the poll handler used pollable async requests. Move the re-wait check and arm into a helper, and call it from io_async_task_func() as well. Signed-off-by: Jens Axboe --- fs/io_uring.c | 43 +++++++++++++++++++++++++++++-------------- 1 file changed, 29 insertions(+), 14 deletions(-) diff --git a/fs/io_uring.c b/fs/io_uring.c index 0d1b5d5f1251..7b41f6231955 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -4156,6 +4156,26 @@ static int __io_async_wake(struct io_kiocb *req, struct io_poll_iocb *poll, return 1; } +static bool io_poll_rewait(struct io_kiocb *req, struct io_poll_iocb *poll) + __acquires(&req->ctx->completion_lock) +{ + struct io_ring_ctx *ctx = req->ctx; + + if (!req->result && !READ_ONCE(poll->canceled)) { + struct poll_table_struct pt = { ._key = poll->events }; + + req->result = vfs_poll(req->file, &pt) & poll->events; + } + + spin_lock_irq(&ctx->completion_lock); + if (!req->result && !READ_ONCE(poll->canceled)) { + add_wait_queue(poll->head, &poll->wait); + return true; + } + + return false; +} + static void io_async_task_func(struct callback_head *cb) { struct io_kiocb *req = container_of(cb, struct io_kiocb, task_work); @@ -4164,14 +4184,16 @@ static void io_async_task_func(struct callback_head *cb) 
trace_io_uring_task_run(req->ctx, req->opcode, req->user_data); - WARN_ON_ONCE(!list_empty(&req->apoll->poll.wait.entry)); - - if (hash_hashed(&req->hash_node)) { - spin_lock_irq(&ctx->completion_lock); - hash_del(&req->hash_node); + if (io_poll_rewait(req, &apoll->poll)) { spin_unlock_irq(&ctx->completion_lock); + return; } + if (hash_hashed(&req->hash_node)) + hash_del(&req->hash_node); + + spin_unlock_irq(&ctx->completion_lock); + /* restore ->work in case we need to retry again */ memcpy(&req->work, &apoll->work, sizeof(req->work)); @@ -4436,18 +4458,11 @@ static void io_poll_task_handler(struct io_kiocb *req, struct io_kiocb **nxt) struct io_ring_ctx *ctx = req->ctx; struct io_poll_iocb *poll = &req->poll; - if (!req->result && !READ_ONCE(poll->canceled)) { - struct poll_table_struct pt = { ._key = poll->events }; - - req->result = vfs_poll(req->file, &pt) & poll->events; - } - - spin_lock_irq(&ctx->completion_lock); - if (!req->result && !READ_ONCE(poll->canceled)) { - add_wait_queue(poll->head, &poll->wait); + if (io_poll_rewait(req, poll)) { spin_unlock_irq(&ctx->completion_lock); return; } + hash_del(&req->hash_node); io_poll_complete(req, req->result, 0); req->flags |= REQ_F_COMP_LOCKED; From 2bae047ec9576da72d5003487de0bb93e747fff7 Mon Sep 17 00:00:00 2001 From: Jens Axboe Date: Mon, 13 Apr 2020 11:16:34 -0600 Subject: [PATCH 127/331] io_uring: io_async_task_func() should check and honor cancelation If the request has been marked as canceled, don't try and issue it. Instead just fill a canceled event and finish the request. 
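As a rough illustration of the check described above, here is a userspace model, not the io_uring code. struct model_req and model_async_task() are hypothetical names, and locking, CQ ring handling and linked-request failure are all omitted.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical request state for the sketch. */
struct model_req {
	bool canceled;   /* models apoll->poll.canceled */
	bool issued;     /* set when the request would be (re)issued */
	bool completed;  /* set when a completion event is posted */
	int result;      /* value carried by the completion event */
};

/* Models the task callback: honor cancelation before issuing. */
void model_async_task(struct model_req *req)
{
	if (req->canceled) {
		/* fill a canceled event and finish the request */
		req->result = -ECANCELED;
		req->completed = true;
		return;
	}

	/* not canceled: go ahead and issue the request */
	req->issued = true;
}
```

A canceled request in this model completes with -ECANCELED and is never issued.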
Signed-off-by: Jens Axboe --- fs/io_uring.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/fs/io_uring.c b/fs/io_uring.c index 7b41f6231955..aac54772e12e 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -4181,6 +4181,7 @@ static void io_async_task_func(struct callback_head *cb) struct io_kiocb *req = container_of(cb, struct io_kiocb, task_work); struct async_poll *apoll = req->apoll; struct io_ring_ctx *ctx = req->ctx; + bool canceled; trace_io_uring_task_run(req->ctx, req->opcode, req->user_data); @@ -4192,8 +4193,22 @@ static void io_async_task_func(struct callback_head *cb) if (hash_hashed(&req->hash_node)) hash_del(&req->hash_node); + canceled = READ_ONCE(apoll->poll.canceled); + if (canceled) { + io_cqring_fill_event(req, -ECANCELED); + io_commit_cqring(ctx); + } + spin_unlock_irq(&ctx->completion_lock); + if (canceled) { + kfree(apoll); + io_cqring_ev_posted(ctx); + req_set_fail_links(req); + io_put_req(req); + return; + } + /* restore ->work in case we need to retry again */ memcpy(&req->work, &apoll->work, sizeof(req->work)); From 1d95b8a2d41f0cfbf3cefb5d986941bde2e1378f Mon Sep 17 00:00:00 2001 From: YueHaibing Date: Thu, 2 Apr 2020 16:58:12 +0800 Subject: [PATCH 128/331] scsi: hisi_sas: Fix build error without SATA_HOST If SATA_HOST is n, build fails: drivers/scsi/hisi_sas/hisi_sas_main.o: In function `hisi_sas_fill_ata_reset_cmd': hisi_sas_main.c:(.text+0x2500): undefined reference to `ata_tf_to_fis' Select SATA_HOST to fix this. Link: https://lore.kernel.org/r/20200402085812.32948-1-yuehaibing@huawei.com Fixes: bd322af15ce9 ("ata: make SATA_PMP option selectable only if any SATA host driver is enabled") Reported-by: Hulk Robot Reviewed-by: Bartlomiej Zolnierkiewicz Acked-by: John Garry Signed-off-by: YueHaibing Signed-off-by: Martin K. 
Petersen --- drivers/scsi/hisi_sas/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/scsi/hisi_sas/Kconfig b/drivers/scsi/hisi_sas/Kconfig index 90a17452a50d..13ed9073fc72 100644 --- a/drivers/scsi/hisi_sas/Kconfig +++ b/drivers/scsi/hisi_sas/Kconfig @@ -6,6 +6,7 @@ config SCSI_HISI_SAS select SCSI_SAS_LIBSAS select BLK_DEV_INTEGRITY depends on ATA + select SATA_HOST help This driver supports HiSilicon's SAS HBA, including support based on platform device From 2a575f138d003fff0f4930b5cfae4a1c46343b8f Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Wed, 8 Apr 2020 08:41:38 -0400 Subject: [PATCH 129/331] ceph: fix potential bad pointer deref in async dirops cb's The new async dirops callback routines can pass ERR_PTR values to ceph_mdsc_free_path, which could cause an oops. Make ceph_mdsc_free_path ignore ERR_PTR values. Also, ensure that the pr_warn messages look sane even if ceph_mdsc_build_path fails. Reported-by: Dan Carpenter Signed-off-by: Jeff Layton Reviewed-by: Ilya Dryomov Signed-off-by: Ilya Dryomov --- fs/ceph/dir.c | 4 ++-- fs/ceph/file.c | 4 ++-- fs/ceph/mds_client.h | 2 +- 3 files changed, 5 insertions(+), 5 deletions(-) diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c index d594c2627430..4c4202c93b71 100644 --- a/fs/ceph/dir.c +++ b/fs/ceph/dir.c @@ -1051,8 +1051,8 @@ static void ceph_async_unlink_cb(struct ceph_mds_client *mdsc, /* If op failed, mark everyone involved for errors */ if (result) { - int pathlen; - u64 base; + int pathlen = 0; + u64 base = 0; char *path = ceph_mdsc_build_path(req->r_dentry, &pathlen, &base, 0); diff --git a/fs/ceph/file.c b/fs/ceph/file.c index 4a5ccbb7e808..afdfca965a7f 100644 --- a/fs/ceph/file.c +++ b/fs/ceph/file.c @@ -527,8 +527,8 @@ static void ceph_async_create_cb(struct ceph_mds_client *mdsc, if (result) { struct dentry *dentry = req->r_dentry; - int pathlen; - u64 base; + int pathlen = 0; + u64 base = 0; char *path = ceph_mdsc_build_path(req->r_dentry, &pathlen, &base, 0); diff --git 
a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h index 4e5be79bf080..903d9edfd4bf 100644 --- a/fs/ceph/mds_client.h +++ b/fs/ceph/mds_client.h @@ -521,7 +521,7 @@ extern void ceph_mdsc_pre_umount(struct ceph_mds_client *mdsc); static inline void ceph_mdsc_free_path(char *path, int len) { - if (path) + if (!IS_ERR_OR_NULL(path)) __putname(path - (PATH_MAX - 1 - len)); } From bb46737ec09e9a072424bf46def2977c5b6b925d Mon Sep 17 00:00:00 2001 From: Nilesh Javali Date: Fri, 3 Apr 2020 01:40:17 -0700 Subject: [PATCH 130/331] scsi: qla2xxx: Fix regression warnings drivers/scsi/qla2xxx/qla_dbg.c:2542:7: warning: The scope of the variable 'pbuf' can be reduced. [variableScope] drivers/scsi/qla2xxx/qla_init.c:3615:6: warning: Variable 'rc' is assigned a value that is never used. [unreadVariable] drivers/scsi/qla2xxx/qla_isr.c:81:11-29: WARNING: dma_alloc_coherent use in rsp_els already zeroes out memory, so memset is not needed drivers/scsi/qla2xxx/qla_mbx.c:4889:15-33: WARNING: dma_alloc_coherent use in els_cmd_map already zeroes out memory, so memset is not needed [mkp: added newline after variable declaration] Link: https://lore.kernel.org/r/20200403084018.30766-2-njavali@marvell.com Reported-by: kbuild test robot Signed-off-by: Nilesh Javali Signed-off-by: Martin K. Petersen --- drivers/scsi/qla2xxx/qla_dbg.c | 3 ++- drivers/scsi/qla2xxx/qla_init.c | 2 -- drivers/scsi/qla2xxx/qla_isr.c | 1 - drivers/scsi/qla2xxx/qla_mbx.c | 2 -- 4 files changed, 2 insertions(+), 6 deletions(-) diff --git a/drivers/scsi/qla2xxx/qla_dbg.c b/drivers/scsi/qla2xxx/qla_dbg.c index f301a8048b2f..bf1e98f11990 100644 --- a/drivers/scsi/qla2xxx/qla_dbg.c +++ b/drivers/scsi/qla2xxx/qla_dbg.c @@ -2539,7 +2539,6 @@ ql_dbg(uint level, scsi_qla_host_t *vha, uint id, const char *fmt, ...) { va_list va; struct va_format vaf; - char pbuf[64]; va_start(va, fmt); @@ -2547,6 +2546,8 @@ ql_dbg(uint level, scsi_qla_host_t *vha, uint id, const char *fmt, ...) 
vaf.va = &va; if (!ql_mask_match(level)) { + char pbuf[64]; + if (vha != NULL) { const struct pci_dev *pdev = vha->hw->pdev; /* : Message */ diff --git a/drivers/scsi/qla2xxx/qla_init.c b/drivers/scsi/qla2xxx/qla_init.c index 5b2deaa730bf..caa6b840e459 100644 --- a/drivers/scsi/qla2xxx/qla_init.c +++ b/drivers/scsi/qla2xxx/qla_init.c @@ -3611,8 +3611,6 @@ qla24xx_detect_sfp(scsi_qla_host_t *vha) ha->lr_distance = LR_DISTANCE_5K; } - if (!vha->flags.init_done) - rc = QLA_SUCCESS; out: ql_dbg(ql_dbg_async, vha, 0x507b, "SFP detect: %s-Range SFP %s (nvr=%x ll=%x lr=%x lrd=%x).\n", diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c index 8d7a905f6247..8a78d395bbc8 100644 --- a/drivers/scsi/qla2xxx/qla_isr.c +++ b/drivers/scsi/qla2xxx/qla_isr.c @@ -87,7 +87,6 @@ qla24xx_process_abts(struct scsi_qla_host *vha, void *pkt) } /* terminate exchange */ - memset(rsp_els, 0, sizeof(*rsp_els)); rsp_els->entry_type = ELS_IOCB_TYPE; rsp_els->entry_count = 1; rsp_els->nport_handle = ~0; diff --git a/drivers/scsi/qla2xxx/qla_mbx.c b/drivers/scsi/qla2xxx/qla_mbx.c index 9fd83d1bffe0..4ed90437e8c4 100644 --- a/drivers/scsi/qla2xxx/qla_mbx.c +++ b/drivers/scsi/qla2xxx/qla_mbx.c @@ -4894,8 +4894,6 @@ qla25xx_set_els_cmds_supported(scsi_qla_host_t *vha) return QLA_MEMORY_ALLOC_FAILED; } - memset(els_cmd_map, 0, ELS_CMD_MAP_SIZE); - els_cmd_map[index] |= 1 << bit; mcp->mb[0] = MBC_SET_RNID_PARAMS; From d6b23a7ce0f781ba2844adcade289ebbc57df8e7 Mon Sep 17 00:00:00 2001 From: Nilesh Javali Date: Fri, 3 Apr 2020 01:40:18 -0700 Subject: [PATCH 131/331] scsi: MAINTAINERS: Update qla2xxx FC-SCSI driver maintainer Add njavali@marvell.com as new maintainer. Also add Marvell Upstream email alias to the maintainers list. Link: https://lore.kernel.org/r/20200403084018.30766-3-njavali@marvell.com Reviewed-by: Himanshu Madhani Signed-off-by: Nilesh Javali Signed-off-by: Martin K. 
Petersen --- MAINTAINERS | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index e64e5db31497..e9a621e9f2aa 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -13853,7 +13853,8 @@ S: Maintained F: drivers/scsi/qla1280.[ch] QLOGIC QLA2XXX FC-SCSI DRIVER -M: hmadhani@marvell.com +M: Nilesh Javali +M: GR-QLogic-Storage-Upstream@marvell.com L: linux-scsi@vger.kernel.org S: Supported F: Documentation/scsi/LICENSE.qla2xxx From 13ef143ddd93a5c8ee1e721683786a82eb9b126d Mon Sep 17 00:00:00 2001 From: Bodo Stroesser Date: Wed, 8 Apr 2020 15:26:09 +0200 Subject: [PATCH 132/331] scsi: target: Write NULL to *port_nexus_ptr if no ISID This patch fixes a minor flaw that could be triggered by a PR OUT RESERVE on iSCSI, if TRANSPORT IDs with and without ISID are used in the same command. In case an ISCSI Transport ID has no ISID, port_nexus_ptr was not used to write NULL, so value from previous call might persist. I don't know if that ever could happen, but with the change the code is cleaner, I think. Link: https://lore.kernel.org/r/20200408132610.14623-2-bstroesser@ts.fujitsu.com Signed-off-by: Bodo Stroesser Reviewed-by: Mike Christie Signed-off-by: Martin K. Petersen --- drivers/target/target_core_fabric_lib.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/target/target_core_fabric_lib.c b/drivers/target/target_core_fabric_lib.c index 6b4b354c88aa..f5f673e128ef 100644 --- a/drivers/target/target_core_fabric_lib.c +++ b/drivers/target/target_core_fabric_lib.c @@ -341,7 +341,8 @@ static char *iscsi_parse_pr_out_transport_id( *p = tolower(*p); p++; } - } + } else + *port_nexus_ptr = NULL; return &buf[4]; } From 8fed04eb79a74cbf471dfaa755900a51b37273ab Mon Sep 17 00:00:00 2001 From: Bodo Stroesser Date: Wed, 8 Apr 2020 15:26:10 +0200 Subject: [PATCH 133/331] scsi: target: fix PR IN / READ FULL STATUS for FC Creation of the response to READ FULL STATUS fails for FC based reservations. 
Reason is the too high loop limit (< 24) in fc_get_pr_transport_id(). The string representation of FC WWPN is 23 chars long only ("11:22:33:44:55:66:77:88"). So when i is 23, the loop body is executed a last time for the ending '\0' of the string and thus hex2bin() reports an error. Link: https://lore.kernel.org/r/20200408132610.14623-3-bstroesser@ts.fujitsu.com Signed-off-by: Bodo Stroesser Reviewed-by: Mike Christie Signed-off-by: Martin K. Petersen --- drivers/target/target_core_fabric_lib.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/target/target_core_fabric_lib.c b/drivers/target/target_core_fabric_lib.c index f5f673e128ef..1e031d81e59e 100644 --- a/drivers/target/target_core_fabric_lib.c +++ b/drivers/target/target_core_fabric_lib.c @@ -63,7 +63,7 @@ static int fc_get_pr_transport_id( * encoded TransportID. */ ptr = &se_nacl->initiatorname[0]; - for (i = 0; i < 24; ) { + for (i = 0; i < 23; ) { if (!strncmp(&ptr[i], ":", 1)) { i++; continue; From 066f79a5fd6d1b9a5cc57b5cd445b3e4bb68a5b2 Mon Sep 17 00:00:00 2001 From: Bodo Stroesser Date: Thu, 9 Apr 2020 12:10:26 +0200 Subject: [PATCH 134/331] scsi: target: tcmu: reset_ring should reset TCMU_DEV_BIT_BROKEN In case command ring buffer becomes inconsistent, tcmu sets device flag TCMU_DEV_BIT_BROKEN. If the bit is set, tcmu rejects new commands from LIO core with TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE, and no longer processes completions from the ring. The reset_ring attribute can be used to completely clean up the command ring, so after reset_ring the ring no longer is inconsistent. Therefore reset_ring also should reset bit TCMU_DEV_BIT_BROKEN to allow normal processing. Link: https://lore.kernel.org/r/20200409101026.17872-1-bstroesser@ts.fujitsu.com Acked-by: Mike Christie Signed-off-by: Bodo Stroesser Signed-off-by: Martin K. 
Petersen --- drivers/target/target_core_user.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/target/target_core_user.c b/drivers/target/target_core_user.c index 0b9dfa6b17bc..f769bb1e3735 100644 --- a/drivers/target/target_core_user.c +++ b/drivers/target/target_core_user.c @@ -2073,6 +2073,7 @@ static void tcmu_reset_ring(struct tcmu_dev *udev, u8 err_level) mb->cmd_tail = 0; mb->cmd_head = 0; tcmu_flush_dcache_range(mb, sizeof(*mb)); + clear_bit(TCMU_DEV_BIT_BROKEN, &udev->flags); del_timer(&udev->cmd_timer); From ac4075bca10b29bfa0fed99d0a11826ef9ee5e69 Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Mon, 13 Apr 2020 11:35:52 +0200 Subject: [PATCH 135/331] m68k: Drop redundant generic-y += hardirq.h The cleanup in commit 630f289b7114c0e6 ("asm-generic: make more kernel-space headers mandatory") did not take into account the recently added line for hardirq.h in commit acc45648b9aefa90 ("m68k: Switch to asm-generic/hardirq.h"), leading to the following message during the build: scripts/Makefile.asm-generic:25: redundant generic-y found in arch/m68k/include/asm/Kbuild: hardirq.h Fix this by dropping the now redundant line. 
Fixes: 630f289b7114c0e6 ("asm-generic: make more kernel-space headers mandatory") Signed-off-by: Geert Uytterhoeven Reviewed-by: Masahiro Yamada Signed-off-by: Linus Torvalds --- arch/m68k/include/asm/Kbuild | 1 - 1 file changed, 1 deletion(-) diff --git a/arch/m68k/include/asm/Kbuild b/arch/m68k/include/asm/Kbuild index a0765aa60ea9..1bff55aa2d54 100644 --- a/arch/m68k/include/asm/Kbuild +++ b/arch/m68k/include/asm/Kbuild @@ -1,7 +1,6 @@ # SPDX-License-Identifier: GPL-2.0 generated-y += syscall_table.h generic-y += extable.h -generic-y += hardirq.h generic-y += kvm_para.h generic-y += local64.h generic-y += mcs_spinlock.h From 924ed1f5c181132897c5928af7f3afd28792889c Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Wed, 8 Apr 2020 17:53:43 +0200 Subject: [PATCH 136/331] clk: asm9260: fix __clk_hw_register_fixed_rate_with_accuracy typo The __clk_hw_register_fixed_rate_with_accuracy() function (with two '_') does not exist, and apparently never did: drivers/clk/clk-asm9260.c: In function 'asm9260_acc_init': drivers/clk/clk-asm9260.c:279:7: error: implicit declaration of function '__clk_hw_register_fixed_rate_with_accuracy'; did you mean 'clk_hw_register_fixed_rate_with_accuracy'? [-Werror=implicit-function-declaration] 279 | hw = __clk_hw_register_fixed_rate_with_accuracy(NULL, NULL, pll_clk, | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | clk_hw_register_fixed_rate_with_accuracy drivers/clk/clk-asm9260.c:279:5: error: assignment to 'struct clk_hw *' from 'int' makes pointer from integer without a cast [-Werror=int-conversion] 279 | hw = __clk_hw_register_fixed_rate_with_accuracy(NULL, NULL, pll_clk, | ^ From what I can tell, __clk_hw_register_fixed_rate() is the correct API here, so use that instead. 
Fixes: 728e3096741a ("clk: asm9260: Use parent accuracy in fixed rate clk") Signed-off-by: Arnd Bergmann Link: https://lkml.kernel.org/r/20200408155402.2138446-1-arnd@arndb.de Signed-off-by: Stephen Boyd --- drivers/clk/clk-asm9260.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/clk/clk-asm9260.c b/drivers/clk/clk-asm9260.c index 536b59aabd2c..bacebd457e6f 100644 --- a/drivers/clk/clk-asm9260.c +++ b/drivers/clk/clk-asm9260.c @@ -276,7 +276,7 @@ static void __init asm9260_acc_init(struct device_node *np) /* TODO: Convert to DT parent scheme */ ref_clk = of_clk_get_parent_name(np, 0); - hw = __clk_hw_register_fixed_rate_with_accuracy(NULL, NULL, pll_clk, + hw = __clk_hw_register_fixed_rate(NULL, NULL, pll_clk, ref_clk, NULL, NULL, 0, rate, 0, CLK_FIXED_RATE_PARENT_ACCURACY); From 742b50f9dccf20b4f908162de986b78e7186bb80 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Wed, 8 Apr 2020 18:05:07 +0200 Subject: [PATCH 137/331] clk: mmp2: fix link error without mmp2 The newly added function is only built into the kernel if mmp2 is enabled, causing a link error otherwise. arm-linux-gnueabi-ld: drivers/clk/mmp/clk.o: in function `mmp_register_pll_clks': clk.c:(.text+0x6dc): undefined reference to `mmp_clk_register_pll' Move it to a different file to get it to link. 
Fixes: 5d34d0b32d6c ("clk: mmp2: Add support for PLL clock sources") Signed-off-by: Arnd Bergmann Link: https://lkml.kernel.org/r/20200408160518.2798571-1-arnd@arndb.de Reported-by: Guenter Roeck Reported-by: kbuild test robot Signed-off-by: Stephen Boyd --- drivers/clk/mmp/clk-pll.c | 33 ++++++++++++++++++++++++++++++++- drivers/clk/mmp/clk.c | 31 ------------------------------- drivers/clk/mmp/clk.h | 7 ------- 3 files changed, 32 insertions(+), 39 deletions(-) diff --git a/drivers/clk/mmp/clk-pll.c b/drivers/clk/mmp/clk-pll.c index 7077be293871..962014cfdc44 100644 --- a/drivers/clk/mmp/clk-pll.c +++ b/drivers/clk/mmp/clk-pll.c @@ -97,7 +97,7 @@ static const struct clk_ops mmp_clk_pll_ops = { .recalc_rate = mmp_clk_pll_recalc_rate, }; -struct clk *mmp_clk_register_pll(char *name, +static struct clk *mmp_clk_register_pll(char *name, unsigned long default_rate, void __iomem *enable_reg, u32 enable, void __iomem *reg, u8 shift, @@ -137,3 +137,34 @@ struct clk *mmp_clk_register_pll(char *name, return clk; } + +void mmp_register_pll_clks(struct mmp_clk_unit *unit, + struct mmp_param_pll_clk *clks, + void __iomem *base, int size) +{ + struct clk *clk; + int i; + + for (i = 0; i < size; i++) { + void __iomem *reg = NULL; + + if (clks[i].offset) + reg = base + clks[i].offset; + + clk = mmp_clk_register_pll(clks[i].name, + clks[i].default_rate, + base + clks[i].enable_offset, + clks[i].enable, + reg, clks[i].shift, + clks[i].input_rate, + base + clks[i].postdiv_offset, + clks[i].postdiv_shift); + if (IS_ERR(clk)) { + pr_err("%s: failed to register clock %s\n", + __func__, clks[i].name); + continue; + } + if (clks[i].id) + unit->clk_table[clks[i].id] = clk; + } +} diff --git a/drivers/clk/mmp/clk.c b/drivers/clk/mmp/clk.c index 317123641d1e..ca7d37e2c7be 100644 --- a/drivers/clk/mmp/clk.c +++ b/drivers/clk/mmp/clk.c @@ -176,37 +176,6 @@ void mmp_register_div_clks(struct mmp_clk_unit *unit, } } -void mmp_register_pll_clks(struct mmp_clk_unit *unit, - struct 
mmp_param_pll_clk *clks, - void __iomem *base, int size) -{ - struct clk *clk; - int i; - - for (i = 0; i < size; i++) { - void __iomem *reg = NULL; - - if (clks[i].offset) - reg = base + clks[i].offset; - - clk = mmp_clk_register_pll(clks[i].name, - clks[i].default_rate, - base + clks[i].enable_offset, - clks[i].enable, - reg, clks[i].shift, - clks[i].input_rate, - base + clks[i].postdiv_offset, - clks[i].postdiv_shift); - if (IS_ERR(clk)) { - pr_err("%s: failed to register clock %s\n", - __func__, clks[i].name); - continue; - } - if (clks[i].id) - unit->clk_table[clks[i].id] = clk; - } -} - void mmp_clk_add(struct mmp_clk_unit *unit, unsigned int id, struct clk *clk) { diff --git a/drivers/clk/mmp/clk.h b/drivers/clk/mmp/clk.h index 971b4d6d992f..20dc1e5dd756 100644 --- a/drivers/clk/mmp/clk.h +++ b/drivers/clk/mmp/clk.h @@ -238,13 +238,6 @@ void mmp_register_pll_clks(struct mmp_clk_unit *unit, struct mmp_param_pll_clk *clks, void __iomem *base, int size); -extern struct clk *mmp_clk_register_pll(char *name, - unsigned long default_rate, - void __iomem *enable_reg, u32 enable, - void __iomem *reg, u8 shift, - unsigned long input_rate, - void __iomem *postdiv_reg, u8 postdiv_shift); - #define DEFINE_MIX_REG_INFO(w_d, s_d, w_m, s_m, fc) \ { \ .width_div = (w_d), \ From ca6df49d62d7cc4c1653a4d9b1ecc61ecd530e02 Mon Sep 17 00:00:00 2001 From: Chunyan Zhang Date: Wed, 8 Apr 2020 10:02:34 +0800 Subject: [PATCH 138/331] clk: sprd: don't gate uart console clock Don't gate uart1_eb which provides console clock, gating that clock would make serial stop working if serial driver didn't enable that explicitly. 
Fixes: 0e4b8a2349f3 ("clk: sprd: add clocks support for SC9863A") Signed-off-by: Chunyan Zhang Link: https://lkml.kernel.org/r/20200408020234.31764-1-zhang.lyra@gmail.com Signed-off-by: Stephen Boyd --- drivers/clk/sprd/sc9863a-clk.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/clk/sprd/sc9863a-clk.c b/drivers/clk/sprd/sc9863a-clk.c index a0631f7756cf..2e2dfb2d48ff 100644 --- a/drivers/clk/sprd/sc9863a-clk.c +++ b/drivers/clk/sprd/sc9863a-clk.c @@ -1641,8 +1641,9 @@ static SPRD_SC_GATE_CLK_FW_NAME(i2c4_eb, "i2c4-eb", "ext-26m", 0x0, 0x1000, BIT(12), 0, 0); static SPRD_SC_GATE_CLK_FW_NAME(uart0_eb, "uart0-eb", "ext-26m", 0x0, 0x1000, BIT(13), 0, 0); +/* uart1_eb is for console, don't gate even if unused */ static SPRD_SC_GATE_CLK_FW_NAME(uart1_eb, "uart1-eb", "ext-26m", 0x0, - 0x1000, BIT(14), 0, 0); + 0x1000, BIT(14), CLK_IGNORE_UNUSED, 0); static SPRD_SC_GATE_CLK_FW_NAME(uart2_eb, "uart2-eb", "ext-26m", 0x0, 0x1000, BIT(15), 0, 0); static SPRD_SC_GATE_CLK_FW_NAME(uart3_eb, "uart3-eb", "ext-26m", 0x0, From fbf4bcc9a8373122881909331f2f9566a128126e Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Mon, 13 Apr 2020 15:55:21 -0400 Subject: [PATCH 139/331] NFS: Fix an ABBA spinlock issue in pnfs_update_layout() We need to drop the inode spinlock while calling nfs4_select_rw_stateid(), since nfs4_copy_delegation_stateid() could take the delegation lock. Note that it is safe to do this, since all other calls to pnfs_update_layout() for that inode will find themselves blocked by the lock we hold on NFS_LAYOUT_FIRST_LAYOUTGET. 
Fixes: fc51b1cf391d ("NFS: Beware when dereferencing the delegation cred") Signed-off-by: Trond Myklebust --- fs/nfs/pnfs.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c index f2dc35c22964..b8d78f393365 100644 --- a/fs/nfs/pnfs.c +++ b/fs/nfs/pnfs.c @@ -2023,6 +2023,7 @@ lookup_again: goto lookup_again; } + spin_unlock(&ino->i_lock); first = true; status = nfs4_select_rw_stateid(ctx->state, iomode == IOMODE_RW ? FMODE_WRITE : FMODE_READ, @@ -2032,12 +2033,12 @@ lookup_again: trace_pnfs_update_layout(ino, pos, count, iomode, lo, lseg, PNFS_UPDATE_LAYOUT_INVALID_OPEN); - spin_unlock(&ino->i_lock); nfs4_schedule_stateid_recovery(server, ctx->state); pnfs_clear_first_layoutget(lo); pnfs_put_layout_hdr(lo); goto lookup_again; } + spin_lock(&ino->i_lock); } else { nfs4_stateid_copy(&stateid, &lo->plh_stateid); } From f8e4ae10de43fbb7ce85f79e04eca2988b6b2c40 Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Mon, 13 Apr 2020 22:19:19 +0200 Subject: [PATCH 140/331] ALSA: hda: Allow setting preallocation again for x86 Commit c31427d0d21e ("ALSA: hda: No preallocation on x86 platforms") changed the CONFIG_SND_HDA_PREALLOC_SIZE setup and set its default to zero for x86, on the expectation that no preallocation would be needed there in almost all cases. However, this expectation was too naive: some applications try to allocate the largest buffer possible, which leads to memory exhaustion. Worse, the commit made the kconfig option no longer adjustable on x86, so the limit can't be fixed statically (although it can still be adjusted via procfs). In practice, it is therefore better to set a reasonable limit for x86, too. Following that experience, this patch changes the default to 2048 and makes the kconfig option adjustable again.
Fixes: c31427d0d21e ("ALSA: hda: No preallocation on x86 platforms") Cc: BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=207223 Link: https://lore.kernel.org/r/20200413201919.24241-1-tiwai@suse.de Signed-off-by: Takashi Iwai --- sound/hda/Kconfig | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/sound/hda/Kconfig b/sound/hda/Kconfig index 4ca6b09056f3..3bc9224d5e4f 100644 --- a/sound/hda/Kconfig +++ b/sound/hda/Kconfig @@ -21,16 +21,17 @@ config SND_HDA_EXT_CORE select SND_HDA_CORE config SND_HDA_PREALLOC_SIZE - int "Pre-allocated buffer size for HD-audio driver" if !SND_DMA_SGBUF + int "Pre-allocated buffer size for HD-audio driver" range 0 32768 - default 0 if SND_DMA_SGBUF + default 2048 if SND_DMA_SGBUF default 64 if !SND_DMA_SGBUF help Specifies the default pre-allocated buffer-size in kB for the HD-audio driver. A larger buffer (e.g. 2048) is preferred for systems using PulseAudio. The default 64 is chosen just for compatibility reasons. - On x86 systems, the default is zero as we need no preallocation. + On x86 systems, the default is 2048 as a reasonable value for + most of modern systems. Note that the pre-allocation size can be changed dynamically via a proc file (/proc/asound/card*/pcm*/sub*/prealloc), too. From bcad588dea538a4fc173d16a90a005536ec8dbf2 Mon Sep 17 00:00:00 2001 From: Ashutosh Dixit Date: Wed, 8 Apr 2020 16:42:01 -0700 Subject: [PATCH 141/331] drm/i915/perf: Do not clear pollin for small user read buffers It is wrong to block the user thread in the next poll when OA data is already available which could not fit in the user buffer provided in the previous read. In several cases the exact user buffer size is not known. Blocking user space in poll can lead to data loss when the buffer size used is smaller than the available data. 
This change fixes this issue and allows user space to read all OA data even when using a buffer size smaller than the available data using multiple non-blocking reads rather than staying blocked in poll till the next timer interrupt. v2: Fix ret value for blocking reads (Umesh) v3: Mistake during patch send (Ashutosh) v4: Remove -EAGAIN from comment (Umesh) v5: Improve condition for clearing pollin and return (Lionel) v6: Improve blocking read loop and other cleanups (Lionel) v7: Added Cc stable Testcase: igt/perf/polling-small-buf Reviewed-by: Lionel Landwerlin Signed-off-by: Ashutosh Dixit Cc: Umesh Nerlige Ramappa Cc: Signed-off-by: Chris Wilson Link: https://patchwork.freedesktop.org/patch/msgid/20200403010120.3067-1-ashutosh.dixit@intel.com (cherry-picked from commit 6352219c39c04ed3f9a8d1cf93f87c21753a213e) Signed-off-by: Rodrigo Vivi --- drivers/gpu/drm/i915/i915_perf.c | 65 ++++++-------------------------- 1 file changed, 11 insertions(+), 54 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index 551be589d6f4..66a46e41d5ef 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -2940,49 +2940,6 @@ void i915_oa_init_reg_state(const struct intel_context *ce, gen8_update_reg_state_unlocked(ce, stream); } -/** - * i915_perf_read_locked - &i915_perf_stream_ops->read with error normalisation - * @stream: An i915 perf stream - * @file: An i915 perf stream file - * @buf: destination buffer given by userspace - * @count: the number of bytes userspace wants to read - * @ppos: (inout) file seek position (unused) - * - * Besides wrapping &i915_perf_stream_ops->read this provides a common place to - * ensure that if we've successfully copied any data then reporting that takes - * precedence over any internal error status, so the data isn't lost. 
- * - * For example ret will be -ENOSPC whenever there is more buffered data than - * can be copied to userspace, but that's only interesting if we weren't able - * to copy some data because it implies the userspace buffer is too small to - * receive a single record (and we never split records). - * - * Another case with ret == -EFAULT is more of a grey area since it would seem - * like bad form for userspace to ask us to overrun its buffer, but the user - * knows best: - * - * http://yarchive.net/comp/linux/partial_reads_writes.html - * - * Returns: The number of bytes copied or a negative error code on failure. - */ -static ssize_t i915_perf_read_locked(struct i915_perf_stream *stream, - struct file *file, - char __user *buf, - size_t count, - loff_t *ppos) -{ - /* Note we keep the offset (aka bytes read) separate from any - * error status so that the final check for whether we return - * the bytes read with a higher precedence than any error (see - * comment below) doesn't need to be handled/duplicated in - * stream->ops->read() implementations. - */ - size_t offset = 0; - int ret = stream->ops->read(stream, buf, count, &offset); - - return offset ?: (ret ?: -EAGAIN); -} - /** * i915_perf_read - handles read() FOP for i915 perf stream FDs * @file: An i915 perf stream file @@ -3008,7 +2965,8 @@ static ssize_t i915_perf_read(struct file *file, { struct i915_perf_stream *stream = file->private_data; struct i915_perf *perf = stream->perf; - ssize_t ret; + size_t offset = 0; + int ret; /* To ensure it's handled consistently we simply treat all reads of a * disabled stream as an error. 
In particular it might otherwise lead @@ -3031,13 +2989,12 @@ static ssize_t i915_perf_read(struct file *file, return ret; mutex_lock(&perf->lock); - ret = i915_perf_read_locked(stream, file, - buf, count, ppos); + ret = stream->ops->read(stream, buf, count, &offset); mutex_unlock(&perf->lock); - } while (ret == -EAGAIN); + } while (!offset && !ret); } else { mutex_lock(&perf->lock); - ret = i915_perf_read_locked(stream, file, buf, count, ppos); + ret = stream->ops->read(stream, buf, count, &offset); mutex_unlock(&perf->lock); } @@ -3048,15 +3005,15 @@ static ssize_t i915_perf_read(struct file *file, * and read() returning -EAGAIN. Clearing the oa.pollin state here * effectively ensures we back off until the next hrtimer callback * before reporting another EPOLLIN event. + * The exception to this is if ops->read() returned -ENOSPC which means + * that more OA data is available than could fit in the user provided + * buffer. In this case we want the next poll() call to not block. */ - if (ret >= 0 || ret == -EAGAIN) { - /* Maybe make ->pollin per-stream state if we support multiple - * concurrent streams in the future. - */ + if (ret != -ENOSPC) stream->pollin = false; - } - return ret; + /* Possible values for ret are 0, -EFAULT, -ENOSPC, -EIO, ... 
*/ + return offset ?: (ret ?: -EAGAIN); } static enum hrtimer_restart oa_poll_check_timer_cb(struct hrtimer *hrtimer) From 8e2e1faf28b3e66430f55f0b0ee83370ecc150af Mon Sep 17 00:00:00 2001 From: Jens Axboe Date: Mon, 13 Apr 2020 17:05:14 -0600 Subject: [PATCH 142/331] io_uring: only post events in io_poll_remove_all() if we completed some MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit syzbot reports this crash: BUG: unable to handle page fault for address: ffffffffffffffe8 PGD f96e17067 P4D f96e17067 PUD f96e19067 PMD 0 Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI CPU: 55 PID: 211750 Comm: trinity-c127 Tainted: G B L 5.7.0-rc1-next-20200413 #4 Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 04/12/2017 RIP: 0010:__wake_up_common+0x98/0x290 el/sched/wait.c:87 Code: 40 4d 8d 78 e8 49 8d 7f 18 49 39 fd 0f 84 80 00 00 00 e8 6b bd 2b 00 49 8b 5f 18 45 31 e4 48 83 eb 18 4c 89 ff e8 08 bc 2b 00 <45> 8b 37 41 f6 c6 04 75 71 49 8d 7f 10 e8 46 bd 2b 00 49 8b 47 10 RSP: 0018:ffffc9000adbfaf0 EFLAGS: 00010046 RAX: 0000000000000000 RBX: ffffffffffffffe8 RCX: ffffffffaa9636b8 RDX: 0000000000000003 RSI: dffffc0000000000 RDI: ffffffffffffffe8 RBP: ffffc9000adbfb40 R08: fffffbfff582c5fd R09: fffffbfff582c5fd R10: ffffffffac162fe3 R11: fffffbfff582c5fc R12: 0000000000000000 R13: ffff888ef82b0960 R14: ffffc9000adbfb80 R15: ffffffffffffffe8 FS: 00007fdcba4c4740(0000) GS:ffff889033780000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffffffffe8 CR3: 0000000f776a0004 CR4: 00000000001606e0 Call Trace: __wake_up_common_lock+0xea/0x150 ommon_lock at kernel/sched/wait.c:124 ? __wake_up_common+0x290/0x290 ? lockdep_hardirqs_on+0x16/0x2c0 __wake_up+0x13/0x20 io_cqring_ev_posted+0x75/0xe0 v_posted at fs/io_uring.c:1160 io_ring_ctx_wait_and_kill+0x1c0/0x2f0 l at fs/io_uring.c:7305 io_uring_create+0xa8d/0x13b0 ? io_req_defer_prep+0x990/0x990 ? 
__kasan_check_write+0x14/0x20 io_uring_setup+0xb8/0x130 ? io_uring_create+0x13b0/0x13b0 ? check_flags.part.28+0x220/0x220 ? lockdep_hardirqs_on+0x16/0x2c0 __x64_sys_io_uring_setup+0x31/0x40 do_syscall_64+0xcc/0xaf0 ? syscall_return_slowpath+0x580/0x580 ? lockdep_hardirqs_off+0x1f/0x140 ? entry_SYSCALL_64_after_hwframe+0x3e/0xb3 ? trace_hardirqs_off_caller+0x3a/0x150 ? trace_hardirqs_off_thunk+0x1a/0x1c entry_SYSCALL_64_after_hwframe+0x49/0xb3 RIP: 0033:0x7fdcb9dd76ed Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 6b 57 2c 00 f7 d8 64 89 01 48 RSP: 002b:00007ffe7fd4e4f8 EFLAGS: 00000246 ORIG_RAX: 00000000000001a9 RAX: ffffffffffffffda RBX: 00000000000001a9 RCX: 00007fdcb9dd76ed RDX: fffffffffffffffc RSI: 0000000000000000 RDI: 0000000000005d54 RBP: 00000000000001a9 R08: 0000000e31d3caa7 R09: 0082400004004000 R10: ffffffffffffffff R11: 0000000000000246 R12: 0000000000000002 R13: 00007fdcb842e058 R14: 00007fdcba4c46c0 R15: 00007fdcb842e000 Modules linked in: bridge stp llc nfnetlink cn brd vfat fat ext4 crc16 mbcache jbd2 loop kvm_intel kvm irqbypass intel_cstate intel_uncore dax_pmem intel_rapl_perf dax_pmem_core ip_tables x_tables xfs sd_mod tg3 firmware_class libphy hpsa scsi_transport_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: binfmt_misc] CR2: ffffffffffffffe8 ---[ end trace f9502383d57e0e22 ]--- RIP: 0010:__wake_up_common+0x98/0x290 Code: 40 4d 8d 78 e8 49 8d 7f 18 49 39 fd 0f 84 80 00 00 00 e8 6b bd 2b 00 49 8b 5f 18 45 31 e4 48 83 eb 18 4c 89 ff e8 08 bc 2b 00 <45> 8b 37 41 f6 c6 04 75 71 49 8d 7f 10 e8 46 bd 2b 00 49 8b 47 10 RSP: 0018:ffffc9000adbfaf0 EFLAGS: 00010046 RAX: 0000000000000000 RBX: ffffffffffffffe8 RCX: ffffffffaa9636b8 RDX: 0000000000000003 RSI: dffffc0000000000 RDI: ffffffffffffffe8 RBP: ffffc9000adbfb40 R08: fffffbfff582c5fd R09: fffffbfff582c5fd R10: ffffffffac162fe3 R11: fffffbfff582c5fc R12: 
0000000000000000 R13: ffff888ef82b0960 R14: ffffc9000adbfb80 R15: ffffffffffffffe8 FS: 00007fdcba4c4740(0000) GS:ffff889033780000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffffffffe8 CR3: 0000000f776a0004 CR4: 00000000001606e0 Kernel panic - not syncing: Fatal exception Kernel Offset: 0x29800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) ---[ end Kernel panic - not syncing: Fatal exception ]— which is due to error injection (or allocation failure) preventing the rings from being setup. On shutdown, we attempt to remove any pending requests, and for poll request, we call io_cqring_ev_posted() when we've killed poll requests. However, since the rings aren't setup, we won't find any poll requests. Make the calling of io_cqring_ev_posted() dependent on actually having completed requests. This fixes this setup corner case, and removes spurious calls if we remove poll requests and don't find any. Reported-by: Qian Cai Signed-off-by: Jens Axboe --- fs/io_uring.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/fs/io_uring.c b/fs/io_uring.c index aac54772e12e..32cbace58256 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -4392,7 +4392,7 @@ static void io_poll_remove_all(struct io_ring_ctx *ctx) { struct hlist_node *tmp; struct io_kiocb *req; - int i; + int posted = 0, i; spin_lock_irq(&ctx->completion_lock); for (i = 0; i < (1U << ctx->cancel_hash_bits); i++) { @@ -4400,11 +4400,12 @@ static void io_poll_remove_all(struct io_ring_ctx *ctx) list = &ctx->cancel_hash[i]; hlist_for_each_entry_safe(req, tmp, list, hash_node) - io_poll_remove_one(req); + posted += io_poll_remove_one(req); } spin_unlock_irq(&ctx->completion_lock); - io_cqring_ev_posted(ctx); + if (posted) + io_cqring_ev_posted(ctx); } static int io_poll_cancel(struct io_ring_ctx *ctx, __u64 sqe_addr) From 849f8583e955dbe3a1806e03ecacd5e71cce0a08 Mon Sep 17 00:00:00 2001 From: Li Bin Date: Mon, 13 Apr 
2020 19:29:21 +0800 Subject: [PATCH 143/331] scsi: sg: add sg_remove_request in sg_common_write If the dxfer_len is greater than 256M then the request is invalid and we need to call sg_remove_request in sg_common_write. Link: https://lore.kernel.org/r/1586777361-17339-1-git-send-email-huawei.libin@huawei.com Fixes: f930c7043663 ("scsi: sg: only check for dxfer_len greater than 256M") Acked-by: Douglas Gilbert Signed-off-by: Li Bin Signed-off-by: Martin K. Petersen --- drivers/scsi/sg.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c index 4e6af592f018..9c0ee192f0f9 100644 --- a/drivers/scsi/sg.c +++ b/drivers/scsi/sg.c @@ -793,8 +793,10 @@ sg_common_write(Sg_fd * sfp, Sg_request * srp, "sg_common_write: scsi opcode=0x%02x, cmd_size=%d\n", (int) cmnd[0], (int) hp->cmd_len)); - if (hp->dxfer_len >= SZ_256M) + if (hp->dxfer_len >= SZ_256M) { + sg_remove_request(sfp, srp); return -EINVAL; + } k = sg_start_req(srp, cmnd); if (k) { From b450b30b97010e5c68ab522c6f6c54ef76bd0683 Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Thu, 9 Apr 2020 15:04:26 +0200 Subject: [PATCH 144/331] efi/cper: Use scnprintf() for avoiding potential buffer overflow Since snprintf() returns the would-be-output size instead of the actual output size, the succeeding calls may go beyond the given buffer limit. Fix it by replacing with scnprintf(). 
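The difference in return values can be sketched in userspace C. This is only an illustration, not the kernel code: demo_scnprintf() is a hypothetical stand-in for the kernel's scnprintf(), built on vsnprintf(), and the 8-byte buffer and helper names are assumptions made for the example.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical userspace stand-in for the kernel's scnprintf(): it returns
 * the number of characters actually stored in buf (excluding the trailing
 * '\0'), so the result is never larger than size - 1.
 */
int demo_scnprintf(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int n;

	assert(buf != NULL && fmt != NULL);
	if (size == 0)
		return 0;

	va_start(args, fmt);
	n = vsnprintf(buf, size, fmt, args);
	va_end(args);

	if (n < 0)
		return 0;		/* output error: report nothing stored */
	if ((size_t)n >= size)
		return (int)(size - 1);	/* output was truncated */
	return n;
}

/* Length reported by snprintf() when str is written into an 8-byte buffer. */
int reported_by_snprintf(const char *str)
{
	char buf[8];

	return snprintf(buf, sizeof(buf), "%s", str);
}

/* Length reported by the scnprintf()-style helper for the same write. */
int reported_by_scnprintf(const char *str)
{
	char buf[8];

	return demo_scnprintf(buf, sizeof(buf), "%s", str);
}
```

For a 10-character string, reported_by_snprintf() yields 10 even though only 7 characters fit, whereas reported_by_scnprintf() yields 7. Accumulating the former into len is what lets a later buf+len point past the end of the buffer.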
Signed-off-by: Takashi Iwai Signed-off-by: Ard Biesheuvel Signed-off-by: Ingo Molnar Link: https://lore.kernel.org/r/20200311072145.5001-1-tiwai@suse.de Link: https://lore.kernel.org/r/20200409130434.6736-2-ardb@kernel.org --- drivers/firmware/efi/cper.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c index b1af0de2e100..9d2512913d25 100644 --- a/drivers/firmware/efi/cper.c +++ b/drivers/firmware/efi/cper.c @@ -101,7 +101,7 @@ void cper_print_bits(const char *pfx, unsigned int bits, if (!len) len = snprintf(buf, sizeof(buf), "%s%s", pfx, str); else - len += snprintf(buf+len, sizeof(buf)-len, ", %s", str); + len += scnprintf(buf+len, sizeof(buf)-len, ", %s", str); } if (len) printk("%s\n", buf); From 05a08796281feefcbe5cfdd67b48f5073d309aa8 Mon Sep 17 00:00:00 2001 From: Colin Ian King Date: Thu, 9 Apr 2020 15:04:27 +0200 Subject: [PATCH 145/331] efi/libstub/x86: Remove redundant assignment to pointer hdr The pointer hdr is being assigned a value that is never read and it is being updated later with a new value. The assignment is redundant and can be removed. 
Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King Signed-off-by: Ard Biesheuvel Signed-off-by: Ingo Molnar Link: https://lore.kernel.org/r/20200402102537.503103-1-colin.king@canonical.com Link: https://lore.kernel.org/r/20200409130434.6736-3-ardb@kernel.org --- drivers/firmware/efi/libstub/x86-stub.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c index 8d3a707789de..e02ea51273ff 100644 --- a/drivers/firmware/efi/libstub/x86-stub.c +++ b/drivers/firmware/efi/libstub/x86-stub.c @@ -392,8 +392,6 @@ efi_status_t __efiapi efi_pe_entry(efi_handle_t handle, image_base = efi_table_attr(image, image_base); image_offset = (void *)startup_32 - image_base; - hdr = &((struct boot_params *)image_base)->hdr; - status = efi_allocate_pages(0x4000, (unsigned long *)&boot_params, ULONG_MAX); if (status != EFI_SUCCESS) { efi_printk("Failed to allocate lowmem for boot params\n"); From 105cb9544b161819b7be23a8a8419353a3218807 Mon Sep 17 00:00:00 2001 From: Arvind Sankar Date: Thu, 9 Apr 2020 15:04:28 +0200 Subject: [PATCH 146/331] efi/x86: Move efi stub globals from .bss to .data Commit 3ee372ccce4d ("x86/boot/compressed/64: Remove .bss/.pgtable from bzImage") removed the .bss section from the bzImage. However, while a PE loader is required to zero-initialize the .bss section before calling the PE entry point, the EFI handover protocol does not currently document any requirement that .bss be initialized by the bootloader prior to calling the handover entry. When systemd-boot is used to boot a unified kernel image [1], the image is constructed by embedding the bzImage as a .linux section in a PE executable that contains a small stub loader from systemd together with additional sections and potentially an initrd. As the .bss section within the bzImage is no longer explicitly present as part of the file, it is not initialized before calling the EFI handover entry. 
Furthermore, as the size of the embedded .linux section is only the size of the bzImage file itself, the .bss section's memory may not even have been allocated. In particular, this can result in efi_disable_pci_dma being true even when it was not specified via the command line or configuration option, which in turn causes crashes while booting on some systems. To avoid issues, place all EFI stub global variables into the .data section instead of .bss. As of this writing, only boolean flags for a few command line arguments and the sys_table pointer were in .bss and will now move into the .data section. [1] https://systemd.io/BOOT_LOADER_SPECIFICATION/#type-2-efi-unified-kernel-images Fixes: 3ee372ccce4d ("x86/boot/compressed/64: Remove .bss/.pgtable from bzImage") Reported-by: Sergey Shatunov Signed-off-by: Arvind Sankar Signed-off-by: Ard Biesheuvel Signed-off-by: Ingo Molnar Link: https://lore.kernel.org/r/20200406180614.429454-1-nivedita@alum.mit.edu Link: https://lore.kernel.org/r/20200409130434.6736-4-ardb@kernel.org --- drivers/firmware/efi/libstub/efistub.h | 2 +- drivers/firmware/efi/libstub/x86-stub.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/firmware/efi/libstub/efistub.h b/drivers/firmware/efi/libstub/efistub.h index cc90a748bcf0..67d26949fd26 100644 --- a/drivers/firmware/efi/libstub/efistub.h +++ b/drivers/firmware/efi/libstub/efistub.h @@ -25,7 +25,7 @@ #define EFI_ALLOC_ALIGN EFI_PAGE_SIZE #endif -#ifdef CONFIG_ARM +#if defined(CONFIG_ARM) || defined(CONFIG_X86) #define __efistub_global __section(.data) #else #define __efistub_global diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c index e02ea51273ff..867a57e28980 100644 --- a/drivers/firmware/efi/libstub/x86-stub.c +++ b/drivers/firmware/efi/libstub/x86-stub.c @@ -20,7 +20,7 @@ /* Maximum physical address for 64-bit kernel with 4-level paging */ #define MAXMEM_X86_64_4LEVEL (1ull << 46) -static efi_system_table_t 
*sys_table; +static efi_system_table_t *sys_table __efistub_global; extern const bool efi_is64; extern u32 image_offset; From 21cb9b414301c76f77f70d990a784ad6360e5a20 Mon Sep 17 00:00:00 2001 From: Arvind Sankar Date: Thu, 9 Apr 2020 15:04:29 +0200 Subject: [PATCH 147/331] efi/x86: Always relocate the kernel for EFI handover entry Commit d5cdf4cfeac9 ("efi/x86: Don't relocate the kernel unless necessary") tries to avoid relocating the kernel in the EFI stub as far as possible. However, when systemd-boot is used to boot a unified kernel image [1], the image is constructed by embedding the bzImage as a .linux section in a PE executable that contains a small stub loader from systemd that will call the EFI stub handover entry, together with additional sections and potentially an initrd. When this image is constructed, by for example dracut, the initrd is placed after the bzImage without ensuring that at least init_size bytes are available for the bzImage. If the kernel is not relocated by the EFI stub, this could result in the compressed kernel's startup code in head_{32,64}.S overwriting the initrd. To prevent this, unconditionally relocate the kernel if the EFI stub was entered via the handover entry point. 
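As a rough sketch of the resulting check for the 64-bit case (not the stub code itself): demo_needs_relocation() and the DEMO_ constants are hypothetical names, and the 16 MiB load address is an assumed value that depends on the kernel configuration. A zero image_offset stands for "entered via the handover entry point", i.e. the kernel was not loaded by LoadImage.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative constants; the real values depend on the kernel config. */
#define DEMO_LOAD_PHYSICAL_ADDR   0x1000000ULL	/* assumed: 16 MiB */
#define DEMO_MAXMEM_X86_64_4LEVEL (1ULL << 46)	/* 4-level paging limit */

/*
 * Hypothetical helper mirroring the 64-bit relocation decision: relocate
 * when the load buffer sits too low or too high, or when image_offset is
 * zero, i.e. the stub was entered through the handover entry point.
 */
bool demo_needs_relocation(uint64_t buffer_start, uint64_t buffer_end,
			   uint32_t image_offset)
{
	assert(buffer_start <= buffer_end);

	return buffer_start < DEMO_LOAD_PHYSICAL_ADDR ||
	       buffer_end > DEMO_MAXMEM_X86_64_4LEVEL ||
	       image_offset == 0;
}
```

With these assumptions, a well-placed image entered via the handover entry (image_offset == 0) is still relocated, while the same placement with a non-zero image_offset is left where it is.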
[1] https://systemd.io/BOOT_LOADER_SPECIFICATION/#type-2-efi-unified-kernel-images Fixes: d5cdf4cfeac9 ("efi/x86: Don't relocate the kernel unless necessary") Reported-by: Sergey Shatunov Signed-off-by: Arvind Sankar Signed-off-by: Ard Biesheuvel Signed-off-by: Ingo Molnar Link: https://lore.kernel.org/r/20200406180614.429454-2-nivedita@alum.mit.edu Link: https://lore.kernel.org/r/20200409130434.6736-5-ardb@kernel.org --- drivers/firmware/efi/libstub/x86-stub.c | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c index 867a57e28980..05ccb229fb45 100644 --- a/drivers/firmware/efi/libstub/x86-stub.c +++ b/drivers/firmware/efi/libstub/x86-stub.c @@ -740,8 +740,15 @@ unsigned long efi_main(efi_handle_t handle, * now use KERNEL_IMAGE_SIZE, which will be 512MiB, the same as what * KASLR uses. * - * Also relocate it if image_offset is zero, i.e. we weren't loaded by - * LoadImage, but we are not aligned correctly. + * Also relocate it if image_offset is zero, i.e. the kernel wasn't + * loaded by LoadImage, but rather by a bootloader that called the + * handover entry. The reason we must always relocate in this case is + * to handle the case of systemd-boot booting a unified kernel image, + * which is a PE executable that contains the bzImage and an initrd as + * COFF sections. The initrd section is placed after the bzImage + * without ensuring that there are at least init_size bytes available + * for the bzImage, and thus the compressed kernel's startup code may + * overwrite the initrd unless it is moved out of the way. 
*/ buffer_start = ALIGN(bzimage_addr - image_offset, @@ -751,8 +758,7 @@ unsigned long efi_main(efi_handle_t handle, if ((buffer_start < LOAD_PHYSICAL_ADDR) || (IS_ENABLED(CONFIG_X86_32) && buffer_end > KERNEL_IMAGE_SIZE) || (IS_ENABLED(CONFIG_X86_64) && buffer_end > MAXMEM_X86_64_4LEVEL) || - (image_offset == 0 && !IS_ALIGNED(bzimage_addr, - hdr->kernel_alignment))) { + (image_offset == 0)) { status = efi_relocate_kernel(&bzimage_addr, hdr->init_size, hdr->init_size, hdr->pref_address, From a94691680bace7e1404e4f235badb74e30467e86 Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Thu, 9 Apr 2020 15:04:30 +0200 Subject: [PATCH 148/331] efi/arm: Deal with ADR going out of range in efi_enter_kernel() Commit 0698fac4ac2a ("efi/arm: Clean EFI stub exit code from cache instead of avoiding it") introduced a PC-relative reference to 'call_cache_fn' into efi_enter_kernel(), which lives way at the end of head.S. In some cases, the ARM version of the ADR instruction does not have sufficient range, resulting in a build error: arch/arm/boot/compressed/head.S:1453: Error: invalid constant (fffffffffffffbe4) after fixup ARM defines an alternative with a wider range, called ADRL, but this does not exist for Thumb-2. At the same time, the ADR instruction in Thumb-2 has a wider range, and so it does not suffer from the same issue. So let's switch to ADRL for ARM builds, and keep the ADR for Thumb-2 builds. 
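To see why the assembler rejects the fixup, it may help to recall that an ARM-mode ADR is an ADD or SUB from PC with a "modified immediate": an 8-bit value rotated right by an even number of bits. The sketch below checks that property; the helper names are hypothetical, and it assumes the constant printed in the error (0xfffffffffffffbe4, i.e. -0x41c) is the byte offset the instruction would have to encode.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rotate a 32-bit value left by rot bits; rot must be in 0..31. */
static uint32_t sketch_rotl32(uint32_t v, unsigned rot)
{
    return rot ? (v << rot) | (v >> (32 - rot)) : v;
}

/*
 * True if v can be written as an 8-bit value rotated right by an even
 * amount, which is the immediate form available to ARM-mode ADD/SUB.
 * Undoing a right rotation is a left rotation, so try every even one.
 */
static bool sketch_arm_imm_encodable(uint32_t v)
{
    for (unsigned rot = 0; rot < 32; rot += 2)
        if (sketch_rotl32(v, rot) <= 0xff)
            return true;
    return false;
}
```

The magnitude 0x41c has set bits spanning bit 2 through bit 10, nine bits in all, so no 8-bit window covers it and a single instruction cannot express the offset. ADRL sidesteps this by emitting two instructions.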
Reported-by: Arnd Bergmann Tested-by: Arnd Bergmann Signed-off-by: Ard Biesheuvel Signed-off-by: Ingo Molnar Link: https://lore.kernel.org/r/20200409130434.6736-6-ardb@kernel.org --- arch/arm/boot/compressed/head.S | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S index cabdd8f4a248..e8e1c866e413 100644 --- a/arch/arm/boot/compressed/head.S +++ b/arch/arm/boot/compressed/head.S @@ -1450,7 +1450,8 @@ ENTRY(efi_enter_kernel) @ running beyond the PoU, and so calling cache_off below from @ inside the PE/COFF loader allocated region is unsafe unless @ we explicitly clean it to the PoC. - adr r0, call_cache_fn @ region of code we will + ARM( adrl r0, call_cache_fn ) + THUMB( adr r0, call_cache_fn ) @ region of code we will adr r1, 0f @ run with MMU off bl cache_clean_flush bl cache_off From 8b84769a7a1505b279b337dae83d16390e83f5c1 Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Thu, 9 Apr 2020 15:04:31 +0200 Subject: [PATCH 149/331] Documentation/x86, efi/x86: Clarify EFI handover protocol and its requirements The EFI handover protocol was introduced on x86 to permit the boot loader to pass a populated boot_params structure as an additional function argument to the entry point. This allows the boot loader to pass the base and size of an initrd image, which is more flexible than relying on the EFI stub's file I/O routines, which can only access the file system from which the kernel image itself was loaded by the firmware. This approach requires a fair amount of internal knowledge regarding the layout of the boot_params structure on the part of the boot loader, as well as knowledge regarding the allowed placement of the initrd in memory, and so it has been deprecated in favour of a new initrd loading method that is based on existing UEFI protocols and best practices.
So update the x86 boot protocol documentation to clarify that the EFI handover protocol has been deprecated, and while at it, add a note that invoking the EFI handover protocol still requires the PE/COFF image to be loaded properly (as opposed to simply being copied into memory). Also, drop the code32_start header field from the list of values that need to be provided, as this is no longer required. Reviewed-by: Borislav Petkov Signed-off-by: Ard Biesheuvel Signed-off-by: Ingo Molnar Link: https://lore.kernel.org/r/20200409130434.6736-7-ardb@kernel.org --- Documentation/x86/boot.rst | 21 ++++++++++++++++++--- 1 file changed, 18 insertions(+), 3 deletions(-) diff --git a/Documentation/x86/boot.rst b/Documentation/x86/boot.rst index fa7ddc0428c8..5325c71ca877 100644 --- a/Documentation/x86/boot.rst +++ b/Documentation/x86/boot.rst @@ -1399,8 +1399,8 @@ must have read/write permission; CS must be __BOOT_CS and DS, ES, SS must be __BOOT_DS; interrupt must be disabled; %rsi must hold the base address of the struct boot_params. -EFI Handover Protocol -===================== +EFI Handover Protocol (deprecated) +================================== This protocol allows boot loaders to defer initialisation to the EFI boot stub. The boot loader is required to load the kernel/initrd(s) @@ -1408,6 +1408,12 @@ from the boot media and jump to the EFI handover protocol entry point which is hdr->handover_offset bytes from the beginning of startup_{32,64}. +The boot loader MUST respect the kernel's PE/COFF metadata when it comes +to section alignment, the memory footprint of the executable image beyond +the size of the file itself, and any other aspect of the PE/COFF header +that may affect correct operation of the image as a PE/COFF binary in the +execution context provided by the EFI firmware. + The function prototype for the handover entry point looks like this:: efi_main(void *handle, efi_system_table_t *table, struct boot_params *bp) @@ -1419,9 +1425,18 @@ UEFI specification. 
'bp' is the boot loader-allocated boot params. The boot loader *must* fill out the following fields in bp:: - - hdr.code32_start - hdr.cmd_line_ptr - hdr.ramdisk_image (if applicable) - hdr.ramdisk_size (if applicable) All other fields should be zero. + +NOTE: The EFI Handover Protocol is deprecated in favour of the ordinary PE/COFF + entry point, combined with the LINUX_EFI_INITRD_MEDIA_GUID based initrd + loading protocol (refer to [0] for an example of the bootloader side of + this), which removes the need for any knowledge on the part of the EFI + bootloader regarding the internal representation of boot_params or any + requirements/limitations regarding the placement of the command line + and ramdisk in memory, or the placement of the kernel image itself. + +[0] https://github.com/u-boot/u-boot/commit/ec80b4735a593961fe701cc3a5d717d4739b0fd0 From 464fb126d98a047953040cc9c754801dbda54e5d Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Thu, 9 Apr 2020 15:04:32 +0200 Subject: [PATCH 150/331] efi/libstub/file: Merge file name buffers to reduce stack usage Arnd reports that commit 9302c1bb8e47 ("efi/libstub: Rewrite file I/O routine") reworks the file I/O routines in a way that triggers the following warning: drivers/firmware/efi/libstub/file.c:240:1: warning: the frame size of 1200 bytes is larger than 1024 bytes [-Wframe-larger-than=] We can work around this issue dropping an instance of efi_char16_t[256] from the stack frame, and reusing the 'filename' field of the file info struct that we use to obtain file information from EFI (which contains the file name even though we already know it since we used it to open the file in the first place) Reported-by: Arnd Bergmann Signed-off-by: Ard Biesheuvel Signed-off-by: Ingo Molnar Link: https://lore.kernel.org/r/20200409130434.6736-8-ardb@kernel.org --- drivers/firmware/efi/libstub/file.c | 27 ++++++++++++++------------- 1 file changed, 14 insertions(+), 13 deletions(-) diff --git 
a/drivers/firmware/efi/libstub/file.c b/drivers/firmware/efi/libstub/file.c index d4c7e5f59d2c..ea66b1f16a79 100644 --- a/drivers/firmware/efi/libstub/file.c +++ b/drivers/firmware/efi/libstub/file.c @@ -29,30 +29,31 @@ */ #define EFI_READ_CHUNK_SIZE SZ_1M +struct finfo { + efi_file_info_t info; + efi_char16_t filename[MAX_FILENAME_SIZE]; +}; + static efi_status_t efi_open_file(efi_file_protocol_t *volume, - efi_char16_t *filename_16, + struct finfo *fi, efi_file_protocol_t **handle, unsigned long *file_size) { - struct { - efi_file_info_t info; - efi_char16_t filename[MAX_FILENAME_SIZE]; - } finfo; efi_guid_t info_guid = EFI_FILE_INFO_ID; efi_file_protocol_t *fh; unsigned long info_sz; efi_status_t status; - status = volume->open(volume, &fh, filename_16, EFI_FILE_MODE_READ, 0); + status = volume->open(volume, &fh, fi->filename, EFI_FILE_MODE_READ, 0); if (status != EFI_SUCCESS) { pr_efi_err("Failed to open file: "); - efi_char16_printk(filename_16); + efi_char16_printk(fi->filename); efi_printk("\n"); return status; } - info_sz = sizeof(finfo); - status = fh->get_info(fh, &info_guid, &info_sz, &finfo); + info_sz = sizeof(struct finfo); + status = fh->get_info(fh, &info_guid, &info_sz, fi); if (status != EFI_SUCCESS) { pr_efi_err("Failed to get file info\n"); fh->close(fh); @@ -60,7 +61,7 @@ static efi_status_t efi_open_file(efi_file_protocol_t *volume, } *handle = fh; - *file_size = finfo.info.file_size; + *file_size = fi->info.file_size; return EFI_SUCCESS; } @@ -146,13 +147,13 @@ static efi_status_t handle_cmdline_files(efi_loaded_image_t *image, alloc_addr = alloc_size = 0; do { - efi_char16_t filename[MAX_FILENAME_SIZE]; + struct finfo fi; unsigned long size; void *addr; offset = find_file_option(cmdline, cmdline_len, optstr, optstr_size, - filename, ARRAY_SIZE(filename)); + fi.filename, ARRAY_SIZE(fi.filename)); if (!offset) break; @@ -166,7 +167,7 @@ static efi_status_t handle_cmdline_files(efi_loaded_image_t *image, return status; } - status = 
efi_open_file(volume, filename, &file, &size); + status = efi_open_file(volume, &fi, &file, &size); if (status != EFI_SUCCESS) goto err_close_volume; From a4b81ccfd4caba017d2b84720b6de4edd16911a0 Mon Sep 17 00:00:00 2001 From: Gary Lin Date: Thu, 9 Apr 2020 15:04:33 +0200 Subject: [PATCH 151/331] efi/x86: Fix the deletion of variables in mixed mode efi_thunk_set_variable() treated the NULL "data" pointer as an invalid parameter, and this broke the deletion of variables in mixed mode. This commit fixes the check of data so that the userspace program can delete a variable in mixed mode. Fixes: 8319e9d5ad98ffcc ("efi/x86: Handle by-ref arguments covering multiple pages in mixed mode") Signed-off-by: Gary Lin Signed-off-by: Ard Biesheuvel Signed-off-by: Ingo Molnar Link: https://lore.kernel.org/r/20200408081606.1504-1-glin@suse.com Link: https://lore.kernel.org/r/20200409130434.6736-9-ardb@kernel.org --- arch/x86/platform/efi/efi_64.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c index 211bb9358b73..e0e2e8136cf5 100644 --- a/arch/x86/platform/efi/efi_64.c +++ b/arch/x86/platform/efi/efi_64.c @@ -638,7 +638,7 @@ efi_thunk_set_variable(efi_char16_t *name, efi_guid_t *vendor, phys_vendor = virt_to_phys_or_null(vnd); phys_data = virt_to_phys_or_null_size(data, data_size); - if (!phys_name || !phys_data) + if (!phys_name || (data && !phys_data)) status = EFI_INVALID_PARAMETER; else status = efi_thunk(set_variable, phys_name, phys_vendor, @@ -669,7 +669,7 @@ efi_thunk_set_variable_nonblocking(efi_char16_t *name, efi_guid_t *vendor, phys_vendor = virt_to_phys_or_null(vnd); phys_data = virt_to_phys_or_null_size(data, data_size); - if (!phys_name || !phys_data) + if (!phys_name || (data && !phys_data)) status = EFI_INVALID_PARAMETER; else status = efi_thunk(set_variable, phys_name, phys_vendor, From f6103162008dfd37567f240b50e5e1ea7cf2e00c Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: 
Thu, 9 Apr 2020 15:04:34 +0200 Subject: [PATCH 152/331] efi/x86: Don't remap text<->rodata gap read-only for mixed mode Commit d9e3d2c4f10320 ("efi/x86: Don't map the entire kernel text RW for mixed mode") updated the code that creates the 1:1 memory mapping to use read-only attributes for the 1:1 alias of the kernel's text and rodata sections, to protect it from inadvertent modification. However, it failed to take into account that the unused gap between text and rodata is given to the page allocator for general use. If the vmap'ed stack happens to be allocated from this region, any by-ref output arguments passed to EFI runtime services that are allocated on the stack (such as the 'datasize' argument taken by GetVariable() when invoked from efivar_entry_size()) will be referenced via a read-only mapping, resulting in a page fault if the EFI code tries to write to it: BUG: unable to handle page fault for address: 00000000386aae88 #PF: supervisor write access in kernel mode #PF: error_code(0x0003) - permissions violation PGD fd61063 P4D fd61063 PUD fd62063 PMD 386000e1 Oops: 0003 [#1] SMP PTI CPU: 2 PID: 255 Comm: systemd-sysv-ge Not tainted 5.6.0-rc4-default+ #22 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0008:0x3eaeed95 Code: ... 
<89> 03 be 05 00 00 80 a1 74 63 b1 3e 83 c0 48 e8 44 d2 ff ff eb 05 RSP: 0018:000000000fd73fa0 EFLAGS: 00010002 RAX: 0000000000000001 RBX: 00000000386aae88 RCX: 000000003e9f1120 RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000001 RBP: 000000000fd73fd8 R08: 00000000386aae88 R09: 0000000000000000 R10: 0000000000000002 R11: 0000000000000000 R12: 0000000000000000 R13: ffffc0f040220000 R14: 0000000000000000 R15: 0000000000000000 FS: 00007f21160ac940(0000) GS:ffff9cf23d500000(0000) knlGS:0000000000000000 CS: 0008 DS: 0018 ES: 0018 CR0: 0000000080050033 CR2: 00000000386aae88 CR3: 000000000fd6c004 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: Modules linked in: CR2: 00000000386aae88 ---[ end trace a8bfbd202e712834 ]--- Let's fix this by remapping text and rodata individually, and leave the gaps mapped read-write. Fixes: d9e3d2c4f10320 ("efi/x86: Don't map the entire kernel text RW for mixed mode") Reported-by: Jiri Slaby Tested-by: Jiri Slaby Signed-off-by: Ard Biesheuvel Signed-off-by: Ingo Molnar Link: https://lore.kernel.org/r/20200409130434.6736-10-ardb@kernel.org --- arch/x86/platform/efi/efi_64.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c index e0e2e8136cf5..c5e393f8bb3f 100644 --- a/arch/x86/platform/efi/efi_64.c +++ b/arch/x86/platform/efi/efi_64.c @@ -202,7 +202,7 @@ virt_to_phys_or_null_size(void *va, unsigned long size) int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages) { - unsigned long pfn, text, pf; + unsigned long pfn, text, pf, rodata; struct page *page; unsigned npages; pgd_t *pgd = efi_mm.pgd; @@ -256,7 +256,7 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages) efi_scratch.phys_stack = page_to_phys(page + 1); /* stack grows down */ - npages = (__end_rodata_aligned - 
_text) >> PAGE_SHIFT; + npages = (_etext - _text) >> PAGE_SHIFT; text = __pa(_text); pfn = text >> PAGE_SHIFT; @@ -266,6 +266,14 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages) return 1; } + npages = (__end_rodata - __start_rodata) >> PAGE_SHIFT; + rodata = __pa(__start_rodata); + pfn = rodata >> PAGE_SHIFT; + if (kernel_map_pages_in_pgd(pgd, pfn, rodata, npages, pf)) { + pr_err("Failed to map kernel rodata 1:1\n"); + return 1; + } + return 0; } From a088b858f16af85e3db359b6c6aaa92dd3bc0921 Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Fri, 10 Apr 2020 09:43:20 +0200 Subject: [PATCH 153/331] efi/x86: Revert struct layout change to fix kexec boot regression Commit 0a67361dcdaa29 ("efi/x86: Remove runtime table address from kexec EFI setup data") removed the code that retrieves the non-remapped UEFI runtime services pointer from the data structure provided by kexec, as it was never really needed on the kexec boot path: mapping the runtime services table at its non-remapped address is only needed when calling SetVirtualAddressMap(), which never happens during a kexec boot in the first place. However, dropping the 'runtime' member from struct efi_setup_data was a mistake. That struct is shared ABI between the kernel and the kexec tooling for x86, and so we cannot simply change its layout. So let's put back the removed field, but call it 'unused' to reflect the fact that we never look at its contents. While at it, add a comment to remind our future selves that the layout is external ABI. 
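A small sketch of why dropping the member broke the ABI, using hypothetical struct names and plain uint64_t in place of the kernel's u64: every field after the removed one moves up by eight bytes, so a producer built against the old layout writes 'tables' and 'smbios' where the consumer no longer looks for them.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout after the revert: the placeholder keeps later fields in place. */
struct sketch_setup_data_fixed {
    uint64_t fw_vendor;
    uint64_t unused;       /* formerly 'runtime'; contents are never read */
    uint64_t tables;
    uint64_t smbios;
    uint64_t reserved[8];
};

/* Layout with the member dropped: 'tables' and 'smbios' shift by 8 bytes. */
struct sketch_setup_data_broken {
    uint64_t fw_vendor;
    uint64_t tables;
    uint64_t smbios;
    uint64_t reserved[8];
};

/* The offsets that external tooling depends on. */
_Static_assert(offsetof(struct sketch_setup_data_fixed, tables) == 16,
               "tables must stay at offset 16");
_Static_assert(offsetof(struct sketch_setup_data_broken, tables) == 8,
               "dropping the member moves tables to offset 8");
```

Compile-time offset checks of this kind are one possible way to catch such a layout change early; the patch itself relies on a comment instead.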
Fixes: 0a67361dcdaa29 ("efi/x86: Remove runtime table address from kexec EFI setup data") Reported-by: Theodore Ts'o Tested-by: Theodore Ts'o Reviewed-by: Dave Young Signed-off-by: Ard Biesheuvel Signed-off-by: Ingo Molnar --- arch/x86/include/asm/efi.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h index cdcf48d52a12..8391c115c0ec 100644 --- a/arch/x86/include/asm/efi.h +++ b/arch/x86/include/asm/efi.h @@ -178,8 +178,10 @@ extern void efi_free_boot_services(void); extern pgd_t * __init efi_uv1_memmap_phys_prolog(void); extern void __init efi_uv1_memmap_phys_epilog(pgd_t *save_pgd); +/* kexec external ABI */ struct efi_setup_data { u64 fw_vendor; + u64 __unused; u64 tables; u64 smbios; u64 reserved[8]; From 07d8350ede4c4c29634b26c163a1eecdf39dfcfb Mon Sep 17 00:00:00 2001 From: afzal mohammed Date: Fri, 27 Mar 2020 21:41:16 +0530 Subject: [PATCH 154/331] genirq: Remove setup_irq() and remove_irq() Now that all the users of setup_irq() & remove_irq() have been replaced by request_irq() & free_irq() respectively, delete them. 
Signed-off-by: afzal mohammed Signed-off-by: Thomas Gleixner Reviewed-by: Linus Walleij Link: https://lkml.kernel.org/r/0aa8771ada1ac8e1312f6882980c9c08bd023148.1585320721.git.afzal.mohd.ma@gmail.com --- include/linux/irq.h | 2 -- kernel/irq/manage.c | 44 -------------------------------------------- 2 files changed, 46 deletions(-) diff --git a/include/linux/irq.h b/include/linux/irq.h index 9315fbb87db3..c63c2aa915ff 100644 --- a/include/linux/irq.h +++ b/include/linux/irq.h @@ -573,8 +573,6 @@ enum { #define IRQ_DEFAULT_INIT_FLAGS ARCH_IRQ_INIT_FLAGS struct irqaction; -extern int setup_irq(unsigned int irq, struct irqaction *new); -extern void remove_irq(unsigned int irq, struct irqaction *act); extern int setup_percpu_irq(unsigned int irq, struct irqaction *new); extern void remove_percpu_irq(unsigned int irq, struct irqaction *act); diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c index fe40c658f86f..453a8a0f4804 100644 --- a/kernel/irq/manage.c +++ b/kernel/irq/manage.c @@ -1690,34 +1690,6 @@ out_mput: return ret; } -/** - * setup_irq - setup an interrupt - * @irq: Interrupt line to setup - * @act: irqaction for the interrupt - * - * Used to statically setup interrupts in the early boot process. - */ -int setup_irq(unsigned int irq, struct irqaction *act) -{ - int retval; - struct irq_desc *desc = irq_to_desc(irq); - - if (!desc || WARN_ON(irq_settings_is_per_cpu_devid(desc))) - return -EINVAL; - - retval = irq_chip_pm_get(&desc->irq_data); - if (retval < 0) - return retval; - - retval = __setup_irq(irq, desc, act); - - if (retval) - irq_chip_pm_put(&desc->irq_data); - - return retval; -} -EXPORT_SYMBOL_GPL(setup_irq); - /* * Internal function to unregister an irqaction - used to free * regular and special interrupts that are part of the architecture. 
@@ -1858,22 +1830,6 @@ static struct irqaction *__free_irq(struct irq_desc *desc, void *dev_id) return action; } -/** - * remove_irq - free an interrupt - * @irq: Interrupt line to free - * @act: irqaction for the interrupt - * - * Used to remove interrupts statically setup by the early boot process. - */ -void remove_irq(unsigned int irq, struct irqaction *act) -{ - struct irq_desc *desc = irq_to_desc(irq); - - if (desc && !WARN_ON(irq_settings_is_per_cpu_devid(desc))) - __free_irq(desc, act->dev_id); -} -EXPORT_SYMBOL_GPL(remove_irq); - /** * free_irq - free an interrupt allocated with request_irq * @irq: Interrupt line to free From 776d95b768e664efdc9f5cc078b981a006d3bff4 Mon Sep 17 00:00:00 2001 From: Yan Zhao Date: Thu, 12 Mar 2020 23:10:25 -0400 Subject: [PATCH 155/331] drm/i915/gvt: hold reference of VFIO group during opening of vgpu hold reference count of the VFIO group for each vgpu at vgpu opening and release the reference at vgpu releasing. Signed-off-by: Yan Zhao Reviewed-by: Zhenyu Wang Signed-off-by: Zhenyu Wang Link: http://patchwork.freedesktop.org/patch/msgid/20200313031025.7936-1-yan.y.zhao@intel.com --- drivers/gpu/drm/i915/gvt/kvmgt.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c index 074c4efb58eb..811cee28ae06 100644 --- a/drivers/gpu/drm/i915/gvt/kvmgt.c +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c @@ -131,6 +131,7 @@ struct kvmgt_vdev { struct work_struct release_work; atomic_t released; struct vfio_device *vfio_device; + struct vfio_group *vfio_group; }; static inline struct kvmgt_vdev *kvmgt_vdev(struct intel_vgpu *vgpu) @@ -792,6 +793,7 @@ static int intel_vgpu_open(struct mdev_device *mdev) struct kvmgt_vdev *vdev = kvmgt_vdev(vgpu); unsigned long events; int ret; + struct vfio_group *vfio_group; vdev->iommu_notifier.notifier_call = intel_vgpu_iommu_notifier; vdev->group_notifier.notifier_call = intel_vgpu_group_notifier; @@ -814,6 +816,14 @@ static int 
intel_vgpu_open(struct mdev_device *mdev) goto undo_iommu; } + vfio_group = vfio_group_get_external_user_from_dev(mdev_dev(mdev)); + if (IS_ERR_OR_NULL(vfio_group)) { + ret = !vfio_group ? -EFAULT : PTR_ERR(vfio_group); + gvt_vgpu_err("vfio_group_get_external_user_from_dev failed\n"); + goto undo_register; + } + vdev->vfio_group = vfio_group; + /* Take a module reference as mdev core doesn't take * a reference for vendor driver. */ @@ -830,6 +840,10 @@ static int intel_vgpu_open(struct mdev_device *mdev) return ret; undo_group: + vfio_group_put_external_user(vdev->vfio_group); + vdev->vfio_group = NULL; + +undo_register: vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY, &vdev->group_notifier); @@ -884,6 +898,7 @@ static void __intel_vgpu_release(struct intel_vgpu *vgpu) kvmgt_guest_exit(info); intel_vgpu_release_msi_eventfd_ctx(vgpu); + vfio_group_put_external_user(vdev->vfio_group); vdev->kvm = NULL; vgpu->handle = 0; From b59b2a3ee567e5a30688e148556ae33a3196bc9d Mon Sep 17 00:00:00 2001 From: Yan Zhao Date: Thu, 12 Mar 2020 23:11:09 -0400 Subject: [PATCH 156/331] drm/i915/gvt: substitute kvm_read/write_guest with vfio_dma_rw As a device model, it is better to read/write guest memory using the vfio interface, so that vfio is able to maintain dirty info of device IOVAs.
Cc: Kevin Tian Signed-off-by: Yan Zhao Reviewed-by: Zhenyu Wang Signed-off-by: Zhenyu Wang Link: http://patchwork.freedesktop.org/patch/msgid/20200313031109.7989-1-yan.y.zhao@intel.com --- drivers/gpu/drm/i915/gvt/kvmgt.c | 23 ++--------------------- 1 file changed, 2 insertions(+), 21 deletions(-) diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c index 811cee28ae06..cee7376ba39d 100644 --- a/drivers/gpu/drm/i915/gvt/kvmgt.c +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c @@ -2050,33 +2050,14 @@ static int kvmgt_rw_gpa(unsigned long handle, unsigned long gpa, void *buf, unsigned long len, bool write) { struct kvmgt_guest_info *info; - struct kvm *kvm; - int idx, ret; - bool kthread = current->mm == NULL; if (!handle_valid(handle)) return -ESRCH; info = (struct kvmgt_guest_info *)handle; - kvm = info->kvm; - if (kthread) { - if (!mmget_not_zero(kvm->mm)) - return -EFAULT; - use_mm(kvm->mm); - } - - idx = srcu_read_lock(&kvm->srcu); - ret = write ? kvm_write_guest(kvm, gpa, buf, len) : - kvm_read_guest(kvm, gpa, buf, len); - srcu_read_unlock(&kvm->srcu, idx); - - if (kthread) { - unuse_mm(kvm->mm); - mmput(kvm->mm); - } - - return ret; + return vfio_dma_rw(kvmgt_vdev(info->vgpu)->vfio_group, + gpa, buf, len, write); } static int kvmgt_read_gpa(unsigned long handle, unsigned long gpa, From ec7301d5146c9abe8aaf6e16e420ea3951018503 Mon Sep 17 00:00:00 2001 From: Yan Zhao Date: Thu, 12 Mar 2020 23:11:51 -0400 Subject: [PATCH 157/331] drm/i915/gvt: switch to use vfio_group_pin/unpin_pages Substitute vfio_pin_pages() and vfio_unpin_pages() with vfio_group_pin_pages() and vfio_group_unpin_pages(), so that each call no longer has to look up, check, reference, and dereference the VFIO group.
Signed-off-by: Yan Zhao Reviewed-by: Zhenyu Wang Signed-off-by: Zhenyu Wang Link: http://patchwork.freedesktop.org/patch/msgid/20200313031151.8042-1-yan.y.zhao@intel.com --- drivers/gpu/drm/i915/gvt/kvmgt.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c index cee7376ba39d..eee530453aa6 100644 --- a/drivers/gpu/drm/i915/gvt/kvmgt.c +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c @@ -152,6 +152,7 @@ static void gvt_unpin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn, unsigned long size) { struct drm_i915_private *i915 = vgpu->gvt->gt->i915; + struct kvmgt_vdev *vdev = kvmgt_vdev(vgpu); int total_pages; int npage; int ret; @@ -161,7 +162,7 @@ static void gvt_unpin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn, for (npage = 0; npage < total_pages; npage++) { unsigned long cur_gfn = gfn + npage; - ret = vfio_unpin_pages(mdev_dev(kvmgt_vdev(vgpu)->mdev), &cur_gfn, 1); + ret = vfio_group_unpin_pages(vdev->vfio_group, &cur_gfn, 1); drm_WARN_ON(&i915->drm, ret != 1); } } @@ -170,6 +171,7 @@ static void gvt_unpin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn, static int gvt_pin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn, unsigned long size, struct page **page) { + struct kvmgt_vdev *vdev = kvmgt_vdev(vgpu); unsigned long base_pfn = 0; int total_pages; int npage; @@ -184,8 +186,8 @@ static int gvt_pin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn, unsigned long cur_gfn = gfn + npage; unsigned long pfn; - ret = vfio_pin_pages(mdev_dev(kvmgt_vdev(vgpu)->mdev), &cur_gfn, 1, - IOMMU_READ | IOMMU_WRITE, &pfn); + ret = vfio_group_pin_pages(vdev->vfio_group, &cur_gfn, 1, + IOMMU_READ | IOMMU_WRITE, &pfn); if (ret != 1) { gvt_vgpu_err("vfio_pin_pages failed for gfn 0x%lx, ret %d\n", cur_gfn, ret); From bd841d6154f5f41f8a32d3c1b0bc229e326e640a Mon Sep 17 00:00:00 2001 From: Josh Poimboeuf Date: Wed, 1 Apr 2020 13:23:25 -0500 Subject: [PATCH 158/331] objtool: 
Fix CONFIG_UBSAN_TRAP unreachable warnings CONFIG_UBSAN_TRAP causes GCC to emit a UD2 whenever it encounters an unreachable code path. This includes __builtin_unreachable(). Because the BUG() macro uses __builtin_unreachable() after it emits its own UD2, this results in a double UD2. In this case objtool rightfully detects that the second UD2 is unreachable: init/main.o: warning: objtool: repair_env_string()+0x1c8: unreachable instruction We weren't able to figure out a way to get rid of the double UD2s, so just silence the warning. Reported-by: Randy Dunlap Signed-off-by: Josh Poimboeuf Signed-off-by: Borislav Petkov Reviewed-by: Kees Cook Reviewed-by: Miroslav Benes Acked-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/6653ad73c6b59c049211bd7c11ed3809c20ee9f5.1585761021.git.jpoimboe@redhat.com --- tools/objtool/check.c | 17 +++++++++++++++-- 1 file changed, 15 insertions(+), 2 deletions(-) diff --git a/tools/objtool/check.c b/tools/objtool/check.c index 8dd01f986fbb..481132539384 100644 --- a/tools/objtool/check.c +++ b/tools/objtool/check.c @@ -2364,14 +2364,27 @@ static bool ignore_unreachable_insn(struct instruction *insn) !strcmp(insn->sec->name, ".altinstr_aux")) return true; + if (!insn->func) + return false; + + /* + * CONFIG_UBSAN_TRAP inserts a UD2 when it sees + * __builtin_unreachable(). The BUG() macro has an unreachable() after + * the UD2, which causes GCC's undefined trap logic to emit another UD2 + * (or occasionally a JMP to UD2). + */ + if (list_prev_entry(insn, list)->dead_end && + (insn->type == INSN_BUG || + (insn->type == INSN_JUMP_UNCONDITIONAL && + insn->jump_dest && insn->jump_dest->type == INSN_BUG))) + return true; + /* * Check if this (or a subsequent) instruction is related to * CONFIG_UBSAN or CONFIG_KASAN. * * End the search at 5 instructions to avoid going into the weeds. 
*/ - if (!insn->func) - return false; for (i = 0; i < 5; i++) { if (is_kasan_insn(insn) || is_ubsan_insn(insn)) From 8782e7cab51b6bf01a5a86471dd82228af1ac185 Mon Sep 17 00:00:00 2001 From: Josh Poimboeuf Date: Wed, 1 Apr 2020 13:23:26 -0500 Subject: [PATCH 159/331] objtool: Support Clang non-section symbols in ORC dump Historically, the relocation symbols for ORC entries have only been section symbols: .text+0: sp:sp+8 bp:(und) type:call end:0 However, the Clang assembler is aggressive about stripping section symbols. In that case we will need to use function symbols: freezing_slow_path+0: sp:sp+8 bp:(und) type:call end:0 In preparation for the generation of such entries in "objtool orc generate", add support for reading them in "objtool orc dump". Signed-off-by: Josh Poimboeuf Signed-off-by: Borislav Petkov Reviewed-by: Miroslav Benes Acked-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/b811b5eb1a42602c3b523576dc5efab9ad1c174d.1585761021.git.jpoimboe@redhat.com --- tools/objtool/orc_dump.c | 40 +++++++++++++++++++++++++--------------- 1 file changed, 25 insertions(+), 15 deletions(-) diff --git a/tools/objtool/orc_dump.c b/tools/objtool/orc_dump.c index 13ccf775a83a..ba4cbb1cdd63 100644 --- a/tools/objtool/orc_dump.c +++ b/tools/objtool/orc_dump.c @@ -66,7 +66,7 @@ int orc_dump(const char *_objname) char *name; size_t nr_sections; Elf64_Addr orc_ip_addr = 0; - size_t shstrtab_idx; + size_t shstrtab_idx, strtab_idx = 0; Elf *elf; Elf_Scn *scn; GElf_Shdr sh; @@ -127,6 +127,8 @@ int orc_dump(const char *_objname) if (!strcmp(name, ".symtab")) { symtab = data; + } else if (!strcmp(name, ".strtab")) { + strtab_idx = i; } else if (!strcmp(name, ".orc_unwind")) { orc = data->d_buf; orc_size = sh.sh_size; @@ -138,7 +140,7 @@ int orc_dump(const char *_objname) } } - if (!symtab || !orc || !orc_ip) + if (!symtab || !strtab_idx || !orc || !orc_ip) return 0; if (orc_size % sizeof(*orc) != 0) { @@ -159,21 +161,29 @@ int orc_dump(const char *_objname) return -1; } - 
scn = elf_getscn(elf, sym.st_shndx); - if (!scn) { - WARN_ELF("elf_getscn"); - return -1; - } + if (GELF_ST_TYPE(sym.st_info) == STT_SECTION) { + scn = elf_getscn(elf, sym.st_shndx); + if (!scn) { + WARN_ELF("elf_getscn"); + return -1; + } - if (!gelf_getshdr(scn, &sh)) { - WARN_ELF("gelf_getshdr"); - return -1; - } + if (!gelf_getshdr(scn, &sh)) { + WARN_ELF("gelf_getshdr"); + return -1; + } - name = elf_strptr(elf, shstrtab_idx, sh.sh_name); - if (!name || !*name) { - WARN_ELF("elf_strptr"); - return -1; + name = elf_strptr(elf, shstrtab_idx, sh.sh_name); + if (!name) { + WARN_ELF("elf_strptr"); + return -1; + } + } else { + name = elf_strptr(elf, strtab_idx, sym.st_name); + if (!name) { + WARN_ELF("elf_strptr"); + return -1; + } } printf("%s+%llx:", name, (unsigned long long)rela.r_addend); From e81e0724432542af8d8c702c31e9d82f57b1ff31 Mon Sep 17 00:00:00 2001 From: Josh Poimboeuf Date: Wed, 1 Apr 2020 13:23:27 -0500 Subject: [PATCH 160/331] objtool: Support Clang non-section symbols in ORC generation When compiling the kernel with AS=clang, objtool produces a lot of warnings: warning: objtool: missing symbol for section .text warning: objtool: missing symbol for section .init.text warning: objtool: missing symbol for section .ref.text It then fails to generate the ORC table. The problem is that objtool assumes text section symbols always exist. But the Clang assembler is aggressive about removing them. When generating relocations for the ORC table, objtool always tries to reference instructions by their section symbol offset. If the section symbol doesn't exist, it bails. Do a fallback: when a section symbol isn't available, reference a function symbol instead. 
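The fallback can be sketched in isolation as follows. Everything here is hypothetical scaffolding (the sketch_ types, the linear lookup, and the sample symbol table); it only mirrors the order of attempts described above: use the section symbol if there is one, otherwise the function containing the instruction, otherwise the function containing the preceding byte, which covers a padding instruction placed right after a function.

```c
#include <stddef.h>

struct sketch_sym {
    const char *name;
    unsigned long offset;   /* start offset within the section */
    unsigned long len;      /* size in bytes */
};

struct sketch_reloc {
    const struct sketch_sym *sym;   /* NULL when no symbol could be found */
    long addend;
};

/* Sample function symbols for a section that has no section symbol. */
static const struct sketch_sym sketch_syms[] = {
    { "freezing_slow_path", 0x10, 0x20 },   /* covers 0x10..0x2f */
    { "other_func",         0x40, 0x10 },   /* covers 0x40..0x4f */
};

/* Linear search for the symbol whose range contains off. */
static const struct sketch_sym *
sketch_find_symbol_containing(const struct sketch_sym *syms, size_t n,
                              unsigned long off)
{
    for (size_t i = 0; i < n; i++)
        if (off >= syms[i].offset && off < syms[i].offset + syms[i].len)
            return &syms[i];
    return NULL;
}

/* Pick the relocation symbol and addend for the instruction at insn_off. */
static struct sketch_reloc
sketch_pick_reloc(const struct sketch_sym *section_sym,
                  const struct sketch_sym *syms, size_t n,
                  unsigned long insn_off)
{
    struct sketch_reloc r = { NULL, 0 };

    if (section_sym) {
        /* Section symbol available: addend is the section offset. */
        r.sym = section_sym;
        r.addend = (long)insn_off;
        return r;
    }

    r.sym = sketch_find_symbol_containing(syms, n, insn_off);
    if (!r.sym && insn_off > 0)
        /* Padding right after a function: try the previous byte. */
        r.sym = sketch_find_symbol_containing(syms, n, insn_off - 1);
    if (r.sym)
        r.addend = (long)(insn_off - r.sym->offset);
    return r;
}
```

In this sample, an instruction at 0x30 has no containing symbol, so the lookup at 0x2f resolves it to freezing_slow_path with an addend equal to the function's length.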
Reported-by: Dmitry Golovin Signed-off-by: Josh Poimboeuf Signed-off-by: Borislav Petkov Tested-by: Nathan Chancellor Reviewed-by: Miroslav Benes Acked-by: Peter Zijlstra (Intel) Link: https://github.com/ClangBuiltLinux/linux/issues/669 Link: https://lkml.kernel.org/r/9a9cae7fcf628843aabe5a086b1a3c5bf50f42e8.1585761021.git.jpoimboe@redhat.com --- tools/objtool/orc_gen.c | 33 ++++++++++++++++++++++++++------- 1 file changed, 26 insertions(+), 7 deletions(-) diff --git a/tools/objtool/orc_gen.c b/tools/objtool/orc_gen.c index 41e4a2754da4..4c0dabd28000 100644 --- a/tools/objtool/orc_gen.c +++ b/tools/objtool/orc_gen.c @@ -88,11 +88,6 @@ static int create_orc_entry(struct elf *elf, struct section *u_sec, struct secti struct orc_entry *orc; struct rela *rela; - if (!insn_sec->sym) { - WARN("missing symbol for section %s", insn_sec->name); - return -1; - } - /* populate ORC data */ orc = (struct orc_entry *)u_sec->data->d_buf + idx; memcpy(orc, o, sizeof(*orc)); @@ -105,8 +100,32 @@ static int create_orc_entry(struct elf *elf, struct section *u_sec, struct secti } memset(rela, 0, sizeof(*rela)); - rela->sym = insn_sec->sym; - rela->addend = insn_off; + if (insn_sec->sym) { + rela->sym = insn_sec->sym; + rela->addend = insn_off; + } else { + /* + * The Clang assembler doesn't produce section symbols, so we + * have to reference the function symbol instead: + */ + rela->sym = find_symbol_containing(insn_sec, insn_off); + if (!rela->sym) { + /* + * Hack alert. This happens when we need to reference + * the NOP pad insn immediately after the function. 
+ */ + rela->sym = find_symbol_containing(insn_sec, + insn_off - 1); + } + if (!rela->sym) { + WARN("missing symbol for insn at offset 0x%lx\n", + insn_off); + return -1; + } + + rela->addend = insn_off - rela->sym->offset; + } + rela->type = R_X86_64_PC32; rela->offset = idx * sizeof(int); rela->sec = ip_relasec; From b401efc120a399dfda1f4d2858a4de365c9b08ef Mon Sep 17 00:00:00 2001 From: Josh Poimboeuf Date: Wed, 1 Apr 2020 13:23:28 -0500 Subject: [PATCH 161/331] objtool: Fix switch table detection in .text.unlikely If a switch jump table's indirect branch is in a ".cold" subfunction in .text.unlikely, objtool doesn't detect it, and instead prints a false warning: drivers/media/v4l2-core/v4l2-ioctl.o: warning: objtool: v4l_print_format.cold()+0xd6: sibling call from callable instruction with modified stack frame drivers/hwmon/max6650.o: warning: objtool: max6650_probe.cold()+0xa5: sibling call from callable instruction with modified stack frame drivers/media/dvb-frontends/drxk_hard.o: warning: objtool: init_drxk.cold()+0x16f: sibling call from callable instruction with modified stack frame Fix it by comparing the function, instead of the section and offset. Fixes: 13810435b9a7 ("objtool: Support GCC 8's cold subfunctions") Signed-off-by: Josh Poimboeuf Signed-off-by: Borislav Petkov Reviewed-by: Miroslav Benes Acked-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/157c35d42ca9b6354bbb1604fe9ad7d1153ccb21.1585761021.git.jpoimboe@redhat.com --- tools/objtool/check.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/tools/objtool/check.c b/tools/objtool/check.c index 481132539384..cb2d299664e7 100644 --- a/tools/objtool/check.c +++ b/tools/objtool/check.c @@ -1050,10 +1050,7 @@ static struct rela *find_jump_table(struct objtool_file *file, * it. 
*/ for (; - &insn->list != &file->insn_list && - insn->sec == func->sec && - insn->offset >= func->offset; - + &insn->list != &file->insn_list && insn->func && insn->func->pfunc == func; insn = insn->first_jump_src ?: list_prev_entry(insn, list)) { if (insn != orig_insn && insn->type == INSN_JUMP_DYNAMIC) From b296695298d8632d8b703ac25fe70be34a07c0d9 Mon Sep 17 00:00:00 2001 From: Josh Poimboeuf Date: Wed, 1 Apr 2020 13:23:29 -0500 Subject: [PATCH 162/331] objtool: Make BP scratch register warning more robust If func is NULL, a seg fault can result. This is a theoretical issue which was found by Coverity, ID: 1492002 ("Dereference after null check"). Fixes: c705cecc8431 ("objtool: Track original function across branches") Reported-by: Gustavo A. R. Silva Signed-off-by: Josh Poimboeuf Signed-off-by: Borislav Petkov Link: https://lkml.kernel.org/r/afc628693a37acd287e843bcc5c0430263d93c74.1585761021.git.jpoimboe@redhat.com --- tools/objtool/check.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/objtool/check.c b/tools/objtool/check.c index cb2d299664e7..4b170fd08a28 100644 --- a/tools/objtool/check.c +++ b/tools/objtool/check.c @@ -2005,8 +2005,8 @@ static int validate_return(struct symbol *func, struct instruction *insn, struct } if (state->bp_scratch) { - WARN("%s uses BP as a scratch register", - func->name); + WARN_FUNC("BP used as a scratch register", + insn->sec, insn->offset); return 1; } From 0e012b4e4b5ec8e064be3502382579dd0bb43269 Mon Sep 17 00:00:00 2001 From: Johannes Berg Date: Sun, 12 Apr 2020 00:40:30 +0200 Subject: [PATCH 163/331] nl80211: fix NL80211_ATTR_FTM_RESPONDER policy The nested policy here should be established using the NLA_POLICY_NESTED() macro so the length is properly filled in. 
Cc: stable@vger.kernel.org Fixes: 81e54d08d9d8 ("cfg80211: support FTM responder configuration/statistics") Link: https://lore.kernel.org/r/20200412004029.9d0722bb56c8.Ie690bfcc4a1a61ff8d8ca7e475d59fcaa52fb2da@changeid Signed-off-by: Johannes Berg --- net/wireless/nl80211.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c index 5fa402144cda..692bcd35f809 100644 --- a/net/wireless/nl80211.c +++ b/net/wireless/nl80211.c @@ -644,10 +644,8 @@ const struct nla_policy nl80211_policy[NUM_NL80211_ATTR] = { [NL80211_ATTR_HE_CAPABILITY] = { .type = NLA_BINARY, .len = NL80211_HE_MAX_CAPABILITY_LEN }, - [NL80211_ATTR_FTM_RESPONDER] = { - .type = NLA_NESTED, - .validation_data = nl80211_ftm_responder_policy, - }, + [NL80211_ATTR_FTM_RESPONDER] = + NLA_POLICY_NESTED(nl80211_ftm_responder_policy), [NL80211_ATTR_TIMEOUT] = NLA_POLICY_MIN(NLA_U32, 1), [NL80211_ATTR_PEER_MEASUREMENTS] = NLA_POLICY_NESTED(nl80211_pmsr_attr_policy), From 7ea862048317aa76d0f22334202779a25530980c Mon Sep 17 00:00:00 2001 From: Tuomas Tynkkynen Date: Fri, 10 Apr 2020 15:32:57 +0300 Subject: [PATCH 164/331] mac80211_hwsim: Use kstrndup() in place of kasprintf() syzbot reports a warning: precision 33020 too large WARNING: CPU: 0 PID: 9618 at lib/vsprintf.c:2471 set_precision+0x150/0x180 lib/vsprintf.c:2471 vsnprintf+0xa7b/0x19a0 lib/vsprintf.c:2547 kvasprintf+0xb2/0x170 lib/kasprintf.c:22 kasprintf+0xbb/0xf0 lib/kasprintf.c:59 hwsim_del_radio_nl+0x63a/0x7e0 drivers/net/wireless/mac80211_hwsim.c:3625 genl_family_rcv_msg_doit net/netlink/genetlink.c:672 [inline] ... entry_SYSCALL_64_after_hwframe+0x49/0xbe Thus it seems that kasprintf() with "%.*s" format can not be used for duplicating a string with arbitrary length. Replace it with kstrndup(). Note that later this string is limited to NL80211_WIPHY_NAME_MAXLEN == 64, but the code is simpler this way. 
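As a rough user-space analogue of the replacement, the sketch below duplicates a length-delimited buffer with POSIX strndup(), which stands in here for the kernel's kstrndup(). The helper name `dup_attr_string()` is hypothetical and only illustrates the idea: the length is passed as a plain size argument rather than as a printf precision.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: copy at most 'len' bytes from 'data' into a freshly
 * allocated, always NUL-terminated string. Returns NULL on allocation
 * failure. The source buffer does not need to be NUL-terminated.
 */
char *dup_attr_string(const void *data, size_t len)
{
	return strndup((const char *)data, len);
}
```

The caller owns the returned string and releases it with free(), much as the kernel code releases hwname with kfree().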
Reported-by: syzbot+6693adf1698864d21734@syzkaller.appspotmail.com Reported-by: syzbot+a4aee3f42d7584d76761@syzkaller.appspotmail.com Cc: stable@kernel.org Signed-off-by: Tuomas Tynkkynen Link: https://lore.kernel.org/r/20200410123257.14559-1-tuomas.tynkkynen@iki.fi [johannes: add note about length limit] Signed-off-by: Johannes Berg --- drivers/net/wireless/mac80211_hwsim.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/net/wireless/mac80211_hwsim.c b/drivers/net/wireless/mac80211_hwsim.c index 7fe8207db6ae..7c4b7c31d07a 100644 --- a/drivers/net/wireless/mac80211_hwsim.c +++ b/drivers/net/wireless/mac80211_hwsim.c @@ -3669,9 +3669,9 @@ static int hwsim_new_radio_nl(struct sk_buff *msg, struct genl_info *info) } if (info->attrs[HWSIM_ATTR_RADIO_NAME]) { - hwname = kasprintf(GFP_KERNEL, "%.*s", - nla_len(info->attrs[HWSIM_ATTR_RADIO_NAME]), - (char *)nla_data(info->attrs[HWSIM_ATTR_RADIO_NAME])); + hwname = kstrndup((char *)nla_data(info->attrs[HWSIM_ATTR_RADIO_NAME]), + nla_len(info->attrs[HWSIM_ATTR_RADIO_NAME]), + GFP_KERNEL); if (!hwname) return -ENOMEM; param.hwname = hwname; @@ -3691,9 +3691,9 @@ static int hwsim_del_radio_nl(struct sk_buff *msg, struct genl_info *info) if (info->attrs[HWSIM_ATTR_RADIO_ID]) { idx = nla_get_u32(info->attrs[HWSIM_ATTR_RADIO_ID]); } else if (info->attrs[HWSIM_ATTR_RADIO_NAME]) { - hwname = kasprintf(GFP_KERNEL, "%.*s", - nla_len(info->attrs[HWSIM_ATTR_RADIO_NAME]), - (char *)nla_data(info->attrs[HWSIM_ATTR_RADIO_NAME])); + hwname = kstrndup((char *)nla_data(info->attrs[HWSIM_ATTR_RADIO_NAME]), + nla_len(info->attrs[HWSIM_ATTR_RADIO_NAME]), + GFP_KERNEL); if (!hwname) return -ENOMEM; } else From a710d21451ff2917b9004b65ba2f0db6380671d5 Mon Sep 17 00:00:00 2001 From: Lothar Rubusch Date: Wed, 8 Apr 2020 23:10:13 +0000 Subject: [PATCH 165/331] cfg80211: fix kernel-doc notation Update missing kernel-doc annotations and fix of related warnings at 'make htmldocs'. 
Signed-off-by: Lothar Rubusch Link: https://lore.kernel.org/r/20200408231013.28370-1-l.rubusch@gmail.com [fix indentation, attribute references] Signed-off-by: Johannes Berg --- include/net/cfg80211.h | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h index c78bd4ff9e33..70e48f66dac8 100644 --- a/include/net/cfg80211.h +++ b/include/net/cfg80211.h @@ -905,6 +905,8 @@ struct survey_info { * protocol frames. * @control_port_over_nl80211: TRUE if userspace expects to exchange control * port frames over NL80211 instead of the network interface. + * @control_port_no_preauth: disables pre-auth rx over the nl80211 control + * port for mac80211 * @wep_keys: static WEP keys, if not NULL points to an array of * CFG80211_MAX_WEP_KEYS WEP keys * @wep_tx_key: key index (0..3) of the default TX static WEP key @@ -1222,6 +1224,7 @@ struct sta_txpwr { * @he_capa: HE capabilities of station * @he_capa_len: the length of the HE capabilities * @airtime_weight: airtime scheduler weight for this station + * @txpwr: transmit power for an associated station */ struct station_parameters { const u8 *supported_rates; @@ -4666,6 +4669,9 @@ struct wiphy_iftype_akm_suites { * @txq_memory_limit: configuration internal TX queue memory limit * @txq_quantum: configuration of internal TX queue scheduler quantum * + * @tx_queue_len: allow setting transmit queue len for drivers not using + * wake_tx_queue + * * @support_mbssid: can HW support association with nontransmitted AP * @support_only_he_mbssid: don't parse MBSSID elements if it is not * HE AP, in order to avoid compatibility issues. 
@@ -4681,6 +4687,10 @@ struct wiphy_iftype_akm_suites { * supported by the driver for each peer * @tid_config_support.max_retry: maximum supported retry count for * long/short retry configuration + * + * @max_data_retry_count: maximum supported per TID retry count for + * configuration through the %NL80211_TID_CONFIG_ATTR_RETRY_SHORT and + * %NL80211_TID_CONFIG_ATTR_RETRY_LONG attributes */ struct wiphy { /* assign these fields before you register the wiphy */ From bab1a501e6587590dda4c6cd92250cfedcd1553f Mon Sep 17 00:00:00 2001 From: Arnaldo Carvalho de Melo Date: Wed, 1 Apr 2020 12:12:19 -0300 Subject: [PATCH 166/331] tools arch x86: Sync the msr-index.h copy with the kernel sources To pick up the changes in: 6650cdd9a8cc ("x86/split_lock: Enable split lock detection by kernel") Warning: Kernel ABI header at 'tools/arch/x86/include/asm/msr-index.h' differs from latest version at 'arch/x86/include/asm/msr-index.h' diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h Which causes these changes in tooling: $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before $ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after $ diff -u before after --- before 2020-04-01 12:11:14.789344795 -0300 +++ after 2020-04-01 12:11:56.907798879 -0300 @@ -10,6 +10,7 @@ [0x00000029] = "KNC_EVNTSEL1", [0x0000002a] = "IA32_EBL_CR_POWERON", [0x0000002c] = "EBC_FREQUENCY_ID", + [0x00000033] = "TEST_CTRL", [0x00000034] = "SMI_COUNT", [0x0000003a] = "IA32_FEAT_CTL", [0x0000003b] = "IA32_TSC_ADJUST", @@ -27,6 +28,7 @@ [0x000000c2] = "IA32_PERFCTR1", [0x000000cd] = "FSB_FREQ", [0x000000ce] = "PLATFORM_INFO", + [0x000000cf] = "IA32_CORE_CAPS", [0x000000e2] = "PKG_CST_CONFIG_CONTROL", [0x000000e7] = "IA32_MPERF", [0x000000e8] = "IA32_APERF", $ $ make -C tools/perf O=/tmp/build/perf install-bin CC /tmp/build/perf/trace/beauty/tracepoints/x86_msr.o LD 
/tmp/build/perf/trace/beauty/tracepoints/perf-in.o LD /tmp/build/perf/trace/beauty/perf-in.o LD /tmp/build/perf/perf-in.o LINK /tmp/build/perf/perf Now one can do: perf trace -e msr:* --filter=msr==IA32_CORE_CAPS or: perf trace -e msr:* --filter='msr==IA32_CORE_CAPS || msr==TEST_CTRL' And see only those MSRs being accessed via: # perf trace -v -e msr:* --filter='msr==IA32_CORE_CAPS || msr==TEST_CTRL' New filter for msr:read_msr: (msr==0xcf || msr==0x33) && (common_pid != 8263 && common_pid != 23250) New filter for msr:write_msr: (msr==0xcf || msr==0x33) && (common_pid != 8263 && common_pid != 23250) New filter for msr:rdpmc: (msr==0xcf || msr==0x33) && (common_pid != 8263 && common_pid != 23250) Cc: Adrian Hunter Cc: Borislav Petkov Cc: Jiri Olsa Cc: Namhyung Kim Cc: Peter Zijlstra (Intel) Link: https://lore.kernel.org/lkml/20200401153325.GC12534@kernel.org/ Signed-off-by: Arnaldo Carvalho de Melo --- tools/arch/x86/include/asm/msr-index.h | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/tools/arch/x86/include/asm/msr-index.h b/tools/arch/x86/include/asm/msr-index.h index d5e517d1c3dd..12c9684d59ba 100644 --- a/tools/arch/x86/include/asm/msr-index.h +++ b/tools/arch/x86/include/asm/msr-index.h @@ -41,6 +41,10 @@ /* Intel MSRs. 
Some also available on other CPUs */ +#define MSR_TEST_CTRL 0x00000033 +#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT 29 +#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT BIT(MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT) + #define MSR_IA32_SPEC_CTRL 0x00000048 /* Speculation Control */ #define SPEC_CTRL_IBRS BIT(0) /* Indirect Branch Restricted Speculation */ #define SPEC_CTRL_STIBP_SHIFT 1 /* Single Thread Indirect Branch Predictor (STIBP) bit */ @@ -70,6 +74,11 @@ */ #define MSR_IA32_UMWAIT_CONTROL_TIME_MASK (~0x03U) +/* Abbreviated from Intel SDM name IA32_CORE_CAPABILITIES */ +#define MSR_IA32_CORE_CAPS 0x000000cf +#define MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT_BIT 5 +#define MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT BIT(MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT_BIT) + #define MSR_PKG_CST_CONFIG_CONTROL 0x000000e2 #define NHM_C3_AUTO_DEMOTE (1UL << 25) #define NHM_C1_AUTO_DEMOTE (1UL << 26) From 9a00df311b5c1dfb7284ea22070772803dd0c95e Mon Sep 17 00:00:00 2001 From: Arnaldo Carvalho de Melo Date: Mon, 6 Apr 2020 10:30:16 -0300 Subject: [PATCH 167/331] perf python: Check if clang supports -fno-semantic-interposition The set of C compiler options used by distros to build python bindings may include options that are unknown to clang, we check for a variety of such options, add -fno-semantic-interposition to that mix: This fixes the build on, among others, Manjaro Linux: GEN /tmp/build/perf/python/perf.so clang-9: error: unknown argument: '-fno-semantic-interposition' error: command 'clang' failed with exit status 1 make: Leaving directory '/git/perf/tools/perf' [perfbuilder@602aed1c266d ~]$ gcc -v Using built-in specs. 
COLLECT_GCC=gcc COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-linux-gnu/9.3.0/lto-wrapper Target: x86_64-pc-linux-gnu Configured with: /build/gcc/src/gcc/configure --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-pkgversion='Arch Linux 9.3.0-1' --with-bugurl=https://bugs.archlinux.org/ --enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++,d --enable-shared --enable-threads=posix --with-system-zlib --with-isl --enable-__cxa_atexit --disable-libunwind-exceptions --enable-clocale=gnu --disable-libstdcxx-pch --disable-libssp --enable-gnu-unique-object --enable-linker-build-id --enable-lto --enable-plugin --enable-install-libiberty --with-linker-hash-style=gnu --enable-gnu-indirect-function --enable-multilib --disable-werror --enable-checking=release --enable-default-pie --enable-default-ssp --enable-cet=auto gdc_include_dir=/usr/include/dlang/gdc Thread model: posix gcc version 9.3.0 (Arch Linux 9.3.0-1) [perfbuilder@602aed1c266d ~]$ Cc: Adrian Hunter Cc: Jiri Olsa Cc: Namhyung Kim Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/util/setup.py | 2 ++ 1 file changed, 2 insertions(+) diff --git a/tools/perf/util/setup.py b/tools/perf/util/setup.py index 347b2c0789e4..c5e3e9a68162 100644 --- a/tools/perf/util/setup.py +++ b/tools/perf/util/setup.py @@ -21,6 +21,8 @@ if cc_is_clang: vars[var] = sub("-fstack-clash-protection", "", vars[var]) if not clang_has_option("-fstack-protector-strong"): vars[var] = sub("-fstack-protector-strong", "", vars[var]) + if not clang_has_option("-fno-semantic-interposition"): + vars[var] = sub("-fno-semantic-interposition", "", vars[var]) from distutils.core import setup, Extension From 8358f698ec9d8467ad00c045e4d83c3e4acc7db4 Mon Sep 17 00:00:00 2001 From: Jin Yao Date: Wed, 1 Apr 2020 02:02:26 +0800 Subject: [PATCH 168/331] perf stat: Fix no metric header if --per-socket and --metric-only set We received a report that was no metric header displayed if --per-socket and 
--metric-only were both set. It's hard for script to parse the perf-stat output. This patch fixes this issue. Before: root@kbl-ppc:~# perf stat -a -M CPI --metric-only --per-socket ^C Performance counter stats for 'system wide': S0 8 2.6 2.215270071 seconds time elapsed root@kbl-ppc:~# perf stat -a -M CPI --metric-only --per-socket -I1000 # time socket cpus 1.000411692 S0 8 2.2 2.001547952 S0 8 3.4 3.002446511 S0 8 3.4 4.003346157 S0 8 4.0 5.004245736 S0 8 0.3 After: root@kbl-ppc:~# perf stat -a -M CPI --metric-only --per-socket ^C Performance counter stats for 'system wide': CPI S0 8 2.1 1.813579830 seconds time elapsed root@kbl-ppc:~# perf stat -a -M CPI --metric-only --per-socket -I1000 # time socket cpus CPI 1.000415122 S0 8 3.2 2.001630051 S0 8 2.9 3.002612278 S0 8 4.3 4.003523594 S0 8 3.0 5.004504256 S0 8 3.7 Signed-off-by: Jin Yao Acked-by: Jiri Olsa Cc: Alexander Shishkin Cc: Andi Kleen Cc: Kan Liang Cc: Peter Zijlstra Link: http://lore.kernel.org/lkml/20200331180226.25915-1-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/util/stat-shadow.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c index 0fd713d3674f..03ecb8cd0eec 100644 --- a/tools/perf/util/stat-shadow.c +++ b/tools/perf/util/stat-shadow.c @@ -803,8 +803,11 @@ static void generic_metric(struct perf_stat_config *config, out->force_header ? (metric_name ? metric_name : name) : "", 0); } - } else - print_metric(config, ctxp, NULL, NULL, "", 0); + } else { + print_metric(config, ctxp, NULL, NULL, + out->force_header ? + (metric_name ? 
metric_name : name) : "", 0); + } for (i = 1; i < pctx.num_ids; i++) zfree(&pctx.ids[i].name); From ca64d84e93762f4e587e040a44ad9f6089afc777 Mon Sep 17 00:00:00 2001 From: Arnaldo Carvalho de Melo Date: Tue, 14 Apr 2020 08:52:23 -0300 Subject: [PATCH 169/331] tools headers: Update linux/vdso.h and grab a copy of vdso/const.h To get in line with: 8165b57bca21 ("linux/const.h: Extract common header for vDSO") And silence this tools/perf/ build warning: Warning: Kernel ABI header at 'tools/include/linux/const.h' differs from latest version at 'include/linux/const.h' diff -u tools/include/linux/const.h include/linux/const.h Cc: Adrian Hunter Cc: Jiri Olsa Cc: Namhyung Kim Cc: Thomas Gleixner Cc: Vincenzo Frascino Signed-off-by: Arnaldo Carvalho de Melo --- tools/include/linux/const.h | 5 +---- tools/include/vdso/const.h | 10 ++++++++++ tools/perf/check-headers.sh | 1 + 3 files changed, 12 insertions(+), 4 deletions(-) create mode 100644 tools/include/vdso/const.h diff --git a/tools/include/linux/const.h b/tools/include/linux/const.h index 7b55a55f5911..81b8aae5a855 100644 --- a/tools/include/linux/const.h +++ b/tools/include/linux/const.h @@ -1,9 +1,6 @@ #ifndef _LINUX_CONST_H #define _LINUX_CONST_H -#include - -#define UL(x) (_UL(x)) -#define ULL(x) (_ULL(x)) +#include #endif /* _LINUX_CONST_H */ diff --git a/tools/include/vdso/const.h b/tools/include/vdso/const.h new file mode 100644 index 000000000000..94b385ad438d --- /dev/null +++ b/tools/include/vdso/const.h @@ -0,0 +1,10 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __VDSO_CONST_H +#define __VDSO_CONST_H + +#include + +#define UL(x) (_UL(x)) +#define ULL(x) (_ULL(x)) + +#endif /* __VDSO_CONST_H */ diff --git a/tools/perf/check-headers.sh b/tools/perf/check-headers.sh index bfb21d049e6c..c905c683606a 100755 --- a/tools/perf/check-headers.sh +++ b/tools/perf/check-headers.sh @@ -23,6 +23,7 @@ include/uapi/linux/vhost.h include/uapi/sound/asound.h include/linux/bits.h include/linux/const.h 
+include/vdso/const.h include/linux/hash.h include/uapi/linux/hw_breakpoint.h arch/x86/include/asm/disabled-features.h From 027fa8fb63635fb984a86ec3106bc8417e074019 Mon Sep 17 00:00:00 2001 From: Arnaldo Carvalho de Melo Date: Tue, 14 Apr 2020 09:01:08 -0300 Subject: [PATCH 170/331] tools headers UAPI: Sync sched.h with the kernel To get the changes in: ef2c41cf38a7 ("clone3: allow spawning processes into cgroups") Add that to 'perf trace's clone 'flags' decoder. This silences this perf build warning: Warning: Kernel ABI header at 'tools/include/uapi/linux/sched.h' differs from latest version at 'include/uapi/linux/sched.h' diff -u tools/include/uapi/linux/sched.h include/uapi/linux/sched.h Cc: Adrian Hunter Cc: Christian Brauner Cc: Jiri Olsa Cc: Namhyung Kim Cc: Tejun Heo Signed-off-by: Arnaldo Carvalho de Melo --- tools/include/uapi/linux/sched.h | 5 +++++ tools/perf/trace/beauty/clone.c | 1 + 2 files changed, 6 insertions(+) diff --git a/tools/include/uapi/linux/sched.h b/tools/include/uapi/linux/sched.h index 2e3bc22c6f20..3bac0a8ceab2 100644 --- a/tools/include/uapi/linux/sched.h +++ b/tools/include/uapi/linux/sched.h @@ -35,6 +35,7 @@ /* Flags for the clone3() syscall. */ #define CLONE_CLEAR_SIGHAND 0x100000000ULL /* Clear any signal handler and reset to SIG_DFL. */ +#define CLONE_INTO_CGROUP 0x200000000ULL /* Clone into a specific cgroup given the right permissions. */ /* * cloning flags intersect with CSIGNAL so can be used with unshare and clone3 @@ -81,6 +82,8 @@ * @set_tid_size: This defines the size of the array referenced * in @set_tid. This cannot be larger than the * kernel's limit of nested PID namespaces. + * @cgroup: If CLONE_INTO_CGROUP is specified set this to + * a file descriptor for the cgroup. * * The structure is versioned by size and thus extensible. 
* New struct members must go at the end of the struct and @@ -97,11 +100,13 @@ struct clone_args { __aligned_u64 tls; __aligned_u64 set_tid; __aligned_u64 set_tid_size; + __aligned_u64 cgroup; }; #endif #define CLONE_ARGS_SIZE_VER0 64 /* sizeof first published struct */ #define CLONE_ARGS_SIZE_VER1 80 /* sizeof second published struct */ +#define CLONE_ARGS_SIZE_VER2 88 /* sizeof third published struct */ /* * Scheduling policies diff --git a/tools/perf/trace/beauty/clone.c b/tools/perf/trace/beauty/clone.c index 062ca849c8fd..f4db894e0af6 100644 --- a/tools/perf/trace/beauty/clone.c +++ b/tools/perf/trace/beauty/clone.c @@ -46,6 +46,7 @@ static size_t clone__scnprintf_flags(unsigned long flags, char *bf, size_t size, P_FLAG(NEWNET); P_FLAG(IO); P_FLAG(CLEAR_SIGHAND); + P_FLAG(INTO_CGROUP); #undef P_FLAG if (flags) From f60b3878f47311a61fe2d4c5ef77c52e31554c52 Mon Sep 17 00:00:00 2001 From: Arnaldo Carvalho de Melo Date: Tue, 14 Apr 2020 09:04:53 -0300 Subject: [PATCH 171/331] tools headers UAPI: Sync linux/mman.h with the kernel To get the changes in: e346b3813067 ("mm/mremap: add MREMAP_DONTUNMAP to mremap()") Add that to 'perf trace's mremap 'flags' decoder. 
This silences this perf build warning: Warning: Kernel ABI header at 'tools/include/uapi/linux/mman.h' differs from latest version at 'include/uapi/linux/mman.h' diff -u tools/include/uapi/linux/mman.h include/uapi/linux/mman.h Cc: Adrian Hunter Cc: Jiri Olsa Cc: Namhyung Kim Cc: Brian Geffon Signed-off-by: Arnaldo Carvalho de Melo --- tools/include/uapi/linux/mman.h | 5 +++-- tools/perf/trace/beauty/mmap.c | 1 + 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/tools/include/uapi/linux/mman.h b/tools/include/uapi/linux/mman.h index fc1a64c3447b..923cc162609c 100644 --- a/tools/include/uapi/linux/mman.h +++ b/tools/include/uapi/linux/mman.h @@ -5,8 +5,9 @@ #include #include -#define MREMAP_MAYMOVE 1 -#define MREMAP_FIXED 2 +#define MREMAP_MAYMOVE 1 +#define MREMAP_FIXED 2 +#define MREMAP_DONTUNMAP 4 #define OVERCOMMIT_GUESS 0 #define OVERCOMMIT_ALWAYS 1 diff --git a/tools/perf/trace/beauty/mmap.c b/tools/perf/trace/beauty/mmap.c index 9fa771a90d79..862c8331dded 100644 --- a/tools/perf/trace/beauty/mmap.c +++ b/tools/perf/trace/beauty/mmap.c @@ -69,6 +69,7 @@ static size_t syscall_arg__scnprintf_mremap_flags(char *bf, size_t size, P_MREMAP_FLAG(MAYMOVE); P_MREMAP_FLAG(FIXED); + P_MREMAP_FLAG(DONTUNMAP); #undef P_MREMAP_FLAG if (flags) From e00a2d907ec9bf0e8a46543857f25ce95980c341 Mon Sep 17 00:00:00 2001 From: Arnaldo Carvalho de Melo Date: Tue, 14 Apr 2020 09:08:23 -0300 Subject: [PATCH 172/331] tools arch x86: Sync asm/cpufeatures.h with the kernel sources To pick up the changes from: 077168e241ec ("x86/mce/amd: Add PPIN support for AMD MCE") 753039ef8b2f ("x86/cpu/amd: Call init_amd_zn() om Family 19h processors too") 6650cdd9a8cc ("x86/split_lock: Enable split lock detection by kernel") These don't cause any changes in tooling, just silences this perf build warning: Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h' diff -u 
tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h Cc: Adrian Hunter Cc: Jiri Olsa Cc: Namhyung Kim Cc: Borislav Petkov Cc: Kim Phillips Cc: Peter Zijlstra (Intel) Cc: Wei Huang Signed-off-by: Arnaldo Carvalho de Melo --- tools/arch/x86/include/asm/cpufeatures.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/tools/arch/x86/include/asm/cpufeatures.h b/tools/arch/x86/include/asm/cpufeatures.h index f3327cb56edf..db189945e9b0 100644 --- a/tools/arch/x86/include/asm/cpufeatures.h +++ b/tools/arch/x86/include/asm/cpufeatures.h @@ -217,7 +217,7 @@ #define X86_FEATURE_IBRS ( 7*32+25) /* Indirect Branch Restricted Speculation */ #define X86_FEATURE_IBPB ( 7*32+26) /* Indirect Branch Prediction Barrier */ #define X86_FEATURE_STIBP ( 7*32+27) /* Single Thread Indirect Branch Predictors */ -#define X86_FEATURE_ZEN ( 7*32+28) /* "" CPU is AMD family 0x17 (Zen) */ +#define X86_FEATURE_ZEN ( 7*32+28) /* "" CPU is AMD family 0x17 or above (Zen) */ #define X86_FEATURE_L1TF_PTEINV ( 7*32+29) /* "" L1TF workaround PTE inversion */ #define X86_FEATURE_IBRS_ENHANCED ( 7*32+30) /* Enhanced IBRS */ #define X86_FEATURE_MSR_IA32_FEAT_CTL ( 7*32+31) /* "" MSR IA32_FEAT_CTL configured */ @@ -285,6 +285,7 @@ #define X86_FEATURE_CQM_MBM_LOCAL (11*32+ 3) /* LLC Local MBM monitoring */ #define X86_FEATURE_FENCE_SWAPGS_USER (11*32+ 4) /* "" LFENCE in user entry SWAPGS path */ #define X86_FEATURE_FENCE_SWAPGS_KERNEL (11*32+ 5) /* "" LFENCE in kernel entry SWAPGS path */ +#define X86_FEATURE_SPLIT_LOCK_DETECT (11*32+ 6) /* #AC for split lock */ /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */ #define X86_FEATURE_AVX512_BF16 (12*32+ 5) /* AVX512 BFLOAT16 instructions */ @@ -299,6 +300,7 @@ #define X86_FEATURE_AMD_IBRS (13*32+14) /* "" Indirect Branch Restricted Speculation */ #define X86_FEATURE_AMD_STIBP (13*32+15) /* "" Single Thread Indirect Branch Predictors */ #define X86_FEATURE_AMD_STIBP_ALWAYS_ON (13*32+17) /* "" 
Single Thread Indirect Branch Predictors always-on preferred */ +#define X86_FEATURE_AMD_PPIN (13*32+23) /* Protected Processor Inventory Number */ #define X86_FEATURE_AMD_SSBD (13*32+24) /* "" Speculative Store Bypass Disable */ #define X86_FEATURE_VIRT_SSBD (13*32+25) /* Virtualized Speculative Store Bypass Disable */ #define X86_FEATURE_AMD_SSB_NO (13*32+26) /* "" Speculative Store Bypass is fixed in hardware. */ @@ -367,6 +369,7 @@ #define X86_FEATURE_INTEL_STIBP (18*32+27) /* "" Single Thread Indirect Branch Predictors */ #define X86_FEATURE_FLUSH_L1D (18*32+28) /* Flush L1D cache */ #define X86_FEATURE_ARCH_CAPABILITIES (18*32+29) /* IA32_ARCH_CAPABILITIES MSR (Intel) */ +#define X86_FEATURE_CORE_CAPABILITIES (18*32+30) /* "" IA32_CORE_CAPABILITIES MSR */ #define X86_FEATURE_SPEC_CTRL_SSBD (18*32+31) /* "" Speculative Store Bypass Disable */ /* From 7dc7c41607d192ff660ba4ea82d517745c1d7523 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Wed, 8 Apr 2020 20:53:51 +0200 Subject: [PATCH 173/331] rtw88: avoid unused function warnings The rtw88 driver defines empty functions with multiple indirections but gets one of these wrong: drivers/net/wireless/realtek/rtw88/pci.c:1347:12: error: 'rtw_pci_resume' defined but not used [-Werror=unused-function] 1347 | static int rtw_pci_resume(struct device *dev) | ^~~~~~~~~~~~~~ drivers/net/wireless/realtek/rtw88/pci.c:1342:12: error: 'rtw_pci_suspend' defined but not used [-Werror=unused-function] 1342 | static int rtw_pci_suspend(struct device *dev) Better simplify it to rely on the conditional reference in SIMPLE_DEV_PM_OPS(), and mark the functions as __maybe_unused to avoid warning about it. I'm not sure if these are needed at all given that the functions don't do anything, but they were only recently added.
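The pattern can be illustrated outside the kernel. The sketch below is only loosely modelled on SIMPLE_DEV_PM_OPS(): `struct demo_pm_ops`, `DEMO_PM_OPS()`, `DEMO_PM_SLEEP` and the `demo_*` functions are hypothetical names, and `demo_maybe_unused` mirrors the GCC/Clang attribute that the kernel's __maybe_unused expands to.

```c
#include <stddef.h>

/* The kernel's __maybe_unused expands to this GCC/Clang attribute. */
#define demo_maybe_unused __attribute__((__unused__))

/* Hypothetical stand-in for struct dev_pm_ops. */
struct demo_pm_ops {
	int (*suspend)(void *dev);
	int (*resume)(void *dev);
};

/* Defined unconditionally; the attribute silences -Wunused-function. */
static int demo_maybe_unused demo_suspend(void *dev)
{
	(void)dev;
	return 0;
}

static int demo_maybe_unused demo_resume(void *dev)
{
	(void)dev;
	return 0;
}

/*
 * The callbacks are referenced only when DEMO_PM_SLEEP is defined, so in
 * the other configuration they are unused. Either way the ops structure
 * itself always exists and can be pointed to unconditionally.
 */
#ifdef DEMO_PM_SLEEP
#define DEMO_PM_OPS(name, s, r) \
	const struct demo_pm_ops name = { .suspend = (s), .resume = (r) }
#else
#define DEMO_PM_OPS(name, s, r) \
	const struct demo_pm_ops name = { NULL, NULL }
#endif

DEMO_PM_OPS(demo_ops, demo_suspend, demo_resume);
```

Because the structure is always defined, the driver table can take its address directly, which is what removes the need for a separate #ifdef block and a NULL fallback macro.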
Fixes: 44bc17f7f5b3 ("rtw88: support wowlan feature for 8822c") Signed-off-by: Arnd Bergmann Signed-off-by: Kalle Valo Link: https://lore.kernel.org/r/20200408185413.218643-1-arnd@arndb.de --- drivers/net/wireless/realtek/rtw88/pci.c | 11 +++-------- 1 file changed, 3 insertions(+), 8 deletions(-) diff --git a/drivers/net/wireless/realtek/rtw88/pci.c b/drivers/net/wireless/realtek/rtw88/pci.c index e37c71495c0d..1af87eb2e53a 100644 --- a/drivers/net/wireless/realtek/rtw88/pci.c +++ b/drivers/net/wireless/realtek/rtw88/pci.c @@ -1338,22 +1338,17 @@ static void rtw_pci_phy_cfg(struct rtw_dev *rtwdev) rtw_pci_link_cfg(rtwdev); } -#ifdef CONFIG_PM -static int rtw_pci_suspend(struct device *dev) +static int __maybe_unused rtw_pci_suspend(struct device *dev) { return 0; } -static int rtw_pci_resume(struct device *dev) +static int __maybe_unused rtw_pci_resume(struct device *dev) { return 0; } static SIMPLE_DEV_PM_OPS(rtw_pm_ops, rtw_pci_suspend, rtw_pci_resume); -#define RTW_PM_OPS (&rtw_pm_ops) -#else -#define RTW_PM_OPS NULL -#endif static int rtw_pci_claim(struct rtw_dev *rtwdev, struct pci_dev *pdev) { @@ -1582,7 +1577,7 @@ static struct pci_driver rtw_pci_driver = { .id_table = rtw_pci_id_table, .probe = rtw_pci_probe, .remove = rtw_pci_remove, - .driver.pm = RTW_PM_OPS, + .driver.pm = &rtw_pm_ops, }; module_pci_driver(rtw_pci_driver); From 6b51fd3f65a22e3d1471b18a1d56247e246edd46 Mon Sep 17 00:00:00 2001 From: Juergen Gross Date: Thu, 26 Mar 2020 09:03:58 +0100 Subject: [PATCH 174/331] xen/xenbus: ensure xenbus_map_ring_valloc() returns proper grant status xenbus_map_ring_valloc() maps a ring page and returns the status of the used grant (0 meaning success). There are Xen hypervisors which might return the value 1 for the status of a failed grant mapping due to a bug. Some callers of xenbus_map_ring_valloc() test for errors by testing the returned status to be less than zero, resulting in no error detected and crashing later due to a not available ring page. 
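The failure mode above can be sketched in a few lines: a caller that only tests for negative values misses a status of 1, whereas folding positive values into a negative error makes that same check reliable. The names below are hypothetical, and `STATUS_GENERAL_ERROR` is an illustrative stand-in for Xen's GNTST_general_error rather than the real definition.

```c
/* Illustrative stand-in for Xen's GNTST_general_error. */
#define STATUS_GENERAL_ERROR (-1)

/* Hypothetical helper: fold any positive (buggy) status into an error. */
int normalize_grant_status(int status)
{
	if (status > 0)
		return STATUS_GENERAL_ERROR;
	return status;
}

/* A caller that only treats negative values as failure. */
int caller_sees_failure(int status)
{
	return status < 0;
}
```

With the normalization in place, callers written against the "negative means failure" convention need no changes.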
Set the return value of xenbus_map_ring_valloc() to GNTST_general_error in case the grant status reported by Xen is greater than zero. This is part of XSA-316. Signed-off-by: Juergen Gross Reviewed-by: Wei Liu Link: https://lore.kernel.org/r/20200326080358.1018-1-jgross@suse.com Signed-off-by: Juergen Gross --- drivers/xen/xenbus/xenbus_client.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/drivers/xen/xenbus/xenbus_client.c b/drivers/xen/xenbus/xenbus_client.c index 385843256865..040d2a43e8e3 100644 --- a/drivers/xen/xenbus/xenbus_client.c +++ b/drivers/xen/xenbus/xenbus_client.c @@ -448,7 +448,14 @@ EXPORT_SYMBOL_GPL(xenbus_free_evtchn); int xenbus_map_ring_valloc(struct xenbus_device *dev, grant_ref_t *gnt_refs, unsigned int nr_grefs, void **vaddr) { - return ring_ops->map(dev, gnt_refs, nr_grefs, vaddr); + int err; + + err = ring_ops->map(dev, gnt_refs, nr_grefs, vaddr); + /* Some hypervisors are buggy and can return 1. */ + if (err > 0) + err = GNTST_general_error; + + return err; } EXPORT_SYMBOL_GPL(xenbus_map_ring_valloc); From 3df4d4bf3c6cf3b7509ff11d7b02b64603b43d24 Mon Sep 17 00:00:00 2001 From: Arnaldo Carvalho de Melo Date: Tue, 14 Apr 2020 09:12:55 -0300 Subject: [PATCH 175/331] tools include UAPI: Sync linux/vhost.h with the kernel sources To get the changes in: 4c8cf31885f6 ("vhost: introduce vDPA-based backend") Silencing this perf build warning: Warning: Kernel ABI header at 'tools/include/uapi/linux/vhost.h' differs from latest version at 'include/uapi/linux/vhost.h' diff -u tools/include/uapi/linux/vhost.h include/uapi/linux/vhost.h This automatically picks these new ioctls, making tools such as 'perf trace' aware of them and possibly allowing to use the strings in filters, etc: $ tools/perf/trace/beauty/vhost_virtio_ioctl.sh > before $ cp include/uapi/linux/vhost.h tools/include/uapi/linux/vhost.h $ tools/perf/trace/beauty/vhost_virtio_ioctl.sh > after $ diff -u before after --- before 2020-04-14 09:12:28.559748968 
-0300 +++ after 2020-04-14 09:12:38.781696242 -0300 @@ -24,9 +24,16 @@ [0x44] = "SCSI_GET_EVENTS_MISSED", [0x60] = "VSOCK_SET_GUEST_CID", [0x61] = "VSOCK_SET_RUNNING", + [0x72] = "VDPA_SET_STATUS", + [0x74] = "VDPA_SET_CONFIG", + [0x75] = "VDPA_SET_VRING_ENABLE", }; static const char *vhost_virtio_ioctl_read_cmds[] = { [0x00] = "GET_FEATURES", [0x12] = "GET_VRING_BASE", [0x26] = "GET_BACKEND_FEATURES", + [0x70] = "VDPA_GET_DEVICE_ID", + [0x71] = "VDPA_GET_STATUS", + [0x73] = "VDPA_GET_CONFIG", + [0x76] = "VDPA_GET_VRING_NUM", }; $ Cc: Adrian Hunter Cc: Jiri Olsa Cc: Michael S. Tsirkin Cc: Namhyung Kim Cc: Tiwei Bie Signed-off-by: Arnaldo Carvalho de Melo --- tools/include/uapi/linux/vhost.h | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+) diff --git a/tools/include/uapi/linux/vhost.h b/tools/include/uapi/linux/vhost.h index 40d028eed645..9fe72e4b1373 100644 --- a/tools/include/uapi/linux/vhost.h +++ b/tools/include/uapi/linux/vhost.h @@ -116,4 +116,28 @@ #define VHOST_VSOCK_SET_GUEST_CID _IOW(VHOST_VIRTIO, 0x60, __u64) #define VHOST_VSOCK_SET_RUNNING _IOW(VHOST_VIRTIO, 0x61, int) +/* VHOST_VDPA specific defines */ + +/* Get the device id. The device ids follow the same definition of + * the device id defined in virtio-spec. + */ +#define VHOST_VDPA_GET_DEVICE_ID _IOR(VHOST_VIRTIO, 0x70, __u32) +/* Get and set the status. The status bits follow the same definition + * of the device status defined in virtio-spec. + */ +#define VHOST_VDPA_GET_STATUS _IOR(VHOST_VIRTIO, 0x71, __u8) +#define VHOST_VDPA_SET_STATUS _IOW(VHOST_VIRTIO, 0x72, __u8) +/* Get and set the device config. The device config follows the same + * definition of the device config defined in virtio-spec. + */ +#define VHOST_VDPA_GET_CONFIG _IOR(VHOST_VIRTIO, 0x73, \ + struct vhost_vdpa_config) +#define VHOST_VDPA_SET_CONFIG _IOW(VHOST_VIRTIO, 0x74, \ + struct vhost_vdpa_config) +/* Enable/disable the ring. 
*/ +#define VHOST_VDPA_SET_VRING_ENABLE _IOW(VHOST_VIRTIO, 0x75, \ + struct vhost_vring_state) +/* Get the max ring size. */ +#define VHOST_VDPA_GET_VRING_NUM _IOR(VHOST_VIRTIO, 0x76, __u16) + #endif From 1abcb9d96dada352c7721105f4efec16784ae87c Mon Sep 17 00:00:00 2001 From: Arnaldo Carvalho de Melo Date: Tue, 14 Apr 2020 09:17:22 -0300 Subject: [PATCH 176/331] tools headers UAPI: Sync linux/fscrypt.h with the kernel sources To pick the changes from: e98ad464750c ("fscrypt: add FS_IOC_GET_ENCRYPTION_NONCE ioctl") That don't trigger any changes in tooling. This silences this perf build warning: Warning: Kernel ABI header at 'tools/include/uapi/linux/fscrypt.h' differs from latest version at 'include/uapi/linux/fscrypt.h' diff -u tools/include/uapi/linux/fscrypt.h include/uapi/linux/fscrypt.h In time we should come up with something like: $ tools/perf/trace/beauty/fsconfig.sh static const char *fsconfig_cmds[] = { [0] = "SET_FLAG", [1] = "SET_STRING", [2] = "SET_BINARY", [3] = "SET_PATH", [4] = "SET_PATH_EMPTY", [5] = "SET_FD", [6] = "CMD_CREATE", [7] = "CMD_RECONFIGURE", }; $ And: $ tools/perf/trace/beauty/drm_ioctl.sh | head #ifndef DRM_COMMAND_BASE #define DRM_COMMAND_BASE 0x40 #endif static const char *drm_ioctl_cmds[] = { [0x00] = "VERSION", [0x01] = "GET_UNIQUE", [0x02] = "GET_MAGIC", [0x03] = "IRQ_BUSID", [0x04] = "GET_MAP", [0x05] = "GET_CLIENT", $ For fscrypt's ioctls. 
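As a rough idea of what such a table could look like for fscrypt, here is a minimal sketch in the style of the tables above. The ioctl numbers 24 to 27 are the ones visible in the fscrypt.h hunk of this patch; the table name fscrypt_ioctl_cmds and the helper fscrypt_ioctl_name() are hypothetical, no such generator script exists yet.

```c
#include <stddef.h>

/* Hypothetical table: indices are the 'f' ioctl numbers from fscrypt.h. */
static const char *fscrypt_ioctl_cmds[] = {
	[24] = "REMOVE_ENCRYPTION_KEY",
	[25] = "REMOVE_ENCRYPTION_KEY_ALL_USERS",
	[26] = "GET_ENCRYPTION_KEY_STATUS",
	[27] = "GET_ENCRYPTION_NONCE",
};

/* Hypothetical lookup: returns NULL for numbers without a known name. */
static const char *fscrypt_ioctl_name(unsigned int nr)
{
	size_t n = sizeof(fscrypt_ioctl_cmds) / sizeof(fscrypt_ioctl_cmds[0]);

	if (nr >= n)
		return NULL;
	/* Slots not listed in the initializer are zero-filled, i.e. NULL. */
	return fscrypt_ioctl_cmds[nr];
}
```

Designated initializers keep the table sparse and indexed directly by ioctl number, which is the same layout the existing beauty tables use.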
Cc: Adrian Hunter Cc: Eric Biggers Cc: Jiri Olsa Cc: Namhyung Kim Signed-off-by: Arnaldo Carvalho de Melo --- tools/include/uapi/linux/fscrypt.h | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/include/uapi/linux/fscrypt.h b/tools/include/uapi/linux/fscrypt.h index 0d8a6f47711c..a10e3cdc2839 100644 --- a/tools/include/uapi/linux/fscrypt.h +++ b/tools/include/uapi/linux/fscrypt.h @@ -163,6 +163,7 @@ struct fscrypt_get_key_status_arg { #define FS_IOC_REMOVE_ENCRYPTION_KEY _IOWR('f', 24, struct fscrypt_remove_key_arg) #define FS_IOC_REMOVE_ENCRYPTION_KEY_ALL_USERS _IOWR('f', 25, struct fscrypt_remove_key_arg) #define FS_IOC_GET_ENCRYPTION_KEY_STATUS _IOWR('f', 26, struct fscrypt_get_key_status_arg) +#define FS_IOC_GET_ENCRYPTION_NONCE _IOR('f', 27, __u8[16]) /**********************************************************************/ From b8fc22803e594dee6597de4c81ccab5b37abecbb Mon Sep 17 00:00:00 2001 From: Arnaldo Carvalho de Melo Date: Tue, 14 Apr 2020 09:21:56 -0300 Subject: [PATCH 177/331] tools headers kvm: Sync linux/kvm.h with the kernel sources To pick up the changes from: 9a5788c615f5 ("KVM: PPC: Book3S HV: Add a capability for enabling secure guests") 3c9bd4006bfc ("KVM: x86: enable dirty log gradually in small chunks") 13da9ae1cdbf ("KVM: s390: protvirt: introduce and enable KVM_CAP_S390_PROTECTED") e0d2773d487c ("KVM: s390: protvirt: UV calls in support of diag308 0, 1") 19e122776886 ("KVM: S390: protvirt: Introduce instruction data area bounce buffer") 29b40f105ec8 ("KVM: s390: protvirt: Add initial vm and cpu lifecycle handling") So far we're ignoring those arch specific ioctls, we need to revisit this at some time to have arch specific tables, etc: $ grep S390 tools/perf/trace/beauty/kvm_ioctl.sh egrep -v " ((ARM|PPC|S390)_|[GS]ET_(DEBUGREGS|PIT2|XSAVE|TSC_KHZ)|CREATE_SPAPR_TCE_64)" | \ $ This addresses these tools/perf build warnings: Warning: Kernel ABI header at 'tools/arch/arm/include/uapi/asm/kvm.h' differs from latest version at 
'arch/arm/include/uapi/asm/kvm.h' diff -u tools/arch/arm/include/uapi/asm/kvm.h arch/arm/include/uapi/asm/kvm.h Cc: Adrian Hunter Cc: Christian Borntraeger Cc: Janosch Frank Cc: Jay Zhou Cc: Jiri Olsa Cc: Namhyung Kim Cc: Paolo Bonzini Cc: Paul Mackerras Signed-off-by: Arnaldo Carvalho de Melo --- tools/include/uapi/linux/kvm.h | 47 ++++++++++++++++++++++++++++++++-- 1 file changed, 45 insertions(+), 2 deletions(-) diff --git a/tools/include/uapi/linux/kvm.h b/tools/include/uapi/linux/kvm.h index 4b95f9a31a2f..428c7dde6b4b 100644 --- a/tools/include/uapi/linux/kvm.h +++ b/tools/include/uapi/linux/kvm.h @@ -474,12 +474,17 @@ struct kvm_s390_mem_op { __u32 size; /* amount of bytes */ __u32 op; /* type of operation */ __u64 buf; /* buffer in userspace */ - __u8 ar; /* the access register number */ - __u8 reserved[31]; /* should be set to 0 */ + union { + __u8 ar; /* the access register number */ + __u32 sida_offset; /* offset into the sida */ + __u8 reserved[32]; /* should be set to 0 */ + }; }; /* types for kvm_s390_mem_op->op */ #define KVM_S390_MEMOP_LOGICAL_READ 0 #define KVM_S390_MEMOP_LOGICAL_WRITE 1 +#define KVM_S390_MEMOP_SIDA_READ 2 +#define KVM_S390_MEMOP_SIDA_WRITE 3 /* flags for kvm_s390_mem_op->flags */ #define KVM_S390_MEMOP_F_CHECK_ONLY (1ULL << 0) #define KVM_S390_MEMOP_F_INJECT_EXCEPTION (1ULL << 1) @@ -1010,6 +1015,8 @@ struct kvm_ppc_resize_hpt { #define KVM_CAP_ARM_NISV_TO_USER 177 #define KVM_CAP_ARM_INJECT_EXT_DABT 178 #define KVM_CAP_S390_VCPU_RESETS 179 +#define KVM_CAP_S390_PROTECTED 180 +#define KVM_CAP_PPC_SECURE_GUEST 181 #ifdef KVM_CAP_IRQ_ROUTING @@ -1478,6 +1485,39 @@ struct kvm_enc_region { #define KVM_S390_NORMAL_RESET _IO(KVMIO, 0xc3) #define KVM_S390_CLEAR_RESET _IO(KVMIO, 0xc4) +struct kvm_s390_pv_sec_parm { + __u64 origin; + __u64 length; +}; + +struct kvm_s390_pv_unp { + __u64 addr; + __u64 size; + __u64 tweak; +}; + +enum pv_cmd_id { + KVM_PV_ENABLE, + KVM_PV_DISABLE, + KVM_PV_SET_SEC_PARMS, + KVM_PV_UNPACK, + KVM_PV_VERIFY, + 
KVM_PV_PREP_RESET, + KVM_PV_UNSHARE_ALL, +}; + +struct kvm_pv_cmd { + __u32 cmd; /* Command to be executed */ + __u16 rc; /* Ultravisor return code */ + __u16 rrc; /* Ultravisor return reason code */ + __u64 data; /* Data or address */ + __u32 flags; /* flags for future extensions. Must be 0 for now */ + __u32 reserved[3]; +}; + +/* Available with KVM_CAP_S390_PROTECTED */ +#define KVM_S390_PV_COMMAND _IOWR(KVMIO, 0xc5, struct kvm_pv_cmd) + /* Secure Encrypted Virtualization command */ enum sev_cmd_id { /* Guest initialization commands */ @@ -1628,4 +1668,7 @@ struct kvm_hyperv_eventfd { #define KVM_HYPERV_CONN_ID_MASK 0x00ffffff #define KVM_HYPERV_EVENTFD_DEASSIGN (1 << 0) +#define KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE (1 << 0) +#define KVM_DIRTY_LOG_INITIALLY_SET (1 << 1) + #endif /* __LINUX_KVM_H */ From 0719bdf46737c9060f67e35824af5bfff5084083 Mon Sep 17 00:00:00 2001 From: Arnaldo Carvalho de Melo Date: Tue, 14 Apr 2020 09:29:03 -0300 Subject: [PATCH 178/331] tools headers UAPI: Update tools's copy of drm.h headers Picking the changes from: 455e00f1412f ("drm: Add getfb2 ioctl") Silencing these perf build warnings: Warning: Kernel ABI header at 'tools/include/uapi/drm/drm.h' differs from latest version at 'include/uapi/drm/drm.h' diff -u tools/include/uapi/drm/drm.h include/uapi/drm/drm.h Now 'perf trace' and other code that might use the tools/perf/trace/beauty autogenerated tables will be able to translate this new ioctl code into a string: $ tools/perf/trace/beauty/drm_ioctl.sh > before $ cp include/uapi/drm/drm.h tools/include/uapi/drm/drm.h $ tools/perf/trace/beauty/drm_ioctl.sh > after $ diff -u before after --- before 2020-04-14 09:28:45.461821077 -0300 +++ after 2020-04-14 09:28:53.594782685 -0300 @@ -107,6 +107,7 @@ [0xCB] = "SYNCOBJ_QUERY", [0xCC] = "SYNCOBJ_TRANSFER", [0xCD] = "SYNCOBJ_TIMELINE_SIGNAL", + [0xCE] = "MODE_GETFB2", [DRM_COMMAND_BASE + 0x00] = "I915_INIT", [DRM_COMMAND_BASE + 0x01] = "I915_FLUSH", [DRM_COMMAND_BASE + 0x02] = "I915_FLIP", 
$ Cc: Adrian Hunter Cc: Daniel Stone Cc: Jiri Olsa Cc: Lyude Paul Cc: Namhyung Kim Signed-off-by: Arnaldo Carvalho de Melo --- tools/include/uapi/drm/drm.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/tools/include/uapi/drm/drm.h b/tools/include/uapi/drm/drm.h index 868bf7996c0f..808b48a93330 100644 --- a/tools/include/uapi/drm/drm.h +++ b/tools/include/uapi/drm/drm.h @@ -948,6 +948,8 @@ extern "C" { #define DRM_IOCTL_SYNCOBJ_TRANSFER DRM_IOWR(0xCC, struct drm_syncobj_transfer) #define DRM_IOCTL_SYNCOBJ_TIMELINE_SIGNAL DRM_IOWR(0xCD, struct drm_syncobj_timeline_array) +#define DRM_IOCTL_MODE_GETFB2 DRM_IOWR(0xCE, struct drm_mode_fb_cmd2) + /** * Device specific ioctls should only be in their respective headers * The device specific ioctl range is from 0x40 to 0x9f. From 54a58ebc66cea54de056888e0afdda2983f00e0e Mon Sep 17 00:00:00 2001 From: Arnaldo Carvalho de Melo Date: Tue, 14 Apr 2020 09:40:01 -0300 Subject: [PATCH 179/331] tools headers UAPI: Sync drm/i915_drm.h with the kernel sources To pick the change in: 88be76cdafc7 ("drm/i915: Allow userspace to specify ringsize on construction") That don't result in any changes in tooling, just silences this perf build warning: Warning: Kernel ABI header at 'tools/include/uapi/drm/i915_drm.h' differs from latest version at 'include/uapi/drm/i915_drm.h' diff -u tools/include/uapi/drm/i915_drm.h include/uapi/drm/i915_drm.h Cc: Adrian Hunter Cc: Chris Wilson Cc: Jiri Olsa Cc: Namhyung Kim Signed-off-by: Arnaldo Carvalho de Melo --- tools/include/uapi/drm/i915_drm.h | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) diff --git a/tools/include/uapi/drm/i915_drm.h b/tools/include/uapi/drm/i915_drm.h index 829c0a48577f..2813e579b480 100644 --- a/tools/include/uapi/drm/i915_drm.h +++ b/tools/include/uapi/drm/i915_drm.h @@ -1619,6 +1619,27 @@ struct drm_i915_gem_context_param { * By default, new contexts allow persistence. 
*/ #define I915_CONTEXT_PARAM_PERSISTENCE 0xb + +/* + * I915_CONTEXT_PARAM_RINGSIZE: + * + * Sets the size of the CS ringbuffer to use for logical ring contexts. This + * applies a limit of how many batches can be queued to HW before the caller + * is blocked due to lack of space for more commands. + * + * Only reliably possible to be set prior to first use, i.e. during + * construction. At any later point, the current execution must be flushed as + * the ring can only be changed while the context is idle. Note, the ringsize + * can be specified as a constructor property, see + * I915_CONTEXT_CREATE_EXT_SETPARAM, but can also be set later if required. + * + * Only applies to the current set of engine and lost when those engines + * are replaced by a new mapping (see I915_CONTEXT_PARAM_ENGINES). + * + * Must be between 4 - 512 KiB, in intervals of page size [4 KiB]. + * Default is 16 KiB. + */ +#define I915_CONTEXT_PARAM_RINGSIZE 0xc /* Must be kept compact -- no holes and well documented */ __u64 value; From d8ed4d7aeb1e5f2315f801516d1953ea0af88a22 Mon Sep 17 00:00:00 2001 From: Arnaldo Carvalho de Melo Date: Tue, 14 Apr 2020 09:47:52 -0300 Subject: [PATCH 180/331] tools headers: Update x86's syscall_64.tbl with the kernel sources To pick the changes from: d3b1b776eefc ("x86/entry/64: Remove ptregs qualifier from syscall table") cab56d3484d4 ("x86/entry: Remove ABI prefixes from functions in syscall tables") 27dd84fafcd5 ("x86/entry/64: Use syscall wrappers for x32_rt_sigreturn") Addressing this tools/perf build warning: Warning: Kernel ABI header at 'tools/perf/arch/x86/entry/syscalls/syscall_64.tbl' differs from latest version at 'arch/x86/entry/syscalls/syscall_64.tbl' diff -u tools/perf/arch/x86/entry/syscalls/syscall_64.tbl arch/x86/entry/syscalls/syscall_64.tbl That didn't result in any tooling changes, as what is extracted are just the first two columns, and these patches touched only the third. 
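Why the generated output stays the same can be sketched with a small parser. This is not the script perf actually uses: it assumes the "<number> <abi> <name> <entry point>" row layout seen in the table rows of this patch, and struct syscall_entry and parse_syscall_line() are hypothetical names.

```c
#include <stdio.h>

/* Hypothetical record: only the fields a name table would need. */
struct syscall_entry {
	int nr;
	char name[64];
};

/*
 * Hypothetical parser: reads number, abi and name from one table row and
 * never looks at the trailing entry point. Returns 1 on success, 0 for
 * comments, blank lines and malformed rows.
 */
static int parse_syscall_line(const char *line, struct syscall_entry *out)
{
	char abi[16];

	if (line[0] == '#')
		return 0;
	return sscanf(line, "%d %15s %63s", &out->nr, abi, out->name) == 3;
}
```

Feeding it "15 64 rt_sigreturn __x64_sys_rt_sigreturn/ptregs" and "15 64 rt_sigreturn sys_rt_sigreturn" yields the same number and name, so anything generated from those fields is unaffected by the entry point rename.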
$ cp /tmp/build/perf/arch/x86/include/generated/asm/syscalls_64.c /tmp $ cp arch/x86/entry/syscalls/syscall_64.tbl tools/perf/arch/x86/entry/syscalls/syscall_64.tbl $ make -C tools/perf O=/tmp/build/perf install-bin make: Entering directory '/home/acme/git/perf/tools/perf' BUILD: Doing 'make -j12' parallel build DESCEND plugins CC /tmp/build/perf/util/syscalltbl.o INSTALL trace_plugins LD /tmp/build/perf/util/perf-in.o LD /tmp/build/perf/perf-in.o LINK /tmp/build/perf/perf $ diff -u /tmp/build/perf/arch/x86/include/generated/asm/syscalls_64.c /tmp/syscalls_64.c $ Cc: Adrian Hunter Cc: Brian Gerst Cc: Jiri Olsa Cc: Namhyung Kim Cc: Thomas Gleixner Signed-off-by: Arnaldo Carvalho de Melo --- .../arch/x86/entry/syscalls/syscall_64.tbl | 740 +++++++++--------- 1 file changed, 370 insertions(+), 370 deletions(-) diff --git a/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl b/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl index 44d510bc9b78..37b844f839bc 100644 --- a/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl +++ b/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl @@ -8,357 +8,357 @@ # # The abi is "common", "64" or "x32" for this file. 
# -0 common read __x64_sys_read -1 common write __x64_sys_write -2 common open __x64_sys_open -3 common close __x64_sys_close -4 common stat __x64_sys_newstat -5 common fstat __x64_sys_newfstat -6 common lstat __x64_sys_newlstat -7 common poll __x64_sys_poll -8 common lseek __x64_sys_lseek -9 common mmap __x64_sys_mmap -10 common mprotect __x64_sys_mprotect -11 common munmap __x64_sys_munmap -12 common brk __x64_sys_brk -13 64 rt_sigaction __x64_sys_rt_sigaction -14 common rt_sigprocmask __x64_sys_rt_sigprocmask -15 64 rt_sigreturn __x64_sys_rt_sigreturn/ptregs -16 64 ioctl __x64_sys_ioctl -17 common pread64 __x64_sys_pread64 -18 common pwrite64 __x64_sys_pwrite64 -19 64 readv __x64_sys_readv -20 64 writev __x64_sys_writev -21 common access __x64_sys_access -22 common pipe __x64_sys_pipe -23 common select __x64_sys_select -24 common sched_yield __x64_sys_sched_yield -25 common mremap __x64_sys_mremap -26 common msync __x64_sys_msync -27 common mincore __x64_sys_mincore -28 common madvise __x64_sys_madvise -29 common shmget __x64_sys_shmget -30 common shmat __x64_sys_shmat -31 common shmctl __x64_sys_shmctl -32 common dup __x64_sys_dup -33 common dup2 __x64_sys_dup2 -34 common pause __x64_sys_pause -35 common nanosleep __x64_sys_nanosleep -36 common getitimer __x64_sys_getitimer -37 common alarm __x64_sys_alarm -38 common setitimer __x64_sys_setitimer -39 common getpid __x64_sys_getpid -40 common sendfile __x64_sys_sendfile64 -41 common socket __x64_sys_socket -42 common connect __x64_sys_connect -43 common accept __x64_sys_accept -44 common sendto __x64_sys_sendto -45 64 recvfrom __x64_sys_recvfrom -46 64 sendmsg __x64_sys_sendmsg -47 64 recvmsg __x64_sys_recvmsg -48 common shutdown __x64_sys_shutdown -49 common bind __x64_sys_bind -50 common listen __x64_sys_listen -51 common getsockname __x64_sys_getsockname -52 common getpeername __x64_sys_getpeername -53 common socketpair __x64_sys_socketpair -54 64 setsockopt __x64_sys_setsockopt -55 64 getsockopt 
__x64_sys_getsockopt -56 common clone __x64_sys_clone/ptregs -57 common fork __x64_sys_fork/ptregs -58 common vfork __x64_sys_vfork/ptregs -59 64 execve __x64_sys_execve/ptregs -60 common exit __x64_sys_exit -61 common wait4 __x64_sys_wait4 -62 common kill __x64_sys_kill -63 common uname __x64_sys_newuname -64 common semget __x64_sys_semget -65 common semop __x64_sys_semop -66 common semctl __x64_sys_semctl -67 common shmdt __x64_sys_shmdt -68 common msgget __x64_sys_msgget -69 common msgsnd __x64_sys_msgsnd -70 common msgrcv __x64_sys_msgrcv -71 common msgctl __x64_sys_msgctl -72 common fcntl __x64_sys_fcntl -73 common flock __x64_sys_flock -74 common fsync __x64_sys_fsync -75 common fdatasync __x64_sys_fdatasync -76 common truncate __x64_sys_truncate -77 common ftruncate __x64_sys_ftruncate -78 common getdents __x64_sys_getdents -79 common getcwd __x64_sys_getcwd -80 common chdir __x64_sys_chdir -81 common fchdir __x64_sys_fchdir -82 common rename __x64_sys_rename -83 common mkdir __x64_sys_mkdir -84 common rmdir __x64_sys_rmdir -85 common creat __x64_sys_creat -86 common link __x64_sys_link -87 common unlink __x64_sys_unlink -88 common symlink __x64_sys_symlink -89 common readlink __x64_sys_readlink -90 common chmod __x64_sys_chmod -91 common fchmod __x64_sys_fchmod -92 common chown __x64_sys_chown -93 common fchown __x64_sys_fchown -94 common lchown __x64_sys_lchown -95 common umask __x64_sys_umask -96 common gettimeofday __x64_sys_gettimeofday -97 common getrlimit __x64_sys_getrlimit -98 common getrusage __x64_sys_getrusage -99 common sysinfo __x64_sys_sysinfo -100 common times __x64_sys_times -101 64 ptrace __x64_sys_ptrace -102 common getuid __x64_sys_getuid -103 common syslog __x64_sys_syslog -104 common getgid __x64_sys_getgid -105 common setuid __x64_sys_setuid -106 common setgid __x64_sys_setgid -107 common geteuid __x64_sys_geteuid -108 common getegid __x64_sys_getegid -109 common setpgid __x64_sys_setpgid -110 common getppid __x64_sys_getppid -111 
common getpgrp __x64_sys_getpgrp -112 common setsid __x64_sys_setsid -113 common setreuid __x64_sys_setreuid -114 common setregid __x64_sys_setregid -115 common getgroups __x64_sys_getgroups -116 common setgroups __x64_sys_setgroups -117 common setresuid __x64_sys_setresuid -118 common getresuid __x64_sys_getresuid -119 common setresgid __x64_sys_setresgid -120 common getresgid __x64_sys_getresgid -121 common getpgid __x64_sys_getpgid -122 common setfsuid __x64_sys_setfsuid -123 common setfsgid __x64_sys_setfsgid -124 common getsid __x64_sys_getsid -125 common capget __x64_sys_capget -126 common capset __x64_sys_capset -127 64 rt_sigpending __x64_sys_rt_sigpending -128 64 rt_sigtimedwait __x64_sys_rt_sigtimedwait -129 64 rt_sigqueueinfo __x64_sys_rt_sigqueueinfo -130 common rt_sigsuspend __x64_sys_rt_sigsuspend -131 64 sigaltstack __x64_sys_sigaltstack -132 common utime __x64_sys_utime -133 common mknod __x64_sys_mknod +0 common read sys_read +1 common write sys_write +2 common open sys_open +3 common close sys_close +4 common stat sys_newstat +5 common fstat sys_newfstat +6 common lstat sys_newlstat +7 common poll sys_poll +8 common lseek sys_lseek +9 common mmap sys_mmap +10 common mprotect sys_mprotect +11 common munmap sys_munmap +12 common brk sys_brk +13 64 rt_sigaction sys_rt_sigaction +14 common rt_sigprocmask sys_rt_sigprocmask +15 64 rt_sigreturn sys_rt_sigreturn +16 64 ioctl sys_ioctl +17 common pread64 sys_pread64 +18 common pwrite64 sys_pwrite64 +19 64 readv sys_readv +20 64 writev sys_writev +21 common access sys_access +22 common pipe sys_pipe +23 common select sys_select +24 common sched_yield sys_sched_yield +25 common mremap sys_mremap +26 common msync sys_msync +27 common mincore sys_mincore +28 common madvise sys_madvise +29 common shmget sys_shmget +30 common shmat sys_shmat +31 common shmctl sys_shmctl +32 common dup sys_dup +33 common dup2 sys_dup2 +34 common pause sys_pause +35 common nanosleep sys_nanosleep +36 common getitimer 
sys_getitimer +37 common alarm sys_alarm +38 common setitimer sys_setitimer +39 common getpid sys_getpid +40 common sendfile sys_sendfile64 +41 common socket sys_socket +42 common connect sys_connect +43 common accept sys_accept +44 common sendto sys_sendto +45 64 recvfrom sys_recvfrom +46 64 sendmsg sys_sendmsg +47 64 recvmsg sys_recvmsg +48 common shutdown sys_shutdown +49 common bind sys_bind +50 common listen sys_listen +51 common getsockname sys_getsockname +52 common getpeername sys_getpeername +53 common socketpair sys_socketpair +54 64 setsockopt sys_setsockopt +55 64 getsockopt sys_getsockopt +56 common clone sys_clone +57 common fork sys_fork +58 common vfork sys_vfork +59 64 execve sys_execve +60 common exit sys_exit +61 common wait4 sys_wait4 +62 common kill sys_kill +63 common uname sys_newuname +64 common semget sys_semget +65 common semop sys_semop +66 common semctl sys_semctl +67 common shmdt sys_shmdt +68 common msgget sys_msgget +69 common msgsnd sys_msgsnd +70 common msgrcv sys_msgrcv +71 common msgctl sys_msgctl +72 common fcntl sys_fcntl +73 common flock sys_flock +74 common fsync sys_fsync +75 common fdatasync sys_fdatasync +76 common truncate sys_truncate +77 common ftruncate sys_ftruncate +78 common getdents sys_getdents +79 common getcwd sys_getcwd +80 common chdir sys_chdir +81 common fchdir sys_fchdir +82 common rename sys_rename +83 common mkdir sys_mkdir +84 common rmdir sys_rmdir +85 common creat sys_creat +86 common link sys_link +87 common unlink sys_unlink +88 common symlink sys_symlink +89 common readlink sys_readlink +90 common chmod sys_chmod +91 common fchmod sys_fchmod +92 common chown sys_chown +93 common fchown sys_fchown +94 common lchown sys_lchown +95 common umask sys_umask +96 common gettimeofday sys_gettimeofday +97 common getrlimit sys_getrlimit +98 common getrusage sys_getrusage +99 common sysinfo sys_sysinfo +100 common times sys_times +101 64 ptrace sys_ptrace +102 common getuid sys_getuid +103 common syslog 
sys_syslog +104 common getgid sys_getgid +105 common setuid sys_setuid +106 common setgid sys_setgid +107 common geteuid sys_geteuid +108 common getegid sys_getegid +109 common setpgid sys_setpgid +110 common getppid sys_getppid +111 common getpgrp sys_getpgrp +112 common setsid sys_setsid +113 common setreuid sys_setreuid +114 common setregid sys_setregid +115 common getgroups sys_getgroups +116 common setgroups sys_setgroups +117 common setresuid sys_setresuid +118 common getresuid sys_getresuid +119 common setresgid sys_setresgid +120 common getresgid sys_getresgid +121 common getpgid sys_getpgid +122 common setfsuid sys_setfsuid +123 common setfsgid sys_setfsgid +124 common getsid sys_getsid +125 common capget sys_capget +126 common capset sys_capset +127 64 rt_sigpending sys_rt_sigpending +128 64 rt_sigtimedwait sys_rt_sigtimedwait +129 64 rt_sigqueueinfo sys_rt_sigqueueinfo +130 common rt_sigsuspend sys_rt_sigsuspend +131 64 sigaltstack sys_sigaltstack +132 common utime sys_utime +133 common mknod sys_mknod 134 64 uselib -135 common personality __x64_sys_personality -136 common ustat __x64_sys_ustat -137 common statfs __x64_sys_statfs -138 common fstatfs __x64_sys_fstatfs -139 common sysfs __x64_sys_sysfs -140 common getpriority __x64_sys_getpriority -141 common setpriority __x64_sys_setpriority -142 common sched_setparam __x64_sys_sched_setparam -143 common sched_getparam __x64_sys_sched_getparam -144 common sched_setscheduler __x64_sys_sched_setscheduler -145 common sched_getscheduler __x64_sys_sched_getscheduler -146 common sched_get_priority_max __x64_sys_sched_get_priority_max -147 common sched_get_priority_min __x64_sys_sched_get_priority_min -148 common sched_rr_get_interval __x64_sys_sched_rr_get_interval -149 common mlock __x64_sys_mlock -150 common munlock __x64_sys_munlock -151 common mlockall __x64_sys_mlockall -152 common munlockall __x64_sys_munlockall -153 common vhangup __x64_sys_vhangup -154 common modify_ldt __x64_sys_modify_ldt -155 common 
pivot_root __x64_sys_pivot_root -156 64 _sysctl __x64_sys_sysctl -157 common prctl __x64_sys_prctl -158 common arch_prctl __x64_sys_arch_prctl -159 common adjtimex __x64_sys_adjtimex -160 common setrlimit __x64_sys_setrlimit -161 common chroot __x64_sys_chroot -162 common sync __x64_sys_sync -163 common acct __x64_sys_acct -164 common settimeofday __x64_sys_settimeofday -165 common mount __x64_sys_mount -166 common umount2 __x64_sys_umount -167 common swapon __x64_sys_swapon -168 common swapoff __x64_sys_swapoff -169 common reboot __x64_sys_reboot -170 common sethostname __x64_sys_sethostname -171 common setdomainname __x64_sys_setdomainname -172 common iopl __x64_sys_iopl/ptregs -173 common ioperm __x64_sys_ioperm +135 common personality sys_personality +136 common ustat sys_ustat +137 common statfs sys_statfs +138 common fstatfs sys_fstatfs +139 common sysfs sys_sysfs +140 common getpriority sys_getpriority +141 common setpriority sys_setpriority +142 common sched_setparam sys_sched_setparam +143 common sched_getparam sys_sched_getparam +144 common sched_setscheduler sys_sched_setscheduler +145 common sched_getscheduler sys_sched_getscheduler +146 common sched_get_priority_max sys_sched_get_priority_max +147 common sched_get_priority_min sys_sched_get_priority_min +148 common sched_rr_get_interval sys_sched_rr_get_interval +149 common mlock sys_mlock +150 common munlock sys_munlock +151 common mlockall sys_mlockall +152 common munlockall sys_munlockall +153 common vhangup sys_vhangup +154 common modify_ldt sys_modify_ldt +155 common pivot_root sys_pivot_root +156 64 _sysctl sys_sysctl +157 common prctl sys_prctl +158 common arch_prctl sys_arch_prctl +159 common adjtimex sys_adjtimex +160 common setrlimit sys_setrlimit +161 common chroot sys_chroot +162 common sync sys_sync +163 common acct sys_acct +164 common settimeofday sys_settimeofday +165 common mount sys_mount +166 common umount2 sys_umount +167 common swapon sys_swapon +168 common swapoff sys_swapoff +169 
common reboot sys_reboot +170 common sethostname sys_sethostname +171 common setdomainname sys_setdomainname +172 common iopl sys_iopl +173 common ioperm sys_ioperm 174 64 create_module -175 common init_module __x64_sys_init_module -176 common delete_module __x64_sys_delete_module +175 common init_module sys_init_module +176 common delete_module sys_delete_module 177 64 get_kernel_syms 178 64 query_module -179 common quotactl __x64_sys_quotactl +179 common quotactl sys_quotactl 180 64 nfsservctl 181 common getpmsg 182 common putpmsg 183 common afs_syscall 184 common tuxcall 185 common security -186 common gettid __x64_sys_gettid -187 common readahead __x64_sys_readahead -188 common setxattr __x64_sys_setxattr -189 common lsetxattr __x64_sys_lsetxattr -190 common fsetxattr __x64_sys_fsetxattr -191 common getxattr __x64_sys_getxattr -192 common lgetxattr __x64_sys_lgetxattr -193 common fgetxattr __x64_sys_fgetxattr -194 common listxattr __x64_sys_listxattr -195 common llistxattr __x64_sys_llistxattr -196 common flistxattr __x64_sys_flistxattr -197 common removexattr __x64_sys_removexattr -198 common lremovexattr __x64_sys_lremovexattr -199 common fremovexattr __x64_sys_fremovexattr -200 common tkill __x64_sys_tkill -201 common time __x64_sys_time -202 common futex __x64_sys_futex -203 common sched_setaffinity __x64_sys_sched_setaffinity -204 common sched_getaffinity __x64_sys_sched_getaffinity +186 common gettid sys_gettid +187 common readahead sys_readahead +188 common setxattr sys_setxattr +189 common lsetxattr sys_lsetxattr +190 common fsetxattr sys_fsetxattr +191 common getxattr sys_getxattr +192 common lgetxattr sys_lgetxattr +193 common fgetxattr sys_fgetxattr +194 common listxattr sys_listxattr +195 common llistxattr sys_llistxattr +196 common flistxattr sys_flistxattr +197 common removexattr sys_removexattr +198 common lremovexattr sys_lremovexattr +199 common fremovexattr sys_fremovexattr +200 common tkill sys_tkill +201 common time sys_time +202 common 
futex	sys_futex
+203	common	sched_setaffinity	sys_sched_setaffinity
+204	common	sched_getaffinity	sys_sched_getaffinity
 205	64	set_thread_area
-206	64	io_setup	__x64_sys_io_setup
-207	common	io_destroy	__x64_sys_io_destroy
-208	common	io_getevents	__x64_sys_io_getevents
-209	64	io_submit	__x64_sys_io_submit
-210	common	io_cancel	__x64_sys_io_cancel
+206	64	io_setup	sys_io_setup
+207	common	io_destroy	sys_io_destroy
+208	common	io_getevents	sys_io_getevents
+209	64	io_submit	sys_io_submit
+210	common	io_cancel	sys_io_cancel
 211	64	get_thread_area
-212	common	lookup_dcookie	__x64_sys_lookup_dcookie
-213	common	epoll_create	__x64_sys_epoll_create
+212	common	lookup_dcookie	sys_lookup_dcookie
+213	common	epoll_create	sys_epoll_create
 214	64	epoll_ctl_old
 215	64	epoll_wait_old
-216	common	remap_file_pages	__x64_sys_remap_file_pages
-217	common	getdents64	__x64_sys_getdents64
-218	common	set_tid_address	__x64_sys_set_tid_address
-219	common	restart_syscall	__x64_sys_restart_syscall
-220	common	semtimedop	__x64_sys_semtimedop
-221	common	fadvise64	__x64_sys_fadvise64
-222	64	timer_create	__x64_sys_timer_create
-223	common	timer_settime	__x64_sys_timer_settime
-224	common	timer_gettime	__x64_sys_timer_gettime
-225	common	timer_getoverrun	__x64_sys_timer_getoverrun
-226	common	timer_delete	__x64_sys_timer_delete
-227	common	clock_settime	__x64_sys_clock_settime
-228	common	clock_gettime	__x64_sys_clock_gettime
-229	common	clock_getres	__x64_sys_clock_getres
-230	common	clock_nanosleep	__x64_sys_clock_nanosleep
-231	common	exit_group	__x64_sys_exit_group
-232	common	epoll_wait	__x64_sys_epoll_wait
-233	common	epoll_ctl	__x64_sys_epoll_ctl
-234	common	tgkill	__x64_sys_tgkill
-235	common	utimes	__x64_sys_utimes
+216	common	remap_file_pages	sys_remap_file_pages
+217	common	getdents64	sys_getdents64
+218	common	set_tid_address	sys_set_tid_address
+219	common	restart_syscall	sys_restart_syscall
+220	common	semtimedop	sys_semtimedop
+221	common	fadvise64	sys_fadvise64
+222	64	timer_create	sys_timer_create
+223	common	timer_settime	sys_timer_settime
+224	common	timer_gettime	sys_timer_gettime
+225	common	timer_getoverrun	sys_timer_getoverrun
+226	common	timer_delete	sys_timer_delete
+227	common	clock_settime	sys_clock_settime
+228	common	clock_gettime	sys_clock_gettime
+229	common	clock_getres	sys_clock_getres
+230	common	clock_nanosleep	sys_clock_nanosleep
+231	common	exit_group	sys_exit_group
+232	common	epoll_wait	sys_epoll_wait
+233	common	epoll_ctl	sys_epoll_ctl
+234	common	tgkill	sys_tgkill
+235	common	utimes	sys_utimes
 236	64	vserver
-237	common	mbind	__x64_sys_mbind
-238	common	set_mempolicy	__x64_sys_set_mempolicy
-239	common	get_mempolicy	__x64_sys_get_mempolicy
-240	common	mq_open	__x64_sys_mq_open
-241	common	mq_unlink	__x64_sys_mq_unlink
-242	common	mq_timedsend	__x64_sys_mq_timedsend
-243	common	mq_timedreceive	__x64_sys_mq_timedreceive
-244	64	mq_notify	__x64_sys_mq_notify
-245	common	mq_getsetattr	__x64_sys_mq_getsetattr
-246	64	kexec_load	__x64_sys_kexec_load
-247	64	waitid	__x64_sys_waitid
-248	common	add_key	__x64_sys_add_key
-249	common	request_key	__x64_sys_request_key
-250	common	keyctl	__x64_sys_keyctl
-251	common	ioprio_set	__x64_sys_ioprio_set
-252	common	ioprio_get	__x64_sys_ioprio_get
-253	common	inotify_init	__x64_sys_inotify_init
-254	common	inotify_add_watch	__x64_sys_inotify_add_watch
-255	common	inotify_rm_watch	__x64_sys_inotify_rm_watch
-256	common	migrate_pages	__x64_sys_migrate_pages
-257	common	openat	__x64_sys_openat
-258	common	mkdirat	__x64_sys_mkdirat
-259	common	mknodat	__x64_sys_mknodat
-260	common	fchownat	__x64_sys_fchownat
-261	common	futimesat	__x64_sys_futimesat
-262	common	newfstatat	__x64_sys_newfstatat
-263	common	unlinkat	__x64_sys_unlinkat
-264	common	renameat	__x64_sys_renameat
-265	common	linkat	__x64_sys_linkat
-266	common	symlinkat	__x64_sys_symlinkat
-267	common	readlinkat	__x64_sys_readlinkat
-268	common	fchmodat	__x64_sys_fchmodat
-269	common	faccessat	__x64_sys_faccessat
-270	common	pselect6	__x64_sys_pselect6
-271	common	ppoll	__x64_sys_ppoll
-272	common	unshare	__x64_sys_unshare
-273	64	set_robust_list	__x64_sys_set_robust_list
-274	64	get_robust_list	__x64_sys_get_robust_list
-275	common	splice	__x64_sys_splice
-276	common	tee	__x64_sys_tee
-277	common	sync_file_range	__x64_sys_sync_file_range
-278	64	vmsplice	__x64_sys_vmsplice
-279	64	move_pages	__x64_sys_move_pages
-280	common	utimensat	__x64_sys_utimensat
-281	common	epoll_pwait	__x64_sys_epoll_pwait
-282	common	signalfd	__x64_sys_signalfd
-283	common	timerfd_create	__x64_sys_timerfd_create
-284	common	eventfd	__x64_sys_eventfd
-285	common	fallocate	__x64_sys_fallocate
-286	common	timerfd_settime	__x64_sys_timerfd_settime
-287	common	timerfd_gettime	__x64_sys_timerfd_gettime
-288	common	accept4	__x64_sys_accept4
-289	common	signalfd4	__x64_sys_signalfd4
-290	common	eventfd2	__x64_sys_eventfd2
-291	common	epoll_create1	__x64_sys_epoll_create1
-292	common	dup3	__x64_sys_dup3
-293	common	pipe2	__x64_sys_pipe2
-294	common	inotify_init1	__x64_sys_inotify_init1
-295	64	preadv	__x64_sys_preadv
-296	64	pwritev	__x64_sys_pwritev
-297	64	rt_tgsigqueueinfo	__x64_sys_rt_tgsigqueueinfo
-298	common	perf_event_open	__x64_sys_perf_event_open
-299	64	recvmmsg	__x64_sys_recvmmsg
-300	common	fanotify_init	__x64_sys_fanotify_init
-301	common	fanotify_mark	__x64_sys_fanotify_mark
-302	common	prlimit64	__x64_sys_prlimit64
-303	common	name_to_handle_at	__x64_sys_name_to_handle_at
-304	common	open_by_handle_at	__x64_sys_open_by_handle_at
-305	common	clock_adjtime	__x64_sys_clock_adjtime
-306	common	syncfs	__x64_sys_syncfs
-307	64	sendmmsg	__x64_sys_sendmmsg
-308	common	setns	__x64_sys_setns
-309	common	getcpu	__x64_sys_getcpu
-310	64	process_vm_readv	__x64_sys_process_vm_readv
-311	64	process_vm_writev	__x64_sys_process_vm_writev
-312	common	kcmp	__x64_sys_kcmp
-313	common	finit_module	__x64_sys_finit_module
-314	common	sched_setattr	__x64_sys_sched_setattr
-315	common	sched_getattr	__x64_sys_sched_getattr
-316	common	renameat2	__x64_sys_renameat2
-317	common	seccomp	__x64_sys_seccomp
-318	common	getrandom	__x64_sys_getrandom
-319	common	memfd_create	__x64_sys_memfd_create
-320	common	kexec_file_load	__x64_sys_kexec_file_load
-321	common	bpf	__x64_sys_bpf
-322	64	execveat	__x64_sys_execveat/ptregs
-323	common	userfaultfd	__x64_sys_userfaultfd
-324	common	membarrier	__x64_sys_membarrier
-325	common	mlock2	__x64_sys_mlock2
-326	common	copy_file_range	__x64_sys_copy_file_range
-327	64	preadv2	__x64_sys_preadv2
-328	64	pwritev2	__x64_sys_pwritev2
-329	common	pkey_mprotect	__x64_sys_pkey_mprotect
-330	common	pkey_alloc	__x64_sys_pkey_alloc
-331	common	pkey_free	__x64_sys_pkey_free
-332	common	statx	__x64_sys_statx
-333	common	io_pgetevents	__x64_sys_io_pgetevents
-334	common	rseq	__x64_sys_rseq
+237	common	mbind	sys_mbind
+238	common	set_mempolicy	sys_set_mempolicy
+239	common	get_mempolicy	sys_get_mempolicy
+240	common	mq_open	sys_mq_open
+241	common	mq_unlink	sys_mq_unlink
+242	common	mq_timedsend	sys_mq_timedsend
+243	common	mq_timedreceive	sys_mq_timedreceive
+244	64	mq_notify	sys_mq_notify
+245	common	mq_getsetattr	sys_mq_getsetattr
+246	64	kexec_load	sys_kexec_load
+247	64	waitid	sys_waitid
+248	common	add_key	sys_add_key
+249	common	request_key	sys_request_key
+250	common	keyctl	sys_keyctl
+251	common	ioprio_set	sys_ioprio_set
+252	common	ioprio_get	sys_ioprio_get
+253	common	inotify_init	sys_inotify_init
+254	common	inotify_add_watch	sys_inotify_add_watch
+255	common	inotify_rm_watch	sys_inotify_rm_watch
+256	common	migrate_pages	sys_migrate_pages
+257	common	openat	sys_openat
+258	common	mkdirat	sys_mkdirat
+259	common	mknodat	sys_mknodat
+260	common	fchownat	sys_fchownat
+261	common	futimesat	sys_futimesat
+262	common	newfstatat	sys_newfstatat
+263	common	unlinkat	sys_unlinkat
+264	common	renameat	sys_renameat
+265	common	linkat	sys_linkat
+266	common	symlinkat	sys_symlinkat
+267	common	readlinkat	sys_readlinkat
+268	common	fchmodat	sys_fchmodat
+269	common	faccessat	sys_faccessat
+270	common	pselect6	sys_pselect6
+271	common	ppoll	sys_ppoll
+272	common	unshare	sys_unshare
+273	64	set_robust_list	sys_set_robust_list
+274	64	get_robust_list	sys_get_robust_list
+275	common	splice	sys_splice
+276	common	tee	sys_tee
+277	common	sync_file_range	sys_sync_file_range
+278	64	vmsplice	sys_vmsplice
+279	64	move_pages	sys_move_pages
+280	common	utimensat	sys_utimensat
+281	common	epoll_pwait	sys_epoll_pwait
+282	common	signalfd	sys_signalfd
+283	common	timerfd_create	sys_timerfd_create
+284	common	eventfd	sys_eventfd
+285	common	fallocate	sys_fallocate
+286	common	timerfd_settime	sys_timerfd_settime
+287	common	timerfd_gettime	sys_timerfd_gettime
+288	common	accept4	sys_accept4
+289	common	signalfd4	sys_signalfd4
+290	common	eventfd2	sys_eventfd2
+291	common	epoll_create1	sys_epoll_create1
+292	common	dup3	sys_dup3
+293	common	pipe2	sys_pipe2
+294	common	inotify_init1	sys_inotify_init1
+295	64	preadv	sys_preadv
+296	64	pwritev	sys_pwritev
+297	64	rt_tgsigqueueinfo	sys_rt_tgsigqueueinfo
+298	common	perf_event_open	sys_perf_event_open
+299	64	recvmmsg	sys_recvmmsg
+300	common	fanotify_init	sys_fanotify_init
+301	common	fanotify_mark	sys_fanotify_mark
+302	common	prlimit64	sys_prlimit64
+303	common	name_to_handle_at	sys_name_to_handle_at
+304	common	open_by_handle_at	sys_open_by_handle_at
+305	common	clock_adjtime	sys_clock_adjtime
+306	common	syncfs	sys_syncfs
+307	64	sendmmsg	sys_sendmmsg
+308	common	setns	sys_setns
+309	common	getcpu	sys_getcpu
+310	64	process_vm_readv	sys_process_vm_readv
+311	64	process_vm_writev	sys_process_vm_writev
+312	common	kcmp	sys_kcmp
+313	common	finit_module	sys_finit_module
+314	common	sched_setattr	sys_sched_setattr
+315	common	sched_getattr	sys_sched_getattr
+316	common	renameat2	sys_renameat2
+317	common	seccomp	sys_seccomp
+318	common	getrandom	sys_getrandom
+319	common	memfd_create	sys_memfd_create
+320	common	kexec_file_load	sys_kexec_file_load
+321	common	bpf	sys_bpf
+322	64	execveat	sys_execveat
+323	common	userfaultfd	sys_userfaultfd
+324	common	membarrier	sys_membarrier
+325	common	mlock2	sys_mlock2
+326	common	copy_file_range	sys_copy_file_range
+327	64	preadv2	sys_preadv2
+328	64	pwritev2	sys_pwritev2
+329	common	pkey_mprotect	sys_pkey_mprotect
+330	common	pkey_alloc	sys_pkey_alloc
+331	common	pkey_free	sys_pkey_free
+332	common	statx	sys_statx
+333	common	io_pgetevents	sys_io_pgetevents
+334	common	rseq	sys_rseq
 # don't use numbers 387 through 423, add new calls after the last
 # 'common' entry
-424	common	pidfd_send_signal	__x64_sys_pidfd_send_signal
-425	common	io_uring_setup	__x64_sys_io_uring_setup
-426	common	io_uring_enter	__x64_sys_io_uring_enter
-427	common	io_uring_register	__x64_sys_io_uring_register
-428	common	open_tree	__x64_sys_open_tree
-429	common	move_mount	__x64_sys_move_mount
-430	common	fsopen	__x64_sys_fsopen
-431	common	fsconfig	__x64_sys_fsconfig
-432	common	fsmount	__x64_sys_fsmount
-433	common	fspick	__x64_sys_fspick
-434	common	pidfd_open	__x64_sys_pidfd_open
-435	common	clone3	__x64_sys_clone3/ptregs
-437	common	openat2	__x64_sys_openat2
-438	common	pidfd_getfd	__x64_sys_pidfd_getfd
+424	common	pidfd_send_signal	sys_pidfd_send_signal
+425	common	io_uring_setup	sys_io_uring_setup
+426	common	io_uring_enter	sys_io_uring_enter
+427	common	io_uring_register	sys_io_uring_register
+428	common	open_tree	sys_open_tree
+429	common	move_mount	sys_move_mount
+430	common	fsopen	sys_fsopen
+431	common	fsconfig	sys_fsconfig
+432	common	fsmount	sys_fsmount
+433	common	fspick	sys_fspick
+434	common	pidfd_open	sys_pidfd_open
+435	common	clone3	sys_clone3
+437	common	openat2	sys_openat2
+438	common	pidfd_getfd	sys_pidfd_getfd
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
@@ -366,39 +366,39 @@
 # on-the-fly for compat_sys_*() compatibility system calls if X86_X32
 # is defined.
 #
-512	x32	rt_sigaction	__x32_compat_sys_rt_sigaction
-513	x32	rt_sigreturn	sys32_x32_rt_sigreturn
-514	x32	ioctl	__x32_compat_sys_ioctl
-515	x32	readv	__x32_compat_sys_readv
-516	x32	writev	__x32_compat_sys_writev
-517	x32	recvfrom	__x32_compat_sys_recvfrom
-518	x32	sendmsg	__x32_compat_sys_sendmsg
-519	x32	recvmsg	__x32_compat_sys_recvmsg
-520	x32	execve	__x32_compat_sys_execve/ptregs
-521	x32	ptrace	__x32_compat_sys_ptrace
-522	x32	rt_sigpending	__x32_compat_sys_rt_sigpending
-523	x32	rt_sigtimedwait	__x32_compat_sys_rt_sigtimedwait_time64
-524	x32	rt_sigqueueinfo	__x32_compat_sys_rt_sigqueueinfo
-525	x32	sigaltstack	__x32_compat_sys_sigaltstack
-526	x32	timer_create	__x32_compat_sys_timer_create
-527	x32	mq_notify	__x32_compat_sys_mq_notify
-528	x32	kexec_load	__x32_compat_sys_kexec_load
-529	x32	waitid	__x32_compat_sys_waitid
-530	x32	set_robust_list	__x32_compat_sys_set_robust_list
-531	x32	get_robust_list	__x32_compat_sys_get_robust_list
-532	x32	vmsplice	__x32_compat_sys_vmsplice
-533	x32	move_pages	__x32_compat_sys_move_pages
-534	x32	preadv	__x32_compat_sys_preadv64
-535	x32	pwritev	__x32_compat_sys_pwritev64
-536	x32	rt_tgsigqueueinfo	__x32_compat_sys_rt_tgsigqueueinfo
-537	x32	recvmmsg	__x32_compat_sys_recvmmsg_time64
-538	x32	sendmmsg	__x32_compat_sys_sendmmsg
-539	x32	process_vm_readv	__x32_compat_sys_process_vm_readv
-540	x32	process_vm_writev	__x32_compat_sys_process_vm_writev
-541	x32	setsockopt	__x32_compat_sys_setsockopt
-542	x32	getsockopt	__x32_compat_sys_getsockopt
-543	x32	io_setup	__x32_compat_sys_io_setup
-544	x32	io_submit	__x32_compat_sys_io_submit
-545	x32	execveat	__x32_compat_sys_execveat/ptregs
-546	x32	preadv2	__x32_compat_sys_preadv64v2
-547	x32	pwritev2	__x32_compat_sys_pwritev64v2
+512	x32	rt_sigaction	compat_sys_rt_sigaction
+513	x32	rt_sigreturn	compat_sys_x32_rt_sigreturn
+514	x32	ioctl	compat_sys_ioctl
+515	x32	readv	compat_sys_readv
+516	x32	writev	compat_sys_writev
+517	x32	recvfrom	compat_sys_recvfrom
+518	x32	sendmsg	compat_sys_sendmsg
+519	x32	recvmsg	compat_sys_recvmsg
+520	x32	execve	compat_sys_execve
+521	x32	ptrace	compat_sys_ptrace
+522	x32	rt_sigpending	compat_sys_rt_sigpending
+523	x32	rt_sigtimedwait	compat_sys_rt_sigtimedwait_time64
+524	x32	rt_sigqueueinfo	compat_sys_rt_sigqueueinfo
+525	x32	sigaltstack	compat_sys_sigaltstack
+526	x32	timer_create	compat_sys_timer_create
+527	x32	mq_notify	compat_sys_mq_notify
+528	x32	kexec_load	compat_sys_kexec_load
+529	x32	waitid	compat_sys_waitid
+530	x32	set_robust_list	compat_sys_set_robust_list
+531	x32	get_robust_list	compat_sys_get_robust_list
+532	x32	vmsplice	compat_sys_vmsplice
+533	x32	move_pages	compat_sys_move_pages
+534	x32	preadv	compat_sys_preadv64
+535	x32	pwritev	compat_sys_pwritev64
+536	x32	rt_tgsigqueueinfo	compat_sys_rt_tgsigqueueinfo
+537	x32	recvmmsg	compat_sys_recvmmsg_time64
+538	x32	sendmmsg	compat_sys_sendmmsg
+539	x32	process_vm_readv	compat_sys_process_vm_readv
+540	x32	process_vm_writev	compat_sys_process_vm_writev
+541	x32	setsockopt	compat_sys_setsockopt
+542	x32	getsockopt	compat_sys_getsockopt
+543	x32	io_setup	compat_sys_io_setup
+544	x32	io_submit	compat_sys_io_submit
+545	x32	execveat	compat_sys_execveat
+546	x32	preadv2	compat_sys_preadv64v2
+547	x32	pwritev2	compat_sys_pwritev64v2

From 5b992add7d32d3106f121c85ad99d2986dc3b8e6 Mon Sep 17 00:00:00 2001
From: Arnaldo Carvalho de Melo
Date: Tue, 14 Apr 2020 11:37:29 -0300
Subject: [PATCH 181/331] tools headers: Adopt verbatim copy of compiletime_assert() from kernel sources

Will be needed when syncing the linux/bits.h header, in the next cset.

Cc: Adrian Hunter
Cc: Jiri Olsa
Cc: Namhyung Kim
Signed-off-by: Arnaldo Carvalho de Melo
---
 tools/include/linux/compiler.h | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/tools/include/linux/compiler.h b/tools/include/linux/compiler.h
index 1827c2f973f9..180f7714a5f1 100644
--- a/tools/include/linux/compiler.h
+++ b/tools/include/linux/compiler.h
@@ -10,6 +10,32 @@
 # define __compiletime_error(message)
 #endif
 
+#ifdef __OPTIMIZE__
+# define __compiletime_assert(condition, msg, prefix, suffix) \
+	do { \
+		extern void prefix ## suffix(void) __compiletime_error(msg); \
+		if (!(condition)) \
+			prefix ## suffix(); \
+	} while (0)
+#else
+# define __compiletime_assert(condition, msg, prefix, suffix) do { } while (0)
+#endif
+
+#define _compiletime_assert(condition, msg, prefix, suffix) \
+	__compiletime_assert(condition, msg, prefix, suffix)
+
+/**
+ * compiletime_assert - break build and emit msg if condition is false
+ * @condition: a compile-time constant condition to check
+ * @msg: a message to emit if condition is false
+ *
+ * In tradition of POSIX assert, this macro will break the build if the
+ * supplied condition is *false*, emitting the supplied error message if the
+ * compiler has support to do so.
+ */
+#define compiletime_assert(condition, msg) \
+	_compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
+
 /* Optimization barrier */
 /* The "volatile" is due to gcc bugs */
 #define barrier() __asm__ __volatile__("": : :"memory")

From e3698b23ecb8c099b4b523e7d5c8c042e93ef15d Mon Sep 17 00:00:00 2001
From: Arnaldo Carvalho de Melo
Date: Tue, 14 Apr 2020 10:27:39 -0300
Subject: [PATCH 182/331] tools headers: Synchronize linux/bits.h with the kernel sources

To pick up the changes in these csets:

  295bcca84916 ("linux/bits.h: add compile time sanity check of GENMASK inputs")
  3945ff37d2f4 ("linux/bits.h: Extract common header for vDSO")

To address this tools/perf build warning:

  Warning: Kernel ABI header at 'tools/include/linux/bits.h' differs from latest version at 'include/linux/bits.h'
  diff -u tools/include/linux/bits.h include/linux/bits.h

This clashes with usage of userspace's static_assert(), which, at least
on glibc, is guarded by an ifndef/endif pair. Do the same to our copy of
build_bug.h and avoid that diff in check_headers.sh so that we continue
checking for drifts with the kernel sources master copy.

This will all be tested with the set of build containers that includes
uClibc, musl libc, lots of glibc versions in lots of distros and cross
build environments.

The tools/objtool, tools/bpf, etc were tested as well.
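
For illustration only, a user-space sketch of the GENMASK() input check that
this sync pulls in. It assumes GCC or Clang (for __builtin_choose_expr(),
__builtin_constant_p() and the zero-width bit-field trick), and it defines
UL(), ULL() and the BITS_PER_* constants locally as stand-ins for what the
tools headers normally provide:

```c
#include <limits.h>

/* Local stand-ins for helpers the tools headers provide (assumption). */
#define UL(x)	x##UL
#define ULL(x)	x##ULL
#define BITS_PER_LONG		(sizeof(long) * CHAR_BIT)
#define BITS_PER_LONG_LONG	(sizeof(long long) * CHAR_BIT)

/* Evaluates to 0, or breaks the build when e is a non-zero constant. */
#define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))

/* Only checks when (l) > (h) is a compile-time constant; otherwise 0. */
#define GENMASK_INPUT_CHECK(h, l) \
	(BUILD_BUG_ON_ZERO(__builtin_choose_expr( \
		__builtin_constant_p((l) > (h)), (l) > (h), 0)))

#define __GENMASK(h, l) \
	(((~UL(0)) - (UL(1) << (l)) + 1) & \
	 (~UL(0) >> (BITS_PER_LONG - 1 - (h))))
#define GENMASK(h, l) \
	(GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))

#define __GENMASK_ULL(h, l) \
	(((~ULL(0)) - (ULL(1) << (l)) + 1) & \
	 (~ULL(0) >> (BITS_PER_LONG_LONG - 1 - (h))))
#define GENMASK_ULL(h, l) \
	(GENMASK_INPUT_CHECK(h, l) + __GENMASK_ULL(h, l))
```

With these definitions, swapping constant arguments, as in GENMASK(4, 7),
should fail to compile because the bit-field width becomes negative, while
non-constant arguments skip the check.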

Cc: Adrian Hunter
Cc: Jiri Olsa
Cc: Namhyung Kim
Cc: Rikard Falkeborn
Cc: Thomas Gleixner
Cc: Vincenzo Frascino
Signed-off-by: Arnaldo Carvalho de Melo
---
 tools/include/linux/bits.h      | 24 ++++++++--
 tools/include/linux/build_bug.h | 82 +++++++++++++++++++++++++++++++++
 tools/include/linux/kernel.h    |  4 +-
 tools/include/vdso/bits.h       |  9 ++++
 tools/perf/check-headers.sh     |  2 +
 5 files changed, 115 insertions(+), 6 deletions(-)
 create mode 100644 tools/include/linux/build_bug.h
 create mode 100644 tools/include/vdso/bits.h

diff --git a/tools/include/linux/bits.h b/tools/include/linux/bits.h
index 669d69441a62..4671fbf28842 100644
--- a/tools/include/linux/bits.h
+++ b/tools/include/linux/bits.h
@@ -3,9 +3,9 @@
 #define __LINUX_BITS_H
 
 #include
+#include
 #include
 
-#define BIT(nr) (UL(1) << (nr))
 #define BIT_ULL(nr) (ULL(1) << (nr))
 #define BIT_MASK(nr) (UL(1) << ((nr) % BITS_PER_LONG))
 #define BIT_WORD(nr) ((nr) / BITS_PER_LONG)
@@ -18,12 +18,30 @@
  * position @h. For example
  * GENMASK_ULL(39, 21) gives us the 64bit vector 0x000000ffffe00000.
  */
-#define GENMASK(h, l) \
+#if !defined(__ASSEMBLY__) && \
+	(!defined(CONFIG_CC_IS_GCC) || CONFIG_GCC_VERSION >= 49000)
+#include
+#define GENMASK_INPUT_CHECK(h, l) \
+	(BUILD_BUG_ON_ZERO(__builtin_choose_expr( \
+		__builtin_constant_p((l) > (h)), (l) > (h), 0)))
+#else
+/*
+ * BUILD_BUG_ON_ZERO is not available in h files included from asm files,
+ * disable the input check if that is the case.
+ */
+#define GENMASK_INPUT_CHECK(h, l) 0
+#endif
+
+#define __GENMASK(h, l) \
 	(((~UL(0)) - (UL(1) << (l)) + 1) & \
 	 (~UL(0) >> (BITS_PER_LONG - 1 - (h))))
+#define GENMASK(h, l) \
+	(GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
 
-#define GENMASK_ULL(h, l) \
+#define __GENMASK_ULL(h, l) \
 	(((~ULL(0)) - (ULL(1) << (l)) + 1) & \
 	 (~ULL(0) >> (BITS_PER_LONG_LONG - 1 - (h))))
+#define GENMASK_ULL(h, l) \
+	(GENMASK_INPUT_CHECK(h, l) + __GENMASK_ULL(h, l))
 
 #endif /* __LINUX_BITS_H */
diff --git a/tools/include/linux/build_bug.h b/tools/include/linux/build_bug.h
new file mode 100644
index 000000000000..cc7070c7439b
--- /dev/null
+++ b/tools/include/linux/build_bug.h
@@ -0,0 +1,82 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_BUILD_BUG_H
+#define _LINUX_BUILD_BUG_H
+
+#include
+
+#ifdef __CHECKER__
+#define BUILD_BUG_ON_ZERO(e) (0)
+#else /* __CHECKER__ */
+/*
+ * Force a compilation error if condition is true, but also produce a
+ * result (of value 0 and type int), so the expression can be used
+ * e.g. in a structure initializer (or where-ever else comma expressions
+ * aren't permitted).
+ */
+#define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
+#endif /* __CHECKER__ */
+
+/* Force a compilation error if a constant expression is not a power of 2 */
+#define __BUILD_BUG_ON_NOT_POWER_OF_2(n) \
+	BUILD_BUG_ON(((n) & ((n) - 1)) != 0)
+#define BUILD_BUG_ON_NOT_POWER_OF_2(n) \
+	BUILD_BUG_ON((n) == 0 || (((n) & ((n) - 1)) != 0))
+
+/*
+ * BUILD_BUG_ON_INVALID() permits the compiler to check the validity of the
+ * expression but avoids the generation of any code, even if that expression
+ * has side-effects.
+ */
+#define BUILD_BUG_ON_INVALID(e) ((void)(sizeof((__force long)(e))))
+
+/**
+ * BUILD_BUG_ON_MSG - break compile if a condition is true & emit supplied
+ *		      error message.
+ * @condition: the condition which the compiler should know is false.
+ *
+ * See BUILD_BUG_ON for description.
+ */
+#define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
+
+/**
+ * BUILD_BUG_ON - break compile if a condition is true.
+ * @condition: the condition which the compiler should know is false.
+ *
+ * If you have some code which relies on certain constants being equal, or
+ * some other compile-time-evaluated condition, you should use BUILD_BUG_ON to
+ * detect if someone changes it.
+ */
+#define BUILD_BUG_ON(condition) \
+	BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
+
+/**
+ * BUILD_BUG - break compile if used.
+ *
+ * If you have some code that you expect the compiler to eliminate at
+ * build time, you should use BUILD_BUG to detect if it is
+ * unexpectedly used.
+ */
+#define BUILD_BUG() BUILD_BUG_ON_MSG(1, "BUILD_BUG failed")
+
+/**
+ * static_assert - check integer constant expression at build time
+ *
+ * static_assert() is a wrapper for the C11 _Static_assert, with a
+ * little macro magic to make the message optional (defaulting to the
+ * stringification of the tested expression).
+ *
+ * Contrary to BUILD_BUG_ON(), static_assert() can be used at global
+ * scope, but requires the expression to be an integer constant
+ * expression (i.e., it is not enough that __builtin_constant_p() is
+ * true for expr).
+ *
+ * Also note that BUILD_BUG_ON() fails the build if the condition is
+ * true, while static_assert() fails the build if the expression is
+ * false.
+ */
+#ifndef static_assert
+#define static_assert(expr, ...) __static_assert(expr, ##__VA_ARGS__, #expr)
+#define __static_assert(expr, msg, ...) _Static_assert(expr, msg)
+#endif // static_assert
+
+#endif /* _LINUX_BUILD_BUG_H */
diff --git a/tools/include/linux/kernel.h b/tools/include/linux/kernel.h
index cba226948a0c..a7e54a08fb54 100644
--- a/tools/include/linux/kernel.h
+++ b/tools/include/linux/kernel.h
@@ -5,6 +5,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -35,9 +36,6 @@
 	(type *)((char *)__mptr - offsetof(type, member)); })
 #endif
 
-#define BUILD_BUG_ON(condition) ((void)sizeof(char[1 - 2*!!(condition)]))
-#define BUILD_BUG_ON_ZERO(e) (sizeof(struct { int:-!!(e); }))
-
 #ifndef max
 #define max(x, y) ({ \
 	typeof(x) _max1 = (x); \
diff --git a/tools/include/vdso/bits.h b/tools/include/vdso/bits.h
new file mode 100644
index 000000000000..6d005a1f5d94
--- /dev/null
+++ b/tools/include/vdso/bits.h
@@ -0,0 +1,9 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __VDSO_BITS_H
+#define __VDSO_BITS_H
+
+#include
+
+#define BIT(nr) (UL(1) << (nr))
+
+#endif /* __VDSO_BITS_H */
diff --git a/tools/perf/check-headers.sh b/tools/perf/check-headers.sh
index c905c683606a..cf147db4e5ca 100755
--- a/tools/perf/check-headers.sh
+++ b/tools/perf/check-headers.sh
@@ -22,6 +22,7 @@ include/uapi/linux/usbdevice_fs.h
 include/uapi/linux/vhost.h
 include/uapi/sound/asound.h
 include/linux/bits.h
+include/vdso/bits.h
 include/linux/const.h
 include/vdso/const.h
 include/linux/hash.h
@@ -116,6 +117,7 @@ check arch/x86/lib/memcpy_64.S '-I "^EXPORT_SYMBOL" -I "^#include " -I"^SYM_FUNC_START\(_LOCAL\)*(memset_\(erms\|orig\))"'
 check include/uapi/asm-generic/mman.h '-I "^#include <\(uapi/\)*asm-generic/mman-common\(-tools\)*.h>"'
 check include/uapi/linux/mman.h '-I "^#include <\(uapi/\)*asm/mman.h>"'
+check include/linux/build_bug.h '-I "^#\(ifndef\|endif\)\( \/\/\)* static_assert$"'
 check include/linux/ctype.h '-I "isdigit("'
 check lib/ctype.c '-I "^EXPORT_SYMBOL" -I "^#include " -B'
 check arch/x86/include/asm/inat.h '-I "^#include [\"<]\(asm/\)*inat_types.h[\">]"'

From 9a6418487b566503c772cb6e7d3d44e652b019b0 Mon Sep 17 00:00:00 2001
From: Hui Wang
Date: Tue, 14 Apr 2020 22:27:25 +0800
Subject: [PATCH 183/331] ALSA: hda: call runtime_allow() for all hda controllers

Before the pci_driver->probe() is called, the pci subsystem calls
runtime_forbid() and runtime_get_sync() on this pci dev, so calling only
runtime_put_autosuspend() is not enough to enable runtime_pm on this
device.

For controllers with the vgaswitcheroo feature, pci/quirks.c will call
runtime_allow() for this dev, so those controllers can enter
rt_idle/suspend/resume. For non-vgaswitcheroo controllers such as Intel
hda controllers, runtime_pm is not enabled because runtime_allow() is
never called.

Since there is no harm in calling runtime_allow() twice, let the hda
driver call runtime_allow() for all controllers. runtime_pm is then
enabled on all controllers after put_autosuspend() is called.

Signed-off-by: Hui Wang
Link: https://lore.kernel.org/r/20200414142725.6020-1-hui.wang@canonical.com
Signed-off-by: Takashi Iwai
---
 sound/pci/hda/hda_intel.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
index 8519051a426e..a5fab12defde 100644
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -2356,6 +2356,7 @@ static int azx_probe_continue(struct azx *chip)
 
 	if (azx_has_pm_runtime(chip)) {
 		pm_runtime_use_autosuspend(&pci->dev);
+		pm_runtime_allow(&pci->dev);
 		pm_runtime_put_autosuspend(&pci->dev);
 	}
 

From bdf89df3c54518eed879d8fac7577fcfb220c67e Mon Sep 17 00:00:00 2001
From: John Allen
Date: Thu, 9 Apr 2020 10:34:29 -0500
Subject: [PATCH 184/331] x86/microcode/AMD: Increase microcode PATCH_MAX_SIZE

Future AMD CPUs will have microcode patches that exceed the default 4K
patch size. Raise our limit.

Signed-off-by: John Allen
Signed-off-by: Borislav Petkov
Cc: stable@vger.kernel.org # v4.14..
Link: https://lkml.kernel.org/r/20200409152931.GA685273@mojo.amd.com
---
 arch/x86/include/asm/microcode_amd.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/microcode_amd.h b/arch/x86/include/asm/microcode_amd.h
index 6685e1218959..7063b5a43220 100644
--- a/arch/x86/include/asm/microcode_amd.h
+++ b/arch/x86/include/asm/microcode_amd.h
@@ -41,7 +41,7 @@ struct microcode_amd {
 	unsigned int mpb[0];
 };
 
-#define PATCH_MAX_SIZE PAGE_SIZE
+#define PATCH_MAX_SIZE (3 * PAGE_SIZE)
 
 #ifdef CONFIG_MICROCODE_AMD
 extern void __init load_ucode_amd_bsp(unsigned int family);

From b2a7e9735ab2864330be9d00d7f38c961c28de5d Mon Sep 17 00:00:00 2001
From: Prike Liang
Date: Mon, 13 Apr 2020 21:41:14 +0800
Subject: [PATCH 185/331] drm/amdgpu: fix the hw hang during perform system reboot and reset

The system reboot failed because some IP blocks enter power gating
before hw resource destroy is performed. Meanwhile, using a unified
interface to set the device CG/PG state to ungated simplifies the
amdgpu poweroff or reset ungate guard.

Fixes: 487eca11a321ef ("drm/amdgpu: fix gfx hang during suspend with video playback (v2)")
Signed-off-by: Prike Liang
Tested-by: Mengbing Wang
Tested-by: Paul Menzel
Acked-by: Alex Deucher
Signed-off-by: Alex Deucher
Cc: stable@vger.kernel.org
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 7d35b0a366a2..f84f9e35a73b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2356,6 +2356,8 @@ static int amdgpu_device_ip_suspend_phase1(struct amdgpu_device *adev)
 {
 	int i, r;
 
+	amdgpu_device_set_pg_state(adev, AMD_PG_STATE_UNGATE);
+	amdgpu_device_set_cg_state(adev, AMD_CG_STATE_UNGATE);
 
 	for (i = adev->num_ip_blocks - 1; i >= 0; i--) {
 		if (!adev->ip_blocks[i].status.valid)

From 974229db7e6c1f2ff83ceaf3022d5128bf62caca Mon Sep 17 00:00:00 2001
From: Alex Deucher
Date: Thu, 9 Apr 2020 09:40:01 -0400
Subject: [PATCH 186/331] drm/amdgpu/gfx9: add gfxoff quirk

Fix screen corruption with firefox.

Bug: https://bugzilla.kernel.org/show_bug.cgi?id=207171
Reviewed-by: Huang Rui
Signed-off-by: Alex Deucher
Cc: stable@vger.kernel.org
---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index e6b113ed2f40..0c390485bc10 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -1234,6 +1234,8 @@ struct amdgpu_gfxoff_quirk {
 static const struct amdgpu_gfxoff_quirk amdgpu_gfxoff_quirk_list[] = {
 	/* https://bugzilla.kernel.org/show_bug.cgi?id=204689 */
 	{ 0x1002, 0x15dd, 0x1002, 0x15dd, 0xc8 },
+	/* https://bugzilla.kernel.org/show_bug.cgi?id=207171 */
+	{ 0x1002, 0x15dd, 0x103c, 0x83e7, 0xd3 },
 	{ 0, 0, 0, 0, 0 },
 };
 

From 4178417cc5359c329790a4a8f4a6604612338cca Mon Sep 17 00:00:00 2001
From: Luke Nelson
Date: Thu, 9 Apr 2020 15:17:52 -0700
Subject: [PATCH 187/331] arm, bpf: Fix offset overflow for BPF_MEM BPF_DW

This patch fixes an incorrect check in how immediate memory offsets are
computed for BPF_DW on arm.

For BPF_LDX/ST/STX + BPF_DW, the 32-bit arm JIT breaks down an 8-byte
access into two separate 4-byte accesses using off+0 and off+4. If off
fits in imm12, the JIT emits a ldr/str instruction with the immediate
and avoids the use of a temporary register. While the current check off
<= 0xfff ensures that the first immediate off+0 doesn't overflow imm12,
it's not sufficient for the second immediate off+4, which may cause the
second access of BPF_DW to read/write the wrong address.

This patch fixes the problem by changing the check to
off <= 0xfff - 4 for BPF_DW, ensuring off+4 will never overflow.

A side effect of simplifying the check is that it now allows using
negative immediate offsets in ldr/str. This means that small negative
offsets can also avoid the use of a temporary register.

This patch introduces no new failures in test_verifier or test_bpf.c.
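
The bound described above can be exercised outside the kernel. The following
is a stand-alone sketch, not the JIT code itself: fits_ldst_imm() and the
SZ_* enumerators are hypothetical names chosen for this example, and the
enumerator values are not the kernel's BPF_* size encodings.

```c
#include <stdbool.h>
#include <stdint.h>

/* Access sizes; illustrative names and values (assumption). */
enum access_size { SZ_B, SZ_H, SZ_W, SZ_DW };

/*
 * Return true when off can be encoded directly as a ldr/str immediate
 * for the given access size, so no temporary register is needed.
 */
static bool fits_ldst_imm(int16_t off, enum access_size size)
{
	int16_t off_max = 0;

	switch (size) {
	case SZ_B:
	case SZ_W:
		off_max = 0xfff;	/* imm12 */
		break;
	case SZ_H:
		off_max = 0xff;		/* imm8 */
		break;
	case SZ_DW:
		/* Two 4-byte accesses: off+4 must also fit in imm12. */
		off_max = 0xfff - 4;
		break;
	}
	return -off_max <= off && off <= off_max;
}
```

For example, an 8-byte access at offset 0xffc is rejected because its second
half would land at 0x1000, which no longer fits in imm12, while 0xffb is
still accepted.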

Fixes: c5eae692571d6 ("ARM: net: bpf: improve 64-bit store implementation")
Fixes: ec19e02b343db ("ARM: net: bpf: fix LDX instructions")
Co-developed-by: Xi Wang
Signed-off-by: Xi Wang
Signed-off-by: Luke Nelson
Signed-off-by: Daniel Borkmann
Link: https://lore.kernel.org/bpf/20200409221752.28448-1-luke.r.nels@gmail.com
---
 arch/arm/net/bpf_jit_32.c | 40 +++++++++++++++++++++++----------------
 1 file changed, 24 insertions(+), 16 deletions(-)

diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
index d124f78e20ac..bf85d6db4931 100644
--- a/arch/arm/net/bpf_jit_32.c
+++ b/arch/arm/net/bpf_jit_32.c
@@ -1000,21 +1000,35 @@ static inline void emit_a32_mul_r64(const s8 dst[], const s8 src[],
 	arm_bpf_put_reg32(dst_hi, rd[0], ctx);
 }
 
+static bool is_ldst_imm(s16 off, const u8 size)
+{
+	s16 off_max = 0;
+
+	switch (size) {
+	case BPF_B:
+	case BPF_W:
+		off_max = 0xfff;
+		break;
+	case BPF_H:
+		off_max = 0xff;
+		break;
+	case BPF_DW:
+		/* Need to make sure off+4 does not overflow. */
+		off_max = 0xfff - 4;
+		break;
+	}
+	return -off_max <= off && off <= off_max;
+}
+
 /* *(size *)(dst + off) = src */
 static inline void emit_str_r(const s8 dst, const s8 src[],
-			      s32 off, struct jit_ctx *ctx, const u8 sz){
+			      s16 off, struct jit_ctx *ctx, const u8 sz){
 	const s8 *tmp = bpf2a32[TMP_REG_1];
-	s32 off_max;
 	s8 rd;
 
 	rd = arm_bpf_get_reg32(dst, tmp[1], ctx);
 
-	if (sz == BPF_H)
-		off_max = 0xff;
-	else
-		off_max = 0xfff;
-
-	if (off < 0 || off > off_max) {
+	if (!is_ldst_imm(off, sz)) {
 		emit_a32_mov_i(tmp[0], off, ctx);
 		emit(ARM_ADD_R(tmp[0], tmp[0], rd), ctx);
 		rd = tmp[0];
@@ -1043,18 +1057,12 @@ static inline void emit_str_r(const s8 dst, const s8 src[],
 
 /* dst = *(size*)(src + off) */
 static inline void emit_ldx_r(const s8 dst[], const s8 src,
-			      s32 off, struct jit_ctx *ctx, const u8 sz){
+			      s16 off, struct jit_ctx *ctx, const u8 sz){
 	const s8 *tmp = bpf2a32[TMP_REG_1];
 	const s8 *rd = is_stacked(dst_lo) ? tmp : dst;
 	s8 rm = src;
-	s32 off_max;
 
-	if (sz == BPF_H)
-		off_max = 0xff;
-	else
-		off_max = 0xfff;
-
-	if (off < 0 || off > off_max) {
+	if (!is_ldst_imm(off, sz)) {
 		emit_a32_mov_i(tmp[0], off, ctx);
 		emit(ARM_ADD_R(tmp[0], tmp[0], src), ctx);
 		rm = tmp[0];

From 1f6cb19be2e231fe092f40decb71f066eba090d7 Mon Sep 17 00:00:00 2001
From: Andrii Nakryiko
Date: Fri, 10 Apr 2020 13:26:12 -0700
Subject: [PATCH 188/331] bpf: Prevent re-mmap()'ing BPF map as writable for initially r/o mapping

The VM_MAYWRITE flag set during initial memory mapping determines
whether already mmap()'ed pages can later be remapped as writable
through an mprotect() call. To prevent a user application from
rewriting the contents of a BPF map that was memory-mapped read-only
and subsequently frozen, remove the VM_MAYWRITE flag completely on an
initially read-only mapping.

Alternatively, we could treat any memory-mapping on unfrozen map as
writable and bump writecnt instead. But there is little legitimate
reason to map BPF map as read-only and then re-mmap() it as writable
through mprotect(), instead of just mmap()'ing it as read/write from
the very beginning.

Also, at the suggestion of Jann Horn, drop unnecessary refcounting in
mmap operations. We can just rely on VMA holding reference to BPF map's
file properly.
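
As a rough illustration of the rule described above, here is a simplified
user-space model of the flag handling. It is not kernel code: the function
names are hypothetical, the flag values are illustrative stand-ins for the
kernel's vm_flags bits, and mprotect_allowed() only approximates the
VM_MAY* check that mprotect() performs.

```c
#include <stdbool.h>

/* Flag bits modelled on the kernel's vm_flags; values are illustrative. */
#define VM_WRITE	0x02u
#define VM_EXEC		0x04u
#define VM_MAYWRITE	0x20u
#define VM_MAYEXEC	0x40u

/* Model of what the patched mmap handler does to the flags at mmap() time. */
static unsigned int bpf_map_mmap_flags(unsigned int vm_flags)
{
	vm_flags &= ~VM_MAYEXEC;
	if (!(vm_flags & VM_WRITE))
		vm_flags &= ~VM_MAYWRITE;	/* disallow later PROT_WRITE */
	return vm_flags;
}

/* An access bit may only be added if its VM_MAY* counterpart is still set. */
static bool mprotect_allowed(unsigned int vm_flags, unsigned int want)
{
	if ((want & VM_WRITE) && !(vm_flags & VM_MAYWRITE))
		return false;
	if ((want & VM_EXEC) && !(vm_flags & VM_MAYEXEC))
		return false;
	return true;
}
```

In this model a mapping created without VM_WRITE can never be upgraded to
writable or executable, while a mapping created read/write keeps
VM_MAYWRITE and is therefore still counted in writecnt after being
downgraded to read-only.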

Fixes: fc9702273e2e ("bpf: Add mmap() support for BPF_MAP_TYPE_ARRAY")
Reported-by: Jann Horn
Signed-off-by: Andrii Nakryiko
Signed-off-by: Daniel Borkmann
Reviewed-by: Jann Horn
Link: https://lore.kernel.org/bpf/20200410202613.3679837-1-andriin@fb.com
---
 kernel/bpf/syscall.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 64783da34202..d85f37239540 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -586,9 +586,7 @@ static void bpf_map_mmap_open(struct vm_area_struct *vma)
 {
 	struct bpf_map *map = vma->vm_file->private_data;
 
-	bpf_map_inc_with_uref(map);
-
-	if (vma->vm_flags & VM_WRITE) {
+	if (vma->vm_flags & VM_MAYWRITE) {
 		mutex_lock(&map->freeze_mutex);
 		map->writecnt++;
 		mutex_unlock(&map->freeze_mutex);
@@ -600,13 +598,11 @@ static void bpf_map_mmap_close(struct vm_area_struct *vma)
 {
 	struct bpf_map *map = vma->vm_file->private_data;
 
-	if (vma->vm_flags & VM_WRITE) {
+	if (vma->vm_flags & VM_MAYWRITE) {
 		mutex_lock(&map->freeze_mutex);
 		map->writecnt--;
 		mutex_unlock(&map->freeze_mutex);
 	}
-
-	bpf_map_put_with_uref(map);
 }
 
 static const struct vm_operations_struct bpf_map_default_vmops = {
@@ -635,14 +631,16 @@ static int bpf_map_mmap(struct file *filp, struct vm_area_struct *vma)
 	/* set default open/close callbacks */
 	vma->vm_ops = &bpf_map_default_vmops;
 	vma->vm_private_data = map;
+	vma->vm_flags &= ~VM_MAYEXEC;
+	if (!(vma->vm_flags & VM_WRITE))
+		/* disallow re-mapping with PROT_WRITE */
+		vma->vm_flags &= ~VM_MAYWRITE;
 
 	err = map->ops->map_mmap(map, vma);
 	if (err)
 		goto out;
 
-	bpf_map_inc_with_uref(map);
-
-	if (vma->vm_flags & VM_WRITE)
+	if (vma->vm_flags & VM_MAYWRITE)
 		map->writecnt++;
 out:
 	mutex_unlock(&map->freeze_mutex);

From 642c1654702731ab42a3be771bebbd6ef938f0dc Mon Sep 17 00:00:00 2001
From: Andrii Nakryiko
Date: Fri, 10 Apr 2020 13:26:13 -0700
Subject: [PATCH 189/331] selftests/bpf: Validate frozen map contents stays frozen

Test that frozen and
mmap()'ed BPF map can't be mprotect()'ed as writable or executable memory. Also validate that "downgrading" from writable to read-only doesn't screw up internal writable count accounting for the purposes of map freezing. Signed-off-by: Andrii Nakryiko Signed-off-by: Daniel Borkmann Link: https://lore.kernel.org/bpf/20200410202613.3679837-2-andriin@fb.com --- tools/testing/selftests/bpf/prog_tests/mmap.c | 62 ++++++++++++++++++- 1 file changed, 60 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/bpf/prog_tests/mmap.c b/tools/testing/selftests/bpf/prog_tests/mmap.c index 16a814eb4d64..56d80adcf4bd 100644 --- a/tools/testing/selftests/bpf/prog_tests/mmap.c +++ b/tools/testing/selftests/bpf/prog_tests/mmap.c @@ -19,15 +19,16 @@ void test_mmap(void) const size_t map_sz = roundup_page(sizeof(struct map_data)); const int zero = 0, one = 1, two = 2, far = 1500; const long page_size = sysconf(_SC_PAGE_SIZE); - int err, duration = 0, i, data_map_fd; + int err, duration = 0, i, data_map_fd, data_map_id, tmp_fd; struct bpf_map *data_map, *bss_map; void *bss_mmaped = NULL, *map_mmaped = NULL, *tmp1, *tmp2; struct test_mmap__bss *bss_data; + struct bpf_map_info map_info; + __u32 map_info_sz = sizeof(map_info); struct map_data *map_data; struct test_mmap *skel; __u64 val = 0; - skel = test_mmap__open_and_load(); if (CHECK(!skel, "skel_open_and_load", "skeleton open/load failed\n")) return; @@ -36,6 +37,14 @@ void test_mmap(void) data_map = skel->maps.data_map; data_map_fd = bpf_map__fd(data_map); + /* get map's ID */ + memset(&map_info, 0, map_info_sz); + err = bpf_obj_get_info_by_fd(data_map_fd, &map_info, &map_info_sz); + if (CHECK(err, "map_get_info", "failed %d\n", errno)) + goto cleanup; + data_map_id = map_info.id; + + /* mmap BSS map */ bss_mmaped = mmap(NULL, bss_sz, PROT_READ | PROT_WRITE, MAP_SHARED, bpf_map__fd(bss_map), 0); if (CHECK(bss_mmaped == MAP_FAILED, "bss_mmap", @@ -98,6 +107,10 @@ void test_mmap(void) "data_map freeze succeeded: err=%d, 
errno=%d\n", err, errno)) goto cleanup; + err = mprotect(map_mmaped, map_sz, PROT_READ); + if (CHECK(err, "mprotect_ro", "mprotect to r/o failed %d\n", errno)) + goto cleanup; + /* unmap R/W mapping */ err = munmap(map_mmaped, map_sz); map_mmaped = NULL; @@ -111,6 +124,12 @@ void test_mmap(void) map_mmaped = NULL; goto cleanup; } + err = mprotect(map_mmaped, map_sz, PROT_WRITE); + if (CHECK(!err, "mprotect_wr", "mprotect() succeeded unexpectedly!\n")) + goto cleanup; + err = mprotect(map_mmaped, map_sz, PROT_EXEC); + if (CHECK(!err, "mprotect_ex", "mprotect() succeeded unexpectedly!\n")) + goto cleanup; map_data = map_mmaped; /* map/unmap in a loop to test ref counting */ @@ -197,6 +216,45 @@ void test_mmap(void) CHECK_FAIL(map_data->val[far] != 3 * 321); munmap(tmp2, 4 * page_size); + + tmp1 = mmap(NULL, map_sz, PROT_READ, MAP_SHARED, data_map_fd, 0); + if (CHECK(tmp1 == MAP_FAILED, "last_mmap", "failed %d\n", errno)) + goto cleanup; + + test_mmap__destroy(skel); + skel = NULL; + CHECK_FAIL(munmap(bss_mmaped, bss_sz)); + bss_mmaped = NULL; + CHECK_FAIL(munmap(map_mmaped, map_sz)); + map_mmaped = NULL; + + /* map should be still held by active mmap */ + tmp_fd = bpf_map_get_fd_by_id(data_map_id); + if (CHECK(tmp_fd < 0, "get_map_by_id", "failed %d\n", errno)) { + munmap(tmp1, map_sz); + goto cleanup; + } + close(tmp_fd); + + /* this should release data map finally */ + munmap(tmp1, map_sz); + + /* we need to wait for RCU grace period */ + for (i = 0; i < 10000; i++) { + __u32 id = data_map_id - 1; + if (bpf_map_get_next_id(id, &id) || id > data_map_id) + break; + usleep(1); + } + + /* should fail to get map FD by non-existing ID */ + tmp_fd = bpf_map_get_fd_by_id(data_map_id); + if (CHECK(tmp_fd >= 0, "get_map_by_id_after", + "unexpectedly succeeded %d\n", tmp_fd)) { + close(tmp_fd); + goto cleanup; + } + cleanup: if (bss_mmaped) CHECK_FAIL(munmap(bss_mmaped, bss_sz)); From 96b2eb6e77959b4b52f80e7a61d03db77606aac6 Mon Sep 17 00:00:00 2001 From: "Daniel T. 
Lee" Date: Fri, 10 Apr 2020 11:06:12 +0900 Subject: [PATCH 190/331] tools, bpftool: Fix struct_ops command invalid pointer free In commit 65c93628599d ("bpftool: Add struct_ops support") a new type of command named struct_ops has been added. This command requires a kernel with CONFIG_DEBUG_INFO_BTF=y set and for retrieving BTF info in bpftool, the helper get_btf_vmlinux() is used. When this command is run on a kernel without BTF debug info, the 'btf_vmlinux' variable ends up as an invalid (error) pointer, and btf__free() then causes a segfault when executing 'bpftool struct_ops'. This commit validates the pointer with IS_ERR() so that an invalid pointer is not freed, which fixes the segfault. Fixes: 65c93628599d ("bpftool: Add struct_ops support") Signed-off-by: Daniel T. Lee Signed-off-by: Daniel Borkmann Acked-by: Martin KaFai Lau Link: https://lore.kernel.org/bpf/20200410020612.2930667-1-danieltimlee@gmail.com --- tools/bpf/bpftool/struct_ops.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/tools/bpf/bpftool/struct_ops.c b/tools/bpf/bpftool/struct_ops.c index 2a7befbd11ad..0fe0d584c57e 100644 --- a/tools/bpf/bpftool/struct_ops.c +++ b/tools/bpf/bpftool/struct_ops.c @@ -591,6 +591,8 @@ int do_struct_ops(int argc, char **argv) err = cmd_select(cmds, argc, argv, do_help); - btf__free(btf_vmlinux); + if (!IS_ERR(btf_vmlinux)) + btf__free(btf_vmlinux); + return err; } From dfa74909cb6b846cbdabfc2c3c7de1d507fca075 Mon Sep 17 00:00:00 2001 From: David Ahern Date: Sun, 12 Apr 2020 07:32:04 -0600 Subject: [PATCH 191/331] xdp: Reset prog in dev_change_xdp_fd when fd is negative MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The commit mentioned in the Fixes tag reuses the local prog variable when looking up an expected_fd. The variable is not reset when fd < 0 causing a detach with the expected_fd set to actually call dev_xdp_install for the existing program.
The end result is that the detach does not happen. Fixes: 92234c8f15c8 ("xdp: Support specifying expected existing program when attaching XDP") Signed-off-by: David Ahern Signed-off-by: Daniel Borkmann Reviewed-by: Jakub Kicinski Reviewed-by: Toke Høiland-Jørgensen Link: https://lore.kernel.org/bpf/20200412133204.43847-1-dsahern@kernel.org --- net/core/dev.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/core/dev.c b/net/core/dev.c index df8097b8e286..522288177bbd 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -8667,8 +8667,8 @@ int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack, const struct net_device_ops *ops = dev->netdev_ops; enum bpf_netdev_command query; u32 prog_id, expected_id = 0; - struct bpf_prog *prog = NULL; bpf_op_t bpf_op, bpf_chk; + struct bpf_prog *prog; bool offload; int err; @@ -8734,6 +8734,7 @@ int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack, } else { if (!prog_id) return 0; + prog = NULL; } err = dev_xdp_install(dev, bpf_op, extack, flags, prog); From 89f33dcadb349eb926a92633e2c5f61466afc596 Mon Sep 17 00:00:00 2001 From: Zou Wei Date: Mon, 13 Apr 2020 19:57:56 +0800 Subject: [PATCH 192/331] bpf: remove unneeded conversion to bool in __mark_reg_unknown This issue was detected by using the Coccinelle software: kernel/bpf/verifier.c:1259:16-21: WARNING: conversion to bool not needed here The conversion to bool is unneeded, remove it. 
Reported-by: Hulk Robot Signed-off-by: Zou Wei Signed-off-by: Daniel Borkmann Acked-by: Song Liu Link: https://lore.kernel.org/bpf/1586779076-101346-1-git-send-email-zou_wei@huawei.com --- kernel/bpf/verifier.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 04c6630cc18f..38cfcf701eeb 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -1255,8 +1255,7 @@ static void __mark_reg_unknown(const struct bpf_verifier_env *env, reg->type = SCALAR_VALUE; reg->var_off = tnum_unknown; reg->frameno = 0; - reg->precise = env->subprog_cnt > 1 || !env->allow_ptr_leaks ? - true : false; + reg->precise = env->subprog_cnt > 1 || !env->allow_ptr_leaks; __mark_reg_unbounded(reg); } From 7d6243aa599cfb8786a236c4f7a2ffbc6c119180 Mon Sep 17 00:00:00 2001 From: Fabio Estevam Date: Fri, 27 Mar 2020 10:18:23 -0300 Subject: [PATCH 193/331] dt-bindings: iio: dac: ad5770r: Add vendor to compatible string The compatible string in the example misses the vendor information. Pass the "adi" vendor to fix it. 
Signed-off-by: Fabio Estevam Acked-by: Rob Herring Signed-off-by: Rob Herring --- Documentation/devicetree/bindings/iio/dac/adi,ad5770r.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/devicetree/bindings/iio/dac/adi,ad5770r.yaml b/Documentation/devicetree/bindings/iio/dac/adi,ad5770r.yaml index d9c25cf4b92f..f937040477ec 100644 --- a/Documentation/devicetree/bindings/iio/dac/adi,ad5770r.yaml +++ b/Documentation/devicetree/bindings/iio/dac/adi,ad5770r.yaml @@ -144,7 +144,7 @@ examples: #size-cells = <0>; ad5770r@0 { - compatible = "ad5770r"; + compatible = "adi,ad5770r"; reg = <0>; spi-max-frequency = <1000000>; vref-supply = <&vref>; From bc4be5517e9055686b5459f86b7fdc99edbcb72b Mon Sep 17 00:00:00 2001 From: Fabio Estevam Date: Fri, 27 Mar 2020 10:18:25 -0300 Subject: [PATCH 194/331] dt-bindings: iio: dac: ad5770r: Fix the file path The following warning is seen with 'make dt_binding_check': Documentation/devicetree/bindings/iio/dac/adi,ad5770r.yaml: $id: relative path/filename doesn't match actual path or filename Fix it by removing the "bindings" directory from the file path. Signed-off-by: Fabio Estevam Signed-off-by: Rob Herring --- Documentation/devicetree/bindings/iio/dac/adi,ad5770r.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/devicetree/bindings/iio/dac/adi,ad5770r.yaml b/Documentation/devicetree/bindings/iio/dac/adi,ad5770r.yaml index f937040477ec..3b1a85236dd9 100644 --- a/Documentation/devicetree/bindings/iio/dac/adi,ad5770r.yaml +++ b/Documentation/devicetree/bindings/iio/dac/adi,ad5770r.yaml @@ -2,7 +2,7 @@ # Copyright 2020 Analog Devices Inc. 
%YAML 1.2 --- -$id: http://devicetree.org/schemas/bindings/iio/dac/adi,ad5770r.yaml# +$id: http://devicetree.org/schemas/iio/dac/adi,ad5770r.yaml# $schema: http://devicetree.org/meta-schemas/core.yaml# title: Analog Devices AD5770R DAC device driver From c6be88ad207bde37763e12e5b2dae1a15fa75b2b Mon Sep 17 00:00:00 2001 From: Fabio Estevam Date: Fri, 27 Mar 2020 16:22:40 -0300 Subject: [PATCH 195/331] dt-bindings: touchscreen: edt-ft5x06: Remove unneeded I2C unit name The following warnings are seen with 'make dt_binding_check': Documentation/devicetree/bindings/input/touchscreen/edt-ft5x06.example.dts:19.22-30.11: Warning (unit_address_vs_reg): /example-0/i2c@00000000: node has a unit name, but no reg or ranges property Documentation/devicetree/bindings/input/touchscreen/edt-ft5x06.example.dts:19.22-30.11: Warning (unit_address_format): /example-0/i2c@00000000: unit name should not have leading 0s Fix it by removing the unneeded i2c unit name. Signed-off-by: Fabio Estevam Signed-off-by: Rob Herring --- .../devicetree/bindings/input/touchscreen/edt-ft5x06.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/devicetree/bindings/input/touchscreen/edt-ft5x06.yaml b/Documentation/devicetree/bindings/input/touchscreen/edt-ft5x06.yaml index 8d58709d4b47..383d64a91854 100644 --- a/Documentation/devicetree/bindings/input/touchscreen/edt-ft5x06.yaml +++ b/Documentation/devicetree/bindings/input/touchscreen/edt-ft5x06.yaml @@ -109,7 +109,7 @@ examples: - | #include #include - i2c@00000000 { + i2c { #address-cells = <1>; #size-cells = <0>; edt-ft5x06@38 { From ec76f57d62669a396e6f519cdf99e3522246e3f5 Mon Sep 17 00:00:00 2001 From: Fabio Estevam Date: Sat, 28 Mar 2020 15:53:26 -0300 Subject: [PATCH 196/331] dt-bindings: clock: syscon-icst: Remove unneeded unit name The following warnings are seen with 'make dt_binding_check': Documentation/devicetree/bindings/clock/arm,syscon-icst.example.dts:17.16-24.11: Warning (unit_address_vs_reg): 
/example-0/clock@00: node has a unit name, but no reg or ranges property Documentation/devicetree/bindings/clock/arm,syscon-icst.example.dts:17.16-24.11: Warning (unit_address_format): /example-0/clock@00: unit name should not have leading 0s Fix them by removing the unneeded clock unit name. Signed-off-by: Fabio Estevam Signed-off-by: Rob Herring --- Documentation/devicetree/bindings/clock/arm,syscon-icst.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/devicetree/bindings/clock/arm,syscon-icst.yaml b/Documentation/devicetree/bindings/clock/arm,syscon-icst.yaml index de9a465096db..444aeea27db8 100644 --- a/Documentation/devicetree/bindings/clock/arm,syscon-icst.yaml +++ b/Documentation/devicetree/bindings/clock/arm,syscon-icst.yaml @@ -91,7 +91,7 @@ required: examples: - | - vco1: clock@00 { + vco1: clock { compatible = "arm,impd1-vco1"; #clock-cells = <0>; lock-offset = <0x08>; From 213d0e4c4e84a01d48ab368dc7dfc3e36dd04ac7 Mon Sep 17 00:00:00 2001 From: Matti Vaittinen Date: Mon, 6 Apr 2020 10:30:08 +0300 Subject: [PATCH 197/331] dt-bindings: BD718x7 - add missing I2C bus properties The DT example needs #address-cells and #size-cells for I2C bus or validity checker will generate warnings. Add these properties in BD71837 and BD71847 binding examples. 
Signed-off-by: Matti Vaittinen Signed-off-by: Rob Herring --- Documentation/devicetree/bindings/mfd/rohm,bd71837-pmic.yaml | 4 +++- Documentation/devicetree/bindings/mfd/rohm,bd71847-pmic.yaml | 4 +++- 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/Documentation/devicetree/bindings/mfd/rohm,bd71837-pmic.yaml b/Documentation/devicetree/bindings/mfd/rohm,bd71837-pmic.yaml index aa922c560fcc..65018a019e1d 100644 --- a/Documentation/devicetree/bindings/mfd/rohm,bd71837-pmic.yaml +++ b/Documentation/devicetree/bindings/mfd/rohm,bd71837-pmic.yaml @@ -123,7 +123,9 @@ examples: #include i2c { - pmic: pmic@4b { + #address-cells = <1>; + #size-cells = <0>; + pmic: pmic@4b { compatible = "rohm,bd71837"; reg = <0x4b>; interrupt-parent = <&gpio1>; diff --git a/Documentation/devicetree/bindings/mfd/rohm,bd71847-pmic.yaml b/Documentation/devicetree/bindings/mfd/rohm,bd71847-pmic.yaml index 402e40dfe0b8..77bcca2d414f 100644 --- a/Documentation/devicetree/bindings/mfd/rohm,bd71847-pmic.yaml +++ b/Documentation/devicetree/bindings/mfd/rohm,bd71847-pmic.yaml @@ -128,7 +128,9 @@ examples: #include i2c { - pmic: pmic@4b { + #address-cells = <1>; + #size-cells = <0>; + pmic: pmic@4b { compatible = "rohm,bd71847"; reg = <0x4b>; interrupt-parent = <&gpio1>; From f88d59fc2dd69377d0d8063bba1dede40f238a25 Mon Sep 17 00:00:00 2001 From: Rob Herring Date: Thu, 9 Apr 2020 12:05:24 -0600 Subject: [PATCH 198/331] dt-bindings: Fix dtc warnings on reg and ranges in examples MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit A recent update to dtc and changes to the default warnings introduced some new warnings in the DT binding examples: Documentation/devicetree/bindings/arm/sunxi/allwinner,sun4i-a10-mbus.example.dts:23.13-61: Warning (dma_ranges_format): /example-0/dram-controller@1c01000:dma-ranges: "dma-ranges" property has invalid length (12 bytes) (parent #address-cells == 1, child #address-cells == 2, #size-cells == 1) 
Documentation/devicetree/bindings/hwmon/adi,axi-fan-control.example.dts:17.22-28.11: Warning (unit_address_vs_reg): /example-0/fpga-axi@0: node has a unit name, but no reg or ranges property Documentation/devicetree/bindings/memory-controllers/nvidia,tegra186-mc.example.dts:34.13-54: Warning (dma_ranges_format): /example-0/memory-controller@2c00000:dma-ranges: "dma-ranges" property has invalid length (24 bytes) (parent #address-cells == 1, child #address-cells == 2, #size-cells == 2) Documentation/devicetree/bindings/mfd/st,stpmic1.example.dts:19.15-79.11: Warning (unit_address_vs_reg): /example-0/i2c@0: node has a unit name, but no reg or ranges property Documentation/devicetree/bindings/net/qcom,ipq8064-mdio.example.dts:28.23-31.15: Warning (unit_address_vs_reg): /example-0/mdio@37000000/switch@10: node has a unit name, but no reg or ranges property Documentation/devicetree/bindings/rng/brcm,bcm2835.example.dts:17.5-21.11: Warning (unit_address_vs_reg): /example-0/rng: node has a reg or ranges property, but no unit name Documentation/devicetree/bindings/spi/qcom,spi-qcom-qspi.example.dts:20.20-43.11: Warning (unit_address_vs_reg): /example-0/soc@0: node has a unit name, but no reg or ranges property Documentation/devicetree/bindings/usb/ingenic,musb.example.dts:18.28-21.11: Warning (unit_address_vs_reg): /example-0/usb-phy@0: node has a unit name, but no reg or ranges property Cc: Maxime Ripard Cc: Chen-Yu Tsai Cc: "Nuno Sá" Cc: Jean Delvare Cc: Thierry Reding Cc: Jonathan Hunter Cc: Lee Jones Cc: "David S. 
Miller" Cc: Matt Mackall Cc: Herbert Xu Cc: Nicolas Saenz Julienne Cc: Florian Fainelli Cc: Ray Jui Cc: Scott Branden Cc: bcm-kernel-feedback-list@broadcom.com Cc: Mark Brown Cc: linux-hwmon@vger.kernel.org Cc: linux-tegra@vger.kernel.org Cc: linux-arm-msm@vger.kernel.org Cc: netdev@vger.kernel.org Cc: linux-crypto@vger.kernel.org Cc: linux-rpi-kernel@lists.infradead.org Cc: linux-spi@vger.kernel.org Cc: linux-usb@vger.kernel.org Acked-by: Guenter Roeck Reviewed-by: Bjorn Andersson Signed-off-by: Rob Herring --- .../arm/sunxi/allwinner,sun4i-a10-mbus.yaml | 6 +++ .../bindings/hwmon/adi,axi-fan-control.yaml | 2 +- .../nvidia,tegra186-mc.yaml | 41 +++++++++++-------- .../devicetree/bindings/mfd/st,stpmic1.yaml | 2 +- .../bindings/net/qcom,ipq8064-mdio.yaml | 1 + .../devicetree/bindings/rng/brcm,bcm2835.yaml | 2 +- .../bindings/spi/qcom,spi-qcom-qspi.yaml | 2 +- .../devicetree/bindings/usb/ingenic,musb.yaml | 2 +- 8 files changed, 35 insertions(+), 23 deletions(-) diff --git a/Documentation/devicetree/bindings/arm/sunxi/allwinner,sun4i-a10-mbus.yaml b/Documentation/devicetree/bindings/arm/sunxi/allwinner,sun4i-a10-mbus.yaml index aa0738b4d534..e713a6fe4cf7 100644 --- a/Documentation/devicetree/bindings/arm/sunxi/allwinner,sun4i-a10-mbus.yaml +++ b/Documentation/devicetree/bindings/arm/sunxi/allwinner,sun4i-a10-mbus.yaml @@ -42,6 +42,10 @@ properties: description: See section 2.3.9 of the DeviceTree Specification. 
+ '#address-cells': true + + '#size-cells': true + required: - "#interconnect-cells" - compatible @@ -59,6 +63,8 @@ examples: compatible = "allwinner,sun5i-a13-mbus"; reg = <0x01c01000 0x1000>; clocks = <&ccu CLK_MBUS>; + #address-cells = <1>; + #size-cells = <1>; dma-ranges = <0x00000000 0x40000000 0x20000000>; #interconnect-cells = <1>; }; diff --git a/Documentation/devicetree/bindings/hwmon/adi,axi-fan-control.yaml b/Documentation/devicetree/bindings/hwmon/adi,axi-fan-control.yaml index 57a240d2d026..29bb2c778c59 100644 --- a/Documentation/devicetree/bindings/hwmon/adi,axi-fan-control.yaml +++ b/Documentation/devicetree/bindings/hwmon/adi,axi-fan-control.yaml @@ -47,7 +47,7 @@ required: examples: - | - fpga_axi: fpga-axi@0 { + fpga_axi: fpga-axi { #address-cells = <0x2>; #size-cells = <0x1>; diff --git a/Documentation/devicetree/bindings/memory-controllers/nvidia,tegra186-mc.yaml b/Documentation/devicetree/bindings/memory-controllers/nvidia,tegra186-mc.yaml index 12516bd89cf9..611bda38d187 100644 --- a/Documentation/devicetree/bindings/memory-controllers/nvidia,tegra186-mc.yaml +++ b/Documentation/devicetree/bindings/memory-controllers/nvidia,tegra186-mc.yaml @@ -97,30 +97,35 @@ examples: #include #include - memory-controller@2c00000 { - compatible = "nvidia,tegra186-mc"; - reg = <0x0 0x02c00000 0x0 0xb0000>; - interrupts = ; - + bus { #address-cells = <2>; #size-cells = <2>; - ranges = <0x0 0x02c00000 0x02c00000 0x0 0xb0000>; + memory-controller@2c00000 { + compatible = "nvidia,tegra186-mc"; + reg = <0x0 0x02c00000 0x0 0xb0000>; + interrupts = ; - /* - * Memory clients have access to all 40 bits that the memory - * controller can address. 
- */ - dma-ranges = <0x0 0x0 0x0 0x0 0x100 0x0>; + #address-cells = <2>; + #size-cells = <2>; - external-memory-controller@2c60000 { - compatible = "nvidia,tegra186-emc"; - reg = <0x0 0x02c60000 0x0 0x50000>; - interrupts = ; - clocks = <&bpmp TEGRA186_CLK_EMC>; - clock-names = "emc"; + ranges = <0x0 0x02c00000 0x0 0x02c00000 0x0 0xb0000>; - nvidia,bpmp = <&bpmp>; + /* + * Memory clients have access to all 40 bits that the memory + * controller can address. + */ + dma-ranges = <0x0 0x0 0x0 0x0 0x100 0x0>; + + external-memory-controller@2c60000 { + compatible = "nvidia,tegra186-emc"; + reg = <0x0 0x02c60000 0x0 0x50000>; + interrupts = ; + clocks = <&bpmp TEGRA186_CLK_EMC>; + clock-names = "emc"; + + nvidia,bpmp = <&bpmp>; + }; }; }; diff --git a/Documentation/devicetree/bindings/mfd/st,stpmic1.yaml b/Documentation/devicetree/bindings/mfd/st,stpmic1.yaml index d9ad9260e348..f88d13d70441 100644 --- a/Documentation/devicetree/bindings/mfd/st,stpmic1.yaml +++ b/Documentation/devicetree/bindings/mfd/st,stpmic1.yaml @@ -274,7 +274,7 @@ examples: - | #include #include - i2c@0 { + i2c { #address-cells = <1>; #size-cells = <0>; pmic@33 { diff --git a/Documentation/devicetree/bindings/net/qcom,ipq8064-mdio.yaml b/Documentation/devicetree/bindings/net/qcom,ipq8064-mdio.yaml index b9f90081046f..67df3fe861ee 100644 --- a/Documentation/devicetree/bindings/net/qcom,ipq8064-mdio.yaml +++ b/Documentation/devicetree/bindings/net/qcom,ipq8064-mdio.yaml @@ -48,6 +48,7 @@ examples: switch@10 { compatible = "qca,qca8337"; + reg = <0x10>; /* ... 
*/ }; }; diff --git a/Documentation/devicetree/bindings/rng/brcm,bcm2835.yaml b/Documentation/devicetree/bindings/rng/brcm,bcm2835.yaml index 89ab67f20a7f..c147900f9041 100644 --- a/Documentation/devicetree/bindings/rng/brcm,bcm2835.yaml +++ b/Documentation/devicetree/bindings/rng/brcm,bcm2835.yaml @@ -39,7 +39,7 @@ additionalProperties: false examples: - | - rng { + rng@7e104000 { compatible = "brcm,bcm2835-rng"; reg = <0x7e104000 0x10>; interrupts = <2 29>; diff --git a/Documentation/devicetree/bindings/spi/qcom,spi-qcom-qspi.yaml b/Documentation/devicetree/bindings/spi/qcom,spi-qcom-qspi.yaml index 0cf470eaf2a0..5c16cf59ca00 100644 --- a/Documentation/devicetree/bindings/spi/qcom,spi-qcom-qspi.yaml +++ b/Documentation/devicetree/bindings/spi/qcom,spi-qcom-qspi.yaml @@ -61,7 +61,7 @@ examples: #include #include - soc: soc@0 { + soc: soc { #address-cells = <2>; #size-cells = <2>; diff --git a/Documentation/devicetree/bindings/usb/ingenic,musb.yaml b/Documentation/devicetree/bindings/usb/ingenic,musb.yaml index 1d6877875077..c2d2ee43ba67 100644 --- a/Documentation/devicetree/bindings/usb/ingenic,musb.yaml +++ b/Documentation/devicetree/bindings/usb/ingenic,musb.yaml @@ -56,7 +56,7 @@ additionalProperties: false examples: - | #include - usb_phy: usb-phy@0 { + usb_phy: usb-phy { compatible = "usb-nop-xceiv"; #phy-cells = <0>; }; From ce81bd6977c8d58b90c599bf34be9705af4bd32b Mon Sep 17 00:00:00 2001 From: Rob Herring Date: Thu, 9 Apr 2020 12:20:09 -0600 Subject: [PATCH 199/331] dt-bindings: hwmon: Fix incorrect $id paths MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Fix the path warnings in the adi,axi-fan-control and adt7475 bindings: Documentation/devicetree/bindings/hwmon/adt7475.yaml: $id: relative path/filename doesn't match actual path or filename expected: http://devicetree.org/schemas/hwmon/adt7475.yaml# Documentation/devicetree/bindings/hwmon/adi,axi-fan-control.yaml: $id: relative path/filename doesn't match actual 
path or filename expected: http://devicetree.org/schemas/hwmon/adi,axi-fan-control.yaml# Cc: Jean Delvare Cc: linux-hwmon@vger.kernel.org Acked-by: Guenter Roeck Acked-by: Nuno Sá Signed-off-by: Rob Herring --- .../devicetree/bindings/hwmon/adi,axi-fan-control.yaml | 2 +- Documentation/devicetree/bindings/hwmon/adt7475.yaml | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/Documentation/devicetree/bindings/hwmon/adi,axi-fan-control.yaml b/Documentation/devicetree/bindings/hwmon/adi,axi-fan-control.yaml index 29bb2c778c59..7db78767c02d 100644 --- a/Documentation/devicetree/bindings/hwmon/adi,axi-fan-control.yaml +++ b/Documentation/devicetree/bindings/hwmon/adi,axi-fan-control.yaml @@ -2,7 +2,7 @@ # Copyright 2019 Analog Devices Inc. %YAML 1.2 --- -$id: http://devicetree.org/schemas/bindings/hwmon/adi,axi-fan-control.yaml# +$id: http://devicetree.org/schemas/hwmon/adi,axi-fan-control.yaml# $schema: http://devicetree.org/meta-schemas/core.yaml# title: Analog Devices AXI FAN Control Device Tree Bindings diff --git a/Documentation/devicetree/bindings/hwmon/adt7475.yaml b/Documentation/devicetree/bindings/hwmon/adt7475.yaml index 76985034ea73..46c441574f98 100644 --- a/Documentation/devicetree/bindings/hwmon/adt7475.yaml +++ b/Documentation/devicetree/bindings/hwmon/adt7475.yaml @@ -1,7 +1,7 @@ # SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause) %YAML 1.2 --- -$id: http://devicetree.org/schemas/adt7475.yaml# +$id: http://devicetree.org/schemas/hwmon/adt7475.yaml# $schema: http://devicetree.org/meta-schemas/core.yaml# title: ADT7475 hwmon sensor From 7801eba8e5b2e94979ca4a3668ec8d46eca6e223 Mon Sep 17 00:00:00 2001 From: Rob Herring Date: Thu, 9 Apr 2020 12:27:32 -0600 Subject: [PATCH 200/331] dt-bindings: interrupt-controller: Fix loongson,parent_int_map property schema 'loongson,parent_int_map' is an array, but the schema is defining a matrix resulting in the follow warnings: 
Documentation/devicetree/bindings/interrupt-controller/loongson,liointc.example.dt.yaml: interrupt-controller@3ff01400: loongson,parent_int_map:0: [4043309055] is too short Documentation/devicetree/bindings/interrupt-controller/loongson,liointc.example.dt.yaml: interrupt-controller@3ff01400: loongson,parent_int_map:1: [251658240] is too short Documentation/devicetree/bindings/interrupt-controller/loongson,liointc.example.dt.yaml: interrupt-controller@3ff01400: loongson,parent_int_map:2: [0] is too short Documentation/devicetree/bindings/interrupt-controller/loongson,liointc.example.dt.yaml: interrupt-controller@3ff01400: loongson,parent_int_map:3: [0] is too short The correct way to define an array is a list in 'items' and/or a size defined by 'minItems' and 'maxItems'. Cc: Thomas Gleixner Cc: Jason Cooper Cc: Marc Zyngier Signed-off-by: Rob Herring --- .../bindings/interrupt-controller/loongson,liointc.yaml | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/Documentation/devicetree/bindings/interrupt-controller/loongson,liointc.yaml b/Documentation/devicetree/bindings/interrupt-controller/loongson,liointc.yaml index 9c6b91fee477..26f1fcf0857a 100644 --- a/Documentation/devicetree/bindings/interrupt-controller/loongson,liointc.yaml +++ b/Documentation/devicetree/bindings/interrupt-controller/loongson,liointc.yaml @@ -56,9 +56,8 @@ properties: cell with zero. allOf: - $ref: /schemas/types.yaml#/definitions/uint32-array - - items: - minItems: 4 - maxItems: 4 + - minItems: 4 + maxItems: 4 required: From b8a1707f177a18142f1340c4dad847446e299f5d Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Tue, 14 Apr 2020 18:48:34 +0200 Subject: [PATCH 201/331] docs: dt: fix broken reference to phy-cadence-torrent.yaml This file was removed, and another file was added in its place, in two separate commits. Splitting a single logical change (doc conversion) across two patches is a bad thing, as it makes it harder to discover what crap happened.
Anyway, this patch fixes the broken reference, making it point to the new location of the file. Fixes: 922003733d42 ("dt-bindings: phy: Remove Cadence MHDP PHY dt binding") Fixes: c6d8eef38b7f ("dt-bindings: phy: Add Cadence MHDP PHY bindings in YAML format.") Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Rob Herring --- Documentation/devicetree/bindings/phy/ti,phy-j721e-wiz.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/devicetree/bindings/phy/ti,phy-j721e-wiz.yaml b/Documentation/devicetree/bindings/phy/ti,phy-j721e-wiz.yaml index fd1982c56104..3f913d6d1c3d 100644 --- a/Documentation/devicetree/bindings/phy/ti,phy-j721e-wiz.yaml +++ b/Documentation/devicetree/bindings/phy/ti,phy-j721e-wiz.yaml @@ -146,7 +146,7 @@ patternProperties: bindings specified in Documentation/devicetree/bindings/phy/phy-cadence-sierra.txt Torrent SERDES should follow the bindings specified in - Documentation/devicetree/bindings/phy/phy-cadence-dp.txt + Documentation/devicetree/bindings/phy/phy-cadence-torrent.yaml required: - compatible From 0c134f528a72754bf007d605c2a5c8cdbc448fe8 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Tue, 14 Apr 2020 18:48:49 +0200 Subject: [PATCH 202/331] docs: dt: qcom,dwc3.txt: fix cross-reference for a converted file The qcom-qusb2-phy.txt file was converted and renamed to yaml. Update the cross-reference accordingly.
Fixes: 8ce65d8d38df ("dt-bindings: phy: qcom,qusb2: Convert QUSB2 phy bindings to yaml") Reviewed-by: Stephen Boyd Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Rob Herring --- Documentation/devicetree/bindings/usb/qcom,dwc3.txt | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Documentation/devicetree/bindings/usb/qcom,dwc3.txt b/Documentation/devicetree/bindings/usb/qcom,dwc3.txt index cb695aa3fba4..fbdd01756752 100644 --- a/Documentation/devicetree/bindings/usb/qcom,dwc3.txt +++ b/Documentation/devicetree/bindings/usb/qcom,dwc3.txt @@ -52,8 +52,8 @@ A child node must exist to represent the core DWC3 IP block. The name of the node is not important. The content of the node is defined in dwc3.txt. Phy documentation is provided in the following places: -Documentation/devicetree/bindings/phy/qcom-qmp-phy.txt - USB3 QMP PHY -Documentation/devicetree/bindings/phy/qcom-qusb2-phy.txt - USB2 QUSB2 PHY +Documentation/devicetree/bindings/phy/qcom-qmp-phy.txt - USB3 QMP PHY +Documentation/devicetree/bindings/phy/qcom,qusb2-phy.yaml - USB2 QUSB2 PHY Example device nodes: From 27b128b30c58d847c11d3d2e9b69fe7cb61170ce Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Tue, 14 Apr 2020 18:48:50 +0200 Subject: [PATCH 203/331] docs: dt: fix a broken reference for a file converted to json Changeset 32ced09d7903 ("dt-bindings: serial: Convert slave-device bindings to json-schema") moved a binding to json and updated the links. Yet, one link was not changed, due to a merge conflict. Update this one too. 
Fixes: 32ced09d7903 ("dt-bindings: serial: Convert slave-device bindings to json-schema") Reviewed-by: Geert Uytterhoeven Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Rob Herring --- Documentation/devicetree/bindings/net/qualcomm-bluetooth.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/devicetree/bindings/net/qualcomm-bluetooth.txt b/Documentation/devicetree/bindings/net/qualcomm-bluetooth.txt index beca6466d59a..d2202791c1d4 100644 --- a/Documentation/devicetree/bindings/net/qualcomm-bluetooth.txt +++ b/Documentation/devicetree/bindings/net/qualcomm-bluetooth.txt @@ -29,7 +29,7 @@ Required properties for compatible string qcom,wcn399x-bt: Optional properties for compatible string qcom,wcn399x-bt: - - max-speed: see Documentation/devicetree/bindings/serial/slave-device.txt + - max-speed: see Documentation/devicetree/bindings/serial/serial.yaml - firmware-name: specify the name of nvm firmware to load - clocks: clock provided to the controller From 5fd274ed3c8551ebe38c3e9d2ded853a5c289499 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Tue, 14 Apr 2020 18:48:54 +0200 Subject: [PATCH 204/331] docs: dt: rockchip,dwc3.txt: fix a pointer to a renamed file phy-rockchip-inno-usb2.txt was converted to yaml. Fix the corresponding reference. Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Rob Herring --- Documentation/devicetree/bindings/usb/rockchip,dwc3.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/devicetree/bindings/usb/rockchip,dwc3.txt b/Documentation/devicetree/bindings/usb/rockchip,dwc3.txt index c8c4b00ecb94..94520493233b 100644 --- a/Documentation/devicetree/bindings/usb/rockchip,dwc3.txt +++ b/Documentation/devicetree/bindings/usb/rockchip,dwc3.txt @@ -16,7 +16,7 @@ A child node must exist to represent the core DWC3 IP block. The name of the node is not important. The content of the node is defined in dwc3.txt. 
Phy documentation is provided in the following places: -Documentation/devicetree/bindings/phy/phy-rockchip-inno-usb2.txt - USB2.0 PHY +Documentation/devicetree/bindings/phy/phy-rockchip-inno-usb2.yaml - USB2.0 PHY Documentation/devicetree/bindings/phy/phy-rockchip-typec.txt - Type-C PHY Example device nodes: From 68dac3eb50be32957ae6e1e6da9281a3b7c6658b Mon Sep 17 00:00:00 2001 From: Atsushi Nemoto Date: Fri, 10 Apr 2020 12:16:16 +0900 Subject: [PATCH 205/331] net: phy: micrel: use genphy_read_status for KSZ9131 KSZ9131 will not work with some switches due to workaround for KSZ9031 introduced in commit d2fd719bcb0e83cb39cfee22ee800f98a56eceb3 ("net/phy: micrel: Add workaround for bad autoneg"). Use genphy_read_status instead of dedicated ksz9031_read_status. Fixes: bff5b4b37372 ("net: phy: micrel: add Microchip KSZ9131 initial driver") Signed-off-by: Atsushi Nemoto Signed-off-by: David S. Miller --- drivers/net/phy/micrel.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c index 05d20343b816..3a4d83fa52dc 100644 --- a/drivers/net/phy/micrel.c +++ b/drivers/net/phy/micrel.c @@ -1204,7 +1204,7 @@ static struct phy_driver ksphy_driver[] = { .driver_data = &ksz9021_type, .probe = kszphy_probe, .config_init = ksz9131_config_init, - .read_status = ksz9031_read_status, + .read_status = genphy_read_status, .ack_interrupt = kszphy_ack_interrupt, .config_intr = kszphy_config_intr, .get_sset_count = kszphy_get_sset_count, From 0e631eee17dcea576ab922fa70e4fdbd596ee452 Mon Sep 17 00:00:00 2001 From: David Howells Date: Mon, 13 Apr 2020 13:57:14 +0100 Subject: [PATCH 206/331] rxrpc: Fix DATA Tx to disable nofrag for UDP on AF_INET6 socket Fix the DATA packet transmission to disable nofrag for UDPv4 on an AF_INET6 socket as well as UDPv6 when trying to transmit fragmentably. 
Without this, packets filled to the normal size used by the kernel AFS client of 1412 bytes will be rejected by udp_sendmsg() with EMSGSIZE immediately. The ->sk_error_report() notification hook is called, but rxrpc doesn't generate a trace for it. This is a temporary fix; a more permanent solution needs to involve changing the size of the packets being filled in accordance with the MTU, which isn't currently done in AF_RXRPC. The reason for not doing so was that, barring the last packet in an rx jumbo packet, jumbos can only be assembled out of 1412-byte packets - and the plan was to construct jumbos on the fly at transmission time. Also, there's no point turning on IPV6_MTU_DISCOVER, since IPv6 has to engage in this anyway since fragmentation is only done by the sender. We can then condense the switch-statement in rxrpc_send_data_packet(). Fixes: 75b54cb57ca3 ("rxrpc: Add IPv6 support") Signed-off-by: David Howells Signed-off-by: David S. Miller --- net/rxrpc/local_object.c | 9 --------- net/rxrpc/output.c | 42 +++++++++++----------------------------- 2 files changed, 11 insertions(+), 40 deletions(-) diff --git a/net/rxrpc/local_object.c b/net/rxrpc/local_object.c index a6c1349e965d..01135e54d95d 100644 --- a/net/rxrpc/local_object.c +++ b/net/rxrpc/local_object.c @@ -165,15 +165,6 @@ static int rxrpc_open_socket(struct rxrpc_local *local, struct net *net) goto error; } - /* we want to set the don't fragment bit */ - opt = IPV6_PMTUDISC_DO; - ret = kernel_setsockopt(local->socket, SOL_IPV6, IPV6_MTU_DISCOVER, - (char *) &opt, sizeof(opt)); - if (ret < 0) { - _debug("setsockopt failed"); - goto error; - } - /* Fall through and set IPv4 options too otherwise we don't get * errors from IPv4 packets sent through the IPv6 socket.
*/ diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c index bad3d2420344..90e263c6aa69 100644 --- a/net/rxrpc/output.c +++ b/net/rxrpc/output.c @@ -474,42 +474,22 @@ send_fragmentable: skb->tstamp = ktime_get_real(); switch (conn->params.local->srx.transport.family) { + case AF_INET6: case AF_INET: opt = IP_PMTUDISC_DONT; - ret = kernel_setsockopt(conn->params.local->socket, - SOL_IP, IP_MTU_DISCOVER, - (char *)&opt, sizeof(opt)); - if (ret == 0) { - ret = kernel_sendmsg(conn->params.local->socket, &msg, - iov, 2, len); - conn->params.peer->last_tx_at = ktime_get_seconds(); + kernel_setsockopt(conn->params.local->socket, + SOL_IP, IP_MTU_DISCOVER, + (char *)&opt, sizeof(opt)); + ret = kernel_sendmsg(conn->params.local->socket, &msg, + iov, 2, len); + conn->params.peer->last_tx_at = ktime_get_seconds(); - opt = IP_PMTUDISC_DO; - kernel_setsockopt(conn->params.local->socket, SOL_IP, - IP_MTU_DISCOVER, - (char *)&opt, sizeof(opt)); - } + opt = IP_PMTUDISC_DO; + kernel_setsockopt(conn->params.local->socket, + SOL_IP, IP_MTU_DISCOVER, + (char *)&opt, sizeof(opt)); break; -#ifdef CONFIG_AF_RXRPC_IPV6 - case AF_INET6: - opt = IPV6_PMTUDISC_DONT; - ret = kernel_setsockopt(conn->params.local->socket, - SOL_IPV6, IPV6_MTU_DISCOVER, - (char *)&opt, sizeof(opt)); - if (ret == 0) { - ret = kernel_sendmsg(conn->params.local->socket, &msg, - iov, 2, len); - conn->params.peer->last_tx_at = ktime_get_seconds(); - - opt = IPV6_PMTUDISC_DO; - kernel_setsockopt(conn->params.local->socket, - SOL_IPV6, IPV6_MTU_DISCOVER, - (char *)&opt, sizeof(opt)); - } - break; -#endif - default: BUG(); } From 555cd19d0c6a23b3faef949bccca4822cccc2eb7 Mon Sep 17 00:00:00 2001 From: Shannon Nelson Date: Mon, 13 Apr 2020 10:33:10 -0700 Subject: [PATCH 207/331] ionic: add dynamic_debug header Add the appropriate header for using dynamic_hex_dump(), which seems to be incidentally included in some configurations but not all. 
Fixes: 7e4d47596b68 ("ionic: replay filters after fw upgrade") Reported-by: Randy Dunlap Signed-off-by: Shannon Nelson Signed-off-by: David S. Miller --- drivers/net/ethernet/pensando/ionic/ionic_rx_filter.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/ethernet/pensando/ionic/ionic_rx_filter.c b/drivers/net/ethernet/pensando/ionic/ionic_rx_filter.c index f3c7dd1596ee..27b7eca19784 100644 --- a/drivers/net/ethernet/pensando/ionic/ionic_rx_filter.c +++ b/drivers/net/ethernet/pensando/ionic/ionic_rx_filter.c @@ -2,6 +2,7 @@ /* Copyright(c) 2017 - 2019 Pensando Systems, Inc */ #include +#include #include #include "ionic.h" From 2c0df9f9eddbc87fa2ef8da86264995404d816b9 Mon Sep 17 00:00:00 2001 From: Shannon Nelson Date: Mon, 13 Apr 2020 10:33:11 -0700 Subject: [PATCH 208/331] ionic: fix unused assignment Remove an unused initialized value. Fixes: 7e4d47596b68 ("ionic: replay filters after fw upgrade") Reported-by: kbuild test robot Signed-off-by: Shannon Nelson Signed-off-by: David S. Miller --- drivers/net/ethernet/pensando/ionic/ionic_rx_filter.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/pensando/ionic/ionic_rx_filter.c b/drivers/net/ethernet/pensando/ionic/ionic_rx_filter.c index 27b7eca19784..80eeb7696e01 100644 --- a/drivers/net/ethernet/pensando/ionic/ionic_rx_filter.c +++ b/drivers/net/ethernet/pensando/ionic/ionic_rx_filter.c @@ -26,7 +26,7 @@ void ionic_rx_filter_replay(struct ionic_lif *lif) struct hlist_head *head; struct hlist_node *tmp; unsigned int i; - int err = 0; + int err; ac = &ctx.cmd.rx_filter_add; From 34b5e6a33c1a8e466c3a73fd437f66fb16cb83ea Mon Sep 17 00:00:00 2001 From: Andrew Lunn Date: Tue, 14 Apr 2020 02:34:38 +0200 Subject: [PATCH 209/331] net: dsa: mv88e6xxx: Configure MAC when using fixed link The 88e6185 is reporting it has detected a PHY, when a port is connected to an SFP. As a result, the fixed-phy configuration is not being applied. 
That then breaks packet transfer, since the port is reported as being down. Add additional conditions to check the interface mode, and if it is fixed, always configure the port on link up/down, independent of the PPU status. Fixes: 30c4a5b0aad8 ("net: mv88e6xxx: use resolved link config in mac_link_up()") Signed-off-by: Andrew Lunn Signed-off-by: David S. Miller --- drivers/net/dsa/mv88e6xxx/chip.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c index 221593261e8f..dd8a5666a584 100644 --- a/drivers/net/dsa/mv88e6xxx/chip.c +++ b/drivers/net/dsa/mv88e6xxx/chip.c @@ -709,7 +709,8 @@ static void mv88e6xxx_mac_link_down(struct dsa_switch *ds, int port, ops = chip->info->ops; mv88e6xxx_reg_lock(chip); - if (!mv88e6xxx_port_ppu_updates(chip, port) && ops->port_set_link) + if ((!mv88e6xxx_port_ppu_updates(chip, port) || + mode == MLO_AN_FIXED) && ops->port_set_link) err = ops->port_set_link(chip, port, LINK_FORCED_DOWN); mv88e6xxx_reg_unlock(chip); @@ -731,7 +732,7 @@ static void mv88e6xxx_mac_link_up(struct dsa_switch *ds, int port, ops = chip->info->ops; mv88e6xxx_reg_lock(chip); - if (!mv88e6xxx_port_ppu_updates(chip, port)) { + if (!mv88e6xxx_port_ppu_updates(chip, port) || mode == MLO_AN_FIXED) { /* FIXME: for an automedia port, should we force the link * down here - what if the link comes up due to "other" media * while we're bringing the port up, how is the exclusivity From 3be98b2d5fbca3da7c4df0477eed95bfb5b83d64 Mon Sep 17 00:00:00 2001 From: Andrew Lunn Date: Tue, 14 Apr 2020 02:34:39 +0200 Subject: [PATCH 210/331] net: dsa: Down cpu/dsa ports phylink will control DSA and CPU ports can be configured in two ways. By default, the driver should configure such ports to their maximum bandwidth. For most use cases, this is sufficient. When this default is insufficient, a phylink instance can be bound to such ports, and phylink will configure the port, e.g.
based on fixed-link properties. phylink assumes the port is initially down. Given that the driver should have already configured it to its maximum speed, ask the driver to down the port before instantiating the phylink instance. Fixes: 30c4a5b0aad8 ("net: mv88e6xxx: use resolved link config in mac_link_up()") Signed-off-by: Andrew Lunn Signed-off-by: David S. Miller --- net/dsa/port.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/net/dsa/port.c b/net/dsa/port.c index 231b2d494f1c..a58fdd362574 100644 --- a/net/dsa/port.c +++ b/net/dsa/port.c @@ -670,11 +670,16 @@ int dsa_port_link_register_of(struct dsa_port *dp) { struct dsa_switch *ds = dp->ds; struct device_node *phy_np; + int port = dp->index; if (!ds->ops->adjust_link) { phy_np = of_parse_phandle(dp->dn, "phy-handle", 0); - if (of_phy_is_fixed_link(dp->dn) || phy_np) + if (of_phy_is_fixed_link(dp->dn) || phy_np) { + if (ds->ops->phylink_mac_link_down) + ds->ops->phylink_mac_link_down(ds, port, + MLO_AN_FIXED, PHY_INTERFACE_MODE_NA); return dsa_port_phylink_register(dp); + } return 0; } From a7a0d6269652846671312b29992143f56e2866b8 Mon Sep 17 00:00:00 2001 From: Atsushi Nemoto Date: Tue, 14 Apr 2020 10:12:34 +0900 Subject: [PATCH 211/331] net: stmmac: socfpga: Allow all RGMII modes Allow all the RGMII modes to be used. (Not only "rgmii", "rgmii-id" but "rgmii-txid", "rgmii-rxid") Signed-off-by: Atsushi Nemoto Signed-off-by: David S. 
Miller --- drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c index e0212d2fc2a1..fa32cd5b418e 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c @@ -241,6 +241,8 @@ static int socfpga_set_phy_mode_common(int phymode, u32 *val) switch (phymode) { case PHY_INTERFACE_MODE_RGMII: case PHY_INTERFACE_MODE_RGMII_ID: + case PHY_INTERFACE_MODE_RGMII_RXID: + case PHY_INTERFACE_MODE_RGMII_TXID: *val = SYSMGR_EMACGRP_CTRL_PHYSEL_ENUM_RGMII; break; case PHY_INTERFACE_MODE_MII: From c799fca8baf18d1bbbbad6c3b736eefbde8bdb90 Mon Sep 17 00:00:00 2001 From: Jason Gunthorpe Date: Tue, 14 Apr 2020 12:27:08 -0300 Subject: [PATCH 212/331] net/cxgb4: Check the return from t4_query_params properly MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Positive return values are also failures that don't set val, although this probably can't happen. Fixes gcc 10 warning: drivers/net/ethernet/chelsio/cxgb4/t4_hw.c: In function ‘t4_phy_fw_ver’: drivers/net/ethernet/chelsio/cxgb4/t4_hw.c:3747:14: warning: ‘val’ may be used uninitialized in this function [-Wmaybe-uninitialized] 3747 | *phy_fw_ver = val; Fixes: 01b6961410b7 ("cxgb4: Add PHY firmware support for T420-BT cards") Signed-off-by: Jason Gunthorpe Signed-off-by: David S. 
Miller --- drivers/net/ethernet/chelsio/cxgb4/t4_hw.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c index 239f678a94ed..2a3480fc1d91 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c +++ b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c @@ -3742,7 +3742,7 @@ int t4_phy_fw_ver(struct adapter *adap, int *phy_fw_ver) FW_PARAMS_PARAM_Z_V(FW_PARAMS_PARAM_DEV_PHYFW_VERSION)); ret = t4_query_params(adap, adap->mbox, adap->pf, 0, 1, &param, &val); - if (ret < 0) + if (ret) return ret; *phy_fw_ver = val; return 0; From dd649b4ff0127559950965d739cc63efae50ecd9 Mon Sep 17 00:00:00 2001 From: Russell King Date: Tue, 14 Apr 2020 20:49:03 +0100 Subject: [PATCH 213/331] net: marvell10g: report firmware version Report the firmware version when probing the PHY to allow issues attributable to firmware to be diagnosed. Tested-by: Matteo Croce Signed-off-by: Russell King Reviewed-by: Andrew Lunn Reviewed-by: Florian Fainelli Signed-off-by: David S.
Miller --- drivers/net/phy/marvell10g.c | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) diff --git a/drivers/net/phy/marvell10g.c b/drivers/net/phy/marvell10g.c index 7621badae64d..748532d9e1ae 100644 --- a/drivers/net/phy/marvell10g.c +++ b/drivers/net/phy/marvell10g.c @@ -33,6 +33,8 @@ #define MV_PHY_ALASKA_NBT_QUIRK_REV (MARVELL_PHY_ID_88X3310 | 0xa) enum { + MV_PMA_FW_VER0 = 0xc011, + MV_PMA_FW_VER1 = 0xc012, MV_PMA_BOOT = 0xc050, MV_PMA_BOOT_FATAL = BIT(0), @@ -83,6 +85,8 @@ enum { }; struct mv3310_priv { + u32 firmware_ver; + struct device *hwmon_dev; char *hwmon_name; }; @@ -355,6 +359,22 @@ static int mv3310_probe(struct phy_device *phydev) dev_set_drvdata(&phydev->mdio.dev, priv); + ret = phy_read_mmd(phydev, MDIO_MMD_PMAPMD, MV_PMA_FW_VER0); + if (ret < 0) + return ret; + + priv->firmware_ver = ret << 16; + + ret = phy_read_mmd(phydev, MDIO_MMD_PMAPMD, MV_PMA_FW_VER1); + if (ret < 0) + return ret; + + priv->firmware_ver |= ret; + + phydev_info(phydev, "Firmware version %u.%u.%u.%u\n", + priv->firmware_ver >> 24, (priv->firmware_ver >> 16) & 255, + (priv->firmware_ver >> 8) & 255, priv->firmware_ver & 255); + /* Powering down the port when not in use saves about 600mW */ ret = mv3310_power_down(phydev); if (ret) From 8f48c2ac85eda8d8a01c83c6d73f891c43ef182d Mon Sep 17 00:00:00 2001 From: Russell King Date: Tue, 14 Apr 2020 20:49:08 +0100 Subject: [PATCH 214/331] net: marvell10g: soft-reset the PHY when coming out of low power Soft-reset the PHY when coming out of low power mode, which seems to be necessary with firmware versions 0.3.3.0 and 0.3.10.0. This depends on ("net: marvell10g: report firmware version") Fixes: c9cc1c815d36 ("net: phy: marvell10g: place in powersave mode at probe") Reported-by: Matteo Croce Tested-by: Matteo Croce Reviewed-by: Andrew Lunn Signed-off-by: Russell King Reviewed-by: Florian Fainelli Signed-off-by: David S. 
Miller --- drivers/net/phy/marvell10g.c | 16 +++++++++++++--- 1 file changed, 13 insertions(+), 3 deletions(-) diff --git a/drivers/net/phy/marvell10g.c b/drivers/net/phy/marvell10g.c index 748532d9e1ae..95e3f4644aeb 100644 --- a/drivers/net/phy/marvell10g.c +++ b/drivers/net/phy/marvell10g.c @@ -75,7 +75,8 @@ enum { /* Vendor2 MMD registers */ MV_V2_PORT_CTRL = 0xf001, - MV_V2_PORT_CTRL_PWRDOWN = 0x0800, + MV_V2_PORT_CTRL_SWRST = BIT(15), + MV_V2_PORT_CTRL_PWRDOWN = BIT(11), MV_V2_TEMP_CTRL = 0xf08a, MV_V2_TEMP_CTRL_MASK = 0xc000, MV_V2_TEMP_CTRL_SAMPLE = 0x0000, @@ -239,8 +240,17 @@ static int mv3310_power_down(struct phy_device *phydev) static int mv3310_power_up(struct phy_device *phydev) { - return phy_clear_bits_mmd(phydev, MDIO_MMD_VEND2, MV_V2_PORT_CTRL, - MV_V2_PORT_CTRL_PWRDOWN); + struct mv3310_priv *priv = dev_get_drvdata(&phydev->mdio.dev); + int ret; + + ret = phy_clear_bits_mmd(phydev, MDIO_MMD_VEND2, MV_V2_PORT_CTRL, + MV_V2_PORT_CTRL_PWRDOWN); + + if (priv->firmware_ver < 0x00030000) + return ret; + + return phy_set_bits_mmd(phydev, MDIO_MMD_VEND2, MV_V2_PORT_CTRL, + MV_V2_PORT_CTRL_SWRST); } static int mv3310_reset(struct phy_device *phydev, u32 unit) From 22cad1585c6bc6caf2688701004cf2af6865cbe0 Mon Sep 17 00:00:00 2001 From: Pavel Begunkov Date: Wed, 15 Apr 2020 00:39:48 +0300 Subject: [PATCH 215/331] io_uring: fix cached_sq_head in io_timeout() io_timeout() can be executed asynchronously by a worker and without holding ctx->uring_lock 1. using ctx->cached_sq_head there is racy there 2. it should count events from a moment of timeout's submission, but not execution Use req->sequence. 
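The ordering arithmetic that this change keeps (now based on the request's own sequence instead of ctx->cached_sq_head) can be sketched in user space as follows. This is only an illustration of the wraparound handling visible in the hunks of this patch, not the kernel function; demo_new_timeout_sorts_after() and its parameter names are hypothetical, and nxt_seq stands for the queued request's sequence at submission, i.e. nxt->sequence - nxt_offset.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: decide whether a new timeout (submitted at sequence
 * 'seq', waiting for 'count' completions) expires later than an already
 * queued one (submitted at 'nxt_seq', waiting for 'nxt_count').
 *
 * The 32-bit sums can wrap, so they are widened to long long first, as the
 * comment in the patch describes.
 */
bool demo_new_timeout_sorts_after(uint32_t seq, uint32_t count,
				  uint32_t nxt_seq, uint32_t nxt_count)
{
	long long tmp = (long long)seq + count;
	long long tmp_nxt = (long long)nxt_seq + nxt_count;

	/*
	 * If the new request's sequence is numerically smaller, assume the
	 * 32-bit counter wrapped once between the two submissions. UINT_MAX
	 * is used here only because the patch uses it.
	 */
	if (seq < nxt_seq)
		tmp += UINT_MAX;

	return tmp > tmp_nxt;
}
```

With seq = 10, count = 5 against nxt_seq = 8, nxt_count = 3, the new timeout targets 15 and the queued one 11, so the new one sorts after it.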
Signed-off-by: Pavel Begunkov Signed-off-by: Jens Axboe --- fs/io_uring.c | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/fs/io_uring.c b/fs/io_uring.c index 32cbace58256..9325ac618cf0 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -4714,6 +4714,7 @@ static int io_timeout(struct io_kiocb *req) struct io_timeout_data *data; struct list_head *entry; unsigned span = 0; + u32 seq = req->sequence; data = &req->io->timeout; @@ -4730,7 +4731,7 @@ static int io_timeout(struct io_kiocb *req) goto add; } - req->sequence = ctx->cached_sq_head + count - 1; + req->sequence = seq + count; data->seq_offset = count; /* @@ -4740,7 +4741,7 @@ static int io_timeout(struct io_kiocb *req) spin_lock_irq(&ctx->completion_lock); list_for_each_prev(entry, &ctx->timeout_list) { struct io_kiocb *nxt = list_entry(entry, struct io_kiocb, list); - unsigned nxt_sq_head; + unsigned nxt_seq; long long tmp, tmp_nxt; u32 nxt_offset = nxt->io->timeout.seq_offset; @@ -4748,18 +4749,18 @@ static int io_timeout(struct io_kiocb *req) continue; /* - * Since cached_sq_head + count - 1 can overflow, use type long + * Since seq + count can overflow, use type long * long to store it. */ - tmp = (long long)ctx->cached_sq_head + count - 1; - nxt_sq_head = nxt->sequence - nxt_offset + 1; - tmp_nxt = (long long)nxt_sq_head + nxt_offset - 1; + tmp = (long long)seq + count; + nxt_seq = nxt->sequence - nxt_offset; + tmp_nxt = (long long)nxt_seq + nxt_offset; /* * cached_sq_head may overflow, and it will never overflow twice * once there is some timeout req still be valid. */ - if (ctx->cached_sq_head < nxt_sq_head) + if (seq < nxt_seq) tmp += UINT_MAX; if (tmp > tmp_nxt) From b55ce732004989c85bf9d858c03e6d477cf9023b Mon Sep 17 00:00:00 2001 From: Pavel Begunkov Date: Wed, 15 Apr 2020 00:39:49 +0300 Subject: [PATCH 216/331] io_uring: kill already cached timeout.seq_offset req->timeout.count and req->io->timeout.seq_offset store the same value, which is sqe->off. 
Kill the second one Signed-off-by: Pavel Begunkov Signed-off-by: Jens Axboe --- fs/io_uring.c | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/fs/io_uring.c b/fs/io_uring.c index 9325ac618cf0..3fc33ba4855d 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -357,7 +357,6 @@ struct io_timeout_data { struct hrtimer timer; struct timespec64 ts; enum hrtimer_mode mode; - u32 seq_offset; }; struct io_accept { @@ -385,7 +384,7 @@ struct io_timeout { struct file *file; u64 addr; int flags; - unsigned count; + u32 count; }; struct io_rw { @@ -4709,11 +4708,11 @@ static int io_timeout_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe, static int io_timeout(struct io_kiocb *req) { - unsigned count; struct io_ring_ctx *ctx = req->ctx; struct io_timeout_data *data; struct list_head *entry; unsigned span = 0; + u32 count = req->timeout.count; u32 seq = req->sequence; data = &req->io->timeout; @@ -4723,7 +4722,6 @@ static int io_timeout(struct io_kiocb *req) * timeout event to be satisfied. If it isn't set, then this is * a pure timeout request, sequence isn't used. 
*/ - count = req->timeout.count; if (!count) { req->flags |= REQ_F_TIMEOUT_NOSEQ; spin_lock_irq(&ctx->completion_lock); @@ -4732,7 +4730,6 @@ static int io_timeout(struct io_kiocb *req) } req->sequence = seq + count; - data->seq_offset = count; /* * Insertion sort, ensuring the first entry in the list is always @@ -4743,7 +4740,7 @@ static int io_timeout(struct io_kiocb *req) struct io_kiocb *nxt = list_entry(entry, struct io_kiocb, list); unsigned nxt_seq; long long tmp, tmp_nxt; - u32 nxt_offset = nxt->io->timeout.seq_offset; + u32 nxt_offset = nxt->timeout.count; if (nxt->flags & REQ_F_TIMEOUT_NOSEQ) continue; From 31af27c7cc9f675d93a135dca99e6413f9096f1d Mon Sep 17 00:00:00 2001 From: Pavel Begunkov Date: Wed, 15 Apr 2020 00:39:50 +0300 Subject: [PATCH 217/331] io_uring: don't count rqs failed after current one When checking for draining with __req_need_defer(), it tries to match how many requests were sent before the current one with the number already completed. Dropped SQEs are included in req->sequence, and they won't ever appear in CQ. To compensate for that, __req_need_defer() subtracts ctx->cached_sq_dropped. However, what it should really use is the number of SQEs dropped __before__ the current one. In other words, any submitted request shouldn't affect dequeueing from the drain queue of previously submitted ones. Instead of saving the proper ctx->cached_sq_dropped in each request, subtract it from req->sequence at initialisation, so it includes only the number of properly submitted requests. Note: it also changes the behaviour of timeouts, but 1. it already diverges from the description because of using SQ 2.
the description is ambiguous regarding dropped SQEs Signed-off-by: Pavel Begunkov Signed-off-by: Jens Axboe --- fs/io_uring.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/fs/io_uring.c b/fs/io_uring.c index 3fc33ba4855d..381d50becd04 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -957,8 +957,8 @@ static inline bool __req_need_defer(struct io_kiocb *req) { struct io_ring_ctx *ctx = req->ctx; - return req->sequence != ctx->cached_cq_tail + ctx->cached_sq_dropped - + atomic_read(&ctx->cached_cq_overflow); + return req->sequence != ctx->cached_cq_tail + + atomic_read(&ctx->cached_cq_overflow); } static inline bool req_need_defer(struct io_kiocb *req) @@ -5801,7 +5801,7 @@ static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req, * it can be used to mark the position of the first IO in the * link list. */ - req->sequence = ctx->cached_sq_head; + req->sequence = ctx->cached_sq_head - ctx->cached_sq_dropped; req->opcode = READ_ONCE(sqe->opcode); req->user_data = READ_ONCE(sqe->user_data); req->io = NULL; From 0bbe7f719985efd9adb3454679ecef0984cb6800 Mon Sep 17 00:00:00 2001 From: Xiao Yang Date: Tue, 14 Apr 2020 09:51:45 +0800 Subject: [PATCH 218/331] tracing: Fix the race between registering 'snapshot' event trigger and triggering 'snapshot' operation Traced event can trigger 'snapshot' operation(i.e. calls snapshot_trigger() or snapshot_count_trigger()) when register_snapshot_trigger() has completed registration but doesn't allocate buffer for 'snapshot' event trigger. In the rare case, 'snapshot' operation always detects the lack of allocated buffer so make register_snapshot_trigger() allocate buffer first. trigger-snapshot.tc in kselftest reproduces the issue on slow vm: ----------------------------------------------------------- cat trace ... ftracetest-3028 [002] .... 236.784290: sched_process_fork: comm=ftracetest pid=3028 child_comm=ftracetest child_pid=3036 <...>-2875 [003] .... 
240.460335: tracing_snapshot_instance_cond: *** SNAPSHOT NOT ALLOCATED *** <...>-2875 [003] .... 240.460338: tracing_snapshot_instance_cond: *** stopping trace here! *** ----------------------------------------------------------- Link: http://lkml.kernel.org/r/20200414015145.66236-1-yangx.jy@cn.fujitsu.com Cc: stable@vger.kernel.org Fixes: 93e31ffbf417a ("tracing: Add 'snapshot' event trigger command") Signed-off-by: Xiao Yang Signed-off-by: Steven Rostedt (VMware) --- kernel/trace/trace_events_trigger.c | 10 +++------- 1 file changed, 3 insertions(+), 7 deletions(-) diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c index dd34a1b46a86..3a74736da363 100644 --- a/kernel/trace/trace_events_trigger.c +++ b/kernel/trace/trace_events_trigger.c @@ -1088,14 +1088,10 @@ register_snapshot_trigger(char *glob, struct event_trigger_ops *ops, struct event_trigger_data *data, struct trace_event_file *file) { - int ret = register_trigger(glob, ops, data, file); + if (tracing_alloc_snapshot_instance(file->tr) != 0) + return 0; - if (ret > 0 && tracing_alloc_snapshot_instance(file->tr) != 0) { - unregister_trigger(glob, ops, data, file); - ret = 0; - } - - return ret; + return register_trigger(glob, ops, data, file); } static int From 52e04b4ce5d03775b6a78f3ed1097480faacc9fd Mon Sep 17 00:00:00 2001 From: Sumit Garg Date: Tue, 7 Apr 2020 15:40:55 +0530 Subject: [PATCH 219/331] mac80211: fix race in ieee80211_register_hw() A race condition leading to a kernel crash is observed during invocation of ieee80211_register_hw() on a dragonboard410c device having wcn36xx driver built as a loadable module along with a wifi manager in user-space waiting for a wifi device (wlanX) to be active. 
Sequence diagram for a particular kernel crash scenario: user-space ieee80211_register_hw() ieee80211_tasklet_handler() ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ | | | |<---phy0----wiphy_register() | |-----iwd if_add---->| | | |<---IRQ----(RX packet) | Kernel crash | | due to unallocated | | workqueue. | | | | | alloc_ordered_workqueue() | | | | | Misc wiphy init. | | | | | ieee80211_if_add() | | | | As evident from above sequence diagram, this race condition isn't specific to a particular wifi driver but rather the initialization sequence in ieee80211_register_hw() needs to be fixed. So re-order the initialization sequence and the updated sequence diagram would look like: user-space ieee80211_register_hw() ieee80211_tasklet_handler() ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ | | | | alloc_ordered_workqueue() | | | | | Misc wiphy init. | | | | |<---phy0----wiphy_register() | |-----iwd if_add---->| | | |<---IRQ----(RX packet) | | | | ieee80211_if_add() | | | | Cc: stable@vger.kernel.org Signed-off-by: Sumit Garg Link: https://lore.kernel.org/r/1586254255-28713-1-git-send-email-sumit.garg@linaro.org [Johannes: fix rtnl imbalances] Signed-off-by: Johannes Berg --- net/mac80211/main.c | 24 +++++++++++++----------- 1 file changed, 13 insertions(+), 11 deletions(-) diff --git a/net/mac80211/main.c b/net/mac80211/main.c index 8345926193de..0e9ad60fb2b3 100644 --- a/net/mac80211/main.c +++ b/net/mac80211/main.c @@ -1069,7 +1069,7 @@ int ieee80211_register_hw(struct ieee80211_hw *hw) local->hw.wiphy->signal_type = CFG80211_SIGNAL_TYPE_UNSPEC; if (hw->max_signal <= 0) { result = -EINVAL; - goto fail_wiphy_register; + goto fail_workqueue; } } @@ -1135,7 +1135,7 @@ int ieee80211_register_hw(struct ieee80211_hw *hw) result = ieee80211_init_cipher_suites(local); if (result < 0) - goto fail_wiphy_register; + goto fail_workqueue; if (!local->ops->remain_on_channel) local->hw.wiphy->max_remain_on_channel_duration = 5000; @@ -1161,10 
+1161,6 @@ int ieee80211_register_hw(struct ieee80211_hw *hw) local->hw.wiphy->max_num_csa_counters = IEEE80211_MAX_CSA_COUNTERS_NUM; - result = wiphy_register(local->hw.wiphy); - if (result < 0) - goto fail_wiphy_register; - /* * We use the number of queues for feature tests (QoS, HT) internally * so restrict them appropriately. @@ -1217,9 +1213,9 @@ int ieee80211_register_hw(struct ieee80211_hw *hw) goto fail_flows; rtnl_lock(); - result = ieee80211_init_rate_ctrl_alg(local, hw->rate_control_algorithm); + rtnl_unlock(); if (result < 0) { wiphy_debug(local->hw.wiphy, "Failed to initialize rate control algorithm\n"); @@ -1273,6 +1269,12 @@ int ieee80211_register_hw(struct ieee80211_hw *hw) local->sband_allocated |= BIT(band); } + result = wiphy_register(local->hw.wiphy); + if (result < 0) + goto fail_wiphy_register; + + rtnl_lock(); + /* add one default STA interface if supported */ if (local->hw.wiphy->interface_modes & BIT(NL80211_IFTYPE_STATION) && !ieee80211_hw_check(hw, NO_AUTO_VIF)) { @@ -1312,17 +1314,17 @@ int ieee80211_register_hw(struct ieee80211_hw *hw) #if defined(CONFIG_INET) || defined(CONFIG_IPV6) fail_ifa: #endif + wiphy_unregister(local->hw.wiphy); + fail_wiphy_register: rtnl_lock(); rate_control_deinitialize(local); ieee80211_remove_interfaces(local); - fail_rate: rtnl_unlock(); + fail_rate: fail_flows: ieee80211_led_exit(local); destroy_workqueue(local->workqueue); fail_workqueue: - wiphy_unregister(local->hw.wiphy); - fail_wiphy_register: if (local->wiphy_ciphers_allocated) kfree(local->hw.wiphy->cipher_suites); kfree(local->int_scan_req); @@ -1372,8 +1374,8 @@ void ieee80211_unregister_hw(struct ieee80211_hw *hw) skb_queue_purge(&local->skb_queue_unreliable); skb_queue_purge(&local->skb_queue_tdls_chsw); - destroy_workqueue(local->workqueue); wiphy_unregister(local->hw.wiphy); + destroy_workqueue(local->workqueue); ieee80211_led_exit(local); kfree(local->int_scan_req); } From 93e2d04a1888668183f3fb48666e90b9b31d29e6 Mon Sep 17 00:00:00 2001 
From: Tamizh chelvam Date: Sat, 28 Mar 2020 19:23:24 +0530 Subject: [PATCH 220/331] mac80211: fix channel switch trigger from unknown mesh peer Previously, a mesh channel switch happened whenever a beacon contained a CSA IE, without checking the mesh peer info. As a result, a channel switch happened even if the beacon was not from the interface's own mesh peer. Fix that by checking whether the CSA originated from the same mesh network before proceeding with the channel switch. Signed-off-by: Tamizh chelvam Link: https://lore.kernel.org/r/1585403604-29274-1-git-send-email-tamizhr@codeaurora.org Signed-off-by: Johannes Berg --- net/mac80211/mesh.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/net/mac80211/mesh.c b/net/mac80211/mesh.c index d09b3c789314..36978a0e5000 100644 --- a/net/mac80211/mesh.c +++ b/net/mac80211/mesh.c @@ -1257,15 +1257,15 @@ static void ieee80211_mesh_rx_bcn_presp(struct ieee80211_sub_if_data *sdata, sdata->u.mesh.mshcfg.rssi_threshold < rx_status->signal) mesh_neighbour_update(sdata, mgmt->sa, &elems, rx_status); + + if (ifmsh->csa_role != IEEE80211_MESH_CSA_ROLE_INIT && + !sdata->vif.csa_active) + ieee80211_mesh_process_chnswitch(sdata, &elems, true); } if (ifmsh->sync_ops) ifmsh->sync_ops->rx_bcn_presp(sdata, stype, mgmt, &elems, rx_status); - - if (ifmsh->csa_role != IEEE80211_MESH_CSA_ROLE_INIT && - !sdata->vif.csa_active) - ieee80211_mesh_process_chnswitch(sdata, &elems, true); } int ieee80211_mesh_finish_csa(struct ieee80211_sub_if_data *sdata) @@ -1373,6 +1373,9 @@ static void mesh_rx_csa_frame(struct ieee80211_sub_if_data *sdata, ieee802_11_parse_elems(pos, len - baselen, true, &elems, mgmt->bssid, NULL); + if (!mesh_matches_local(sdata, &elems)) + return; + ifmsh->chsw_ttl = elems.mesh_chansw_params_ie->mesh_ttl; if (!--ifmsh->chsw_ttl) fwd_csa = false; From e82a118f57b89bbb437ce70780fc2678d5c281e5 Mon Sep 17 00:00:00 2001 From: Eugene Syromiatnikov Date: Sun, 12 Apr 2020 22:25:33 +0200 Subject: [PATCH 221/331] clone3: fix cgroup argument
sanity check Checking that the cgroup field of struct clone_args is less than 0 is useless, as it is defined as an unsigned 64-bit integer. Moreover, it doesn't catch the situations where its higher bits are lost during the assignment to the cgroup field of the internal struct kernel_clone_args (where it is declared as a signed 32-bit integer), so it is still possible to pass garbage there. A check against INT_MAX solves both these issues. Fixes: ef2c41cf38a7559b ("clone3: allow spawning processes into cgroups") Signed-off-by: Eugene Syromiatnikov Acked-by: Christian Brauner Link: https://lore.kernel.org/r/20200412202533.GA29554@asgard.redhat.com Signed-off-by: Christian Brauner --- kernel/fork.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/fork.c b/kernel/fork.c index 4385f3d639f2..b4f7775623c8 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -2631,7 +2631,7 @@ noinline static int copy_clone_args_from_user(struct kernel_clone_args *kargs, !valid_signal(args.exit_signal))) return -EINVAL; - if ((args.flags & CLONE_INTO_CGROUP) && args.cgroup < 0) + if ((args.flags & CLONE_INTO_CGROUP) && args.cgroup > INT_MAX) return -EINVAL; *kargs = (struct kernel_clone_args){ From 62173872ca65767c586217dec0a32485da8a2f07 Mon Sep 17 00:00:00 2001 From: Eugene Syromiatnikov Date: Sun, 12 Apr 2020 22:31:23 +0200 Subject: [PATCH 222/331] clone3: add a check for the user struct size if CLONE_INTO_CGROUP is set Passing CLONE_INTO_CGROUP with an under-sized structure (that doesn't properly contain cgroup field) seems like garbage input, especially considering the fact that fd 0 is a valid descriptor.
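As a rough user-space illustration of the two rejections described here and in the previous patch (a value that does not fit a signed 32-bit integer, and a structure too small to carry the cgroup field), consider the sketch below. Everything prefixed demo_ or DEMO_ is a hypothetical stand-in: the struct holds only the two fields needed, and the flag and size constants are illustrative values rather than the kernel's definitions.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-ins; these values are assumptions, not kernel headers. */
#define DEMO_CLONE_INTO_CGROUP		0x200000000ULL
#define DEMO_CLONE_ARGS_SIZE_VER2	88

/* Hypothetical, trimmed-down view of the user-supplied arguments. */
struct demo_clone_args {
	uint64_t flags;
	uint64_t cgroup;	/* unsigned, so a "< 0" test can never fire */
};

/*
 * Reject a cgroup fd that would be truncated when stored in a signed 32-bit
 * field, and reject structures too short to contain the cgroup field at all.
 * Without the size check, a short structure would leave cgroup as 0, which
 * is a valid file descriptor.
 */
int demo_check_cgroup_arg(const struct demo_clone_args *args, size_t usize)
{
	if ((args->flags & DEMO_CLONE_INTO_CGROUP) &&
	    (args->cgroup > INT_MAX || usize < DEMO_CLONE_ARGS_SIZE_VER2))
		return -EINVAL;

	return 0;
}
```

A caller that sets the flag but passes a structure of only 64 bytes gets -EINVAL from this sketch, as does one passing a cgroup value with bits set above bit 30.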
Signed-off-by: Eugene Syromiatnikov Acked-by: Christian Brauner Link: https://lore.kernel.org/r/20200412203123.GA5869@asgard.redhat.com Signed-off-by: Christian Brauner --- kernel/fork.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/kernel/fork.c b/kernel/fork.c index b4f7775623c8..3ab7cf88e455 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -2631,7 +2631,8 @@ noinline static int copy_clone_args_from_user(struct kernel_clone_args *kargs, !valid_signal(args.exit_signal))) return -EINVAL; - if ((args.flags & CLONE_INTO_CGROUP) && args.cgroup > INT_MAX) + if ((args.flags & CLONE_INTO_CGROUP) && + (args.cgroup > INT_MAX || usize < CLONE_ARGS_SIZE_VER2)) return -EINVAL; *kargs = (struct kernel_clone_args){ From a966dcfe153ab0a3d8d79cd971a079411a489be7 Mon Sep 17 00:00:00 2001 From: Eugene Syromiatnikov Date: Sun, 12 Apr 2020 22:26:58 +0200 Subject: [PATCH 223/331] clone3: add build-time CLONE_ARGS_SIZE_VER* validity checks CLONE_ARGS_SIZE_VER* macros are defined explicitly and not via the offsets of the relevant struct clone_args fields, which makes it rather error-prone, so it probably makes sense to add some compile-time checks for them (including the one that breaks on struct clone_args extension as a reminder to add a relevant size macro and a similar check). Function copy_clone_args_from_user seems to be a good place for such checks. 
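As an illustration only, the same kind of compile-time check can be sketched in userspace C11 with static_assert in place of BUILD_BUG_ON. The struct, the size macros and the SKETCH_OFFSETOFEND helper below are hypothetical stand-ins written for this example; the field names follow struct clone_args, but the layout is an assumption.

```c
#include <assert.h>	/* static_assert (C11) */
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for an offsetofend()-style helper. */
#define SKETCH_OFFSETOFEND(type, member) \
	(offsetof(type, member) + sizeof(((type *)0)->member))

/* Hypothetical versioned argument struct, for illustration only. */
struct sketch_args {
	uint64_t flags, pidfd, child_tid, parent_tid;
	uint64_t exit_signal, stack, stack_size, tls;	/* version 0 ends here */
	uint64_t set_tid, set_tid_size;			/* version 1 ends here */
	uint64_t cgroup;				/* version 2 ends here */
};

/* Size macros defined by hand, as in the situation the patch describes. */
#define SKETCH_SIZE_VER0 64
#define SKETCH_SIZE_VER1 80
#define SKETCH_SIZE_VER2 88

static_assert(SKETCH_OFFSETOFEND(struct sketch_args, tls) == SKETCH_SIZE_VER0,
	      "VER0 size macro out of sync with struct layout");
static_assert(SKETCH_OFFSETOFEND(struct sketch_args, set_tid_size) == SKETCH_SIZE_VER1,
	      "VER1 size macro out of sync with struct layout");
static_assert(SKETCH_OFFSETOFEND(struct sketch_args, cgroup) == SKETCH_SIZE_VER2,
	      "VER2 size macro out of sync with struct layout");
/* Breaks when the struct is extended, as a reminder to add a new macro. */
static_assert(sizeof(struct sketch_args) == SKETCH_SIZE_VER2,
	      "struct grew: add a new size macro and a matching check");
```

Appending a field to the sketch struct without adding a new size macro makes the last assertion fail at build time, which is the reminder effect the patch is after.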
Signed-off-by: Eugene Syromiatnikov Acked-by: Christian Brauner Link: https://lore.kernel.org/r/20200412202658.GA31499@asgard.redhat.com Signed-off-by: Christian Brauner --- kernel/fork.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/kernel/fork.c b/kernel/fork.c index 3ab7cf88e455..8c700f881d92 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -2605,6 +2605,14 @@ noinline static int copy_clone_args_from_user(struct kernel_clone_args *kargs, struct clone_args args; pid_t *kset_tid = kargs->set_tid; + BUILD_BUG_ON(offsetofend(struct clone_args, tls) != + CLONE_ARGS_SIZE_VER0); + BUILD_BUG_ON(offsetofend(struct clone_args, set_tid_size) != + CLONE_ARGS_SIZE_VER1); + BUILD_BUG_ON(offsetofend(struct clone_args, cgroup) != + CLONE_ARGS_SIZE_VER2); + BUILD_BUG_ON(sizeof(struct clone_args) != CLONE_ARGS_SIZE_VER2); + if (unlikely(usize > PAGE_SIZE)) return -E2BIG; if (unlikely(usize < CLONE_ARGS_SIZE_VER0)) From 3662daf023500dc084fa3b96f68a6f46179ddc73 Mon Sep 17 00:00:00 2001 From: Peter Xu Date: Fri, 3 Apr 2020 18:35:17 -0400 Subject: [PATCH 224/331] sched/isolation: Allow "isolcpus=" to skip unknown sub-parameters The "isolcpus=" parameter allows sub-parameters before the cpulist is specified, and if the parser detects an unknown sub-parameters the whole parameter will be ignored. This design is incompatible with itself when new sub-parameters are added. An older kernel will not recognize the new sub-parameter and will invalidate the whole parameter so the CPU isolation will not take effect. It emits a warning: isolcpus: Error, unknown flag The better and compatible way is to allow "isolcpus=" to skip unknown sub-parameters, so that even if new sub-parameters are added an older kernel will still be able to behave as usual even if with the new sub-parameter specified on the command line. Ideally this should have been there when the first sub-parameter for "isolcpus=" was introduced. 
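As an illustration only, the skipping behaviour described above can be sketched as a userspace C function. The function name, the flag bit values and the reduced set of known flags ("nohz" and "domain") are assumptions for this example. Unlike the kernel loop, the sketch only steps over the separator when one is present.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical userspace sketch: consume leading "flag," sub-parameters and
 * return a pointer to what follows (the CPU list), or NULL if an unknown
 * flag contains a character other than a letter or '_'.
 */
static const char *sketch_skip_flags(const char *str, unsigned int *known)
{
	*known = 0;

	while (isalpha((unsigned char)*str)) {
		const char *par = str;
		int len = 0, illegal = 0;

		if (!strncmp(str, "nohz,", 5)) {
			str += 5;
			*known |= 1u;
			continue;
		}
		if (!strncmp(str, "domain,", 7)) {
			str += 7;
			*known |= 2u;
			continue;
		}

		/* Unknown sub-parameter: validate its characters, then skip it. */
		for (; *str && *str != ','; str++, len++) {
			if (!isalpha((unsigned char)*str) && *str != '_')
				illegal = 1;
		}
		if (illegal) {
			fprintf(stderr, "invalid flag %.*s\n", len, par);
			return NULL;
		}
		fprintf(stderr, "skipped unknown flag %.*s\n", len, par);
		if (*str == ',')
			str++;
	}
	return str;
}
```

Given "nohz,future_flag,1-3", the sketch records the known flag, skips the unknown one with a message, and returns a pointer to "1-3".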
Suggested-by: Thomas Gleixner Signed-off-by: Peter Xu Signed-off-by: Thomas Gleixner Link: https://lkml.kernel.org/r/20200403223517.406353-1-peterx@redhat.com --- kernel/sched/isolation.c | 21 +++++++++++++++++++-- 1 file changed, 19 insertions(+), 2 deletions(-) diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c index 008d6ac2342b..808244f3ddd9 100644 --- a/kernel/sched/isolation.c +++ b/kernel/sched/isolation.c @@ -149,6 +149,9 @@ __setup("nohz_full=", housekeeping_nohz_full_setup); static int __init housekeeping_isolcpus_setup(char *str) { unsigned int flags = 0; + bool illegal = false; + char *par; + int len; while (isalpha(*str)) { if (!strncmp(str, "nohz,", 5)) { @@ -169,8 +172,22 @@ static int __init housekeeping_isolcpus_setup(char *str) continue; } - pr_warn("isolcpus: Error, unknown flag\n"); - return 0; + /* + * Skip unknown sub-parameter and validate that it is not + * containing an invalid character. + */ + for (par = str, len = 0; *str && *str != ','; str++, len++) { + if (!isalpha(*str) && *str != '_') + illegal = true; + } + + if (illegal) { + pr_warn("isolcpus: Invalid flag %.*s\n", len, par); + return 0; + } + + pr_info("isolcpus: Skipped unknown flag %.*s\n", len, par); + str++; } /* Default behaviour for isolcpus without flags */ From e0d648f9d883ec1efab261af158d73aa30e9dd12 Mon Sep 17 00:00:00 2001 From: Borislav Petkov Date: Fri, 27 Mar 2020 22:43:34 +0100 Subject: [PATCH 225/331] sched/vtime: Work around an unitialized variable warning MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Work around this warning: kernel/sched/cputime.c: In function ‘kcpustat_field’: kernel/sched/cputime.c:1007:6: warning: ‘val’ may be used uninitialized in this function [-Wmaybe-uninitialized] because GCC can't see that val is used only when err is 0. 
Acked-by: Peter Zijlstra Signed-off-by: Borislav Petkov Signed-off-by: Ingo Molnar Link: https://lore.kernel.org/r/20200327214334.GF8015@zn.tnic --- kernel/sched/cputime.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c index dac9104d126f..ff9435dee1df 100644 --- a/kernel/sched/cputime.c +++ b/kernel/sched/cputime.c @@ -1003,12 +1003,12 @@ u64 kcpustat_field(struct kernel_cpustat *kcpustat, enum cpu_usage_stat usage, int cpu) { u64 *cpustat = kcpustat->cpustat; + u64 val = cpustat[usage]; struct rq *rq; - u64 val; int err; if (!vtime_accounting_enabled_cpu(cpu)) - return cpustat[usage]; + return val; rq = cpu_rq(cpu); From b0e387c3ec0170b429f15c53b6183fe1c691403b Mon Sep 17 00:00:00 2001 From: Jason Yan Date: Mon, 13 Apr 2020 16:22:13 +0800 Subject: [PATCH 226/331] x86/umip: Make umip_insns static Fix the following sparse warning: arch/x86/kernel/umip.c:84:12: warning: symbol 'umip_insns' was not declared. Should it be static? 
Reported-by: Hulk Robot Signed-off-by: Jason Yan Signed-off-by: Thomas Gleixner Acked-by: Ricardo Neri Link: https://lkml.kernel.org/r/20200413082213.22934-1-yanaijie@huawei.com --- arch/x86/kernel/umip.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/kernel/umip.c b/arch/x86/kernel/umip.c index 4d732a444711..8d5cbe1bbb3b 100644 --- a/arch/x86/kernel/umip.c +++ b/arch/x86/kernel/umip.c @@ -81,7 +81,7 @@ #define UMIP_INST_SLDT 3 /* 0F 00 /0 */ #define UMIP_INST_STR 4 /* 0F 00 /1 */ -const char * const umip_insns[5] = { +static const char * const umip_insns[5] = { [UMIP_INST_SGDT] = "SGDT", [UMIP_INST_SIDT] = "SIDT", [UMIP_INST_SMSW] = "SMSW", From d79294d0de12ddd1420110813626d691f440b86f Mon Sep 17 00:00:00 2001 From: Hans de Goede Date: Tue, 7 Apr 2020 20:11:16 +0200 Subject: [PATCH 227/331] i2c: designware: platdrv: Remove DPM_FLAG_SMART_SUSPEND flag on BYT and CHT We already set DPM_FLAG_SMART_PREPARE, so we completely skip all callbacks (other than prepare) where possible, quoting from dw_i2c_plat_prepare(): /* * If the ACPI companion device object is present for this device, it * may be accessed during suspend and resume of other devices via I2C * operation regions, so tell the PM core and middle layers to avoid * skipping system suspend/resume callbacks for it in that case. */ return !has_acpi_companion(dev); Also setting the DPM_FLAG_SMART_SUSPEND will cause acpi_subsys_suspend() to leave the controller runtime-suspended even if dw_i2c_plat_prepare() returned 0. Leaving the controller runtime-suspended normally, when the I2C controller is suspended during the suspend_late phase, is not an issue because the pm_runtime_get_sync() done by i2c_dw_xfer() will (runtime-)resume it. But for dw I2C controllers on Bay- and Cherry-Trail devices acpi_lpss.c leaves the controller alive until the suspend_noirq phase, because it may be used by the _PS3 ACPI methods of PCI devices and PCI devices are left powered on until the suspend_noirq phase.
Between the suspend_late and resume_early phases runtime-pm is disabled. So for any ACPI I2C OPRegion accesses done after the suspend_late phase, the pm_runtime_get_sync() done by i2c_dw_xfer() is a no-op and the controller is left runtime-suspended. i2c_dw_xfer() has a check to catch this condition (rather than waiting for the I2C transfer to time out because the controller is suspended). acpi_subsys_suspend() leaving the controller runtime-suspended in combination with an ACPI I2C OPRegion access done after the suspend_late phase triggers this check, leading to the following error being logged on a Bay Trail based Lenovo Thinkpad 8 tablet: [ 93.275882] i2c_designware 80860F41:00: Transfer while suspended [ 93.275993] WARNING: CPU: 0 PID: 412 at drivers/i2c/busses/i2c-designware-master.c:429 i2c_dw_xfer+0x239/0x280 ... [ 93.276252] Workqueue: kacpi_notify acpi_os_execute_deferred [ 93.276267] RIP: 0010:i2c_dw_xfer+0x239/0x280 ... [ 93.276340] Call Trace: [ 93.276366] __i2c_transfer+0x121/0x520 [ 93.276379] i2c_transfer+0x4c/0x100 [ 93.276392] i2c_acpi_space_handler+0x219/0x510 [ 93.276408] ? up+0x40/0x60 [ 93.276419] ? i2c_acpi_notify+0x130/0x130 [ 93.276433] acpi_ev_address_space_dispatch+0x1e1/0x252 ... So since on BYT and CHT platforms we want ACPI I2C OPRegion accesses to work until the suspend_noirq phase, we need the controller to be runtime-resumed during the suspend phase if it is runtime-suspended at that time. This means that we must not set the DPM_FLAG_SMART_SUSPEND on these platforms. On BYT and CHT we already have a special ACCESS_NO_IRQ_SUSPEND flag to make sure the controller stays functional until the suspend_noirq phase. This commit makes the driver not set the DPM_FLAG_SMART_SUSPEND flag when that flag is set. Cc: stable@vger.kernel.org Fixes: b30f2f65568f ("i2c: designware: Set IRQF_NO_SUSPEND flag for all BYT and CHT controllers") Signed-off-by: Hans de Goede Reviewed-by: Andy Shevchenko Acked-by: Rafael J.
Wysocki Acked-by: Jarkko Nikula Signed-off-by: Wolfram Sang --- drivers/i2c/busses/i2c-designware-platdrv.c | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/drivers/i2c/busses/i2c-designware-platdrv.c b/drivers/i2c/busses/i2c-designware-platdrv.c index c98befe2a92e..5536673060cc 100644 --- a/drivers/i2c/busses/i2c-designware-platdrv.c +++ b/drivers/i2c/busses/i2c-designware-platdrv.c @@ -354,10 +354,16 @@ static int dw_i2c_plat_probe(struct platform_device *pdev) adap->dev.of_node = pdev->dev.of_node; adap->nr = -1; - dev_pm_set_driver_flags(&pdev->dev, - DPM_FLAG_SMART_PREPARE | - DPM_FLAG_SMART_SUSPEND | - DPM_FLAG_LEAVE_SUSPENDED); + if (dev->flags & ACCESS_NO_IRQ_SUSPEND) { + dev_pm_set_driver_flags(&pdev->dev, + DPM_FLAG_SMART_PREPARE | + DPM_FLAG_LEAVE_SUSPENDED); + } else { + dev_pm_set_driver_flags(&pdev->dev, + DPM_FLAG_SMART_PREPARE | + DPM_FLAG_SMART_SUSPEND | + DPM_FLAG_LEAVE_SUSPENDED); + } /* The code below assumes runtime PM to be disabled. */ WARN_ON(pm_runtime_enabled(&pdev->dev)); From edb2c9dd3948738ef030c32b948543e84f4d3f81 Mon Sep 17 00:00:00 2001 From: Wolfram Sang Date: Fri, 27 Mar 2020 23:28:26 +0100 Subject: [PATCH 228/331] i2c: altera: use proper variable to hold errno device_property_read_u32() returns errno or 0, so we should use the integer variable 'ret' and not the u32 'val' to hold the retval. 
Fixes: 0560ad576268 ("i2c: altera: Add Altera I2C Controller driver") Signed-off-by: Wolfram Sang Reviewed-by: Thor Thayer Signed-off-by: Wolfram Sang --- drivers/i2c/busses/i2c-altera.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/drivers/i2c/busses/i2c-altera.c b/drivers/i2c/busses/i2c-altera.c index 20ef63820c77..f5c00f903df3 100644 --- a/drivers/i2c/busses/i2c-altera.c +++ b/drivers/i2c/busses/i2c-altera.c @@ -384,7 +384,6 @@ static int altr_i2c_probe(struct platform_device *pdev) struct altr_i2c_dev *idev = NULL; struct resource *res; int irq, ret; - u32 val; idev = devm_kzalloc(&pdev->dev, sizeof(*idev), GFP_KERNEL); if (!idev) @@ -411,17 +410,17 @@ static int altr_i2c_probe(struct platform_device *pdev) init_completion(&idev->msg_complete); spin_lock_init(&idev->lock); - val = device_property_read_u32(idev->dev, "fifo-size", + ret = device_property_read_u32(idev->dev, "fifo-size", &idev->fifo_size); - if (val) { + if (ret) { dev_err(&pdev->dev, "FIFO size set to default of %d\n", ALTR_I2C_DFLT_FIFO_SZ); idev->fifo_size = ALTR_I2C_DFLT_FIFO_SZ; } - val = device_property_read_u32(idev->dev, "clock-frequency", + ret = device_property_read_u32(idev->dev, "clock-frequency", &idev->bus_clk_rate); - if (val) { + if (ret) { dev_err(&pdev->dev, "Default to 100kHz\n"); idev->bus_clk_rate = I2C_MAX_STANDARD_MODE_FREQ; /* default clock rate */ } From 3c1d1613be80c2e17f1ddf672df1d8a8caebfd0d Mon Sep 17 00:00:00 2001 From: Wolfram Sang Date: Mon, 6 Apr 2020 14:25:31 +0200 Subject: [PATCH 229/331] i2c: remove i2c_new_probed_device API All in-tree users have been converted to the new i2c_new_scanned_device function, so remove this deprecated one. 
Signed-off-by: Wolfram Sang Signed-off-by: Wolfram Sang --- drivers/i2c/i2c-core-base.c | 13 ------------- include/linux/i2c.h | 6 ------ 2 files changed, 19 deletions(-) diff --git a/drivers/i2c/i2c-core-base.c b/drivers/i2c/i2c-core-base.c index 5cc0b0ec5570..a66912782064 100644 --- a/drivers/i2c/i2c-core-base.c +++ b/drivers/i2c/i2c-core-base.c @@ -2273,19 +2273,6 @@ i2c_new_scanned_device(struct i2c_adapter *adap, } EXPORT_SYMBOL_GPL(i2c_new_scanned_device); -struct i2c_client * -i2c_new_probed_device(struct i2c_adapter *adap, - struct i2c_board_info *info, - unsigned short const *addr_list, - int (*probe)(struct i2c_adapter *adap, unsigned short addr)) -{ - struct i2c_client *client; - - client = i2c_new_scanned_device(adap, info, addr_list, probe); - return IS_ERR(client) ? NULL : client; -} -EXPORT_SYMBOL_GPL(i2c_new_probed_device); - struct i2c_adapter *i2c_get_adapter(int nr) { struct i2c_adapter *adapter; diff --git a/include/linux/i2c.h b/include/linux/i2c.h index 456fc17ecb1c..45d36ba4826b 100644 --- a/include/linux/i2c.h +++ b/include/linux/i2c.h @@ -461,12 +461,6 @@ i2c_new_scanned_device(struct i2c_adapter *adap, unsigned short const *addr_list, int (*probe)(struct i2c_adapter *adap, unsigned short addr)); -struct i2c_client * -i2c_new_probed_device(struct i2c_adapter *adap, - struct i2c_board_info *info, - unsigned short const *addr_list, - int (*probe)(struct i2c_adapter *adap, unsigned short addr)); - /* Common custom probe functions */ int i2c_probe_func_quick_read(struct i2c_adapter *adap, unsigned short addr); From 9cc3d0c6915aee5140f8335d41bbc3ff1b79aa4e Mon Sep 17 00:00:00 2001 From: Mark Rutland Date: Tue, 14 Apr 2020 11:42:48 +0100 Subject: [PATCH 230/331] arm64: vdso: don't free unallocated pages The aarch32_vdso_pages[] array never has entries allocated in the C_VVAR or C_VDSO slots, and as the array is zero initialized these contain NULL. 
However in __aarch32_alloc_vdso_pages() when aarch32_alloc_kuser_vdso_page() fails we attempt to free the page whose struct page is at NULL, which is obviously nonsensical. This patch removes the erroneous page freeing. Fixes: 7c1deeeb0130 ("arm64: compat: VDSO setup for compat layer") Cc: # 5.3.x- Cc: Vincenzo Frascino Acked-by: Will Deacon Signed-off-by: Mark Rutland Signed-off-by: Catalin Marinas --- arch/arm64/kernel/vdso.c | 13 +------------ 1 file changed, 1 insertion(+), 12 deletions(-) diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c index 354b11e27c07..033a48f30dbb 100644 --- a/arch/arm64/kernel/vdso.c +++ b/arch/arm64/kernel/vdso.c @@ -260,18 +260,7 @@ static int __aarch32_alloc_vdso_pages(void) if (ret) return ret; - ret = aarch32_alloc_kuser_vdso_page(); - if (ret) { - unsigned long c_vvar = - (unsigned long)page_to_virt(aarch32_vdso_pages[C_VVAR]); - unsigned long c_vdso = - (unsigned long)page_to_virt(aarch32_vdso_pages[C_VDSO]); - - free_page(c_vvar); - free_page(c_vdso); - } - - return ret; + return aarch32_alloc_kuser_vdso_page(); } #else static int __aarch32_alloc_vdso_pages(void) From 99e3a236dd43d06c65af0a2ef9cb44306aef6e02 Mon Sep 17 00:00:00 2001 From: Magnus Karlsson Date: Tue, 14 Apr 2020 09:35:15 +0200 Subject: [PATCH 231/331] xsk: Add missing check on user supplied headroom size Add a check that the headroom cannot be larger than the available space in the chunk. In the current code, a malicious user can set the headroom to a value larger than the chunk size minus the fixed XDP headroom. That way packets with a length larger than the supported size in the umem could get accepted and result in an out-of-bounds write. 
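As an illustration only, the arithmetic problem behind this fix can be sketched in userspace C. The function names and the SKETCH_PACKET_HEADROOM constant are hypothetical stand-ins for this example. The sketch assumes a 32-bit unsigned int and a chunk size of at least SKETCH_PACKET_HEADROOM.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: stands in for the fixed XDP headroom, for this sketch only. */
#define SKETCH_PACKET_HEADROOM 256u

/*
 * Old-style check: the subtraction is done in unsigned arithmetic and can
 * wrap around before the result is converted to a signed int.
 */
static bool sketch_old_check_rejects(uint32_t chunk_size, uint32_t headroom)
{
	int size_chk = (int)(chunk_size - headroom - SKETCH_PACKET_HEADROOM);

	return size_chk < 0;
}

/*
 * New-style check: compare the headroom against the space left in the
 * chunk, without ever subtracting the user-supplied value.
 */
static bool sketch_new_check_rejects(uint32_t chunk_size, uint32_t headroom)
{
	return headroom >= chunk_size - SKETCH_PACKET_HEADROOM;
}
```

With chunk_size 2048 and headroom 0xFFFFF000, the old expression wraps around to 5888, which is not negative, so the oversized headroom passes; the new comparison rejects it because 0xFFFFF000 >= 1792.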
Fixes: c0c77d8fb787 ("xsk: add user memory registration support sockopt") Reported-by: Bui Quang Minh Signed-off-by: Magnus Karlsson Signed-off-by: Daniel Borkmann Link: https://bugzilla.kernel.org/show_bug.cgi?id=207225 Link: https://lore.kernel.org/bpf/1586849715-23490-1-git-send-email-magnus.karlsson@intel.com --- net/xdp/xdp_umem.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c index fa7bb5e060d0..ed7a6060f73c 100644 --- a/net/xdp/xdp_umem.c +++ b/net/xdp/xdp_umem.c @@ -343,7 +343,7 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr) u32 chunk_size = mr->chunk_size, headroom = mr->headroom; unsigned int chunks, chunks_per_page; u64 addr = mr->addr, size = mr->len; - int size_chk, err; + int err; if (chunk_size < XDP_UMEM_MIN_CHUNK_SIZE || chunk_size > PAGE_SIZE) { /* Strictly speaking we could support this, if: @@ -382,8 +382,7 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr) return -EINVAL; } - size_chk = chunk_size - headroom - XDP_PACKET_HEADROOM; - if (size_chk < 0) + if (headroom >= chunk_size - XDP_PACKET_HEADROOM) return -EINVAL; umem->address = (unsigned long)addr; From 25498a1969bf3687c29c29bbac92821d7a0f8b4a Mon Sep 17 00:00:00 2001 From: Andrii Nakryiko Date: Tue, 14 Apr 2020 11:26:45 -0700 Subject: [PATCH 232/331] libbpf: Always specify expected_attach_type on program load if supported For some types of BPF programs that utilize expected_attach_type, libbpf won't set load_attr.expected_attach_type, even if expected_attach_type is known from section definition. This was done to preserve backwards compatibility with old kernels that didn't recognize expected_attach_type attribute yet (which was added in 5e43f899b03a ("bpf: Check attach type at prog load time")).
But this is problematic for some BPF programs that utilize newer features that require kernel to know specific expected_attach_type (e.g., extended set of return codes for cgroup_skb/egress programs). This patch makes libbpf specify expected_attach_type by default, but also detect support for this field in kernel and not set it during program load when the kernel does not support it. This provides good metadata for bpf_program (e.g., bpf_program__get_expected_attach_type()), while still working with old kernels (for cases where it can work at all). Additionally, due to expected_attach_type being always set for recognized program types, bpf_program__attach_cgroup doesn't have to do extra checks to determine correct attach type, so remove that additional logic. Also adjust section_names selftest to account for this change. More detailed discussion can be found in [0]. [0] https://lore.kernel.org/bpf/20200412003604.GA15986@rdna-mbp.dhcp.thefacebook.com/ Fixes: 5cf1e9145630 ("bpf: cgroup inet skb programs can return 0 to 3") Fixes: 5e43f899b03a ("bpf: Check attach type at prog load time") Reported-by: Andrey Ignatov Signed-off-by: Andrii Nakryiko Signed-off-by: Daniel Borkmann Acked-by: Song Liu Acked-by: Andrey Ignatov Link: https://lore.kernel.org/bpf/20200414182645.1368174-1-andriin@fb.com --- tools/lib/bpf/libbpf.c | 126 ++++++++++++------ .../selftests/bpf/prog_tests/section_names.c | 42 +++--- 2 files changed, 109 insertions(+), 59 deletions(-) diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c index ff9174282a8c..8f480e29a6b0 100644 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -178,6 +178,8 @@ struct bpf_capabilities { __u32 array_mmap:1; /* BTF_FUNC_GLOBAL is supported */ __u32 btf_func_global:1; + /* kernel support for expected_attach_type in BPF_PROG_LOAD */ + __u32 exp_attach_type:1; }; enum reloc_type { @@ -194,6 +196,22 @@ struct reloc_desc { int sym_off; }; +struct bpf_sec_def; + +typedef struct bpf_link *(*attach_fn_t)(const struct bpf_sec_def *sec, + struct 
bpf_program *prog); + +struct bpf_sec_def { + const char *sec; + size_t len; + enum bpf_prog_type prog_type; + enum bpf_attach_type expected_attach_type; + bool is_exp_attach_type_optional; + bool is_attachable; + bool is_attach_btf; + attach_fn_t attach_fn; +}; + /* * bpf_prog should be a better name but it has been used in * linux/filter.h. @@ -204,6 +222,7 @@ struct bpf_program { char *name; int prog_ifindex; char *section_name; + const struct bpf_sec_def *sec_def; /* section_name with / replaced by _; makes recursive pinning * in bpf_object__pin_programs easier */ @@ -3315,6 +3334,37 @@ static int bpf_object__probe_array_mmap(struct bpf_object *obj) return 0; } +static int +bpf_object__probe_exp_attach_type(struct bpf_object *obj) +{ + struct bpf_load_program_attr attr; + struct bpf_insn insns[] = { + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }; + int fd; + + memset(&attr, 0, sizeof(attr)); + /* use any valid combination of program type and (optional) + * non-zero expected attach type (i.e., not a BPF_CGROUP_INET_INGRESS) + * to see if kernel supports expected_attach_type field for + * BPF_PROG_LOAD command + */ + attr.prog_type = BPF_PROG_TYPE_CGROUP_SOCK; + attr.expected_attach_type = BPF_CGROUP_INET_SOCK_CREATE; + attr.insns = insns; + attr.insns_cnt = ARRAY_SIZE(insns); + attr.license = "GPL"; + + fd = bpf_load_program_xattr(&attr, NULL, 0); + if (fd >= 0) { + obj->caps.exp_attach_type = 1; + close(fd); + return 1; + } + return 0; +} + static int bpf_object__probe_caps(struct bpf_object *obj) { @@ -3325,6 +3375,7 @@ bpf_object__probe_caps(struct bpf_object *obj) bpf_object__probe_btf_func_global, bpf_object__probe_btf_datasec, bpf_object__probe_array_mmap, + bpf_object__probe_exp_attach_type, }; int i, ret; @@ -4861,7 +4912,12 @@ load_program(struct bpf_program *prog, struct bpf_insn *insns, int insns_cnt, memset(&load_attr, 0, sizeof(struct bpf_load_program_attr)); load_attr.prog_type = prog->type; - load_attr.expected_attach_type = 
prog->expected_attach_type; + /* old kernels might not support specifying expected_attach_type */ + if (!prog->caps->exp_attach_type && prog->sec_def && + prog->sec_def->is_exp_attach_type_optional) + load_attr.expected_attach_type = 0; + else + load_attr.expected_attach_type = prog->expected_attach_type; if (prog->caps->name) load_attr.name = prog->name; load_attr.insns = insns; @@ -5062,6 +5118,8 @@ bpf_object__load_progs(struct bpf_object *obj, int log_level) return 0; } +static const struct bpf_sec_def *find_sec_def(const char *sec_name); + static struct bpf_object * __bpf_object__open(const char *path, const void *obj_buf, size_t obj_buf_sz, const struct bpf_object_open_opts *opts) @@ -5117,24 +5175,17 @@ __bpf_object__open(const char *path, const void *obj_buf, size_t obj_buf_sz, bpf_object__elf_finish(obj); bpf_object__for_each_program(prog, obj) { - enum bpf_prog_type prog_type; - enum bpf_attach_type attach_type; - - if (prog->type != BPF_PROG_TYPE_UNSPEC) - continue; - - err = libbpf_prog_type_by_name(prog->section_name, &prog_type, - &attach_type); - if (err == -ESRCH) + prog->sec_def = find_sec_def(prog->section_name); + if (!prog->sec_def) /* couldn't guess, but user might manually specify */ continue; - if (err) - goto out; - bpf_program__set_type(prog, prog_type); - bpf_program__set_expected_attach_type(prog, attach_type); - if (prog_type == BPF_PROG_TYPE_TRACING || - prog_type == BPF_PROG_TYPE_EXT) + bpf_program__set_type(prog, prog->sec_def->prog_type); + bpf_program__set_expected_attach_type(prog, + prog->sec_def->expected_attach_type); + + if (prog->sec_def->prog_type == BPF_PROG_TYPE_TRACING || + prog->sec_def->prog_type == BPF_PROG_TYPE_EXT) prog->attach_prog_fd = OPTS_GET(opts, attach_prog_fd, 0); } @@ -6223,23 +6274,32 @@ void bpf_program__set_expected_attach_type(struct bpf_program *prog, prog->expected_attach_type = type; } -#define BPF_PROG_SEC_IMPL(string, ptype, eatype, is_attachable, btf, atype) \ - { string, sizeof(string) - 1, ptype, 
eatype, is_attachable, btf, atype } +#define BPF_PROG_SEC_IMPL(string, ptype, eatype, eatype_optional, \ + attachable, attach_btf) \ + { \ + .sec = string, \ + .len = sizeof(string) - 1, \ + .prog_type = ptype, \ + .expected_attach_type = eatype, \ + .is_exp_attach_type_optional = eatype_optional, \ + .is_attachable = attachable, \ + .is_attach_btf = attach_btf, \ + } /* Programs that can NOT be attached. */ #define BPF_PROG_SEC(string, ptype) BPF_PROG_SEC_IMPL(string, ptype, 0, 0, 0, 0) /* Programs that can be attached. */ #define BPF_APROG_SEC(string, ptype, atype) \ - BPF_PROG_SEC_IMPL(string, ptype, 0, 1, 0, atype) + BPF_PROG_SEC_IMPL(string, ptype, atype, true, 1, 0) /* Programs that must specify expected attach type at load time. */ #define BPF_EAPROG_SEC(string, ptype, eatype) \ - BPF_PROG_SEC_IMPL(string, ptype, eatype, 1, 0, eatype) + BPF_PROG_SEC_IMPL(string, ptype, eatype, false, 1, 0) /* Programs that use BTF to identify attach point */ #define BPF_PROG_BTF(string, ptype, eatype) \ - BPF_PROG_SEC_IMPL(string, ptype, eatype, 0, 1, 0) + BPF_PROG_SEC_IMPL(string, ptype, eatype, false, 0, 1) /* Programs that can be attached but attach type can't be identified by section * name. Kept for backward compatibility. 
@@ -6253,11 +6313,6 @@ void bpf_program__set_expected_attach_type(struct bpf_program *prog, __VA_ARGS__ \ } -struct bpf_sec_def; - -typedef struct bpf_link *(*attach_fn_t)(const struct bpf_sec_def *sec, - struct bpf_program *prog); - static struct bpf_link *attach_kprobe(const struct bpf_sec_def *sec, struct bpf_program *prog); static struct bpf_link *attach_tp(const struct bpf_sec_def *sec, @@ -6269,17 +6324,6 @@ static struct bpf_link *attach_trace(const struct bpf_sec_def *sec, static struct bpf_link *attach_lsm(const struct bpf_sec_def *sec, struct bpf_program *prog); -struct bpf_sec_def { - const char *sec; - size_t len; - enum bpf_prog_type prog_type; - enum bpf_attach_type expected_attach_type; - bool is_attachable; - bool is_attach_btf; - enum bpf_attach_type attach_type; - attach_fn_t attach_fn; -}; - static const struct bpf_sec_def section_defs[] = { BPF_PROG_SEC("socket", BPF_PROG_TYPE_SOCKET_FILTER), BPF_PROG_SEC("sk_reuseport", BPF_PROG_TYPE_SK_REUSEPORT), @@ -6713,7 +6757,7 @@ int libbpf_attach_type_by_name(const char *name, continue; if (!section_defs[i].is_attachable) return -EINVAL; - *attach_type = section_defs[i].attach_type; + *attach_type = section_defs[i].expected_attach_type; return 0; } pr_debug("failed to guess attach type based on ELF section name '%s'\n", name); @@ -7542,7 +7586,6 @@ static struct bpf_link *attach_lsm(const struct bpf_sec_def *sec, struct bpf_link * bpf_program__attach_cgroup(struct bpf_program *prog, int cgroup_fd) { - const struct bpf_sec_def *sec_def; enum bpf_attach_type attach_type; char errmsg[STRERR_BUFSIZE]; struct bpf_link *link; @@ -7561,11 +7604,6 @@ bpf_program__attach_cgroup(struct bpf_program *prog, int cgroup_fd) link->detach = &bpf_link__detach_fd; attach_type = bpf_program__get_expected_attach_type(prog); - if (!attach_type) { - sec_def = find_sec_def(bpf_program__title(prog, false)); - if (sec_def) - attach_type = sec_def->attach_type; - } link_fd = bpf_link_create(prog_fd, cgroup_fd, attach_type, NULL); 
if (link_fd < 0) { link_fd = -errno; diff --git a/tools/testing/selftests/bpf/prog_tests/section_names.c b/tools/testing/selftests/bpf/prog_tests/section_names.c index 9d9351dc2ded..713167449c98 100644 --- a/tools/testing/selftests/bpf/prog_tests/section_names.c +++ b/tools/testing/selftests/bpf/prog_tests/section_names.c @@ -43,18 +43,18 @@ static struct sec_name_test tests[] = { {"lwt_seg6local", {0, BPF_PROG_TYPE_LWT_SEG6LOCAL, 0}, {-EINVAL, 0} }, { "cgroup_skb/ingress", - {0, BPF_PROG_TYPE_CGROUP_SKB, 0}, + {0, BPF_PROG_TYPE_CGROUP_SKB, BPF_CGROUP_INET_INGRESS}, {0, BPF_CGROUP_INET_INGRESS}, }, { "cgroup_skb/egress", - {0, BPF_PROG_TYPE_CGROUP_SKB, 0}, + {0, BPF_PROG_TYPE_CGROUP_SKB, BPF_CGROUP_INET_EGRESS}, {0, BPF_CGROUP_INET_EGRESS}, }, {"cgroup/skb", {0, BPF_PROG_TYPE_CGROUP_SKB, 0}, {-EINVAL, 0} }, { "cgroup/sock", - {0, BPF_PROG_TYPE_CGROUP_SOCK, 0}, + {0, BPF_PROG_TYPE_CGROUP_SOCK, BPF_CGROUP_INET_SOCK_CREATE}, {0, BPF_CGROUP_INET_SOCK_CREATE}, }, { @@ -69,26 +69,38 @@ static struct sec_name_test tests[] = { }, { "cgroup/dev", - {0, BPF_PROG_TYPE_CGROUP_DEVICE, 0}, + {0, BPF_PROG_TYPE_CGROUP_DEVICE, BPF_CGROUP_DEVICE}, {0, BPF_CGROUP_DEVICE}, }, - {"sockops", {0, BPF_PROG_TYPE_SOCK_OPS, 0}, {0, BPF_CGROUP_SOCK_OPS} }, + { + "sockops", + {0, BPF_PROG_TYPE_SOCK_OPS, BPF_CGROUP_SOCK_OPS}, + {0, BPF_CGROUP_SOCK_OPS}, + }, { "sk_skb/stream_parser", - {0, BPF_PROG_TYPE_SK_SKB, 0}, + {0, BPF_PROG_TYPE_SK_SKB, BPF_SK_SKB_STREAM_PARSER}, {0, BPF_SK_SKB_STREAM_PARSER}, }, { "sk_skb/stream_verdict", - {0, BPF_PROG_TYPE_SK_SKB, 0}, + {0, BPF_PROG_TYPE_SK_SKB, BPF_SK_SKB_STREAM_VERDICT}, {0, BPF_SK_SKB_STREAM_VERDICT}, }, {"sk_skb", {0, BPF_PROG_TYPE_SK_SKB, 0}, {-EINVAL, 0} }, - {"sk_msg", {0, BPF_PROG_TYPE_SK_MSG, 0}, {0, BPF_SK_MSG_VERDICT} }, - {"lirc_mode2", {0, BPF_PROG_TYPE_LIRC_MODE2, 0}, {0, BPF_LIRC_MODE2} }, + { + "sk_msg", + {0, BPF_PROG_TYPE_SK_MSG, BPF_SK_MSG_VERDICT}, + {0, BPF_SK_MSG_VERDICT}, + }, + { + "lirc_mode2", + {0, BPF_PROG_TYPE_LIRC_MODE2, 
BPF_LIRC_MODE2}, + {0, BPF_LIRC_MODE2}, + }, { "flow_dissector", - {0, BPF_PROG_TYPE_FLOW_DISSECTOR, 0}, + {0, BPF_PROG_TYPE_FLOW_DISSECTOR, BPF_FLOW_DISSECTOR}, {0, BPF_FLOW_DISSECTOR}, }, { @@ -158,17 +170,17 @@ static void test_prog_type_by_name(const struct sec_name_test *test) &expected_attach_type); CHECK(rc != test->expected_load.rc, "check_code", - "prog: unexpected rc=%d for %s", rc, test->sec_name); + "prog: unexpected rc=%d for %s\n", rc, test->sec_name); if (rc) return; CHECK(prog_type != test->expected_load.prog_type, "check_prog_type", - "prog: unexpected prog_type=%d for %s", + "prog: unexpected prog_type=%d for %s\n", prog_type, test->sec_name); CHECK(expected_attach_type != test->expected_load.expected_attach_type, - "check_attach_type", "prog: unexpected expected_attach_type=%d for %s", + "check_attach_type", "prog: unexpected expected_attach_type=%d for %s\n", expected_attach_type, test->sec_name); } @@ -180,13 +192,13 @@ static void test_attach_type_by_name(const struct sec_name_test *test) rc = libbpf_attach_type_by_name(test->sec_name, &attach_type); CHECK(rc != test->expected_attach.rc, "check_ret", - "attach: unexpected rc=%d for %s", rc, test->sec_name); + "attach: unexpected rc=%d for %s\n", rc, test->sec_name); if (rc) return; CHECK(attach_type != test->expected_attach.attach_type, - "check_attach_type", "attach: unexpected attach_type=%d for %s", + "check_attach_type", "attach: unexpected attach_type=%d for %s\n", attach_type, test->sec_name); } From 49b452c382da2c2d1ccee1265cbb92da905c82f7 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Toke=20H=C3=B8iland-J=C3=B8rgensen?= Date: Tue, 14 Apr 2020 16:50:24 +0200 Subject: [PATCH 233/331] libbpf: Fix type of old_fd in bpf_xdp_set_link_opts MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The 'old_fd' parameter used for atomic replacement of XDP programs is supposed to be an FD, but was left as a u32 from an earlier iteration of the patch that added it. 
It was converted to an int when read, so things worked correctly even with negative values, but it is better to change the definition to correctly reflect the intention. Fixes: bd5ca3ef93cd ("libbpf: Add function to set link XDP fd while specifying old program") Reported-by: David Ahern Signed-off-by: Toke Høiland-Jørgensen Signed-off-by: Daniel Borkmann Acked-by: David Ahern Acked-by: Song Liu Link: https://lore.kernel.org/bpf/20200414145025.182163-1-toke@redhat.com --- tools/lib/bpf/libbpf.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h index 44df1d3e7287..f1dacecb1619 100644 --- a/tools/lib/bpf/libbpf.h +++ b/tools/lib/bpf/libbpf.h @@ -458,7 +458,7 @@ struct xdp_link_info { struct bpf_xdp_set_link_opts { size_t sz; - __u32 old_fd; + int old_fd; }; #define bpf_xdp_set_link_opts__last_field old_fd From c6c111523d9e697bfb463870759825be5d6caff6 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Toke=20H=C3=B8iland-J=C3=B8rgensen?= Date: Tue, 14 Apr 2020 16:50:25 +0200 Subject: [PATCH 234/331] selftests/bpf: Check for correct program attach/detach in xdp_attach test MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit David Ahern noticed that there was a bug in the EXPECTED_FD code so programs did not get detached properly when that parameter was supplied. This case was not included in the xdp_attach tests, so let's add it to be sure that such a bug does not sneak back in.
Fixes: 87854a0b57b3 ("selftests/bpf: Add tests for attaching XDP programs") Reported-by: David Ahern Signed-off-by: Toke Høiland-Jørgensen Signed-off-by: Daniel Borkmann Acked-by: Song Liu Link: https://lore.kernel.org/bpf/20200414145025.182163-2-toke@redhat.com --- .../selftests/bpf/prog_tests/xdp_attach.c | 30 ++++++++++++++++++- 1 file changed, 29 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_attach.c b/tools/testing/selftests/bpf/prog_tests/xdp_attach.c index 05b294d6b923..15ef3531483e 100644 --- a/tools/testing/selftests/bpf/prog_tests/xdp_attach.c +++ b/tools/testing/selftests/bpf/prog_tests/xdp_attach.c @@ -6,19 +6,34 @@ void test_xdp_attach(void) { + __u32 duration = 0, id1, id2, id0 = 0, len; struct bpf_object *obj1, *obj2, *obj3; const char *file = "./test_xdp.o"; + struct bpf_prog_info info = {}; int err, fd1, fd2, fd3; - __u32 duration = 0; DECLARE_LIBBPF_OPTS(bpf_xdp_set_link_opts, opts, .old_fd = -1); + len = sizeof(info); + err = bpf_prog_load(file, BPF_PROG_TYPE_XDP, &obj1, &fd1); if (CHECK_FAIL(err)) return; + err = bpf_obj_get_info_by_fd(fd1, &info, &len); + if (CHECK_FAIL(err)) + goto out_1; + id1 = info.id; + err = bpf_prog_load(file, BPF_PROG_TYPE_XDP, &obj2, &fd2); if (CHECK_FAIL(err)) goto out_1; + + memset(&info, 0, sizeof(info)); + err = bpf_obj_get_info_by_fd(fd2, &info, &len); + if (CHECK_FAIL(err)) + goto out_2; + id2 = info.id; + err = bpf_prog_load(file, BPF_PROG_TYPE_XDP, &obj3, &fd3); if (CHECK_FAIL(err)) goto out_2; @@ -28,6 +43,11 @@ void test_xdp_attach(void) if (CHECK(err, "load_ok", "initial load failed")) goto out_close; + err = bpf_get_link_xdp_id(IFINDEX_LO, &id0, 0); + if (CHECK(err || id0 != id1, "id1_check", + "loaded prog id %u != id1 %u, err %d", id0, id1, err)) + goto out_close; + err = bpf_set_link_xdp_fd_opts(IFINDEX_LO, fd2, XDP_FLAGS_REPLACE, &opts); if (CHECK(!err, "load_fail", "load with expected id didn't fail")) @@ -37,6 +57,10 @@ void test_xdp_attach(void) err = 
bpf_set_link_xdp_fd_opts(IFINDEX_LO, fd2, 0, &opts); if (CHECK(err, "replace_ok", "replace valid old_fd failed")) goto out; + err = bpf_get_link_xdp_id(IFINDEX_LO, &id0, 0); + if (CHECK(err || id0 != id2, "id2_check", + "loaded prog id %u != id2 %u, err %d", id0, id2, err)) + goto out_close; err = bpf_set_link_xdp_fd_opts(IFINDEX_LO, fd3, 0, &opts); if (CHECK(!err, "replace_fail", "replace invalid old_fd didn't fail")) @@ -51,6 +75,10 @@ void test_xdp_attach(void) if (CHECK(err, "remove_ok", "remove valid old_fd failed")) goto out; + err = bpf_get_link_xdp_id(IFINDEX_LO, &id0, 0); + if (CHECK(err || id0 != 0, "unload_check", + "loaded prog id %u != 0, err %d", id0, err)) + goto out_close; out: bpf_set_link_xdp_fd(IFINDEX_LO, -1, 0); out_close: From c9a4ef66450145a356a626c833d3d7b1668b3ded Mon Sep 17 00:00:00 2001 From: Fangrui Song Date: Tue, 14 Apr 2020 09:32:55 -0700 Subject: [PATCH 235/331] arm64: Delete the space separator in __emit_inst In assembly, many instances of __emit_inst(x) expand to a directive. In a few places __emit_inst(x) is used as an assembler macro argument. For example, in arch/arm64/kvm/hyp/entry.S ALTERNATIVE(nop, SET_PSTATE_PAN(1), ARM64_HAS_PAN, CONFIG_ARM64_PAN) expands to the following by the C preprocessor: alternative_insn nop, .inst (0xd500401f | ((0) << 16 | (4) << 5) | ((!!1) << 8)), 4, 1 Both comma and space are separators, with an exception that content inside a pair of parentheses/quotes is not split, so the clang integrated assembler splits the arguments to: nop, .inst, (0xd500401f | ((0) << 16 | (4) << 5) | ((!!1) << 8)), 4, 1 GNU as preprocesses the input with do_scrub_chars(). Its arm64 backend (along with many other non-x86 backends) sees: alternative_insn nop,.inst(0xd500401f|((0)<<16|(4)<<5)|((!!1)<<8)),4,1 # .inst(...) is parsed as one argument while its x86 backend sees: alternative_insn nop,.inst (0xd500401f|((0)<<16|(4)<<5)|((!!1)<<8)),4,1 # The extra space before '(' makes the whole .inst (...) 
parsed as two arguments The non-x86 backend's behavior is considered unintentional (https://sourceware.org/bugzilla/show_bug.cgi?id=25750). So drop the space separator inside `.inst (...)` to make the clang integrated assembler work. Suggested-by: Ilie Halip Signed-off-by: Fangrui Song Reviewed-by: Mark Rutland Link: https://github.com/ClangBuiltLinux/linux/issues/939 Signed-off-by: Catalin Marinas --- arch/arm64/include/asm/sysreg.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h index ebc622432831..c4ac0ac25a00 100644 --- a/arch/arm64/include/asm/sysreg.h +++ b/arch/arm64/include/asm/sysreg.h @@ -49,7 +49,9 @@ #ifndef CONFIG_BROKEN_GAS_INST #ifdef __ASSEMBLY__ -#define __emit_inst(x) .inst (x) +// The space separator is omitted so that __emit_inst(x) can be parsed as +// either an assembler directive or an assembler macro argument. +#define __emit_inst(x) .inst(x) #else #define __emit_inst(x) ".inst " __stringify((x)) "\n\t" #endif From a900aeac253729411cf33c6cb598c152e9e4137f Mon Sep 17 00:00:00 2001 From: Dmitry Osipenko Date: Tue, 24 Mar 2020 22:12:16 +0300 Subject: [PATCH 236/331] i2c: tegra: Better handle case where CPU0 is busy for a long time Boot CPU0 always handles the I2C interrupt, and under some rare circumstances (like running KASAN + NFS root) it may get stuck in an uninterruptible state for a significant time. In this case we will get a timeout if the I2C transfer is running on a sibling CPU, despite the IRQ being raised. In order to handle this rare condition, the IRQ status needs to be checked after the completion timeout.
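As a rough userspace illustration of that idea (not the driver code; struct fake_dev, fake_isr(), fake_wait() and wait_then_recheck() are hypothetical names made up for this sketch), the wait path can re-read the interrupt status after a timeout and run the handler by hand before reporting failure:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the device state: a raised IRQ and a completion. */
struct fake_dev {
	unsigned int int_status;	/* non-zero: hardware raised an interrupt */
	bool completed;			/* set once the handler has run */
};

/* Simulated interrupt handler: acknowledge the status, signal completion. */
static void fake_isr(struct fake_dev *dev)
{
	if (dev->int_status) {
		dev->int_status = 0;
		dev->completed = true;
	}
}

/* Simulated wait: returns the time left, or 0 if the handler never ran. */
static unsigned long fake_wait(const struct fake_dev *dev, unsigned long timeout)
{
	return dev->completed ? timeout : 0;
}

/* On timeout, check the IRQ status once more before giving up. */
static unsigned long wait_then_recheck(struct fake_dev *dev, unsigned long timeout)
{
	unsigned long ret;

	assert(dev);
	ret = fake_wait(dev, timeout);
	if (ret == 0) {
		/* The IRQ may be pending while the handling CPU is stuck. */
		fake_isr(dev);
		if (dev->completed)
			ret = 1;	/* success, with minimal time left */
	}
	return ret;
}
```

In this sketch a pending-but-unhandled interrupt turns a would-be timeout into a success with a return value of 1, while a transfer that really never raised an interrupt still returns 0.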
Signed-off-by: Dmitry Osipenko Signed-off-by: Wolfram Sang --- drivers/i2c/busses/i2c-tegra.c | 27 +++++++++++++++------------ 1 file changed, 15 insertions(+), 12 deletions(-) diff --git a/drivers/i2c/busses/i2c-tegra.c b/drivers/i2c/busses/i2c-tegra.c index 4c4d17ddc96b..a795b4e278b1 100644 --- a/drivers/i2c/busses/i2c-tegra.c +++ b/drivers/i2c/busses/i2c-tegra.c @@ -996,14 +996,13 @@ tegra_i2c_poll_completion_timeout(struct tegra_i2c_dev *i2c_dev, do { u32 status = i2c_readl(i2c_dev, I2C_INT_STATUS); - if (status) { + if (status) tegra_i2c_isr(i2c_dev->irq, i2c_dev); - if (completion_done(complete)) { - s64 delta = ktime_ms_delta(ktimeout, ktime); + if (completion_done(complete)) { + s64 delta = ktime_ms_delta(ktimeout, ktime); - return msecs_to_jiffies(delta) ?: 1; - } + return msecs_to_jiffies(delta) ?: 1; } ktime = ktime_get(); @@ -1030,14 +1029,18 @@ tegra_i2c_wait_completion_timeout(struct tegra_i2c_dev *i2c_dev, disable_irq(i2c_dev->irq); /* - * There is a chance that completion may happen after IRQ - * synchronization, which is done by disable_irq(). + * Under some rare circumstances (like running KASAN + + * NFS root) CPU, which handles interrupt, may stuck in + * uninterruptible state for a significant time. In this + * case we will get timeout if I2C transfer is running on + * a sibling CPU, despite of IRQ being raised. + * + * In order to handle this rare condition, the IRQ status + * needs to be checked after timeout. 
*/ - if (ret == 0 && completion_done(complete)) { - dev_warn(i2c_dev->dev, - "completion done after timeout\n"); - ret = 1; - } + if (ret == 0) + ret = tegra_i2c_poll_completion_timeout(i2c_dev, + complete, 0); } return ret; From 8814044fe0fa182abc9ff818d3da562de98bc9a7 Mon Sep 17 00:00:00 2001 From: Dmitry Osipenko Date: Tue, 24 Mar 2020 22:12:17 +0300 Subject: [PATCH 237/331] i2c: tegra: Synchronize DMA before termination A DMA transfer may have completed while the CPU that handles the DMA interrupt is too busy to service it in a timely manner, even though the DMA IRQ was raised. In this case the DMA state needs to be synchronized before terminating the DMA transfer, so that the transfer's completion is not missed. Signed-off-by: Dmitry Osipenko Signed-off-by: Wolfram Sang --- drivers/i2c/busses/i2c-tegra.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/drivers/i2c/busses/i2c-tegra.c b/drivers/i2c/busses/i2c-tegra.c index a795b4e278b1..8280ac7cc1b7 100644 --- a/drivers/i2c/busses/i2c-tegra.c +++ b/drivers/i2c/busses/i2c-tegra.c @@ -1219,6 +1219,15 @@ static int tegra_i2c_xfer_msg(struct tegra_i2c_dev *i2c_dev, time_left = tegra_i2c_wait_completion_timeout( i2c_dev, &i2c_dev->dma_complete, xfer_time); + /* + * Synchronize DMA first, since dmaengine_terminate_sync() + * performs synchronization after the transfer's termination + * and we want to get a completion if transfer succeeded. + */ + dmaengine_synchronize(i2c_dev->msg_read ? + i2c_dev->rx_dma_chan : + i2c_dev->tx_dma_chan); + dmaengine_terminate_sync(i2c_dev->msg_read ?
i2c_dev->rx_dma_chan : i2c_dev->tx_dma_chan); From 87b0f983f66f23762921129fd35966eddc3f2dae Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Tue, 14 Apr 2020 22:36:15 +0300 Subject: [PATCH 238/331] net: mscc: ocelot: fix untagged packet drops when enslaving to vlan aware bridge To rehash a previous explanation given in commit 1c44ce560b4d ("net: mscc: ocelot: fix vlan_filtering when enslaving to bridge before link is up"), the switch driver operates in a mode where a single VLAN can be transmitted as untagged on a particular egress port. That is the "native VLAN on trunk port" use case. The configuration for this native VLAN is driven in 2 ways: - Set the egress port rewriter to strip the VLAN tag for the native VID (as it is egress-untagged, after all). - Configure the ingress port to drop untagged and priority-tagged traffic, if there is no native VLAN. The intention of this setting is that a trunk port with no native VLAN should not accept untagged traffic. Since both of the above configurations for the native VLAN should only be done if VLAN awareness is requested, they are actually done from the ocelot_port_vlan_filtering function, after the basic procedure of toggling the VLAN awareness flag of the port. But there's a problem with that simplistic approach: we are trying to juggle 2 independent variables from a single function: - Native VLAN of the port - its value is held in port->vid. - VLAN awareness state of the port - currently there are some issues here, more on that later*. The actual problem can be seen when enslaving the switch ports to a VLAN filtering bridge: 0. The driver configures a pvid of zero for each port when in standalone mode, while the bridge configures a default_pvid of 1 for each port that gets added as a slave to it. 1. The bridge calls ocelot_port_vlan_filtering with vlan_aware=true. The VLAN-filtering-dependent portion of the native VLAN configuration is done, considering that the native VLAN is 0. 2.
The bridge calls ocelot_vlan_add with vid=1, pvid=true, untagged=true. The native VLAN changes to 1 (a change which gets propagated to hardware). 3. ??? - nobody calls ocelot_port_vlan_filtering again, to reapply the VLAN-filtering-dependent portion of the native VLAN configuration, for the new native VLAN of 1. One can notice that after toggling "ip link set dev br0 type bridge vlan_filtering 0 && ip link set dev br0 type bridge vlan_filtering 1", the new native VLAN finally makes it through and untagged traffic finally starts flowing again. But obviously that shouldn't be needed. So it is clear that 2 independent variables both need to re-trigger the native VLAN configuration. So we introduce the second variable as ocelot_port->vlan_aware. *Actually both the DSA Felix driver and the Ocelot driver each already had its own variable: - Ocelot: ocelot_port_private->vlan_aware - Felix: dsa_port->vlan_filtering but the common Ocelot library needs to work with a single, common variable, so there is some refactoring done to move the vlan_aware property from the private structure into the common ocelot_port structure. Fixes: 97bb69e1e36e ("net: mscc: ocelot: break apart ocelot_vlan_port_apply") Signed-off-by: Vladimir Oltean Reviewed-by: Horatiu Vultur Signed-off-by: David S.
Miller --- drivers/net/dsa/ocelot/felix.c | 5 +- drivers/net/ethernet/mscc/ocelot.c | 110 +++++++++++++++-------------- drivers/net/ethernet/mscc/ocelot.h | 2 - include/soc/mscc/ocelot.h | 4 +- 4 files changed, 60 insertions(+), 61 deletions(-) diff --git a/drivers/net/dsa/ocelot/felix.c b/drivers/net/dsa/ocelot/felix.c index 79ca3aadb864..d0a3764ff0cf 100644 --- a/drivers/net/dsa/ocelot/felix.c +++ b/drivers/net/dsa/ocelot/felix.c @@ -46,11 +46,8 @@ static int felix_fdb_add(struct dsa_switch *ds, int port, const unsigned char *addr, u16 vid) { struct ocelot *ocelot = ds->priv; - bool vlan_aware; - vlan_aware = dsa_port_is_vlan_filtering(dsa_to_port(ds, port)); - - return ocelot_fdb_add(ocelot, port, addr, vid, vlan_aware); + return ocelot_fdb_add(ocelot, port, addr, vid); } static int felix_fdb_del(struct dsa_switch *ds, int port, diff --git a/drivers/net/ethernet/mscc/ocelot.c b/drivers/net/ethernet/mscc/ocelot.c index b4731df186f4..a8c48a4a708f 100644 --- a/drivers/net/ethernet/mscc/ocelot.c +++ b/drivers/net/ethernet/mscc/ocelot.c @@ -183,58 +183,11 @@ static void ocelot_vlan_mode(struct ocelot *ocelot, int port, ocelot_write(ocelot, val, ANA_VLANMASK); } -void ocelot_port_vlan_filtering(struct ocelot *ocelot, int port, - bool vlan_aware) -{ - struct ocelot_port *ocelot_port = ocelot->ports[port]; - u32 val; - - if (vlan_aware) - val = ANA_PORT_VLAN_CFG_VLAN_AWARE_ENA | - ANA_PORT_VLAN_CFG_VLAN_POP_CNT(1); - else - val = 0; - ocelot_rmw_gix(ocelot, val, - ANA_PORT_VLAN_CFG_VLAN_AWARE_ENA | - ANA_PORT_VLAN_CFG_VLAN_POP_CNT_M, - ANA_PORT_VLAN_CFG, port); - - if (vlan_aware && !ocelot_port->vid) - /* If port is vlan-aware and tagged, drop untagged and priority - * tagged frames. 
- */ - val = ANA_PORT_DROP_CFG_DROP_UNTAGGED_ENA | - ANA_PORT_DROP_CFG_DROP_PRIO_S_TAGGED_ENA | - ANA_PORT_DROP_CFG_DROP_PRIO_C_TAGGED_ENA; - else - val = 0; - ocelot_rmw_gix(ocelot, val, - ANA_PORT_DROP_CFG_DROP_UNTAGGED_ENA | - ANA_PORT_DROP_CFG_DROP_PRIO_S_TAGGED_ENA | - ANA_PORT_DROP_CFG_DROP_PRIO_C_TAGGED_ENA, - ANA_PORT_DROP_CFG, port); - - if (vlan_aware) { - if (ocelot_port->vid) - /* Tag all frames except when VID == DEFAULT_VLAN */ - val |= REW_TAG_CFG_TAG_CFG(1); - else - /* Tag all frames */ - val |= REW_TAG_CFG_TAG_CFG(3); - } else { - /* Port tagging disabled. */ - val = REW_TAG_CFG_TAG_CFG(0); - } - ocelot_rmw_gix(ocelot, val, - REW_TAG_CFG_TAG_CFG_M, - REW_TAG_CFG, port); -} -EXPORT_SYMBOL(ocelot_port_vlan_filtering); - static int ocelot_port_set_native_vlan(struct ocelot *ocelot, int port, u16 vid) { struct ocelot_port *ocelot_port = ocelot->ports[port]; + u32 val = 0; if (ocelot_port->vid != vid) { /* Always permit deleting the native VLAN (vid = 0) */ @@ -251,9 +204,59 @@ static int ocelot_port_set_native_vlan(struct ocelot *ocelot, int port, REW_PORT_VLAN_CFG_PORT_VID_M, REW_PORT_VLAN_CFG, port); + if (ocelot_port->vlan_aware && !ocelot_port->vid) + /* If port is vlan-aware and tagged, drop untagged and priority + * tagged frames. + */ + val = ANA_PORT_DROP_CFG_DROP_UNTAGGED_ENA | + ANA_PORT_DROP_CFG_DROP_PRIO_S_TAGGED_ENA | + ANA_PORT_DROP_CFG_DROP_PRIO_C_TAGGED_ENA; + ocelot_rmw_gix(ocelot, val, + ANA_PORT_DROP_CFG_DROP_UNTAGGED_ENA | + ANA_PORT_DROP_CFG_DROP_PRIO_S_TAGGED_ENA | + ANA_PORT_DROP_CFG_DROP_PRIO_C_TAGGED_ENA, + ANA_PORT_DROP_CFG, port); + + if (ocelot_port->vlan_aware) { + if (ocelot_port->vid) + /* Tag all frames except when VID == DEFAULT_VLAN */ + val = REW_TAG_CFG_TAG_CFG(1); + else + /* Tag all frames */ + val = REW_TAG_CFG_TAG_CFG(3); + } else { + /* Port tagging disabled. 
*/ + val = REW_TAG_CFG_TAG_CFG(0); + } + ocelot_rmw_gix(ocelot, val, + REW_TAG_CFG_TAG_CFG_M, + REW_TAG_CFG, port); + return 0; } +void ocelot_port_vlan_filtering(struct ocelot *ocelot, int port, + bool vlan_aware) +{ + struct ocelot_port *ocelot_port = ocelot->ports[port]; + u32 val; + + ocelot_port->vlan_aware = vlan_aware; + + if (vlan_aware) + val = ANA_PORT_VLAN_CFG_VLAN_AWARE_ENA | + ANA_PORT_VLAN_CFG_VLAN_POP_CNT(1); + else + val = 0; + ocelot_rmw_gix(ocelot, val, + ANA_PORT_VLAN_CFG_VLAN_AWARE_ENA | + ANA_PORT_VLAN_CFG_VLAN_POP_CNT_M, + ANA_PORT_VLAN_CFG, port); + + ocelot_port_set_native_vlan(ocelot, port, ocelot_port->vid); +} +EXPORT_SYMBOL(ocelot_port_vlan_filtering); + /* Default vlan to clasify for untagged frames (may be zero) */ static void ocelot_port_set_pvid(struct ocelot *ocelot, int port, u16 pvid) { @@ -873,12 +876,12 @@ static void ocelot_get_stats64(struct net_device *dev, } int ocelot_fdb_add(struct ocelot *ocelot, int port, - const unsigned char *addr, u16 vid, bool vlan_aware) + const unsigned char *addr, u16 vid) { struct ocelot_port *ocelot_port = ocelot->ports[port]; if (!vid) { - if (!vlan_aware) + if (!ocelot_port->vlan_aware) /* If the bridge is not VLAN aware and no VID was * provided, set it to pvid to ensure the MAC entry * matches incoming untagged packets @@ -905,7 +908,7 @@ static int ocelot_port_fdb_add(struct ndmsg *ndm, struct nlattr *tb[], struct ocelot *ocelot = priv->port.ocelot; int port = priv->chip_port; - return ocelot_fdb_add(ocelot, port, addr, vid, priv->vlan_aware); + return ocelot_fdb_add(ocelot, port, addr, vid); } int ocelot_fdb_del(struct ocelot *ocelot, int port, @@ -1496,8 +1499,8 @@ static int ocelot_port_attr_set(struct net_device *dev, ocelot_port_attr_ageing_set(ocelot, port, attr->u.ageing_time); break; case SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING: - priv->vlan_aware = attr->u.vlan_filtering; - ocelot_port_vlan_filtering(ocelot, port, priv->vlan_aware); + ocelot_port_vlan_filtering(ocelot, port, + 
attr->u.vlan_filtering); break; case SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED: ocelot_port_attr_mc_set(ocelot, port, !attr->u.mc_disabled); @@ -1868,7 +1871,6 @@ static int ocelot_netdevice_port_event(struct net_device *dev, } else { err = ocelot_port_bridge_leave(ocelot, port, info->upper_dev); - priv->vlan_aware = false; } } if (netif_is_lag_master(info->upper_dev)) { diff --git a/drivers/net/ethernet/mscc/ocelot.h b/drivers/net/ethernet/mscc/ocelot.h index e34ef8380eb3..641af929497f 100644 --- a/drivers/net/ethernet/mscc/ocelot.h +++ b/drivers/net/ethernet/mscc/ocelot.h @@ -56,8 +56,6 @@ struct ocelot_port_private { struct phy_device *phy; u8 chip_port; - u8 vlan_aware; - struct phy *serdes; struct ocelot_port_tc tc; diff --git a/include/soc/mscc/ocelot.h b/include/soc/mscc/ocelot.h index ebffcb36a7e3..6d6a3947c8b7 100644 --- a/include/soc/mscc/ocelot.h +++ b/include/soc/mscc/ocelot.h @@ -476,6 +476,8 @@ struct ocelot_port { void __iomem *regs; + bool vlan_aware; + /* Ingress default VLAN (pvid) */ u16 pvid; @@ -610,7 +612,7 @@ int ocelot_port_bridge_leave(struct ocelot *ocelot, int port, int ocelot_fdb_dump(struct ocelot *ocelot, int port, dsa_fdb_dump_cb_t *cb, void *data); int ocelot_fdb_add(struct ocelot *ocelot, int port, - const unsigned char *addr, u16 vid, bool vlan_aware); + const unsigned char *addr, u16 vid); int ocelot_fdb_del(struct ocelot *ocelot, int port, const unsigned char *addr, u16 vid); int ocelot_vlan_add(struct ocelot *ocelot, int port, u16 vid, bool pvid, From 7dba92037baf3fa00b4880a31fd532542264994c Mon Sep 17 00:00:00 2001 From: Jason Gunthorpe Date: Tue, 14 Apr 2020 20:02:07 -0300 Subject: [PATCH 239/331] net/rds: Use ERR_PTR for rds_message_alloc_sgs() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Returning the error code via a 'int *ret' when the function returns a pointer is very un-kernely and causes gcc 10's static analysis to choke: net/rds/message.c: In function ‘rds_message_map_pages’: 
net/rds/message.c:358:10: warning: ‘ret’ may be used uninitialized in this function [-Wmaybe-uninitialized] 358 | return ERR_PTR(ret); Use a typical ERR_PTR return instead. Signed-off-by: Jason Gunthorpe Acked-by: Santosh Shilimkar Signed-off-by: David S. Miller --- net/rds/message.c | 19 ++++++------------- net/rds/rdma.c | 12 ++++++++---- net/rds/rds.h | 3 +-- net/rds/send.c | 6 ++++-- 4 files changed, 19 insertions(+), 21 deletions(-) diff --git a/net/rds/message.c b/net/rds/message.c index bbecb8cb873e..071a261fdaab 100644 --- a/net/rds/message.c +++ b/net/rds/message.c @@ -308,26 +308,20 @@ out: /* * RDS ops use this to grab SG entries from the rm's sg pool. */ -struct scatterlist *rds_message_alloc_sgs(struct rds_message *rm, int nents, - int *ret) +struct scatterlist *rds_message_alloc_sgs(struct rds_message *rm, int nents) { struct scatterlist *sg_first = (struct scatterlist *) &rm[1]; struct scatterlist *sg_ret; - if (WARN_ON(!ret)) - return NULL; - if (nents <= 0) { pr_warn("rds: alloc sgs failed! nents <= 0\n"); - *ret = -EINVAL; - return NULL; + return ERR_PTR(-EINVAL); } if (rm->m_used_sgs + nents > rm->m_total_sgs) { pr_warn("rds: alloc sgs failed! 
total %d used %d nents %d\n", rm->m_total_sgs, rm->m_used_sgs, nents); - *ret = -ENOMEM; - return NULL; + return ERR_PTR(-ENOMEM); } sg_ret = &sg_first[rm->m_used_sgs]; @@ -343,7 +337,6 @@ struct rds_message *rds_message_map_pages(unsigned long *page_addrs, unsigned in unsigned int i; int num_sgs = DIV_ROUND_UP(total_len, PAGE_SIZE); int extra_bytes = num_sgs * sizeof(struct scatterlist); - int ret; rm = rds_message_alloc(extra_bytes, GFP_NOWAIT); if (!rm) @@ -352,10 +345,10 @@ struct rds_message *rds_message_map_pages(unsigned long *page_addrs, unsigned in set_bit(RDS_MSG_PAGEVEC, &rm->m_flags); rm->m_inc.i_hdr.h_len = cpu_to_be32(total_len); rm->data.op_nents = DIV_ROUND_UP(total_len, PAGE_SIZE); - rm->data.op_sg = rds_message_alloc_sgs(rm, num_sgs, &ret); - if (!rm->data.op_sg) { + rm->data.op_sg = rds_message_alloc_sgs(rm, num_sgs); + if (IS_ERR(rm->data.op_sg)) { rds_message_put(rm); - return ERR_PTR(ret); + return ERR_CAST(rm->data.op_sg); } for (i = 0; i < rm->data.op_nents; ++i) { diff --git a/net/rds/rdma.c b/net/rds/rdma.c index 113e442101ce..a7ae11846cd7 100644 --- a/net/rds/rdma.c +++ b/net/rds/rdma.c @@ -665,9 +665,11 @@ int rds_cmsg_rdma_args(struct rds_sock *rs, struct rds_message *rm, op->op_odp_mr = NULL; WARN_ON(!nr_pages); - op->op_sg = rds_message_alloc_sgs(rm, nr_pages, &ret); - if (!op->op_sg) + op->op_sg = rds_message_alloc_sgs(rm, nr_pages); + if (IS_ERR(op->op_sg)) { + ret = PTR_ERR(op->op_sg); goto out_pages; + } if (op->op_notify || op->op_recverr) { /* We allocate an uninitialized notifier here, because @@ -906,9 +908,11 @@ int rds_cmsg_atomic(struct rds_sock *rs, struct rds_message *rm, rm->atomic.op_silent = !!(args->flags & RDS_RDMA_SILENT); rm->atomic.op_active = 1; rm->atomic.op_recverr = rs->rs_recverr; - rm->atomic.op_sg = rds_message_alloc_sgs(rm, 1, &ret); - if (!rm->atomic.op_sg) + rm->atomic.op_sg = rds_message_alloc_sgs(rm, 1); + if (IS_ERR(rm->atomic.op_sg)) { + ret = PTR_ERR(rm->atomic.op_sg); goto err; + } /* verify 8 
byte-aligned */ if (args->local_addr & 0x7) { diff --git a/net/rds/rds.h b/net/rds/rds.h index 8e18cd2aec51..6019b0c004a9 100644 --- a/net/rds/rds.h +++ b/net/rds/rds.h @@ -844,8 +844,7 @@ rds_conn_connecting(struct rds_connection *conn) /* message.c */ struct rds_message *rds_message_alloc(unsigned int nents, gfp_t gfp); -struct scatterlist *rds_message_alloc_sgs(struct rds_message *rm, int nents, - int *ret); +struct scatterlist *rds_message_alloc_sgs(struct rds_message *rm, int nents); int rds_message_copy_from_user(struct rds_message *rm, struct iov_iter *from, bool zcopy); struct rds_message *rds_message_map_pages(unsigned long *page_addrs, unsigned int total_len); diff --git a/net/rds/send.c b/net/rds/send.c index 82dcd8b84fe7..68e2bdb08fd0 100644 --- a/net/rds/send.c +++ b/net/rds/send.c @@ -1274,9 +1274,11 @@ int rds_sendmsg(struct socket *sock, struct msghdr *msg, size_t payload_len) /* Attach data to the rm */ if (payload_len) { - rm->data.op_sg = rds_message_alloc_sgs(rm, num_sgs, &ret); - if (!rm->data.op_sg) + rm->data.op_sg = rds_message_alloc_sgs(rm, num_sgs); + if (IS_ERR(rm->data.op_sg)) { + ret = PTR_ERR(rm->data.op_sg); goto out; + } ret = rds_message_copy_from_user(rm, &msg->msg_iter, zcopy); if (ret) goto out; From 404e603f1ec8520ca09b606496a55cfdcead4e15 Mon Sep 17 00:00:00 2001 From: Chris Packham Date: Wed, 15 Apr 2020 10:12:22 +1200 Subject: [PATCH 240/331] docs: timekeeping: Use correct prototype for deprecated functions Use the correct prototypes for do_gettimeofday(), getnstimeofday() and getnstimeofday64(). All of these returned void and passed the return value by reference. This should make the documentation of their deprecation and replacements easier to search for. 
Signed-off-by: Chris Packham Acked-by: Arnd Bergmann Link: https://lore.kernel.org/r/20200414221222.23996-1-chris.packham@alliedtelesis.co.nz Signed-off-by: Jonathan Corbet --- Documentation/core-api/timekeeping.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/Documentation/core-api/timekeeping.rst b/Documentation/core-api/timekeeping.rst index c0ffa30c7c37..729e24864fe7 100644 --- a/Documentation/core-api/timekeeping.rst +++ b/Documentation/core-api/timekeeping.rst @@ -154,9 +154,9 @@ architectures. These are the recommended replacements: Use ktime_get() or ktime_get_ts64() instead. -.. c:function:: struct timeval do_gettimeofday( void ) - struct timespec getnstimeofday( void ) - struct timespec64 getnstimeofday64( void ) +.. c:function:: void do_gettimeofday( struct timeval * ) + void getnstimeofday( struct timespec * ) + void getnstimeofday64( struct timespec64 * ) void ktime_get_real_ts( struct timespec * ) ktime_get_real_ts64() is a direct replacement, but consider using From 52338dfb3ca11e6a99288e9e9e4019f279822ddd Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Tue, 14 Apr 2020 10:24:30 -0700 Subject: [PATCH 241/331] docs: admin-guide: merge sections for the kernel.modprobe sysctl Documentation for the kernel.modprobe sysctl was added both by commit 0317c5371e6a ("docs: merge debugging-modules.txt into sysctl/kernel.rst") and by commit 6e7158250625 ("docs: admin-guide: document the kernel.modprobe sysctl"), resulting in the same sysctl being documented in two places. Merge these into one place. 
Signed-off-by: Eric Biggers Reviewed-by: Stephen Kitt Link: https://lore.kernel.org/r/20200414172430.230293-1-ebiggers@kernel.org Signed-off-by: Jonathan Corbet --- Documentation/admin-guide/sysctl/kernel.rst | 47 +++++++++------------ 1 file changed, 19 insertions(+), 28 deletions(-) diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst index 39c95c0e13d3..0d427fd10941 100644 --- a/Documentation/admin-guide/sysctl/kernel.rst +++ b/Documentation/admin-guide/sysctl/kernel.rst @@ -390,9 +390,17 @@ When ``kptr_restrict`` is set to 2, kernel pointers printed using modprobe ======== -This gives the full path of the modprobe command which the kernel will -use to load modules. This can be used to debug module loading -requests:: +The full path to the usermode helper for autoloading kernel modules, +by default "/sbin/modprobe". This binary is executed when the kernel +requests a module. For example, if userspace passes an unknown +filesystem type to mount(), then the kernel will automatically request +the corresponding filesystem module by executing this usermode helper. +This usermode helper should insert the needed module into the kernel. + +This sysctl only affects module autoloading. It has no effect on the +ability to explicitly insert modules. + +This sysctl can be used to debug module loading requests:: echo '#! /bin/sh' > /tmp/modprobe echo 'echo "$@" >> /tmp/modprobe.log' >> /tmp/modprobe @@ -400,10 +408,15 @@ requests:: chmod a+x /tmp/modprobe echo /tmp/modprobe > /proc/sys/kernel/modprobe -This only applies when the *kernel* is requesting that the module be -loaded; it won't have any effect if the module is being loaded -explicitly using ``modprobe`` from userspace. +Alternatively, if this sysctl is set to the empty string, then module +autoloading is completely disabled. The kernel will not try to +execute a usermode helper at all, nor will it call the +kernel_module_request LSM hook. 
+If CONFIG_STATIC_USERMODEHELPER=y is set in the kernel configuration, +then the configured static usermode helper overrides this sysctl, +except that the empty string is still accepted to completely disable +module autoloading as described above. modules_disabled ================ @@ -446,28 +459,6 @@ Notes: successful IPC object allocation. If an IPC object allocation syscall fails, it is undefined if the value remains unmodified or is reset to -1. -modprobe: -========= - -The path to the usermode helper for autoloading kernel modules, by -default "/sbin/modprobe". This binary is executed when the kernel -requests a module. For example, if userspace passes an unknown -filesystem type to mount(), then the kernel will automatically request -the corresponding filesystem module by executing this usermode helper. -This usermode helper should insert the needed module into the kernel. - -This sysctl only affects module autoloading. It has no effect on the -ability to explicitly insert modules. - -If this sysctl is set to the empty string, then module autoloading is -completely disabled. The kernel will not try to execute a usermode -helper at all, nor will it call the kernel_module_request LSM hook. - -If CONFIG_STATIC_USERMODEHELPER=y is set in the kernel configuration, -then the configured static usermode helper overrides this sysctl, -except that the empty string is still accepted to completely disable -module autoloading as described above. - nmi_watchdog ============ From e8f4ba833166d4f589ab80630709e807a09f0b43 Mon Sep 17 00:00:00 2001 From: Peter Maydell Date: Tue, 14 Apr 2020 15:37:43 +0100 Subject: [PATCH 242/331] scripts/kernel-doc: Add missing close-paren in c:function directives When kernel-doc generates a 'c:function' directive for a function one of whose arguments is a function pointer, it fails to print the close-paren after the argument list of the function pointer argument. 
For instance: long work_on_cpu(int cpu, long (*fn) (void *, void * arg) in driver-api/basics.html is missing a ')' separating the "void *" of the 'fn' arguments from the ", void * arg" which is an argument to work_on_cpu(). Add the missing close-paren, so that we render the prototype correctly: long work_on_cpu(int cpu, long (*fn)(void *), void * arg) (Note that Sphinx stops rendering a space between the '(*fn)' and the '(void *)' once it gets something that's syntactically valid.) Signed-off-by: Peter Maydell Link: https://lore.kernel.org/r/20200414143743.32677-1-peter.maydell@linaro.org Signed-off-by: Jonathan Corbet --- scripts/kernel-doc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/scripts/kernel-doc b/scripts/kernel-doc index f2d73f04e71d..f746ca8fa403 100755 --- a/scripts/kernel-doc +++ b/scripts/kernel-doc @@ -853,7 +853,7 @@ sub output_function_rst(%) { if ($type =~ m/([^\(]*\(\*)\s*\)\s*\(([^\)]*)\)/) { # pointer-to-function - print $1 . $parameter . ") (" . $2; + print $1 . $parameter . ") (" . $2 . ")"; } else { print $type . " " . $parameter; } From d98dbbe0d331b1a6dc1ca0b948c99d58cdba580c Mon Sep 17 00:00:00 2001 From: Tiezhu Yang Date: Tue, 14 Apr 2020 17:41:48 +0800 Subject: [PATCH 243/331] scripts: documentation-file-ref-check: Add line break before exit If ./scripts/documentation-file-ref-check is executed in a directory that is not a git tree, it exits without printing a line break. Fix it.
Without this patch: [loongson@localhost linux-5.7-rc1]$ ./scripts/documentation-file-ref-check Warning: can't check if file exists, as this is not a git tree[loongson@localhost linux-5.7-rc1]$ With this patch: [loongson@localhost linux-5.7-rc1]$ ./scripts/documentation-file-ref-check Warning: can't check if file exists, as this is not a git tree [loongson@localhost linux-5.7-rc1]$ Signed-off-by: Tiezhu Yang Reviewed-by: Mauro Carvalho Chehab Link: https://lore.kernel.org/r/1586857308-2040-1-git-send-email-yangtiezhu@loongson.cn Signed-off-by: Jonathan Corbet --- scripts/documentation-file-ref-check | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/scripts/documentation-file-ref-check b/scripts/documentation-file-ref-check index 9a8cc10cffd0..c71832b2312b 100755 --- a/scripts/documentation-file-ref-check +++ b/scripts/documentation-file-ref-check @@ -25,7 +25,7 @@ my $fix = 0; my $warn = 0; if (! -d ".git") { - printf "Warning: can't check if file exists, as this is not a git tree"; + printf "Warning: can't check if file exists, as this is not a git tree\n"; exit 0; } From af15f14c8cfcee515f4e9078889045ad63efefe3 Mon Sep 17 00:00:00 2001 From: Ondrej Mosnacek Date: Tue, 14 Apr 2020 16:23:51 +0200 Subject: [PATCH 244/331] selinux: free str on error in str_read() In [see "Fixes:"] I missed the fact that str_read() may give back an allocated pointer even if it returns an error, causing a potential memory leak in filename_trans_read_one(). Fix this by making the function free the allocated string whenever it returns a non-zero value, which also makes its behavior more obvious and prevents repeating the same mistake in the future. 
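The ownership rule described above can be sketched in plain userspace C. This is only an illustration under assumptions: struct src, src_read() and read_str() are hypothetical stand-ins, with malloc()/free() in place of the kernel allocator, and it is not the policydb code itself. The output pointer is written only on success, so a caller that sees an error has nothing to free:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical input source standing in for the policy file being parsed. */
struct src {
	const char *data;
	size_t left;
};

/* Copy len bytes out of the source; fails if fewer than len bytes remain. */
static int src_read(struct src *s, void *buf, size_t len)
{
	if (len > s->left)
		return -EINVAL;
	memcpy(buf, s->data, len);
	s->data += len;
	s->left -= len;
	return 0;
}

/* Read a len-byte string; *strp is only written when 0 is returned. */
static int read_str(char **strp, struct src *s, size_t len)
{
	char *str;
	int rc;

	str = malloc(len + 1);
	if (!str)
		return -ENOMEM;

	rc = src_read(s, str, len);
	if (rc) {
		free(str);	/* error path: release here, not in the caller */
		return rc;
	}

	str[len] = '\0';
	*strp = str;		/* hand over ownership only on success */
	return 0;
}
```

With this shape a caller can simply return on a non-zero result, because the function never leaves a half-initialized allocation behind in the output pointer.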
Reported-by: coverity-bot Addresses-Coverity-ID: 1461665 ("Resource leaks") Fixes: c3a276111ea2 ("selinux: optimize storage of filename transitions") Signed-off-by: Ondrej Mosnacek Reviewed-by: Kees Cook Signed-off-by: Paul Moore --- security/selinux/ss/policydb.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/security/selinux/ss/policydb.c b/security/selinux/ss/policydb.c index 70ecdc78efbd..c21b922e5ebe 100644 --- a/security/selinux/ss/policydb.c +++ b/security/selinux/ss/policydb.c @@ -1035,14 +1035,14 @@ static int str_read(char **strp, gfp_t flags, void *fp, u32 len) if (!str) return -ENOMEM; - /* it's expected the caller should free the str */ - *strp = str; - rc = next_entry(str, fp, len); - if (rc) + if (rc) { + kfree(str); return rc; + } str[len] = '\0'; + *strp = str; return 0; } From c3a2079828fae3ecc9c81e0751d19cbc678471f7 Mon Sep 17 00:00:00 2001 From: Rob Herring Date: Wed, 15 Apr 2020 12:57:04 -0500 Subject: [PATCH 245/331] dt-bindings: pwm: Fix cros-ec-pwm example dtc 'reg' warning MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The example for the CrOS EC PWM is incomplete and now generates a dtc warning: Documentation/devicetree/bindings/pwm/google,cros-ec-pwm.example.dts:17.11-23.11: Warning (unit_address_vs_reg): /example-0/cros-ec@0: node has a unit name, but no reg or ranges property Fixing this results in more warnings as a parent spi node is needed as well. 
Cc: Thierry Reding Cc: Benson Leung Cc: Enric Balletbo i Serra Cc: Guenter Roeck Cc: linux-pwm@vger.kernel.org Acked-by: Uwe Kleine-König Signed-off-by: Rob Herring --- .../bindings/pwm/google,cros-ec-pwm.yaml | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/Documentation/devicetree/bindings/pwm/google,cros-ec-pwm.yaml b/Documentation/devicetree/bindings/pwm/google,cros-ec-pwm.yaml index 24c217b76580..41ece1d85315 100644 --- a/Documentation/devicetree/bindings/pwm/google,cros-ec-pwm.yaml +++ b/Documentation/devicetree/bindings/pwm/google,cros-ec-pwm.yaml @@ -31,10 +31,17 @@ additionalProperties: false examples: - | - cros-ec@0 { - compatible = "google,cros-ec-spi"; - cros_ec_pwm: ec-pwm { - compatible = "google,cros-ec-pwm"; - #pwm-cells = <1>; + spi { + #address-cells = <1>; + #size-cells = <0>; + + cros-ec@0 { + compatible = "google,cros-ec-spi"; + reg = <0>; + + cros_ec_pwm: ec-pwm { + compatible = "google,cros-ec-pwm"; + #pwm-cells = <1>; + }; }; }; From 672e24772aeb45293c86f6176520d98b19cd48a1 Mon Sep 17 00:00:00 2001 From: Colin Ian King Date: Thu, 16 Apr 2020 00:16:30 +0100 Subject: [PATCH 246/331] ipv6: remove redundant assignment to variable err The variable err is being initialized with a value that is never read and it is being updated later with a new value. The initialization is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King Signed-off-by: David S. 
Miller --- net/ipv6/seg6.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv6/seg6.c b/net/ipv6/seg6.c index 75421a472d25..4c7e0a27fa9c 100644 --- a/net/ipv6/seg6.c +++ b/net/ipv6/seg6.c @@ -434,7 +434,7 @@ static struct genl_family seg6_genl_family __ro_after_init = { int __init seg6_init(void) { - int err = -ENOMEM; + int err; err = genl_register_family(&seg6_genl_family); if (err) From c8322754642052b3580db8bc3c33fd671a41cdd6 Mon Sep 17 00:00:00 2001 From: Johan Jonker Date: Wed, 15 Apr 2020 22:01:49 +0200 Subject: [PATCH 247/331] dt-bindings: net: ethernet-phy: add desciption for ethernet-phy-id1234.d400 The description below is already in use in 'rk3228-evb.dts', 'rk3229-xms6.dts' and 'rk3328.dtsi' but somehow never added to a document, so add "ethernet-phy-id1234.d400", "ethernet-phy-ieee802.3-c22" for ethernet-phy nodes on Rockchip platforms to 'ethernet-phy.yaml'. Signed-off-by: Johan Jonker Acked-by: Florian Fainelli Signed-off-by: David S. Miller --- Documentation/devicetree/bindings/net/ethernet-phy.yaml | 3 +++ 1 file changed, 3 insertions(+) diff --git a/Documentation/devicetree/bindings/net/ethernet-phy.yaml b/Documentation/devicetree/bindings/net/ethernet-phy.yaml index 8927941c74bb..5aa141ccc113 100644 --- a/Documentation/devicetree/bindings/net/ethernet-phy.yaml +++ b/Documentation/devicetree/bindings/net/ethernet-phy.yaml @@ -43,6 +43,9 @@ properties: second group of digits is the Phy Identifier 2 register, this is the chip vendor OUI bits 19:24, followed by 10 bits of a vendor specific ID. 
+ - items: + - pattern: "^ethernet-phy-id[a-f0-9]{4}\\.[a-f0-9]{4}$" + - const: ethernet-phy-ieee802.3-c22 - items: - pattern: "^ethernet-phy-id[a-f0-9]{4}\\.[a-f0-9]{4}$" - const: ethernet-phy-ieee802.3-c45 From ae5a44bb970ad8d0f7382cf3fc9738787e3cf19f Mon Sep 17 00:00:00 2001 From: Jason Yan Date: Wed, 15 Apr 2020 16:42:48 +0800 Subject: [PATCH 248/331] net: tulip: make early_486_chipsets static Fix the following sparse warning: drivers/net/ethernet/dec/tulip/tulip_core.c:1280:28: warning: symbol 'early_486_chipsets' was not declared. Should it be static? Reported-by: Hulk Robot Signed-off-by: Jason Yan Signed-off-by: David S. Miller --- drivers/net/ethernet/dec/tulip/tulip_core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/dec/tulip/tulip_core.c b/drivers/net/ethernet/dec/tulip/tulip_core.c index 48ea658aa1a6..15efc294f513 100644 --- a/drivers/net/ethernet/dec/tulip/tulip_core.c +++ b/drivers/net/ethernet/dec/tulip/tulip_core.c @@ -1277,7 +1277,7 @@ static const struct net_device_ops tulip_netdev_ops = { #endif }; -const struct pci_device_id early_486_chipsets[] = { +static const struct pci_device_id early_486_chipsets[] = { { PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82424) }, { PCI_DEVICE(PCI_VENDOR_ID_SI, PCI_DEVICE_ID_SI_496) }, { }, From 5309960e49f5e2363d2814488878a29e944e1be9 Mon Sep 17 00:00:00 2001 From: Cambda Zhu Date: Wed, 15 Apr 2020 17:54:04 +0800 Subject: [PATCH 249/331] Documentation: Fix tcp_challenge_ack_limit default value The default value of tcp_challenge_ack_limit has been changed from 100 to 1000 and this patch fixes its documentation. Signed-off-by: Cambda Zhu Signed-off-by: David S. 
Miller --- Documentation/networking/ip-sysctl.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index ee961d322d93..6fcfd313dbe4 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -812,7 +812,7 @@ tcp_limit_output_bytes - INTEGER tcp_challenge_ack_limit - INTEGER Limits number of Challenge ACK sent per second, as recommended in RFC 5961 (Improving TCP's Robustness to Blind In-Window Attacks) - Default: 100 + Default: 1000 tcp_rx_skb_cache - BOOLEAN Controls a per TCP socket cache of one skb, that might help From edadedf1c5b4e4404192a0a4c3c0c05e3b7672ab Mon Sep 17 00:00:00 2001 From: Tuong Lien Date: Wed, 15 Apr 2020 18:34:49 +0700 Subject: [PATCH 250/331] tipc: fix incorrect increasing of link window In commit 16ad3f4022bb ("tipc: introduce variable window congestion control"), we allow link window to change with the congestion avoidance algorithm. However, there is a bug that during the slow-start if packet retransmission occurs, the link will enter the fast-recovery phase, set its window to the 'ssthresh' which is never less than 300, so the link window suddenly increases to that limit instead of decreasing. Consequently, two issues have been observed: - For broadcast-link: it can leave a gap between the link queues that a new packet will be inserted and sent before the previous ones, i.e. not in-order. - For unicast: the algorithm does not work as expected, the link window jumps to the slow-start threshold whereas packet retransmission occurs. This commit fixes the issues by avoiding such the link window increase, but still decreasing if the 'ssthresh' is lowered. Fixes: 16ad3f4022bb ("tipc: introduce variable window congestion control") Acked-by: Jon Maloy Signed-off-by: Tuong Lien Signed-off-by: David S. 
Miller --- net/tipc/link.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/tipc/link.c b/net/tipc/link.c index 467c53a1fb5c..d4675e922a8f 100644 --- a/net/tipc/link.c +++ b/net/tipc/link.c @@ -1065,7 +1065,7 @@ static void tipc_link_update_cwin(struct tipc_link *l, int released, /* Enter fast recovery */ if (unlikely(retransmitted)) { l->ssthresh = max_t(u16, l->window / 2, 300); - l->window = l->ssthresh; + l->window = min_t(u16, l->ssthresh, l->window); return; } /* Enter slow start */ From f560cda91bd59a872fe0e3217b74c3f33c131b50 Mon Sep 17 00:00:00 2001 From: Ronnie Sahlberg Date: Sun, 12 Apr 2020 16:09:26 +1000 Subject: [PATCH 251/331] cifs: dump the session id and keys also for SMB2 sessions We already dump these keys for SMB3, lets also dump it for SMB2 sessions so that we can use the session key in wireshark to check and validate that the signatures are correct. Signed-off-by: Ronnie Sahlberg Signed-off-by: Steve French Reviewed-by: Aurelien Aptel --- fs/cifs/smb2pdu.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c index 47d3e382ecaa..b30aa3cdd845 100644 --- a/fs/cifs/smb2pdu.c +++ b/fs/cifs/smb2pdu.c @@ -1552,6 +1552,21 @@ SMB2_sess_auth_rawntlmssp_authenticate(struct SMB2_sess_data *sess_data) } rc = SMB2_sess_establish_session(sess_data); +#ifdef CONFIG_CIFS_DEBUG_DUMP_KEYS + if (ses->server->dialect < SMB30_PROT_ID) { + cifs_dbg(VFS, "%s: dumping generated SMB2 session keys\n", __func__); + /* + * The session id is opaque in terms of endianness, so we can't + * print it as a long long. 
we dump it as we got it on the wire + */ + cifs_dbg(VFS, "Session Id %*ph\n", (int)sizeof(ses->Suid), + &ses->Suid); + cifs_dbg(VFS, "Session Key %*ph\n", + SMB2_NTLMV2_SESSKEY_SIZE, ses->auth_key.response); + cifs_dbg(VFS, "Signing Key %*ph\n", + SMB3_SIGN_KEY_SIZE, ses->auth_key.response); + } +#endif out: kfree(ntlmssp_blob); SMB2_sess_free_buffer(sess_data); From 1f641d9410c3c4edd4ce9136bd2dbe0c00af9770 Mon Sep 17 00:00:00 2001 From: Jones Syue Date: Mon, 13 Apr 2020 09:37:23 +0800 Subject: [PATCH 252/331] cifs: improve read performance for page size 64KB & cache=strict & vers=2.1+ MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Found a read performance issue when linux kernel page size is 64KB. If linux kernel page size is 64KB and mount options cache=strict & vers=2.1+, it does not support cifs_readpages(). Instead, it is using cifs_readpage() and cifs_read() with maximum read IO size 16KB, which is much slower than read IO size 1MB when negotiated SMB 2.1+. Since modern SMB server supported SMB 2.1+ and Max Read Size can reach more than 64KB (for example 1MB ~ 8MB), this patch check max_read instead of maxBuf to determine whether server support readpages() and improve read performance for page size 64KB & cache=strict & vers=2.1+, and for SMB1 it is more cleaner to initialize server->max_read to server->maxBuf. The client is a linux box with linux kernel 4.2.8, page size 64KB (CONFIG_ARM64_64K_PAGES=y), cpu arm 1.7GHz, and use mount.cifs as smb client. The server is another linux box with linux kernel 4.2.8, share a file '10G.img' with size 10GB, and use samba-4.7.12 as smb server. 
The client mounts a share from the server with two different cache options, cache=strict and cache=none: mount -tcifs ///Public /cache_strict -overs=3.0,cache=strict,username=,password= mount -tcifs ///Public /cache_none -overs=3.0,cache=none,username=,password= The client then reads the 10GB file from the server across a 1GbE network: dd if=/cache_strict/10G.img of=/dev/null bs=1M count=10240 dd if=/cache_none/10G.img of=/dev/null bs=1M count=10240 Without the patch, cache=strict shows lower read throughput and a smaller read IO size than cache=none: cache=strict (without patch): read throughput 40MB/s, read IO size is 16KB cache=strict (with patch): read throughput 113MB/s, read IO size is 1MB cache=none: read throughput 109MB/s, read IO size is 1MB It looks like, if the page size is 64KB, cifs_set_ops() uses cifs_addr_ops_smallbuf instead of cifs_addr_ops: /* check if server can support readpages */ if (cifs_sb_master_tcon(cifs_sb)->ses->server->maxBuf < PAGE_SIZE + MAX_CIFS_HDR_SIZE) inode->i_data.a_ops = &cifs_addr_ops_smallbuf; else inode->i_data.a_ops = &cifs_addr_ops; maxBuf comes from two places, SMB2_negotiate() and CIFSSMBNegotiate() (SMB2_MAX_BUFFER_SIZE is 64KB): SMB2_negotiate(): /* set it to the maximum buffer size value we can send with 1 credit */ server->maxBuf = min_t(unsigned int, le32_to_cpu(rsp->MaxTransactSize), SMB2_MAX_BUFFER_SIZE); CIFSSMBNegotiate(): server->maxBuf = le32_to_cpu(pSMBr->MaxBufferSize); With a 64KB page size and cache=strict, read_pages() therefore uses cifs_readpage() instead of cifs_readpages(), and cifs_read() then uses a maximum read IO size of 16KB, which is much slower than a maximum read IO size of 1MB.
(CIFSMaxBufSize is 16KB by default) /* FIXME: set up handlers for larger reads and/or convert to async */ rsize = min_t(unsigned int, cifs_sb->rsize, CIFSMaxBufSize); Reviewed-by: Pavel Shilovsky Signed-off-by: Jones Syue Signed-off-by: Steve French --- fs/cifs/cifssmb.c | 4 ++++ fs/cifs/inode.c | 2 +- 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c index 140efc1a9374..182b864b3075 100644 --- a/fs/cifs/cifssmb.c +++ b/fs/cifs/cifssmb.c @@ -594,6 +594,8 @@ decode_lanman_negprot_rsp(struct TCP_Server_Info *server, NEGOTIATE_RSP *pSMBr) cifs_max_pending); set_credits(server, server->maxReq); server->maxBuf = le16_to_cpu(rsp->MaxBufSize); + /* set up max_read for readpages check */ + server->max_read = server->maxBuf; /* even though we do not use raw we might as well set this accurately, in case we ever find a need for it */ if ((le16_to_cpu(rsp->RawMode) & RAW_ENABLE) == RAW_ENABLE) { @@ -755,6 +757,8 @@ CIFSSMBNegotiate(const unsigned int xid, struct cifs_ses *ses) set_credits(server, server->maxReq); /* probably no need to store and check maxvcs */ server->maxBuf = le32_to_cpu(pSMBr->MaxBufferSize); + /* set up max_read for readpages check */ + server->max_read = server->maxBuf; server->max_rw = le32_to_cpu(pSMBr->MaxRawSize); cifs_dbg(NOISY, "Max buf = %d\n", ses->server->maxBuf); server->capabilities = le32_to_cpu(pSMBr->Capabilities); diff --git a/fs/cifs/inode.c b/fs/cifs/inode.c index 8fbbdcdad8ff..390d2b15ef6e 100644 --- a/fs/cifs/inode.c +++ b/fs/cifs/inode.c @@ -61,7 +61,7 @@ static void cifs_set_ops(struct inode *inode) } /* check if server can support readpages */ - if (cifs_sb_master_tcon(cifs_sb)->ses->server->maxBuf < + if (cifs_sb_master_tcon(cifs_sb)->ses->server->max_read < PAGE_SIZE + MAX_CIFS_HDR_SIZE) inode->i_data.a_ops = &cifs_addr_ops_smallbuf; else From c2a559bc0e7ed5a715ad6b947025b33cb7c05ea7 Mon Sep 17 00:00:00 2001 From: yangerkun Date: Wed, 26 Feb 2020 12:10:02 +0800 Subject: [PATCH 
253/331] ext4: use matching invalidatepage in ext4_writepage Running generic/388 in journal data mode may sometimes trigger the warning in ext4_invalidatepage. Instead, we should use the matching invalidatepage in ext4_writepage. Signed-off-by: yangerkun Signed-off-by: Theodore Ts'o Reviewed-by: Ritesh Harjani Reviewed-by: Jan Kara Link: https://lore.kernel.org/r/20200226041002.13914-1-yangerkun@huawei.com Signed-off-by: Theodore Ts'o --- fs/ext4/inode.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index e416096fc081..68f6c0af8e5d 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -1973,7 +1973,7 @@ static int ext4_writepage(struct page *page, bool keep_towrite = false; if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb)))) { - ext4_invalidatepage(page, 0, PAGE_SIZE); + inode->i_mapping->a_ops->invalidatepage(page, 0, PAGE_SIZE); unlock_page(page); return -EIO; } From d87f639258a6a5980183f11876c884931ad93da2 Mon Sep 17 00:00:00 2001 From: Roman Gushchin Date: Fri, 28 Feb 2020 16:14:11 -0800 Subject: [PATCH 254/331] ext4: use non-movable memory for superblock readahead Since commit a8ac900b8163 ("ext4: use non-movable memory for the superblock") buffers for ext4 superblock were allocated using the sb_bread_unmovable() helper which allocated buffer heads out of non-movable memory blocks. This was necessary to avoid blocking page migrations and causing CMA allocation failures. However, commit 85c8f176a611 ("ext4: preload block group descriptors") broke this by introducing pre-reading of the ext4 superblock. The problem is that __breadahead() is using __getblk() underneath, which allocates buffer heads out of movable memory. This resulted in the page migration failures I've seen on a machine with an ext4 partition and a preallocated CMA area.
Fix this by introducing sb_breadahead_unmovable() and __breadahead_gfp() helpers which use non-movable memory for buffer head allocations and use them for the ext4 superblock readahead. Reviewed-by: Andreas Dilger Fixes: 85c8f176a611 ("ext4: preload block group descriptors") Signed-off-by: Roman Gushchin Link: https://lore.kernel.org/r/20200229001411.128010-1-guro@fb.com Signed-off-by: Theodore Ts'o --- fs/buffer.c | 11 +++++++++++ fs/ext4/inode.c | 2 +- fs/ext4/super.c | 2 +- include/linux/buffer_head.h | 8 ++++++++ 4 files changed, 21 insertions(+), 2 deletions(-) diff --git a/fs/buffer.c b/fs/buffer.c index f73276d746bb..599a0bf7257b 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -1371,6 +1371,17 @@ void __breadahead(struct block_device *bdev, sector_t block, unsigned size) } EXPORT_SYMBOL(__breadahead); +void __breadahead_gfp(struct block_device *bdev, sector_t block, unsigned size, + gfp_t gfp) +{ + struct buffer_head *bh = __getblk_gfp(bdev, block, size, gfp); + if (likely(bh)) { + ll_rw_block(REQ_OP_READ, REQ_RAHEAD, 1, &bh); + brelse(bh); + } +} +EXPORT_SYMBOL(__breadahead_gfp); + /** * __bread_gfp() - reads a specified block and returns the bh * @bdev: the block_device to read from diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 68f6c0af8e5d..2a4aae6acdcb 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -4364,7 +4364,7 @@ make_io: if (end > table) end = table; while (b <= end) - sb_breadahead(sb, b++); + sb_breadahead_unmovable(sb, b++); } /* diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 9728e7b0e84f..83413f0f1e28 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -4340,7 +4340,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent) /* Pre-read the descriptors into the buffer cache */ for (i = 0; i < db_count; i++) { block = descriptor_loc(sb, logical_sb_block, i); - sb_breadahead(sb, block); + sb_breadahead_unmovable(sb, block); } for (i = 0; i < db_count; i++) { diff --git a/include/linux/buffer_head.h 
b/include/linux/buffer_head.h index e0b020eaf32e..15b765a181b8 100644 --- a/include/linux/buffer_head.h +++ b/include/linux/buffer_head.h @@ -189,6 +189,8 @@ struct buffer_head *__getblk_gfp(struct block_device *bdev, sector_t block, void __brelse(struct buffer_head *); void __bforget(struct buffer_head *); void __breadahead(struct block_device *, sector_t block, unsigned int size); +void __breadahead_gfp(struct block_device *, sector_t block, unsigned int size, + gfp_t gfp); struct buffer_head *__bread_gfp(struct block_device *, sector_t block, unsigned size, gfp_t gfp); void invalidate_bh_lrus(void); @@ -319,6 +321,12 @@ sb_breadahead(struct super_block *sb, sector_t block) __breadahead(sb->s_bdev, block, sb->s_blocksize); } +static inline void +sb_breadahead_unmovable(struct super_block *sb, sector_t block) +{ + __breadahead_gfp(sb->s_bdev, block, sb->s_blocksize, 0); +} + static inline struct buffer_head * sb_getblk(struct super_block *sb, sector_t block) { From 9033783c8cfda0834cf384940162e2bf1e9a6db7 Mon Sep 17 00:00:00 2001 From: Josh Triplett Date: Sun, 29 Mar 2020 13:21:41 -0700 Subject: [PATCH 255/331] ext4: fix return-value types in several function comments The documentation comments for ext4_read_block_bitmap_nowait and ext4_read_inode_bitmap describe them as returning NULL on error, but they return an ERR_PTR on error; update the documentation to match. The documentation comment for ext4_wait_block_bitmap describes it as returning 1 on error, but it returns -errno on error; update the documentation to match. 
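The distinction matters to callers: a check against NULL silently accepts an ERR_PTR. The following is a rough user-space sketch of the convention, assuming simplified re-implementations of ERR_PTR(), PTR_ERR() and IS_ERR() modelled on include/linux/err.h; struct bitmap and read_bitmap() are hypothetical and are not the ext4 functions.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Simplified user-space versions of the kernel helpers (assumption). */
static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	/* Error codes occupy the top MAX_ERRNO values of the address space. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-in for a cached bitmap buffer. */
struct bitmap {
	int group;
};

/* Hypothetical reader: returns a valid pointer or an encoded errno. */
static struct bitmap *read_bitmap(struct bitmap *cache, int group, int ngroups)
{
	if (group < 0 || group >= ngroups)
		return ERR_PTR(-EIO);
	cache[group].group = group;
	return &cache[group];
}
```

A caller written against the old comment would test the result against NULL and then dereference an encoded error value; testing with IS_ERR() and propagating PTR_ERR() is the matching idiom.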
Signed-off-by: Josh Triplett Reviewed-by: Ritesh Harjani Link: https://lore.kernel.org/r/60a3f4996f4932c45515aaa6b75ca42f2a78ec9b.1585512514.git.josh@joshtriplett.org Signed-off-by: Theodore Ts'o --- fs/ext4/balloc.c | 4 ++-- fs/ext4/ialloc.c | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c index 0e0a4d6209c7..a32e5f7b5385 100644 --- a/fs/ext4/balloc.c +++ b/fs/ext4/balloc.c @@ -410,7 +410,7 @@ verified: * Read the bitmap for a given block_group,and validate the * bits for block/inode/inode tables are set in the bitmaps * - * Return buffer_head on success or NULL in case of failure. + * Return buffer_head on success or an ERR_PTR in case of failure. */ struct buffer_head * ext4_read_block_bitmap_nowait(struct super_block *sb, ext4_group_t block_group) @@ -502,7 +502,7 @@ out: return ERR_PTR(err); } -/* Returns 0 on success, 1 on error */ +/* Returns 0 on success, -errno on error */ int ext4_wait_block_bitmap(struct super_block *sb, ext4_group_t block_group, struct buffer_head *bh) { diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c index b420c9dc444d..9faaf32be5cc 100644 --- a/fs/ext4/ialloc.c +++ b/fs/ext4/ialloc.c @@ -113,7 +113,7 @@ verified: * Read the inode allocation bitmap for a given block_group, reading * into the specified slot in the superblock's bitmap cache. * - * Return buffer_head of bitmap on success or NULL. + * Return buffer_head of bitmap on success, or an ERR_PTR on error.
*/ static struct buffer_head * ext4_read_inode_bitmap(struct super_block *sb, ext4_group_t block_group) From 801674f34ecfed033b062a0f217506b93c8d5e8a Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Tue, 31 Mar 2020 12:50:16 +0200 Subject: [PATCH 256/331] ext4: do not zeroout extents beyond i_disksize We do not want to create initialized extents beyond end of file because for e2fsck it is impossible to distinguish them from a case of corrupted file size / extent tree and so it complains like: Inode 12, i_size is 147456, should be 163840. Fix? no Code in ext4_ext_convert_to_initialized() and ext4_split_convert_extents() try to make sure it does not create initialized extents beyond inode size however they check against inode->i_size which is wrong. They should instead check against EXT4_I(inode)->i_disksize which is the current inode size on disk. That's what e2fsck is going to see in case of crash before all dirty data is written. This bug manifests as generic/456 test failure (with recent enough fstests where fsx got fixed to properly pass FALLOC_KEEP_SIZE_FL flags to the kernel) when run with dioread_lock mount option. 
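As a worked example of the rounding involved, the helper below mirrors the eof_block expression in user space. It is only a sketch: the function name is hypothetical, and it assumes 4 KiB blocks (s_blocksize_bits = 12) and, purely for illustration, that the two sizes in the e2fsck message correspond to i_disksize (147456) and the in-memory i_size (163840).

```c
#include <stdint.h>

/*
 * Round a byte size up to whole blocks, in the same way as
 * (size + s_blocksize - 1) >> s_blocksize_bits.
 * Assumption for the example: blocksize_bits = 12, i.e. 4 KiB blocks.
 */
static uint64_t eof_block(uint64_t size, unsigned int blocksize_bits)
{
	uint64_t blocksize = (uint64_t)1 << blocksize_bits;

	return (size + blocksize - 1) >> blocksize_bits;
}
```

Under these assumptions eof_block(147456, 12) is 36 while eof_block(163840, 12) is 40, so a limit derived from i_size would admit four blocks that lie past the size recorded on disk.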
CC: stable@vger.kernel.org Fixes: 21ca087a3891 ("ext4: Do not zero out uninitialized extents beyond i_size") Reviewed-by: Lukas Czerner Signed-off-by: Jan Kara Signed-off-by: Theodore Ts'o Link: https://lore.kernel.org/r/20200331105016.8674-1-jack@suse.cz Signed-off-by: Theodore Ts'o --- fs/ext4/extents.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index 031752cfb6f7..f2b577b315a0 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -3374,8 +3374,8 @@ static int ext4_ext_convert_to_initialized(handle_t *handle, (unsigned long long)map->m_lblk, map_len); sbi = EXT4_SB(inode->i_sb); - eof_block = (inode->i_size + inode->i_sb->s_blocksize - 1) >> - inode->i_sb->s_blocksize_bits; + eof_block = (EXT4_I(inode)->i_disksize + inode->i_sb->s_blocksize - 1) + >> inode->i_sb->s_blocksize_bits; if (eof_block < map->m_lblk + map_len) eof_block = map->m_lblk + map_len; @@ -3627,8 +3627,8 @@ static int ext4_split_convert_extents(handle_t *handle, __func__, inode->i_ino, (unsigned long long)map->m_lblk, map->m_len); - eof_block = (inode->i_size + inode->i_sb->s_blocksize - 1) >> - inode->i_sb->s_blocksize_bits; + eof_block = (EXT4_I(inode)->i_disksize + inode->i_sb->s_blocksize - 1) + >> inode->i_sb->s_blocksize_bits; if (eof_block < map->m_lblk + map->m_len) eof_block = map->m_lblk + map->m_len; /* From 05ca87c149ae8078fb2a23adc6329eed5bb078fb Mon Sep 17 00:00:00 2001 From: Jason Yan Date: Thu, 2 Apr 2020 11:39:39 +0800 Subject: [PATCH 257/331] ext4: remove set but not used variable 'es' Fix the following gcc warning: fs/ext4/super.c:599:27: warning: variable 'es' set but not used [-Wunused-but-set-variable] struct ext4_super_block *es; ^~ Fixes: 2ea2fc775321 ("ext4: save all error info in save_error_info() and drop ext4_set_errno()") Reported-by: Hulk Robot Signed-off-by: Jason Yan Link: https://lore.kernel.org/r/20200402033939.25303-1-yanaijie@huawei.com Signed-off-by: Theodore Ts'o --- fs/ext4/super.c | 2 
-- 1 file changed, 2 deletions(-) diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 83413f0f1e28..bf5fcb477f66 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -596,7 +596,6 @@ void __ext4_error_file(struct file *file, const char *function, { va_list args; struct va_format vaf; - struct ext4_super_block *es; struct inode *inode = file_inode(file); char pathname[80], *path; @@ -604,7 +603,6 @@ void __ext4_error_file(struct file *file, const char *function, return; trace_ext4_error(inode->i_sb, function, line); - es = EXT4_SB(inode->i_sb)->s_es; if (ext4_error_ratelimit(inode->i_sb)) { path = file_path(file, pathname, sizeof(pathname)); if (IS_ERR(path)) From 648814111af26485762a22da0f4b3159f3f9632c Mon Sep 17 00:00:00 2001 From: Jason Yan Date: Thu, 2 Apr 2020 11:47:59 +0800 Subject: [PATCH 258/331] ext4: remove set but not used variable 'es' in ext4_jbd2.c Fix the following gcc warning: fs/ext4/ext4_jbd2.c:341:30: warning: variable 'es' set but not used [-Wunused-but-set-variable] struct ext4_super_block *es; ^~ Fixes: 2ea2fc775321 ("ext4: save all error info in save_error_info() and drop ext4_set_errno()") Reported-by: Hulk Robot Signed-off-by: Jason Yan Link: https://lore.kernel.org/r/20200402034759.29957-1-yanaijie@huawei.com Signed-off-by: Theodore Ts'o --- fs/ext4/ext4_jbd2.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c index 7f16e1af8d5c..0c76cdd44d90 100644 --- a/fs/ext4/ext4_jbd2.c +++ b/fs/ext4/ext4_jbd2.c @@ -338,9 +338,6 @@ int __ext4_handle_dirty_metadata(const char *where, unsigned int line, if (inode && inode_needs_sync(inode)) { sync_dirty_buffer(bh); if (buffer_req(bh) && !buffer_uptodate(bh)) { - struct ext4_super_block *es; - - es = EXT4_SB(inode->i_sb)->s_es; ext4_error_inode_err(inode, where, line, bh->b_blocknr, EIO, "IO error syncing itable block"); From a17a9d935dc4a50acefaf319d58030f1da7f115a Mon Sep 17 00:00:00 2001 From: Theodore Ts'o Date: Mon, 13 Apr 2020 22:30:52 -0400 Subject: 
[PATCH 259/331] ext4: increase wait time needed before reuse of deleted inode numbers Current wait times have proven to be too short to protect against inode reuses that lead to metadata inconsistencies. Now that we will retry the inode allocation if we can't find any recently deleted inodes, it's a lot safer to increase the recently deleted time from 5 seconds to a minute. Link: https://lore.kernel.org/r/20200414023925.273867-1-tytso@mit.edu Google-Bug-Id: 36602237 Signed-off-by: Theodore Ts'o --- fs/ext4/ialloc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c index 9faaf32be5cc..4b8c9a9bdf0c 100644 --- a/fs/ext4/ialloc.c +++ b/fs/ext4/ialloc.c @@ -662,7 +662,7 @@ static int find_group_other(struct super_block *sb, struct inode *parent, * block has been written back to disk. (Yes, these values are * somewhat arbitrary...) */ -#define RECENTCY_MIN 5 +#define RECENTCY_MIN 60 #define RECENTCY_DIRTY 300 static int recently_deleted(struct super_block *sb, ext4_group_t group, int ino) From 907ea529fc4c3296701d2bfc8b831dd2a8121a34 Mon Sep 17 00:00:00 2001 From: Theodore Ts'o Date: Mon, 13 Apr 2020 23:33:05 -0400 Subject: [PATCH 260/331] ext4: convert BUG_ON's to WARN_ON's in mballoc.c If the in-core buddy bitmap gets corrupted (or out of sync with the block bitmap), issue a WARN_ON and try to recover. In most cases this involves skipping trying to allocate out of a particular block group. We can end up declaring the file system corrupted, which is fair, since the file system probably should be checked before we proceed any further. 
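The general shape of such a conversion can be sketched in user space as follows. This is an illustration only: warn_on() and the WARN_ON() macro here are simplified stand-ins written for this example, not the kernel macro, and struct group_info and scan_group() are hypothetical.

```c
#include <stdio.h>

/* Simplified stand-in: report the condition and hand its value back. */
static int warn_on(int cond, const char *expr, const char *file, int line)
{
	if (cond)
		fprintf(stderr, "WARNING: %s at %s:%d\n", expr, file, line);
	return cond;
}

#define WARN_ON(cond) warn_on(!!(cond), #cond, __FILE__, __LINE__)

/* Hypothetical per-group summary with a possibly corrupted counter. */
struct group_info {
	int free;
};

/* Returns the number of free clusters considered, 0 if the group is skipped. */
static int scan_group(const struct group_info *grp)
{
	/* Warn and skip the group instead of halting the whole system. */
	if (WARN_ON(grp->free <= 0))
		return 0;

	return grp->free;
}
```

Because the macro evaluates to the tested condition, the check and the recovery path fit in a single if statement.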
Link: https://lore.kernel.org/r/20200414035649.293164-1-tytso@mit.edu Google-Bug-Id: 34811296 Google-Bug-Id: 34639169 Signed-off-by: Theodore Ts'o --- fs/ext4/mballoc.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index 87c85be4c12e..30d5d97548c4 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -1943,7 +1943,8 @@ void ext4_mb_complex_scan_group(struct ext4_allocation_context *ac, int free; free = e4b->bd_info->bb_free; - BUG_ON(free <= 0); + if (WARN_ON(free <= 0)) + return; i = e4b->bd_info->bb_first_free; @@ -1966,7 +1967,8 @@ void ext4_mb_complex_scan_group(struct ext4_allocation_context *ac, } mb_find_extent(e4b, i, ac->ac_g_ex.fe_len, &ex); - BUG_ON(ex.fe_len <= 0); + if (WARN_ON(ex.fe_len <= 0)) + break; if (free < ex.fe_len) { ext4_grp_locked_error(sb, e4b->bd_group, 0, 0, "%d free clusters as per " From 4fa3b1c417377c352208ee9f487e17cfcee32348 Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Wed, 15 Apr 2020 12:37:27 -0500 Subject: [PATCH 261/331] proc: Handle umounts cleanly syzbot writes: > KASAN: use-after-free Read in dput (2) > > proc_fill_super: allocate dentry failed > ================================================================== > BUG: KASAN: use-after-free in fast_dput fs/dcache.c:727 [inline] > BUG: KASAN: use-after-free in dput+0x53e/0xdf0 fs/dcache.c:846 > Read of size 4 at addr ffff88808a618cf0 by task syz-executor.0/8426 > > CPU: 0 PID: 8426 Comm: syz-executor.0 Not tainted 5.6.0-next-20200412-syzkaller #0 > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 > Call Trace: > __dump_stack lib/dump_stack.c:77 [inline] > dump_stack+0x188/0x20d lib/dump_stack.c:118 > print_address_description.constprop.0.cold+0xd3/0x315 mm/kasan/report.c:382 > __kasan_report.cold+0x35/0x4d mm/kasan/report.c:511 > kasan_report+0x33/0x50 mm/kasan/common.c:625 > fast_dput fs/dcache.c:727 [inline] > dput+0x53e/0xdf0 fs/dcache.c:846 > 
proc_kill_sb+0x73/0xf0 fs/proc/root.c:195 > deactivate_locked_super+0x8c/0xf0 fs/super.c:335 > vfs_get_super+0x258/0x2d0 fs/super.c:1212 > vfs_get_tree+0x89/0x2f0 fs/super.c:1547 > do_new_mount fs/namespace.c:2813 [inline] > do_mount+0x1306/0x1b30 fs/namespace.c:3138 > __do_sys_mount fs/namespace.c:3347 [inline] > __se_sys_mount fs/namespace.c:3324 [inline] > __x64_sys_mount+0x18f/0x230 fs/namespace.c:3324 > do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295 > entry_SYSCALL_64_after_hwframe+0x49/0xb3 > RIP: 0033:0x45c889 > Code: ad b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 > RSP: 002b:00007ffc1930ec48 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5 > RAX: ffffffffffffffda RBX: 0000000001324914 RCX: 000000000045c889 > RDX: 0000000020000140 RSI: 0000000020000040 RDI: 0000000000000000 > RBP: 000000000076bf00 R08: 0000000000000000 R09: 0000000000000000 > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003 > R13: 0000000000000749 R14: 00000000004ca15a R15: 0000000000000013 Looking at the code, now that the internal mount of proc is no longer used, it is possible to unmount proc. If proc is unmounted, the fields of the pid namespace that were used for filesystem-specific state are not reinitialized, which means that proc_self and proc_thread_self can be pointers to already freed dentries. The reported use after free appears to come from mounting and unmounting proc, followed by mounting proc again and using error injection to cause the new root dentry allocation to fail. This in turn results in proc_kill_sb running with proc_self and proc_thread_self still retaining their values from the previous mount of proc. Calling dput on either proc_self or proc_thread_self then results in a double put, which KASAN sees as a use after free.
Solve this by always reinitializing the filesystem state stored in the struct pid_namespace, when proc is unmounted. Reported-by: syzbot+72868dd424eb66c6b95f@syzkaller.appspotmail.com Acked-by: Christian Brauner Fixes: 69879c01a0c3 ("proc: Remove the now unnecessary internal mount of proc") Signed-off-by: "Eric W. Biederman" --- fs/proc/root.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/fs/proc/root.c b/fs/proc/root.c index 2633f10446c3..cdbe9293ea55 100644 --- a/fs/proc/root.c +++ b/fs/proc/root.c @@ -196,6 +196,13 @@ static void proc_kill_sb(struct super_block *sb) if (ns->proc_thread_self) dput(ns->proc_thread_self); kill_anon_super(sb); + + /* Make the pid namespace safe for the next mount of proc */ + ns->proc_self = NULL; + ns->proc_thread_self = NULL; + ns->pid_gid = GLOBAL_ROOT_GID; + ns->hide_pid = 0; + put_pid_ns(ns); } From 92f673a12d14b5393138d2b1cfeb41d72b47362d Mon Sep 17 00:00:00 2001 From: Ben Skeggs Date: Thu, 16 Apr 2020 15:26:01 +1000 Subject: [PATCH 262/331] drm/nouveau/sec2/gv100-: add missing MODULE_FIRMWARE() ASB was failing to load on Turing GPUs when firmware is being loaded from initramfs, leaving the GPU in an odd state and causing suspend/ resume to fail. Add missing MODULE_FIRMWARE() lines for initramfs generators. 
Signed-off-by: Ben Skeggs Cc: # 5.6 --- drivers/gpu/drm/nouveau/nvkm/engine/sec2/gp108.c | 3 +++ drivers/gpu/drm/nouveau/nvkm/engine/sec2/tu102.c | 16 ++++++++++++++++ 2 files changed, 19 insertions(+) diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/sec2/gp108.c b/drivers/gpu/drm/nouveau/nvkm/engine/sec2/gp108.c index 232a9d7c51e5..e770c9497871 100644 --- a/drivers/gpu/drm/nouveau/nvkm/engine/sec2/gp108.c +++ b/drivers/gpu/drm/nouveau/nvkm/engine/sec2/gp108.c @@ -25,6 +25,9 @@ MODULE_FIRMWARE("nvidia/gp108/sec2/desc.bin"); MODULE_FIRMWARE("nvidia/gp108/sec2/image.bin"); MODULE_FIRMWARE("nvidia/gp108/sec2/sig.bin"); +MODULE_FIRMWARE("nvidia/gv100/sec2/desc.bin"); +MODULE_FIRMWARE("nvidia/gv100/sec2/image.bin"); +MODULE_FIRMWARE("nvidia/gv100/sec2/sig.bin"); static const struct nvkm_sec2_fwif gp108_sec2_fwif[] = { diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/sec2/tu102.c b/drivers/gpu/drm/nouveau/nvkm/engine/sec2/tu102.c index b6ebd95c9ba1..a8295653ceab 100644 --- a/drivers/gpu/drm/nouveau/nvkm/engine/sec2/tu102.c +++ b/drivers/gpu/drm/nouveau/nvkm/engine/sec2/tu102.c @@ -56,6 +56,22 @@ tu102_sec2_nofw(struct nvkm_sec2 *sec2, int ver, return 0; } +MODULE_FIRMWARE("nvidia/tu102/sec2/desc.bin"); +MODULE_FIRMWARE("nvidia/tu102/sec2/image.bin"); +MODULE_FIRMWARE("nvidia/tu102/sec2/sig.bin"); +MODULE_FIRMWARE("nvidia/tu104/sec2/desc.bin"); +MODULE_FIRMWARE("nvidia/tu104/sec2/image.bin"); +MODULE_FIRMWARE("nvidia/tu104/sec2/sig.bin"); +MODULE_FIRMWARE("nvidia/tu106/sec2/desc.bin"); +MODULE_FIRMWARE("nvidia/tu106/sec2/image.bin"); +MODULE_FIRMWARE("nvidia/tu106/sec2/sig.bin"); +MODULE_FIRMWARE("nvidia/tu116/sec2/desc.bin"); +MODULE_FIRMWARE("nvidia/tu116/sec2/image.bin"); +MODULE_FIRMWARE("nvidia/tu116/sec2/sig.bin"); +MODULE_FIRMWARE("nvidia/tu117/sec2/desc.bin"); +MODULE_FIRMWARE("nvidia/tu117/sec2/image.bin"); +MODULE_FIRMWARE("nvidia/tu117/sec2/sig.bin"); + static const struct nvkm_sec2_fwif tu102_sec2_fwif[] = { { 0, gp102_sec2_load, &tu102_sec2, &gp102_sec2_acr_1 
}, From 96806229ca033f85310bc5c203410189f8a1d2ee Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Fri, 10 Apr 2020 11:13:26 +0100 Subject: [PATCH 263/331] irqchip/gic-v4.1: Add support for VPENDBASER's Dirty+Valid signaling When a vPE is made resident, the GIC starts parsing the virtual pending table to deliver pending interrupts. This takes place asynchronously, and can at times take a long while. Long enough that the vcpu enters the guest and hits WFI before any interrupt has been signaled yet. The vcpu then exits, blocks, and now gets a doorbell. Rinse, repeat. In order to avoid the above, an (optional on GICv4, mandatory on v4.1) feature allows the GIC to feed back to the hypervisor whether it is done parsing the VPT by clearing the GICR_VPENDBASER.Dirty bit. The hypervisor can then wait until the GIC is ready before actually running the vPE. Plug the detection code as well as polling on vPE schedule. While at it, tidy up the kernel message that displays the GICv4 optional features. Reviewed-by: Zenghui Yu Signed-off-by: Marc Zyngier --- drivers/irqchip/irq-gic-v3-its.c | 19 +++++++++++++++++++ drivers/irqchip/irq-gic-v3.c | 11 +++++++---- include/linux/irqchip/arm-gic-v3.h | 2 ++ 3 files changed, 28 insertions(+), 4 deletions(-) diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c index 54d142ccc63a..affd325cc3d4 100644 --- a/drivers/irqchip/irq-gic-v3-its.c +++ b/drivers/irqchip/irq-gic-v3-its.c @@ -14,6 +14,7 @@ #include #include #include +#include #include #include #include @@ -3672,6 +3673,20 @@ out: return IRQ_SET_MASK_OK_DONE; } +static void its_wait_vpt_parse_complete(void) +{ + void __iomem *vlpi_base = gic_data_rdist_vlpi_base(); + u64 val; + + if (!gic_rdists->has_vpend_valid_dirty) + return; + + WARN_ON_ONCE(readq_relaxed_poll_timeout(vlpi_base + GICR_VPENDBASER, + val, + !(val & GICR_VPENDBASER_Dirty), + 10, 500)); +} + static void its_vpe_schedule(struct its_vpe *vpe) { void __iomem *vlpi_base =
gic_data_rdist_vlpi_base(); @@ -3702,6 +3717,8 @@ static void its_vpe_schedule(struct its_vpe *vpe) val |= vpe->idai ? GICR_VPENDBASER_IDAI : 0; val |= GICR_VPENDBASER_Valid; gicr_write_vpendbaser(val, vlpi_base + GICR_VPENDBASER); + + its_wait_vpt_parse_complete(); } static void its_vpe_deschedule(struct its_vpe *vpe) @@ -3910,6 +3927,8 @@ static void its_vpe_4_1_schedule(struct its_vpe *vpe, val |= FIELD_PREP(GICR_VPENDBASER_4_1_VPEID, vpe->vpe_id); gicr_write_vpendbaser(val, vlpi_base + GICR_VPENDBASER); + + its_wait_vpt_parse_complete(); } static void its_vpe_4_1_deschedule(struct its_vpe *vpe, diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c index 9dbc81b6f62e..d7006ef18a0d 100644 --- a/drivers/irqchip/irq-gic-v3.c +++ b/drivers/irqchip/irq-gic-v3.c @@ -873,6 +873,7 @@ static int __gic_update_rdist_properties(struct redist_region *region, gic_data.rdists.has_rvpeid &= !!(typer & GICR_TYPER_RVPEID); gic_data.rdists.has_direct_lpi &= (!!(typer & GICR_TYPER_DirectLPIS) | gic_data.rdists.has_rvpeid); + gic_data.rdists.has_vpend_valid_dirty &= !!(typer & GICR_TYPER_DIRTY); /* Detect non-sensical configurations */ if (WARN_ON_ONCE(gic_data.rdists.has_rvpeid && !gic_data.rdists.has_vlpis)) { @@ -893,10 +894,11 @@ static void gic_update_rdist_properties(void) if (WARN_ON(gic_data.ppi_nr == UINT_MAX)) gic_data.ppi_nr = 0; pr_info("%d PPIs implemented\n", gic_data.ppi_nr); - pr_info("%sVLPI support, %sdirect LPI support, %sRVPEID support\n", - !gic_data.rdists.has_vlpis ? "no " : "", - !gic_data.rdists.has_direct_lpi ? "no " : "", - !gic_data.rdists.has_rvpeid ? "no " : ""); + if (gic_data.rdists.has_vlpis) + pr_info("GICv4 features: %s%s%s\n", + gic_data.rdists.has_direct_lpi ? "DirectLPI " : "", + gic_data.rdists.has_rvpeid ? "RVPEID " : "", + gic_data.rdists.has_vpend_valid_dirty ? 
"Valid+Dirty " : ""); } /* Check whether it's single security state view */ @@ -1620,6 +1622,7 @@ static int __init gic_init_bases(void __iomem *dist_base, gic_data.rdists.has_rvpeid = true; gic_data.rdists.has_vlpis = true; gic_data.rdists.has_direct_lpi = true; + gic_data.rdists.has_vpend_valid_dirty = true; if (WARN_ON(!gic_data.domain) || WARN_ON(!gic_data.rdists.rdist)) { err = -ENOMEM; diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h index 765d9b769b69..6c36b6cc3edf 100644 --- a/include/linux/irqchip/arm-gic-v3.h +++ b/include/linux/irqchip/arm-gic-v3.h @@ -243,6 +243,7 @@ #define GICR_TYPER_PLPIS (1U << 0) #define GICR_TYPER_VLPIS (1U << 1) +#define GICR_TYPER_DIRTY (1U << 2) #define GICR_TYPER_DirectLPIS (1U << 3) #define GICR_TYPER_LAST (1U << 4) #define GICR_TYPER_RVPEID (1U << 7) @@ -686,6 +687,7 @@ struct rdists { bool has_vlpis; bool has_rvpeid; bool has_direct_lpi; + bool has_vpend_valid_dirty; }; struct irq_domain; From 4b2dfe1e7799d0e20b55711dfcc45d2ad35ff46e Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Fri, 10 Apr 2020 12:11:39 +0100 Subject: [PATCH 264/331] irqchip/gic-v4.1: Update effective affinity of virtual SGIs Although the vSGIs are not directly visible to the host, they still get moved around by the CPU hotplug, for example. This results in the kernel moaning on the console, such as: genirq: irq_chip GICv4.1-sgi did not update eff. affinity mask of irq 38 Updating the effective affinity on set_affinity() fixes it. Reviewed-by: Zenghui Yu Signed-off-by: Marc Zyngier --- drivers/irqchip/irq-gic-v3-its.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c index affd325cc3d4..124251b0ccba 100644 --- a/drivers/irqchip/irq-gic-v3-its.c +++ b/drivers/irqchip/irq-gic-v3-its.c @@ -4054,6 +4054,7 @@ static int its_sgi_set_affinity(struct irq_data *d, * not on the host (since they can only be targetting a vPE). 
* Tell the kernel we've done whatever it asked for. */ + irq_data_update_effective_affinity(d, mask_val); return IRQ_SET_MASK_OK; } From 94d440d618467806009c8edc70b094d64e12ee5a Mon Sep 17 00:00:00 2001 From: Andrei Vagin Date: Sat, 11 Apr 2020 08:40:31 -0700 Subject: [PATCH 265/331] proc, time/namespace: Show clock symbolic names in /proc/pid/timens_offsets Michael Kerrisk suggested to replace numeric clock IDs with symbolic names. Now the content of these files looks like this: $ cat /proc/774/timens_offsets monotonic 864000 0 boottime 1728000 0 For setting offsets, both representations of clocks (numeric and symbolic) can be used. As for compatibility, it is acceptable to change things as long as userspace doesn't care. The format of timens_offsets files is very new and there are no userspace tools yet which rely on this format. But three projects crun, util-linux and criu rely on the interface of setting time offsets and this is why it's required to continue supporting the numeric clock IDs on write. Fixes: 04a8682a71be ("fs/proc: Introduce /proc/pid/timens_offsets") Suggested-by: Michael Kerrisk Signed-off-by: Andrei Vagin Signed-off-by: Thomas Gleixner Tested-by: Michael Kerrisk Acked-by: Michael Kerrisk Cc: Andrew Morton Cc: Eric W. 
Biederman Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20200411154031.642557-1-avagin@gmail.com --- fs/proc/base.c | 14 +++++++++++++- kernel/time/namespace.c | 15 ++++++++++++++- 2 files changed, 27 insertions(+), 2 deletions(-) diff --git a/fs/proc/base.c b/fs/proc/base.c index 6042b646ab27..572898dd16a0 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -1573,6 +1573,7 @@ static ssize_t timens_offsets_write(struct file *file, const char __user *buf, noffsets = 0; for (pos = kbuf; pos; pos = next_line) { struct proc_timens_offset *off = &offsets[noffsets]; + char clock[10]; int err; /* Find the end of line and ensure we don't look past it */ @@ -1584,10 +1585,21 @@ static ssize_t timens_offsets_write(struct file *file, const char __user *buf, next_line = NULL; } - err = sscanf(pos, "%u %lld %lu", &off->clockid, + err = sscanf(pos, "%9s %lld %lu", clock, &off->val.tv_sec, &off->val.tv_nsec); if (err != 3 || off->val.tv_nsec >= NSEC_PER_SEC) goto out; + + clock[sizeof(clock) - 1] = 0; + if (strcmp(clock, "monotonic") == 0 || + strcmp(clock, __stringify(CLOCK_MONOTONIC)) == 0) + off->clockid = CLOCK_MONOTONIC; + else if (strcmp(clock, "boottime") == 0 || + strcmp(clock, __stringify(CLOCK_BOOTTIME)) == 0) + off->clockid = CLOCK_BOOTTIME; + else + goto out; + noffsets++; if (noffsets == ARRAY_SIZE(offsets)) { if (next_line) diff --git a/kernel/time/namespace.c b/kernel/time/namespace.c index 3b30288793fe..53bce347cd50 100644 --- a/kernel/time/namespace.c +++ b/kernel/time/namespace.c @@ -338,7 +338,20 @@ static struct user_namespace *timens_owner(struct ns_common *ns) static void show_offset(struct seq_file *m, int clockid, struct timespec64 *ts) { - seq_printf(m, "%d %lld %ld\n", clockid, ts->tv_sec, ts->tv_nsec); + char *clock; + + switch (clockid) { + case CLOCK_BOOTTIME: + clock = "boottime"; + break; + case CLOCK_MONOTONIC: + clock = "monotonic"; + break; + default: + clock = "unknown"; + break; + } + 
seq_printf(m, "%-10s %10lld %9ld\n", clock, ts->tv_sec, ts->tv_nsec); } void proc_timens_show_offsets(struct task_struct *p, struct seq_file *m) From 5fe56de799ad03e92d794c7936bf363922b571df Mon Sep 17 00:00:00 2001 From: John Garry Date: Thu, 16 Apr 2020 19:18:51 +0800 Subject: [PATCH 266/331] blk-mq: Put driver tag in blk_mq_dispatch_rq_list() when no budget If in blk_mq_dispatch_rq_list() we find no budget, then we break out of the dispatch loop, but the request may keep the driver tag, evaluated in 'nxt' in the previous loop iteration. Fix by putting the driver tag for that request. Reviewed-by: Ming Lei Signed-off-by: John Garry Signed-off-by: Jens Axboe --- block/blk-mq.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/block/blk-mq.c b/block/blk-mq.c index 8e56884fd2e9..a7785df2c944 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -1222,8 +1222,10 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list, rq = list_first_entry(list, struct request, queuelist); hctx = rq->mq_hctx; - if (!got_budget && !blk_mq_get_dispatch_budget(hctx)) + if (!got_budget && !blk_mq_get_dispatch_budget(hctx)) { + blk_mq_put_driver_tag(rq); break; + } if (!blk_mq_get_driver_tag(rq)) { /* From f0f7a674d4df1510d8ca050a669e1420cf7d7fab Mon Sep 17 00:00:00 2001 From: "Darrick J. Wong" Date: Sun, 12 Apr 2020 13:11:10 -0700 Subject: [PATCH 267/331] xfs: move inode flush to the sync workqueue Move the inode dirty data flushing to a workqueue so that multiple threads can take advantage of a single thread's flushing work. The ratelimiting technique used in bdd4ee4 was not successful, because threads that skipped the inode flush scan due to ratelimiting would ENOSPC early, which caused occasional (but noticeable) changes in behavior and sporadic fstest regressions.
Therefore, make all the writer threads wait on a single inode flush, which eliminates both the stampeding hordes of flushers and the small window in which a write could fail with ENOSPC because it lost the ratelimit race after even another thread freed space. Fixes: c6425702f21e ("xfs: ratelimit inode flush on buffered write ENOSPC") Signed-off-by: Darrick J. Wong Reviewed-by: Brian Foster --- fs/xfs/xfs_mount.h | 6 +++++- fs/xfs/xfs_super.c | 40 ++++++++++++++++++++++------------------ 2 files changed, 27 insertions(+), 19 deletions(-) diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h index 50c43422fa17..b2e4598fdf7d 100644 --- a/fs/xfs/xfs_mount.h +++ b/fs/xfs/xfs_mount.h @@ -167,8 +167,12 @@ typedef struct xfs_mount { struct xfs_kobj m_error_meta_kobj; struct xfs_error_cfg m_error_cfg[XFS_ERR_CLASS_MAX][XFS_ERR_ERRNO_MAX]; struct xstats m_stats; /* per-fs stats */ - struct ratelimit_state m_flush_inodes_ratelimit; + /* + * Workqueue item so that we can coalesce multiple inode flush attempts + * into a single flush. + */ + struct work_struct m_flush_inodes_work; struct workqueue_struct *m_buf_workqueue; struct workqueue_struct *m_unwritten_workqueue; struct workqueue_struct *m_cil_workqueue; diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index abf06bf9c3f3..424bb9a2d532 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -516,6 +516,20 @@ xfs_destroy_mount_workqueues( destroy_workqueue(mp->m_buf_workqueue); } +static void +xfs_flush_inodes_worker( + struct work_struct *work) +{ + struct xfs_mount *mp = container_of(work, struct xfs_mount, + m_flush_inodes_work); + struct super_block *sb = mp->m_super; + + if (down_read_trylock(&sb->s_umount)) { + sync_inodes_sb(sb); + up_read(&sb->s_umount); + } +} + /* * Flush all dirty data to disk. Must not be called while holding an XFS_ILOCK * or a page lock. 
We use sync_inodes_sb() here to ensure we block while waiting @@ -526,15 +540,15 @@ void xfs_flush_inodes( struct xfs_mount *mp) { - struct super_block *sb = mp->m_super; - - if (!__ratelimit(&mp->m_flush_inodes_ratelimit)) + /* + * If flush_work() returns true then that means we waited for a flush + * which was already in progress. Don't bother running another scan. + */ + if (flush_work(&mp->m_flush_inodes_work)) return; - if (down_read_trylock(&sb->s_umount)) { - sync_inodes_sb(sb); - up_read(&sb->s_umount); - } + queue_work(mp->m_sync_workqueue, &mp->m_flush_inodes_work); + flush_work(&mp->m_flush_inodes_work); } /* Catch misguided souls that try to use this interface on XFS */ @@ -1369,17 +1383,6 @@ xfs_fc_fill_super( if (error) goto out_free_names; - /* - * Cap the number of invocations of xfs_flush_inodes to 16 for every - * quarter of a second. The magic numbers here were determined by - * observation neither to cause stalls in writeback when there are a - * lot of IO threads and the fs is near ENOSPC, nor cause any fstest - * regressions. YMMV. - */ - ratelimit_state_init(&mp->m_flush_inodes_ratelimit, HZ / 4, 16); - ratelimit_set_flags(&mp->m_flush_inodes_ratelimit, - RATELIMIT_MSG_ON_RELEASE); - error = xfs_init_mount_workqueues(mp); if (error) goto out_close_devices; @@ -1752,6 +1755,7 @@ static int xfs_init_fs_context( spin_lock_init(&mp->m_perag_lock); mutex_init(&mp->m_growlock); atomic_set(&mp->m_active_trans, 0); + INIT_WORK(&mp->m_flush_inodes_work, xfs_flush_inodes_worker); INIT_DELAYED_WORK(&mp->m_reclaim_work, xfs_reclaim_worker); INIT_DELAYED_WORK(&mp->m_eofblocks_work, xfs_eofblocks_worker); INIT_DELAYED_WORK(&mp->m_cowblocks_work, xfs_cowblocks_worker); From 1f2ef049cb11c68134ce699f749f16ca8d34468e Mon Sep 17 00:00:00 2001 From: Kai-Heng Feng Date: Thu, 16 Apr 2020 14:35:40 +0800 Subject: [PATCH 268/331] ahci: Add Intel Comet Lake PCH-U PCI ID Add Intel Comet Lake PCH-U PCI ID to the list of supported controllers. 
Set default SATA LPM so the SoC can enter S0ix. Signed-off-by: Kai-Heng Feng Signed-off-by: Jens Axboe --- drivers/ata/ahci.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c index 0101b65250cb..0c0a736eb861 100644 --- a/drivers/ata/ahci.c +++ b/drivers/ata/ahci.c @@ -410,6 +410,7 @@ static const struct pci_device_id ahci_pci_tbl[] = { { PCI_VDEVICE(INTEL, 0x22a3), board_ahci_mobile }, /* Cherry Tr. AHCI */ { PCI_VDEVICE(INTEL, 0x5ae3), board_ahci_mobile }, /* ApolloLake AHCI */ { PCI_VDEVICE(INTEL, 0x34d3), board_ahci_mobile }, /* Ice Lake LP AHCI */ + { PCI_VDEVICE(INTEL, 0x02d3), board_ahci_mobile }, /* Comet Lake PCH-U AHCI */ { PCI_VDEVICE(INTEL, 0x02d7), board_ahci_mobile }, /* Comet Lake PCH RAID */ /* JMicron 360/1/3/5/6, match class to avoid IDE function */ From 86d32f9a7c54ad74f4514d7fef7c847883207291 Mon Sep 17 00:00:00 2001 From: Vasily Averin Date: Tue, 14 Apr 2020 21:33:16 +0100 Subject: [PATCH 269/331] keys: Fix proc_keys_next to increase position index If seq_file .next function does not change position index, read after some lseek can generate unexpected output: $ dd if=/proc/keys bs=1 # full usual output 0f6bfdf5 I--Q--- 2 perm 3f010000 1000 1000 user 4af2f79ab8848d0a: 740 1fb91b32 I--Q--- 3 perm 1f3f0000 1000 65534 keyring _uid.1000: 2 27589480 I--Q--- 1 perm 0b0b0000 0 0 user invocation_id: 16 2f33ab67 I--Q--- 152 perm 3f030000 0 0 keyring _ses: 2 33f1d8fa I--Q--- 4 perm 3f030000 1000 1000 keyring _ses: 1 3d427fda I--Q--- 2 perm 3f010000 1000 1000 user 69ec44aec7678e5a: 740 3ead4096 I--Q--- 1 perm 1f3f0000 1000 65534 keyring _uid_ses.1000: 1 521+0 records in 521+0 records out 521 bytes copied, 0,00123769 s, 421 kB/s But a read after lseek in middle of last line results in the partial last line and then a repeat of the final line: $ dd if=/proc/keys bs=500 skip=1 dd: /proc/keys: cannot skip to specified offset g _uid_ses.1000: 1 3ead4096 I--Q--- 1 perm 1f3f0000 1000 65534 keyring _uid_ses.1000: 1 0+1 
records in 0+1 records out 97 bytes copied, 0,000135035 s, 718 kB/s and a read after lseek beyond end of file results in the last line being shown: $ dd if=/proc/keys bs=1000 skip=1 # read after lseek beyond end of file dd: /proc/keys: cannot skip to specified offset 3ead4096 I--Q--- 1 perm 1f3f0000 1000 65534 keyring _uid_ses.1000: 1 0+1 records in 0+1 records out 76 bytes copied, 0,000119981 s, 633 kB/s See https://bugzilla.kernel.org/show_bug.cgi?id=206283 Fixes: 1f4aace60b0e ("fs/seq_file.c: simplify seq_file iteration code ...") Signed-off-by: Vasily Averin Signed-off-by: David Howells Reviewed-by: Jarkko Sakkinen Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds --- security/keys/proc.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/security/keys/proc.c b/security/keys/proc.c index 415f3f1c2da0..d0cde6685627 100644 --- a/security/keys/proc.c +++ b/security/keys/proc.c @@ -139,6 +139,8 @@ static void *proc_keys_next(struct seq_file *p, void *v, loff_t *_pos) n = key_serial_next(p, v); if (n) *_pos = key_node_serial(n); + else + (*_pos)++; return n; } From 9692ea9d3288a201df762868a52552b2e07e1c55 Mon Sep 17 00:00:00 2001 From: Steve French Date: Wed, 15 Apr 2020 01:12:34 -0500 Subject: [PATCH 270/331] smb3: remove overly noisy debug line in signing errors A dump_stack call for signature related errors can be too noisy and not of much value in debugging such problems. 
Signed-off-by: Steve French Reviewed-by: Shyam Prasad N --- fs/cifs/smb2transport.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/cifs/smb2transport.c b/fs/cifs/smb2transport.c index 1a6c227ada8f..c0348e3b1695 100644 --- a/fs/cifs/smb2transport.c +++ b/fs/cifs/smb2transport.c @@ -660,8 +660,8 @@ smb2_verify_signature(struct smb_rqst *rqst, struct TCP_Server_Info *server) return rc; if (memcmp(server_response_sig, shdr->Signature, SMB2_SIGNATURE_SIZE)) { - dump_stack(); - cifs_dbg(VFS, "sign fail cmd 0x%x message id 0x%llx\n", shdr->Command, shdr->MessageId); + cifs_dbg(VFS, "sign fail cmd 0x%x message id 0x%llx\n", + shdr->Command, shdr->MessageId); return -EACCES; } else return 0; From 9b5d2a4f797a585cd70ea3f4b03e6bdf48979548 Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Thu, 16 Apr 2020 12:30:53 +0200 Subject: [PATCH 271/331] dt-bindings: Fix misspellings of "Analog Devices" According to https://www.analog.com/, the company name is spelled "Analog Devices". Signed-off-by: Geert Uytterhoeven Signed-off-by: Rob Herring --- .../devicetree/bindings/display/bridge/adi,adv7123.txt | 4 ++-- .../devicetree/bindings/display/bridge/adi,adv7511.txt | 4 ++-- Documentation/devicetree/bindings/dma/adi,axi-dmac.txt | 2 +- Documentation/devicetree/bindings/iio/dac/ad5755.txt | 2 +- 4 files changed, 6 insertions(+), 6 deletions(-) diff --git a/Documentation/devicetree/bindings/display/bridge/adi,adv7123.txt b/Documentation/devicetree/bindings/display/bridge/adi,adv7123.txt index a6b2b2b8f3d9..d3c2a4914ea2 100644 --- a/Documentation/devicetree/bindings/display/bridge/adi,adv7123.txt +++ b/Documentation/devicetree/bindings/display/bridge/adi,adv7123.txt @@ -1,5 +1,5 @@ -Analog Device ADV7123 Video DAC -------------------------------- +Analog Devices ADV7123 Video DAC +-------------------------------- The ADV7123 is a digital-to-analog converter that outputs VGA signals from a parallel video input. 
diff --git a/Documentation/devicetree/bindings/display/bridge/adi,adv7511.txt b/Documentation/devicetree/bindings/display/bridge/adi,adv7511.txt index e8ddec5d9d91..659523f538bf 100644 --- a/Documentation/devicetree/bindings/display/bridge/adi,adv7511.txt +++ b/Documentation/devicetree/bindings/display/bridge/adi,adv7511.txt @@ -1,5 +1,5 @@ -Analog Device ADV7511(W)/13/33/35 HDMI Encoders ------------------------------------------ +Analog Devices ADV7511(W)/13/33/35 HDMI Encoders +------------------------------------------------ The ADV7511, ADV7511W, ADV7513, ADV7533 and ADV7535 are HDMI audio and video transmitters compatible with HDMI 1.4 and DVI 1.0. They support color space diff --git a/Documentation/devicetree/bindings/dma/adi,axi-dmac.txt b/Documentation/devicetree/bindings/dma/adi,axi-dmac.txt index b38ee732efa9..cd17684aaab5 100644 --- a/Documentation/devicetree/bindings/dma/adi,axi-dmac.txt +++ b/Documentation/devicetree/bindings/dma/adi,axi-dmac.txt @@ -1,4 +1,4 @@ -Analog Device AXI-DMAC DMA controller +Analog Devices AXI-DMAC DMA controller Required properties: - compatible: Must be "adi,axi-dmac-1.00.a". 
diff --git a/Documentation/devicetree/bindings/iio/dac/ad5755.txt b/Documentation/devicetree/bindings/iio/dac/ad5755.txt index f0bbd7e1029b..502e1e55adbd 100644 --- a/Documentation/devicetree/bindings/iio/dac/ad5755.txt +++ b/Documentation/devicetree/bindings/iio/dac/ad5755.txt @@ -1,4 +1,4 @@ -* Analog Device AD5755 IIO Multi-Channel DAC Linux Driver +* Analog Devices AD5755 IIO Multi-Channel DAC Linux Driver Required properties: - compatible: Has to contain one of the following: From e045124e93995fe01e42ed530003ddba5d55db4f Mon Sep 17 00:00:00 2001 From: DENG Qingfang Date: Tue, 14 Apr 2020 14:34:08 +0800 Subject: [PATCH 272/331] net: dsa: mt7530: fix tagged frames pass-through in VLAN-unaware mode MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit In VLAN-unaware mode, the Egress Tag (EG_TAG) field in Port VLAN Control register must be set to Consistent to let tagged frames pass through as is, otherwise their tags will be stripped. Fixes: 83163f7dca56 ("net: dsa: mediatek: add VLAN support for MT7530") Signed-off-by: DENG Qingfang Reviewed-by: Florian Fainelli Tested-by: René van Dorst Signed-off-by: David S. 
Miller --- drivers/net/dsa/mt7530.c | 18 ++++++++++++------ drivers/net/dsa/mt7530.h | 7 +++++++ 2 files changed, 19 insertions(+), 6 deletions(-) diff --git a/drivers/net/dsa/mt7530.c b/drivers/net/dsa/mt7530.c index 84391c8a0e16..5c444cd722bd 100644 --- a/drivers/net/dsa/mt7530.c +++ b/drivers/net/dsa/mt7530.c @@ -773,8 +773,9 @@ mt7530_port_set_vlan_unaware(struct dsa_switch *ds, int port) */ mt7530_rmw(priv, MT7530_PCR_P(port), PCR_PORT_VLAN_MASK, MT7530_PORT_MATRIX_MODE); - mt7530_rmw(priv, MT7530_PVC_P(port), VLAN_ATTR_MASK, - VLAN_ATTR(MT7530_VLAN_TRANSPARENT)); + mt7530_rmw(priv, MT7530_PVC_P(port), VLAN_ATTR_MASK | PVC_EG_TAG_MASK, + VLAN_ATTR(MT7530_VLAN_TRANSPARENT) | + PVC_EG_TAG(MT7530_VLAN_EG_CONSISTENT)); for (i = 0; i < MT7530_NUM_PORTS; i++) { if (dsa_is_user_port(ds, i) && @@ -790,8 +791,8 @@ mt7530_port_set_vlan_unaware(struct dsa_switch *ds, int port) if (all_user_ports_removed) { mt7530_write(priv, MT7530_PCR_P(MT7530_CPU_PORT), PCR_MATRIX(dsa_user_ports(priv->ds))); - mt7530_write(priv, MT7530_PVC_P(MT7530_CPU_PORT), - PORT_SPEC_TAG); + mt7530_write(priv, MT7530_PVC_P(MT7530_CPU_PORT), PORT_SPEC_TAG + | PVC_EG_TAG(MT7530_VLAN_EG_CONSISTENT)); } } @@ -817,8 +818,9 @@ mt7530_port_set_vlan_aware(struct dsa_switch *ds, int port) /* Set the port as a user port which is to be able to recognize VID * from incoming packets before fetching entry within the VLAN table. 
*/ - mt7530_rmw(priv, MT7530_PVC_P(port), VLAN_ATTR_MASK, - VLAN_ATTR(MT7530_VLAN_USER)); + mt7530_rmw(priv, MT7530_PVC_P(port), VLAN_ATTR_MASK | PVC_EG_TAG_MASK, + VLAN_ATTR(MT7530_VLAN_USER) | + PVC_EG_TAG(MT7530_VLAN_EG_DISABLED)); } static void @@ -1303,6 +1305,10 @@ mt7530_setup(struct dsa_switch *ds) mt7530_cpu_port_enable(priv, i); else mt7530_port_disable(ds, i); + + /* Enable consistent egress tag */ + mt7530_rmw(priv, MT7530_PVC_P(i), PVC_EG_TAG_MASK, + PVC_EG_TAG(MT7530_VLAN_EG_CONSISTENT)); } /* Setup port 5 */ diff --git a/drivers/net/dsa/mt7530.h b/drivers/net/dsa/mt7530.h index 4aef6024441b..979bb6374678 100644 --- a/drivers/net/dsa/mt7530.h +++ b/drivers/net/dsa/mt7530.h @@ -172,9 +172,16 @@ enum mt7530_port_mode { /* Register for port vlan control */ #define MT7530_PVC_P(x) (0x2010 + ((x) * 0x100)) #define PORT_SPEC_TAG BIT(5) +#define PVC_EG_TAG(x) (((x) & 0x7) << 8) +#define PVC_EG_TAG_MASK PVC_EG_TAG(7) #define VLAN_ATTR(x) (((x) & 0x3) << 6) #define VLAN_ATTR_MASK VLAN_ATTR(3) +enum mt7530_vlan_port_eg_tag { + MT7530_VLAN_EG_DISABLED = 0, + MT7530_VLAN_EG_CONSISTENT = 1, +}; + enum mt7530_vlan_port_attr { MT7530_VLAN_USER = 0, MT7530_VLAN_TRANSPARENT = 3, From 806fd188ce2a4f8b587e83e73c478e6484fbfa55 Mon Sep 17 00:00:00 2001 From: Florian Fainelli Date: Tue, 14 Apr 2020 15:39:52 -0700 Subject: [PATCH 273/331] net: stmmac: dwmac-sunxi: Provide TX and RX fifo sizes After commit bfcb813203e619a8960a819bf533ad2a108d8105 ("net: dsa: configure the MTU for switch ports") my Lamobo R1 platform which uses an allwinner,sun7i-a20-gmac compatible Ethernet MAC started to fail by rejecting a MTU of 1536. The reason for that is that the DMA capabilities are not readable on this version of the IP, and there is also no 'tx-fifo-depth' property being provided in Device Tree. The property is documented as optional, and is not provided. 
Chen-Yu indicated that the FIFO sizes are 4KB for TX and 16KB for RX, so provide these values through platform data as an immediate fix until various Device Tree sources get updated accordingly. Fixes: eaf4fac47807 ("net: stmmac: Do not accept invalid MTU values") Suggested-by: Chen-Yu Tsai Signed-off-by: Florian Fainelli Acked-by: Chen-Yu Tsai Signed-off-by: David S. Miller --- drivers/net/ethernet/stmicro/stmmac/dwmac-sunxi.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-sunxi.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-sunxi.c index 7d40760e9ba8..0e1ca2cba3c7 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-sunxi.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-sunxi.c @@ -150,6 +150,8 @@ static int sun7i_gmac_probe(struct platform_device *pdev) plat_dat->init = sun7i_gmac_init; plat_dat->exit = sun7i_gmac_exit; plat_dat->fix_mac_speed = sun7i_fix_speed; + plat_dat->tx_fifo_size = 4096; + plat_dat->rx_fifo_size = 16384; ret = sun7i_gmac_init(pdev, plat_dat->bsp_priv); if (ret) From 05eab4f328bb127de37c1d619013c340cc5aaf39 Mon Sep 17 00:00:00 2001 From: Jason Yan Date: Wed, 15 Apr 2020 16:42:26 +0800 Subject: [PATCH 274/331] mISDN: make dmril and dmrim static Fix the following sparse warning: drivers/isdn/hardware/mISDN/mISDNisar.c:746:12: warning: symbol 'dmril' was not declared. Should it be static? drivers/isdn/hardware/mISDN/mISDNisar.c:749:12: warning: symbol 'dmrim' was not declared. Should it be static? Reported-by: Hulk Robot Signed-off-by: Jason Yan Signed-off-by: David S. 
Miller --- drivers/isdn/hardware/mISDN/mISDNisar.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/isdn/hardware/mISDN/mISDNisar.c b/drivers/isdn/hardware/mISDN/mISDNisar.c index e325e87c0593..11e8c7d8b6e8 100644 --- a/drivers/isdn/hardware/mISDN/mISDNisar.c +++ b/drivers/isdn/hardware/mISDN/mISDNisar.c @@ -743,10 +743,10 @@ check_send(struct isar_hw *isar, u8 rdm) } } -const char *dmril[] = {"NO SPEED", "1200/75", "NODEF2", "75/1200", "NODEF4", +static const char *dmril[] = {"NO SPEED", "1200/75", "NODEF2", "75/1200", "NODEF4", "300", "600", "1200", "2400", "4800", "7200", "9600nt", "9600t", "12000", "14400", "WRONG"}; -const char *dmrim[] = {"NO MOD", "NO DEF", "V32/V32b", "V22", "V21", +static const char *dmrim[] = {"NO MOD", "NO DEF", "V32/V32b", "V22", "V21", "Bell103", "V23", "Bell202", "V17", "V29", "V27ter"}; static void From d518691cbd3be3dae218e05cca3f3fc9b2f1aa77 Mon Sep 17 00:00:00 2001 From: Sebastian Andrzej Siewior Date: Thu, 16 Apr 2020 17:57:40 +0200 Subject: [PATCH 275/331] amd-xgbe: Use __napi_schedule() in BH context The driver uses __napi_schedule_irqoff() which is fine as long as it is invoked with disabled interrupts by everybody. Since the commit mentioned below the driver may invoke xgbe_isr_task() in tasklet/softirq context. This may lead to list corruption if another driver uses __napi_schedule_irqoff() in IRQ context. Use __napi_schedule() which is safe to use from IRQ and softirq context. Fixes: 85b85c853401d ("amd-xgbe: Re-issue interrupt if interrupt status not cleared") Signed-off-by: Sebastian Andrzej Siewior Acked-by: Tom Lendacky Cc: Tom Lendacky Signed-off-by: David S.
Miller --- drivers/net/ethernet/amd/xgbe/xgbe-drv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c index b71f9b04a51e..a87264f95f1a 100644 --- a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c +++ b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c @@ -514,7 +514,7 @@ static void xgbe_isr_task(unsigned long data) xgbe_disable_rx_tx_ints(pdata); /* Turn on polling */ - __napi_schedule_irqoff(&pdata->napi); + __napi_schedule(&pdata->napi); } } else { /* Don't clear Rx/Tx status if doing per channel DMA From 74f4c438f22ca3fff157fb45e694805931487c55 Mon Sep 17 00:00:00 2001 From: Jason Yan Date: Wed, 15 Apr 2020 16:48:53 +0800 Subject: [PATCH 276/331] arm/xen: make _xen_start_info static Fix the following sparse warning: arch/arm64/xen/../../arm/xen/enlighten.c:39:19: warning: symbol '_xen_start_info' was not declared. Should it be static? Reported-by: Hulk Robot Signed-off-by: Jason Yan Reviewed-by: Stefano Stabellini Link: https://lore.kernel.org/r/20200415084853.5808-1-yanaijie@huawei.com Signed-off-by: Juergen Gross --- arch/arm/xen/enlighten.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c index dd6804a64f1a..fd4e1ce1daf9 100644 --- a/arch/arm/xen/enlighten.c +++ b/arch/arm/xen/enlighten.c @@ -36,7 +36,7 @@ #include -struct start_info _xen_start_info; +static struct start_info _xen_start_info; struct start_info *xen_start_info = &_xen_start_info; EXPORT_SYMBOL(xen_start_info); From edfc23f6f9fdbd7825d50ac1f380243cde19b679 Mon Sep 17 00:00:00 2001 From: Zenghui Yu Date: Wed, 8 Apr 2020 19:43:52 +0800 Subject: [PATCH 277/331] irqchip/mbigen: Free msi_desc on device teardown Using irq_domain_free_irqs_common() on the irqdomain free path will leave the MSI descriptor unfreed when platform devices get removed. Properly free it by MSI domain free function. 
Fixes: 9650c60ebfec0 ("irqchip/mbigen: Create irq domain for each mbigen device") Signed-off-by: Zenghui Yu Signed-off-by: Marc Zyngier Link: https://lore.kernel.org/r/20200408114352.1604-1-yuzenghui@huawei.com --- drivers/irqchip/irq-mbigen.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/irqchip/irq-mbigen.c b/drivers/irqchip/irq-mbigen.c index 6b566bba263b..ff7627b57772 100644 --- a/drivers/irqchip/irq-mbigen.c +++ b/drivers/irqchip/irq-mbigen.c @@ -220,10 +220,16 @@ static int mbigen_irq_domain_alloc(struct irq_domain *domain, return 0; } +static void mbigen_irq_domain_free(struct irq_domain *domain, unsigned int virq, + unsigned int nr_irqs) +{ + platform_msi_domain_free(domain, virq, nr_irqs); +} + static const struct irq_domain_ops mbigen_domain_ops = { .translate = mbigen_domain_translate, .alloc = mbigen_irq_domain_alloc, - .free = irq_domain_free_irqs_common, + .free = mbigen_irq_domain_free, }; static int mbigen_of_create_domain(struct platform_device *pdev, From 3688b0db5c331f4ec3fa5eb9f670a4b04f530700 Mon Sep 17 00:00:00 2001 From: Grygorii Strashko Date: Wed, 8 Apr 2020 22:15:32 +0300 Subject: [PATCH 278/331] irqchip/ti-sci-inta: Fix processing of masked irqs The ti_sci_inta_irq_handler() does not take into account INTA IRQs state (masked/unmasked) as it uses INTA_STATUS_CLEAR_j register to get INTA IRQs status, which provides raw status value. This causes hard IRQ handlers to be called or threaded handlers to be scheduled many times even if corresponding INTA IRQ is masked. Above, first of all, affects the LEVEL interrupts processing and causes unexpected behavior up the system stack or crash. Fix it by using the Interrupt Masked Status INTA_STATUSM_j register which provides masked INTA IRQs status. 
Fixes: 9f1463b86c13 ("irqchip/ti-sci-inta: Add support for Interrupt Aggregator driver") Signed-off-by: Grygorii Strashko Signed-off-by: Marc Zyngier Reviewed-by: Lokesh Vutla Link: https://lore.kernel.org/r/20200408191532.31252-1-grygorii.strashko@ti.com Cc: stable@vger.kernel.org --- drivers/irqchip/irq-ti-sci-inta.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/irqchip/irq-ti-sci-inta.c b/drivers/irqchip/irq-ti-sci-inta.c index 8f6e6b08eadf..7e3ebf6ed2cd 100644 --- a/drivers/irqchip/irq-ti-sci-inta.c +++ b/drivers/irqchip/irq-ti-sci-inta.c @@ -37,6 +37,7 @@ #define VINT_ENABLE_SET_OFFSET 0x0 #define VINT_ENABLE_CLR_OFFSET 0x8 #define VINT_STATUS_OFFSET 0x18 +#define VINT_STATUS_MASKED_OFFSET 0x20 /** * struct ti_sci_inta_event_desc - Description of an event coming to @@ -116,7 +117,7 @@ static void ti_sci_inta_irq_handler(struct irq_desc *desc) chained_irq_enter(irq_desc_get_chip(desc), desc); val = readq_relaxed(inta->base + vint_desc->vint_id * 0x1000 + - VINT_STATUS_OFFSET); + VINT_STATUS_MASKED_OFFSET); for_each_set_bit(bit, &val, MAX_EVENTS_PER_VINT) { virq = irq_find_mapping(domain, vint_desc->events[bit].hwirq); From d727be7bbf7b68ccc18a3278469325d8f486d75b Mon Sep 17 00:00:00 2001 From: Atish Patra Date: Thu, 2 Apr 2020 18:46:09 -0700 Subject: [PATCH 279/331] irqchip/sifive-plic: Fix maximum priority threshold value As per the PLIC specification, maximum priority threshold value is 0x7 not 0xF. Even though it doesn't cause any error in qemu/hifive unleashed, there may be some implementation which checks the upper bound resulting in an illegal access. 
Fixes: ccbe80bad571 ("irqchip/sifive-plic: Enable/Disable external interrupts upon cpu online/offline") Signed-off-by: Atish Patra Signed-off-by: Marc Zyngier Link: https://lore.kernel.org/r/20200403014609.71831-1-atish.patra@wdc.com --- drivers/irqchip/irq-sifive-plic.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/irqchip/irq-sifive-plic.c b/drivers/irqchip/irq-sifive-plic.c index c34fb3ae0ff8..d0a71febdadc 100644 --- a/drivers/irqchip/irq-sifive-plic.c +++ b/drivers/irqchip/irq-sifive-plic.c @@ -56,7 +56,7 @@ #define CONTEXT_THRESHOLD 0x00 #define CONTEXT_CLAIM 0x04 -#define PLIC_DISABLE_THRESHOLD 0xf +#define PLIC_DISABLE_THRESHOLD 0x7 #define PLIC_ENABLE_THRESHOLD 0 struct plic_priv { From 0a66d6f90cf7d704c6a0f663f7058099eb8c97b0 Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Mon, 6 Apr 2020 08:52:07 +0100 Subject: [PATCH 280/331] irqchip/meson-gpio: Fix HARDIRQ-safe -> HARDIRQ-unsafe lock order Running a lockdep-enabled kernel on a vim3l board (Amlogic SM1) leads to the following splat: [ 13.557138] WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected [ 13.587485] ip/456 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire: [ 13.625922] ffff000059908cf0 (&irq_desc_lock_class){-.-.}-{2:2}, at: __setup_irq+0xf8/0x8d8 [ 13.632273] which would create a new lock dependency: [ 13.637272] (&irq_desc_lock_class){-.-.}-{2:2} -> (&ctl->lock){+.+.}-{2:2} [ 13.644209] [ 13.644209] but this new dependency connects a HARDIRQ-irq-safe lock: [ 13.654122] (&irq_desc_lock_class){-.-.}-{2:2} [ 13.654125] [ 13.654125] ...
which became HARDIRQ-irq-safe at: [ 13.664759] lock_acquire+0xec/0x368 [ 13.666926] _raw_spin_lock+0x60/0x88 [ 13.669979] handle_fasteoi_irq+0x30/0x178 [ 13.674082] generic_handle_irq+0x38/0x50 [ 13.678098] __handle_domain_irq+0x6c/0xc8 [ 13.682209] gic_handle_irq+0x5c/0xb0 [ 13.685872] el1_irq+0xd0/0x180 [ 13.689010] arch_cpu_idle+0x40/0x220 [ 13.692732] default_idle_call+0x54/0x60 [ 13.696677] do_idle+0x23c/0x2e8 [ 13.699903] cpu_startup_entry+0x30/0x50 [ 13.703852] rest_init+0x1e0/0x2b4 [ 13.707301] arch_call_rest_init+0x18/0x24 [ 13.711449] start_kernel+0x4ec/0x51c [ 13.715167] [ 13.715167] to a HARDIRQ-irq-unsafe lock: [ 13.722426] (&ctl->lock){+.+.}-{2:2} [ 13.722430] [ 13.722430] ... which became HARDIRQ-irq-unsafe at: [ 13.732319] ... [ 13.732324] lock_acquire+0xec/0x368 [ 13.735985] _raw_spin_lock+0x60/0x88 [ 13.739452] meson_gpio_irq_domain_alloc+0xcc/0x290 [ 13.744392] irq_domain_alloc_irqs_hierarchy+0x24/0x60 [ 13.749586] __irq_domain_alloc_irqs+0x160/0x2f0 [ 13.754254] irq_create_fwspec_mapping+0x118/0x320 [ 13.759073] irq_create_of_mapping+0x78/0xa0 [ 13.763360] of_irq_get+0x6c/0x80 [ 13.766701] of_mdiobus_register_phy+0x10c/0x238 [of_mdio] [ 13.772227] of_mdiobus_register+0x158/0x380 [of_mdio] [ 13.777388] mdio_mux_init+0x180/0x2e8 [mdio_mux] [ 13.782128] g12a_mdio_mux_probe+0x290/0x398 [mdio_mux_meson_g12a] [ 13.788349] platform_drv_probe+0x5c/0xb0 [ 13.792379] really_probe+0xe4/0x448 [ 13.795979] driver_probe_device+0xe8/0x140 [ 13.800189] __device_attach_driver+0x94/0x120 [ 13.804639] bus_for_each_drv+0x84/0xd8 [ 13.808474] __device_attach+0xe4/0x168 [ 13.812361] device_initial_probe+0x1c/0x28 [ 13.816592] bus_probe_device+0xa4/0xb0 [ 13.820430] deferred_probe_work_func+0xa8/0x100 [ 13.825064] process_one_work+0x264/0x688 [ 13.829088] worker_thread+0x4c/0x458 [ 13.832768] kthread+0x154/0x158 [ 13.836018] ret_from_fork+0x10/0x18 [ 13.839612] [ 13.839612] other info that might help us debug this: [ 13.839612] [ 13.850354] Possible interrupt unsafe 
locking scenario: [ 13.850354] [ 13.855720] CPU0 CPU1 [ 13.858774] ---- ---- [ 13.863242] lock(&ctl->lock); [ 13.866330] local_irq_disable(); [ 13.872233] lock(&irq_desc_lock_class); [ 13.878705] lock(&ctl->lock); [ 13.884297] [ 13.886857] lock(&irq_desc_lock_class); [ 13.891014] [ 13.891014] *** DEADLOCK *** The issue can occur when CPU1 is doing something like irq_set_type() and CPU0 performing an interrupt allocation, for example. Taking an interrupt (like the one being reconfigured) would lead to a deadlock. A solution to this is: - Reorder the locking so that meson_gpio_irq_update_bits takes the lock itself at all times, instead of relying on the caller to lock or not, hence making the RMW sequence atomic, - Rework the critical section in meson_gpio_irq_request_channel to only cover the allocation itself, and let the gpio_irq_sel_pin callback deal with its own locking if required, - Take the private spin-lock with interrupts disabled at all times Reviewed-by: Jerome Brunet Signed-off-by: Marc Zyngier --- drivers/irqchip/irq-meson-gpio.c | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-) diff --git a/drivers/irqchip/irq-meson-gpio.c b/drivers/irqchip/irq-meson-gpio.c index ccc7f823911b..bc7aebcc96e9 100644 --- a/drivers/irqchip/irq-meson-gpio.c +++ b/drivers/irqchip/irq-meson-gpio.c @@ -144,12 +144,17 @@ struct meson_gpio_irq_controller { static void meson_gpio_irq_update_bits(struct meson_gpio_irq_controller *ctl, unsigned int reg, u32 mask, u32 val) { + unsigned long flags; u32 tmp; + spin_lock_irqsave(&ctl->lock, flags); + tmp = readl_relaxed(ctl->base + reg); tmp &= ~mask; tmp |= val; writel_relaxed(tmp, ctl->base + reg); + + spin_unlock_irqrestore(&ctl->lock, flags); } static void meson_gpio_irq_init_dummy(struct meson_gpio_irq_controller *ctl) @@ -196,14 +201,15 @@ meson_gpio_irq_request_channel(struct meson_gpio_irq_controller *ctl, unsigned long hwirq, u32 **channel_hwirq) { + unsigned long flags; unsigned int idx; - 
spin_lock(&ctl->lock); + spin_lock_irqsave(&ctl->lock, flags); /* Find a free channel */ idx = find_first_zero_bit(ctl->channel_map, NUM_CHANNEL); if (idx >= NUM_CHANNEL) { - spin_unlock(&ctl->lock); + spin_unlock_irqrestore(&ctl->lock, flags); pr_err("No channel available\n"); return -ENOSPC; } @@ -211,6 +217,8 @@ meson_gpio_irq_request_channel(struct meson_gpio_irq_controller *ctl, /* Mark the channel as used */ set_bit(idx, ctl->channel_map); + spin_unlock_irqrestore(&ctl->lock, flags); + /* * Setup the mux of the channel to route the signal of the pad * to the appropriate input of the GIC @@ -225,8 +233,6 @@ meson_gpio_irq_request_channel(struct meson_gpio_irq_controller *ctl, */ *channel_hwirq = &(ctl->channel_irqs[idx]); - spin_unlock(&ctl->lock); - pr_debug("hwirq %lu assigned to channel %d - irq %u\n", hwirq, idx, **channel_hwirq); @@ -287,13 +293,9 @@ static int meson_gpio_irq_type_setup(struct meson_gpio_irq_controller *ctl, val |= REG_EDGE_POL_LOW(params, idx); } - spin_lock(&ctl->lock); - meson_gpio_irq_update_bits(ctl, REG_EDGE_POL, REG_EDGE_POL_MASK(params, idx), val); - spin_unlock(&ctl->lock); - return 0; } From 9fed9ccb16de9b18ba843d2df57312c9b8260f96 Mon Sep 17 00:00:00 2001 From: Jason Yan Date: Fri, 17 Apr 2020 15:40:46 +0800 Subject: [PATCH 281/331] irqchip/irq-mvebu-icu: Make legacy_bindings static Fix the following sparse warning: drivers/irqchip/irq-mvebu-icu.c:69:1: warning: symbol 'legacy_bindings' was not declared. Should it be static? 
Reported-by: Hulk Robot Signed-off-by: Jason Yan Signed-off-by: Marc Zyngier Link: https://lore.kernel.org/r/20200417074046.46771-1-yanaijie@huawei.com --- drivers/irqchip/irq-mvebu-icu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/irqchip/irq-mvebu-icu.c b/drivers/irqchip/irq-mvebu-icu.c index 547045d89c4b..91adf771f185 100644 --- a/drivers/irqchip/irq-mvebu-icu.c +++ b/drivers/irqchip/irq-mvebu-icu.c @@ -66,7 +66,7 @@ struct mvebu_icu_irq_data { unsigned int type; }; -DEFINE_STATIC_KEY_FALSE(legacy_bindings); +static DEFINE_STATIC_KEY_FALSE(legacy_bindings); static void mvebu_icu_init(struct mvebu_icu *icu, struct mvebu_icu_msi_data *msi_data, From 8f374923de1ced05db3c98b9e4e1ce21c5aede2c Mon Sep 17 00:00:00 2001 From: Jason Yan Date: Fri, 17 Apr 2020 15:40:36 +0800 Subject: [PATCH 282/331] irqchip/irq-bcm7038-l1: Make bcm7038_l1_of_init() static Fix the following sparse warning: drivers/irqchip/irq-bcm7038-l1.c:419:12: warning: symbol 'bcm7038_l1_of_init' was not declared. Should it be static? 
Reported-by: Hulk Robot Signed-off-by: Jason Yan Signed-off-by: Marc Zyngier Link: https://lore.kernel.org/r/20200417074036.46594-1-yanaijie@huawei.com --- drivers/irqchip/irq-bcm7038-l1.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/irqchip/irq-bcm7038-l1.c b/drivers/irqchip/irq-bcm7038-l1.c index eb9bce93cd05..fd7c537fb42a 100644 --- a/drivers/irqchip/irq-bcm7038-l1.c +++ b/drivers/irqchip/irq-bcm7038-l1.c @@ -416,7 +416,7 @@ static const struct irq_domain_ops bcm7038_l1_domain_ops = { .map = bcm7038_l1_map, }; -int __init bcm7038_l1_of_init(struct device_node *dn, +static int __init bcm7038_l1_of_init(struct device_node *dn, struct device_node *parent) { struct bcm7038_l1_chip *intc; From 3ab0762d1edfda6ccbc08f636acab42c103c299f Mon Sep 17 00:00:00 2001 From: Tony Luck Date: Thu, 16 Apr 2020 13:57:52 -0700 Subject: [PATCH 283/331] x86/split_lock: Update to use X86_MATCH_INTEL_FAM6_MODEL() The SPLIT_LOCK_CPU() macro escaped the tree-wide sweep for old-style initialization. Update to use X86_MATCH_INTEL_FAM6_MODEL(). Fixes: 6650cdd9a8cc ("x86/split_lock: Enable split lock detection by kernel") Signed-off-by: Tony Luck Signed-off-by: Thomas Gleixner Link: https://lkml.kernel.org/r/20200416205754.21177-2-tony.luck@intel.com --- arch/x86/kernel/cpu/intel.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c index bf08d4508ecb..ec0d8c74932f 100644 --- a/arch/x86/kernel/cpu/intel.c +++ b/arch/x86/kernel/cpu/intel.c @@ -1119,8 +1119,6 @@ void switch_to_sld(unsigned long tifn) sld_update_msr(!(tifn & _TIF_SLD)); } -#define SPLIT_LOCK_CPU(model) {X86_VENDOR_INTEL, 6, model, X86_FEATURE_ANY} - /* * The following processors have the split lock detection feature. But * since they don't have the IA32_CORE_CAPABILITIES MSR, the feature cannot @@ -1128,8 +1126,8 @@ void switch_to_sld(unsigned long tifn) * processors. 
*/ static const struct x86_cpu_id split_lock_cpu_ids[] __initconst = { - SPLIT_LOCK_CPU(INTEL_FAM6_ICELAKE_X), - SPLIT_LOCK_CPU(INTEL_FAM6_ICELAKE_L), + X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_X, 0), + X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_L, 0), {} }; From aec7db3b13a07d515c15ada752a7287a44a79ea0 Mon Sep 17 00:00:00 2001 From: Josef Bacik Date: Fri, 10 Apr 2020 11:42:48 -0400 Subject: [PATCH 284/331] btrfs: fix setting last_trans for reloc roots I made a mistake with my previous fix, I assumed that we didn't need to mess with the reloc roots once we were out of the part of relocation where we are actually moving the extents. The subtle thing that I missed is that btrfs_init_reloc_root() also updates the last_trans for the reloc root when we do btrfs_record_root_in_trans() for the corresponding fs_root. I've added a comment to make sure future me doesn't make this mistake again. This showed up as a WARN_ON() in btrfs_copy_root() because our last_trans didn't == the current transid. This could happen if we snapshotted a fs root with a reloc root after we set rc->create_reloc_tree = 0, but before we actually merge the reloc root. 
Worth mentioning that the regression produced the following warning when running snapshot creation and balance in parallel: BTRFS info (device sdc): relocating block group 30408704 flags metadata|dup ------------[ cut here ]------------ WARNING: CPU: 0 PID: 12823 at fs/btrfs/ctree.c:191 btrfs_copy_root+0x26f/0x430 [btrfs] CPU: 0 PID: 12823 Comm: btrfs Tainted: G W 5.6.0-rc7-btrfs-next-58 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 RIP: 0010:btrfs_copy_root+0x26f/0x430 [btrfs] RSP: 0018:ffffb96e044279b8 EFLAGS: 00010202 RAX: 0000000000000009 RBX: ffff9da70bf61000 RCX: ffffb96e04427a48 RDX: ffff9da733a770c8 RSI: ffff9da70bf61000 RDI: ffff9da694163818 RBP: ffff9da733a770c8 R08: fffffffffffffff8 R09: 0000000000000002 R10: ffffb96e044279a0 R11: 0000000000000000 R12: ffff9da694163818 R13: fffffffffffffff8 R14: ffff9da6d2512000 R15: ffff9da714cdac00 FS: 00007fdeacf328c0(0000) GS:ffff9da735e00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055a2a5b8a118 CR3: 00000001eed78002 CR4: 00000000003606f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ? create_reloc_root+0x49/0x2b0 [btrfs] ? kmem_cache_alloc_trace+0xe5/0x200 create_reloc_root+0x8b/0x2b0 [btrfs] btrfs_reloc_post_snapshot+0x96/0x5b0 [btrfs] create_pending_snapshot+0x610/0x1010 [btrfs] create_pending_snapshots+0xa8/0xd0 [btrfs] btrfs_commit_transaction+0x4c7/0xc50 [btrfs] ? btrfs_mksubvol+0x3cd/0x560 [btrfs] btrfs_mksubvol+0x455/0x560 [btrfs] __btrfs_ioctl_snap_create+0x15f/0x190 [btrfs] btrfs_ioctl_snap_create_v2+0xa4/0xf0 [btrfs] ? mem_cgroup_commit_charge+0x6e/0x540 btrfs_ioctl+0x12d8/0x3760 [btrfs] ? do_raw_spin_unlock+0x49/0xc0 ? _raw_spin_unlock+0x29/0x40 ? __handle_mm_fault+0x11b3/0x14b0 ? ksys_ioctl+0x92/0xb0 ksys_ioctl+0x92/0xb0 ? 
trace_hardirqs_off_thunk+0x1a/0x1c __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x5c/0x280 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7fdeabd3bdd7 Fixes: 2abc726ab4b8 ("btrfs: do not init a reloc root if we aren't relocating") Reviewed-by: Filipe Manana Signed-off-by: Josef Bacik Signed-off-by: David Sterba --- fs/btrfs/relocation.c | 19 +++++++++++++++++-- 1 file changed, 17 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index 7e362a6935fd..d35936c934ab 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -1527,8 +1527,7 @@ int btrfs_init_reloc_root(struct btrfs_trans_handle *trans, int clear_rsv = 0; int ret; - if (!rc || !rc->create_reloc_tree || - root->root_key.objectid == BTRFS_TREE_RELOC_OBJECTID) + if (!rc) return 0; /* @@ -1538,12 +1537,28 @@ int btrfs_init_reloc_root(struct btrfs_trans_handle *trans, if (reloc_root_is_dead(root)) return 0; + /* + * This is subtle but important. We do not do + * record_root_in_transaction for reloc roots, instead we record their + * corresponding fs root, and then here we update the last trans for the + * reloc root. This means that we have to do this for the entire life + * of the reloc root, regardless of which stage of the relocation we are + * in. + */ if (root->reloc_root) { reloc_root = root->reloc_root; reloc_root->last_trans = trans->transid; return 0; } + /* + * We are merging reloc roots, we do not need new reloc trees. Also + * reloc trees never need their own reloc tree. + */ + if (!rc->create_reloc_tree || + root->root_key.objectid == BTRFS_TREE_RELOC_OBJECTID) + return 0; + if (!trans->reloc_reserved) { rsv = trans->block_rsv; trans->block_rsv = rc->block_rsv; From 2cf3818f18b26992ff20a730df46e08e2485fd67 Mon Sep 17 00:00:00 2001 From: Alexandru Tachici Date: Thu, 16 Apr 2020 14:58:48 +0300 Subject: [PATCH 285/331] dt-bindings: iio: dac: AD5770R fix bindings errors Replaced num property with reg property, fixed errors reported by dt-binding-check.
Fixes: ea52c21268e6 ("dt-bindings: iio: dac: Add docs for AD5770R DAC") Signed-off-by: Alexandru Tachici [robh: Fix required property list, fix Fixes tag] Signed-off-by: Rob Herring --- .../bindings/iio/dac/adi,ad5770r.yaml | 93 +++++++++---------- 1 file changed, 44 insertions(+), 49 deletions(-) diff --git a/Documentation/devicetree/bindings/iio/dac/adi,ad5770r.yaml b/Documentation/devicetree/bindings/iio/dac/adi,ad5770r.yaml index 3b1a85236dd9..58d81ca43460 100644 --- a/Documentation/devicetree/bindings/iio/dac/adi,ad5770r.yaml +++ b/Documentation/devicetree/bindings/iio/dac/adi,ad5770r.yaml @@ -49,93 +49,86 @@ properties: asserted during driver probe. maxItems: 1 - channel0: + channel@0: description: Represents an external channel which are connected to the DAC. Channel 0 can act both as a current source and sink. type: object properties: - num: + reg: description: This represents the channel number. - items: - const: 0 + const: 0 adi,range-microamp: description: Output range of the channel. oneOf: - - $ref: /schemas/types.yaml#/definitions/int32-array - items: - - enum: [0 300000] - - enum: [-60000 0] - - enum: [-60000 300000] + - const: 0 + - const: 300000 + - items: + - const: -60000 + - const: 0 + - items: + - const: -60000 + - const: 300000 - channel1: + channel@1: description: Represents an external channel which are connected to the DAC. type: object properties: - num: + reg: description: This represents the channel number. - items: - const: 1 + const: 1 adi,range-microamp: description: Output range of the channel. - oneOf: - - $ref: /schemas/types.yaml#/definitions/uint32-array - - items: - - enum: [0 140000] - - enum: [0 250000] + items: + - const: 0 + - enum: [ 140000, 250000 ] - channel2: + channel@2: description: Represents an external channel which are connected to the DAC. type: object properties: - num: + reg: description: This represents the channel number. 
- items: - const: 2 + const: 2 adi,range-microamp: description: Output range of the channel. - oneOf: - - $ref: /schemas/types.yaml#/definitions/uint32-array - - items: - - enum: [0 140000] - - enum: [0 250000] + items: + - const: 0 + - enum: [ 55000, 150000 ] patternProperties: "^channel@([3-5])$": type: object description: Represents the external channels which are connected to the DAC. properties: - num: + reg: description: This represents the channel number. - items: - minimum: 3 - maximum: 5 + minimum: 3 + maximum: 5 adi,range-microamp: description: Output range of the channel. - oneOf: - - $ref: /schemas/types.yaml#/definitions/uint32-array - - items: - - enum: [0 45000] - - enum: [0 100000] + items: + - const: 0 + - enum: [ 45000, 100000 ] required: - reg -- diff-channels -- channel0 -- channel1 -- channel2 -- channel3 -- channel4 -- channel5 +- channel@0 +- channel@1 +- channel@2 +- channel@3 +- channel@4 +- channel@5 examples: - | @@ -150,34 +143,36 @@ examples: vref-supply = <&vref>; adi,external-resistor; reset-gpios = <&gpio 22 0>; + #address-cells = <1>; + #size-cells = <0>; channel@0 { - num = <0>; - adi,range-microamp = <(-60000) 300000>; + reg = <0>; + adi,range-microamp = <0 300000>; }; channel@1 { - num = <1>; + reg = <1>; adi,range-microamp = <0 140000>; }; channel@2 { - num = <2>; + reg = <2>; adi,range-microamp = <0 55000>; }; channel@3 { - num = <3>; + reg = <3>; adi,range-microamp = <0 45000>; }; channel@4 { - num = <4>; + reg = <4>; adi,range-microamp = <0 45000>; }; channel@5 { - num = <5>; + reg = <5>; adi,range-microamp = <0 45000>; }; }; From f4d859b7f3162090605b06fa354ee9cb24478e6a Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Tue, 14 Apr 2020 18:48:32 +0200 Subject: [PATCH 286/331] MAINTAINERS: dt: update display/allwinner file entry Changeset f5a98bfe7b37 ("dt-bindings: display: Convert Allwinner display pipeline to schemas") split Documentation/devicetree/bindings/display/sunxi/sun4i-drm.txt into several files. 
Yet, it kept the old place at MAINTAINERS. Update it to point to the new place. Fixes: f5a98bfe7b37 ("dt-bindings: display: Convert Allwinner display pipeline to schemas") Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Rob Herring --- MAINTAINERS | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index e64e5db31497..86f98c3e6cfc 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -5552,7 +5552,7 @@ M: Chen-Yu Tsai L: dri-devel@lists.freedesktop.org S: Supported T: git git://anongit.freedesktop.org/drm/drm-misc -F: Documentation/devicetree/bindings/display/sunxi/sun4i-drm.txt +F: Documentation/devicetree/bindings/display/allwinner* F: drivers/gpu/drm/sun4i/ DRM DRIVERS FOR AMLOGIC SOCS From 21a431e627046ff44a2786a9b8e8d6f12aa329f9 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Wed, 8 Apr 2020 17:46:21 +0200 Subject: [PATCH 287/331] MAINTAINERS: dt: fix pointers for ARM Integrator, Versatile and RealView There's a conversion from a plain text binding file into 4 yaml ones. 
The old file got removed, causing this new warning: Warning: MAINTAINERS references a file that doesn't exist: Documentation/devicetree/bindings/arm/arm-boards Address it by replacing the old reference by the new ones Fixes: 4b900070d50d ("dt-bindings: arm: Add Versatile YAML schema") Fixes: 2d483550b6d2 ("dt-bindings: arm: Drop the non-YAML bindings") Fixes: 7db625b9fa75 ("dt-bindings: arm: Add RealView YAML schema") Fixes: 4fb00d9066c1 ("dt-bindings: arm: Add Versatile Express and Juno YAML schema") Fixes: 33fbfb3eaf4e ("dt-bindings: arm: Add Integrator YAML schema") Signed-off-by: Mauro Carvalho Chehab Acked-by: Linus Walleij Signed-off-by: Rob Herring --- MAINTAINERS | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index 86f98c3e6cfc..82e4b0a0c921 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -1323,7 +1323,10 @@ ARM INTEGRATOR, VERSATILE AND REALVIEW SUPPORT M: Linus Walleij L: linux-arm-kernel@lists.infradead.org (moderated for non-subscribers) S: Maintained -F: Documentation/devicetree/bindings/arm/arm-boards +F: Documentation/devicetree/bindings/arm/arm,integrator.yaml +F: Documentation/devicetree/bindings/arm/arm,realview.yaml +F: Documentation/devicetree/bindings/arm/arm,versatile.yaml +F: Documentation/devicetree/bindings/arm/arm,vexpress-juno.yaml F: Documentation/devicetree/bindings/auxdisplay/arm-charlcd.txt F: Documentation/devicetree/bindings/clock/arm,syscon-icst.yaml F: Documentation/devicetree/bindings/i2c/i2c-versatile.txt From b3fb36ed694b05738d45218ea72cf7feb10ce2b1 Mon Sep 17 00:00:00 2001 From: Frank Rowand Date: Thu, 16 Apr 2020 16:42:46 -0500 Subject: [PATCH 288/331] of: unittest: kmemleak on changeset destroy kmemleak reports several memory leaks from devicetree unittest. This is the fix for problem 1 of 5. of_unittest_changeset() reaches deeply into the dynamic devicetree functions. Several nodes were left with an elevated reference count and thus were not properly cleaned up. 
Fix the reference counts so that the memory will be freed. Fixes: 201c910bd689 ("of: Transactional DT support.") Reported-by: Erhard F. Signed-off-by: Frank Rowand Signed-off-by: Rob Herring --- drivers/of/unittest.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/of/unittest.c b/drivers/of/unittest.c index 7e27670c3616..20ff2dfc3143 100644 --- a/drivers/of/unittest.c +++ b/drivers/of/unittest.c @@ -861,6 +861,10 @@ static void __init of_unittest_changeset(void) unittest(!of_changeset_revert(&chgset), "revert failed\n"); of_changeset_destroy(&chgset); + + of_node_put(n1); + of_node_put(n2); + of_node_put(n21); #endif } From 216830d2413cc61be3f76bc02ffd905e47d2439e Mon Sep 17 00:00:00 2001 From: Frank Rowand Date: Thu, 16 Apr 2020 16:42:47 -0500 Subject: [PATCH 289/331] of: unittest: kmemleak in of_unittest_platform_populate() kmemleak reports several memory leaks from devicetree unittest. This is the fix for problem 2 of 5. of_unittest_platform_populate() left an elevated reference count for grandchild nodes (which are platform devices). Fix the platform device reference counts so that the memory will be freed. Fixes: fb2caa50fbac ("of/selftest: add testcase for nodes with same name and address") Reported-by: Erhard F. 
Signed-off-by: Frank Rowand Signed-off-by: Rob Herring --- drivers/of/unittest.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/of/unittest.c b/drivers/of/unittest.c index 20ff2dfc3143..4c7818276857 100644 --- a/drivers/of/unittest.c +++ b/drivers/of/unittest.c @@ -1247,10 +1247,13 @@ static void __init of_unittest_platform_populate(void) of_platform_populate(np, match, NULL, &test_bus->dev); for_each_child_of_node(np, child) { - for_each_child_of_node(child, grandchild) - unittest(of_find_device_by_node(grandchild), + for_each_child_of_node(child, grandchild) { + pdev = of_find_device_by_node(grandchild); + unittest(pdev, "Could not create device for node '%pOFn'\n", grandchild); + of_dev_put(pdev); + } } of_platform_depopulate(&test_bus->dev); From 145fc138f9aae4f9e1331352e301df28e16aed35 Mon Sep 17 00:00:00 2001 From: Frank Rowand Date: Thu, 16 Apr 2020 16:42:48 -0500 Subject: [PATCH 290/331] of: unittest: kmemleak in of_unittest_overlay_high_level() kmemleak reports several memory leaks from devicetree unittest. This is the fix for problem 3 of 5. of_unittest_overlay_high_level() failed to kfree the newly created property when the property named 'name' is skipped. Fixes: 39a751a4cb7e ("of: change overlay apply input data from unflattened to FDT") Reported-by: Erhard F. 
Signed-off-by: Frank Rowand Signed-off-by: Rob Herring --- drivers/of/unittest.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/of/unittest.c b/drivers/of/unittest.c index 4c7818276857..f238b7a3865d 100644 --- a/drivers/of/unittest.c +++ b/drivers/of/unittest.c @@ -3094,8 +3094,11 @@ static __init void of_unittest_overlay_high_level(void) goto err_unlock; } if (__of_add_property(of_symbols, new_prop)) { + kfree(new_prop->name); + kfree(new_prop->value); + kfree(new_prop); /* "name" auto-generated by unflatten */ - if (!strcmp(new_prop->name, "name")) + if (!strcmp(prop->name, "name")) continue; unittest(0, "duplicate property '%s' in overlay_base node __symbols__", prop->name); From 478ff649b1c8eb2409b1a54fb75eb46f7c29f140 Mon Sep 17 00:00:00 2001 From: Frank Rowand Date: Thu, 16 Apr 2020 16:42:49 -0500 Subject: [PATCH 291/331] of: overlay: kmemleak in dup_and_fixup_symbol_prop() kmemleak reports several memory leaks from devicetree unittest. This is the fix for problem 4 of 5. target_path was not freed in the non-error path. Fixes: e0a58f3e08d4 ("of: overlay: remove a dependency on device node full_name") Reported-by: Erhard F. Signed-off-by: Frank Rowand Signed-off-by: Rob Herring --- drivers/of/overlay.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c index c9219fddf44b..50bbe0edf538 100644 --- a/drivers/of/overlay.c +++ b/drivers/of/overlay.c @@ -261,6 +261,8 @@ static struct property *dup_and_fixup_symbol_prop( of_property_set_flag(new_prop, OF_DYNAMIC); + kfree(target_path); + return new_prop; err_free_new_prop: From 29acfb65598f91671413869e0d0a1ec4e74ac705 Mon Sep 17 00:00:00 2001 From: Frank Rowand Date: Thu, 16 Apr 2020 16:42:50 -0500 Subject: [PATCH 292/331] of: unittest: kmemleak in duplicate property update kmemleak reports several memory leaks from devicetree unittest. This is the fix for problem 5 of 5. 
When overlay 'overlay_bad_add_dup_prop' is applied, the apply code properly detects that a memory leak will occur if the overlay is removed since the duplicate property is located in a base devicetree node and reports via printk(): OF: overlay: WARNING: memory leak will occur if overlay removed, property: /testcase-data-2/substation@100/motor-1/rpm_avail OF: overlay: WARNING: memory leak will occur if overlay removed, property: /testcase-data-2/substation@100/motor-1/rpm_avail The overlay is removed when the apply code detects multiple changesets modifying the same property. This is reported via printk(): OF: overlay: ERROR: multiple fragments add, update, and/or delete property /testcase-data-2/substation@100/motor-1/rpm_avail As a result of this error, the overlay is removed resulting in the expected memory leak. Add another device node level to the overlay so that the duplicate property is located in a node added by the overlay, thus no memory leak will occur when the overlay is removed. Thus users of kmemleak will not have to debug this leak in the future. Fixes: 2fe0e8769df9 ("of: overlay: check prevents multiple fragments touching same property") Reported-by: Erhard F. 
Signed-off-by: Frank Rowand Signed-off-by: Rob Herring --- .../overlay_bad_add_dup_prop.dts | 23 +++++++++++++++---- drivers/of/unittest.c | 12 +++++----- 2 files changed, 25 insertions(+), 10 deletions(-) diff --git a/drivers/of/unittest-data/overlay_bad_add_dup_prop.dts b/drivers/of/unittest-data/overlay_bad_add_dup_prop.dts index c190da54f175..6327d1ffb963 100644 --- a/drivers/of/unittest-data/overlay_bad_add_dup_prop.dts +++ b/drivers/of/unittest-data/overlay_bad_add_dup_prop.dts @@ -3,22 +3,37 @@ /plugin/; /* - * &electric_1/motor-1 and &spin_ctrl_1 are the same node: - * /testcase-data-2/substation@100/motor-1 + * &electric_1/motor-1/electric and &spin_ctrl_1/electric are the same node: + * /testcase-data-2/substation@100/motor-1/electric * * Thus the property "rpm_avail" in each fragment will * result in an attempt to update the same property twice. * This will result in an error and the overlay apply * will fail. + * + * The previous version of this test did not include the extra + * level of node 'electric'. That resulted in the 'rpm_avail' + * property being located in the pre-existing node 'motor-1'. + * Modifying a property results in a WARNING that a memory leak + * will occur if the overlay is removed. Since the overlay apply + * fails, the memory leak does actually occur, and kmemleak will + * further report the memory leak if CONFIG_DEBUG_KMEMLEAK is + * enabled. Adding the overlay node 'electric' avoids the + * memory leak and thus people who use kmemleak will not + * have to debug this non-problem again. 
*/ &electric_1 { motor-1 { - rpm_avail = < 100 >; + electric { + rpm_avail = < 100 >; + }; }; }; &spin_ctrl_1 { - rpm_avail = < 100 200 >; + electric { + rpm_avail = < 100 200 >; + }; }; diff --git a/drivers/of/unittest.c b/drivers/of/unittest.c index f238b7a3865d..398de04fd19c 100644 --- a/drivers/of/unittest.c +++ b/drivers/of/unittest.c @@ -3181,21 +3181,21 @@ static __init void of_unittest_overlay_high_level(void) "OF: overlay: ERROR: multiple fragments add and/or delete node /testcase-data-2/substation@100/motor-1/controller"); EXPECT_BEGIN(KERN_ERR, - "OF: overlay: WARNING: memory leak will occur if overlay removed, property: /testcase-data-2/substation@100/motor-1/rpm_avail"); + "OF: overlay: ERROR: multiple fragments add and/or delete node /testcase-data-2/substation@100/motor-1/electric"); EXPECT_BEGIN(KERN_ERR, - "OF: overlay: WARNING: memory leak will occur if overlay removed, property: /testcase-data-2/substation@100/motor-1/rpm_avail"); + "OF: overlay: ERROR: multiple fragments add, update, and/or delete property /testcase-data-2/substation@100/motor-1/electric/rpm_avail"); EXPECT_BEGIN(KERN_ERR, - "OF: overlay: ERROR: multiple fragments add, update, and/or delete property /testcase-data-2/substation@100/motor-1/rpm_avail"); + "OF: overlay: ERROR: multiple fragments add, update, and/or delete property /testcase-data-2/substation@100/motor-1/electric/name"); unittest(overlay_data_apply("overlay_bad_add_dup_prop", NULL), "Adding overlay 'overlay_bad_add_dup_prop' failed\n"); EXPECT_END(KERN_ERR, - "OF: overlay: ERROR: multiple fragments add, update, and/or delete property /testcase-data-2/substation@100/motor-1/rpm_avail"); + "OF: overlay: ERROR: multiple fragments add, update, and/or delete property /testcase-data-2/substation@100/motor-1/electric/name"); EXPECT_END(KERN_ERR, - "OF: overlay: WARNING: memory leak will occur if overlay removed, property: /testcase-data-2/substation@100/motor-1/rpm_avail"); + "OF: overlay: ERROR: multiple fragments add, 
update, and/or delete property /testcase-data-2/substation@100/motor-1/electric/rpm_avail"); EXPECT_END(KERN_ERR, - "OF: overlay: WARNING: memory leak will occur if overlay removed, property: /testcase-data-2/substation@100/motor-1/rpm_avail"); + "OF: overlay: ERROR: multiple fragments add and/or delete node /testcase-data-2/substation@100/motor-1/electric"); unittest(overlay_data_apply("overlay_bad_phandle", NULL), "Adding overlay 'overlay_bad_phandle' failed\n"); From 3dceecfad68cbe6224990654dafd8edd8b71b37e Mon Sep 17 00:00:00 2001 From: Stefan Haberland Date: Fri, 17 Apr 2020 11:48:35 +0200 Subject: [PATCH 293/331] s390/dasd: remove IOSCHED_DEADLINE from DASD Kconfig CONFIG_IOSCHED_DEADLINE was removed with commit f382fb0bcef4 ("block: remove legacy IO schedulers") and setting of the scheduler was removed with commit a5fd8ddce2af ("s390/dasd: remove setting of scheduler from driver"). So get rid of the select. Reported-by: Krzysztof Kozlowski Signed-off-by: Stefan Haberland Signed-off-by: Jens Axboe --- drivers/s390/block/Kconfig | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/s390/block/Kconfig b/drivers/s390/block/Kconfig index a8682f69effc..376f1efbbb86 100644 --- a/drivers/s390/block/Kconfig +++ b/drivers/s390/block/Kconfig @@ -26,7 +26,6 @@ config DASD def_tristate y prompt "Support for DASD devices" depends on CCW && BLOCK - select IOSCHED_DEADLINE help Enable this option if you want to access DASDs directly utilizing S/390s channel subsystem commands. This is necessary for running From 3a89c25d98da99672414bf20a887f7f8f8768986 Mon Sep 17 00:00:00 2001 From: Tommi Rantala Date: Fri, 17 Apr 2020 16:00:22 +0300 Subject: [PATCH 294/331] blk-wbt: Use tracepoint_string() for wbt_step tracepoint string literals Use tracepoint_string() for string literals that are used in the wbt_step tracepoint, so that userspace tools can display the string content. 
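As an illustration of why this matters, here is a minimal userspace sketch, not the kernel implementation. It assumes the trace record stores only the address of the string, and that a tool can print the text only if that address was exported in a lookup table. The names export_string() and resolve_string() are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an exported address -> string table. */
struct str_entry {
	uintptr_t addr;
	const char *text;
};

#define MAX_EXPORTED 8
static struct str_entry exported[MAX_EXPORTED];
static size_t nr_exported;

/* Register a constant string and hand back the very same pointer. */
const char *export_string(const char *s)
{
	assert(nr_exported < MAX_EXPORTED);
	exported[nr_exported].addr = (uintptr_t)s;
	exported[nr_exported].text = s;
	nr_exported++;
	return s;
}

/* Tool side: map a recorded address back to text, NULL if never exported. */
const char *resolve_string(uintptr_t addr)
{
	for (size_t i = 0; i < nr_exported; i++)
		if (exported[i].addr == addr)
			return exported[i].text;
	return NULL;
}
```

In this sketch a string that was never passed through export_string() resolves to NULL, which mirrors a tool that can show only a raw pointer value.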
Signed-off-by: Tommi Rantala Signed-off-by: Jens Axboe --- block/blk-wbt.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/block/blk-wbt.c b/block/blk-wbt.c index 8641ba9793c5..9cb082f38b93 100644 --- a/block/blk-wbt.c +++ b/block/blk-wbt.c @@ -313,7 +313,7 @@ static void scale_up(struct rq_wb *rwb) calc_wb_limits(rwb); rwb->unknown_cnt = 0; rwb_wake_all(rwb); - rwb_trace_step(rwb, "scale up"); + rwb_trace_step(rwb, tracepoint_string("scale up")); } static void scale_down(struct rq_wb *rwb, bool hard_throttle) @@ -322,7 +322,7 @@ static void scale_down(struct rq_wb *rwb, bool hard_throttle) return; calc_wb_limits(rwb); rwb->unknown_cnt = 0; - rwb_trace_step(rwb, "scale down"); + rwb_trace_step(rwb, tracepoint_string("scale down")); } static void rwb_arm_timer(struct rq_wb *rwb) From 3f22037d382b45710248b6faa4d5bd30d169c4ba Mon Sep 17 00:00:00 2001 From: Tommi Rantala Date: Fri, 17 Apr 2020 16:00:23 +0300 Subject: [PATCH 295/331] blk-wbt: Drop needless newlines from tracepoint format strings Drop needless newlines from tracepoint format strings, they only add empty lines to perf tracing output. 
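The effect can be sketched in userspace under one assumption: the consumer of the event text terminates every event with its own newline. render_event() and count_empty_lines() are hypothetical helpers written for this illustration, not part of perf or the kernel.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical consumer: prints one event per line, adding its own '\n'. */
int render_event(char *out, size_t len, const char *event_text)
{
	assert(out && event_text);
	return snprintf(out, len, "%s\n", event_text);
}

/* Count the empty lines in a block of rendered output. */
int count_empty_lines(const char *s)
{
	int empty = 0;
	const char *line = s;

	while (*line) {
		const char *nl = strchr(line, '\n');

		if (!nl)
			break;
		if (nl == line)	/* a line with no characters before '\n' */
			empty++;
		line = nl + 1;
	}
	return empty;
}
```

Rendering an event whose text already ends in a newline yields one empty line in this model, while the same text without the trailing newline yields none.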
Signed-off-by: Tommi Rantala Signed-off-by: Jens Axboe --- include/trace/events/wbt.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/include/trace/events/wbt.h b/include/trace/events/wbt.h index 37342a13c9cb..784814160197 100644 --- a/include/trace/events/wbt.h +++ b/include/trace/events/wbt.h @@ -46,7 +46,7 @@ TRACE_EVENT(wbt_stat, ), TP_printk("%s: rmean=%llu, rmin=%llu, rmax=%llu, rsamples=%llu, " - "wmean=%llu, wmin=%llu, wmax=%llu, wsamples=%llu\n", + "wmean=%llu, wmin=%llu, wmax=%llu, wsamples=%llu", __entry->name, __entry->rmean, __entry->rmin, __entry->rmax, __entry->rnr_samples, __entry->wmean, __entry->wmin, __entry->wmax, __entry->wnr_samples) @@ -73,7 +73,7 @@ TRACE_EVENT(wbt_lat, __entry->lat = div_u64(lat, 1000); ), - TP_printk("%s: latency %lluus\n", __entry->name, + TP_printk("%s: latency %lluus", __entry->name, (unsigned long long) __entry->lat) ); @@ -115,7 +115,7 @@ TRACE_EVENT(wbt_step, __entry->max = max; ), - TP_printk("%s: %s: step=%d, window=%luus, background=%u, normal=%u, max=%u\n", + TP_printk("%s: %s: step=%d, window=%luus, background=%u, normal=%u, max=%u", __entry->name, __entry->msg, __entry->step, __entry->window, __entry->bg, __entry->normal, __entry->max) ); @@ -148,7 +148,7 @@ TRACE_EVENT(wbt_timer, __entry->inflight = inflight; ), - TP_printk("%s: status=%u, step=%d, inflight=%u\n", __entry->name, + TP_printk("%s: status=%u, step=%d, inflight=%u", __entry->name, __entry->status, __entry->step, __entry->inflight) ); From b0151da52a6d4f3951ea24c083e7a95977621436 Mon Sep 17 00:00:00 2001 From: Reinette Chatre Date: Tue, 17 Mar 2020 09:26:45 -0700 Subject: [PATCH 296/331] x86/resctrl: Fix invalid attempt at removing the default resource group The default resource group ("rdtgroup_default") is associated with the root of the resctrl filesystem and should never be removed. New resource groups can be created as subdirectories of the resctrl filesystem and they can be removed from user space. 
There exists a safeguard in the directory removal code (rdtgroup_rmdir()) that ensures that only subdirectories can be removed by testing that the directory to be removed has to be a child of the root directory. A possible deadlock was recently fixed with 334b0f4e9b1b ("x86/resctrl: Fix a deadlock due to inaccurate reference"). This fix involved associating the private data of the "mon_groups" and "mon_data" directories to the resource group to which they belong instead of NULL as before. A consequence of this change was that the original safeguard code preventing removal of "mon_groups" and "mon_data" found in the root directory failed resulting in attempts to remove the default resource group that ends in a BUG: kernel BUG at mm/slub.c:3969! invalid opcode: 0000 [#1] SMP PTI Call Trace: rdtgroup_rmdir+0x16b/0x2c0 kernfs_iop_rmdir+0x5c/0x90 vfs_rmdir+0x7a/0x160 do_rmdir+0x17d/0x1e0 do_syscall_64+0x55/0x1d0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fix this by improving the directory removal safeguard to ensure that subdirectories of the resctrl root directory can only be removed if they are a child of the resctrl filesystem's root _and_ not associated with the default resource group. 
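The improved safeguard can be sketched with simplified types. This is a userspace illustration only: struct node, struct group, default_group and may_remove_ctrl_group() are hypothetical stand-ins for the kernfs node, the resource group, rdtgroup_default and the condition in rdtgroup_rmdir().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum group_type { CTRL_GROUP, MON_GROUP };

/* Simplified directory node; priv points at the owning group. */
struct node {
	struct node *parent;
	void *priv;
};

struct group {
	enum group_type type;
	struct node *kn;
};

/* Stand-in for the default resource group tied to the filesystem root. */
struct group default_group = { CTRL_GROUP, NULL };

/*
 * A control group may be removed only if its directory is a child of the
 * root _and_ the group is not the default group itself.
 */
bool may_remove_ctrl_group(const struct group *grp, const struct node *parent_kn)
{
	assert(grp);
	return grp->type == CTRL_GROUP &&
	       parent_kn == default_group.kn &&
	       grp != &default_group;
}
```

With only the first two conditions, a directory in the root whose private data points at the default group would pass the check, which is the case the third condition rejects.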
Fixes: 334b0f4e9b1b ("x86/resctrl: Fix a deadlock due to inaccurate reference") Reported-by: Sai Praneeth Prakhya Signed-off-by: Reinette Chatre Signed-off-by: Borislav Petkov Tested-by: Sai Praneeth Prakhya Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/884cbe1773496b5dbec1b6bd11bb50cffa83603d.1584461853.git.reinette.chatre@intel.com --- arch/x86/kernel/cpu/resctrl/rdtgroup.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c index 064e9ef44cd6..9d4e73a9b5a9 100644 --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c @@ -3072,7 +3072,8 @@ static int rdtgroup_rmdir(struct kernfs_node *kn) * If the rdtgroup is a mon group and parent directory * is a valid "mon_groups" directory, remove the mon group. */ - if (rdtgrp->type == RDTCTRL_GROUP && parent_kn == rdtgroup_default.kn) { + if (rdtgrp->type == RDTCTRL_GROUP && parent_kn == rdtgroup_default.kn && + rdtgrp != &rdtgroup_default) { if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP || rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED) { ret = rdtgroup_ctrl_remove(kn, rdtgrp); From 0903060fe590105b7d31901c1ed67614c08cee08 Mon Sep 17 00:00:00 2001 From: Masahiro Yamada Date: Fri, 17 Apr 2020 13:04:55 +0900 Subject: [PATCH 297/331] kbuild: check libyaml installation for 'make dt_binding_check' If you run 'make dtbs_check' without installing the libyaml package, the error message "dtc needs libyaml ..." is shown. This should be checked also for 'make dt_binding_check' because dtc needs to validate *.example.dts extracted from *.yaml files. It is missing since commit 4f0e3a57d6eb ("kbuild: Add support for DT binding schema checks"), but this fix-up is applicable only after commit e10c4321dc1e ("kbuild: allow to run dt_binding_check and dtbs_check in a single command"). I gave the Fixes tag to the latter in case somebody is interested in back-porting this. 
Fixes: e10c4321dc1e ("kbuild: allow to run dt_binding_check and dtbs_check in a single command") Signed-off-by: Masahiro Yamada Signed-off-by: Rob Herring --- scripts/dtc/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/scripts/dtc/Makefile b/scripts/dtc/Makefile index 2f3c3a7e1620..ef85f8b7d4a7 100644 --- a/scripts/dtc/Makefile +++ b/scripts/dtc/Makefile @@ -13,7 +13,7 @@ dtc-objs += dtc-lexer.lex.o dtc-parser.tab.o HOST_EXTRACFLAGS := -I $(srctree)/$(src)/libfdt ifeq ($(shell pkg-config --exists yaml-0.1 2>/dev/null && echo yes),) -ifneq ($(CHECK_DTBS),) +ifneq ($(CHECK_DT_BINDING)$(CHECK_DTBS),) $(error dtc needs libyaml for DT schema validation support. \ Install the necessary libyaml development package.) endif From 9fe0450785abbc04b0ed5d3cf61fcdb8ab656b4b Mon Sep 17 00:00:00 2001 From: James Morse Date: Fri, 21 Feb 2020 16:21:05 +0000 Subject: [PATCH 298/331] x86/resctrl: Preserve CDP enable over CPU hotplug Resctrl assumes that all CPUs are online when the filesystem is mounted, and that CPUs remember their CDP-enabled state over CPU hotplug. This goes wrong when resctrl's CDP-enabled state changes while all the CPUs in a domain are offline. When a domain comes online, enable (or disable!) CDP to match resctrl's current setting. 
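The failure mode and the fix can be modelled in userspace as follows. This is a sketch under simplifying assumptions: one boolean per domain stands in for the hardware CDP setting, and the names struct resource, struct domain, set_cdp() and domain_online() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Software view: what the filesystem currently wants. */
struct resource {
	bool cdp_enabled;
};

/* Per-domain view: hw_cdp models the setting held by the hardware. */
struct domain {
	bool online;
	bool hw_cdp;
};

/* Changing the setting can only reach domains that are online. */
void set_cdp(struct resource *r, struct domain *doms, size_t n, bool enable)
{
	assert(r && doms);
	r->cdp_enabled = enable;
	for (size_t i = 0; i < n; i++)
		if (doms[i].online)
			doms[i].hw_cdp = enable;
}

/* The fix: when a domain comes online, enable or disable to match. */
void domain_online(struct resource *r, struct domain *d)
{
	assert(r && d);
	d->online = true;
	d->hw_cdp = r->cdp_enabled;
}
```

Without the assignment in domain_online(), a domain that was offline during set_cdp() would come back with a stale hardware setting.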
Fixes: 5ff193fbde20 ("x86/intel_rdt: Add basic resctrl filesystem support") Suggested-by: Reinette Chatre Signed-off-by: James Morse Signed-off-by: Borislav Petkov Cc: Link: https://lkml.kernel.org/r/20200221162105.154163-1-james.morse@arm.com --- arch/x86/kernel/cpu/resctrl/core.c | 2 ++ arch/x86/kernel/cpu/resctrl/internal.h | 1 + arch/x86/kernel/cpu/resctrl/rdtgroup.c | 13 +++++++++++++ 3 files changed, 16 insertions(+) diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c index 89049b343c7a..d8cc5223b7ce 100644 --- a/arch/x86/kernel/cpu/resctrl/core.c +++ b/arch/x86/kernel/cpu/resctrl/core.c @@ -578,6 +578,8 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r) d->id = id; cpumask_set_cpu(cpu, &d->cpu_mask); + rdt_domain_reconfigure_cdp(r); + if (r->alloc_capable && domain_setup_ctrlval(r, d)) { kfree(d); return; diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h index 181c992f448c..3dd13f3a8b23 100644 --- a/arch/x86/kernel/cpu/resctrl/internal.h +++ b/arch/x86/kernel/cpu/resctrl/internal.h @@ -601,5 +601,6 @@ bool has_busy_rmid(struct rdt_resource *r, struct rdt_domain *d); void __check_limbo(struct rdt_domain *d, bool force_free); bool cbm_validate_intel(char *buf, u32 *data, struct rdt_resource *r); bool cbm_validate_amd(char *buf, u32 *data, struct rdt_resource *r); +void rdt_domain_reconfigure_cdp(struct rdt_resource *r); #endif /* _ASM_X86_RESCTRL_INTERNAL_H */ diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c index 9d4e73a9b5a9..5a359d9fcc05 100644 --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c @@ -1859,6 +1859,19 @@ static int set_cache_qos_cfg(int level, bool enable) return 0; } +/* Restore the qos cfg state when a domain comes online */ +void rdt_domain_reconfigure_cdp(struct rdt_resource *r) +{ + if (!r->alloc_capable) + return; + + if (r == &rdt_resources_all[RDT_RESOURCE_L2DATA]) + 
l2_qos_cfg_update(&r->alloc_enabled); + + if (r == &rdt_resources_all[RDT_RESOURCE_L3DATA]) + l3_qos_cfg_update(&r->alloc_enabled); +} + /* * Enable or disable the MBA software controller * which helps user specify bandwidth in MBps. From 48fd5b5ee714714f4cf9f9e1cba3b49b1fd40ed6 Mon Sep 17 00:00:00 2001 From: Tony Luck Date: Thu, 16 Apr 2020 13:57:53 -0700 Subject: [PATCH 299/331] x86/split_lock: Bits in IA32_CORE_CAPABILITIES are not architectural The Intel Software Developers' Manual erroneously listed bit 5 of the IA32_CORE_CAPABILITIES register as an architectural feature. It is not. Features enumerated by IA32_CORE_CAPABILITIES are model specific and implementation details may vary in different cpu models. Thus it is only safe to trust features after checking the CPU model. Icelake client and server models are known to implement the split lock detect feature even though they don't enumerate IA32_CORE_CAPABILITIES [ tglx: Use switch() for readability and massage comments ] Fixes: 6650cdd9a8cc ("x86/split_lock: Enable split lock detection by kernel") Signed-off-by: Tony Luck Signed-off-by: Thomas Gleixner Link: https://lkml.kernel.org/r/20200416205754.21177-3-tony.luck@intel.com --- arch/x86/kernel/cpu/intel.c | 45 +++++++++++++++++++++++++------------ 1 file changed, 31 insertions(+), 14 deletions(-) diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c index ec0d8c74932f..c23ad481347e 100644 --- a/arch/x86/kernel/cpu/intel.c +++ b/arch/x86/kernel/cpu/intel.c @@ -1120,10 +1120,17 @@ void switch_to_sld(unsigned long tifn) } /* - * The following processors have the split lock detection feature. But - * since they don't have the IA32_CORE_CAPABILITIES MSR, the feature cannot - * be enumerated. Enable it by family and model matching on these - * processors. 
+ * Bits in the IA32_CORE_CAPABILITIES are not architectural, so they should + * only be trusted if it is confirmed that a CPU model implements a + * specific feature at a particular bit position. + * + * The possible driver data field values: + * + * - 0: CPU models that are known to have the per-core split-lock detection + * feature even though they do not enumerate IA32_CORE_CAPABILITIES. + * + * - 1: CPU models which may enumerate IA32_CORE_CAPABILITIES and if so use + * bit 5 to enumerate the per-core split-lock detection feature. */ static const struct x86_cpu_id split_lock_cpu_ids[] __initconst = { X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_X, 0), @@ -1133,19 +1140,29 @@ static const struct x86_cpu_id split_lock_cpu_ids[] __initconst = { void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c) { - u64 ia32_core_caps = 0; + const struct x86_cpu_id *m; + u64 ia32_core_caps; - if (c->x86_vendor != X86_VENDOR_INTEL) + if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) return; - if (cpu_has(c, X86_FEATURE_CORE_CAPABILITIES)) { - /* Enumerate features reported in IA32_CORE_CAPABILITIES MSR. */ + + m = x86_match_cpu(split_lock_cpu_ids); + if (!m) + return; + + switch (m->driver_data) { + case 0: + break; + case 1: + if (!cpu_has(c, X86_FEATURE_CORE_CAPABILITIES)) + return; rdmsrl(MSR_IA32_CORE_CAPS, ia32_core_caps); - } else if (!boot_cpu_has(X86_FEATURE_HYPERVISOR)) { - /* Enumerate split lock detection by family and model. 
*/ - if (x86_match_cpu(split_lock_cpu_ids)) - ia32_core_caps |= MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT; + if (!(ia32_core_caps & MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT)) + return; + break; + default: + return; } - if (ia32_core_caps & MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT) - split_lock_setup(); + split_lock_setup(); } From 8b9a18a9f2494144fe23fe630d0734310fa65301 Mon Sep 17 00:00:00 2001 From: Tony Luck Date: Thu, 16 Apr 2020 13:57:54 -0700 Subject: [PATCH 300/331] x86/split_lock: Add Tremont family CPU models Tremont CPUs support IA32_CORE_CAPABILITIES bits to indicate whether specific SKUs have support for split lock detection. Signed-off-by: Tony Luck Signed-off-by: Thomas Gleixner Link: https://lkml.kernel.org/r/20200416205754.21177-4-tony.luck@intel.com --- arch/x86/kernel/cpu/intel.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c index c23ad481347e..a19a680542ce 100644 --- a/arch/x86/kernel/cpu/intel.c +++ b/arch/x86/kernel/cpu/intel.c @@ -1135,6 +1135,9 @@ void switch_to_sld(unsigned long tifn) static const struct x86_cpu_id split_lock_cpu_ids[] __initconst = { X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_X, 0), X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_L, 0), + X86_MATCH_INTEL_FAM6_MODEL(ATOM_TREMONT, 1), + X86_MATCH_INTEL_FAM6_MODEL(ATOM_TREMONT_D, 1), + X86_MATCH_INTEL_FAM6_MODEL(ATOM_TREMONT_L, 1), {} }; From c843b382e61b5f28a3d917712c69a344f632387c Mon Sep 17 00:00:00 2001 From: Sascha Hauer Date: Fri, 17 Apr 2020 11:28:53 +0200 Subject: [PATCH 301/331] hwmon: (jc42) Fix name to have no illegal characters The jc42 driver passes I2C client's name as hwmon device name. In case of device tree probed devices this ends up being part of the compatible string, "jc-42.4-temp". This name contains hyphens and the hwmon core doesn't like this: jc42 2-0018: hwmon: 'jc-42.4-temp' is not a valid name attribute, please fix This changes the name to "jc42" which doesn't have any illegal characters. 
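A rough userspace sketch of such a name check follows. It assumes the rejected characters are the hyphen, the asterisk and whitespace; the exact set used by the hwmon core should be confirmed against the kernel source, and name_is_valid() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Assumed rule: a name must be non-empty and must not contain
 * '-', '*', space, tab or newline.
 */
bool name_is_valid(const char *name)
{
	assert(name != NULL);
	return name[0] != '\0' && strpbrk(name, "-* \t\n") == NULL;
}
```

Under this assumption "jc-42.4-temp" is rejected because of its hyphens, while "jc42" passes.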
Signed-off-by: Sascha Hauer Link: https://lore.kernel.org/r/20200417092853.31206-1-s.hauer@pengutronix.de Signed-off-by: Guenter Roeck --- drivers/hwmon/jc42.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/hwmon/jc42.c b/drivers/hwmon/jc42.c index f2d81b0558e5..e3f1ebee7130 100644 --- a/drivers/hwmon/jc42.c +++ b/drivers/hwmon/jc42.c @@ -506,7 +506,7 @@ static int jc42_probe(struct i2c_client *client, const struct i2c_device_id *id) } data->config = config; - hwmon_dev = devm_hwmon_device_register_with_info(dev, client->name, + hwmon_dev = devm_hwmon_device_register_with_info(dev, "jc42", data, &jc42_chip_info, NULL); return PTR_ERR_OR_ZERO(hwmon_dev); From 0a368bf00e3a7c57a57efc1bf79b79facb97639c Mon Sep 17 00:00:00 2001 From: "Gustavo A. R. Silva" Date: Mon, 23 Mar 2020 16:40:21 -0500 Subject: [PATCH 302/331] bio: Replace zero-length array with flexible-array member The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. 
Silva --- include/linux/bio.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/bio.h b/include/linux/bio.h index c1c0f9ea4e63..a0ee494a6329 100644 --- a/include/linux/bio.h +++ b/include/linux/bio.h @@ -319,7 +319,7 @@ struct bio_integrity_payload { struct work_struct bip_work; /* I/O completion */ struct bio_vec *bip_vec; - struct bio_vec bip_inline_vecs[0];/* embedded bvec array */ + struct bio_vec bip_inline_vecs[];/* embedded bvec array */ }; #if defined(CONFIG_BLK_DEV_INTEGRITY) From f36aaf8be421099103193c49796a14213d3be315 Mon Sep 17 00:00:00 2001 From: "Gustavo A. R. Silva" Date: Mon, 23 Mar 2020 16:43:39 -0500 Subject: [PATCH 303/331] blk-mq: Replace zero-length array with flexible-array member The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. 
Silva --- include/linux/blk-mq.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index f389d7c724bd..b45148ba3291 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -173,7 +173,7 @@ struct blk_mq_hw_ctx { * blocking (BLK_MQ_F_BLOCKING). Must be the last member - see also * blk_mq_hw_ctx_size(). */ - struct srcu_struct srcu[0]; + struct srcu_struct srcu[]; }; /** From 5a58ec8cfc8621f5bdbd610202f62f817e5da204 Mon Sep 17 00:00:00 2001 From: "Gustavo A. R. Silva" Date: Mon, 23 Mar 2020 16:45:36 -0500 Subject: [PATCH 304/331] blk_types: Replace zero-length array with flexible-array member The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. 
Silva --- include/linux/blk_types.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h index 70254ae11769..31eb92876be7 100644 --- a/include/linux/blk_types.h +++ b/include/linux/blk_types.h @@ -198,7 +198,7 @@ struct bio { * double allocations for a small number of bio_vecs. This member * MUST obviously be kept at the very end of the bio. */ - struct bio_vec bi_inline_vecs[0]; + struct bio_vec bi_inline_vecs[]; }; #define BIO_RESET_BYTES offsetof(struct bio, bi_max_vecs) From e76018cb604ace486de9cf85898c14bb2b47faff Mon Sep 17 00:00:00 2001 From: "Gustavo A. R. Silva" Date: Mon, 23 Mar 2020 16:48:10 -0500 Subject: [PATCH 305/331] can: dev: peak_canfd.h: Replace zero-length array with flexible-array member The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. 
Silva --- include/linux/can/dev/peak_canfd.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/include/linux/can/dev/peak_canfd.h b/include/linux/can/dev/peak_canfd.h index 511a37302fea..5fd627e9da19 100644 --- a/include/linux/can/dev/peak_canfd.h +++ b/include/linux/can/dev/peak_canfd.h @@ -189,7 +189,7 @@ struct __packed pucan_rx_msg { u8 client; __le16 flags; __le32 can_id; - u8 d[0]; + u8 d[]; }; /* uCAN error types */ @@ -266,7 +266,7 @@ struct __packed pucan_tx_msg { u8 client; __le16 flags; __le32 can_id; - u8 d[0]; + u8 d[]; }; /* build the cmd opcode_channel field with respect to the correct endianness */ From 1fa0949bede6de2b595da535c3ce69de8e130db2 Mon Sep 17 00:00:00 2001 From: "Gustavo A. R. Silva" Date: Mon, 23 Mar 2020 17:03:49 -0500 Subject: [PATCH 306/331] digsig.h: Replace zero-length array with flexible-array member The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. 
Silva
---
 include/linux/digsig.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/digsig.h b/include/linux/digsig.h
index 594fc66a395a..2ace69e41088 100644
--- a/include/linux/digsig.h
+++ b/include/linux/digsig.h
@@ -29,7 +29,7 @@ struct pubkey_hdr {
 	uint32_t timestamp; /* key made, always 0 for now */
 	uint8_t algo;
 	uint8_t nmpi;
-	char mpi[0];
+	char mpi[];
 } __packed;

 struct signature_hdr {
@@ -39,7 +39,7 @@ struct signature_hdr {
 	uint8_t hash;
 	uint8_t keyid[8];
 	uint8_t nmpi;
-	char mpi[0];
+	char mpi[];
 } __packed;

 #if defined(CONFIG_SIGNATURE) || defined(CONFIG_SIGNATURE_MODULE)

From a2008395fe2ebd9cd82f220d034d36cc887f35fe Mon Sep 17 00:00:00 2001
From: "Gustavo A. R. Silva"
Date: Mon, 23 Mar 2020 17:17:52 -0500
Subject: [PATCH 307/331] dirent.h: Replace zero-length array with flexible-array member

The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
	int stuff;
	struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva
---
 include/linux/dirent.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/dirent.h b/include/linux/dirent.h
index fc61f3cff72f..99002220cd45 100644
--- a/include/linux/dirent.h
+++ b/include/linux/dirent.h
@@ -7,7 +7,7 @@ struct linux_dirent64 {
 	s64 d_off;
 	unsigned short d_reclen;
 	unsigned char d_type;
-	char d_name[0];
+	char d_name[];
 };

 #endif

From 192199464d6cccb084356add54b3a48d6dde9f96 Mon Sep 17 00:00:00 2001
From: "Gustavo A. R. Silva"
Date: Mon, 23 Mar 2020 17:21:19 -0500
Subject: [PATCH 308/331] enclosure.h: Replace zero-length array with flexible-array member

The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
	int stuff;
	struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva
---
 include/linux/enclosure.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/enclosure.h b/include/linux/enclosure.h
index 564e96f625ff..1c630e2c2756 100644
--- a/include/linux/enclosure.h
+++ b/include/linux/enclosure.h
@@ -101,7 +101,7 @@ struct enclosure_device {
 	struct device edev;
 	struct enclosure_component_callbacks *cb;
 	int components;
-	struct enclosure_component component[0];
+	struct enclosure_component component[];
 };

 static inline struct enclosure_device *

From beb69f15a095245c5cc62389eea93002b41d2eb9 Mon Sep 17 00:00:00 2001
From: "Gustavo A. R. Silva"
Date: Mon, 23 Mar 2020 17:23:01 -0500
Subject: [PATCH 309/331] energy_model.h: Replace zero-length array with flexible-array member

The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
	int stuff;
	struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva
---
 include/linux/energy_model.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
index d249b88a4d5a..ade6486a3382 100644
--- a/include/linux/energy_model.h
+++ b/include/linux/energy_model.h
@@ -36,7 +36,7 @@ struct em_cap_state {
 struct em_perf_domain {
 	struct em_cap_state *table;
 	int nr_cap_states;
-	unsigned long cpus[0];
+	unsigned long cpus[];
 };

 #ifdef CONFIG_ENERGY_MODEL

From 5299a11a9378e8c68e3b8e2040f7aa7e401d50b7 Mon Sep 17 00:00:00 2001
From: "Gustavo A. R. Silva"
Date: Mon, 23 Mar 2020 17:24:53 -0500
Subject: [PATCH 310/331] ethtool.h: Replace zero-length array with flexible-array member

The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
	int stuff;
	struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva
---
 include/linux/ethtool.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index c1d379bf6ee1..a23b26eab479 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -35,7 +35,7 @@ struct compat_ethtool_rxnfc {
 	compat_u64 data;
 	struct compat_ethtool_rx_flow_spec fs;
 	u32 rule_cnt;
-	u32 rule_locs[0];
+	u32 rule_locs[];
 };

 #endif /* CONFIG_COMPAT */
@@ -462,7 +462,7 @@ int ethtool_check_ops(const struct ethtool_ops *ops);

 struct ethtool_rx_flow_rule {
 	struct flow_rule *rule;
-	unsigned long priv[0];
+	unsigned long priv[];
 };

 struct ethtool_rx_flow_spec_input {

From 89f60a5d9bf5a6b9b16dfdd56a91c4a2d7b8830d Mon Sep 17 00:00:00 2001
From: "Gustavo A. R. Silva"
Date: Mon, 23 Mar 2020 17:43:59 -0500
Subject: [PATCH 311/331] genalloc.h: Replace zero-length array with flexible-array member

The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
	int stuff;
	struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva
---
 include/linux/genalloc.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index 5b14a0f38124..0bd581003cd5 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -76,7 +76,7 @@ struct gen_pool_chunk {
 	void *owner; /* private data to retrieve at alloc time */
 	unsigned long start_addr; /* start address of memory chunk */
 	unsigned long end_addr; /* end address of memory chunk (inclusive) */
-	unsigned long bits[0]; /* bitmap for allocating memory chunk */
+	unsigned long bits[]; /* bitmap for allocating memory chunk */
 };

 /*

From 0ead33642f1df89699f2e4dda8eea59c326b68f6 Mon Sep 17 00:00:00 2001
From: "Gustavo A. R. Silva"
Date: Mon, 23 Mar 2020 17:59:00 -0500
Subject: [PATCH 312/331] igmp.h: Replace zero-length array with flexible-array member

The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
	int stuff;
	struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva
---
 include/linux/igmp.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/igmp.h b/include/linux/igmp.h
index 463047d0190b..faa6586a5783 100644
--- a/include/linux/igmp.h
+++ b/include/linux/igmp.h
@@ -38,7 +38,7 @@ struct ip_sf_socklist {
 	unsigned int sl_max;
 	unsigned int sl_count;
 	struct rcu_head rcu;
-	__be32 sl_addr[0];
+	__be32 sl_addr[];
 };

 #define IP_SFLSIZE(count) (sizeof(struct ip_sf_socklist) + \

From 1d9e13e8ef05029c61d52ad9a6f48f14771d14b7 Mon Sep 17 00:00:00 2001
From: "Gustavo A. R. Silva"
Date: Mon, 23 Mar 2020 18:00:04 -0500
Subject: [PATCH 313/331] ihex.h: Replace zero-length array with flexible-array member

The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
	int stuff;
	struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva
---
 include/linux/ihex.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/ihex.h b/include/linux/ihex.h
index 98cb5ce0b0a0..b824877e6d1b 100644
--- a/include/linux/ihex.h
+++ b/include/linux/ihex.h
@@ -18,7 +18,7 @@
 struct ihex_binrec {
 	__be32 addr;
 	__be16 len;
-	uint8_t data[0];
+	uint8_t data[];
 } __attribute__((packed));

 static inline uint16_t ihex_binrec_size(const struct ihex_binrec *p)

From 7856e9f12f1f59cc6abb25f92b336528d0660ebb Mon Sep 17 00:00:00 2001
From: "Gustavo A. R. Silva"
Date: Mon, 23 Mar 2020 18:01:11 -0500
Subject: [PATCH 314/331] irq.h: Replace zero-length array with flexible-array member

The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
	int stuff;
	struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva
---
 include/linux/irq.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/irq.h b/include/linux/irq.h
index 9315fbb87db3..fa8ad93029ad 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -1043,7 +1043,7 @@ struct irq_chip_generic {
 	unsigned long unused;
 	struct irq_domain *domain;
 	struct list_head list;
-	struct irq_chip_type chip_types[0];
+	struct irq_chip_type chip_types[];
 };

 /**
@@ -1079,7 +1079,7 @@ struct irq_domain_chip_generic {
 	unsigned int irq_flags_to_clear;
 	unsigned int irq_flags_to_set;
 	enum irq_gc_flags gc_flags;
-	struct irq_chip_generic *gc[0];
+	struct irq_chip_generic *gc[];
 };

 /* Generic chip callback functions */

From 312322722872324939f0d0347a6e41807c2d4c56 Mon Sep 17 00:00:00 2001
From: "Gustavo A. R. Silva"
Date: Mon, 23 Mar 2020 16:58:49 -0500
Subject: [PATCH 315/331] lib: cpu_rmap: Replace zero-length array with flexible-array member

The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
	int stuff;
	struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva
---
 include/linux/cpu_rmap.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/cpu_rmap.h b/include/linux/cpu_rmap.h
index 02edeafcb2bf..be8aea04d023 100644
--- a/include/linux/cpu_rmap.h
+++ b/include/linux/cpu_rmap.h
@@ -28,7 +28,7 @@ struct cpu_rmap {
 	struct {
 		u16 index;
 		u16 dist;
-	} near[0];
+	} near[];
 };
 #define CPU_RMAP_DIST_INF 0xffff

From 859b494111b196853fd8c1852c6b57ef33738b50 Mon Sep 17 00:00:00 2001
From: "Gustavo A. R. Silva"
Date: Mon, 23 Mar 2020 18:32:01 -0500
Subject: [PATCH 316/331] list_lru.h: Replace zero-length array with flexible-array member

The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
	int stuff;
	struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva
---
 include/linux/list_lru.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h
index d5ceb2839a2d..9dcaa3e582c9 100644
--- a/include/linux/list_lru.h
+++ b/include/linux/list_lru.h
@@ -34,7 +34,7 @@ struct list_lru_one {
 struct list_lru_memcg {
 	struct rcu_head rcu;
 	/* array of per cgroup lists, indexed by memcg_cache_id */
-	struct list_lru_one *lru[0];
+	struct list_lru_one *lru[];
 };

 struct list_lru_node {

From 307ed94c37f842676d336cf5f2162022f4d7cdc4 Mon Sep 17 00:00:00 2001
From: "Gustavo A. R. Silva"
Date: Mon, 23 Mar 2020 18:36:10 -0500
Subject: [PATCH 317/331] memcontrol.h: Replace zero-length array with flexible-array member

The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
	int stuff;
	struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva
---
 include/linux/memcontrol.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 1b4150ff64be..d275c72c4f8e 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -106,7 +106,7 @@ struct lruvec_stat {
  */
 struct memcg_shrinker_map {
 	struct rcu_head rcu;
-	unsigned long map[0];
+	unsigned long map[];
 };

 /*
@@ -148,7 +148,7 @@ struct mem_cgroup_threshold_ary {
 	/* Size of entries[] */
 	unsigned int size;
 	/* Array of thresholds */
-	struct mem_cgroup_threshold entries[0];
+	struct mem_cgroup_threshold entries[];
 };

 struct mem_cgroup_thresholds {

From 1223f3db71ba7bbcf2e77c7a5d4f440c2a2fa9c3 Mon Sep 17 00:00:00 2001
From: "Gustavo A. R. Silva"
Date: Mon, 23 Mar 2020 19:07:49 -0500
Subject: [PATCH 318/331] platform_data: wilco-ec.h: Replace zero-length array with flexible-array member

The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
	int stuff;
	struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva
---
 include/linux/platform_data/wilco-ec.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/platform_data/wilco-ec.h b/include/linux/platform_data/wilco-ec.h
index 25f46a939637..3e268e636b5b 100644
--- a/include/linux/platform_data/wilco-ec.h
+++ b/include/linux/platform_data/wilco-ec.h
@@ -83,7 +83,7 @@ struct wilco_ec_response {
 	u16 result;
 	u16 data_size;
 	u8 reserved[2];
-	u8 data[0];
+	u8 data[];
 } __packed;

 /**

From 70f1451ec98ee43d2c66d2caa5ae6935ee97f90a Mon Sep 17 00:00:00 2001
From: "Gustavo A. R. Silva"
Date: Mon, 23 Mar 2020 19:08:58 -0500
Subject: [PATCH 319/331] posix_acl.h: Replace zero-length array with flexible-array member

The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
	int stuff;
	struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva
---
 include/linux/posix_acl.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/posix_acl.h b/include/linux/posix_acl.h
index 540595a321a7..90797f1b421d 100644
--- a/include/linux/posix_acl.h
+++ b/include/linux/posix_acl.h
@@ -28,7 +28,7 @@ struct posix_acl {
 	refcount_t a_refcount;
 	struct rcu_head a_rcu;
 	unsigned int a_count;
-	struct posix_acl_entry a_entries[0];
+	struct posix_acl_entry a_entries[];
 };

 #define FOREACH_ACL_ENTRY(pa, acl, pe) \

From a1c4b9247ddfb62fe3a23eb53d250382e82fae77 Mon Sep 17 00:00:00 2001
From: "Gustavo A. R. Silva"
Date: Mon, 23 Mar 2020 19:12:17 -0500
Subject: [PATCH 320/331] rio.h: Replace zero-length array with flexible-array member

The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
	int stuff;
	struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva
---
 include/linux/rio.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/rio.h b/include/linux/rio.h
index 317bace5ac64..2cd637268b4f 100644
--- a/include/linux/rio.h
+++ b/include/linux/rio.h
@@ -100,7 +100,7 @@ struct rio_switch {
 	u32 port_ok;
 	struct rio_switch_ops *ops;
 	spinlock_t lock;
-	struct rio_dev *nextdev[0];
+	struct rio_dev *nextdev[];
 };

 /**
@@ -201,7 +201,7 @@ struct rio_dev {
 	u8 hopcount;
 	struct rio_dev *prev;
 	atomic_t state;
-	struct rio_switch rswitch[0]; /* RIO switch info */
+	struct rio_switch rswitch[]; /* RIO switch info */
 };

 #define rio_dev_g(n) list_entry(n, struct rio_dev, global_list)

From 9dd8bb5f8c449e87cc0084a118673c6d4182bab2 Mon Sep 17 00:00:00 2001
From: "Gustavo A. R. Silva"
Date: Mon, 23 Mar 2020 19:13:20 -0500
Subject: [PATCH 321/331] rslib.h: Replace zero-length array with flexible-array member

The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
	int stuff;
	struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva
---
 include/linux/rslib.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/rslib.h b/include/linux/rslib.h
index 5974cedd008c..238bb85243d3 100644
--- a/include/linux/rslib.h
+++ b/include/linux/rslib.h
@@ -54,7 +54,7 @@ struct rs_codec {
  */
 struct rs_control {
 	struct rs_codec *codec;
-	uint16_t buffers[0];
+	uint16_t buffers[];
 };

 /* General purpose RS codec, 8-bit data width, symbol width 1-15 bit */

From fe946db6ca851a0cd8c2f9c9dd96ef74e051cf2f Mon Sep 17 00:00:00 2001
From: "Gustavo A. R. Silva"
Date: Mon, 23 Mar 2020 19:14:37 -0500
Subject: [PATCH 322/331] sched: topology.h: Replace zero-length array with flexible-array member

The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
	int stuff;
	struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva
---
 include/linux/sched/topology.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index af9319e4cfb9..95253ad792b0 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -142,7 +142,7 @@ struct sched_domain {
 	 * by attaching extra space to the end of the structure,
 	 * depending on how many CPUs the kernel has booted up with)
 	 */
-	unsigned long span[0];
+	unsigned long span[];
 };

 static inline struct cpumask *sched_domain_span(struct sched_domain *sd)

From 5c91aa1df00ec4fa283c35e92736392df3137d81 Mon Sep 17 00:00:00 2001
From: "Gustavo A. R. Silva"
Date: Mon, 23 Mar 2020 19:22:24 -0500
Subject: [PATCH 323/331] skbuff.h: Replace zero-length array with flexible-array member

The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
	int stuff;
	struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva
---
 include/linux/skbuff.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 3a2ac7072dbb..3000c526f552 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -4162,7 +4162,7 @@ struct skb_ext {
 	refcount_t refcnt;
 	u8 offset[SKB_EXT_NUM]; /* in chunks of 8 bytes */
 	u8 chunks; /* same */
-	char data[0] __aligned(8);
+	char data[] __aligned(8);
 };

 struct skb_ext *__skb_ext_alloc(void);

From 16c3380f8c2e7ed3d75a30776a89aabf5512027a Mon Sep 17 00:00:00 2001
From: "Gustavo A. R. Silva"
Date: Mon, 23 Mar 2020 19:23:10 -0500
Subject: [PATCH 324/331] swap.h: Replace zero-length array with flexible-array member

The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
	int stuff;
	struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva
---
 include/linux/swap.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index b835d8dbea0e..e1bbf7a16b27 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -275,7 +275,7 @@ struct swap_info_struct {
 	 */
 	struct work_struct discard_work; /* discard worker */
 	struct swap_cluster_list discard_clusters; /* discard clusters list */
-	struct plist_node avail_lists[0]; /*
+	struct plist_node avail_lists[]; /*
 	 * entries in swap_avail_heads, one
 	 * entry per node.
 	 * Must be last as the number of the

From 4ea19ecf322c2f98ef87fc980b3851625b082ac2 Mon Sep 17 00:00:00 2001
From: "Gustavo A. R. Silva"
Date: Mon, 23 Mar 2020 19:25:06 -0500
Subject: [PATCH 325/331] ti_wilink_st.h: Replace zero-length array with flexible-array member

The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
	int stuff;
	struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva
---
 include/linux/ti_wilink_st.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/linux/ti_wilink_st.h b/include/linux/ti_wilink_st.h
index eb6cbdf10e50..44a7f9169ac6 100644
--- a/include/linux/ti_wilink_st.h
+++ b/include/linux/ti_wilink_st.h
@@ -295,7 +295,7 @@ struct bts_header {
 	u32 magic;
 	u32 version;
 	u8 future[24];
-	u8 actions[0];
+	u8 actions[];
 } __attribute__ ((packed));

 /**
@@ -305,7 +305,7 @@ struct bts_header {
 struct bts_action {
 	u16 type;
 	u16 size;
-	u8 data[0];
+	u8 data[];
 } __attribute__ ((packed));

 struct bts_action_send {
@@ -315,7 +315,7 @@ struct bts_action_send {
 struct bts_action_wait {
 	u32 msec;
 	u32 size;
-	u8 data[0];
+	u8 data[];
 } __attribute__ ((packed));

 struct bts_action_delay {

From 06ccf63da5d8e90e4dff8b741972a9b279b5bf4c Mon Sep 17 00:00:00 2001
From: "Gustavo A. R. Silva"
Date: Mon, 23 Mar 2020 19:38:18 -0500
Subject: [PATCH 326/331] tpm_eventlog.h: Replace zero-length array with flexible-array member

The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
	int stuff;
	struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva
---
 include/linux/tpm_eventlog.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/linux/tpm_eventlog.h b/include/linux/tpm_eventlog.h
index 131ea1bad458..c253461b1c4e 100644
--- a/include/linux/tpm_eventlog.h
+++ b/include/linux/tpm_eventlog.h
@@ -28,7 +28,7 @@ struct tcpa_event {
 	u32 event_type;
 	u8 pcr_value[20]; /* SHA1 */
 	u32 event_size;
-	u8 event_data[0];
+	u8 event_data[];
 };

 enum tcpa_event_types {
@@ -55,7 +55,7 @@ enum tcpa_event_types {
 struct tcpa_pc_event {
 	u32 event_id;
 	u32 event_size;
-	u8 event_data[0];
+	u8 event_data[];
 };

 enum tcpa_pc_event_ids {
@@ -102,7 +102,7 @@ struct tcg_pcr_event {

 struct tcg_event_field {
 	u32 event_size;
-	u8 event[0];
+	u8 event[];
 } __packed;

 struct tcg_pcr_event2_head {

From d6cdad870358128c1e753e6258e295ab8a5a2429 Mon Sep 17 00:00:00 2001
From: "Gustavo A. R. Silva"
Date: Mon, 23 Mar 2020 19:51:46 -0500
Subject: [PATCH 327/331] uapi: linux: dlm_device.h: Replace zero-length array with flexible-array member

The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
	int stuff;
	struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva
---
 include/uapi/linux/dlm_device.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/dlm_device.h b/include/uapi/linux/dlm_device.h
index f880d2831160..e83954c69fff 100644
--- a/include/uapi/linux/dlm_device.h
+++ b/include/uapi/linux/dlm_device.h
@@ -45,13 +45,13 @@ struct dlm_lock_params {
 	void __user *bastaddr;
 	struct dlm_lksb __user *lksb;
 	char lvb[DLM_USER_LVB_LEN];
-	char name[0];
+	char name[];
 };

 struct dlm_lspace_params {
 	__u32 flags;
 	__u32 minor;
-	char name[0];
+	char name[];
 };

 struct dlm_purge_params {

From 6e88abb862898f55d083071e4423000983dcfe63 Mon Sep 17 00:00:00 2001
From: "Gustavo A. R. Silva"
Date: Mon, 23 Mar 2020 21:30:22 -0500
Subject: [PATCH 328/331] uapi: linux: fiemap.h: Replace zero-length array with flexible-array member

The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
	int stuff;
	struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied.
As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva --- include/uapi/linux/fiemap.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/uapi/linux/fiemap.h b/include/uapi/linux/fiemap.h index 8c0bc24d5d95..7a900b2377b6 100644 --- a/include/uapi/linux/fiemap.h +++ b/include/uapi/linux/fiemap.h @@ -34,7 +34,7 @@ struct fiemap { __u32 fm_mapped_extents;/* number of extents that were mapped (out) */ __u32 fm_extent_count; /* size of fm_extents array (in) */ __u32 fm_reserved; - struct fiemap_extent fm_extents[0]; /* array of mapped extents (out) */ + struct fiemap_extent fm_extents[]; /* array of mapped extents (out) */ }; #define FIEMAP_MAX_OFFSET (~0ULL) From 43951585e1308b322c8ee31a4aafd08213f5c5d7 Mon Sep 17 00:00:00 2001 From: "Gustavo A. R. Silva" Date: Mon, 23 Mar 2020 19:41:14 -0500 Subject: [PATCH 329/331] xattr.h: Replace zero-length array with flexible-array member The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. 
As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva --- include/linux/xattr.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/xattr.h b/include/linux/xattr.h index 4cf6e11f4a3c..47eaa34f8761 100644 --- a/include/linux/xattr.h +++ b/include/linux/xattr.h @@ -73,7 +73,7 @@ struct simple_xattr { struct list_head list; char *name; size_t size; - char value[0]; + char value[]; }; /* From dadbd85f2afc8ccd1dd1f0131781c740c91edd96 Mon Sep 17 00:00:00 2001 From: Brian Geffon Date: Fri, 17 Apr 2020 10:25:56 -0700 Subject: [PATCH 330/331] mm: Fix MREMAP_DONTUNMAP accounting on VMA merge When remapping a mapping where a portion of a VMA is remapped into another portion of the VMA it can cause the VMA to become split. During the copy_vma operation the VMA can actually be remerged if it's an anonymous VMA whose pages have not yet been faulted. This isn't normally a problem because at the end of the remap the original portion is unmapped causing it to become split again. However, MREMAP_DONTUNMAP leaves that original portion in place which means that the VMA which was split and then remerged is not actually split at the end of the mremap. This patch fixes a bug where we don't detect that the VMAs got remerged and we end up putting back VM_ACCOUNT on the next mapping which is completely unrelated. When that next mapping is unmapped it results in incorrectly unaccounting for the memory which was never accounted, and eventually we will underflow on the memory commitment. There is also another, similar issue: we're currently accounting for the number of pages in the new_vma but that's wrong.
We need to account for the length of the remap operation as that's all that is being added. If there was a mapping already at that location its commitment would have been adjusted as part of the munmap at the start of the mremap. A really simple repro can be seen in: https://gist.github.com/bgaff/e101ce99da7d9a8c60acc641d07f312c Fixes: e346b3813067 ("mm/mremap: add MREMAP_DONTUNMAP to mremap()") Reported-by: syzbot Signed-off-by: Brian Geffon Signed-off-by: Linus Torvalds --- mm/mremap.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/mm/mremap.c b/mm/mremap.c index a7e282ead438..c881abeba0bf 100644 --- a/mm/mremap.c +++ b/mm/mremap.c @@ -413,9 +413,20 @@ static unsigned long move_vma(struct vm_area_struct *vma, /* Always put back VM_ACCOUNT since we won't unmap */ vma->vm_flags |= VM_ACCOUNT; - vm_acct_memory(vma_pages(new_vma)); + vm_acct_memory(new_len >> PAGE_SHIFT); } + /* + * VMAs can actually be merged back together in copy_vma + * calling merge_vma. This can happen with anonymous vmas + * which have not yet been faulted, so if we were to consider + * this VMA split we'll end up adding VM_ACCOUNT on the + * next VMA, which is completely unrelated if this VMA + * was re-merged. + */ + if (split && new_vma == vma) + split = 0; + /* We always clear VM_LOCKED[ONFAULT] on the old vma */ vma->vm_flags &= VM_LOCKED_CLEAR_MASK; From ae83d0b416db002fe95601e7f97f64b59514d936 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Sun, 19 Apr 2020 14:35:30 -0700 Subject: [PATCH 331/331] Linux 5.7-rc2 --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index 70def4907036..49b2709ff44e 100644 --- a/Makefile +++ b/Makefile @@ -2,7 +2,7 @@ VERSION = 5 PATCHLEVEL = 7 SUBLEVEL = 0 -EXTRAVERSION = -rc1 +EXTRAVERSION = -rc2 NAME = Kleptomaniac Octopus # *DOCUMENTATION*