Go to file
John Fastabend e9db4ef6bf bpf: sockhash fix omitted bucket lock in sock_close
First the sk_callback_lock() was being used to protect both the
sock callback hooks and the psock->maps list. This got overly
convoluted after the addition of sockhash (in sockmap it made
some sense because masp and callbacks were tightly coupled) so
lets split out a specific lock for maps and only use the callback
lock for its intended purpose. This fixes a couple cases where
we missed using maps lock when it was in fact needed. Also this
makes it easier to follow the code because now we can put the
locking closer to the actual code its serializing.

Next, in sock_hash_delete_elem() the pattern was as follows,

  sock_hash_delete_elem()
     [...]
     spin_lock(bucket_lock)
     l = lookup_elem_raw()
     if (l)
        hlist_del_rcu()
        write_lock(sk_callback_lock)
         .... destroy psock ...
        write_unlock(sk_callback_lock)
     spin_unlock(bucket_lock)

The ordering is necessary because we only know the {p}sock after
dereferencing the hash table which we can't do unless we have the
bucket lock held. Once we have the bucket lock and the psock element
it is deleted from the hashmap to ensure any other path doing a lookup
will fail. Finally, the refcnt is decremented and if zero the psock
is destroyed.

In parallel with the above (or free'ing the map) a tcp close event
may trigger tcp_close(). Which at the moment omits the bucket lock
altogether (oops!) where the flow looks like this,

  bpf_tcp_close()
     [...]
     write_lock(sk_callback_lock)
     for each psock->maps // list of maps this sock is part of
         hlist_del_rcu(ref_hash_node);
         .... destroy psock ...
     write_unlock(sk_callback_lock)

Obviously, and demonstrated by syzbot, this is broken because
we can have multiple threads deleting entries via hlist_del_rcu().

To fix this we might be tempted to wrap the hlist operation in a
bucket lock but that would create a lock inversion problem. In
summary to follow locking rules the psocks maps list needs the
sk_callback_lock (after this patch maps_lock) but we need the bucket
lock to do the hlist_del_rcu.

To resolve the lock inversion problem pop the head of the maps list
repeatedly and remove the reference until no more are left. If a
delete happens in parallel from the BPF API that is OK as well because
it will do a similar action, lookup the lock in the map/hash, delete
it from the map/hash, and dec the refcnt. We check for this case
before doing a destroy on the psock to ensure we don't have two
threads tearing down a psock. The new logic is as follows,

  bpf_tcp_close()
  e = psock_map_pop(psock->maps) // done with map lock
  bucket_lock() // lock hash list bucket
  l = lookup_elem_raw(head, hash, key, key_size);
  if (l) {
     //only get here if elmnt was not already removed
     hlist_del_rcu()
     ... destroy psock...
  }
  bucket_unlock()

And finally for all the above to work add missing locking around  map
operations per above. Then add RCU annotations and use
rcu_dereference/rcu_assign_pointer to manage values relying on RCU so
that the object is not free'd from sock_hash_free() while it is being
referenced in bpf_tcp_close().

Reported-by: syzbot+0ce137753c78f7b6acc1@syzkaller.appspotmail.com
Fixes: 8111038444 ("bpf: sockmap, add hash map support")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-01 01:21:32 +02:00
Documentation Move all the dma-mapping code to kernel/dma 2018-06-20 16:30:01 +09:00
LICENSES LICENSES: Add Linux-OpenIB license text 2018-04-27 16:41:53 -06:00
arch bpf, s390: fix potential memleak when later bpf_jit_prog fails 2018-06-29 10:47:35 -07:00
block for-linus-20180616 2018-06-17 05:37:55 +09:00
certs docs: Fix some broken references 2018-06-15 18:10:01 -03:00
crypto docs: Fix some broken references 2018-06-15 18:10:01 -03:00
drivers bpf: fix attach type BPF_LIRC_MODE2 dependency wrt CONFIG_CGROUP_BPF 2018-06-26 11:28:38 +02:00
firmware kbuild: remove all dummy assignments to obj- 2017-11-18 11:46:06 +09:00
fs proc: fix missing final NUL in get_mm_cmdline() rewrite 2018-06-20 15:38:28 +09:00
include bpf: undo prog rejection on read-only lock failure 2018-06-29 10:47:35 -07:00
init dma-mapping: move all DMA mapping code to kernel/dma 2018-06-14 08:50:37 +02:00
ipc ipc: use new return type vm_fault_t 2018-06-15 07:55:25 +09:00
kernel bpf: sockhash fix omitted bucket lock in sock_close 2018-07-01 01:21:32 +02:00
lib test_bpf: flag tests that cannot be jited on s390 2018-06-28 23:58:39 +02:00
mm revert "mm/memblock: add missing include <linux/bootmem.h>" 2018-06-19 07:43:44 +09:00
net bpf: Change bpf_fib_lookup to return lookup status 2018-06-29 00:02:02 +02:00
samples bpf: Change bpf_fib_lookup to return lookup status 2018-06-29 00:02:02 +02:00
scripts scripts/documentation-file-ref-check: check tools/*/Documentation 2018-06-15 18:10:01 -03:00
security docs: Fix some broken references 2018-06-15 18:10:01 -03:00
sound docs: Fix some broken references 2018-06-15 18:10:01 -03:00
tools selftests: bpf: notification about privilege required to run test_lwt_seg6local.sh testing script 2018-06-26 12:16:36 +02:00
usr kbuild: rename built-in.o to built-in.a 2018-03-26 02:01:19 +09:00
virt - Error path bug fix for overflow tests (Dan) 2018-06-12 18:28:00 -07:00
.clang-format clang-format: add configuration file 2018-04-11 10:28:35 -07:00
.cocciconfig scripts: add Linux .cocciconfig for coccinelle 2016-07-22 12:13:39 +02:00
.get_maintainer.ignore Add hch to .get_maintainer.ignore 2015-08-21 14:30:10 -07:00
.gitattributes .gitattributes: set git diff driver for C source code files 2016-10-07 18:46:30 -07:00
.gitignore Kbuild updates for v4.17 (2nd) 2018-04-15 17:21:30 -07:00
.mailmap Merge branch 'asoc-4.17' into asoc-4.18 for compress dependencies 2018-04-26 12:24:28 +01:00
COPYING COPYING: use the new text with points to the license files 2018-03-23 12:41:45 -06:00
CREDITS MAINTAINERS/CREDITS: Drop METAG ARCHITECTURE 2018-03-05 16:34:24 +00:00
Kbuild Kbuild updates for v4.15 2017-11-17 17:45:29 -08:00
Kconfig kconfig: add basic helper macros to scripts/Kconfig.include 2018-05-29 03:31:19 +09:00
MAINTAINERS Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-06-21 07:13:42 +09:00
Makefile Linux 4.18-rc1 2018-06-17 08:04:49 +09:00
README Docs: Added a pointer to the formatted docs to README 2018-03-21 09:02:53 -06:00

README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.
See Documentation/00-INDEX for a list of what is contained in each file.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.