linux/net
John Fastabend 58f45daa2d bpf, sockmap: Fix partial copy_page_to_iter so progress can still be made
[ Upstream commit c9c89dcd87 ]

If copy_page_to_iter() fails, or completes only partially with fewer
bytes copied than expected, we currently reset sg.start and return EFAULT.
This is a problem if we already copied data into the user buffer before
returning the error: we leave the copied data in the user buffer and fail
to unwind the scatterlist, so the kernel side believes the data has been
copied while the user side believes it has _not_ been received.

Expected behavior should be to return the number of bytes copied and then,
on the next read, return the error, assuming it's still there. This can
happen if we have a copy length spanning multiple scatterlist elements
and one or more complete before the error is hit.
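
A minimal user-space sketch of the expected behavior described above, not
the kernel code itself. The names struct elem, copy_some() and recv_elems()
are hypothetical stand-ins for a scatterlist element, copy_page_to_iter()
and the receive loop; the budget parameter is an assumption used only to
simulate a fault partway through the user buffer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one scatterlist element. */
struct elem {
	const char *data;
	size_t len;
};

/*
 * Hypothetical stand-in for copy_page_to_iter(): copies at most len bytes,
 * but stops once *budget bytes have been written, which simulates a fault
 * in the destination buffer. Returns the number of bytes actually copied.
 */
static size_t copy_some(char *dst, const char *src, size_t len, size_t *budget)
{
	size_t n = len < *budget ? len : *budget;

	memcpy(dst, src, n);
	*budget -= n;
	return n;
}

/*
 * Walk the elements starting at *start. On a short copy, keep the consumed
 * bytes accounted for and report the progress made so far; return -EFAULT
 * only when nothing at all was copied in this call.
 */
static long recv_elems(struct elem *e, size_t nelem, size_t *start,
		       char *dst, size_t len, size_t budget)
{
	size_t copied = 0;

	assert(e && start && dst);

	while (*start < nelem && copied < len) {
		struct elem *cur = &e[*start];
		size_t want = cur->len < len - copied ? cur->len : len - copied;
		size_t got = copy_some(dst + copied, cur->data, want, &budget);

		/* Consume exactly what reached the destination buffer. */
		copied += got;
		cur->data += got;
		cur->len -= got;

		if (got != want)
			return copied ? (long)copied : -EFAULT;
		if (cur->len == 0)
			(*start)++;
	}
	return (long)copied;
}
```

With two 4-byte elements and a fault after 6 bytes, the first call reports
6 bytes, a second call that still faults immediately reports -EFAULT, and
a later call without the fault delivers the remaining 2 bytes.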

The error is rare enough, though, that in my normal testing with
server-side programs such as nginx, httpd, envoy, etc., I have never seen
it. The only reliable way to reproduce it that I've found is to stream
movies in my browser for a day or so and wait for it to hang. Not very
scientific, but with a few extra WARN_ON()s in the code the bug was obvious.

When we review the errors from copy_page_to_iter(), it seems we are hitting
a page fault from copy_page_to_iter_iovec(), where the code checks
fault_in_pages_writeable(buf, copy) and buf is the user buffer. It
also seems typical server applications don't hit this case.

The other way to try to reproduce this is to run the sockmap selftest tool
test_sockmap with data verification enabled, but it doesn't reproduce the
fault. Perhaps we can trigger this case artificially somehow from the
test tools. I haven't sorted out a way to do that yet, though.

Fixes: 604326b41a ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/160556566659.73229.15694973114605301063.stgit@john-XPS-13-9370
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-11-24 13:29:08 +01:00
..
6lowpan
9p net: 9p: initialize sun_server.sun_path to have addr's value only when addr is valid 2020-11-05 11:43:20 +01:00
802
8021q
appletalk
atm
ax25
batman-adv
bluetooth Bluetooth: Only mark socket zapped after unlocking 2020-10-29 09:58:06 +01:00
bpf
bpfilter
bridge net: bridge: add missing counters to ndo_get_stats64 callback 2020-11-24 13:28:57 +01:00
caif
can can: af_can: prevent potential access of uninitialized member in canfd_rcv() 2020-11-24 13:29:05 +01:00
ceph libceph: clear con->out_msg on Policy::stateful_server faults 2020-11-05 11:43:34 +01:00
core net: Have netpoll bring-up DSA management interface 2020-11-24 13:28:57 +01:00
dcb
dccp
decnet
dns_resolver
dsa
ethernet
hsr
ieee802154
ife
ipv4 bpf, sockmap: Fix partial copy_page_to_iter so progress can still be made 2020-11-24 13:29:08 +01:00
ipv6 ipv6: Fix error path to cancel the meseage 2020-11-24 13:28:56 +01:00
iucv net/af_iucv: fix null pointer dereference on shutdown 2020-11-18 19:20:32 +01:00
kcm
key
l2tp
l3mdev
lapb
llc
mac80211 mac80211: always wind down STA state 2020-11-22 10:14:12 +01:00
mac802154
mpls
ncsi net/ncsi: Fix netlink registration 2020-11-24 13:29:00 +01:00
netfilter netfilter: ipset: Update byte and packet counters regardless of whether they match 2020-11-18 19:20:17 +01:00
netlabel netlabel: fix an uninitialized warning in netlbl_unlabel_staticlist() 2020-11-24 13:28:57 +01:00
netlink
netrom
nfc
nsh
openvswitch
packet
phonet
psample
qrtr
rds
rfkill rfkill: Fix use-after-free in rfkill_resume() 2020-11-24 13:29:05 +01:00
rose
rxrpc
sched net: sch_generic: fix the missing new qdisc assignment bug 2020-11-18 19:20:33 +01:00
sctp sctp: change to hold/put transport for proto_unreach_timer 2020-11-24 13:28:59 +01:00
smc net/smc: fix direct access to ib_gid_addr->ndev in smc_ib_determine_gid() 2020-11-24 13:28:58 +01:00
strparser
sunrpc SUNRPC: Mitigate cond_resched() in xprt_transmit() 2020-11-05 11:43:18 +01:00
switchdev
tipc tipc: fix memory leak in tipc_topsrv_start() 2020-11-18 19:20:33 +01:00
tls net/tls: fix corrupted data in recvmsg 2020-11-24 13:28:58 +01:00
unix
vmw_vsock vsock: use ns_capable_noaudit() on socket create 2020-11-10 12:37:30 +01:00
wimax
wireless cfg80211: regulatory: Fix inconsistent format argument 2020-11-18 19:20:23 +01:00
x25 net: x25: Increase refcnt of "struct x25_neigh" in x25_rx_call_request 2020-11-24 13:28:58 +01:00
xdp
xfrm net: xfrm: fix a race condition during allocing spi 2020-11-18 19:20:17 +01:00
Kconfig
Makefile
compat.c
socket.c
sysctl_net.c