linux/net/core
Daniel Borkmann 36bbef52c7 bpf: direct packet write and access for helpers for clsact progs
This work implements direct packet access for helpers and direct packet
write in a similar fashion as already available for XDP types via commits
4acf6c0b84 ("bpf: enable direct packet data write for xdp progs") and
6841de8b0d ("bpf: allow helpers access the packet directly"), and as a
complementary feature to the already available direct packet read for tc
(cls/act) programs.

For enabling this, we need to introduce two helpers, bpf_skb_pull_data()
and bpf_csum_update(). The first is generally needed for both, read and
write, because they would otherwise only be limited to the current linear
skb head. Usually, when the data_end test fails, programs just bail out,
or, in the direct read case, use bpf_skb_load_bytes() as an alternative
to overcome this limitation. If such data sits in non-linear parts, we
can just pull them in once with the new helper, retest and eventually
access them.

At the same time, this also makes sure the skb is uncloned, which is, of
course, a necessary condition for direct write. As this needs to be an
invariant for the write part only, the verifier detects writes and adds
a prologue that is calling bpf_skb_pull_data() to effectively unclone the
skb from the very beginning in case it is indeed cloned. The heuristic
makes use of a similar trick that was done in 233577a220 ("net: filter:
constify detection of pkt_type_offset"). This comes at zero cost for other
programs that do not use the direct write feature. Should a program use
this feature only sparsely and has read access for the most parts with,
for example, drop return codes, then such write action can be delegated
to a tail called program for mitigating this cost of potential uncloning
to a late point in time where it would have been paid similarly with the
bpf_skb_store_bytes() as well. Advantage of direct write is that the
writes are inlined whereas the helper cannot make any length assumptions
and thus needs to generate a call to memcpy() also for small sizes, as well
as cost of helper call itself with sanity checks are avoided. Plus, when
direct read is already used, we don't need to cache or perform rechecks
on the data boundaries (due to verifier invalidating previous checks for
helpers that change skb->data), so more complex programs using rewrites
can benefit from switching to direct read plus write.

For direct packet access to helpers, we save the otherwise needed copy into
a temp struct sitting on stack memory when use-case allows. Both facilities
are enabled via may_access_direct_pkt_data() in verifier. For now, we limit
this to map helpers and csum_diff, and can successively enable other helpers
where we find it makes sense. Helpers that definitely cannot be allowed for
this are those part of bpf_helper_changes_skb_data() since they can change
underlying data, and those that write into memory as this could happen for
packet typed args when still cloned. bpf_csum_update() helper accommodates
for the fact that we need to fixup checksum_complete when using direct write
instead of bpf_skb_store_bytes(), meaning the programs can use available
helpers like bpf_csum_diff(), and implement csum_add(), csum_sub(),
csum_block_add(), csum_block_sub() equivalents in eBPF together with the
new helper. A usage example will be provided for iproute2's examples/bpf/
directory.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-20 23:32:11 -04:00
..
Makefile net: add a hardware buffer management helper API 2016-03-14 12:19:46 -04:00
datagram.c udp: enable MSG_PEEK at non-zero offset 2016-04-05 16:29:37 -04:00
dev.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-09-12 15:52:44 -07:00
dev_addr_lists.c
dev_ioctl.c
devlink.c devlink: add hardware messages tracing facility 2016-07-12 14:20:18 -07:00
drop_monitor.c drop_monitor: make genl_multicast_group const 2016-09-01 14:09:00 -07:00
dst.c net: add dst_cache to ovs vxlan lwtunnel 2016-02-16 20:21:48 -05:00
dst_cache.c net: dst_cache_per_cpu_dst_set() can be static 2016-03-18 17:45:08 -04:00
ethtool.c sctp: Add GSO support 2016-06-03 19:37:21 -04:00
fib_rules.c fib_rules: Added NLM_F_EXCL support to fib_nl_newrule 2016-06-30 08:23:19 -04:00
filter.c bpf: direct packet write and access for helpers for clsact progs 2016-09-20 23:32:11 -04:00
flow.c flowcache: Avoid OOM condition under preasure 2016-03-17 10:28:42 +01:00
flow_dissector.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-09-12 15:52:44 -07:00
gen_estimator.c net: sched: do not acquire qdisc spinlock in qdisc/class stats dump 2016-06-07 16:37:14 -07:00
gen_stats.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-06-10 11:52:24 -07:00
hwbm.c net: hwbm: Fix unbalanced spinlock in error case 2016-05-25 12:35:09 -07:00
link_watch.c
lwtunnel.c net: lwtunnel: Handle fragmentation 2016-08-30 22:27:18 -07:00
neighbour.c neigh: allow admin to set NUD_STALE 2016-08-08 15:36:38 -07:00
net-procfs.c net: remove NETDEV_TX_LOCKED support 2016-04-26 15:53:05 -04:00
net-sysfs.c net: introduce NETDEV_CHANGE_TX_QUEUE_LEN 2016-07-01 05:32:17 -04:00
net-sysfs.h
net-traces.c
net_namespace.c netns: avoid disabling irq for netns id 2016-09-04 11:39:59 -07:00
netclassid_cgroup.c core: remove unneded headers for net cgroup controllers. 2016-02-17 15:31:27 -05:00
netevent.c
netpoll.c net: tracepoint napi:napi_poll add work and budget 2016-07-09 18:05:02 -04:00
netprio_cgroup.c core: remove unneded headers for net cgroup controllers. 2016-02-17 15:31:27 -05:00
pktgen.c net: pktgen: support injecting packets for qdisc testing 2016-07-04 16:07:34 -07:00
ptp_classifier.c
request_sock.c
rtnetlink.c net: core: Add offload stats to if_stats_msg 2016-09-18 22:33:42 -04:00
scm.c unix: correctly track in-flight fds in sending process user_struct 2016-02-08 10:30:42 -05:00
secure_seq.c
skbuff.c gso: Support partial splitting at the frag_list pointer 2016-09-19 20:59:34 -04:00
sock.c net: remove clear_sk() method 2016-08-23 23:25:29 -07:00
sock_diag.c sock_diag: align nlattr properly when needed 2016-04-26 12:00:48 -04:00
sock_reuseport.c soreuseport: fix NULL ptr dereference SO_REUSEPORT after bind 2016-01-19 14:44:23 -05:00
stream.c
sysctl_net_core.c bpf: add generic constant blinding for use in jits 2016-05-16 13:49:32 -04:00
timestamping.c
tso.c
utils.c net: the space is required before the open parenthesis '(' 2016-06-29 05:15:14 -04:00