linux

Author	SHA1	Message	Date
Alexei Starovoitov	7ae457c1e5	net: filter: split 'struct sk_filter' into socket and bpf parts clean up names related to socket filtering and bpf in the following way: - everything that deals with sockets keeps 'sk_' prefix - everything that is pure BPF is changed to 'bpf_' prefix split 'struct sk_filter' into struct sk_filter { atomic_t refcnt; struct rcu_head rcu; struct bpf_prog prog; }; and struct bpf_prog { u32 jited:1, len:31; struct sock_fprog_kern orig_prog; unsigned int (bpf_func)(const struct sk_buff skb, const struct bpf_insn filter); union { struct sock_filter insns[0]; struct bpf_insn insnsi[0]; struct work_struct work; }; }; so that 'struct bpf_prog' can be used independent of sockets and cleans up 'unattached' bpf use cases split SK_RUN_FILTER macro into: SK_RUN_FILTER to be used with 'struct sk_filter ' and BPF_PROG_RUN to be used with 'struct bpf_prog ' __sk_filter_release(struct sk_filter ) gains __bpf_prog_release(struct bpf_prog ) helper function also perform related renames for the functions that work with 'struct bpf_prog ', since they're on the same lines: sk_filter_size -> bpf_prog_size sk_filter_select_runtime -> bpf_prog_select_runtime sk_filter_free -> bpf_prog_free sk_unattached_filter_create -> bpf_prog_create sk_unattached_filter_destroy -> bpf_prog_destroy sk_store_orig_filter -> bpf_prog_store_orig_filter sk_release_orig_filter -> bpf_release_orig_filter __sk_migrate_filter -> bpf_migrate_filter __sk_prepare_filter -> bpf_prepare_filter API for attaching classic BPF to a socket stays the same: sk_attach_filter(prog, struct sock )/sk_detach_filter(struct sock ) and SK_RUN_FILTER(struct sk_filter , ctx) to execute a program which is used by sockets, tun, af_packet API for 'unattached' BPF programs becomes: bpf_prog_create(struct bpf_prog )/bpf_prog_destroy(struct bpf_prog ) and BPF_PROG_RUN(struct bpf_prog *, ctx) to execute a program which is used by isdn, ppp, team, seccomp, ptp, xt_bpf, cls_bpf, test_bpf Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-02 15:03:58 -07:00
Daniel Borkmann	3480593131	net: filter: get rid of BPF_S_* enum This patch finally allows us to get rid of the BPF_S_* enum. Currently, the code performs unnecessary encode and decode workarounds in seccomp and filter migration itself when a filter is being attached in order to overcome BPF_S_* encoding which is not used anymore by the new interpreter resp. JIT compilers. Keeping it around would mean that also in future we would need to extend and maintain this enum and related encoders/decoders. We can get rid of all that and save us these operations during filter attaching. Naturally, also JIT compilers need to be updated by this. Before JIT conversion is being done, each compiler checks if A is being loaded at startup to obtain information if it needs to emit instructions to clear A first. Since BPF extensions are a subset of BPF_LD \| BPF_{W,H,B} \| BPF_ABS variants, case statements for extensions can be removed at that point. To ease and minimalize code changes in the classic JITs, we have introduced bpf_anc_helper(). Tested with test_bpf on x86_64 (JIT, int), s390x (JIT, int), arm (JIT, int), i368 (int), ppc64 (JIT, int); for sparc we unfortunately didn't have access, but changes are analogous to the rest. Joint work with Alexei Starovoitov. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Mircea Gherzan <mgherzan@gmail.com> Cc: Kees Cook <keescook@chromium.org> Acked-by: Chema Gonzalez <chemag@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-01 22:16:58 -07:00
Daniel Borkmann	f8bbbfc3b9	net: filter: add jited flag to indicate jit compiled filters This patch adds a jited flag into sk_filter struct in order to indicate whether a filter is currently jited or not. The size of sk_filter is not being expanded as the 32 bit 'len' member allows upper bits to be reused since a filter can currently only grow as large as BPF_MAXINSNS. Therefore, there's enough room also for other in future needed flags to reuse 'len' field if necessary. The jited flag also allows for having alternative interpreter functions running as currently, we can only detect jit compiled filters by testing fp->bpf_func to not equal the address of sk_run_filter(). Joint work with Alexei Starovoitov. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-31 00:45:08 -04:00
Tom Herbert	61b905da33	net: Rename skb->rxhash to skb->hash The packet hash can be considered a property of the packet, not just on RX path. This patch changes name of rxhash and l4_rxhash skbuff fields to be hash and l4_hash respectively. This includes changing uses of the field in the code which don't call the access functions. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-26 15:58:20 -04:00
Eric Dumazet	aee636c480	bpf: do not use reciprocal divide At first Jakub Zawadzki noticed that some divisions by reciprocal_divide were not correct. (off by one in some cases) http://www.wireshark.org/~darkjames/reciprocal-buggy.c He could also show this with BPF: http://www.wireshark.org/~darkjames/set-and-dump-filter-k-bug.c The reciprocal divide in linux kernel is not generic enough, lets remove its use in BPF, as it is not worth the pain with current cpus. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Jakub Zawadzki <darkjames-ws@darkjames.pl> Cc: Mircea Gherzan <mgherzan@gmail.com> Cc: Daniel Borkmann <dxchgb@gmail.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Matt Evans <matt@ozlabs.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-15 17:02:08 -08:00
Russell King	df762eccba	Merge branch 'devel-stable' into for-next Conflicts: arch/arm/include/asm/atomic.h arch/arm/include/asm/hardirq.h arch/arm/kernel/smp.c	2013-11-12 10:58:59 +00:00
Ben Dooks	3460743e02	ARM: net: fix arm instruction endian-ness in bpf_jit_32.c Use <asm/opcodes.h> to correctly transform instruction byte ordering into in-memory ordering. Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> Reviewed-by: Dave Martin <Dave.Martin@arm.com>	2013-10-19 20:46:35 +01:00
Alexei Starovoitov	d45ed4a4e3	net: fix unsafe set_memory_rw from softirq on x86 system with net.core.bpf_jit_enable = 1 sudo tcpdump -i eth1 'tcp port 22' causes the warning: [ 56.766097] Possible unsafe locking scenario: [ 56.766097] [ 56.780146] CPU0 [ 56.786807] ---- [ 56.793188] lock(&(&vb->lock)->rlock); [ 56.799593] <Interrupt> [ 56.805889] lock(&(&vb->lock)->rlock); [ 56.812266] [ 56.812266] * DEADLOCK * [ 56.812266] [ 56.830670] 1 lock held by ksoftirqd/1/13: [ 56.836838] #0: (rcu_read_lock){.+.+..}, at: [<ffffffff8118f44c>] vm_unmap_aliases+0x8c/0x380 [ 56.849757] [ 56.849757] stack backtrace: [ 56.862194] CPU: 1 PID: 13 Comm: ksoftirqd/1 Not tainted 3.12.0-rc3+ #45 [ 56.868721] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012 [ 56.882004] ffffffff821944c0 ffff88080bbdb8c8 ffffffff8175a145 0000000000000007 [ 56.895630] ffff88080bbd5f40 ffff88080bbdb928 ffffffff81755b14 0000000000000001 [ 56.909313] ffff880800000001 ffff880800000000 ffffffff8101178f 0000000000000001 [ 56.923006] Call Trace: [ 56.929532] [<ffffffff8175a145>] dump_stack+0x55/0x76 [ 56.936067] [<ffffffff81755b14>] print_usage_bug+0x1f7/0x208 [ 56.942445] [<ffffffff8101178f>] ? save_stack_trace+0x2f/0x50 [ 56.948932] [<ffffffff810cc0a0>] ? check_usage_backwards+0x150/0x150 [ 56.955470] [<ffffffff810ccb52>] mark_lock+0x282/0x2c0 [ 56.961945] [<ffffffff810ccfed>] __lock_acquire+0x45d/0x1d50 [ 56.968474] [<ffffffff810cce6e>] ? __lock_acquire+0x2de/0x1d50 [ 56.975140] [<ffffffff81393bf5>] ? cpumask_next_and+0x55/0x90 [ 56.981942] [<ffffffff810cef72>] lock_acquire+0x92/0x1d0 [ 56.988745] [<ffffffff8118f52a>] ? vm_unmap_aliases+0x16a/0x380 [ 56.995619] [<ffffffff817628f1>] _raw_spin_lock+0x41/0x50 [ 57.002493] [<ffffffff8118f52a>] ? vm_unmap_aliases+0x16a/0x380 [ 57.009447] [<ffffffff8118f52a>] vm_unmap_aliases+0x16a/0x380 [ 57.016477] [<ffffffff8118f44c>] ? vm_unmap_aliases+0x8c/0x380 [ 57.023607] [<ffffffff810436b0>] change_page_attr_set_clr+0xc0/0x460 [ 57.030818] [<ffffffff810cfb8d>] ? trace_hardirqs_on+0xd/0x10 [ 57.037896] [<ffffffff811a8330>] ? kmem_cache_free+0xb0/0x2b0 [ 57.044789] [<ffffffff811b59c3>] ? free_object_rcu+0x93/0xa0 [ 57.051720] [<ffffffff81043d9f>] set_memory_rw+0x2f/0x40 [ 57.058727] [<ffffffff8104e17c>] bpf_jit_free+0x2c/0x40 [ 57.065577] [<ffffffff81642cba>] sk_filter_release_rcu+0x1a/0x30 [ 57.072338] [<ffffffff811108e2>] rcu_process_callbacks+0x202/0x7c0 [ 57.078962] [<ffffffff81057f17>] __do_softirq+0xf7/0x3f0 [ 57.085373] [<ffffffff81058245>] run_ksoftirqd+0x35/0x70 cannot reuse jited filter memory, since it's readonly, so use original bpf insns memory to hold work_struct defer kfree of sk_filter until jit completed freeing tested on x86_64 and i386 Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-07 15:16:45 -04:00
Daniel Borkmann	aafc787e41	arm: bpf_jit: can call module_free() from any context Follow-up on module_free()/vfree() that takes care of the rest, so no longer this workaround with work_struct needed. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Mircea Gherzan <mgherzan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-05-20 14:03:50 -07:00
Daniel Borkmann	79617801ea	filter: bpf_jit_comp: refactor and unify BPF JIT image dump output If bpf_jit_enable > 1, then we dump the emitted JIT compiled image after creation. Currently, only SPARC and PowerPC has similar output as in the reference implementation on x86_64. Make a small helper function in order to reduce duplicated code and make the dump output uniform across architectures x86_64, SPARC, PPC, ARM (e.g. on ARM flen, pass and proglen are currently not shown, but would be interesting to know as well), also for future BPF JIT implementations on other archs. Cc: Mircea Gherzan <mgherzan@gmail.com> Cc: Matt Evans <matt@ozlabs.org> Cc: Eric Dumazet <eric.dumazet@google.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-21 17:25:56 -04:00
Chen Gang	45549a68a5	ARM:net: an issue for k which is u32, never < 0 k is u32 which never < 0, need type cast, or cause issue. Signed-off-by: Chen Gang <gang.chen@asianux.com> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Acked-by: Mircea Gherzan <mgherzan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-12 11:33:05 -04:00
Nicolas Schichan	462738f4f0	ARM: net: bpf_jit: fix emit_swap16() for non ARMv6+. The original code was generating an lsl instructions using the value of ARM_R8 (skb_headlen, possibly uninitialized if no skb_headlen access was required) as a shift amount. Signed-off-by: Nicolas Schichan <nschichan@freebox.fr> Acked-by: Mircea Gherzan <mgherzan@gmail.com> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-02-14 13:26:44 -05:00
Linus Torvalds	6be35c700f	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next Pull networking changes from David Miller: 1) Allow to dump, monitor, and change the bridge multicast database using netlink. From Cong Wang. 2) RFC 5961 TCP blind data injection attack mitigation, from Eric Dumazet. 3) Networking user namespace support from Eric W. Biederman. 4) tuntap/virtio-net multiqueue support by Jason Wang. 5) Support for checksum offload of encapsulated packets (basically, tunneled traffic can still be checksummed by HW). From Joseph Gasparakis. 6) Allow BPF filter access to VLAN tags, from Eric Dumazet and Daniel Borkmann. 7) Bridge port parameters over netlink and BPDU blocking support from Stephen Hemminger. 8) Improve data access patterns during inet socket demux by rearranging socket layout, from Eric Dumazet. 9) TIPC protocol updates and cleanups from Ying Xue, Paul Gortmaker, and Jon Maloy. 10) Update TCP socket hash sizing to be more in line with current day realities. The existing heurstics were choosen a decade ago. From Eric Dumazet. 11) Fix races, queue bloat, and excessive wakeups in ATM and associated drivers, from Krzysztof Mazur and David Woodhouse. 12) Support DOVE (Distributed Overlay Virtual Ethernet) extensions in VXLAN driver, from David Stevens. 13) Add "oops_only" mode to netconsole, from Amerigo Wang. 14) Support set and query of VEB/VEPA bridge mode via PF_BRIDGE, also allow DCB netlink to work on namespaces other than the initial namespace. From John Fastabend. 15) Support PTP in the Tigon3 driver, from Matt Carlson. 16) tun/vhost zero copy fixes and improvements, plus turn it on by default, from Michael S. Tsirkin. 17) Support per-association statistics in SCTP, from Michele Baldessari. And many, many, driver updates, cleanups, and improvements. Too numerous to mention individually. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1722 commits) net/mlx4_en: Add support for destination MAC in steering rules net/mlx4_en: Use generic etherdevice.h functions. net: ethtool: Add destination MAC address to flow steering API bridge: add support of adding and deleting mdb entries bridge: notify mdb changes via netlink ndisc: Unexport ndisc_{build,send}_skb(). uapi: add missing netconf.h to export list pkt_sched: avoid requeues if possible solos-pci: fix double-free of TX skb in DMA mode bnx2: Fix accidental reversions. bna: Driver Version Updated to 3.1.2.1 bna: Firmware update bna: Add RX State bna: Rx Page Based Allocation bna: TX Intr Coalescing Fix bna: Tx and Rx Optimizations bna: Code Cleanup and Enhancements ath9k: check pdata variable before dereferencing it ath5k: RX timestamp is reported at end of frame ath9k_htc: RX timestamp is reported at end of frame ...	2012-12-12 18:07:07 -08:00
Schichan Nicolas	fe15f3f106	ARM: 7598/1: net: bpf_jit_32: fix sp-relative load/stores offsets. The offset must be multiplied by 4 to be sure to access the correct 32bit word in the stack scratch space. For instance, a store at scratch memory cell #1 was generating the following: st r4, [sp, #1] While the correct code for this is: st r4, [sp, #4] To reproduce the bug (assuming your system has a NIC with the mac address 52:54:00:12:34:56): echo 0 > /proc/sys/net/core/bpf_jit_enable tcpdump -ni eth0 "ether[1] + ether[2] - ether[3] * ether[4] - ether[5] \ == -0x3AA" # this will capture packets as expected echo 1 > /proc/sys/net/core/bpf_jit_enable tcpdump -ni eth0 "ether[1] + ether[2] - ether[3] * ether[4] - ether[5] \ == -0x3AA" # this will not. This bug was present since the original inclusion of bpf_jit for ARM (`ddecdfce`: ARM: 7259/3: net: JIT compiler for packet filters). Signed-off-by: Nicolas Schichan <nschichan@freebox.fr> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2012-12-11 00:19:29 +00:00
Schichan Nicolas	89c2e00978	ARM: 7597/1: net: bpf_jit_32: fix kzalloc gfp/size mismatch. Official prototype for kzalloc is: void kzalloc(size_t, gfp_t); The ARM bpf_jit code was having the assumption that it was: void kzalloc(gfp_t, size); This was resulting the use of some random GFP flags depending on the size requested and some random overflows once the really needed size was more than the value of GFP_KERNEL. This bug was present since the original inclusion of bpf_jit for ARM (`ddecdfce`: ARM: 7259/3: net: JIT compiler for packet filters). Signed-off-by: Nicolas Schichan <nschichan@freebox.fr> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2012-12-11 00:16:59 +00:00
Daniel Borkmann	bf0098f22c	ARM: net: bpf_jit_32: add VLAN instructions for BPF JIT This patch is a follow-up for patch "net: filter: add vlan tag access" to support the new VLAN_TAG/VLAN_TAG_PRESENT accessors in BPF JIT. Signed-off-by: Daniel Borkmann <daniel.borkmann@tik.ee.ethz.ch> Cc: Mircea Gherzan <mgherzan@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Acked-by: Mircea Gherzan <mgherzan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-13 18:21:10 -05:00
Daniel Borkmann	3cbe20412e	ARM: net: bpf_jit_32: add XOR instruction for BPF JIT This patch is a follow-up for patch "filter: add XOR instruction for use with X/K" that implements BPF ARM JIT parts for the BPF XOR operation. Signed-off-by: Daniel Borkmann <daniel.borkmann@tik.ee.ethz.ch> Cc: Mircea Gherzan <mgherzan@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Acked-by: Mircea Gherzan <mgherzan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-13 18:21:10 -05:00
Mircea Gherzan	2bea29b774	ARM: 7421/1: bpf_jit: BPF_S_ANC_ALU_XOR_X support JIT support for the XOR operation introduced by the commit `ffe06c17af`. Signed-off-by: Mircea Gherzan <mgherzan@gmail.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2012-06-14 15:12:13 +01:00
Mircea Gherzan	ddecdfcea0	ARM: 7259/3: net: JIT compiler for packet filters Based of Matt Evans's PPC64 implementation. The compiler generates ARM instructions but interworking is supported for Thumb2 kernels. Supports both little and big endian. Unaligned loads are emitted for ARMv6+. Not all the BPF opcodes that deal with ancillary data are supported. The scratch memory of the filter lives on the stack. Hardware integer division is used if it is available. Enabled in the same way as for x86-64 and PPC64: echo 1 > /proc/sys/net/core/bpf_jit_enable A value greater than 1 enables opcode output. Signed-off-by: Mircea Gherzan <mgherzan@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2012-03-24 09:38:56 +00:00

19 Commits