Commit Graph

10750 Commits

Author SHA1 Message Date
Teng Qin 8fe4592438 bpf: map_get_next_key to return first key on NULL
When iterating through a map, we need to find a key that does not exist
in the map so map_get_next_key will give us the first key of the map.
This often requires a lot of guessing in production systems.

This patch makes map_get_next_key return the first key when the key
pointer in the parameter is NULL.

Signed-off-by: Teng Qin <qinteng@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-25 11:57:45 -04:00
Mike Maloney 472ecf084a selftests/net: Fix broken test case in psock_fanout
The error return falue form sock_fanout_open is -1, not zero.  One test
case was checking for 0 instead of -1.

Tested: Built and tested in clean client.
Signed-off-by: Mike Maloney <maloney@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-25 11:56:17 -04:00
Mike Maloney 28be04f5c1 selftests/net: add tests for PACKET_FANOUT_FLAG_UNIQUEID
Create two groups with PACKET_FANOUT_FLAG_UNIQUEID, add a socket to one.
Ensure that the groups can only be joined if all options are consistent
with the original except for this flag.

Signed-off-by: Mike Maloney <maloney@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24 12:46:16 -04:00
Mike Maloney 2e7a721714 selftests/net: cleanup unused parameter in psock_fanout
sock_fanout_open no longer sets the size of packet_socket ring, so stop
passing the parameter.

Signed-off-by: Mike Maloney <maloney@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24 12:46:00 -04:00
David S. Miller b0c47807d3 bpf: Add sparc support to tools and samples.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2017-04-22 13:01:52 -07:00
David S. Miller fb796707d7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Both conflict were simple overlapping changes.

In the kaweth case, Eric Dumazet's skb_cow() bug fix overlapped the
conversion of the driver in net-next to use in-netdev stats.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21 20:23:53 -07:00
David Miller 89087c456f bpf: Fix values type used in test_maps
Maps of per-cpu type have their value element size adjusted to 8 if it
is specified smaller during various map operations.

This makes test_maps as a 32-bit binary fail, in fact the kernel
writes past the end of the value's array on the user's stack.

To be quite honest, I think the kernel should reject creation of a
per-cpu map that doesn't have a value size of at least 8 if that's
what the kernel is going to silently adjust to later.

If the user passed something smaller, it is a sizeof() calcualtion
based upon the type they will actually use (just like in this testcase
code) in later calls to the map operations.

Fixes: df570f5772 ("samples/bpf: unit test for BPF_MAP_TYPE_PERCPU_ARRAY")
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
2017-04-21 15:16:46 -04:00
Daniel Borkmann b1d9fc41aa bpf: add napi_id read access to __sk_buff
Add napi_id access to __sk_buff for socket filter program types, tc
program types and other bpf_convert_ctx_access() users. Having access
to skb->napi_id is useful for per RX queue listener siloing, f.e.
in combination with SO_ATTACH_REUSEPORT_EBPF and when busy polling is
used, meaning SO_REUSEPORT enabled listeners can then select the
corresponding socket at SYN time already [1]. The skb is marked via
skb_mark_napi_id() early in the receive path (e.g., napi_gro_receive()).

Currently, sockets can only use SO_INCOMING_NAPI_ID from 6d4339028b
("net: Introduce SO_INCOMING_NAPI_ID") as a socket option to look up
the NAPI ID associated with the queue for steering, which requires a
prior sk_mark_napi_id() after the socket was looked up.

Semantics for the __sk_buff napi_id access are similar, meaning if
skb->napi_id is < MIN_NAPI_ID (e.g. outgoing packets using sender_cpu),
then an invalid napi_id of 0 is returned to the program, otherwise a
valid non-zero napi_id.

  [1] http://netdevconf.org/2.1/slides/apr6/dumazet-BUSY-POLLING-Netdev-2.1.pdf

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21 13:53:27 -04:00
Mike Maloney c1f8d0f98c selftests/net: Fixes psock_fanout CBPF test case
'psock_fanout' has been failing since commit 4d7b9dc1f3 ("tools:
psock_lib: harden socket filter used by psock tests").  That commit
changed the CBPF filter to examine the full ethernet frame, and was
tested on 'psock_tpacket' which uses SOCK_RAW.  But 'psock_fanout' was
also using this same CBPF in two places, for filtering and fanout, on a
SOCK_DGRAM socket.

Change 'psock_fanout' to use SOCK_RAW so that the CBPF program used with
SO_ATTACH_FILTER can examine the entire frame.  Create a new CBPF
program for use with PACKET_FANOUT_DATA which ignores the header, as it
cannot see the ethernet header.

Tested: Ran tools/testing/selftests/net/psock_{fanout,tpacket} 10 times,
and they all passed.

Fixes: 4d7b9dc1f3 ("tools: psock_lib: harden socket filter used by psock tests")
Signed-off-by: 'Mike Maloney <maloneykernel@gmail.com>'
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-20 15:39:19 -04:00
David S. Miller 7b9f6da175 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
A function in kernel/bpf/syscall.c which got a bug fix in 'net'
was moved to kernel/bpf/verifier.c in 'net-next'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-20 10:35:33 -04:00
Linus Torvalds fb5e2154b7 While testing my development branch, without the fix for the pid use
after free bug, the selftest that Namhyung added triggers it. I figured
 it would be good to add the test for the bug after the fix, such that
 it does not exist without the fix.
 
 I added another patch that lets the test only test part of the pid
 filtering, and ignores the function-fork (filtering on children as well)
 if the function-fork feature does not exist. This feature is added by
 Namhyung just before he added this test. But since the test tests both
 with and without the feature, it would be good to let it not fail if
 the feature does not exist.
 -----BEGIN PGP SIGNATURE-----
 
 iQExBAABCAAbBQJY9kVEFBxyb3N0ZWR0QGdvb2RtaXMub3JnAAoJEMm5BfJq2Y3L
 nZQIAMJN51sNAnJHodKieAx6NUdnFbih7XknFZePGsGX2CHaRpPJuYRTEMIJrtds
 FSGCKOWjmmZ57xB/WYsCdH2H4cqd2TCFIeCT+6Pglk4+L2Y97idg5tzJ0+QGnDqT
 zBMd1kcmLathH5OoNsUEO5FR0QplBTb+3kVRu9XaAUgJhIlLwbF58BdtOv0l0avb
 saV/cVLosUjb4TXxwPgRZnmH9YElQ7RElf0S60JKbFTHCzyvoG0U17seFAklZOQl
 Ux0nn+LFWM+M7e7LYR3nSXnOzofDMz9r1bGGo9bgkng0Csl2Op1MFttofcsi3PvT
 FUxUGPZSEjxj3XrxXrkzzK8pRuI=
 =NEh1
 -----END PGP SIGNATURE-----

Merge tag 'trace-v4.11-rc5-4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull ftrace testcase update from Steven Rostedt:
 "While testing my development branch, without the fix for the pid use
  after free bug, the selftest that Namhyung added triggers it. I
  figured it would be good to add the test for the bug after the fix,
  such that it does not exist without the fix.

  I added another patch that lets the test only test part of the pid
  filtering, and ignores the function-fork (filtering on children as
  well) if the function-fork feature does not exist. This feature is
  added by Namhyung just before he added this test. But since the test
  tests both with and without the feature, it would be good to let it
  not fail if the feature does not exist"

* tag 'trace-v4.11-rc5-4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  selftests: ftrace: Add check for function-fork before running pid filter test
  selftests: ftrace: Add a testcase for function PID filter
2017-04-18 10:19:47 -07:00
Steven Rostedt (VMware) 9ed19c7695 selftests: ftrace: Add check for function-fork before running pid filter test
Have the func-filter-pid test check for the function-fork option before
testing it. It can still test the pid filtering, but will stop before
testing the function-fork option for children inheriting the pids.
This allows the test to be added before the function-fork feature, but after
a bug fix that triggers one of the bugs the test can cause.

Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-18 12:46:11 -04:00
Namhyung Kim 093be89a12 selftests: ftrace: Add a testcase for function PID filter
Like event pid filtering test, add function pid filtering test with the
new "function-fork" option.  It also tests it on an instance directory
so that it can verify the bug related pid filtering on instances.

Link: http://lkml.kernel.org/r/20170417024430.21194-5-namhyung@kernel.org

Cc: Ingo Molnar <mingo@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-18 12:02:36 -04:00
Martin KaFai Lau 695ba2651a bpf: lru: Lower the PERCPU_NR_SCANS from 16 to 4
After doing map_perf_test with a much bigger
BPF_F_NO_COMMON_LRU map, the perf report shows a
lot of time spent in rotating the inactive list (i.e.
__bpf_lru_list_rotate_inactive):
> map_perf_test 32 8 10000 1000000 | awk '{sum += $3}END{print sum}'
19644783 (19M/s)
> map_perf_test 32 8 10000000 10000000 |  awk '{sum += $3}END{print sum}'
6283930 (6.28M/s)

By inactive, it usually means the element is not in cache.  Hence,
there is a need to tune the PERCPU_NR_SCANS value.

This patch finds a better number of elements to
scan during each list rotation.  The PERCPU_NR_SCANS (which
is defined the same as PERCPU_FREE_TARGET) decreases
from 16 elements to 4 elements.  This change only
affects the BPF_F_NO_COMMON_LRU map.

The test_lru_dist does not show meaningful difference
between 16 and 4.  Our production L4 load balancer which uses
the LRU map for conntrack-ing also shows little change in cache
hit rate.  Since both benchmark and production data show no
cache-hit difference, PERCPU_NR_SCANS is lowered from 16 to 4.
We can consider making it configurable if we find a usecase
later that shows another value works better and/or use
a different rotation strategy.

After this change:
> map_perf_test 32 8 10000000 10000000 |  awk '{sum += $3}END{print sum}'
9240324 (9.2M/s)

i.e. 6.28M/s -> 9.2M/s

The test_lru_dist has not shown meaningful difference:
> test_lru_dist zipf.100k.a1_01.out 4000 1:
nr_misses: 31575 (Before) vs 31566 (After)

> test_lru_dist zipf.100k.a0_01.out 40000 1
nr_misses: 67036 (Before) vs 67031 (After)

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17 13:55:52 -04:00
Martin KaFai Lau 6467acbc70 bpf: lru: Cleanup test_lru_map.c
This patch does the following cleanup on test_lru_map.c
1) Fix indentation (Replace spaces by tabs)
2) Remove redundant BPF_F_NO_COMMON_LRU test
3) Simplify some comments

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17 13:55:52 -04:00
Martin KaFai Lau 9746f85686 bpf: lru: Add test_lru_sanity6 for BPF_F_NO_COMMON_LRU
test_lru_sanity3 is not applicable to BPF_F_NO_COMMON_LRU.
It just happens to work when PERCPU_FREE_TARGET == 16.

This patch:
1) Disable test_lru_sanity3 for BPF_F_NO_COMMON_LRU
2) Add test_lru_sanity6 to test list rotation for
   the BPF_F_NO_COMMON_LRU map.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17 13:55:52 -04:00
David S. Miller 6b6cbc1471 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts were simply overlapping changes.  In the net/ipv4/route.c
case the code had simply moved around a little bit and the same fix
was made in both 'net' and 'net-next'.

In the net/sched/sch_generic.c case a fix in 'net' happened at
the same time that a new argument was added to qdisc_hash_add().

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-15 21:16:30 -04:00
Linus Torvalds 07c7016de7 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
 "Two small fixes for perf:

   - the move to support cross arch annotation introduced per arch
     initialization requirements, fullfill them for s/390 (Christian
     Borntraeger)

   - add the missing initialization to the LBR entries to avoid exposing
     random or stale data"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86: Avoid exposing wrong/stale data in intel_pmu_lbr_read_32()
  perf annotate s390: Fix perf annotate error -95 (4.10 regression)
2017-04-14 16:58:38 -07:00
David Daney b6518e6a00 tools: bpf_jit_disasm: Add option to dump JIT image to a file.
When debugging the JIT on an embedded platform or cross build
environment, libbfd may not be available, making it impossible to run
bpf_jit_disasm natively.

Add an option to emit a binary image of the JIT code to a file.  This
file can then be disassembled off line.  Typical usage in this case
might be (pasting mips64 dmesg output to cat command):

   $ cat > jit.raw
   $ bpf_jit_disasm -f jit.raw -O jit.bin
   $ mips64-linux-gnu-objdump -D -b binary -m mips:isa64r2 -EB jit.bin

Signed-off-by: David Daney <david.daney@cavium.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-13 13:04:03 -04:00
Ben Hutchings 4cca045768 cpupower: Fix turbo frequency reporting for pre-Sandy Bridge cores
The switch that conditionally sets CPUPOWER_CAP_HAS_TURBO_RATIO and
CPUPOWER_CAP_IS_SNB flags is missing a break, so all cores get both
flags set and an assumed base clock of 100 MHz for turbo values.

Reported-by: GSR <gsr.bugs@infernal-iceberg.com>
Tested-by: GSR <gsr.bugs@infernal-iceberg.com>
References: https://bugs.debian.org/859978
Fixes: 8fb2e440b2 (cpupower: Show Intel turbo ratio support via ...)
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-04-13 14:51:10 +02:00
Rafael J. Wysocki ad0d9c3bca Merge branch 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux
Pull turbostat utility fixes for v4.11 from Len Brown.

* 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
  tools/power turbostat: update version number
  tools/power turbostat: fix impossibly large CPU%c1 value
  tools/power turbostat: turbostat.8 add missing column definitions
  tools/power turbostat: update HWP dump to decimal from hex
  tools/power turbostat: enable package THERM_INTERRUPT dump
  tools/power turbostat: show missing Core and GFX power on SKL and KBL
  tools/power turbostat: bugfix: GFXMHz column not changing
2017-04-13 14:50:11 +02:00
Len Brown 5f9bf02a58 tools/power turbostat: update version number
Signed-off-by: Len Brown <len.brown@intel.com>
2017-04-12 20:03:50 -04:00
Len Brown 95149369c1 tools/power turbostat: fix impossibly large CPU%c1 value
Most CPUs do not have a hardware c1 counter,
and so turbostat derives c1 residency:

c1 = TSC - MPERF - other_core_cstate_counters

As it is not possible to atomically read these coutners,
measurement jitter can case this calcuation to "go negative"
when very close to 0.  Turbostat detect that case and
simply prints c1 = 0.00%

But that check neglected to account for systems where the TSC
crystal clock domain and the MPERF BCLK domain are differ by
a small amount.  That allowed very small negative c1 numbers
to escape this check and be printed as huge positve numbers.

This code begs for a bit of cleanup, but this patch
is the minimal change to fix the issue.

Signed-off-by: Len Brown <len.brown@intel.com>
2017-04-12 20:03:50 -04:00
Doug Smythies ab23d1146a tools/power turbostat: turbostat.8 add missing column definitions
Add GFX%rc6 and GFXMHz to the column descriptions section
of the turbostat man page.

Signed-off-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Len Brown <len.brown@intel.com>
2017-04-12 20:03:49 -04:00
Len Brown 6dbd25a245 tools/power turbostat: update HWP dump to decimal from hex
Syntax only.

The HWP CAPABILTIES and REQUEST ratios are more easily
viewed in decimal -- just multiply by 100 and you get MHz...

new:
cpu0: MSR_HWP_CAPABILITIES: 0x010c1b23 (high 35 guar 27 eff 12 low 1)
cpu0: MSR_HWP_REQUEST: 0x80002301 (min 1 max 35 des 0 epp 0x80 window 0x0 pkg 0x0)

old:
cpu0: MSR_HWP_CAPABILITIES: 0x010c1b23 (high 0x23 guar 0x1b eff 0xc low 0x1)
cpu0: MSR_HWP_REQUEST: 0x80002301 (min 0x1 max 0x23 des 0x0 epp 0x80 window 0x0 pkg 0x0)

Signed-off-by: Len Brown <len.brown@intel.com>
2017-04-12 20:03:35 -04:00
Len Brown f4896fa502 tools/power turbostat: enable package THERM_INTERRUPT dump
cpu0: MSR_IA32_TEMPERATURE_TARGET: 0x00641400 (100 C)
cpu0: MSR_IA32_PACKAGE_THERM_STATUS: 0x884b0800 (25 C)
cpu0: MSR_IA32_PACKAGE_THERM_INTERRUPT: 0x00000003 (100 C, 100 C)

Enable the same per-core output, but hide it behind --debug
because it is too verbose on big systems.

Signed-off-by: Len Brown <len.brown@intel.com>
2017-04-12 20:03:34 -04:00
Len Brown 818249216d tools/power turbostat: show missing Core and GFX power on SKL and KBL
While the current SDM is silent on the matter, the Core and GFX
RAPL power meters on SKL and KBL appear to work -- so show them.

Reported-by: Yaroslav Isakov <yaroslav.isakov@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2017-04-12 20:03:19 -04:00
Ingo Molnar 0718b33406 perf/urgent annotate fix for s390:
- The move to support cross arch annotation introduced per arch
   initialization requirements, fullfill them for s/390 (Christian Borntraeger)
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJY7P2YAAoJENZQFvNTUqpAeoYP/0n3UQ+ed2QU0dbOcYTXLKOW
 TUqrb4OzXtfSE3Z9FERC7Q6Giq6GB88aFbWxDCup0F3e4U7bzHvAXnrBFNoqH+yx
 1cG9GB/3Oh84CH811ZPdGWkPhvpx8Lvg5rn0vXM23oqGZ5B1tzZwX6WYW319gHqV
 AyGqZ6NvYhJOe2xdpIxfK0KzFtGpqZQrnJE3qhy50SscG5y9R2qk8IaY23ibBZgd
 5XAE37CEU1T6IkeI9rlspDR2noeMFt1fQ6TezmQNr4YhSkHvn5Buww7mEhceIn9H
 raA/rEBI/NPhDfkBo10WDCBGOc4K1KBH3hcSY5Jtj29awbCEUxX8QmuBlhZHgBWd
 Ef9tEX/8NkxHQXojmW6gzPTzwcs0cyPA16tJvVZRTkJKVdgFofsUqEZIhu89LCcZ
 ay1HD/sOUA+d6szVKs8YrjAf2RKhfl2wQwDRP4Gzykysaz02jaWwIiqK0dI3ZN0q
 HP/5PQfaDOsGiNAJlGXhThutY36tmp5+W5VJrvBN3x8YoMIzeqH88HllgMgdQXFQ
 OqobTLDHnd33ROPkmsbehuPi2axbonYSP7TkCTLRWJCGYyYorZkFpRSk5D88nW07
 1zn8C8/mzcQY+dE3eRevF162+60LijkNIQPH1dXkpcwiRzpzBrcG36LulZ6c6l8o
 adYXzJ3vl04eYjpLphHx
 =Te4H
 -----END PGP SIGNATURE-----

Merge tag 'perf-urgent-for-mingo-4.11-20170411' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent

Pull 'perf annotate' fix for s390:

- The move to support cross arch annotation introduced per arch
  initialization requirements, fullfill them for s/390 (Christian Borntraeger)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-11 21:41:39 +02:00
Alexander Alemayhu 3c60a531b9 bpf: fix comment typo
o s/bpf_bpf_get_socket_cookie/bpf_get_socket_cookie

Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-09 18:26:08 -07:00
Linus Torvalds 894ca30cf6 powerpc fixes for 4.11 #7
Headed to stable:
  - disable HFSCR[TM] if TM is not supported, fixes a potential host kernel crash
    triggered by a hostile guest, but only in configurations that no one uses
  - don't try to fix up misaligned load-with-reservation instructions
  - fix flush_(d|i)cache_range() called from modules on little endian kernels
  - add missing global TLB invalidate if cxl is active
  - fix missing preempt_disable() in crc32c-vpmsum
 
 And a fix for selftests build changes that went in this release:
  - selftests/powerpc: Fix standalone powerpc build
 
 Thanks to:
   Benjamin Herrenschmidt, Frederic Barrat, Oliver O'Halloran, Paul Mackerras.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJY6LIKAAoJEFHr6jzI4aWAhfcQAKORHx/tJf9w8KqcfSfKfeEL
 O8cZEl5/N3ArNXVM5J5QK5KnMVHnoWWR3FWYwntOjt3RJywjJYJ02YvhOVvt4q+M
 YinRS34KzAhnT1f526zx97v0BGqi//UJamrcFBUBTd4rLuHGbol7fdtWHVrsMYa0
 KWQ+ooPLEpGDk4I3sDz37yeJBQXVpyhC/UF8vzHpvHGPvIQ8Dw8rfWwOZ0HooJuZ
 ewKdkeIsYF8SrM461c1GhOI0VXB0q+CMn9mzIaEKMuZMhHDKyiaM5rm8mWXapzcT
 HsCQKlF9X9YHAbhbSbz9DGvNCEYaW7T4vnudSNHjQaAJlA4HsmeRwWXy4+zqZuPc
 rIbRIFZAyV3wYowN7j3P6Se3lLBDMmlHZvVkygJnwoaR4rmoujePGwdAv8ZH4Udn
 hrbieC41HKVxcm5t3whIDOcHmxaAo1MDqmrVhyxJSjgnkdBtN/gnZXvHDb0VeOJV
 9wFGGE8WvMXnTKEcjM2l+a14CuOrV/wRbHQ1B1O0Kfk613cPrukMYab6eLPqyJzF
 lmkCm1o46bib5oBOmvlqK+5oVuwNyfHmJSzvL+VOylhLVbJPmFJUhHQFssCvsTUf
 k36ZAUxH4fbz1TzAPipXl+wrkE/yzthGmA9FTC9hLkYE/rzvrZt9IKowFw1mq5n/
 2zFabXQBl5JBQ4hdL54f
 =bTuf
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.11-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "Some more powerpc fixes for 4.11:

  Headed to stable:

   - disable HFSCR[TM] if TM is not supported, fixes a potential host
     kernel crash triggered by a hostile guest, but only in
     configurations that no one uses

   - don't try to fix up misaligned load-with-reservation instructions

   - fix flush_(d|i)cache_range() called from modules on little endian
     kernels

   - add missing global TLB invalidate if cxl is active

   - fix missing preempt_disable() in crc32c-vpmsum

  And a fix for selftests build changes that went in this release:

   - selftests/powerpc: Fix standalone powerpc build

  Thanks to: Benjamin Herrenschmidt, Frederic Barrat, Oliver O'Halloran,
  Paul Mackerras"

* tag 'powerpc-4.11-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/crypto/crc32c-vpmsum: Fix missing preempt_disable()
  powerpc/mm: Add missing global TLB invalidate if cxl is active
  powerpc/64: Fix flush_(d|i)cache_range() called from modules
  powerpc: Don't try to fix up misaligned load-with-reservation instructions
  powerpc: Disable HFSCR[TM] if TM is not supported
  selftests/powerpc: Fix standalone powerpc build
2017-04-08 11:06:12 -07:00
Christian Borntraeger 3c1a427954 perf annotate s390: Fix perf annotate error -95 (4.10 regression)
since 4.10 perf annotate exits on s390 with an "unknown error -95".
Turns out that commit 786c1b5184 ("perf annotate: Start supporting
cross arch annotation") added a hard requirement for architecture
support when objdump is used but only provided x86 and arm support.
Meanwhile power was added so lets add s390 as well.

While at it make sure to implement the branch and jump types.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Andreas Krebbel <krebbel@linux.vnet.ibm.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-s390 <linux-s390@vger.kernel.org>
Cc: stable@kernel.org # v4.10+
Fixes: 786c1b5184 "perf annotate: Start supporting cross arch annotation"
Link: http://lkml.kernel.org/r/1491465112-45819-2-git-send-email-borntraeger@de.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-04-07 12:33:10 -03:00
Alexei Starovoitov 89c0a36130 selftests/bpf: fix merge conflict
fix artifact of merge resolution

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-06 12:21:59 -07:00
David S. Miller 6f14f443d3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Mostly simple cases of overlapping changes (adding code nearby,
a function whose name changes, for example).

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-06 08:24:51 -07:00
LABBE Corentin af0e54619d selftests: add a generic testsuite for ethernet device
This patch add a generic testsuite for testing ethernet network device driver.

Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-05 08:30:11 -07:00
Alexei Starovoitov 3782161362 selftests/bpf: add l4 load balancer test based on sched_cls
this l4lb demo is a comprehensive test case for LLVM codegen and
kernel verifier. It's using fully inlined jhash(), complex packet
parsing and multiple map lookups of different types to stress
llvm and verifier.
The map sizes, map population and test vectors are artificial to
exercise different paths through the bpf program.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-01 12:45:57 -07:00
Alexei Starovoitov 8d48f5e427 selftests/bpf: add a test for basic XDP functionality
add C test for xdp_adjust_head(), packet rewrite and map lookups

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-01 12:45:57 -07:00
Alexei Starovoitov 6882804c91 selftests/bpf: add a test for overlapping packet range checks
add simple C test case for llvm and verifier range check fix from
commit b1977682a3 ("bpf: improve verifier packet range checks")

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-01 12:45:57 -07:00
Alexei Starovoitov dd26b7f54a tools/lib/bpf: expose bpf_program__set_type()
expose bpf_program__set_type() to set program type

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-01 12:45:57 -07:00
Alexei Starovoitov 3084887378 tools/lib/bpf: add support for BPF_PROG_TEST_RUN command
add support for BPF_PROG_TEST_RUN command to libbpf.a

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-01 12:45:57 -07:00
Daniel Borkmann 02ea80b185 bpf: add various verifier test cases for self-tests
Add a couple of test cases, for example, probing for xadd on a spilled
pointer to packet and map_value_adj register, various other map_value_adj
tests including the unaligned load/store, and trying out pointer arithmetic
on map_value_adj register itself. For the unaligned load/store, we need
to figure out whether the architecture has efficient unaligned access and
need to mark affected tests accordingly.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-01 12:36:37 -07:00
Michael Ellerman 2db2c250dd selftests/powerpc: Fix standalone powerpc build
The changes to enable building with a separate output directory, in
commit a8ba798bc8 ("selftests: enable O and KBUILD_OUTPUT") broke
building the powerpc selftests on their own, eg:

 $ cd tools/testing/selftests/powerpc; make

It was partially fixed in commit e53aff45c4 ("selftests: lib.mk Fix
individual test builds"), which defined OUTPUT for standalone tests. But
that only defines OUTPUT within the Makefile, the value is not exported
so sub-shells can't see it. We could export OUTPUT, but it's actually
cleaner to just expand the value of OUTPUT before we invoke the shell.

Fixes: a8ba798bc8 ("selftests: enable O and KBUILD_OUTPUT")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-27 15:11:44 +11:00
Alexei Starovoitov b1977682a3 bpf: improve verifier packet range checks
llvm can optimize the 'if (ptr > data_end)' checks to be in the order
slightly different than the original C code which will confuse verifier.
Like:
if (ptr + 16 > data_end)
  return TC_ACT_SHOT;
// may be followed by
if (ptr + 14 > data_end)
  return TC_ACT_SHOT;
while llvm can see that 'ptr' is valid for all 16 bytes,
the verifier could not.
Fix verifier logic to account for such case and add a test.

Reported-by: Huapeng Zhou <hzhou@fb.com>
Fixes: 969bf05eb3 ("bpf: direct packet access")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24 20:51:28 -07:00
Chenbo Feng 6acc5c2910 Add a eBPF helper function to retrieve socket uid
Returns the owner uid of the socket inside a sk_buff. This is useful to
perform per-UID accounting of network traffic or per-UID packet
filtering. The socket need to be a fullsock otherwise overflowuid is
returned.

Signed-off-by: Chenbo Feng <fengc@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-23 17:01:02 -07:00
Chenbo Feng 91b8270f2a Add a helper function to get socket cookie in eBPF
Retrieve the socket cookie generated by sock_gen_cookie() from a sk_buff
with a known socket. Generates a new cookie if one was not yet set.If
the socket pointer inside sk_buff is NULL, 0 is returned. The helper
function coud be useful in monitoring per socket networking traffic
statistics and provide a unique socket identifier per namespace.

Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Chenbo Feng <fengc@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-23 17:01:02 -07:00
David S. Miller 16ae1f2236 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/broadcom/genet/bcmmii.c
	drivers/net/hyperv/netvsc.c
	kernel/bpf/hashtab.c

Almost entirely overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-23 16:41:27 -07:00
Linus Torvalds f341d9f08a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Several netfilter fixes from Pablo and the crew:
      - Handle fragmented packets properly in netfilter conntrack, from
        Florian Westphal.
      - Fix SCTP ICMP packet handling, from Ying Xue.
      - Fix big-endian bug in nftables, from Liping Zhang.
      - Fix alignment of fake conntrack entry, from Steven Rostedt.

 2) Fix feature flags setting in fjes driver, from Taku Izumi.

 3) Openvswitch ipv6 tunnel source address not set properly, from Or
    Gerlitz.

 4) Fix jumbo MTU handling in amd-xgbe driver, from Thomas Lendacky.

 5) sk->sk_frag.page not released properly in some cases, from Eric
    Dumazet.

 6) Fix RTNL deadlocks in nl80211, from Johannes Berg.

 7) Fix erroneous RTNL lockdep splat in crypto, from Herbert Xu.

 8) Cure improper inflight handling during AF_UNIX GC, from Andrey
    Ulanov.

 9) sch_dsmark doesn't write to packet headers properly, from Eric
    Dumazet.

10) Fix SCM_TIMESTAMPING_OPT_STATS handling in TCP, from Soheil Hassas
    Yeganeh.

11) Add some IDs for Motorola qmi_wwan chips, from Tony Lindgren.

12) Fix nametbl deadlock in tipc, from Ying Xue.

13) GRO and LRO packets not counted correctly in mlx5 driver, from Gal
    Pressman.

14) Fix reset of internal PHYs in bcmgenet, from Doug Berger.

15) Fix hashmap allocation handling, from Alexei Starovoitov.

16) nl_fib_input() needs stronger netlink message length checking, from
    Eric Dumazet.

17) Fix double-free of sk->sk_filter during sock clone, from Daniel
    Borkmann.

18) Fix RX checksum offloading in aquantia driver, from Pavel Belous.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (85 commits)
  net:ethernet:aquantia: Fix for RX checksum offload.
  amd-xgbe: Fix the ECC-related bit position definitions
  sfc: cleanup a condition in efx_udp_tunnel_del()
  Bluetooth: btqcomsmd: fix compile-test dependency
  inet: frag: release spinlock before calling icmp_send()
  tcp: initialize icsk_ack.lrcvtime at session start time
  genetlink: fix counting regression on ctrl_dumpfamily()
  socket, bpf: fix sk_filter use after free in sk_clone_lock
  ipv4: provide stronger user input validation in nl_fib_input()
  bpf: fix hashmap extra_elems logic
  enic: update enic maintainers
  net: bcmgenet: remove bcmgenet_internal_phy_setup()
  ipv6: make sure to initialize sockc.tsflags before first use
  fjes: Do not load fjes driver if extended socket device is not power on.
  fjes: Do not load fjes driver if system does not have extended socket device.
  net/mlx5e: Count LRO packets correctly
  net/mlx5e: Count GSO packets correctly
  net/mlx5: Increase number of max QPs in default profile
  net/mlx5e: Avoid supporting udp tunnel port ndo for VF reps
  net/mlx5e: Use the proper UAPI values when offloading TC vlan actions
  ...
2017-03-23 11:29:49 -07:00
Martin KaFai Lau fb30d4b712 bpf: Add tests for map-in-map
Test cases for array of maps and hash of maps.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 15:45:45 -07:00
Alexei Starovoitov 8c290e60fa bpf: fix hashmap extra_elems logic
In both kmalloc and prealloc mode the bpf_map_update_elem() is using
per-cpu extra_elems to do atomic update when the map is full.
There are two issues with it. The logic can be misused, since it allows
max_entries+num_cpus elements to be present in the map. And alloc_extra_elems()
at map creation time can fail percpu alloc for large map values with a warn:
WARNING: CPU: 3 PID: 2752 at ../mm/percpu.c:892 pcpu_alloc+0x119/0xa60
illegal size (32824) or align (8) for percpu allocation

The fixes for both of these issues are different for kmalloc and prealloc modes.
For prealloc mode allocate extra num_possible_cpus elements and store
their pointers into extra_elems array instead of actual elements.
Hence we can use these hidden(spare) elements not only when the map is full
but during bpf_map_update_elem() that replaces existing element too.
That also improves performance, since pcpu_freelist_pop/push is avoided.
Unfortunately this approach cannot be used for kmalloc mode which needs
to kfree elements after rcu grace period. Therefore switch it back to normal
kmalloc even when full and old element exists like it was prior to
commit 6c90598174 ("bpf: pre-allocate hash map elements").

Add tests to check for over max_entries and large map values.

Reported-by: Dave Jones <davej@codemonkey.org.uk>
Fixes: 6c90598174 ("bpf: pre-allocate hash map elements")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 14:12:18 -07:00
Zi Shen Lim e8f1f34a34 selftests/bpf: fix broken build, take 2
Merge of 'linux-kselftest-4.11-rc1':

1. Partially removed use of 'test_objs' target, breaking force rebuild of
BPFOBJ, introduced in commit d498f8719a ("bpf: Rebuild bpf.o for any
dependency update").

  Update target so dependency on BPFOBJ is restored.

2. Introduced commit 2047f1d8ba ("selftests: Fix the .c linking rule")
which fixes order of LDLIBS.

  Commit d02d8986a7 ("bpf: Always test unprivileged programs") added
libcap dependency into CFLAGS. Use LDLIBS instead to fix linking of
test_verifier.

3. Introduced commit d83c3ba0b9 ("selftests: Fix selftests build to
just build, not run tests").

  Reordering the Makefile allows us to remove the 'all' target.

Tested both:
    selftests/bpf$ make
and
    selftests$ make TARGETS=bpf
on Ubuntu 16.04.2.

Signed-off-by: Zi Shen Lim <zlim.lnx@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-21 18:57:58 -07:00
Linus Torvalds a7fc726bb2 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
 "A set of perf related fixes:

   - fix a CR4.PCE propagation issue caused by usage of mm instead of
     active_mm and therefore propagated the wrong value.

   - perf core fixes, which plug a use-after-free issue and make the
     event inheritance on fork more robust.

   - a tooling fix for symbol handling"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf symbols: Fix symbols__fixup_end heuristic for corner cases
  x86/perf: Clarify why x86_pmu_event_mapped() isn't racy
  x86/perf: Fix CR4.PCE propagation to use active_mm instead of mm
  perf/core: Better explain the inherit magic
  perf/core: Simplify perf_event_free_task()
  perf/core: Fix event inheritance on fork()
  perf/core: Fix use-after-free in perf_release()
2017-03-17 13:59:52 -07:00