Commit Graph

843798 Commits

Author SHA1 Message Date
Sudarsana Reddy Kalluru
cedeac9df4 qed: Add support for Timestamping the unicast PTP packets.
This patch adds driver changes to detect/timestamp the unicast PTP packets.

Changes from previous version:
-------------------------------
v2: Defined a macro for unicast ptp param mask.

Please consider applying this to "net-next".

Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-03 11:30:41 -07:00
Catherine Sullivan
3c13ce74b6 gve: Fix u64_stats_sync to initialize start
u64_stats_fetch_begin needs to initialize start.

Signed-off-by: Catherine Sullivan <csully@google.com>
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-03 11:27:46 -07:00
Mahesh Bandewar
d62962b37c loopback: fix lockdep splat
dev_init_scheduler() and dev_activate() expect the caller to
hold RTNL. Since we don't want blackhole device to be initialized
per ns, we are initializing at init.

[    3.855027] Call Trace:
[    3.855034]  dump_stack+0x67/0x95
[    3.855037]  lockdep_rcu_suspicious+0xd5/0x110
[    3.855044]  dev_init_scheduler+0xe3/0x120
[    3.855048]  ? net_olddevs_init+0x60/0x60
[    3.855050]  blackhole_netdev_init+0x45/0x6e
[    3.855052]  do_one_initcall+0x6c/0x2fa
[    3.855058]  ? rcu_read_lock_sched_held+0x8c/0xa0
[    3.855066]  kernel_init_freeable+0x1e5/0x288
[    3.855071]  ? rest_init+0x260/0x260
[    3.855074]  kernel_init+0xf/0x180
[    3.855076]  ? rest_init+0x260/0x260
[    3.855078]  ret_from_fork+0x24/0x30

Fixes: 4de83b88c6 ("loopback: create blackhole net device similar to loopack.")
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-03 11:24:38 -07:00
Yishai Hadas
e4075c4428 net/mlx5: Expose device definitions for object events
Expose an extra device definitions for objects events.

It includes: object_type values for legacy objects and generic data
header for any other object.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-07-03 21:02:45 +03:00
Yishai Hadas
4e0e2ea188 net/mlx5: Report EQE data upon CQ completion
Report EQE data upon CQ completion to let upper layers use this data.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-07-03 21:00:20 +03:00
Yishai Hadas
70a43d3fd4 net/mlx5: Report a CQ error event only when a handler was set
Report a CQ error event only when a handler was set.

This enables mlx5_ib to not set a handler upon CQ creation and use some
other mechanism to get this event as of other events by the
mlx5_eq_notifier_register API.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-07-03 20:59:40 +03:00
Yishai Hadas
38164b7719 net/mlx5: mlx5_core_create_cq() enhancements
Enhance mlx5_core_create_cq() to get the command out buffer from the
callers to let them use the output.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-07-03 20:59:32 +03:00
Yishai Hadas
c0670781f5 net/mlx5: Expose the API to register for ANY event
Expose the API to register for ANY event, mlx5_ib will be able to use
this functionality for its needs.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-07-03 20:56:29 +03:00
Yishai Hadas
b9a7ba5562 net/mlx5: Use event mask based on device capabilities
Use the reported device capabilities for the supported user events (i.e.
affiliated and un-affiliated) to set the EQ mask.

As the event mask can be up to 256 defined by 4 entries of u64 change
the applicable code to work accordingly.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-07-03 20:55:45 +03:00
Yishai Hadas
1d49ce1e05 net/mlx5: Fix mlx5_core_destroy_cq() error flow
The firmware command to destroy a CQ might fail when the object is
referenced by other object and the ref count is managed by the firmware.

To enable a second successful destruction post the first failure need to
change  mlx5_eq_del_cq() to be a void function.

As an error in mlx5_eq_del_cq() is quite fatal from the option to
recover, a debug message inside it should be good enougth and it was
changed to be void.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-07-03 20:54:57 +03:00
Daniel Borkmann
e5a3e259ef Merge branch 'bpf-tcp-rtt-hook'
Stanislav Fomichev says:

====================
Congestion control team would like to have a periodic callback to
track some TCP statistics. Let's add a sock_ops callback that can be
selectively enabled on a socket by socket basis and is executed for
every RTT. BPF program frequency can be further controlled by calling
bpf_ktime_get_ns and bailing out early.

I run neper tcp_stream and tcp_rr tests with the sample program
from the last patch and didn't observe any noticeable performance
difference.

v2:
* add a comment about second accept() in selftest (Yonghong Song)
* refer to tcp_bpf.readme in sample program (Yonghong Song)
====================

Suggested-by: Eric Dumazet <edumazet@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Priyaranjan Jha <priyarjha@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-03 16:52:03 +02:00
Stanislav Fomichev
d78e3f0614 samples/bpf: fix tcp_bpf.readme detach command
Copy-paste, should be detach, not attach.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-03 16:52:02 +02:00
Stanislav Fomichev
395338843d samples/bpf: add sample program that periodically dumps TCP stats
Uses new RTT callback to dump stats every second.

$ mkdir -p /tmp/cgroupv2
$ mount -t cgroup2 none /tmp/cgroupv2
$ mkdir -p /tmp/cgroupv2/foo
$ echo $$ >> /tmp/cgroupv2/foo/cgroup.procs
$ bpftool prog load ./tcp_dumpstats_kern.o /sys/fs/bpf/tcp_prog
$ bpftool cgroup attach /tmp/cgroupv2/foo sock_ops pinned /sys/fs/bpf/tcp_prog
$ bpftool prog tracelog
$ # run neper/netperf/etc

Used neper to compare performance with and without this program attached
and didn't see any noticeable performance impact.

Sample output:
  <idle>-0     [015] ..s.  2074.128800: 0: dsack_dups=0 delivered=242526
  <idle>-0     [015] ..s.  2074.128808: 0: delivered_ce=0 icsk_retransmits=0
  <idle>-0     [015] ..s.  2075.130133: 0: dsack_dups=0 delivered=323599
  <idle>-0     [015] ..s.  2075.130138: 0: delivered_ce=0 icsk_retransmits=0
  <idle>-0     [005] .Ns.  2076.131440: 0: dsack_dups=0 delivered=404648
  <idle>-0     [005] .Ns.  2076.131447: 0: delivered_ce=0 icsk_retransmits=0

Cc: Eric Dumazet <edumazet@google.com>
Cc: Priyaranjan Jha <priyarjha@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-03 16:52:02 +02:00
Stanislav Fomichev
b55873984d selftests/bpf: test BPF_SOCK_OPS_RTT_CB
Make sure the callback is invoked for syn-ack and data packet.

Cc: Eric Dumazet <edumazet@google.com>
Cc: Priyaranjan Jha <priyarjha@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-03 16:52:02 +02:00
Stanislav Fomichev
692cbaa99f bpf/tools: sync bpf.h
Sync new bpf_tcp_sock fields and new BPF_PROG_TYPE_SOCK_OPS RTT callback.

Cc: Eric Dumazet <edumazet@google.com>
Cc: Priyaranjan Jha <priyarjha@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-03 16:52:02 +02:00
Stanislav Fomichev
c2cb5e82a7 bpf: add icsk_retransmits to bpf_tcp_sock
Add some inet_connection_sock fields to bpf_tcp_sock that might be useful
for debugging congestion control issues.

Cc: Eric Dumazet <edumazet@google.com>
Cc: Priyaranjan Jha <priyarjha@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-03 16:52:02 +02:00
Stanislav Fomichev
0357746d1e bpf: add dsack_dups/delivered{, _ce} to bpf_tcp_sock
Add more fields to bpf_tcp_sock that might be useful for debugging
congestion control issues.

Cc: Eric Dumazet <edumazet@google.com>
Cc: Priyaranjan Jha <priyarjha@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-03 16:52:01 +02:00
Stanislav Fomichev
2377b81de5 bpf: split shared bpf_tcp_sock and bpf_sock_ops implementation
We've added bpf_tcp_sock member to bpf_sock_ops and don't expect
any new tcp_sock fields in bpf_sock_ops. Let's remove
CONVERT_COMMON_TCP_SOCK_FIELDS so bpf_tcp_sock can be independently
extended.

Cc: Eric Dumazet <edumazet@google.com>
Cc: Priyaranjan Jha <priyarjha@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-03 16:52:01 +02:00
Stanislav Fomichev
23729ff231 bpf: add BPF_CGROUP_SOCK_OPS callback that is executed on every RTT
Performance impact should be minimal because it's under a new
BPF_SOCK_OPS_RTT_CB_FLAG flag that has to be explicitly enabled.

Suggested-by: Eric Dumazet <edumazet@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Priyaranjan Jha <priyarjha@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-03 16:52:01 +02:00
Jiri Benc
d2f5bbbc35 selftests: bpf: standardize to static __always_inline
The progs for bpf selftests use several different notations to force
function inlining. Standardize to what most of them use,
static __always_inline.

Suggested-by: Song Liu <liu.song.a23@gmail.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-03 15:06:33 +02:00
brakmo
71634d7f92 bpf: Add support for fq's EDT to HBM
Adds support for fq's Earliest Departure Time to HBM (Host Bandwidth
Manager). Includes a new BPF program supporting EDT, and also updates
corresponding programs.

It will drop packets with an EDT of more than 500us in the future
unless the packet belongs to a flow with less than 2 packets in flight.
This is done so each flow has at least 2 packets in flight, so they
will not starve, and also to help prevent delayed ACK timeouts.

It will also work with ECN enabled traffic, where the packets will be
CE marked if their EDT is more than 50us in the future.

The table below shows some performance numbers. The flows are back to
back RPCS. One server sending to another, either 2 or 4 flows.
One flow is a 10KB RPC, the rest are 1MB RPCs. When there are more
than one flow of a given RPC size, the numbers represent averages.

The rate limit applies to all flows (they are in the same cgroup).
Tests ending with "-edt" ran with the new BPF program supporting EDT.
Tests ending with "-hbt" ran on top HBT qdisc with the specified rate
(i.e. no HBM). The other tests ran with the HBM BPF program included
in the HBM patch-set.

EDT has limited value when using DCTCP, but it helps in many cases when
using Cubic. It usually achieves larger link utilization and lower
99% latencies for the 1MB RPCs.
HBM ends up queueing a lot of packets with its default parameter values,
reducing the goodput of the 10KB RPCs and increasing their latency. Also,
the RTTs seen by the flows are quite large.

                         Aggr              10K  10K  10K   1MB  1MB  1MB
         Limit           rate drops  RTT  rate  P90  P99  rate  P90  P99
Test      rate  Flows    Mbps   %     us  Mbps   us   us  Mbps   ms   ms
--------  ----  -----    ---- -----  ---  ---- ---- ----  ---- ---- ----
cubic       1G    2       904  0.02  108   257  511  539   647 13.4 24.5
cubic-edt   1G    2       982  0.01  156   239  656  967   743 14.0 17.2
dctcp       1G    2       977  0.00  105   324  408  744   653 14.5 15.9
dctcp-edt   1G    2       981  0.01  142   321  417  811   660 15.7 17.0
cubic-htb   1G    2       919  0.00 1825    40 2822 4140   879  9.7  9.9

cubic     200M    2       155  0.30  220    81  532  655    74  283  450
cubic-edt 200M    2       188  0.02  222    87 1035 1095   101   84   85
dctcp     200M    2       188  0.03  111    77  912  939   111   76  325
dctcp-edt 200M    2       188  0.03  217    74 1416 1738   114   76   79
cubic-htb 200M    2       188  0.00 5015     8 14ms 15ms   180   48   50

cubic       1G    4       952  0.03  110   165  516  546   262   38  154
cubic-edt   1G    4       973  0.01  190   111 1034 1314   287   65   79
dctcp       1G    4       951  0.00  103   180  617  905   257   37   38
dctcp-edt   1G    4       967  0.00  163   151  732 1126   272   43   55
cubic-htb   1G    4       914  0.00 3249    13  7ms  8ms   300   29   34

cubic       5G    4      4236  0.00  134   305  490  624  1310   10   17
cubic-edt   5G    4      4865  0.00  156   306  425  759  1520   10   16
dctcp       5G    4      4936  0.00  128   485  221  409  1484    7    9
dctcp-edt   5G    4      4924  0.00  148   390  392  623  1508   11   26

v1 -> v2: Incorporated Andrii's suggestions
v2 -> v3: Incorporated Yonghong's suggestions
v3 -> v4: Removed credit update that is not needed

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-03 15:03:00 +02:00
Leo Yan
33bae185f7 bpf, libbpf, smatch: Fix potential NULL pointer dereference
Based on the following report from Smatch, fix the potential NULL
pointer dereference check:

  tools/lib/bpf/libbpf.c:3493
  bpf_prog_load_xattr() warn: variable dereferenced before check 'attr'
  (see line 3483)

  3479 int bpf_prog_load_xattr(const struct bpf_prog_load_attr *attr,
  3480                         struct bpf_object **pobj, int *prog_fd)
  3481 {
  3482         struct bpf_object_open_attr open_attr = {
  3483                 .file           = attr->file,
  3484                 .prog_type      = attr->prog_type,
                                         ^^^^^^
  3485         };

At the head of function, it directly access 'attr' without checking
if it's NULL pointer. This patch moves the values assignment after
validating 'attr' and 'attr->file'.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-03 12:17:29 +02:00
Andrii Nakryiko
cdfc7f888c libbpf: fix GCC8 warning for strncpy
GCC8 started emitting warning about using strncpy with number of bytes
exactly equal destination size, which is generally unsafe, as can lead
to non-zero terminated string being copied. Use IFNAMSIZ - 1 as number
of bytes to ensure name is always zero-terminated.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Cc: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-03 11:52:45 +02:00
Alexei Starovoitov
a3ce685dd0 bpf: fix precision tracking
When equivalent state is found the current state needs to propagate precision marks.
Otherwise the verifier will prune the search incorrectly.

There is a price for correctness:
                      before      before    broken    fixed
                      cnst spill  precise   precise
bpf_lb-DLB_L3.o       1923        8128      1863      1898
bpf_lb-DLB_L4.o       3077        6707      2468      2666
bpf_lb-DUNKNOWN.o     1062        1062      544       544
bpf_lxc-DDROP_ALL.o   166729      380712    22629     36823
bpf_lxc-DUNKNOWN.o    174607      440652    28805     45325
bpf_netdev.o          8407        31904     6801      7002
bpf_overlay.o         5420        23569     4754      4858
bpf_lxc_jit.o         39389       359445    50925     69631
Overall precision tracking is still very effective.

Fixes: b5dc0163d8 ("bpf: precise scalar_value tracking")
Reported-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Tested-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-03 11:12:14 +02:00
Petr Machata
dbcdb61aaf mlxsw: spectrum_ptp: Fix validation in mlxsw_sp1_ptp_packet_finish()
Before mlxsw_sp1_ptp_packet_finish() sends the packet back, it validates
whether the corresponding port is still valid. However the condition is
incorrect: when mlxsw_sp_port == NULL, the code dereferences the port to
compare it to skb->dev.

The condition needs to check whether the port is present and skb->dev still
refers to that port (or else is NULL). If that does not hold, bail out.
Add a pair of parentheses to fix the condition.

Fixes: d92e4e6e33 ("mlxsw: spectrum: PTP: Support timestamping on Spectrum-1")
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-02 15:31:20 -07:00
Heiner Kallweit
c782e204f7 r8169: add random MAC address fallback
It was reported that the GPD MicroPC is broken in a way that no valid
MAC address can be read from the network chip. The vendor driver deals
with this by assigning a random MAC address as fallback. So let's do
the same.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-02 15:28:19 -07:00
Heiner Kallweit
7424edbb55 Revert "r8169: improve handling VLAN tag"
This reverts commit 759d095741.

The patch was based on a misunderstanding. As Al Viro pointed out [0]
it's simply wrong on big endian. So let's revert it.

[0] https://marc.info/?t=156200975600004&r=1&w=2

Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-02 15:27:33 -07:00
Martin Blumenstingl
cc5e92c223 net: stmmac: make "snps,reset-delays-us" optional again
Commit 760f1dc295 ("net: stmmac: add sanity check to
device_property_read_u32_array call") introduced error checking of the
device_property_read_u32_array() call in stmmac_mdio_reset().
This results in the following error when the "snps,reset-delays-us"
property is not defined in devicetree:
  invalid property snps,reset-delays-us

This sanity check made sense until commit 84ce4d0f9f ("net: stmmac:
initialize the reset delay array") ensured that there are fallback
values for the reset delay if the "snps,reset-delays-us" property is
absent. That was at the cost of making that property mandatory though.

Drop the sanity check for device_property_read_u32_array() and thus make
the "snps,reset-delays-us" property optional again (avoiding the error
message while loading the stmmac driver with a .dtb where the property
is absent).

Fixes: 760f1dc295 ("net: stmmac: add sanity check to device_property_read_u32_array call")
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-02 15:23:43 -07:00
Eric Dumazet
b8bd72d317 bonding/main: fix NULL dereference in bond_select_active_slave()
A bonding master can be up while best_slave is NULL.

[12105.636318] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[12105.638204] mlx4_en: eth1: Linkstate event 1 -> 1
[12105.648984] IP: bond_select_active_slave+0x125/0x250
[12105.653977] PGD 0 P4D 0
[12105.656572] Oops: 0000 [#1] SMP PTI
[12105.660487] gsmi: Log Shutdown Reason 0x03
[12105.664620] Modules linked in: kvm_intel loop act_mirred uhaul vfat fat stg_standard_ftl stg_megablocks stg_idt stg_hdi stg elephant_dev_num stg_idt_eeprom w1_therm wire i2c_mux_pca954x i2c_mux mlx4_i2c i2c_usb cdc_acm ehci_pci ehci_hcd i2c_iimc mlx4_en mlx4_ib ib_uverbs ib_core mlx4_core [last unloaded: kvm_intel]
[12105.685686] mlx4_core 0000:03:00.0: dispatching link up event for port 2
[12105.685700] mlx4_en: eth2: Linkstate event 2 -> 1
[12105.685700] mlx4_en: eth2: Link Up (linkstate)
[12105.724452] Workqueue: bond0 bond_mii_monitor
[12105.728854] RIP: 0010:bond_select_active_slave+0x125/0x250
[12105.734355] RSP: 0018:ffffaf146a81fd88 EFLAGS: 00010246
[12105.739637] RAX: 0000000000000003 RBX: ffff8c62b03c6900 RCX: 0000000000000000
[12105.746838] RDX: 0000000000000000 RSI: ffffaf146a81fd08 RDI: ffff8c62b03c6000
[12105.754054] RBP: ffffaf146a81fdb8 R08: 0000000000000001 R09: ffff8c517d387600
[12105.761299] R10: 00000000001075d9 R11: ffffffffaceba92f R12: 0000000000000000
[12105.768553] R13: ffff8c8240ae4800 R14: 0000000000000000 R15: 0000000000000000
[12105.775748] FS:  0000000000000000(0000) GS:ffff8c62bfa40000(0000) knlGS:0000000000000000
[12105.783892] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12105.789716] CR2: 0000000000000000 CR3: 0000000d0520e001 CR4: 00000000001626f0
[12105.796976] Call Trace:
[12105.799446]  [<ffffffffac31d387>] bond_mii_monitor+0x497/0x6f0
[12105.805317]  [<ffffffffabd42643>] process_one_work+0x143/0x370
[12105.811225]  [<ffffffffabd42c7a>] worker_thread+0x4a/0x360
[12105.816761]  [<ffffffffabd48bc5>] kthread+0x105/0x140
[12105.821865]  [<ffffffffabd42c30>] ? rescuer_thread+0x380/0x380
[12105.827757]  [<ffffffffabd48ac0>] ? kthread_associate_blkcg+0xc0/0xc0
[12105.834266]  [<ffffffffac600241>] ret_from_fork+0x51/0x60

Fixes: e2a7420df2 ("bonding/main: convert to using slave printk macros")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: John Sperbeck <jsperbeck@google.com>
Cc: Jarod Wilson <jarod@redhat.com>
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-02 15:14:48 -07:00
Xin Long
d2c3a4ba25 tipc: remove ub->ubsock checks
Both tipc_udp_enable and tipc_udp_disable are called under rtnl_lock,
ub->ubsock could never be NULL in tipc_udp_disable and cleanup_bearer,
so remove the check.

Also remove the one in tipc_udp_enable by adding "free" label.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-02 15:13:15 -07:00
Stefano Brivio
885b8b4dbb ipv4: Fix off-by-one in route dump counter without netlink strict checking
In commit ee28906fd7 ("ipv4: Dump route exceptions if requested") I
added a counter of per-node dumped routes (including actual routes and
exceptions), analogous to the existing counter for dumped nodes. Dumping
exceptions means we need to also keep track of how many routes are dumped
for each node: this would be just one route per node, without exceptions.

When netlink strict checking is not enabled, we dump both routes and
exceptions at the same time: the RTM_F_CLONED flag is not used as a
filter. In this case, the per-node counter 'i_fa' is incremented by one
to track the single dumped route, then also incremented by one for each
exception dumped, and then stored as netlink callback argument as skip
counter, 's_fa', to be used when a partial dump operation restarts.

The per-node counter needs to be increased by one also when we skip a
route (exception) due to a previous non-zero skip counter, because it
needs to match the existing skip counter, if we are dumping both routes
and exceptions. I missed this, and only incremented the counter, for
regular routes, if the previous skip counter was zero. This means that,
in case of a mixed dump, partial dump operations after the first one
will start with a mismatching skip counter value, one less than expected.

This means in turn that the first exception for a given node is skipped
every time a partial dump operation restarts, if netlink strict checking
is not enabled (iproute < 5.0).

It turns out I didn't repeat the test in its final version, commit
de755a8513 ("selftests: pmtu: Introduce list_flush_ipv4_exception test
case"), which also counts the number of route exceptions returned, with
iproute2 versions < 5.0 -- I was instead using the equivalent of the IPv6
test as it was before commit b964641e99 ("selftests: pmtu: Make
list_flush_ipv6_exception test more demanding").

Always increment the per-node counter by one if we previously dumped
a regular route, so that it matches the current skip counter.

Fixes: ee28906fd7 ("ipv4: Dump route exceptions if requested")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-02 14:06:47 -07:00
René van Dorst
cce581a0c3 net: ethernet: mediatek: Allow non TRGMII mode with MT7621 DDR2 devices
No reason to error out on a MT7621 device with DDR2 memory when non
TRGMII mode is selected.
Only MT7621 DDR2 clock setup is not supported for TRGMII mode.
But non TRGMII mode doesn't need any special clock setup.

Signed-off-by: René van Dorst <opensource@vdorst.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-02 14:05:44 -07:00
David Howells
3427beb637 rxrpc: Fix uninitialized error code in rxrpc_send_data_packet()
With gcc 4.1:

    net/rxrpc/output.c: In function ‘rxrpc_send_data_packet’:
    net/rxrpc/output.c:338: warning: ‘ret’ may be used uninitialized in this function

Indeed, if the first jump to the send_fragmentable label is made, and
the address family is not handled in the switch() statement, ret will be
used uninitialized.

Fix this by BUG()'ing as is done in other places in rxrpc where internal
support for future address families will need adding.  It should not be
possible to reach this normally as the address families are checked
up-front.

Fixes: 5a924b8951 ("rxrpc: Don't store the rxrpc header in the Tx queue sk_buffs")
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-02 12:09:09 -07:00
Colin Ian King
23ec8eaf46 nfc: st-nci: remove redundant assignment to variable r
The variable r is being initialized with a value that is never
read and it is being updated later with a new value. The
initialization is redundant and can be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-02 12:00:50 -07:00
Xue Chaojing
83b6a85bbb hinic: remove standard netdev stats
This patch removes standard netdev stats in ethtool -S.

Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Xue Chaojing <xuechaojing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-02 11:12:13 -07:00
Jose Abreu
b432bdb6c6 net: stmmac: Re-word Kconfig entry
We support many speeds and it doesn't make much sense to list them all
in the Kconfig. Let's just call it Multi-Gigabit.

Suggested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-02 11:06:12 -07:00
David S. Miller
337d1ccb3d Merge branch 'Add-gve-driver'
Catherine Sullivan says:

====================
Add gve driver

This patch series adds the gve driver which will support the
Compute Engine Virtual NIC that will be available in the future.

v2:
- Patch 1:
  - Remove gve_size_assert.h and use static_assert instead.
  - Loop forever instead of bugging if the device won't reset
  - Use module_pci_driver
- Patch 2:
  - Use be16_to_cpu in the RX Seq No define
  - Remove unneeded ndo_change_mtu
- Patch 3:
  - No Changes
- Patch 4:
  - Instead of checking netif_carrier_ok in ethtool stats, just make sure

v3:
- Patch 1:
  - Remove X86 dep
- Patch 2:
  - No changes
- Patch 3:
  - No changes
- Patch 4:
  - Remove unneeded memsets in ethtool stats

v4:
- Patch 1:
  - Use io[read|write]32be instead of [read|write]l(cpu_to_be32())
  - Explicitly add padding to gve_adminq_set_driver_parameter
  - Use static where appropriate
- Patch 2:
  - Use u64_stats_sync
  - Explicity add padding to gve_adminq_create_rx_queue
  - Fix some enianness typing issues found by kbuild
  - Use static where appropriate
  - Remove unused variables
- Patch 3:
  - Use io[read|write]32be instead of [read|write]l(cpu_to_be32())
- Patch 4:
  - Use u64_stats_sync
  - Use static where appropriate
Warnings reported by:
Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Julia Lawall <julia.lawall@lip6.fr>
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:36:35 -07:00
Catherine Sullivan
e5b845dc79 gve: Add ethtool support
Add support for the following ethtool commands:

ethtool -s|--change devname [msglvl N] [msglevel type on|off]
ethtool -S|--statistics devname
ethtool -i|--driver devname
ethtool -l|--show-channels devname
ethtool -L|--set-channels devname
ethtool -g|--show-ring devname
ethtool --reset devname

Signed-off-by: Catherine Sullivan <csully@google.com>
Signed-off-by: Sagi Shahar <sagis@google.com>
Signed-off-by: Jon Olson <jonolson@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Luigi Rizzo <lrizzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:36:35 -07:00
Catherine Sullivan
9e5f7d26a4 gve: Add workqueue and reset support
Add support for the workqueue to handle management interrupts and
support for resets.

Signed-off-by: Catherine Sullivan <csully@google.com>
Signed-off-by: Sagi Shahar <sagis@google.com>
Signed-off-by: Jon Olson <jonolson@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Luigi Rizzo <lrizzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:36:35 -07:00
Catherine Sullivan
f5cedc84a3 gve: Add transmit and receive support
Add support for passing traffic.

Signed-off-by: Catherine Sullivan <csully@google.com>
Signed-off-by: Sagi Shahar <sagis@google.com>
Signed-off-by: Jon Olson <jonolson@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Luigi Rizzo <lrizzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:36:35 -07:00
Catherine Sullivan
893ce44df5 gve: Add basic driver framework for Compute Engine Virtual NIC
Add a driver framework for the Compute Engine Virtual NIC that will be
available in the future.

At this point the only functionality is loading the driver.

Signed-off-by: Catherine Sullivan <csully@google.com>
Signed-off-by: Sagi Shahar <sagis@google.com>
Signed-off-by: Jon Olson <jonolson@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Luigi Rizzo <lrizzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:36:35 -07:00
David S. Miller
2a8d8e0fec Merge branch 'blackhole-device-to-invalidate-dst'
Mahesh Bandewar says:

====================
blackhole device to invalidate dst

When we invalidate dst or mark it "dead", we assign 'lo' to
dst->dev. First of all this assignment is racy and more over,
it has MTU implications.

The standard dev MTU is 1500 while the Loopback MTU is 64k. TCP
code when dereferencing the dst don't check if the dst is valid
or not. TCP when dereferencing a dead-dst while negotiating a
new connection, may use dst device which is 'lo' instead of
using the correct device. Consider the following scenario:

A SYN arrives on an interface and tcp-layer while processing
SYNACK finds a dst and associates it with SYNACK skb. Now before
skb gets passed to L3 for processing, if that dst gets "dead"
(because of the virtual device getting disappeared & then reappeared),
the 'lo' gets assigned to that dst (lo MTU = 64k). Let's assume
the SYN has ADV_MSS set as 9k while the output device through
which this SYNACK is going to go out has standard MTU of 1500.
The MTU check during the route check passes since MIN(9K, 64K)
is 9k and TCP successfully negotiates 9k MSS. The subsequent
data packet; bigger in size gets passed to the device and it
won't be marked as GSO since the assumed MTU of the device is
9k.

This either crashes the NIC and we have seen fixes that went
into drivers to handle this scenario. 8914a59511 ('bnx2x:
disable GSO where gso_size is too big for hardware') and
2b16f04872 ('net: create skb_gso_validate_mac_len()') and
with those fixes TCP eventually recovers but not before
few dropped segments.

Well, I'm not a TCP expert and though we have experienced
these corner cases in our environment, I could not reproduce
this case reliably in my test setup to try this fix myself.
However, Michael Chan <michael.chan@broadcom.com> had a setup
where these fixes helped him mitigate the issue and not cause
the crash.

The idea here is to not alter the data-path with additional
locks or smb()/rmb() barriers to avoid racy assignments but
to create a new device that has really low MTU that has
.ndo_start_xmit essentially a kfree_skb(). Make use of this
device instead of 'lo' when marking the dst dead.

First patch implements the blackhole device and second
patch uses it in IPv4 and IPv6 stack while the third patch
is the self test that ensures the sanity of this device.

v1->v2
  fixed the self-test patch to handle the conflict

v2 -> v3
  fixed Kconfig text/string.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:34:46 -07:00
Mahesh Bandewar
509e56b37c blackhole_dev: add a selftest
Since this is not really a device with all capabilities, this test
ensures that it has *enough* to make it through the data path
without causing unwanted side-effects (read crash!).

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:34:46 -07:00
Mahesh Bandewar
8d7017fd62 blackhole_netdev: use blackhole_netdev to invalidate dst entries
Use blackhole_netdev instead of 'lo' device with lower MTU when marking
dst "dead".

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Tested-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:34:46 -07:00
Mahesh Bandewar
4de83b88c6 loopback: create blackhole net device similar to loopack.
Create a blackhole net device that can be used for "dead"
dst entries instead of loopback device. This blackhole device differs
from loopback in few aspects: (a) It's not per-ns. (b)  MTU on this
device is ETH_MIN_MTU (c) The xmit function is essentially kfree_skb().
and (d) since it's not registered it won't have ifindex.

Lower MTU effectively make the device not pass the MTU check during
the route check when a dst associated with the skb is dead.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:34:46 -07:00
Hariprasad Kelam
8909783cb5 net: ethernet: broadcom: bcm63xx_enet: Remove unneeded memset
Remove unneeded memset as alloc_etherdev is using kvzalloc which uses
__GFP_ZERO flag

Signed-off-by: Hariprasad Kelam <hariprasad.kelam@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:29:19 -07:00
David S. Miller
fec3b9ec47 Merge branch 'net-netsec-Add-XDP-Support'
Ilias Apalodimas says:

====================
net: netsec: Add XDP Support

This is a respin of https://www.spinics.net/lists/netdev/msg526066.html
Since page_pool API fixes are merged into net-next we can now safely use
it's DMA mapping capabilities.

First patch changes the buffer allocation from napi/netdev_alloc_frag()
to page_pool API. Although this will lead to slightly reduced performance
(on raw packet drops only) we can use the API for XDP buffer recycling.
Another side effect is a slight increase in memory usage, due to using a
single page per packet.

The second patch adds XDP support on the driver.
There's a bunch of interesting options that come up due to the single
Tx queue.
Locking is needed(to avoid messing up the Tx queues since ndo_xdp_xmit
and the normal stack can co-exist). We also need to track down the
'buffer type' for TX and properly free or recycle the packet depending
on it's nature.

Changes since RFC:
- Bug fixes from Jesper and Maciej
- Added page pool API to retrieve the DMA direction

Changes since v1:
- Use page_pool_free correctly if xdp_rxq_info_reg() failed
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:27:08 -07:00
Ilias Apalodimas
ba2b232108 net: netsec: add XDP support
The interface only supports 1 Tx queue so locking is introduced on
the Tx queue if XDP is enabled to make sure .ndo_start_xmit and
.ndo_xdp_xmit won't corrupt Tx ring

- Performance (SMMU off)

Benchmark   XDP_SKB     XDP_DRV
xdp1        291kpps     344kpps
rxdrop      282kpps     342kpps

- Performance (SMMU on)
Benchmark   XDP_SKB     XDP_DRV
xdp1        167kpps     324kpps
rxdrop      164kpps     323kpps

Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:27:08 -07:00
Ilias Apalodimas
bb005f2a70 net: page_pool: add helper function for retrieving dma direction
Since the dma direction is stored in page pool params, offer an API
helper for driver that choose not to keep track of it locally

Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:27:08 -07:00
Ilias Apalodimas
5c67bf0ec4 net: netsec: Use page_pool API
Use page_pool and it's DMA mapping capabilities for Rx buffers instead
of netdev/napi_alloc_frag()

Although this will result in a slight performance penalty on small sized
packets (~10%) the use of the API will allow to easily add XDP support.
The penalty won't be visible in network testing i.e ipef/netperf etc, it
only happens during raw packet drops.
Furthermore we intend to add recycling capabilities on the API
in the future. Once the recycling is added the performance penalty will
go away.
The only 'real' penalty is the slightly increased memory usage, since we
now allocate a page per packet instead of the amount of bytes we need +
skb metadata (difference is roughly 2kb per packet).
With a minimum of 4BG of RAM on the only SoC that has this NIC the
extra memory usage is negligible (a bit more on 64K pages)

Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:27:08 -07:00