Commit Graph

799789 Commits

Author SHA1 Message Date
Peter Oskolkov 23b0269e58 net: udp6: prefer listeners bound to an address
A relatively common use case is to have several IPs configured
on a host, and have different listeners for each of them. We would
like to add a "catch all" listener on addr_any, to match incoming
connections not served by any of the listeners bound to a specific
address.

However, port-only lookups can match addr_any sockets when sockets
listening on specific addresses are present if so_reuseport flag
is set. This patch eliminates lookups into port-only hashtable,
as lookups by (addr,port) tuple are easily available.

In addition, compute_score() is tweaked to _not_ match
addr_any sockets to specific addresses, as hash collisions
could result in the unwanted behavior described above.

Tested: the patch compiles; full test in the last patch in this
patchset. Existing reuseport_* selftests also pass.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14 15:55:20 -08:00
Peter Oskolkov 4cdeeee925 net: udp: prefer listeners bound to an address
A relatively common use case is to have several IPs configured
on a host, and have different listeners for each of them. We would
like to add a "catch all" listener on addr_any, to match incoming
connections not served by any of the listeners bound to a specific
address.

However, port-only lookups can match addr_any sockets when sockets
listening on specific addresses are present if so_reuseport flag
is set. This patch eliminates lookups into port-only hashtable,
as lookups by (addr,port) tuple are easily available.

In addition, compute_score() is tweaked to _not_ match
addr_any sockets to specific addresses, as hash collisions
could result in the unwanted behavior described above.

Tested: the patch compiles; full test in the last patch in this
patchset. Existing reuseport_* selftests also pass.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14 15:55:20 -08:00
yupeng 8e2ea53a83 add snmp counters document
Add explainations for some general IP counters, SACK and DSACK related
counters

Signed-off-by: yupeng <yupeng0921@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14 15:50:14 -08:00
David S. Miller 384aee46ca Merge branch 'neighbor-More-gc_list-changes'
David Ahern says:

====================
neighbor: More gc_list changes

More gc_list changes and cleanups.

The first 2 patches are bug fixes from the first gc_list change.
Specifically, fix the locking order to be consistent - table lock
followed by neighbor lock, and then entries in the FAILED state
should always be candidates for forced_gc without waiting for any
time span (return to the eviction logic prior to the separate gc_list).

Patch 3 removes 2 now unnecessary arguments to neigh_del.

Patch 4 moves a helper from a header file to core code in preparation
for Patch 5 which removes NTF_EXT_LEARNED entries from the gc_list.
These entries are already exempt from forced_gc; patch 5 removes them
from consideration and makes them on par with PERMANENT entries given
that they are also managed by userspace.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14 15:44:47 -08:00
David Ahern e997f8a20a neighbor: Remove externally learned entries from gc_list
Externally learned entries are similar to PERMANENT entries in the
sense they are managed by userspace and can not be garbage collected.
As such remove them from the gc_list, remove the flags check from
neigh_forced_gc and skip threshold checks in neigh_alloc. As with
PERMANENT entries, this allows unlimited number of NTF_EXT_LEARNED
entries.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14 15:44:47 -08:00
David Ahern 526f1b587c neighbor: Move neigh_update_ext_learned to core file
neigh_update_ext_learned has one caller in neighbour.c so does not need
to be defined in the header. Move it and in the process remove the
intialization of ndm_flags and just set it based on the flags check.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14 15:44:47 -08:00
David Ahern 7e6f182bec neighbor: Remove state and flags arguments to neigh_del
neigh_del now only has 1 caller, and the state and flags arguments
are both 0. Remove them and simplify neigh_del.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14 15:44:47 -08:00
David Ahern 758a7f0b32 neighbor: Fix state check in neigh_forced_gc
PERMANENT entries are not on the gc_list so the state check is now
redundant. Also, the move to not purge entries until after 5 seconds
should not apply to FAILED entries; those can be removed immediately
to make way for newer ones. This restores the previous logic prior to
the gc_list.

Fixes: 58956317c8 ("neighbor: Improve garbage collection")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14 15:44:47 -08:00
David Ahern 9c29a2f55e neighbor: Fix locking order for gc_list changes
Lock checker noted an inverted lock order between neigh_change_state
(neighbor lock then table lock) and neigh_periodic_work (table lock and
then neighbor lock) resulting in:

[  121.057652] ======================================================
[  121.058740] WARNING: possible circular locking dependency detected
[  121.059861] 4.20.0-rc6+ #43 Not tainted
[  121.060546] ------------------------------------------------------
[  121.061630] kworker/0:2/65 is trying to acquire lock:
[  121.062519] (____ptrval____) (&n->lock){++--}, at: neigh_periodic_work+0x237/0x324
[  121.063894]
[  121.063894] but task is already holding lock:
[  121.064920] (____ptrval____) (&tbl->lock){+.-.}, at: neigh_periodic_work+0x194/0x324
[  121.066274]
[  121.066274] which lock already depends on the new lock.
[  121.066274]
[  121.067693]
[  121.067693] the existing dependency chain (in reverse order) is:
...

Fix by renaming neigh_change_state to neigh_update_gc_list, changing
it to only manage whether an entry should be on the gc_list and taking
locks in the same order as neigh_periodic_work. Invoke at the end of
neigh_update only if diff between old or new states has the PERMANENT
flag set.

Fixes: 8cc196d6ef ("neighbor: gc_list changes should be protected by table lock")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14 15:44:47 -08:00
Cong Wang aeb3fecde8 net_sched: fold tcf_block_cb_call() into tc_setup_cb_call()
After commit 69bd48404f ("net/sched: Remove egdev mechanism"),
tc_setup_cb_call() is nearly identical to tcf_block_cb_call(),
so we can just fold tcf_block_cb_call() into tc_setup_cb_call()
and remove its unused parameter 'exts'.

Fixes: 69bd48404f ("net/sched: Remove egdev mechanism")
Cc: Oz Shlomo <ozsh@mellanox.com>
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14 15:32:19 -08:00
Wen Yang 390de19404 net/ibmvnic: Remove tests of member address
The driver was checking for non-NULL address.
- adapter->napi[i]

This is pointless as these will be always non-NULL, since the
'dapter->napi' is allocated in init_napi().
It is safe to get rid of useless checks for addresses to fix the
coccinelle warning:
>>drivers/net/ethernet/ibm/ibmvnic.c: test of a variable/field address
Since such statements always return true, they are redundant.

Signed-off-by: Wen Yang <wen.yang99@zte.com.cn>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Thomas Falcon <tlfalcon@linux.ibm.com>
CC: John Allen <jallen@linux.ibm.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: linuxppc-dev@lists.ozlabs.org
CC: netdev@vger.kernel.org
CC: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14 13:36:38 -08:00
Prashant Bhole 6342ca6447 tun: replace get_cpu_ptr with this_cpu_ptr when bh disabled
tun_xdp_one() runs with local bh disabled. So there is no need to
disable preemption by calling get_cpu_ptr while updating stats. This
patch replaces the use of get_cpu_ptr() with this_cpu_ptr() as a
micro-optimization. Also removes related put_cpu_ptr call.

Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14 13:36:26 -08:00
Aviv Heller 9582466640 net/mlx5: Handle LAG FW commands failure gracefully
When create_lag or destroy_lag FW commands fail, display an appropriate
error message, and try to recover, if possible.

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 13:28:56 -08:00
Aviv Heller 7c34ec19e1 net/mlx5: Make RoCE and SR-IOV LAG modes explicit
With the introduction of SR-IOV LAG, checking whether LAG is active
is no longer good enough, since RoCE and SR-IOV LAG each entails
different behavior by both the core and infiniband drivers.

This patch introduces facilities to discern LAG type, in addition to
mlx5_lag_is_active(). These are implemented in such a way as to allow
more complex mode combinations in the future.

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 13:28:55 -08:00
Aviv Heller 292612d68c net/mlx5: Rename mlx5_lag_is_bonded() to __mlx5_lag_is_active()
The new name better represents the function's aim, and sets a precedent
for a '__' prefix for internal, non-locked versions of LAG APIs.

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 13:28:54 -08:00
Rabie Loulou 8aaca1976e net/mlx5: Allow co-enablement of uplink LAG and SRIOV
Enable setting uplink LAG if sriov is enabled on both ports in switchdev
mode.

Once the sriov mode is changed from switchdev for any of the ports, the
LAG instance is disabled.

Signed-off-by: Rabie Loulou <rabiel@mellanox.com>
Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 13:28:54 -08:00
Rabie Loulou eff849b2c6 net/mlx5: Allow/disallow LAG according to pre-req only
Remove the lag forbid/allow functions, change the lag prereq check to
run in the do-bond logic, so every change in the prereq state will
cause LAG to be disabled/enabled accordingly after the next do-bond run.

Add lag update function, so every component which changes the prereq
state and want the LAG to re-calc the conditions can call the update
function.

Signed-off-by: Rabie Loulou <rabiel@mellanox.com>
Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 13:28:54 -08:00
Rabie Loulou 3b5ff59fd8 net/mlx5: Adjustments for the activate LAG logic to run under sriov
When HW lag is set/unset, roce must not be enabled on the port, as such
we wrap such changes with roce enable/disable either directly or through
re-creation of IB device.

Currently, lag and sriov are mutually exclusive, so by definition this
code doesn't run under sriov.

Towards changing this exclusion, we need to make sure that roce will not
be enabled on the eswitch manager port under sriov since this is
requirement of the switchdev mode.

We are going strict here and avoiding this all together under sriov.

Signed-off-by: Rabie Loulou <rabiel@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 13:28:53 -08:00
Aviv Heller 1418ddd96a net/mlx5e: Duplicate offloaded TC eswitch rules under uplink LAG
Under uplink LAG, packets that match a flow might arrive on both uplink
ports and transmitted through both as part of supporting aggregation and
high-availability.

When the netdevs representing the uplinks are set into LAG (bonding,
teaming), duplicate the TC flow offloading into each of the per-uplink
e-switches.

Duplication is not required if the source is the bond device, since in
this case it is assumed that the bond and the uplink netdevs share the
same TC block, and thus duplication will occur naturally by the stack.

Note that under encapsulation scheme, both flows will use the same
neighbour and hence both will contribute to the last-used feedback
towards the stack.

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Rabie Loulou <rabiel@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 13:28:53 -08:00
Rabie Loulou 7ba58ba7ba net/mlx5e: Offload TC e-switch rules with egress LAG device
When parsing TC FDB actions, if the egress device is a bond/team
net-device which enslaved the uplink representor of the e-switch,
use the uplink representor as the destination in the HW rule.

Signed-off-by: Rabie Loulou <rabiel@mellanox.com>
Signed-off-by: Aviv Heller <avivh@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 13:28:53 -08:00
Rabie Loulou 491c37e49b net/mlx5e: In case of LAG, one switch parent id is used for all representors
When the uplink representors are put into lag, set all the
representors (VFs and uplinks) of the same NIC to return the same
switchdev id.

Currently, the route lookup code on the encapsulation offload path
assumes that same switchdev id for the source and dest devices means
that the dest is also mlx5 HW netdev. This doesn't hold anymore when we
align the switchdev Id of the uplinks to be same, which in turn causes
the bond/team to return that id to the caller. As such, enhance the
relevant check to take into account the uplink lag case.

Signed-off-by: Rabie Loulou <rabiel@mellanox.com>
Signed-off-by: Aviv Heller <avivh@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 13:28:52 -08:00
Shahar Klein f9392795e2 net/mlx5e: Enhance flow counter scheme for offloaded TC eswitch rules
Assign a counter dev attribute according to device capability and use
it for management of counters related to offloaded eswitch TC flows.

With upcoming support for uplink LAG, we have two HW rules per one
logical SW (TC) rule. Although the HW supports attaching one counter
to multiple rules, we are allocating counter per HW rule.

We need this separation for two reasons:

1. "flow eswitch" counter affinity HW require the counter to be
allocated on the device where the eswitch rule is set.

2. for some use-cases (multi-path routing) each HW flow relates to
different neighbour, hence our neigh update logic must have a per-rule
HW accountant in order to provide the proper feedback to the kernel.

Signed-off-by: Shahar Klein <shahark@mellanox.com>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 13:28:52 -08:00
Roi Dayan 04de7dda73 net/mlx5e: Infrastructure for duplicated offloading of TC flows
Under uplink LAG or multipath schemes, traffic that matches one flow
might arrive on both uplink ports and transmitted through both
as part of supporting aggregation and high-availability.

To cope with the fact that the SW model might use logical SW port
(e.g uplink team or bond) but we have two HW ports with e-switch on
each, there are cases where in order to offload a SW TC rule we
need to duplicate it to two HW flows.

Since each HW rule has its own counter we also aggregate the counter
of both rules when a flow stats query is executed from user-space.

Introduce the changes for the different elements (add/delete/stats),
currently nothing is duplicated.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Shahar Klein <shahark@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 13:28:52 -08:00
Roi Dayan ac004b8321 net/mlx5e: E-Switch, Add peer miss rules
In the sriov offloads mode, packets that are not matched by any
other rule are sent towards the e-switch vport manager for further
processing.

Under upcoming patches (e.g for uplink LAG), packets sent from VF
vports belonging to esw0 (e-switch related to PF0) might end up in
esw1 (e-switch related to PF1) due to muxing logic applied by the
FW.

In such a case we still want the missed packet to be sent to the
"base" esw manager vport in order to present the control plane a
consistent view of the source (VF reresentor) port.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Shahar Klein <shahark@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 13:28:51 -08:00
Aviv Heller fadd59fc50 net/mlx5: Introduce inter-device communication mechanism
This introduces devcom, a generic mechanism for performing operations
on both physical functions of the same Connect-X card.

The first user of this API is merged eswitch, which will be introduced
in subsequent patches.

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 13:28:51 -08:00
Arnd Bergmann c2c79a32fb hamradio, ppp: change semaphore to completion
ppp and hamradio have copies of the same code that uses a semaphore
in place of a completion for historic reasons. Make it use the
proper interface instead in all copies.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14 13:27:10 -08:00
Arnd Bergmann 2aa55dccf8 hns3: prevent building without CONFIG_INET
We now get a link failure when CONFIG_INET is disabled, since
tcp_gro_complete is unavailable:

drivers/net/ethernet/hisilicon/hns3/hns3_enet.o: In function `hns3_set_gro_param':
hns3_enet.c:(.text+0x230c): undefined reference to `tcp_gro_complete'

Add an explicit CONFIG_INET dependency here to avoid the broken
configuration.

Fixes: a6d53b97a2 ("net: hns3: Adds GRO params to SKB for the stack")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14 13:17:50 -08:00
Saeed Mahameed 64e4cf0dab Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
mlx5-next shared branch with rdma subtree to avoid mlx5 rdma v.s. netdev
conflicts.

Highlights:
1) Lag refactroing and flow counter affinity bits.
2) mlx5 core cleanups

By Roi Dayan (2) and others
* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  net/mlx5: Fold the modify lag code into function
  net/mlx5: Add lag affinity info to log
  net/mlx5: Split the activate lag function into two routines
  net/mlx5: E-Switch, Introduce flow counter affinity
  IB/mlx5: Unify e-switch representors load approach between uplink and VFs
  net/mlx5: Use lowercase 'X' for hex values
  net/mlx5: Remove duplicated include from eswitch.c

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 11:15:25 -08:00
Shahar Klein 4c283e6155 net/mlx5: Fold the modify lag code into function
Handle the code of modifying the lag affinity within a separate function.

Signed-off-by: Shahar Klein <shahark@mellanox.com>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 09:58:57 -08:00
Roi Dayan 3cfe432e1b net/mlx5: Add lag affinity info to log
Report the initial LAG port affinity upon LAG creation.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Shahar Klein <shahark@mellanox.com>
Reviewed-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 09:58:57 -08:00
Roi Dayan 8252cf728c net/mlx5: Split the activate lag function into two routines
Split the activate lag function in order to be symmetric with
the deactivate lag function.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Shahar Klein <shahark@mellanox.com>
Reviewed-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 09:58:57 -08:00
Shahar Klein 8bb957d255 net/mlx5: E-Switch, Introduce flow counter affinity
This dictates the device affinity for eswitch flow counters, set by the FW
according to the HW device capabilities.

Under "source eswitch" affinity, the counter should be allocated on the
device related to the source vport in the match. This covers both non
merged e-switch mode as well as old FW that does not advertise this cap.

Under "flow eswitch" affinity, the counter should be allocated on the
device where the eswitch rule is set.

Signed-off-by: Shahar Klein <shahark@mellanox.com>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 09:58:57 -08:00
Mark Bloch 06cc74af05 IB/mlx5: Unify e-switch representors load approach between uplink and VFs
When in switchdev mode and the add function is called by the core
level driver, make sure we only register the callbacks, but don't
create the mlx5 IB device or initialize anything. With this change
all the IB devices in switchdev mode are created only once the load
callback is invoked by the e-switch core sub-module. This follows
the design paradigm under which the all the Eth representors must
be loaded before any of IB reprs is loaded.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 09:58:57 -08:00
Saeed Mahameed 4c8b85187c net/mlx5: Use lowercase 'X' for hex values
Apparently gcc is cool with upper case '0X' but it is not commonly used.
Replace '0X' with lowercase '0x' in mlx5_ifc.h file.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 09:58:57 -08:00
David S. Miller 522185d5cb Merge branch 'Introduce-NETDEV_PRE_CHANGEADDR'
Petr Machata says:

====================
Introduce NETDEV_PRE_CHANGEADDR

Spectrum devices have a limitation that all router interfaces need to
have the same address prefix. In Spectrum-1, the requirement is for the
initial 38 bits of all RIFs to be the same, in Spectrum-2 the limit is
36 bits. Currently violations of this requirement are not diagnosed. At
the same time, if the condition is not upheld, the mismatched MAC
address ends up overwriting the common prefix, and all RIF MAC addresses
silently change to the new prefix.

It is therefore desirable to be able at least to diagnose the issue, and
better to reject attempts to change MAC addresses in ways that is
incompatible with the device.

Currently MAC address changes are notified through emission of
NETDEV_CHANGEADDR, which is done after the change. Extending this
message to allow vetoing is certainly possible, but several other
notification types have instead adopted a simple two-stage approach:
first a "pre" notification is sent to make sure all interested parties
are OK with the change that's about to be done. Then the change is done,
and afterwards a "post" notification is sent.

This dual approach is easier to use: when the change is vetoed, nothing
has changed yet, and it's therefore unnecessary to roll anything back.
Therefore this patchset introduces it for NETDEV_CHANGEADDR as well.

One prominent path to emitting NETDEV_CHANGEADDR is through
dev_set_mac_address(). Therefore in patch #1, give this function an
extack argument, so that a textual reason for rejection (or a warning)
can be communicated back to the user.

In patch #2, add the new notification type. In patch #3, have dev.c emit
the notification for instances of dev_addr change, or addition of an
address to dev_addrs list.

In patches #4 and #5, extend the bridge driver to handle and emit the
new notifier.

In patch #6, change IPVLAN to emit the new notifier.

Likewise for bonding driver in patches #7 and #8. Note that the team
driver doesn't need this treatment, as it goes through
dev_set_mac_address().

In patches #9, #10 and #11 adapt mlxsw to veto MAC addresses on router
interfaces, if they violate the requirement that all RIF MAC addresses
have the same prefix.

Finally in patches #12 and #13, add a test for vetoing of a direct
change of a port device MAC, and indirect change of a bridge MAC.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-13 18:41:39 -08:00
Petr Machata 9651ee10ce selftests: mlxsw: Test FID RIF MAC vetoing
When a FID RIF is created for a bridge with IP address, its MAC address
must obey the same requirements as other RIFs. Test that attempts to
change the address incompatibly by attaching a device are vetoed with
extack.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-13 18:41:39 -08:00
Petr Machata 555afaae12 selftests: mlxsw: Test RIF MAC vetoing
Test that attempts to change address in a way that violates Spectrum
requirements are vetoed with extack.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-13 18:41:39 -08:00
Petr Machata 74bc993974 mlxsw: spectrum_router: Veto unsupported RIF MAC addresses
On NETDEV_PRE_CHANGEADDR, if the change is related to a RIF interface,
verify that it satisfies the criterion that all RIF interfaces have the
same MAC address prefix, as indicated by mlxsw_sp.mac_mask.

Additionally, besides explicit address changes, check that the address
of an interface for which a RIF is about to be added matches the
required pattern as well.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-13 18:41:39 -08:00
Petr Machata 9329b8162b mlxsw: spectrum: Add mlxsw_sp.mac_mask
The Spectrum hardware demands that all router interfaces in the system
have the same first 38 resp. 36 bits of MAC address: the former limit
holds on Spectrum, the latter on Spectrum-2. Add a field that refers to
the required prefix mask and initialize in mlxsw_sp1_init() and
mlxsw_sp2_init().

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-13 18:41:39 -08:00
Petr Machata 9735f2d2fe mlxsw: spectrum_router: Generalize mlxsw_sp_netdevice_router_port_event()
Prepare mlxsw_sp_netdevice_router_port_event() for handling of
NETDEV_PRE_CHANGEADDR. Split out the part that deals with the actual
changes and call it for the two events currently handled.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-13 18:41:39 -08:00
Petr Machata 1caf40dec1 net: bonding: Issue NETDEV_PRE_CHANGEADDR
Give interested parties an opportunity to veto an impending HW address
change.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-13 18:41:39 -08:00
Petr Machata b924591428 net: bonding: Give bond_set_dev_addr() a return value
Before NETDEV_CHANGEADDR, bond driver should emit NETDEV_PRE_CHANGEADDR,
and allow consumers to veto the address change. To propagate further the
return code from NETDEV_PRE_CHANGEADDR, give the function that
implements address change a return value.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-13 18:41:39 -08:00
Petr Machata 61345fab48 net: ipvlan: Issue NETDEV_PRE_CHANGEADDR
A NETDEV_CHANGEADDR event implies a change of address of each of the
IPVLANs of this IPVLAN device. Therefore propagate NETDEV_PRE_CHANGEADDR
to all the IPVLANs.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-13 18:41:38 -08:00
Petr Machata b89df65c5e net: bridge: Handle NETDEV_PRE_CHANGEADDR from ports
When a port device seeks approval of a potential new MAC address, make
sure that should the bridge device end up using this address, all
interested parties would agree with it.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-13 18:41:38 -08:00
Petr Machata ca935da7f4 net: bridge: Issue NETDEV_PRE_CHANGEADDR
When a port is attached to a bridge, the address of the bridge in
question may change as well. Even if it would not change at this
point (because the current bridge address is lower), it might end up
changing later as a result of detach of another port, which can't be
vetoed.

Therefore issue NETDEV_PRE_CHANGEADDR regardless of whether the address
will be used at this point or not, and make sure all involved parties
would agree with the change.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-13 18:41:38 -08:00
Petr Machata d59cdf9475 net: dev: Issue NETDEV_PRE_CHANGEADDR
When a device address is about to be changed, or an address added to the
list of device HW addresses, it is necessary to ensure that all
interested parties can support the address. Therefore, send the
NETDEV_PRE_CHANGEADDR notification, and if anyone bails on it, do not
change the address.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-13 18:41:38 -08:00
Petr Machata 1570415f08 net: dev: Add NETDEV_PRE_CHANGEADDR
The NETDEV_CHANGEADDR notification is emitted after a device address
changes. Extending this message to allow vetoing is certainly possible,
but several other notification types have instead adopted a simple
two-stage approach: first a "pre" notification is sent to make sure all
interested parties are OK with a change that's about to be done. Then
the change is done, and afterwards a "post" notification is sent.

This dual approach is easier to use: when the change is vetoed, nothing
has changed yet, and it's therefore unnecessary to roll anything back.
Therefore adopt it for NETDEV_CHANGEADDR as well.

To that end, add NETDEV_PRE_CHANGEADDR and an info structure to go along
with it.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-13 18:41:38 -08:00
Petr Machata 3a37a9636c net: dev: Add extack argument to dev_set_mac_address()
A follow-up patch will add a notifier type NETDEV_PRE_CHANGEADDR, which
allows vetoing of MAC address changes. One prominent path to that
notification is through dev_set_mac_address(). Therefore give this
function an extack argument, so that it can be packed together with the
notification. Thus a textual reason for rejection (or a warning) can be
communicated back to the user.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-13 18:41:38 -08:00
David S. Miller 95302c394c mlx5e-updates-2018-12-11
From Eli Britstein,
 Patches 1-10 adds remote mirroring support.
 Patches 1-4 refactor encap related code as pre-steps for using per
 destination encapsulation properties.
 Patches 5-7 use extended destination feature for single/multi
 destination scenarios that have a single encap destination.
 Patches 8-10 enable multiple encap destinations for a TC flow.
 
 From, Daniel Jurgens,
 Patch 11, Use CQE padding for Ethernet CQs, PPC showed up to a 24%
 improvement in small packet throughput
 
 From Eyal Davidovich,
 patches 12-14, FW monitor counter support
 FW monitor counters feature came to solve the delayed reporting of
 FW stats in the atomic get_stats64 ndo, since we can't access the
 FW at that stage, this feature will enable immediate FW stats updates
 in the driver via fw events on specific stats updates.
 
 Patch 12, cleanup to avoid querying a FW counter when it is not
 supported
 Patch 13, Monitor counters FW commands support
 Patch 14, Use monitor counters in ethernet netdevice to update FW
 stats reported in the atomic get_stats64 ndo.
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJcEEYmAAoJEEg/ir3gV/o+XlIIAKIQhvHnCoGMPsZU+HbA1h3O
 10i2hnmQqx2F5Z7tSON9jr9HVzLj6FqQ3pSnTyw8xiMBqOldF6abmfpMPQ/XeWXk
 PDBCzfOUMAlIsGFAJ06tcz3VLkWrEdoQHmi7Jb4IW+LC9CnZBobzrcLaxQ95txYP
 LkCfO8Y70D5tzBrzL1o1RfKsCSFJma5zhXDPF/B3rifLyHOmo9aLzZuDvKwvvSkC
 Y3QM5ulnLqVhxaXA0xzB+iHLiI0nX2H17CBPseLzM+Ip0vZS/nA77rnQ49PUgjiZ
 d1BoXo9MUa3eoQwXNl+qSAfnGjCwqzplXxzjT2bKKXVc2ZOWmkECgRXmKI6JDDU=
 =EuqQ
 -----END PGP SIGNATURE-----

Merge tag 'mlx5e-updates-2018-12-11' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5e-updates-2018-12-11

From Eli Britstein,
Patches 1-10 adds remote mirroring support.
Patches 1-4 refactor encap related code as pre-steps for using per
destination encapsulation properties.
Patches 5-7 use extended destination feature for single/multi
destination scenarios that have a single encap destination.
Patches 8-10 enable multiple encap destinations for a TC flow.

From, Daniel Jurgens,
Patch 11, Use CQE padding for Ethernet CQs, PPC showed up to a 24%
improvement in small packet throughput

From Eyal Davidovich,
patches 12-14, FW monitor counter support
FW monitor counters feature came to solve the delayed reporting of
FW stats in the atomic get_stats64 ndo, since we can't access the
FW at that stage, this feature will enable immediate FW stats updates
in the driver via fw events on specific stats updates.

Patch 12, cleanup to avoid querying a FW counter when it is not
supported
Patch 13, Monitor counters FW commands support
Patch 14, Use monitor counters in ethernet netdevice to update FW
stats reported in the atomic get_stats64 ndo.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-12 22:22:53 -08:00
Biao Huang 43d4b29718 net-next: stmmac: dwmac-mediatek: add module license info
Add MODULE_LICENSE info to fix this:
WARNING: modpost: missing MODULE_LICENSE() in drivers/net/ethernet/stmicro/stmmac/dwmac-mediatek.o

Signed-off-by: Biao Huang <biao.huang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-12 21:42:55 -08:00