Commit Graph

706618 Commits

Author SHA1 Message Date
Vivien Didelot 62fc958762 net: dsa: use temporary dsa_device_ops variable
When resolving the DSA tagging protocol used by a CPU switch, use a
temporary "tag_ops" variable to store the dsa_device_ops instead of
using directly dst->tag_ops. This will make the future patches moving
this pointer around easier to read.

There is no functional changes.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-01 04:15:07 +01:00
Vivien Didelot 7ec764eef9 net: dsa: use cpu_dp in master code
Make it clear that the master device is linked to a CPU port by using
"cpu_dp" for the dsa_port variable in master.c instead of "port", then
use a "port" variable to describe the port index, as usually seen in
other places of DSA core.

This will make the future patch touching dsa_ptr more readable. There is
no functional changes.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-01 04:15:07 +01:00
Vivien Didelot 3775b1b7f0 net: dsa: add master helper to look up slaves
The DSA tagging code does not need to know about the DSA architecture,
it only needs to return the slave device corresponding to the source
port index (and eventually the source device index for cascade-capable
switches) parsed from the frame received on the master device.

For this purpose, provide an inline dsa_master_get_slave helper which
validates the device and port indexes and look up the slave device.

This makes the tagging rcv functions more concise and robust, and also
makes dsa_get_cpu_port obsolete.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-01 04:15:07 +01:00
Colin Ian King 075cfdd659 net: hns3: fix null pointer dereference before null check
pointer ndev is being dereferenced with the call to netif_running
before it is being null checked.  Re-order the code to only dereference
ndev after it has been null checked.

Detected by CoverityScan, CID#1457206 ("Dereference before null check")

Fixes: 9df8f79a4d ("net: hns3: Add DCB support when interacting with network stack")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-01 04:12:45 +01:00
Simon Xiao 09af87d18f hv_netvsc: report stop_queue and wake_queue
Report the numbers of events for stop_queue and wake_queue in
ethtool stats.

Example:
ethtool -S eth0
NIC statistics:
	...
	stop_queue: 7
	wake_queue: 7
	...

Signed-off-by: Simon Xiao <sixiao@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-01 04:10:30 +01:00
Martin KaFai Lau 721e08dad1 bpf: Fix compiler warning on info.map_ids for 32bit platform
This patch uses u64_to_user_ptr() to cast info.map_ids to a userspace ptr.
It also tags the user_map_ids with '__user' for sparse check.

Fixes: cb4d2b3f03 ("bpf: Add name, load_time, uid and map_ids to bpf_prog_info")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-01 04:09:42 +01:00
Colin Ian King b1c49d1420 net_sched: remove redundant assignment to ret
The assignment of -EINVAL to variable ret is redundant as it
is being overwritten on the following error exit paths or
to the return value from the following call to basic_set_parms.
Fix this up by removing it. Cleans up clang warning message:

net/sched/cls_basic.c:185:2: warning: Value stored to 'err' is never read

Fixes: 1d8134fea2 ("net_sched: use idr to allocate basic filter handles")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-01 04:08:00 +01:00
Colin Ian King ef739d8aab net: ipmr: make function ipmr_notifier_init static
The function ipmr_notifier_init is local to the source and does
not need to be in global scope, so make it static.

Cleans up sparse warning:
warning: symbol 'ipmr_notifier_init' was not declared. Should it be static?

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-01 04:06:29 +01:00
Mick Tarsel e876a8a7e9 ibmvnic: Set state UP
State is initially reported as UNKNOWN. Before register call
netif_carrier_off(). Once the device is opened, call netif_carrier_on() in
order to set the state to UP.

Signed-off-by: Mick Tarsel <mjtarsel@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-01 04:02:35 +01:00
Florian Fainelli 21a2774ef5 Revert "net: dsa: bcm_sf2: Defer port enabling to calling port_enable"
This reverts commit e85ec74ace ("net: dsa: bcm_sf2: Defer port
enabling to calling port_enable") because this now makes an unbind
followed by a bind to fail connecting to the ingrated PHY.

What this patch missed is that we need the PHY to be enabled with
bcm_sf2_gphy_enable_set() before probing it on the MDIO bus. This is
correctly done in the ops->setup() function, but by the time
ops->port_enable() runs, this is too late. Upon unbind we would power
down the PHY, and so when we would bind again, the PHY would be left
powered off.

Fixes: e85ec74ace ("net: dsa: bcm_sf2: Defer port enabling to calling port_enable")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-01 04:01:29 +01:00
David S. Miller a6992ebee4 Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:

====================
40GbE Intel Wired LAN Driver Updates 2017-09-29

This series contains updates to i40e and i40evf only.

Jake provides several of the changes starting with the renaming of a
variable to clarify what the value is actually calculating. Found we
were misusing the __I40E_RECOVERY_PENDING bit to determine when we
should actually request a new IRQ in i40e_setup_misc_vector(), which
lead to a design mistake, so to resolve the issue, use a separate
state bit for miscellaneous IRQ setup and fix up the design while we
are at it.  Cleaned up the old legacy PM support in the driver since
we support the newer generic PM callbacks.  Fixed a failure to
hibernate issue, where on some platforms with a large number of CPUs,
we would allocate many IRQ vectors which we would try to migrate to
CPU0 when hibernating.

Sudheer cleans up a check for unqualified module inside i40e_up_complete()
because the link state information is in flux at time, so log messages
are getting logged with incorrect link state information.  Also provided
additional log message cleanups and simplify member variable access in
the printing of the link messages.

Mariusz relaxes the firmware check since Fortville and Fort Park NICs
can and do have different firmware versions, so only warn for older
Fortville firmware.  Fixed an errata with a flow director statistic that
was not wrapping as expected, simply reset after reading.

Mitch prevents consternation by lowering the log level to debug on a
message seen regularly on VF reset or unload, which is meaningless under
normal circumstances.  Refactor the firmware version checking since
Fortville and Fort Park devices can have different firmware versions.

Alan fixes a ring to vector mapping, where the past implementation
attempted to map each Tx and Rx ring to its own vector, however we use
combined queues so we should be mapping the Tx/Rx rings together on one
vector.  Adds the ability for the VF to request a different number of
queues allocated to it.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-01 03:31:17 +01:00
Colin Ian King 45c1fd61d5 mkiss: remove redundant check on len being zero
The check on len is redundant as it is always greater than 1,
so just remove it and make the printk less complex.

Detected by CoverityScan, CID#1226729 ("Logically dead code")

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-30 05:44:28 +01:00
Maciej Żenczykowski 84e14fe353 net-ipv6: add support for sockopt(SOL_IPV6, IPV6_FREEBIND)
So far we've been relying on sockopt(SOL_IP, IP_FREEBIND) being usable
even on IPv6 sockets.

However, it turns out it is perfectly reasonable to want to set freebind
on an AF_INET6 SOCK_RAW socket - but there is no way to set any SOL_IP
socket option on such a socket (they're all blindly errored out).

One use case for this is to allow spoofing src ip on a raw socket
via sendmsg cmsg.

Tested:
  built, and booted
  # python
  >>> import socket
  >>> SOL_IP = socket.SOL_IP
  >>> SOL_IPV6 = socket.IPPROTO_IPV6
  >>> IP_FREEBIND = 15
  >>> IPV6_FREEBIND = 78
  >>> s = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM, 0)
  >>> s.getsockopt(SOL_IP, IP_FREEBIND)
  0
  >>> s.getsockopt(SOL_IPV6, IPV6_FREEBIND)
  0
  >>> s.setsockopt(SOL_IPV6, IPV6_FREEBIND, 1)
  >>> s.getsockopt(SOL_IP, IP_FREEBIND)
  1
  >>> s.getsockopt(SOL_IPV6, IPV6_FREEBIND)
  1

Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-30 05:30:52 +01:00
Mike Manning 1f372c7bfb net: ipv6: send NS for DAD when link operationally up
The NS for DAD are sent on admin up as long as a valid qdisc is found.
A race condition exists by which these packets will not egress the
interface if the operational state of the lower device is not yet up.
The solution is to delay DAD until the link is operationally up
according to RFC2863. Rather than only doing this, follow the existing
code checks by deferring IPv6 device initialization altogether. The fix
allows DAD on devices like tunnels that are controlled by userspace
control plane. The fix has no impact on regular deployments, but means
that there is no IPv6 connectivity until the port has been opened in
the case of port-based network access control, which should be
desirable.

Signed-off-by: Mike Manning <mmanning@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-30 05:24:45 +01:00
Mitch Williams 22b96551f2 i40e: refactor FW version checking
The i40e driver now supports two different devices with two different
firmware versions. So be smart about how we handle these. Move the FW
version macros to the appropriate header file, and add a convenience
macro that checks the version based on the device. Then use this macro
to check whether or not the driver can use the new link info API.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-09-29 12:51:02 -07:00
Alan Brady a3f5aa9073 i40e: Enable VF to negotiate number of allocated queues
Currently the PF allocates a default number of queues for each VF and
cannot be changed.  This patch enables the VF to request a different
number of queues allocated to it.  This patch also adds a new virtchnl
op and capability flag to facilitate this negotiation.

After the PF receives a request message, it will set a requested number
of queues for that VF.  Then when the VF resets, its VSI will get a new
number of queues allocated to it.

This is a best effort request and since we only allocate a guaranteed
default number, if the VF tries to ask for more than the guaranteed
number, there may not be enough in HW to accommodate it unless other
queues for other VFs are freed. It should also be noted decreasing the
number queues allocated to a VF to below the default will NOT enable the
allocation of more than 32 VFs per PF and will not free queues guaranteed
to each VF by default.

Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-09-29 12:51:01 -07:00
Alan Brady c97fc9b6a7 i40evf: fix ring to vector mapping
The current implementation for mapping queues to vectors is broken
because it attempts to map each Tx and Rx ring to its own vector,
however we use combined queues so we should actually be mapping the
Tx/Rx rings together on one vector.

Also in the current implementation, in the case where we have more
queues than vectors, we attempt to group the queues together into
'chunks' and map each 'chunk' of queues to a vector.  Chunking them
together would be more ideal if, and only if, we only had RSS because of
the way the hashing algorithm works but in the case of a future patch
that enables VF ADq, round robin assignment is better and still works
with RSS.

This patch resolves both those issues and simplifies the code needed to
accomplish this.  Instead of treating the case where we have more queues
than vectors as special, if we notice our vector index is greater than
vectors, reset the vector index to zero and continue mapping.  This
should ensure that in both cases, whether we have enough vectors for
each queue or not, the queues get appropriately mapped.

Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-09-29 12:51:01 -07:00
Jacob Keller b980c0634f i40e: shutdown all IRQs and disable MSI-X when suspended
On some platforms with a large number of CPUs, we will allocate many IRQ
vectors. When hibernating, the system will attempt to migrate all of the
vectors back to CPU0 when shutting down all the other CPUs. It is
possible that we have so many vectors that it cannot re-assign them to
CPU0. This is even more likely if we have many devices installed in one
platform.

The end result is failure to hibernate, as it is not possible to
shutdown the CPUs. We can avoid this by disabling MSI-X and clearing our
interrupt scheme when the device is suspended. A more ideal solution
would be some method for the stack to properly handle this for all
drivers, rather than on a case-by-case basis for each driver to fix
itself.

However, until this more ideal solution exists, we can do our part and
shutdown our IRQs during suspend, which should allow systems with
a large number of CPUs to safely suspend or hibernate.

It may be worth investigating if we should shut down even further when
we suspend as it may make the path cleaner, but this was the minimum fix
for the hibernation issue mentioned here.

Testing-hints:
  This affects systems with a large number of CPUs, and with multiple
  devices enabled. Without this change, those platforms are unable to
  hibernate at all.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-09-29 12:51:01 -07:00
Jacob Keller 5c49922880 i40e: prevent service task from running while we're suspended
Although the service task does check the suspended status before
running, it might already be part way through running when we go to
suspend. Lets ensure that the service task is stopped and will not be
restarted again until we finish resuming. This ensures that service task
code does not cause strange interactions with the suspend/resume
handlers.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-09-29 12:51:01 -07:00
Jacob Keller 401586c2b9 i40e: don't clear suspended state until we finish resuming
When handling suspend and resume callbacks we want to make sure that (a)
we don't suspend again if we're already suspended and (b) we don't
resume again if we're already resuming. Lets make sure we test_and_set
the __I40E_SUSPENDED bit in i40e_suspend which ensures that a suspend
call when already suspended will exit early. Additionally, if
__I40E_SUSPENDED is not set when we begin resuming, exit early as well.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-09-29 12:51:00 -07:00
Jacob Keller 0e5d3da400 i40e: use newer generic PM support instead of legacy PM callbacks
Stop using the old legacy PM support, since we now have stable support
for the newer generic PM callbacks.

This has several advantages. First, we no longer have to manage our
own pci_save_state() and power changes, as it's preferred to have the
PCI stack do this. Second, these routines get called for both hibernate
and suspend to ram, so we can have the driver properly handle all the
suspend/resume flows that it needs to.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-09-29 12:51:00 -07:00
Jacob Keller c17401a1dd i40e: use separate state bit for miscellaneous IRQ setup
We currently (mis)use the __I40E_RECOVERY_PENDING bit to determine when
we should actually request a new IRQ in i40e_setup_misc_vector().

This led to a design mistake where we open-coded the re-setup of the
miscellaneous vector in i40e_resume() instead of using the function
provided. If we did not open-code this and instead tried to use the
i40e_setup_misc_vector() function, it would lead to never reallocating
the IRQ.

This would lead to a second i40e_suspend() call failing to free the
vector due to a NULL pointer dereference.

A future patch is going to re-work how the i40e_suspend() and
i40e_resume() flows work to clear all IRQ vectors, which would require
us to use i40e_setup_misc_vector() directly. Since during this time the
__I40E_RECOVERY_PENDING bit is set, we'll never re-allocate the vector.

Rather than leaving the open-coded setup in i40e_resume() lets just fix
the problem properly in i40e_setup_misc_vector().

Introduce a new state bit which indicates when the IRQ has been
assigned, which will be set when i40e_setup_misc_vector is first called.
This ultimately resolves the issue of re-requesting the vector, without
overloading the __I40E_RECOVERY_PENDING state. This ensures that the
suspend/resume cycle can use the setup function instead of open-coding
the re-request during resume.

Additionally, since the only callers of i40e_stop_misc_vector also want
to free it, move this code directly into the function to avoid
duplication. Due to the new functionality, rename it to
i40e_free_misc_vector().

This lets us drop the extra calls to free and re-enable the vector
during i40e_suspend() and i40e_resume(). We don't need to call
i40e_setup_misc_Vector() in i40e_resume() because it gets called by the
i40e_rebuild() call.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-09-29 12:51:00 -07:00
Mitch Williams 905770fa3e i40evf: lower message level
We see this message regularly on VF reset or unload (which invokes a
reset). It's essentially meaningless unless it's happening constantly.
To prevent consternation, lower the log level to debug so it's not seen
under normal circumstance.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-09-29 12:51:00 -07:00
Mariusz Stachura 0dc8692e91 i40e: fix for flow director counters not wrapping as expected
An errata with GLQF_PCNT causes it to not wrap as expected. This
can cause an error in flow director statistics. This patch resets
affected counters just after reading.

Signed-off-by: Mariusz Stachura <mariusz.stachura@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-09-29 12:50:59 -07:00
Mariusz Stachura e04ea00217 i40e: relax warning message in case of version mismatch
Fortville and Fort Park devices are often on different firmware release
schedules. This change relaxes the minor version warning message,
so it is only displayed for older FW warning version for old
firmware Fortville 3 or earlier.

Signed-off-by: Mariusz Stachura <mariusz.stachura@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-09-29 12:50:59 -07:00
Sudheer Mogilappagari 3fded4663b i40e: simplify member variable accesses
This commit replaces usage of vsi->back in i40e_print_link_message()
(which is actually a PF pointer) with temp variable.

Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-09-29 12:50:59 -07:00
Sudheer Mogilappagari 9a03449d3e i40e: Fix link down message when interface is brought up
i40e_print_link_message() is intended to compare new
link state with current link state and print log message
only if the new state is different from current state.

However in current driver the new state does not get updated
when link is going down because of the if condition. When an
interface is brought down, vsi->state is set to I40E_VSI_DOWN
in i40e_vsi_close() and later i40e_print_link_message() does
not get invoked in i40e_link_event due to if condition. Hence
link down message doesn't appear when link is going down. The
down state is seen  later during i40e_open() and old state
gets printed. The actual link state doesn't get updated in
i40e_close() or i40e_open() but when i40e_handle_link_event is
called inside i40e_clean_adminq_subtask.

This change allows i40e_print_link_message() to be called when
interface is going down and keeps the state information updated.

Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-09-29 12:50:59 -07:00
Sudheer Mogilappagari 16badf758b i40e: Fix unqualified module message while bringing link up
In current driver, when ifconfig ethx up is done, the link state
doesn't transition to UP inside i40e_open(). It changes after AQ
command response is handled in i40e_handle_link_event().

When pf->hw.phy.link_info.link_info is DOWN inside i40e_open(),
The state is transient and invalid. So log message gets printed
based on incorrect info (i.e link_info and an_info).

This commit removes check for unqualified module inside
i40e_up_complete(). The existing check in i40e_handle_link_event()
logs the error message based on correct link state information.

Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-09-29 12:50:59 -07:00
Jacob Keller 2b634bb068 i40e/i40evf: rename bytes_per_int to bytes_per_usec
This value is not calculating bytes_per_int, which would actually just
be bytes/ITR_COUNTDOWN_START, but rather it's calculating bytes/usecs.

Rename the variable for clarity so that future developers understand
what the value is actually calculating.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-09-29 12:50:58 -07:00
David Ahern fa8fefaa67 net: ipv4: remove fib_info arg to fib_check_nh
fib_check_nh does not use the fib_info arg; remove t.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-29 06:19:32 +01:00
David Ahern c7c3e5913b net: ipv4: remove fib_weight
fib_weight in fib_info is set but not used. Remove it and the
helpers for setting it.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-29 06:19:32 +01:00
David S. Miller fadad670a8 Merge branch 'bpf-extend-info'
Martin KaFai Lau says:

====================
bpf: Extend bpf_{prog,map}_info

This patch series adds more fields to bpf_prog_info and bpf_map_info.
Please see individual patch for details.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-29 06:17:06 +01:00
Martin KaFai Lau 3a8ad560a9 bpf: Test new fields in bpf_attr and bpf_{prog, map}_info
This patch tests newly added fields of the bpf_attr,
bpf_prog_info and bpf_map_info.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-29 06:17:05 +01:00
Martin KaFai Lau 6e525d0667 bpf: Swap the order of checking prog_info and map_info
This patch swaps the checking order.  It now checks the map_info
first and then prog_info.  It is a prep work for adding
test to the newly added fields (the map_ids of prog_info field
in particular).

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-29 06:17:05 +01:00
Martin KaFai Lau 88cda1c9da bpf: libbpf: Provide basic API support to specify BPF obj name
This patch extends the libbpf to provide API support to
allow specifying BPF object name.

In tools/lib/bpf/libbpf, the C symbol of the function
and the map is used.  Regarding section name, all maps are
under the same section named "maps".  Hence, section name
is not a good choice for map's name.  To be consistent with
map, bpf_prog also follows and uses its function symbol as
the prog's name.

This patch adds logic to collect function's symbols in libbpf.
There is existing codes to collect the map's symbols and no change
is needed.

The bpf_load_program_name() and bpf_map_create_name() are
added to take the name argument.  For the other bpf_map_create_xxx()
variants, a name argument is directly added to them.

In samples/bpf, bpf_load.c in particular, the symbol is also
used as the map's name and the map symbols has already been
collected in the existing code.  For bpf_prog, bpf_load.c does
not collect the function symbol name.  We can consider to collect
them later if there is a need to continue supporting the bpf_load.c.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-29 06:17:05 +01:00
Martin KaFai Lau ad5b177bd7 bpf: Add map_name to bpf_map_info
This patch allows userspace to specify a name for a map
during BPF_MAP_CREATE.

The map's name can later be exported to user space
via BPF_OBJ_GET_INFO_BY_FD.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-29 06:17:05 +01:00
Martin KaFai Lau cb4d2b3f03 bpf: Add name, load_time, uid and map_ids to bpf_prog_info
The patch adds name and load_time to struct bpf_prog_aux.  They
are also exported to bpf_prog_info.

The bpf_prog's name is passed by userspace during BPF_PROG_LOAD.
The kernel only stores the first (BPF_PROG_NAME_LEN - 1) bytes
and the name stored in the kernel is always \0 terminated.

The kernel will reject name that contains characters other than
isalnum() and '_'.  It will also reject name that is not null
terminated.

The existing 'user->uid' of the bpf_prog_aux is also exported to
the bpf_prog_info as created_by_uid.

The existing 'used_maps' of the bpf_prog_aux is exported to
the newly added members 'nr_map_ids' and 'map_ids' of
the bpf_prog_info.  On the input, nr_map_ids tells how
big the userspace's map_ids buffer is.  On the output,
nr_map_ids tells the exact user_map_cnt and it will only
copy up to the userspace's map_ids buffer is allowed.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-29 06:17:05 +01:00
Hoang Tran cf5d74b85e tcp: fix under-evaluated ssthresh in TCP Vegas
With the commit 76174004a0 (tcp: do not slow start when cwnd equals
ssthresh), the comparison to the reduced cwnd in tcp_vegas_ssthresh() would
under-evaluate the ssthresh.

Signed-off-by: Hoang Tran <hoang.tran@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-29 06:07:00 +01:00
Nikolay Aleksandrov 5af48b59f3 net: bridge: add per-port group_fwd_mask with less restrictions
We need to be able to transparently forward most link-local frames via
tunnels (e.g. vxlan, qinq). Currently the bridge's group_fwd_mask has a
mask which restricts the forwarding of STP and LACP, but we need to be able
to forward these over tunnels and control that forwarding on a per-port
basis thus add a new per-port group_fwd_mask option which only disallows
mac pause frames to be forwarded (they're always dropped anyway).
The patch does not change the current default situation - all of the others
are still restricted unless configured for forwarding.
We have successfully tested this patch with LACP and STP forwarding over
VxLAN and qinq tunnels.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-29 06:02:55 +01:00
David S. Miller de9c8a6a5f Merge branch 'hns3-dcb'
Yunsheng Lin says:

====================
Add support for DCB feature in hns3 driver

The patchset contains some enhancement related to DCB before
adding support for DCB feature.

This patchset depends on the following patchset:
https://patchwork.ozlabs.org/cover/815646/
https://patchwork.ozlabs.org/cover/816145/

High Level Architecture:

                   [ lldpad ]
                       |
                       |
                       |
                 [ hns3_dcbnl ]
                       |
                       |
                       |
                 [ hclge_dcb ]
                   /      \
                /            \
             /                  \
     [ hclge_main ]        [ hclge_tm ]

Current patch-set support following functionality:
   Use of lldptool to configure the tc schedule mode, tc
   bandwidth(if schedule mode is ETS), prio_tc_map and
   PFC parameter.

V3: Drop mqprio support

V2: Fix for not defining variables in local loop.

V1: Initial Submit.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28 10:35:12 -07:00
Yunsheng Lin 9df8f79a4d net: hns3: Add DCB support when interacting with network stack
When using lldptool to configure DCB parameter, hclge_dcb module
call the client_ops->setup_tc to tell network stack which queue
and priority is using for specific tc.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28 10:35:12 -07:00
Yunsheng Lin 7979a22330 net: hns3: Setting for fc_mode and dcb enable flag in TM module
After the DCB feature is supported, fc_mode and dcb enable flag
must be set according to the DCB parameter.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28 10:35:12 -07:00
Yunsheng Lin 986743dbf0 net: hns3: Add dcb netlink interface for the support of DCB feature
This patch add dcb netlink interface by calling the interface from
hclge_dcb module.

This patch also update Makefile in order to build hns3_dcbnl module.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28 10:35:12 -07:00
Yunsheng Lin cacde272dd net: hns3: Add hclge_dcb module for the support of DCB feature
The hclge_dcb module calls the interface from hclge_main/tm
and provide interface for the dcb netlink interface.

This patch also update Makefiles required to build the DCB
supported code in HNS3 Ethernet driver and update the existing
Kconfig file in the hisilicon folder.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28 10:35:12 -07:00
Yunsheng Lin 77f255c1c6 net: hns3: Add some interface for the support of DCB feature
This patch add some interface and export some interface from
hclge_tm and hclgc_main to support the upcoming DCB feature.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28 10:35:11 -07:00
Yunsheng Lin cc9bb43ab3 net: hns3: Add tc-based TM support for sriov enabled port
When sriov is enabled and TM is in tc-based mode, vf's TM
parameters is not set in TM initialization process.
This patch add the tc_based TM support for sriov enabled
using the information in vport struct.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28 10:35:11 -07:00
Yunsheng Lin 0a5677d39e net: hns3: Add support for port shaper setting in TM module
This patch add a tm_port_shaper cmd and set port shaper
to HCLGE_ETHER_MAX_RATE on TM initialization process.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28 10:35:11 -07:00
Yunsheng Lin 9dc2145d91 net: hns3: Add support for PFC setting in TM module
This patch add a pfc_pause_en cmd, and use it to configure
PFC option according to fc_mode in hdev->tm_info.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28 10:35:11 -07:00
Yunsheng Lin acf61ecd44 net: hns3: Add support for dynamically buffer reallocation
Current buffer allocation can only happen at init, when
doing buffer reallocation after init, care must be taken
care of memory which priv_buf points to.
This patch fixes it by using a dynamic allocated temporary
memory. Because we only do buffer reallocation at init or
when setting up the DCB parameter, and priv_buf is only
used at buffer allocation process, so it is ok to use a
dynamic allocated temporary memory.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28 10:35:11 -07:00
Yunsheng Lin 9ffe79a9c2 net: hns3: Support for dynamically assigning tx buffer to TC
This patch add support of dynamically assigning tx buffer to
TC when the TC is enabled.
It will save buffer for rx direction to avoid packet loss.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28 10:35:11 -07:00