Commit Graph

709195 Commits

Author SHA1 Message Date
Kees Cook
7aa1402e2e net: ethernet/sfc: Convert timers to use timer_setup()
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.

Cc: Solarflare linux maintainers <linux-net-drivers@solarflare.com>
Cc: Edward Cree <ecree@solarflare.com>
Cc: Bert Kenward <bkenward@solarflare.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-25 12:57:33 +09:00
Kees Cook
fc8bcaa051 net: LLC: Convert timers to use timer_setup()
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Hans Liljestrand <ishkamiel@gmail.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: "Reshetova, Elena" <elena.reshetova@intel.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-25 12:06:25 +09:00
Kees Cook
8dbd05ff5c net: ax25: Convert timers to use timer_setup()
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.

Cc: Joerg Reuter <jreuter@yaina.de>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-hams@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-25 12:03:56 +09:00
Kees Cook
9c3b575183 net: sctp: Convert timers to use timer_setup()
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.

Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-sctp@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-25 12:02:09 +09:00
David S. Miller
a4cdd9ff1f Merge branch 'net-remove-rtmsg_ifinfo-used-in-bridge-and-bonding'
Xin Long says:

====================
net: remove rtmsg_ifinfo used in bridge and bonding

It's better to send notifications to userspace by the events in
rtnetlink_event, instead of calling rtmsg_ifinfo directly.

This patcheset is to remove rtmsg_ifinfo called in bonding and
bridge, the notifications can be handled by NETDEV_CHANGEUPPER
and NETDEV_CHANGELOWERSTATE events in rtnetlink_event.

It could also fix some redundant notifications from bonding and
bridge.

v1->v2:
  - post to net-next.git instead of net.git, for it's more like an
    improvement for bonding
v2->v3:
  - add patch 1/4 to remove rtmsg_ifinfo called in add_del_if
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-25 10:54:40 +09:00
Xin Long
ef5201c83d bonding: remove rtmsg_ifinfo called after bond_lower_state_changed
After the patch 'rtnetlink: bring NETDEV_CHANGELOWERSTATE event
process back to rtnetlink_event', bond_lower_state_changed would
generate NETDEV_CHANGEUPPER event which would send a notification
to userspace in rtnetlink_event.

There's no need to call rtmsg_ifinfo to send the notification
any more. So this patch is to remove it from these places after
bond_lower_state_changed.

Besides, after this, rtmsg_ifinfo is not needed to be exported.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-25 10:54:39 +09:00
Xin Long
eeda3fb9e1 rtnetlink: bring NETDEV_CHANGELOWERSTATE event process back to rtnetlink_event
This patch is to bring NETDEV_CHANGELOWERSTATE event process back
to rtnetlink_event so that bonding could use it instead of calling
rtmsg_ifinfo to send a notification to userspace after netdev lower
state is changed in the later patch.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-25 10:54:39 +09:00
Xin Long
4597efe312 bonding: remove rtmsg_ifinfo called in bond_master_upper_dev_link
Since commit 42e52bf9e3 ("net: add netnotifier event for upper device
change"), netdev_master_upper_dev_link has generated NETDEV_CHANGEUPPER
event which would send a notification to userspace in rtnetlink_event.

There's no need to call rtmsg_ifinfo to send the notification any more.
So this patch is to remove it from bond_master_upper_dev_link as well
as bond_upper_dev_unlink to avoid the redundant notifications.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-25 10:54:39 +09:00
Xin Long
fbb85b3c01 bridge: remove rtmsg_ifinfo called in add_del_if
Since commit dc709f3757 ("rtnetlink: bring NETDEV_CHANGEUPPER event
process back in rtnetlink_event"), rtnetlink_event would process the
NETDEV_CHANGEUPPER event and send a notification to userspace.

In add_del_if, it would generate NETDEV_CHANGEUPPER event by whether
netdev_master_upper_dev_link or netdev_upper_dev_unlink. There's
no need to call rtmsg_ifinfo to send the notification any more.

So this patch is to remove it from add_del_if also to avoid redundant
notifications.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-25 10:54:39 +09:00
David S. Miller
d4588211c6 Merge branch 'bpf-permit-multiple-bpf-attachments-for-a-single-perf-tracepoint-event'
Yonghong Song says:

====================
bpf: permit multiple bpf attachments for a single perf tracepoint event

This patch set adds support to permit multiple bpf prog attachments
for a single perf tracepoint event. Patch 1 does some cleanup such
that perf_event_{set|free}_bpf_handler is called under the
same condition. Patch 2 has the core implementation, and
Patch 3 adds a test case.

Changelogs:
v2 -> v3:
  . fix compilation error.
v1 -> v2:
  . fix a potential deadlock issue discovered by Daniel.
  . fix some coding style issues.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-25 10:47:47 +09:00
Yonghong Song
a678be5cc7 bpf: add a test case to test single tp multiple bpf attachment
The bpf sample program syscall_tp is modified to
show attachment of more than bpf programs
for a particular kernel tracepoint.

Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-25 10:47:47 +09:00
Yonghong Song
e87c6bc385 bpf: permit multiple bpf attachments for a single perf event
This patch enables multiple bpf attachments for a
kprobe/uprobe/tracepoint single trace event.
Each trace_event keeps a list of attached perf events.
When an event happens, all attached bpf programs will
be executed based on the order of attachment.

A global bpf_event_mutex lock is introduced to protect
prog_array attaching and detaching. An alternative will
be introduce a mutex lock in every trace_event_call
structure, but it takes a lot of extra memory.
So a global bpf_event_mutex lock is a good compromise.

The bpf prog detachment involves allocation of memory.
If the allocation fails, a dummy do-nothing program
will replace to-be-detached program in-place.

Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-25 10:47:47 +09:00
Yonghong Song
0b4c6841fe bpf: use the same condition in perf event set/free bpf handler
This is a cleanup such that doing the same check in
perf_event_free_bpf_prog as we already do in
perf_event_set_bpf_prog step.

Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-25 10:47:46 +09:00
Shmulik Ladkani
908d140a87 ip6_tunnel: Allow rcv/xmit even if remote address is a local address
Currently, ip6_tnl_xmit_ctl drops tunneled packets if the remote
address (outer v6 destination) is one of host's locally configured
addresses.
Same applies to ip6_tnl_rcv_ctl: it drops packets if the remote address
(outer v6 source) is a local address.

This prevents using ipxip6 (and ip6_gre) tunnels whose local/remote
endpoints are on same host; OTOH v4 tunnels (ipip or gre) allow such
configurations.

An example where this proves useful is a system where entities are
identified by their unique v6 addresses, and use tunnels to encapsulate
traffic between them. The limitation prevents placing several entities
on same host.

Introduce IP6_TNL_F_ALLOW_LOCAL_REMOTE which allows to bypass this
restriction.

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-25 10:33:27 +09:00
David S. Miller
6a331e1513 Merge branch 'mlxsw-Various-fixes'
Jiri Pirko says:

====================
mlxsw: Various fixes

Yotam says:

Cleanup some issues reported by sparse.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 19:07:13 +09:00
Yotam Gigi
ea00aa3a27 mlxsw: spectrum: mr_tcam: Include the mr_tcam header file
Make the spectrum_mr_tcam.c include the spectrum_mr_tcam.h header file.

Cleans up sparse warning:
symbol 'mlxsw_sp_mr_tcam_ops' was not declared. Should it be static?

Fixes: 0e14c7777a ("mlxsw: spectrum: Add the multicast routing hardware logic")
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 19:07:13 +09:00
Yotam Gigi
6a30dc29a4 mlxsw: spectrum: mr: Make the function mlxsw_sp_mr_dev_vif_lookup static
The function is only used internally in spectrum_mr.c and is not declared
in the header file, thus make it static.

Cleans up sparse warning:
symbol 'mlxsw_sp_mr_dev_vif_lookup' was not declared. Should it be static?

Fixes: c011ec1bbf ("mlxsw: spectrum: Add the multicast routing offloading logic")
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 19:07:13 +09:00
Yotam Gigi
de3872cd18 mlxsw: spectrum: mr: Fix various endianness issues
Fix various endianness issues in comparisons and assignments. The fix is
entirely cosmetic as all the values fixed are endianness-agnostic.

Cleans up sparse warnings:
spectrum_mr.c:156:49: warning: restricted __be32 degrades to integer
spectrum_mr.c:206:26: warning: restricted __be32 degrades to integer
spectrum_mr.c:212:31: warning: incorrect type in assignment (different
  base types)
spectrum_mr.c:212:31:    expected restricted __be32 [usertype] addr4
spectrum_mr.c:212:31:    got unsigned int
spectrum_mr.c:214:32: warning: incorrect type in assignment (different
  base types)
spectrum_mr.c:214:32:    expected restricted __be32 [usertype] addr4
spectrum_mr.c:214:32:    got unsigned int
spectrum_mr.c:461:16: warning: restricted __be32 degrades to integer
spectrum_mr.c:461:49: warning: restricted __be32 degrades to integer

Fixes: c011ec1bbf ("mlxsw: spectrum: Add the multicast routing offloading logic")
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 19:07:13 +09:00
Arkadi Sharshevsky
69715dd50d mlxsw: spectrum_dpipe: Fix entries dump of the adjacency table
During the dump the per netlink packet entry counter should be zeroed out
when new packet is created.

Fixes: 190d38a52a ("mlxsw: spectrum_dpipe: Add support for adjacency table dump")
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Reported-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 19:02:02 +09:00
Or Gerlitz
9d452cebd7 net/sched: Fix actions list corruption when adding offloaded tc flows
Prior to commit b3f55bdda8, the networking core doesn't wire an in-place
actions list the when the low level driver is called to offload the flow,
but all low level drivers do that (call tcf_exts_to_list()) in their
offloading "add" logic.

Now, the in-place list is set in the core which goes over the list in a loop,
but also by the hw driver when their offloading code is invoked indirectly:

	cls_xxx add flow -> tc_setup_cb_call -> tc_exts_setup_cb_egdev_call -> hw driver

which messes up the core list instance upon driver return. Fix that by avoiding
in-place list on the net core code that deals with adding flows.

Fixes: b3f55bdda8 ('net: sched: introduce per-egress action device callbacks')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 19:00:54 +09:00
Veerasenareddy Burru
907aaa6bab liquidio: pass date and time info to NIC firmware
Pass date and time information to NIC at the time of loading
firmware and periodically update the host time to NIC firmware.
This is to make NIC firmware use the same time reference as Host,
so that it is easy to correlate logs from firmware and host for debugging.

Signed-off-by: Veerasenareddy Burru <veerasenareddy.burru@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 18:57:10 +09:00
Wei Wang
87b1af8dcc ipv6: add ip6_null_entry check in rt6_select()
In rt6_select(), fn->leaf could be pointing to net->ipv6.ip6_null_entry.
In this case, we should directly return instead of trying to carry on
with the rest of the process.
If not, we could crash at:
  spin_lock_bh(&leaf->rt6i_table->rt6_lock);
because net->ipv6.ip6_null_entry does not have rt6i_table set.

Syzkaller recently reported following issue on net-next:
Use struct sctp_sack_info instead
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
sctp: [Deprecated]: syz-executor4 (pid 26496) Use of struct sctp_assoc_value in delayed_ack socket option.
Use struct sctp_sack_info instead
CPU: 1 PID: 26523 Comm: syz-executor6 Not tainted 4.14.0-rc4+ #85
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
task: ffff8801d147e3c0 task.stack: ffff8801a4328000
RIP: 0010:debug_spin_lock_before kernel/locking/spinlock_debug.c:83 [inline]
RIP: 0010:do_raw_spin_lock+0x23/0x1e0 kernel/locking/spinlock_debug.c:112
RSP: 0018:ffff8801a432ed70 EFLAGS: 00010207
RAX: dffffc0000000000 RBX: 0000000000000018 RCX: 0000000000000000
RDX: 0000000000000003 RSI: 0000000000000000 RDI: 000000000000001c
RBP: ffff8801a432ed90 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: ffffffff8482b279 R12: ffff8801ce2ff3a0
sctp: [Deprecated]: syz-executor1 (pid 26546) Use of int in maxseg socket option.
Use struct sctp_assoc_value instead
R13: dffffc0000000000 R14: ffff8801d971e000 R15: ffff8801ce2ff0d8
FS:  00007f56e82f5700(0000) GS:ffff8801db300000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001ddbc22000 CR3: 00000001a4a04000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 __raw_spin_lock_bh include/linux/spinlock_api_smp.h:136 [inline]
 _raw_spin_lock_bh+0x39/0x40 kernel/locking/spinlock.c:175
 spin_lock_bh include/linux/spinlock.h:321 [inline]
 rt6_select net/ipv6/route.c:786 [inline]
 ip6_pol_route+0x1be3/0x3bd0 net/ipv6/route.c:1650
sctp: [Deprecated]: syz-executor1 (pid 26576) Use of int in maxseg socket option.
Use struct sctp_assoc_value instead
TCP: request_sock_TCPv6: Possible SYN flooding on port 20002. Sending cookies.  Check SNMP counters.
 ip6_pol_route_output+0x4c/0x60 net/ipv6/route.c:1843
 fib6_rule_lookup+0x9e/0x2a0 net/ipv6/ip6_fib.c:309
 ip6_route_output_flags+0x1f1/0x2b0 net/ipv6/route.c:1871
 ip6_route_output include/net/ip6_route.h:80 [inline]
 ip6_dst_lookup_tail+0x4ea/0x970 net/ipv6/ip6_output.c:953
 ip6_dst_lookup_flow+0xc8/0x270 net/ipv6/ip6_output.c:1076
 sctp_v6_get_dst+0x675/0x1c30 net/sctp/ipv6.c:274
 sctp_transport_route+0xa8/0x430 net/sctp/transport.c:287
 sctp_assoc_add_peer+0x4fe/0x1100 net/sctp/associola.c:656
 __sctp_connect+0x251/0xc80 net/sctp/socket.c:1187
 sctp_connect+0xb4/0xf0 net/sctp/socket.c:4209
 inet_dgram_connect+0x16b/0x1f0 net/ipv4/af_inet.c:541
 SYSC_connect+0x20a/0x480 net/socket.c:1642
 SyS_connect+0x24/0x30 net/socket.c:1623
 entry_SYSCALL_64_fastpath+0x1f/0xbe

Fixes: 66f5d6ce53 ("ipv6: replace rwlock with rcu and spinlock in fib6_table")
Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 18:51:26 +09:00
Christoph Paasch
71c02379c7 tcp: Configure TFO without cookie per socket and/or per route
We already allow to enable TFO without a cookie by using the
fastopen-sysctl and setting it to TFO_SERVER_COOKIE_NOT_REQD (or
TFO_CLIENT_NO_COOKIE).
This is safe to do in certain environments where we know that there
isn't a malicous host (aka., data-centers) or when the
application-protocol already provides an authentication mechanism in the
first flight of data.

A server however might be providing multiple services or talking to both
sides (public Internet and data-center). So, this server would want to
enable cookie-less TFO for certain services and/or for connections that
go to the data-center.

This patch exposes a socket-option and a per-route attribute to enable such
fine-grained configurations.

Signed-off-by: Christoph Paasch <cpaasch@apple.com>
Reviewed-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 18:48:08 +09:00
Tim Hansen
b6f4f8484d net/sock: Update sk rcu iterator macro.
Mark hlist node in sk rcu iterator as protected by the rcu.
hlist_next_rcu accomplishes this and silences the warnings
sparse throws.

Found with make C=1 net/ipv4/udp.o on linux-next tag
next-20171009.

Signed-off-by: Tim Hansen <devtimhansen@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 18:46:22 +09:00
Gustavo A. R. Silva
49ca1943a7 ipv4: tcp_minisocks: use BUG_ON instead of if condition followed by BUG
Use BUG_ON instead of if condition followed by BUG in tcp_time_wait.

This issue was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 18:44:42 +09:00
Gustavo A. R. Silva
1528540255 ipv4: icmp: use BUG_ON instead of if condition followed by BUG
Use BUG_ON instead of if condition followed by BUG in icmp_timestamp.

This issue was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 18:44:42 +09:00
Jesper Dangaard Brouer
31749468c3 bpf: cpumap fix potential lost wake-up problem
As pointed out by Michael, commit 1c601d829a ("bpf: cpumap xdp_buff
to skb conversion and allocation") contains a classical example of the
potential lost wake-up problem.

We need to recheck the condition __ptr_ring_empty() after changing
current->state to TASK_INTERRUPTIBLE, this avoids a race between
wake_up_process() and schedule(). After this, a race with
wake_up_process() will simply change the state to TASK_RUNNING, and
the schedule() call not really put us to sleep.

Fixes: 1c601d829a ("bpf: cpumap xdp_buff to skb conversion and allocation")
Reported-by: "Michael S. Tsirkin" <mst@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 18:40:22 +09:00
Gustavo A. R. Silva
7f6b437e9b net: smc_close: mark expected switch fall-through
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Notice that in this particular case I placed the "fall through" comment
on its own line, which is what GCC is expecting to find.

Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 18:29:39 +09:00
Gustavo A. R. Silva
e3cf39706b net: rxrpc: mark expected switch fall-throughs
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 18:27:06 +09:00
David S. Miller
6a413f5cf6 Merge branch 'ipv6-addrconf-hash-improvements-and-cleanups'
Eric Dumazet says:

====================
ipv6: addrconf: hash improvements and cleanups

Remove unecessary BH blocking, and bring IPv6 addrconf to modern world,
with per netns hash perturbation and decent hash size.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 17:54:20 +09:00
Eric Dumazet
4e5f47ab97 ipv6: addrconf: do not block BH in ipv6_chk_home_addr()
rcu_read_lock() is enough here, no need to block BH.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 17:54:19 +09:00
Eric Dumazet
a5c1d98f8c ipv6: addrconf: do not block BH in /proc/net/if_inet6 handling
Table is really RCU protected, no need to block BH

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 17:54:19 +09:00
Eric Dumazet
24f226da96 ipv6: addrconf: do not block BH in ipv6_get_ifaddr()
rcu_read_lock() is enough here, no need to block BH.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 17:54:19 +09:00
Eric Dumazet
480318a0a4 ipv6: addrconf: do not block BH in ipv6_chk_addr_and_flags()
rcu_read_lock() is enough here, as inet6_ifa_finish_destroy()
uses kfree_rcu()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 17:54:19 +09:00
Eric Dumazet
3f27fb2321 ipv6: addrconf: add per netns perturbation in inet6_addr_hash()
Bring IPv6 in par with IPv4 :

- Use net_hash_mix() to spread addresses a bit more.
- Use 256 slots hash table instead of 16

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 17:54:19 +09:00
Eric Dumazet
752a92927e ipv6: addrconf: factorize inet6_addr_hash() call
ipv6_add_addr_hash() can compute the hash value outside of
locked section and pass it to ipv6_chk_same_addr().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 17:54:19 +09:00
Eric Dumazet
56fc709b7a ipv6: addrconf: move ipv6_chk_same_addr() to avoid forward declaration
ipv6_chk_same_addr() is only used by ipv6_add_addr_hash(),
so moving it avoids a forward declaration.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 17:54:19 +09:00
David S. Miller
fa6e23e2b2 Merge branch 'nfp-bpf-stack-support-in-offload'
Jakub Kicinski says:

====================
nfp: bpf: stack support in offload

This series brings stack support for offload.

We use the LMEM (Local memory) register file as memory to store
the stack.  Since this is a register file we need to do appropriate
shifts on unaligned accesses.  Verifier's state tracking helps us
with that.

LMEM can't be accessed directly, so we add support for setting
pointer registers through which one can read/write LMEM.

This set does not support accessing the stack when the alignment
is not known.  This can be added later (most likely using the byte_align
instructions).  There is also a number of optimizations which have been
left out:
 - in more complex non aligned accesses, double shift and rotation
   can save us a cycle.  This, however, leads to code explosion
   since all access sizes have to be coded separately;
 - since setting LM pointers costs around 5 cycles, we should be
   tracking their values to make sure we don't move them when
   they're already set correctly for earlier access;
 - in case of 8 byte access aligned to 4 bytes and crossing
   32 byte boundary but not crossing a 64 byte boundary we don't
   have to increment the pointer, but this seems like a pretty
   rare case to justify the added complexity.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 17:38:38 +09:00
Jakub Kicinski
9f16c8abcd nfp: bpf: optimize mov64 a little
Loading 64bit constants require up to 4 load immediates, since
we can only load 16 bits at a time.  If the 32bit halves of
the 64bit constant are the same, however, we can save a cycle
by doing a register move instead of two loads of 16 bits.

Note that we don't optimize the normal ALU64 load because even
though it's a 64 bit load the upper half of the register is
a coming from sign extension so we can load it in one cycle
anyway.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 17:38:37 +09:00
Jakub Kicinski
b14157eeed nfp: bpf: support stack accesses via non-constant pointers
If stack pointer has a different value on different paths
but the alignment to words (4B) remains the same, we can
set a new LMEM access pointer to the calculated value and
access whichever word it's pointing to.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 17:38:37 +09:00
Jakub Kicinski
2df03a50f1 nfp: bpf: support accessing the stack beyond 64 bytes
To access beyond 64th byte of the stack we need to set a new
stack pointer register (LMEM is accessed indirectly through
those pointers).  Add a function for encoding local CSR access
instruction.  Use stack pointer number 3.

Note that stack pointer registers allow us to index into 32
bytes of LMEM (with shift operations i.e. when operands are
restricted).  This means if access is crossing 32 byte boundary
we must not use offsetting, we have to set the pointer to the
exact address and move it with post-increments.

We depend on the datapath placing the stack base address in
GPR A22 for our use.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 17:38:37 +09:00
Jakub Kicinski
d348848063 nfp: bpf: allow stack accesses via modified stack registers
As long as the verifier tells us the stack offset exactly we
can render the LMEM reads quite easily.  Simply make sure that
the offset is constant for a given instruction and add it to
the instruction's offset.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 17:38:37 +09:00
Jakub Kicinski
9a90c83c09 nfp: bpf: optimize the RMW for stack accesses
When we are performing unaligned stack accesses in the 32-64B window
we have to do a read-modify-write cycle.  E.g. for reading 8 bytes
from address 17:

0:  tmp    = stack[16]
1:  gprLo  = tmp >> 8
2:  tmp    = stack[20]
3:  gprLo |= tmp << 24
4:  tmp    = stack[20]
5:  gprHi  = tmp >> 8
6:  tmp    = stack[24]
7:  gprHi |= tmp << 24

The load on line 4 is unnecessary, because tmp already contains data
from stack[20].

For write we can optimize both loads and writebacks away.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 17:38:37 +09:00
Jakub Kicinski
a82b23fb38 nfp: bpf: add stack read support
Add simple stack read support, similar to write in every aspect,
but data flowing the other way.  Note that unlike write which can
be done in smaller than word quantities, if registers are loaded
with less-than-word of stack contents - the values have to be
zero extended.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 17:38:37 +09:00
Jakub Kicinski
ee9133a845 nfp: bpf: add stack write support
Stack is implemented by the LMEM register file.  Unaligned accesses
to LMEM are not allowed.  Accesses also have to be 4B wide.

To support stack we need to make sure offsets of pointers are known
at translation time (for now) and perform correct load/mask/shift
operations.

Since we can access first 64B of LMEM without much effort support
only stacks not bigger than 64B.  Following commits will extend
the possible sizes beyond that.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 17:38:37 +09:00
Jakub Kicinski
70c78fc138 nfp: bpf: refactor nfp_bpf_check_ptr()
nfp_bpf_check_ptr() mostly looks at the pointer register.
Add a temporary variable to shorten the code.

While at it make sure we print error messages if translation
fails to help users identify the problem (to be carried in
ext_ack in due course).

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 17:38:37 +09:00
Jakub Kicinski
ff42bb9fe3 nfp: bpf: add helper for emitting nops
The need to emitting a few nops will become more common soon
as we add stack and map support.  Add a helper.  This allows
for code to be shorter but also may be handy for marking the
nops with a "reason" to ease applying optimizations.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 17:38:37 +09:00
David S. Miller
a5dd498287 Merge branch 'bpftool-JSON'
Jakub Kicinski says:

====================
tools: bpftool: Add JSON output to bpftool

Quentin says:

This series introduces support for JSON output to all bpftool commands. It
adds option parsing, and several options are created:

  * -j, --json     Switch to JSON output.
  * -p, --pretty   Switch to JSON and print it in a human-friendly fashion.
  * -h, --help     Print generic help message.
  * -V, --version  Print version number.

This code uses a "json_writer", which is a copy of the one written by
Stephen Hemminger in iproute2.
---
I don't know if there is an easy way to share the code for json_write
without copying the file, so I am very open to suggestions on this matter.
====================

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 01:30:45 +01:00
Quentin Monnet
0641c3c890 tools: bpftool: update documentation for --json and --pretty usage
Update the documentation to provide help about JSON output generation,
and add an example in bpftool-prog manual page.

Also reintroduce an example that was left aside when the tool was moved
from GitHub to the kernel sources, in order to show how to mount the
bpffs file system (to pin programs) inside the bpftool-prog manual page.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 01:25:09 +01:00
Quentin Monnet
47ff7ac6d7 tools: bpftool: add cosmetic changes for the manual pages
Make the look-and-feel of the manual pages somewhat closer to other
manual pages, such as the ones from the utilities from iproute2, by
highlighting more keywords.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 01:25:09 +01:00