Commit Graph

507106 Commits

Author SHA1 Message Date
Eric Dumazet 0470c8ca1d inet: fix request sock refcounting
While testing last patch series, I found req sock refcounting was wrong.

We must set skc_refcnt to 1 for all request socks added in hashes,
but also on request sockets created by FastOpen or syncookies.

It is tricky because we need to defer this initialization so that
future RCU lookups do not try to take a refcount on a not yet
fully initialized request socket.

Also get rid of ireq_refcnt alias.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 13854e5a60 ("inet: add proper refcounting to request sock")
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-17 22:02:29 -04:00
Eric Dumazet e3d95ad7da inet: avoid fastopen lock for regular accept()
It is not because a TCP listener is FastOpen ready that
all incoming sockets actually used FastOpen.

Avoid taking queue->fastopenq->lock if not needed.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-17 22:01:56 -04:00
Eric Dumazet 9439ce00f2 tcp: rename struct tcp_request_sock listener
The listener field in struct tcp_request_sock is a pointer
back to the listener. We now have req->rsk_listener, so TCP
only needs one boolean and not a full pointer.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-17 22:01:56 -04:00
Eric Dumazet 4e9a578e5b inet: add rsk_listener field to struct request_sock
Once we'll be able to lookup request sockets in ehash table,
we'll need to get access to listener which created this request.

This avoid doing a lookup to find the listener, which benefits
for a more solid SO_REUSEPORT, and is needed once we no
longer queue request sock into a listener private queue.

Note that 'struct tcp_request_sock'->listener could be reduced
to a single bit, as TFO listener should match req->rsk_listener.
TFO will no longer need to hold a reference on the listener.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-17 22:01:56 -04:00
Eric Dumazet e49bb337d7 inet: uninline inet_reqsk_alloc()
inet_reqsk_alloc() is becoming fat and should not be inlined.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-17 22:01:56 -04:00
Eric Dumazet 407640de21 inet: add sk_listener argument to inet_reqsk_alloc()
listener socket can be used to set net pointer, and will
be later used to hold a reference on listener.

Add a const qualifier to first argument (struct request_sock_ops *),
and factorize all write_pnet(&ireq->ireq_net, sock_net(sk));

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-17 22:01:55 -04:00
David S. Miller 9f2dbdd9b1 Merge branch 'listener_refactor_part_11'
Eric Dumazet says:

====================
inet: tcp listener refactoring, part 11

Before inserting request sockets into general (ehash) table,
we need to prepare netfilter to cope with them, as they are
not full sockets.

I'll later change xt_socket to get full support, including for
request sockets (NEW_SYN_RECV)

Save 8 bytes in inet_request_sock on 64bit arches. We'll soon add
a pointer to the listener socket.

I included two TCP changes in this patch series.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-17 15:18:12 -04:00
Eric Dumazet 7970ddc8f9 tcp: uninline tcp_oow_rate_limited()
tcp_oow_rate_limited() is hardly used in fast path, there is
no point inlining it.

Signed-of-by: Eric Dumazet <edumazet@google.com>

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-17 15:18:00 -04:00
Eric Dumazet 1bfc4438a7 tcp: move tcp_openreq_init() to tcp_input.c
This big helper is called once from tcp_conn_request(), there is no
point having it in an include. Compiler will inline it anyway.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-17 15:18:00 -04:00
Eric Dumazet adc17d6a6c inet: move ir_mark to fill a hole
On 64bit arches, we can save 8 bytes in inet_request_sock
by moving ir_mark to fill a hole.

While we are at it, inet_request_mark() can get a const qualifier
for listener socket.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-17 15:17:59 -04:00
Eric Dumazet a940700003 netfilter: xt_socket: prepare for TCP_NEW_SYN_RECV support
TCP request socks soon will be visible in ehash table.

xt_socket will be able to match them, but first we need
to make sure to not consider them as full sockets.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-17 15:17:59 -04:00
Eric Dumazet 8b58014779 netfilter: tproxy: prepare TCP_NEW_SYN_RECV support
TCP request socks soon will be visible in ehash table.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-17 15:17:59 -04:00
Eric Dumazet a8399231f0 netfilter: use sk_fullsock() helper
Upcoming request sockets have TCP_NEW_SYN_RECV state and should
be special cased a bit like TCP_TIME_WAIT sockets.

Signed-off-by; Eric Dumazet <edumazet@google.com>

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-17 15:17:59 -04:00
Alexei Starovoitov c249739579 bpf: allow BPF programs access 'protocol' and 'vlan_tci' fields
as a follow on to patch 70006af955 ("bpf: allow eBPF access skb fields")
this patch allows 'protocol' and 'vlan_tci' fields to be accessible
from extended BPF programs.

The usage of 'protocol', 'vlan_present' and 'vlan_tci' fields is the same as
corresponding SKF_AD_PROTOCOL, SKF_AD_VLAN_TAG_PRESENT and SKF_AD_VLAN_TAG
accesses in classic BPF.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-17 15:06:31 -04:00
David S. Miller 9cf7867c24 Merge branch 'const_of_device_id'
Fabian Frederick says:

====================
drivers/net: constify of_device_id array

This small patchset adds const to of_device_id arrays in
drivers/net branch.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-17 15:00:28 -04:00
Fabian Frederick 2c71ec9963 via-velocity: constify of_device_id array
of_device_id is always used as const.
(See driver.of_match_table and open firmware functions)

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-17 15:00:24 -04:00
Fabian Frederick d2b75a3f7d net: via-rhine: constify of_device_id array
of_device_id is always used as const.
(See driver.of_match_table and open firmware functions)

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-17 15:00:23 -04:00
Fabian Frederick abae1e0718 ehea: constify of_device_id array
of_device_id is always used as const.
(See driver.of_match_table and open firmware functions)

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-17 15:00:23 -04:00
Fabian Frederick 47b6166793 IBM-EMAC: constify of_device_id array
of_device_id is always used as const.
(See driver.of_match_table and open firmware functions)

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-17 15:00:23 -04:00
Fabian Frederick 486e957033 can: constify of_device_id array
of_device_id is always used as const.
(See driver.of_match_table and open firmware functions)

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-17 15:00:23 -04:00
Fabian Frederick d8a7dadbdf net: phy: constify of_device_id array
of_device_id is always used as const.
(See driver.of_match_table and open firmware functions)

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-17 15:00:23 -04:00
Fabian Frederick 3848124b3e orinoco: constify of_device_id array
of_device_id is always used as const.
(See driver.of_match_table and open firmware functions)

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-17 15:00:22 -04:00
Fabian Frederick 74847f231c net: xilinx: constify of_device_id array
of_device_id is always used as const.
(See driver.of_match_table and open firmware functions)

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-17 15:00:22 -04:00
Fabian Frederick 73c7047464 net: greth: constify of_device_id array
of_device_id is always used as const.
(See driver.of_match_table and open firmware functions)

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-17 15:00:22 -04:00
Fabian Frederick 437dab40bb netdev: octeon_mgmt: constify of_device_id array
of_device_id is always used as const.
(See driver.of_match_table and open firmware functions)

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-17 15:00:22 -04:00
Fabian Frederick 14448e2f80 net: ethernet: apple: constify of_device_id array
of_device_id is always used as const.
(See driver.of_match_table and open firmware functions)

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-17 15:00:22 -04:00
Fabian Frederick a6b0dc2af4 drivers: net: xgene: constify of_device_id array
of_device_id is always used as const.
(See driver.of_match_table and open firmware functions)

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-17 15:00:21 -04:00
Fabian Frederick fa2b183726 net: ethoc: constify of_device_id array
of_device_id is always used as const.
(See driver.of_match_table and open firmware functions)

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-17 15:00:21 -04:00
Fabian Frederick 94e5a2a88a net/fsl: constify of_device_id array
of_device_id is always used as const.
(See driver.of_match_table and open firmware functions)

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-17 15:00:21 -04:00
Fabian Frederick 27260530db Altera TSE: constify of_device_id array
of_device_id is always used as const.
(See driver.of_match_table and open firmware functions)

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-17 15:00:21 -04:00
Fabian Frederick 1156c96538 net: netcp: constify of_device_id array
of_device_id is always used as const.
(See driver.of_match_table and open firmware functions)

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-17 15:00:21 -04:00
Thomas Graf 617011e7d5 rhashtable: Avoid calculating hash again to unlock
Caching the lock pointer avoids having to hash on the object
again to unlock the bucket locks.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-16 17:14:34 -04:00
Eric Dumazet 9f1ab18672 tcp_metrics: fix wrong lockdep annotations
Changes in tcp_metric hash table are protected by tcp_metrics_lock
only, not by genl_mutex

While we are at it use deref_locked() instead of rcu_dereference()
in tcp_new() to avoid unnecessary barrier, as we hold tcp_metrics_lock
as well.

Reported-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 098a697b49 ("tcp_metrics: Use a single hash table for all network namespaces.")
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-16 16:32:23 -04:00
Jiri Pirko bd76a11670 dsa: change "select" to "depends on" for NET_SWITCHDEV and for NET_DSA
This would fix randconfig compile error:
net/built-in.o: In function `netdev_switch_fib_ipv4_abort':
(.text+0xf7811): undefined reference to `fib_flush_external'

Also it fixes following warnings:
warning: (NET_DSA) selects NET_SWITCHDEV which has unmet direct dependencies (NET && INET)

warning: (NET_DSA_MV88E6060 && NET_DSA_MV88E6131 && NET_DSA_MV88E6123_61_65 && NET_DSA_MV88E6171 && NET_DSA_MV88E6352 && NET_DSA_BCM_SF2) selects NET_DSA which has unmet direct dependencies (NET && HAVE_NET_DSA && NET_SWITCHDEV)

Reported-by: Randy Dunlap <rdunlap@infradead.org>
Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-16 16:29:18 -04:00
Shaohui Xie 73ee544297 net/fsl: modify xgmac_mdio for little endian SoCs
MDIO controller on little endian Socs, e.g. ls2085a is similar to the
controller on big endian Socs, but the MDIO access is little endian,
we use I/O accessor function to handle endianness, so the driver can
run on little endian Socs. A property "little-endian" is used
in DTS to indicate the MDIO is little endian, if driver probes the
property, driver will access MDIO in little endian, otherwise, driver
works in big endian by default.

Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-16 16:27:51 -04:00
Shaohui Xie 26eee0210a net/fsl: fix a bug in xgmac_mdio
There is a bug in xgmac_wait_until_done() which mdio_stat should be used
instead of mdio_data when checking if busy bit is cleared.

Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-16 16:27:51 -04:00
Ying Xue c243d7e209 net: kernel socket should be released in init_net namespace
Creating a kernel socket with sock_create_kern() happens in "init_net"
namespace, however, releasing it with sk_release_kernel() occurs in
the current namespace which may be different with "init_net" namespace.
Therefore, we should guarantee that the namespace in which a kernel
socket is created is same as the socket is created.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-16 16:25:06 -04:00
Thomas Graf db4374f48a rhashtable: Annotate RCU locking of walkers
Fixes the following sparse warnings:

lib/rhashtable.c:767:5: warning: context imbalance in 'rhashtable_walk_start' - wrong count at exit
lib/rhashtable.c:849:6: warning: context imbalance in 'rhashtable_walk_stop' - unexpected unlock

Fixes: f2dba9c6ff ("rhashtable: Introduce rhashtable_walk_*")
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-16 16:24:13 -04:00
Scott Feldman 04f49faf70 rocker: replace fixed stack allocation with dynamic allocation
In hast to fix some sparse warning, I hard-coded a fix-sized array on the stack
which is probably too big for kernel standards.  Fix this by converting array
to dynamic allocation.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-16 15:56:36 -04:00
David S. Miller b35f504a0c Merge branch 'listener_refactor'
Eric Dumazet says:

====================
inet: tcp listener refactoring, part 10

We are getting close to the point where request sockets will be hashed
into generic hash table. Some followups are needed for netfilter and
will be handled in next patch series.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-16 15:55:47 -04:00
Eric Dumazet 13854e5a60 inet: add proper refcounting to request sock
reqsk_put() is the generic function that should be used
to release a refcount (and automatically call reqsk_free())

reqsk_free() might be called if refcount is known to be 0
or undefined.

refcnt is set to one in inet_csk_reqsk_queue_add()

As request socks are not yet in global ehash table,
I added temporary debugging checks in reqsk_put() and reqsk_free()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-16 15:55:29 -04:00
Eric Dumazet 2c13270b44 inet: factorize sock_edemux()/sock_gen_put() code
sock_edemux() is not used in fast path, and should
really call sock_gen_put() to save some code.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-16 15:55:29 -04:00
Eric Dumazet a58917f584 inet_diag: allow sk_diag_fill() to handle request socks
inet_diag_fill_req() is renamed to inet_req_diag_fill()
and moved up, so that it can be called fom sk_diag_fill()

inet_diag_bc_sk() is ready to handle request socks.

inet_twsk_diag_dump() is no longer needed.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-16 15:55:29 -04:00
Eric Dumazet f7e4eb03f9 inet: ip early demux should avoid request sockets
When a request socket is created, we do not cache ip route
dst entry, like for timewait sockets.

Let's use sk_fullsock() helper.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-16 15:55:29 -04:00
Eric Dumazet 1d0ab25387 net: add sk_fullsock() helper
We have many places where we want to check if a socket is
not a timewait or request socket. Use a helper to avoid
hard coding this.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-16 15:55:28 -04:00
David S. Miller f00bbd219c Merge branch 'swdev_ops'
Scott Feldman says:

====================
switchdev: add swdev ops

v3:

 - Fix missing include for DSA build

v2:

 - Per Simon's review, squash some of the dependent commits into one to
   make series git bisect safe.

v1:

Per discussions at netconf, move switchdev ndo ops to a new swdev_ops to
keep ndo namespace clean and maintain switchdev-related ops into one place.

There are no functional changes here; just shuffling ops around for better
organization.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-16 00:14:47 -04:00
Scott Feldman 812a1c3ff3 netdev: remove ndo ops for switchdev
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-16 00:14:43 -04:00
Scott Feldman 98237d433b switchdev: use new swdev ops
Move swdev wrappers over to new swdev ops (from previous ndo ops).  No
functional changes to the implementation.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>

rocker: move to new swdev ops

Signed-off-by: Scott Feldman <sfeldma@gmail.com>

dsa: move to new swdev ops

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-16 00:14:43 -04:00
Scott Feldman 4170604fee switchdev: add swdev ops
As discussed at netconf, introduce swdev_ops as first step to move switchdev
ops from ndo to swdev.  This will keep switchdev from cluttering up ndo ops
space.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-16 00:14:42 -04:00
David S. Miller 7993d44ea1 Merge branch 'rhashtable-fixes-next'
Herbert Xu says:

====================
rhashtable: Fix two bugs caused by multiple rehash preparation

While testing some new patches over the weekend I discovered a
couple of bugs in the series that had just been merged.  These
two patches fix them:

1) A use-after-free in the walker that can cause crashes when
walking during a rehash.

2) When a second rehash starts during a single rhashtable_remove
call the remove may fail when it shouldn't.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-15 22:22:17 -04:00