Commit Graph

38175 Commits

Author SHA1 Message Date
Jozsef Kadlecsik f690cbaed9 netfilter: ipset: Fix cidr handling for hash:*net* types
Commit "Simplify cidr handling for hash:*net* types" broke the cidr
handling for the hash:*net* types when the sets were used by the SET
target: entries with invalid cidr values were added to the sets.
Reported by Jonathan Johnson.

Testsuite entry is added to verify the fix.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2015-06-14 10:40:14 +02:00
Sergey Popovich aff227581e netfilter: ipset: Check CIDR value only when attribute is given
There is no reason to check CIDR value regardless attribute
specifying CIDR is given.

Initialize cidr array in element structure on element structure
declaration to let more freedom to the compiler to optimize
initialization right before element structure is used.

Remove local variables cidr and cidr2 for netnet and netportnet
hashes as we do not use packed cidr value for such set types and
can store value directly in e.cidr[].

Signed-off-by: Sergey Popovich <popovich_sergei@mail.ua>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2015-06-14 10:40:14 +02:00
Sergey Popovich a212e08e8e netfilter: ipset: Make sure we always return line number on batch
Even if we return with generic IPSET_ERR_PROTOCOL it is good idea
to return line number if we called in batch mode.

Moreover we are not always exiting with IPSET_ERR_PROTOCOL. For
example hash:ip,port,net may return IPSET_ERR_HASH_RANGE_UNSUPPORTED
or IPSET_ERR_INVALID_CIDR.

Signed-off-by: Sergey Popovich <popovich_sergei@mail.ua>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2015-06-14 10:40:13 +02:00
Sergey Popovich 2c227f278a netfilter: ipset: Permit CIDR equal to the host address CIDR in IPv6
Permit userspace to supply CIDR length equal to the host address CIDR
length in netlink message. Prohibit any other CIDR length for IPv6
variant of the set.

Also return -IPSET_ERR_HASH_RANGE_UNSUPPORTED instead of generic
-IPSET_ERR_PROTOCOL in IPv6 variant of hash:ip,port,net when
IPSET_ATTR_IP_TO attribute is given.

Signed-off-by: Sergey Popovich <popovich_sergei@mail.ua>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2015-06-14 10:40:13 +02:00
Sergey Popovich 7dd37bc8e6 netfilter: ipset: Check extensions attributes before getting extensions.
Make all extensions attributes checks within ip_set_get_extensions()
and reduce number of duplicated code.

Signed-off-by: Sergey Popovich <popovich_sergei@mail.ua>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2015-06-14 10:40:13 +02:00
Sergey Popovich edda079174 netfilter: ipset: Use SET_WITH_*() helpers to test set extensions
Signed-off-by: Sergey Popovich <popovich_sergei@mail.ua>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2015-06-14 10:40:12 +02:00
David S. Miller 25c43bf13b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-06-13 23:56:52 -07:00
Eric Dumazet a2f0fad32b tcp: tcp_v6_connect() cleanup
Remove dead code from tcp_v6_connect()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-12 21:59:25 -07:00
Eric Dumazet 1e98a0f08a flow_dissector: fix ipv6 dst, hop-by-hop and routing ext hdrs
__skb_header_pointer() returns a pointer that must be checked.

Fixes infinite loop reported by Alexei, and add __must_check to
catch these errors earlier.

Fixes: 6a74fcf426 ("flow_dissector: add support for dst, hop-by-hop and routing ext hdrs")
Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Tested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-12 21:58:49 -07:00
Vincent Cuissard 34ac496641 NFC: nci: remove current SLEEP mode management
NCI deactivate management was modified to support all NCI
deactivation type. Problem is that all the API are not ready
yet for it.

Problem is that with current code, when neard asks to deactivate
the tag it sends a deactivate SLEEP but nobody will then send a
IDLE deactivate. This IDLE deactivate is mandatory since NFC
controller can only be unlocked by DH.

Signed-off-by: Vincent Cuissard <cuissard@marvell.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-06-13 00:08:09 +02:00
Tom Herbert 6a74fcf426 flow_dissector: add support for dst, hop-by-hop and routing ext hdrs
If dst, hop-by-hop or routing extension headers are present determine
length of the options and skip over them in flow dissection.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-12 14:24:28 -07:00
Tom Herbert 611d23c559 flow_dissector: Fix MPLS entropy label handling in flow dissector
Need to shift after masking to get label value for comparison.

Fixes: b3baa0fbd0 ("mpls: Add MPLS entropy label in flow_keys")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-12 14:24:27 -07:00
Florian Westphal b60f2f3d65 net: ipv4: un-inline ip_finish_output2
text    data     bss     dec     hex filename
old: 16527      44       0   16571    40bb net/ipv4/ip_output.o
new: 14935      44       0   14979    3a83 net/ipv4/ip_output.o

Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-12 14:19:17 -07:00
Marcelo Ricardo Leitner ae36806a62 sctp: allow authenticating DATA chunks that are bundled with COOKIE_ECHO
Currently, we can ask to authenticate DATA chunks and we can send DATA
chunks on the same packet as COOKIE_ECHO, but if you try to combine
both, the DATA chunk will be sent unauthenticated and peer won't accept
it, leading to a communication failure.

This happens because even though the data was queued after it was
requested to authenticate DATA chunks, it was also queued before we
could know that remote peer can handle authenticating, so
sctp_auth_send_cid() returns false.

The fix is whenever we set up an active key, re-check send queue for
chunks that now should be authenticated. As a result, such packet will
now contain COOKIE_ECHO + AUTH + DATA chunks, in that order.

Reported-by: Liu Wei <weliu@redhat.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-12 14:18:20 -07:00
Matan Barak 8e37210b38 IB/core: Change ib_create_cq to use struct ib_cq_init_attr
Currently, ib_create_cq uses cqe and comp_vecotr instead
of the extendible ib_cq_init_attr struct.

Earlier patches already changed the vendors to work with
ib_cq_init_attr. This patch changes the consumers too.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-06-12 14:49:10 -04:00
Chuck Lever 40c6ed0c8a xprtrdma: Reduce per-transport MR allocation
Reduce resource consumption per-transport to make way for increasing
the credit limit and maximum r/wsize. Pre-allocate fewer MRs.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Devesh Sharma <devesh.sharma@avagotech.com>
Tested-By: Devesh Sharma <devesh.sharma@avagotech.com>
Reviewed-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-06-12 13:10:37 -04:00
Chuck Lever acb9da7a57 xprtrdma: Stack relief in fmr_op_map()
fmr_op_map() declares a 64 element array of u64 in automatic
storage. This is 512 bytes (8 * 64) on the stack.

Instead, when FMR memory registration is in use, pre-allocate a
physaddr array for each rpcrdma_mw.

This is a pre-requisite for increasing the r/wsize maximum for
FMR on platforms with 4KB pages.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Devesh Sharma <devesh.sharma@avagotech.com>
Tested-By: Devesh Sharma <devesh.sharma@avagotech.com>
Reviewed-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-06-12 13:10:37 -04:00
Chuck Lever 58d1dcf5a8 xprtrdma: Split rb_lock
/proc/lock_stat showed contention between rpcrdma_buffer_get/put
and the MR allocation functions during I/O intensive workloads.

Now that MRs are no longer allocated in rpcrdma_buffer_get(),
there's no reason the rb_mws list has to be managed using the
same lock as the send/receive buffers. Split that lock. The
new lock does not need to disable interrupts because buffer
get/put is never called in an interrupt context.

struct rpcrdma_buffer is re-arranged to ensure rb_mwlock and rb_mws
are always in a different cacheline than rb_lock and the buffer
pointers.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Tested-By: Devesh Sharma <devesh.sharma@avagotech.com>
Reviewed-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-06-12 13:10:37 -04:00
Chuck Lever 7e53df111b xprtrdma: Remove rpcrdma_ia::ri_memreg_strategy
Clean up: This field is no longer used.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Devesh Sharma <devesh.sharma@avagotech.com>
Tested-By: Devesh Sharma <devesh.sharma@avagotech.com>
Reviewed-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-06-12 13:10:37 -04:00
Chuck Lever 3269a94b62 xprtrdma: Remove ->ro_reset
An RPC can exit at any time. When it does so, xprt_rdma_free() is
called, and it calls ->op_unmap().

If ->ro_reset() is running due to a transport disconnect, the two
methods can race while processing the same rpcrdma_mw. The results
are unpredictable.

Because of this, in previous patches I've altered ->ro_map() to
handle MR reset. ->ro_reset() is no longer needed and can be
removed.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Devesh Sharma <devesh.sharma@avagotech.com>
Tested-By: Devesh Sharma <devesh.sharma@avagotech.com>
Reviewed-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-06-12 13:10:37 -04:00
Chuck Lever 06b00880b0 xprtrdma: Remove unused LOCAL_INV recovery logic
Clean up: Remove functions no longer used to recover broken FRMRs.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Devesh Sharma <devesh.sharma@avagotech.com>
Tested-By: Devesh Sharma <devesh.sharma@avagotech.com>
Reviewed-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-06-12 13:10:37 -04:00
Chuck Lever c14d86e591 xprtrdma: Acquire MRs in rpcrdma_register_external()
Acquiring 64 MRs in rpcrdma_buffer_get() while holding the buffer
pool lock is expensive, and unnecessary because most modern adapters
can transfer 100s of KBs of payload using just a single MR.

Instead, acquire MRs one-at-a-time as chunks are registered, and
return them to rb_mws immediately during deregistration.

Note: commit 539431a437 ("xprtrdma: Don't invalidate FRMRs if
registration fails") is reverted: There is now a valid case where
registration can fail (with -ENOMEM) but the QP is still in RTS.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Tested-By: Devesh Sharma <devesh.sharma@avagotech.com>
Reviewed-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-06-12 13:10:37 -04:00
Chuck Lever 951e721ca0 xprtrdma: Introduce an FRMR recovery workqueue
After a transport disconnect, FRMRs can be left in an undetermined
state. In particular, the MR's rkey is no good.

Currently, FRMRs are fixed up by the transport connect worker, but
that can race with ->ro_unmap if an RPC happens to exit while the
transport connect worker is running.

A better way of dealing with broken FRMRs is to detect them before
they are re-used by ->ro_map. Such FRMRs are either already invalid
or are owned by the sending RPC, and thus no race with ->ro_unmap
is possible.

Introduce a mechanism for handing broken FRMRs to a workqueue to be
reset in a context that is appropriate for allocating resources
(ie. an ib_alloc_fast_reg_mr() API call).

This mechanism is not yet used, but will be in subsequent patches.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com>
Tested-By: Devesh Sharma <devesh.sharma@avagotech.com>
Reviewed-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-06-12 13:10:36 -04:00
Chuck Lever fc7fbb59e7 xprtrdma: Acquire FMRs in rpcrdma_fmr_register_external()
Acquiring 64 FMRs in rpcrdma_buffer_get() while holding the buffer
pool lock is expensive, and unnecessary because FMR mode can
transfer up to a 1MB payload using just a single ib_fmr.

Instead, acquire ib_fmrs one-at-a-time as chunks are registered, and
return them to rb_mws immediately during deregistration.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Tested-By: Devesh Sharma <devesh.sharma@avagotech.com>
Reviewed-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-06-12 13:10:36 -04:00
Chuck Lever 346aa66b2a xprtrdma: Introduce helpers for allocating MWs
We eventually want to handle allocating MWs one at a time, as
needed, instead of grabbing 64 and throwing them at each RPC in the
pipeline.

Add a helper for grabbing an MW off rb_mws, and a helper for
returning an MW to rb_mws. These will be used in a subsequent patch.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Tested-By: Devesh Sharma <devesh.sharma@avagotech.com>
Reviewed-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-06-12 13:10:36 -04:00
Chuck Lever 89e0d11258 xprtrdma: Use ib_device pointer safely
The connect worker can replace ri_id, but prevents ri_id->device
from changing during the lifetime of a transport instance. The old
ID is kept around until a new ID is created and the ->device is
confirmed to be the same.

Cache a copy of ri_id->device in rpcrdma_ia and in rpcrdma_rep.
The cached copy can be used safely in code that does not serialize
with the connect worker.

Other code can use it to save an extra address generation (one
pointer dereference instead of two).

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Tested-By: Devesh Sharma <devesh.sharma@avagotech.com>
Reviewed-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-06-12 13:10:36 -04:00
Chuck Lever 494ae30d2a xprtrdma: Remove rr_func
A posted rpcrdma_rep never has rr_func set to anything but
rpcrdma_reply_handler.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-By: Devesh Sharma <devesh.sharma@avagotech.com>
Reviewed-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-06-12 13:10:36 -04:00
Chuck Lever fed171b35c xprtrdma: Replace rpcrdma_rep::rr_buffer with rr_rxprt
Clean up: Instead of carrying a pointer to the buffer pool and
the rpc_xprt, carry a pointer to the controlling rpcrdma_xprt.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Tested-By: Devesh Sharma <devesh.sharma@avagotech.com>
Reviewed-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-06-12 13:10:36 -04:00
Chuck Lever 6d44698d54 xprtrdma: Warn when there are orphaned IB objects
WARN during transport destruction if ib_dealloc_pd() fails. This is
a sign that xprtrdma orphaned one or more RDMA API objects at some
point, which can pin lower layer kernel modules and cause shutdown
to hang.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Devesh Sharma <devesh.sharma@avagotech.com>
Tested-By: Devesh Sharma <devesh.sharma@avagotech.com>
Reviewed-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-06-12 13:10:36 -04:00
Florian Westphal 482cfc3185 netfilter: xtables: avoid percpu ruleset duplication
We store the rule blob per (possible) cpu.  Unfortunately this means we can
waste lot of memory on big smp machines. ipt_entry structure ('rule head')
is 112 byte, so e.g. with maxcpu=64 one single rule eats
close to 8k RAM.

Since previous patch made counters percpu it appears there is nothing
left in the rule blob that needs to be percpu.

On my test system (144 possible cpus, 400k dummy rules) this
change saves close to 9 Gigabyte of RAM.

Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-06-12 14:27:10 +02:00
Florian Westphal 71ae0dff02 netfilter: xtables: use percpu rule counters
The binary arp/ip/ip6tables ruleset is stored per cpu.

The only reason left as to why we need percpu duplication are the rule
counters embedded into ipt_entry et al -- since each cpu has its own copy
of the rules, all counters can be lockless.

The downside is that the more cpus are supported, the more memory is
required.  Rules are not just duplicated per online cpu but for each
possible cpu, i.e. if maxcpu is 144, then rule is duplicated 144 times,
not for the e.g. 64 cores present.

To save some memory and also improve utilization of shared caches it
would be preferable to only store the rule blob once.

So we first need to separate counters and the rule blob.

Instead of using entry->counters, allocate this percpu and store the
percpu address in entry->counters.pcnt on CONFIG_SMP.

This change makes no sense as-is; it is merely an intermediate step to
remove the percpu duplication of the rule set in a followup patch.

Suggested-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-06-12 14:27:09 +02:00
Florian Westphal d7b5974215 netfilter: bridge: restore vlan tag when refragmenting
If bridge netfilter is used with both
bridge-nf-call-iptables and bridge-nf-filter-vlan-tagged enabled
then ip fragments in VLAN frames are sent without the vlan header.

This has never worked reliably.  Turns out this relied on pre-3.5
behaviour where skb frag_list was used to store ip fragments;
ip_fragment() then re-used these skbs.

But since commit 3cc4949269
("ipv4: use skb coalescing in defragmentation") this is no longer
the case.  ip_do_fragment now needs to allocate new skbs, but these
don't contain the vlan tag information anymore.

Fix it by storing vlan information of the ressembled skb in the
br netfilter percpu frag area, and restore them for each of the
fragments.

Fixes: 3cc4949269 ("ipv4: use skb coalescing in defragmentation")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-06-12 14:16:55 +02:00
Florian Westphal 33b1f31392 net: ip_fragment: remove BRIDGE_NETFILTER mtu special handling
since commit d6b915e29f
("ip_fragment: don't forward defragmented DF packet") the largest
fragment size is available in the IPCB.

Therefore we no longer need to care about 'encapsulation'
overhead of stripped PPPOE/VLAN headers since ip_do_fragment
doesn't use device mtu in such cases.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-06-12 14:16:46 +02:00
Bernhard Thaler efb6de9b4b netfilter: bridge: forward IPv6 fragmented packets
IPv6 fragmented packets are not forwarded on an ethernet bridge
with netfilter ip6_tables loaded. e.g. steps to reproduce

1) create a simple bridge like this

        modprobe br_netfilter
        brctl addbr br0
        brctl addif br0 eth0
        brctl addif br0 eth2
        ifconfig eth0 up
        ifconfig eth2 up
        ifconfig br0 up

2) place a host with an IPv6 address on each side of the bridge

        set IPv6 address on host A:
        ip -6 addr add fd01:2345:6789:1::1/64 dev eth0

        set IPv6 address on host B:
        ip -6 addr add fd01:2345:6789:1::2/64 dev eth0

3) run a simple ping command on host A with packets > MTU

        ping6 -s 4000 fd01:2345:6789:1::2

4) wait some time and run e.g. "ip6tables -t nat -nvL" on the bridge

IPv6 fragmented packets traverse the bridge cleanly until somebody runs.
"ip6tables -t nat -nvL". As soon as it is run (and netfilter modules are
loaded) IPv6 fragmented packets do not traverse the bridge any more (you
see no more responses in ping's output).

After applying this patch IPv6 fragmented packets traverse the bridge
cleanly in above scenario.

Signed-off-by: Bernhard Thaler <bernhard.thaler@wvnet.at>
[pablo@netfilter.org: small changes to br_nf_dev_queue_xmit]
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-06-12 14:10:12 +02:00
Bernhard Thaler a4611d3b74 netfilter: bridge: re-order check_hbh_len()
Prepare check_hbh_len() to be called from newly introduced
br_validate_ipv6() in next commit.

Signed-off-by: Bernhard Thaler <bernhard.thaler@wvnet.at>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-06-12 14:09:46 +02:00
Bernhard Thaler 77d574e728 netfilter: bridge: rename br_parse_ip_options
br_parse_ip_options() does not parse any IP options, it validates IP
packets as a whole and the function name is misleading.

Rename br_parse_ip_options() to br_validate_ipv4() and remove unneeded
commments.

Signed-off-by: Bernhard Thaler <bernhard.thaler@wvnet.at>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-06-12 14:09:17 +02:00
Bernhard Thaler 411ffb4fde netfilter: bridge: refactor frag_max_size
Currently frag_max_size is member of br_input_skb_cb and copied back and
forth using IPCB(skb) and BR_INPUT_SKB_CB(skb) each time it is changed or
used.

Attach frag_max_size to nf_bridge_info and set value in pre_routing and
forward functions. Use its value in forward and xmit functions.

Signed-off-by: Bernhard Thaler <bernhard.thaler@wvnet.at>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-06-12 14:08:51 +02:00
Bernhard Thaler 72b31f7271 netfilter: bridge: detect NAT66 correctly and change MAC address
IPv4 iptables allows to REDIRECT/DNAT/SNAT any traffic over a bridge.

e.g. REDIRECT
$ sysctl -w net.bridge.bridge-nf-call-iptables=1
$ iptables -t nat -A PREROUTING -p tcp -m tcp --dport 8080 \
  -j REDIRECT --to-ports 81

This does not work with ip6tables on a bridge in NAT66 scenario
because the REDIRECT/DNAT/SNAT is not correctly detected.

The bridge pre-routing (finish) netfilter hook has to check for a possible
redirect and then fix the destination mac address. This allows to use the
ip6tables rules for local REDIRECT/DNAT/SNAT REDIRECT similar to the IPv4
iptables version.

e.g. REDIRECT
$ sysctl -w net.bridge.bridge-nf-call-ip6tables=1
$ ip6tables -t nat -A PREROUTING -p tcp -m tcp --dport 8080 \
  -j REDIRECT --to-ports 81

This patch makes it possible to use IPv6 NAT66 on a bridge. It was tested
on a bridge with two interfaces using SNAT/DNAT NAT66 rules.

Reported-by: Artie Hamilton <artiemhamilton@yahoo.com>
Signed-off-by: Sven Eckelmann <sven@open-mesh.com>
[bernhard.thaler@wvnet.at: rebased, add indirect call to ip6_route_input()]
[bernhard.thaler@wvnet.at: rebased, split into separate patches]
Signed-off-by: Bernhard Thaler <bernhard.thaler@wvnet.at>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-06-12 14:08:07 +02:00
Bernhard Thaler 8cae308d2b netfilter: bridge: re-order br_nf_pre_routing_finish_ipv6()
Put br_nf_pre_routing_finish_ipv6() after daddr_was_changed() and
br_nf_pre_routing_finish_bridge() to prepare calling these functions
from there.

Signed-off-by: Bernhard Thaler <bernhard.thaler@wvnet.at>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-06-12 14:07:56 +02:00
Bernhard Thaler d39a33ed9b netfilter: bridge: refactor clearing BRNF_NF_BRIDGE_PREROUTING
use binary AND on complement of BRNF_NF_BRIDGE_PREROUTING to unset
bit in nf_bridge->mask.

Signed-off-by: Bernhard Thaler <bernhard.thaler@wvnet.at>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-06-12 14:07:53 +02:00
Marcelo Ricardo Leitner 779668450a netfilter: conntrack: warn the user if there is a better helper to use
After db29a9508a ("netfilter: conntrack: disable generic tracking for
known protocols"), if the specific helper is built but not loaded
(a standard for most distributions) systems with a restrictive firewall
but weak configuration regarding netfilter modules to load, will
silently stop working.

This patch then puts a warning message so the sysadmin knows where to
start looking into. It's a pr_warn_once regardless of protocol itself
but it should be enough to give a hint on where to look.

Cc: Florian Westphal <fw@strlen.de>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-06-12 14:06:24 +02:00
Johan Hedberg 5d667ef6e0 Bluetooth: Remove redundant check for ACL_LINK
The encryption key size is read only for BR/EDR (ACL_LINK) connections
so there's no need to check for it in the read_enc_key_size_complete()
callback.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-06-12 12:07:20 +02:00
Johan Hedberg e3f6a257a7 Bluetooth: Use actual encryption key size for SMP over BR/EDR
When pairing over SMP over BR/EDR the generated LTK has by default the
same key size as the BR/EDR Link Key. Make sure we don't set our
Pairing Request/Response max value to anything higher than that.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-06-12 11:38:45 +02:00
Johan Hedberg 821f376668 Bluetooth: Read encryption key size for BR/EDR connections
Since Bluetooth 3.0 there's a HCI command available for reading the
encryption key size of an BR/EDR connection. This information is
essential e.g. for generating an LTK using SMP over BR/EDR, so store
it as part of struct hci_conn.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-06-12 11:38:45 +02:00
Johan Hedberg 035ad621b6 Bluetooth: Move SC-only check outside of BT_CONFIG branch
Checking for SC-only mode requirements when we get an encrypt change
event shouldn't be limited to the BT_CONFIG state but done any time
encryption changes.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-06-12 11:38:45 +02:00
Johan Hedberg b1f663c91c Bluetooth: Add debugfs support for min LE encryption key size
This patch adds a debugfs control to set a different minimum LE
encryption key size. This is useful for testing that implementation of
the encryption key size handling is behaving correctly (e.g. that we
get appropriate 'Encryption Key Size' error responses when necessary).

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-06-12 11:38:45 +02:00
Johan Hedberg 2fd36558f0 Bluetooth: Add debugfs support for max LE encryption key size
This patch adds a debugfs control to set a different maximum LE
encryption key size. This is useful for testing that implementation of
the encryption key size handling is behaving correctly.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-06-12 11:38:45 +02:00
Shaohua Li fb05e7a89f net: don't wait for order-3 page allocation
We saw excessive direct memory compaction triggered by skb_page_frag_refill.
This causes performance issues and add latency. Commit 5640f76858
introduces the order-3 allocation. According to the changelog, the order-3
allocation isn't a must-have but to improve performance. But direct memory
compaction has high overhead. The benefit of order-3 allocation can't
compensate the overhead of direct memory compaction.

This patch makes the order-3 page allocation atomic. If there is no memory
pressure and memory isn't fragmented, the alloction will still success, so we
don't sacrifice the order-3 benefit here. If the atomic allocation fails,
direct memory compaction will not be triggered, skb_page_frag_refill will
fallback to order-0 immediately, hence the direct memory compaction overhead is
avoided. In the allocation failure case, kswapd is waken up and doing
compaction, so chances are allocation could success next time.

alloc_skb_with_frags is the same.

The mellanox driver does similar thing, if this is accepted, we must fix
the driver too.

V3: fix the same issue in alloc_skb_with_frags as pointed out by Eric
V2: make the changelog clearer

Cc: Eric Dumazet <edumazet@google.com>
Cc: Chris Mason <clm@fb.com>
Cc: Debabrata Banerjee <dbavatar@gmail.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-11 17:33:44 -07:00
Robert Shearman 0fae3bf018 mpls: handle device renames for per-device sysctls
If a device is renamed and the original name is subsequently reused
for a new device, the following warning is generated:

sysctl duplicate entry: /net/mpls/conf/veth0//input
CPU: 3 PID: 1379 Comm: ip Not tainted 4.1.0-rc4+ #20
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
 0000000000000000 0000000000000000 ffffffff81566aaf 0000000000000000
 ffffffff81236279 ffff88002f7d7f00 0000000000000000 ffff88000db336d8
 ffff88000db33698 0000000000000005 ffff88002e046000 ffff8800168c9280
Call Trace:
 [<ffffffff81566aaf>] ? dump_stack+0x40/0x50
 [<ffffffff81236279>] ? __register_sysctl_table+0x289/0x5a0
 [<ffffffffa051a24f>] ? mpls_dev_notify+0x1ff/0x300 [mpls_router]
 [<ffffffff8108db7f>] ? notifier_call_chain+0x4f/0x70
 [<ffffffff81470e72>] ? register_netdevice+0x2b2/0x480
 [<ffffffffa0524748>] ? veth_newlink+0x178/0x2d3 [veth]
 [<ffffffff8147f84c>] ? rtnl_newlink+0x73c/0x8e0
 [<ffffffff8147f27a>] ? rtnl_newlink+0x16a/0x8e0
 [<ffffffff81459ff2>] ? __kmalloc_reserve.isra.30+0x32/0x90
 [<ffffffff8147ccfd>] ? rtnetlink_rcv_msg+0x8d/0x250
 [<ffffffff8145b027>] ? __alloc_skb+0x47/0x1f0
 [<ffffffff8149badb>] ? __netlink_lookup+0xab/0xe0
 [<ffffffff8147cc70>] ? rtnetlink_rcv+0x30/0x30
 [<ffffffff8149e7a0>] ? netlink_rcv_skb+0xb0/0xd0
 [<ffffffff8147cc64>] ? rtnetlink_rcv+0x24/0x30
 [<ffffffff8149df17>] ? netlink_unicast+0x107/0x1a0
 [<ffffffff8149e4be>] ? netlink_sendmsg+0x50e/0x630
 [<ffffffff8145209c>] ? sock_sendmsg+0x3c/0x50
 [<ffffffff81452beb>] ? ___sys_sendmsg+0x27b/0x290
 [<ffffffff811bd258>] ? mem_cgroup_try_charge+0x88/0x110
 [<ffffffff811bd5b6>] ? mem_cgroup_commit_charge+0x56/0xa0
 [<ffffffff811d7700>] ? do_filp_open+0x30/0xa0
 [<ffffffff8145336e>] ? __sys_sendmsg+0x3e/0x80
 [<ffffffff8156c3f2>] ? system_call_fastpath+0x16/0x75

Fix this by unregistering the previous sysctl table (registered for
the path containing the original device name) and re-registering the
table for the path containing the new device name.

Fixes: 37bde79979 ("mpls: Per-device enabling of packet input")
Reported-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: Robert Shearman <rshearma@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-11 16:47:16 -07:00
Eric Dumazet b5e2c45783 tcp: remove obsolete check in tcp_set_skb_tso_segs()
We had various issues in the past when TCP stack was modifying
gso_size/gso_segs while clones were in flight.

Commit c52e2421f7 ("tcp: must unclone packets before mangling them")
fixed these bugs and added a WARN_ON_ONCE(skb_cloned(skb)); in
tcp_set_skb_tso_segs()

These bugs are now fixed, and because TCP stack now only sets
shinfo->gso_size|segs on the clone itself, the check can be removed.

As a result of this change, compiler inlines tcp_set_skb_tso_segs() in
tcp_init_tso_segs()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-11 16:33:11 -07:00
Eric Dumazet f69ad292cf tcp: fill shinfo->gso_size at last moment
In commit cd7d8498c9 ("tcp: change tcp_skb_pcount() location") we stored
gso_segs in a temporary cache hot location.

This patch does the same for gso_size.

This allows to save 2 cache line misses in tcp xmit path for
the last packet that is considered but not sent because of
various conditions (cwnd, tso defer, receiver window, TSQ...)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-11 16:33:11 -07:00
Eric Dumazet 5bbb432c89 tcp: tcp_set_skb_tso_segs() no longer need struct sock parameter
tcp_set_skb_tso_segs() & tcp_init_tso_segs() no longer
use the sock pointer.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-11 16:33:11 -07:00
Eric Dumazet 51466a7545 tcp: fill shinfo->gso_type at last moment
Our goal is to touch skb_shinfo(skb) only when absolutely needed,
to avoid two cache line misses in TCP output path for last skb
that is considered but not sent because of various conditions
(cwnd, tso defer, receiver window, TSQ...)

A packet is GSO only when skb_shinfo(skb)->gso_size is not zero.

We can set skb_shinfo(skb)->gso_type to sk->sk_gso_type even for
non GSO packets.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-11 16:33:11 -07:00
Eric Dumazet a7eea416cb tcp: reserve tcp_skb_mss() to tcp stack
tcp_gso_segment() and tcp_gro_receive() are not strictly
part of TCP stack. They should not assume tcp_skb_mss(skb)
is in fact skb_shinfo(skb)->gso_size.

This will allow us to change tcp_skb_mss() in following patches.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-11 16:33:10 -07:00
Scott Feldman 57225e7720 switchdev: fix BUG when port driver doesn't support set attr op
Fix a BUG_ON() where CONFIG_NET_SWITCHDEV is set but the driver for a
bridged port does not support switchdev_port_attr_set op.  Don't BUG_ON()
if -EOPNOTSUPP is returned.

Also change BUG_ON() to netdev_err since this is a normal error path and
does not warrant the use of BUG_ON(), which is reserved for unrecoverable
errs.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Reported-by: Brenden Blanco <bblanco@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-11 16:27:09 -07:00
Vincent Cuissard e097dc624f NFC: nfcmrvl: add UART driver
Add support of Marvell NFC chip controlled over UART

Signed-off-by: Vincent Cuissard <cuissard@marvell.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-06-11 23:41:57 +02:00
Vincent Cuissard 9961127d4b NFC: nci: add generic uart support
Some NFC controller supports UART as host interface.
As with SPI, a lot of code can be shared between vendor
drivers. This patch add the generic support of UART and
provides some extension API for vendor specific needs.

This code is strongly inspired by the Bluetooth HCI ldisc
implementation. NCI UART vendor drivers will have to register
themselves to this layer via nci_uart_register.

Underlying tty will have to be configured from user land
thanks to an ioctl.

Signed-off-by: Vincent Cuissard <cuissard@marvell.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-06-11 23:37:37 +02:00
Chuck Lever 5fd23f7e1d SUNRPC: Address kbuild warning in net/sunrpc/debugfs.c
Cross-compile test on ARCH=mn10300:

In file included from include/linux/list.h:8:0,
                 from include/linux/wait.h:6,
                 from include/linux/fs.h:6,
                 from include/linux/debugfs.h:18,
                 from net/sunrpc/debugfs.c:7:
net/sunrpc/debugfs.c: In function 'fault_disconnect_write':
include/linux/kernel.h:723:17: warning: comparison of distinct pointer
types lacks a cast
    (void) (&_min1 == &_min2);  \
                   ^
>> net/sunrpc/debugfs.c:307:8: note: in expansion of macro 'min'
    len = min(len, sizeof(buffer) - 1);

Fixes: ('SUNRPC: Transport fault injection')
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-11 14:01:06 -04:00
Hadar Hen Zion a4244b0cf5 net/ethtool: Add current supported tunable options
Add strings array of the current supported tunable options.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Reviewed-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-11 00:36:37 -07:00
Kenneth Klette Jonassen 2b0a8c9eee tcp: add CDG congestion control
CAIA Delay-Gradient (CDG) is a TCP congestion control that modifies
the TCP sender in order to [1]:

  o Use the delay gradient as a congestion signal.
  o Back off with an average probability that is independent of the RTT.
  o Coexist with flows that use loss-based congestion control, i.e.,
    flows that are unresponsive to the delay signal.
  o Tolerate packet loss unrelated to congestion. (Disabled by default.)

Its FreeBSD implementation was presented for the ICCRG in July 2012;
slides are available at http://www.ietf.org/proceedings/84/iccrg.html

Running the experiment scenarios in [1] suggests that our implementation
achieves more goodput compared with FreeBSD 10.0 senders, although it also
causes more queueing delay for a given backoff factor.

The loss tolerance heuristic is disabled by default due to safety concerns
for its use in the Internet [2, p. 45-46].

We use a variant of the Hybrid Slow start algorithm in tcp_cubic to reduce
the probability of slow start overshoot.

[1] D.A. Hayes and G. Armitage. "Revisiting TCP congestion control using
    delay gradients." In Networking 2011, pages 328-341. Springer, 2011.
[2] K.K. Jonassen. "Implementing CAIA Delay-Gradient in Linux."
    MSc thesis. Department of Informatics, University of Oslo, 2015.

Cc: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: David Hayes <davihay@ifi.uio.no>
Cc: Andreas Petlund <apetlund@simula.no>
Cc: Dave Taht <dave.taht@bufferbloat.net>
Cc: Nicolas Kuhn <nicolas.kuhn@telecom-bretagne.eu>
Signed-off-by: Kenneth Klette Jonassen <kennetkl@ifi.uio.no>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-11 00:09:12 -07:00
Kenneth Klette Jonassen 7782ad8bb5 tcp: export tcp_enter_cwr()
Upcoming tcp_cdg uses tcp_enter_cwr() to initiate PRR. Export this
function so that CDG can be compiled as a module.

Cc: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: David Hayes <davihay@ifi.uio.no>
Cc: Andreas Petlund <apetlund@simula.no>
Cc: Dave Taht <dave.taht@bufferbloat.net>
Cc: Nicolas Kuhn <nicolas.kuhn@telecom-bretagne.eu>
Signed-off-by: Kenneth Klette Jonassen <kennetkl@ifi.uio.no>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-11 00:09:12 -07:00
Scott Feldman af201f723f switchdev: fix handling for drivers not supporting IPv4 fib add/del ops
If CONFIG_NET_SWITCHDEV is enabled, but port driver does not implement
support for IPv4 FIB add/del ops, don't fail route add/del offload
operations.  Route adds will not be marked as OFFLOAD.  Routes will be
installed in the kernel FIB, as usual.

This was report/fixed by Florian when testing DSA driver with net-next on
devices with L2 offload support but no L3 offload support. What he reported
was an initial route installed from DHCP client would fail (route not
installed to kernel FIB).  This was triggering the setting of
ipv4.fib_offload_disabled, which would disable route offloading after the
first failure.  So subsequent attempts to install the route would succeed.

There is follow-on work/discussion to address the handling of route install
failures, but for now, let's differentiate between no support and failed
support.

Reported-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-10 23:56:34 -07:00
Eric Dumazet f9c2ff22bb net: tcp: dctcp_update_alpha() fixes.
dctcp_alpha can be read by from dctcp_get_info() without
synchro, so use WRITE_ONCE() to prevent compiler from using
dctcp_alpha as a temporary variable.

Also, playing with small dctcp_shift_g (like 1), can expose
an overflow with 32bit values shifted 9 times before divide.

Use an u64 field to avoid this problem, and perform the divide
only if acked_bytes_ecn is not zero.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-10 23:28:33 -07:00
Mel Gorman 5d75361027 net, swap: Remove a warning and clarify why sk_mem_reclaim is required when deactivating swap
Jeff Layton reported the following;

 [   74.232485] ------------[ cut here ]------------
 [   74.233354] WARNING: CPU: 2 PID: 754 at net/core/sock.c:364 sk_clear_memalloc+0x51/0x80()
 [   74.234790] Modules linked in: cts rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache xfs libcrc32c snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device nfsd snd_pcm snd_timer snd e1000 ppdev parport_pc joydev parport pvpanic soundcore floppy serio_raw i2c_piix4 pcspkr nfs_acl lockd virtio_balloon acpi_cpufreq auth_rpcgss grace sunrpc qxl drm_kms_helper ttm drm virtio_console virtio_blk virtio_pci ata_generic virtio_ring pata_acpi virtio
 [   74.243599] CPU: 2 PID: 754 Comm: swapoff Not tainted 4.1.0-rc6+ #5
 [   74.244635] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
 [   74.245546]  0000000000000000 0000000079e69e31 ffff8800d066bde8 ffffffff8179263d
 [   74.246786]  0000000000000000 0000000000000000 ffff8800d066be28 ffffffff8109e6fa
 [   74.248175]  0000000000000000 ffff880118d48000 ffff8800d58f5c08 ffff880036e380a8
 [   74.249483] Call Trace:
 [   74.249872]  [<ffffffff8179263d>] dump_stack+0x45/0x57
 [   74.250703]  [<ffffffff8109e6fa>] warn_slowpath_common+0x8a/0xc0
 [   74.251655]  [<ffffffff8109e82a>] warn_slowpath_null+0x1a/0x20
 [   74.252585]  [<ffffffff81661241>] sk_clear_memalloc+0x51/0x80
 [   74.253519]  [<ffffffffa0116c72>] xs_disable_swap+0x42/0x80 [sunrpc]
 [   74.254537]  [<ffffffffa01109de>] rpc_clnt_swap_deactivate+0x7e/0xc0 [sunrpc]
 [   74.255610]  [<ffffffffa03e4fd7>] nfs_swap_deactivate+0x27/0x30 [nfs]
 [   74.256582]  [<ffffffff811e99d4>] destroy_swap_extents+0x74/0x80
 [   74.257496]  [<ffffffff811ecb52>] SyS_swapoff+0x222/0x5c0
 [   74.258318]  [<ffffffff81023f27>] ? syscall_trace_leave+0xc7/0x140
 [   74.259253]  [<ffffffff81798dae>] system_call_fastpath+0x12/0x71
 [   74.260158] ---[ end trace 2530722966429f10 ]---

The warning in question was unnecessary but with Jeff's series the rules
are also clearer.  This patch removes the warning and updates the comment
to explain why sk_mem_reclaim() may still be called.

[jlayton: remove if (sk->sk_forward_alloc) conditional. As Leon
          points out that it's not needed.]

Cc: Leon Romanovsky <leon@leon.nu>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-10 23:02:31 -07:00
David S. Miller 1edaa7e8a7 For this round we mostly have fixes:
* mesh fixes from Alexis Green and Chun-Yeow Yeoh,
  * a documentation fix from Jakub Kicinski,
  * a missing channel release (from Michal Kazior),
  * a fix for a signal strength reporting bug (from Sara Sharon),
  * handle deauth while associating (myself),
  * don't report mangled TX SKB back to userspace for status (myself),
  * handle aggregation session timeouts properly in fast-xmit (myself)
 
 However, there are also a few cleanups and one big change that
 affects all drivers (and that required me to pull in your tree)
 to change the mac80211 HW flags to use an unsigned long bitmap
 so that we can extend them more easily - we're running out of
 flags even with a cleanup to remove the two unused ones.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJVeEQ1AAoJEDBSmw7B7bqrebcP/3v7I2ZXAeHag2W4hdD4YH6W
 tuKfs3JKW3GDh84l2AJs2JBpFxR6Tk0Z7zGKrPLzkBTkkJkSLgKuUKR0+YQU6PYH
 VfZ2NkdIHEqouLgMWxGGlp6suqp2yYD9tiIUroICXZ6aFm5trQuZgzv5ePI+lhmX
 cWUYCawE2tcpVdg0NJsFExeCJhw81e/Bet1LCGHo0asWNpIK7phMdltzD7e4tgQS
 4q475FCIkWxbxKgJRrRkz8J7grsjK1wf2W3acOxKMaoVBeqJVW5BWDrTgo0aDPts
 qQ8n8t1s9o/jKQIvaz3RyjkQgX8T4vCMqkouLF4jJOThKIsUSi3Fvm9oKcMg4YhA
 Ju5QWfbCBFhpLZeBzWzKyePTnDru1XDFFVdIATLONKTVg1modzFAs3j5gb4Z3Wtg
 VYLoLWWpRtHKd9pzfZMhyWq64Xb8C+qlyQHr4r4QRm9ADz0Jq+OCh0rTFt+/bncM
 CHxnf0VS9hEOFk0+TxFqi2yXOnv2uMgcN+jnGkEs4QuLfv9ML1Eb23ZjDoHxd1uq
 1Yd4R8IDEY/KU6UJMwksz+gV/ekoB32eAhw56pxehgAMuZL4OgNvmeAQHx7Jq9it
 0/OfAK2BSNH8odqYQbpg89C8keqSInMwUhFyRhyMJAWSKiPRHypsDBWxMKGJIssI
 3mB4d/go+RP1AvZnazeF
 =2wTw
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-next-for-davem-2015-06-10' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
For this round we mostly have fixes:
 * mesh fixes from Alexis Green and Chun-Yeow Yeoh,
 * a documentation fix from Jakub Kicinski,
 * a missing channel release (from Michal Kazior),
 * a fix for a signal strength reporting bug (from Sara Sharon),
 * handle deauth while associating (myself),
 * don't report mangled TX SKB back to userspace for status (myself),
 * handle aggregation session timeouts properly in fast-xmit (myself)

However, there are also a few cleanups and one big change that
affects all drivers (and that required me to pull in your tree)
to change the mac80211 HW flags to use an unsigned long bitmap
so that we can extend them more easily - we're running out of
flags even with a cleanup to remove the two unused ones.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-10 22:49:49 -07:00
Stephen Smalley 37a9a8df8c net/unix: support SCM_SECURITY for stream sockets
SCM_SECURITY was originally only implemented for datagram sockets,
not for stream sockets.  However, SCM_CREDENTIALS is supported on
Unix stream sockets.  For consistency, implement Unix stream support
for SCM_SECURITY as well.  Also clean up the existing code and get
rid of the superfluous UNIXSID macro.

Motivated by https://bugzilla.redhat.com/show_bug.cgi?id=1224211,
where systemd was using SCM_CREDENTIALS and assumed wrongly that
SCM_SECURITY was also supported on Unix stream sockets.

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-10 22:49:20 -07:00
Nikolay Aleksandrov 1a040eaca1 bridge: fix multicast router rlist endless loop
Since the addition of sysfs multicast router support if one set
multicast_router to "2" more than once, then the port would be added to
the hlist every time and could end up linking to itself and thus causing an
endless loop for rlist walkers.
So to reproduce just do:
echo 2 > multicast_router; echo 2 > multicast_router;
in a bridge port and let some igmp traffic flow, for me it hangs up
in br_multicast_flood().
Fix this by adding a check in br_multicast_add_router() if the port is
already linked.
The reason this didn't happen before the addition of multicast_router
sysfs entries is because there's a !hlist_unhashed check that prevents
it.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Fixes: 0909e11758 ("bridge: Add multicast_router sysfs entries")
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-10 22:07:50 -07:00
Erik Hugne b3be5e3e72 tipc: disconnect socket directly after probe failure
If the TIPC connection timer expires in a probing state, a
self abort message is supposed to be generated and delivered
to the local socket. This is currently broken, and the abort
message is actually sent out to the peer node with invalid
addressing information. This will cause the link to enter
a constant retransmission state and eventually reset.
We fix this by removing the self-abort message creation and
tear down connection immediately instead.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-10 22:05:20 -07:00
Nikolay Aleksandrov 8c86f967dd bridge: make br_fdb_delete also check if the port matches
Before this patch the user-specified bridge port was ignored when
deleting an fdb entry and thus one could delete an entry that belonged
to any port.
Example (eth0 and eth1 are br0 ports):
bridge fdb add 00:11:22:33:44:55 dev eth0 master
bridge fdb del 00:11:22:33:44:55 dev eth1 master
(succeeds)

after the patch:
bridge fdb add 00:11:22:33:44:55 dev eth0 master
bridge fdb del 00:11:22:33:44:55 dev eth1 master
RTNETLINK answers: No such file or directory

Based on a patch by Wilson Kok.

Reported-by: Wilson Kok <wkok@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-10 21:58:13 -07:00
David S. Miller 43559893be linux-can-next-for-4.2-20150609
-----BEGIN PGP SIGNATURE-----
 
 iQEcBAABCgAGBQJVdphLAAoJEP5prqPJtc/HZR0H/j7+UFxotHs+ummTanP3Rbqi
 lLBgjzXQDvQeEYhCHzUM4s0dKOQp3arHnO538hB5bONijxoNJkHBkz0urbspLR/f
 i9BZRdLd8eiCrXYo4zXM1dZ4xA/xLKX/4sZU/r44wZlcQeLR0VwerZWWd3Zl9hi5
 0RGWvENz2FajCxBHaf+TFbr6AVPg7ikWXRpk3s5//adfGYzZMLTpIecrOI3OdtB2
 JYrX1FZW2NudtzNfC5vR7g8x4esGzTL7TupdNP0SpxYUTfAkEbLrG+VCbswkYVCs
 TL7BElbz5JrfUweWjj/6LJKPFUlCHW3TygBbmHiIl7htyJPllCIpF2gDI+uhSYw=
 =JtLU
 -----END PGP SIGNATURE-----

Merge tag 'linux-can-next-for-4.2-20150609' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
pull-request: can-next 2015-05-06

this is a pull request of a two patches for net-next.

The first patch is by Tomas Krcka, he fixes the (currently unused)
register address for acceptance filters. Oliver Hartkopp contributes a
patch for the cangw, where an optional UID is added to reference
routing jobs.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-10 21:56:25 -07:00
Alexey Dobriyan 835a6a2f86 Bluetooth: Stop sabotaging list poisoning
list_del() poisons pointers with special values, no need to overwrite them.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-06-11 01:22:54 +02:00
Chuck Lever 4a06825839 SUNRPC: Transport fault injection
It has been exceptionally useful to exercise the logic that handles
local immediate errors and RDMA connection loss.  To enable
developers to test this regularly and repeatably, add logic to
simulate connection loss every so often.

Fault injection is disabled by default. It is enabled with

  $ sudo echo xxx > /sys/kernel/debug/sunrpc/inject_fault/disconnect

where "xxx" is a large positive number of transport method calls
before a disconnect. A value of several thousand is usually a good
number that allows reasonable forward progress while still causing a
lot of connection drops.

These hooks are disabled when SUNRPC_DEBUG is turned off.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-10 18:37:26 -04:00
David S. Miller 1b0ccfe54a Revert "ipv6: Fix protocol resubmission"
This reverts commit 0243508edd.

It introduces new regressions.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-10 15:29:31 -07:00
Jeff Layton d67fa4d85a sunrpc: turn swapper_enable/disable functions into rpc_xprt_ops
RDMA xprts don't have a sock_xprt, but an rdma_xprt, so the
xs_swapper_enable/disable functions will likely oops when fed an RDMA
xprt. Turn these functions into rpc_xprt_ops so that that doesn't
occur. For now the RDMA versions are no-ops that just return -EINVAL
on an attempt to swapon.

Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-10 18:26:26 -04:00
Jeff Layton d6e971d8ec sunrpc: lock xprt before trying to set memalloc on the sockets
It's possible that we could race with a call to xs_reset_transport, in
which case the xprt->inet pointer could be zeroed out while we're
accessing it. Lock the xprt before we try to set memalloc on it.

Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-10 18:26:24 -04:00
Jeff Layton 264d1df3b3 sunrpc: if we're closing down a socket, clear memalloc on it first
We currently increment the memalloc_socks counter if we have a xprt that
is associated with a swapfile. That socket can be replaced however
during a reconnect event, and the memalloc_socks counter is never
decremented if that occurs.

When tearing down a xprt socket, check to see if the xprt is set up for
swapping and sk_clear_memalloc before releasing the socket if so.

Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-10 18:26:22 -04:00
Jeff Layton 8e2281330f sunrpc: make xprt->swapper an atomic_t
Split xs_swapper into enable/disable functions and eliminate the
"enable" flag.

Currently, it's racy if you have multiple swapon/swapoff operations
running in parallel over the same xprt. Also fix it so that we only
set it to a memalloc socket on a 0->1 transition and only clear it
on a 1->0 transition.

Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-10 18:26:18 -04:00
Jeff Layton 3c87ef6efb sunrpc: keep a count of swapfiles associated with the rpc_clnt
Jerome reported seeing a warning pop when working with a swapfile on
NFS. The nfs_swap_activate can end up calling sk_set_memalloc while
holding the rcu_read_lock and that function can sleep.

To fix that, we need to take a reference to the xprt while holding the
rcu_read_lock, set the socket up for swapping and then drop that
reference. But, xprt_put is not exported and having NFS deal with the
underlying xprt is a bit of layering violation anyway.

Fix this by adding a set of activate/deactivate functions that take a
rpc_clnt pointer instead of an rpc_xprt, and have nfs_swap_activate and
nfs_swap_deactivate call those.

Also, add a per-rpc_clnt atomic counter to keep track of the number of
active swapfiles associated with it. When the counter does a 0->1
transition, we enable swapping on the xprt, when we do a 1->0 transition
we disable swapping on it.

This also allows us to be a bit more selective with the RPC_TASK_SWAPPER
flag. If non-swapper and swapper clnts are sharing a xprt, then we only
need to flag the tasks from the swapper clnt with that flag.

Acked-by: Mel Gorman <mgorman@suse.de>
Reported-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-10 18:26:14 -04:00
Johannes Berg 30686bf7f5 mac80211: convert HW flags to unsigned long bitmap
As we're running out of hardware capability flags pretty quickly,
convert them to use the regular test_bit() style unsigned long
bitmaps.

This introduces a number of helper functions/macros to set and to
test the bits, along with new debugfs code.

The occurrences of an explicit __clear_bit() are intentional, the
drivers were never supposed to change their supported bits on the
fly. We should investigate changing this to be a per-frame flag.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-06-10 16:05:36 +02:00
Samuel Ortiz 2df7f8c695 NFC: nci: Export nci_req_complete
Drivers implementing proprietary ops may need it now.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-06-10 12:50:22 +02:00
Johannes Berg 206c59d1d7 Merge remote-tracking branch 'net-next/master' into mac80211-next
Merge back net-next to get wireless driver changes (from Kalle)
to be able to create the API change across all trees properly.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-06-10 12:45:09 +02:00
Alexis Green 5ec596c41b mac80211: Fix a case of incorrect metric used when forwarding a PREQ
This patch fixes a bug in hwmp_preq_frame_process where the wrong metric
can be used when forwarding a PREQ. This happens because the code uses
the same metric variable to record the value of the metric to the source
of the PREQ and the value of the metric to the target of the PREQ.

This comes into play when both reply and forward are set which happens
when IEEE80211_PREQ_PROACTIVE_PREP_FLAG is set and when MP_F_DO | MP_F_RF
is set. The original code had a special case to handle the first case
but not the second.

The patch uses distinct variables for the two metrics which makes the
code flow much clearer and removes the need to restore the original
value of metric when forwarding.

Signed-off-by: Alexis Green <agreen@cococorp.com>
CC: Jesse Jones <jjones@cococorp.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-06-10 11:52:59 +02:00
Johan Hedberg 1fc62c526a Bluetooth: Fix exposing full value of shortened LTKs
When we notify user space of a new LTK or distribute an LTK to the
remote peer the value passed should be the shortened version so that
it's easy to compare values in various traces. The core spec also sets
the requirements for the shortening/masking as:

"The masking shall be done after generation and before being
distributed, used or stored."

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-06-10 10:50:06 +02:00
David S. Miller c3eee1fb1d Included changes:
- use common Jenkins hash instead of private implementation
 - extend internal routing API
 - properly re-arrange header files inclusion
 - clarify precedence between '&' and '?'
 - remove unused ethhdr variable in batadv_gw_dhcp_recipient_get()
 - ensure per-VLAN structs are updated upon MAC change
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJVdaxwAAoJEOb/4TMchkvffJsQAK33RTmcnFYVpS09AywTbxfC
 kNo5SAbQJ1MsNui8TmyHsnLBW/l59WnP8CITVYDlwLyW4cEKWVooGsFPv+MCityM
 3vOqghdi7jC1b0LT6kTcoSr7P04q+bBV9atj5Z5nnXUpvjmW0b/a6qCIZmqGKTtQ
 WIIayEs0bVPJ6y1Z6CDpFQY6k9j938BaH3VgP7NL4zXRA3OJGGQKaZTRwVuYnVu4
 Swt/zj+B7lutBSP2d4ngjU2CXS35OzDWl4GybQuIiNzdSXsZwm4inTyuNjA4Eqdg
 vE87ZqbFqD4T2PqR1wcTPKMJegd+nPQrYDdeNfz91KdjVKBN+LrdSCX3IozEynpY
 7584+eROCyCKlQGi7OcRze5THbZ+BNKxUd3Ds4ek2NTU7vG2rMWADFj3wZ0yzX6F
 e/JunQ39BHXPSc6UlaTgdYiVpTNME9+A2I0AXMOpU6urpc4fAyFVGUORJqjM8BBm
 jLZ8QakmXdfgBYi7zqefeCW9de9L1ekGQs/XB1AmKmx5xY7IKzKEZbkUV3nAxtqI
 YKP2yQUhqjoke6bzllz72CPChO4O11fYPCDPL0sarn/UxY8SzwmHeKi8UeGaN6gt
 FZhR17kQGm4c2WF7082HEIrOW137YogEOFWMixqpqix5ICBmmRMPqo8wqQBxllk3
 mC+4jiwdn4kT1EVAqHba
 =u+G3
 -----END PGP SIGNATURE-----

Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge

Antonio Quartulli says:

====================
Included changes:
- use common Jenkins hash instead of private implementation
- extend internal routing API
- properly re-arrange header files inclusion
- clarify precedence between '&' and '?'
- remove unused ethhdr variable in batadv_gw_dhcp_recipient_get()
- ensure per-VLAN structs are updated upon MAC change
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-09 20:23:52 -07:00
Johannes Berg 9c5a18a31b cfg80211: wext: clear sinfo struct before calling driver
Until recently, mac80211 overwrote all the statistics it could
provide when getting called, but it now relies on the struct
having been zeroed by the caller. This was always the case in
nl80211, but wext used a static struct which could even cause
values from one device leak to another.

Using a static struct is OK (as even documented in a comment)
since the whole usage of this function and its return value is
always locked under RTNL. Not clearing the struct for calling
the driver has always been wrong though, since drivers were
free to only fill values they could report, so calling this
for one device and then for another would always have leaked
values from one to the other.

Fix this by initializing the structure in question before the
driver method call.

This fixes https://bugzilla.kernel.org/show_bug.cgi?id=99691

Cc: stable@vger.kernel.org
Reported-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Reported-by: Alexander Kaltsas <alexkaltsas@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-09 13:54:58 -07:00
Alexis Green e060e7adc2 mac80211: Always check rates and capabilities in mesh mode
In mesh mode there is a race between establishing links and processing
rates and capabilities in beacons. This is very noticeable with slow
beacons (e.g. beacon intervals of 1s) and manifested for us as stations
using minstrel when minstrel_ht should be used. Fixed by changing
mesh_sta_info_init so that it always checks rates and such if it has not
already done so.

Signed-off-by: Alexis Green <agreen@cococorp.com>
CC: Jesse Jones <jjones@cococorp.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-06-09 22:05:06 +02:00
Chun-Yeow Yeoh 8df734e865 mac80211: fix the beacon csa counter for mesh and ibss
The csa counter has moved from sdata to beacon/presp but
it is not updated accordingly for mesh and ibss. Fix this.

Fixes: af296bdb8d ("mac80211: move csa counters from sdata to beacon/presp")
Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-06-09 22:04:25 +02:00
Alexis Green afd2efb919 mac80211: Fix incorrectly named last_hop_metric variable in mesh_rx_path_sel_frame
The last hop metric should refer to link cost (this is how
hwmp_route_info_get uses it for example). But in mesh_rx_path_sel_frame
we are not dealing with link cost but with the total cost to the origin
of a PREQ or PREP.

Signed-off-by: Alexis Green <agreen@cococorp.com>
CC: Jesse Jones <jjones@cococorp.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-06-09 21:53:35 +02:00
Sara Sharon 74d803b602 mac80211: ignore invalid scan RSSI values
Channels in 2.4GHz band overlap, this means that if we
send a probe request on channel 1 and then move to channel
2, we will hear the probe response on channel 2. In this
case, the RSSI will be lower than if we had heard it on
the channel on which it was sent (1 in this case).

The scan result ignores those invalid values and the
station last signal should not be updated as well.

In case the scan determines the signal to be invalid turn on
the flag so the station last signal will not be updated with
the value and thus user space probing for NL80211_STA_INFO_SIGNAL
and NL80211_STA_INFO_SIGNAL_AVG will not get this invalid RSSI
value.

Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-06-09 21:51:59 +02:00
Michal Kazior d01f858c78 mac80211: release channel on auth failure
There were a few rare cases when upon
authentication failure channel wasn't released.
This could cause stale pointers to remain in
chanctx assigned_vifs after interface removal and
trigger general protection fault later.

This could be triggered, e.g. on ath10k with the
following steps:

 1. start an AP
 2. create 2 extra vifs on ath10k host
 3. connect vif1 to the AP
 4. connect vif2 to the AP
    (auth fails because ath10k firmware isn't able
     to maintain 2 peers with colliding AP mac
     addresses across vifs and consequently
     refuses sta_info_insert() in
     ieee80211_prep_connection())
 5. remove the 2 extra vifs
 6. goto step 2; at step 3 kernel was crashing:

 general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
 Modules linked in: ath10k_pci ath10k_core ath
 ...
 Call Trace:
  [<ffffffff81a2dabb>] ieee80211_check_combinations+0x22b/0x290
  [<ffffffff819fb825>] ? ieee80211_check_concurrent_iface+0x125/0x220
  [<ffffffff8180f664>] ? netpoll_poll_disable+0x84/0x100
  [<ffffffff819fb833>] ieee80211_check_concurrent_iface+0x133/0x220
  [<ffffffff81a0029e>] ieee80211_open+0x3e/0x80
  [<ffffffff817f2d26>] __dev_open+0xb6/0x130
  [<ffffffff817f3051>] __dev_change_flags+0xa1/0x170
 ...
 RIP  [<ffffffff81a23140>] ieee80211_chanctx_radar_detect+0xa0/0x170

 (gdb) l * ieee80211_chanctx_radar_detect+0xa0
 0xffffffff81a23140 is in ieee80211_chanctx_radar_detect (/devel/src/linux/net/mac80211/util.c:3182).
 3177             */
 3178            WARN_ON(ctx->replace_state == IEEE80211_CHANCTX_REPLACES_OTHER &&
 3179                    !list_empty(&ctx->assigned_vifs));
 3180
 3181            list_for_each_entry(sdata, &ctx->assigned_vifs, assigned_chanctx_list)
 3182                    if (sdata->radar_required)
 3183                            radar_detect |= BIT(sdata->vif.bss_conf.chandef.width);
 3184
 3185            return radar_detect;

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-06-09 21:51:01 +02:00
Johannes Berg 472be00d04 mac80211: handle aggregation session timeout on fast-xmit path
The conversion to the fast-xmit path lost proper aggregation session
timeout handling - the last_tx wasn't set on that path and the timer
would therefore incorrectly tear down the session periodically (with
those drivers/rate control algorithms that have a timeout.)

In case of iwlwifi, this was every 5 seconds and caused significant
throughput degradation.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-06-09 21:38:40 +02:00
Arron Wang ff50e8afc5 Bluetooth: Move SCO support under BT_BREDR config option
SCO/eSCO link is supported by BR/EDR controller, it is
suitable to move them under BT_BREDR config option

Signed-off-by: Arron Wang <arron.wang@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-06-09 13:41:36 +02:00
Arron Wang 9b4c33364e Bluetooth: Make l2cap_recv_acldata() and sco_recv_scodata() return void
The return value of l2cap_recv_acldata() and sco_recv_scodata()
are not used, then change it to return void

Signed-off-by: Arron Wang <arron.wang@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-06-09 13:41:36 +02:00
Loic Poulain 867146a0d2 Bluetooth: Don't call shutdown when leaving user channel
Don't interfere with the user channel exclusive access.

Signed-off-by: Loic Poulain <loic.poulain@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-06-09 11:47:25 +02:00
Stefan Schmidt 9a4d3d4ba1 mac802154/iface: remove superfluous WARN_ON call in slave_open()
This call was used before we aligned our code with the wireless code base. We
are wanted to handle this in the err: code path. Which would actually not work
because the WARN_ON() macro would reset the res value to 0 and thus we would
never hit err:. Removing it makes the code do what we actually intend.

Signed-off-by: Stefan Schmidt <stefan@osg.samsung.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-06-09 09:44:23 +02:00
Oliver Hartkopp dd895d7f21 can: cangw: introduce optional uid to reference created routing jobs
Similar to referencing iptables rules by their line number this UID allows to
reference created routing jobs, e.g. to alter configured data modifications.

The UID is an optional non-zero value which can be provided at routing job
creation time. When the UID is set the UID replaces the data modification
configuration as job identification attribute e.g. at job removal time.

Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2015-06-09 09:39:49 +02:00
Johan Hedberg 8b76ce34c4 Bluetooth: Fix encryption key size handling for LTKs
The encryption key size for LTKs is supposed to be applied only at the
moment of encryption. When generating a Link Key (using LE SC) from
the LTK the full non-shortened value should be used. This patch
modifies the code to always keep the full value around and only apply
the key size when passing the value to HCI.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-06-09 09:09:06 +02:00
David S. Miller 941742f497 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-06-08 20:06:56 -07:00
Samuel Ortiz 9e58095f96 NFC: netlink: Implement vendor command support
Vendor commands are passed from userspace through the
NFC_CMD_VENDOR netlink command, allowing driver and hardware
specific operations implementations like for example RF tuning
or production line calibration.

Drivers will associate a set of vendor commands to a vendor
id, which could typically be an OUI. The netlink kernel
implementation will try to match the received vendor id
and sub command attributes with the registered ones. When
such match is found, the driver defined sub command routine
is called.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-06-09 01:21:35 +02:00
Christophe Ricard 0e70cba71f NFC: nci: Move close ops call in nci_close_device
When closing the device some data (proprietary commands)
might be sent. The core state machine needs to be set for
correct command execution.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-06-09 00:34:24 +02:00
Christophe Ricard 759afb8d28 NFC: nci: Add nci_prop_cmd allowing to send proprietary nci cmd
Handle allowing to send proprietary nci commands anywhere in the nci
state machine.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-06-09 00:34:21 +02:00
Christophe Ricard c39daeee50 NFC: nci: Add nci init ops for early device initialization
Some device may need to execute some proprietary commands
in order to "wake-up"; Before the nci state initialization.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-06-09 00:34:21 +02:00
Christophe Ricard 81859ab877 NFC: nci: Add NCI_RESET return code check before setup
setup was executed in any case, even if NCI_RESET failed.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-06-09 00:34:20 +02:00
Samuel Ortiz b6355e972a NFC: nci: Handle proprietary response and notifications
Allow for drivers to explicitly define handlers for each
proprietary notifications and responses they expect to support.

Reviewed-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-06-09 00:34:20 +02:00
Joe Perches 2622e2a03c NFC: nci: hci: Fix releasing uninitialized skbs
Several of these goto exit; uses should be direct returns
as skb is not yet initialized by nci_hci_get_param().

Miscellanea:

o Use !memcmp instead of memcmp() == 0
o Remove unnecessary goto from if () {... goto exit;} else {...} exit:

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-06-09 00:34:19 +02:00
Willem de Bruijn bbbf2df003 net: replace last open coded skb_orphan_frags with function call
Commit 70008aa50e ("skbuff: convert to skb_orphan_frags") replaced
open coded tests of SKBTX_DEV_ZEROCOPY and skb_copy_ubufs with calls
to helper function skb_orphan_frags. Apply that to the last remaining
open coded site.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-08 12:15:13 -07:00
Josh Hunt 0243508edd ipv6: Fix protocol resubmission
UDP encapsulation is broken on IPv6. This is because the logic to resubmit
the nexthdr is inverted, checking for a ret value > 0 instead of < 0. Also,
the resubmit label is in the wrong position since we already get the
nexthdr value when performing decapsulation. In addition the skb pull is no
longer necessary either.

This changes the return value check to look for < 0, using it for the
nexthdr on the next iteration, and moves the resubmit label to the proper
location.

With these changes the v6 code now matches what we do in the v4 ip input
code wrt resubmitting when decapsulating.

Signed-off-by: Josh Hunt <johunt@akamai.com>
Acked-by: "Tom Herbert" <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-08 12:13:17 -07:00
Robert Shearman 27e41fcfa6 ipv6: fix possible use after free of dev stats
The memory pointed to by idev->stats.icmpv6msgdev,
idev->stats.icmpv6dev and idev->stats.ipv6 can each be used in an RCU
read context without taking a reference on idev. For example, through
IP6_*_STATS_* calls in ip6_rcv. These memory blocks are freed without
waiting for an RCU grace period to elapse. This could lead to the
memory being written to after it has been freed.

Fix this by using call_rcu to free the memory used for stats, as well
as idev after an RCU grace period has elapsed.

Signed-off-by: Robert Shearman <rshearma@brocade.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-08 12:12:45 -07:00
Marcel Holtmann 781f899f2f Bluetooth: Fix race condition with user channel and setup stage
During the initial setup stage of a controller, the low-level transport
is actually active. This means that HCI_UP is true. To avoid toggling
the transport off and back on again for normal operation the kernel
holds a grace period with HCI_AUTO_OFF that will turn the low-level
transport off in case no user is present.

The idea of the grace period is important to avoid having to initialize
all of the controller twice. So legacy ioctl and the new management
interface knows how to clear this grace period and then start normal
operation.

For the user channel operation this grace period has not been taken into
account which results in the problem that HCI_UP and HCI_AUTO_OFF are
set and the kernel will return EBUSY. However from a system point of
view the controller is ready to be grabbed by either the ioctl, the
management interface or the user channel.

This patch brings the user channel to the same level as the other two
entries for operating a controller.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Cc: stable@vger.kernel.org
2015-06-08 11:04:49 +03:00
Firo Yang f38b24c905 fib_trie: coding style: Use pointer after check
As Alexander Duyck pointed out that:
struct tnode {
        ...
        struct key_vector kv[1];
}
The kv[1] member of struct tnode is an arry that refernced by
a null pointer will not crash the system, like this:
struct tnode *p = NULL;
struct key_vector *kv = p->kv;
As such p->kv doesn't actually dereference anything, it is simply a
means for getting the offset to the array from the pointer p.

This patch make the code more regular to avoid making people feel
odd when they look at the code.

Signed-off-by: Firo Yang <firogm@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-07 23:45:40 -07:00
Nikolay Aleksandrov c4c832f89d bridge: disable softirqs around br_fdb_update to avoid lockup
br_fdb_update() can be called in process context in the following way:
br_fdb_add() -> __br_fdb_add() -> br_fdb_update() (if NTF_USE flag is set)
so we need to disable softirqs because there are softirq users of the
hash_lock. One easy way to reproduce this is to modify the bridge utility
to set NTF_USE, enable stp and then set maxageing to a low value so
br_fdb_cleanup() is called frequently and then just add new entries in
a loop. This happens because br_fdb_cleanup() is called from timer/softirq
context. The spin locks in br_fdb_update were _bh before commit f8ae737dee
("[BRIDGE]: forwarding remove unneeded preempt and bh diasables")
and at the time that commit was correct because br_fdb_update() couldn't be
called from process context, but that changed after commit:
292d139898 ("bridge: add NTF_USE support")
Using local_bh_disable/enable around br_fdb_update() allows us to keep
using the spin_lock/unlock in br_fdb_update for the fast-path.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Fixes: 292d139898 ("bridge: add NTF_USE support")
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-07 19:44:13 -07:00
David S. Miller 7ff46e79fb Revert "bridge: use _bh spinlock variant for br_fdb_update to avoid lockup"
This reverts commit 1d7c49037b.

Nikolay Aleksandrov has a better version of this fix.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-07 19:43:47 -07:00
Robert Shearman 25cc8f0763 mpls: fix possible use after free of device
The mpls device is used in an RCU read context without a lock being
held. As the memory is freed without waiting for the RCU grace period
to elapse, the freed memory could still be in use.

Address this by using kfree_rcu to free the memory for the mpls device
after the RCU grace period has elapsed.

Fixes: 03c57747a7 ("mpls: Per-device MPLS state")
Signed-off-by: Robert Shearman <rshearma@brocade.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-07 19:37:27 -07:00
Wilson Kok 1d7c49037b bridge: use _bh spinlock variant for br_fdb_update to avoid lockup
br_fdb_update() can be called in process context in the following way:
br_fdb_add() -> __br_fdb_add() -> br_fdb_update() (if NTF_USE flag is set)
so we need to use spin_lock_bh because there are softirq users of the
hash_lock. One easy way to reproduce this is to modify the bridge utility
to set NTF_USE, enable stp and then set maxageing to a low value so
br_fdb_cleanup() is called frequently and then just add new entries in
a loop. This happens because br_fdb_cleanup() is called from timer/softirq
context. These locks were _bh before commit f8ae737dee
("[BRIDGE]: forwarding remove unneeded preempt and bh diasables")
and at the time that commit was correct because br_fdb_update() couldn't be
called from process context, but that changed after commit:
292d139898 ("bridge: add NTF_USE support")

Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Fixes: 292d139898 ("bridge: add NTF_USE support")
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-07 15:24:54 -07:00
Eric Dumazet b80c0e7858 tcp: get_cookie_sock() consolidation
IPv4 and IPv6 share same implementation of get_cookie_sock(),
and there is no point inlining it.

We add tcp_ prefix to the common helper name and export it.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-07 15:19:52 -07:00
Antonio Quartulli 94d1dd8731 batman-adv: change the MAC of each VLAN upon ndo_set_mac_address
The MAC address of the soft-interface is used to initialise
the "non-purge" TT entry of each existing VLAN. Therefore
when the user invokes ndo_set_mac_address() all the
"non-purge" TT entries have to be updated, not only the one
belonging to the non-tagged network.

Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2015-06-07 17:07:20 +02:00
Sven Eckelmann 7dac6d9391 batman-adv: Remove unused post-VLAN ethhdr in batadv_gw_dhcp_recipient_get
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2015-06-07 17:07:20 +02:00
Sven Eckelmann a2f2b6cd41 batman-adv: Clarify calculation precedence for '&' and '?'
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2015-06-07 17:07:19 +02:00
Sven Eckelmann 1e2c2a4fe4 batman-adv: Add required includes to all files
The header files could not be build indepdent from each other. This is
happened because headers didn't include the files for things they've used.
This was problematic because the success of a build depended on the
knowledge about the right order of local includes.

Also source files were not including everything they've used explicitly.
Instead they required that transitive includes are always stable. This is
problematic because some transitive includes are not obvious, depend on
config settings and may not be stable in the future.

The order for include blocks are:

 * primary headers (main.h and the *.h file of a *.c file)
 * global linux headers
 * required local headers
 * extra forward declarations for pointers in function/struct declarations

The only exceptions are linux/bitops.h and linux/if_ether.h in packet.h.
This header file is shared with userspace applications like batctl and must
therefore build together with userspace applications. The header
linux/bitops.h is not part of the uapi headers and linux/if_ether.h
conflicts with the musl implementation of netinet/if_ether.h. The
maintainers rejected the use of __KERNEL__ preprocessor checks and thus
these two headers are only in main.h. All files using packet.h first have
to include main.h to work correctly.

Reported-by: Markus Pargmann <mpa@pengutronix.de>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2015-06-07 17:07:19 +02:00
Antonio Quartulli bcef1f3c49 batman-adv: add bat_neigh_free API
This API has to be used to let any routing protocol free
neighbor specific allocated resources

Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2015-06-07 17:07:18 +02:00
Antonio Quartulli 6f70eb7552 batman-adv: split name from variable for uint mesh attributes
Some mesh attributes are behind substructs in the
batadv_priv object and for this reason the name cannot be
used anymore to refer to them.

This patch allows to specify the variable name where the
attribute is stored inside batadv_priv instead of using the
name

Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2015-06-07 17:07:18 +02:00
Sven Eckelmann 36fd61cb80 batman-adv: Use common Jenkins Hash implementation
An unoptimized version of the Jenkins one-at-a-time hash function is used
and partially copied all over the code wherever an hashtable is used.
Instead the optimized version shared between the whole kernel should be
used to reduce code duplication and use better optimized code.

Only the DAT code must use the old implementation because it is used as
distributed hash function which has to be common for all nodes.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2015-06-07 17:07:17 +02:00
Alexei Starovoitov d691f9e8d4 bpf: allow programs to write to certain skb fields
allow programs read/write skb->mark, tc_index fields and
((struct qdisc_skb_cb *)cb)->data.

mark and tc_index are generically useful in TC.
cb[0]-cb[4] are primarily used to pass arguments from one
program to another called via bpf_tail_call() which can
be seen in sockex3_kern.c example.

All fields of 'struct __sk_buff' are readable to socket and tc_cls_act progs.
mark, tc_index are writeable from tc_cls_act only.
cb[0]-cb[4] are writeable by both sockets and tc_cls_act.

Add verifier tests and improve sample code.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-07 02:01:33 -07:00
Alexei Starovoitov 3431205e03 bpf: make programs see skb->data == L2 for ingress and egress
eBPF programs attached to ingress and egress qdiscs see inconsistent skb->data.
For ingress L2 header is already pulled, whereas for egress it's present.
This is known to program writers which are currently forced to use
BPF_LL_OFF workaround.
Since programs don't change skb internal pointers it is safe to do
pull/push right around invocation of the program and earlier taps and
later pt->func() will not be affected.
Multiple taps via packet_rcv(), tpacket_rcv() are doing the same trick
around run_filter/BPF_PROG_RUN even if skb_shared.

This fix finally allows programs to use optimized LD_ABS/IND instructions
without BPF_LL_OFF for higher performance.
tc ingress + cls_bpf + samples/bpf/tcbpf1_kern.o
       w/o JIT   w/JIT
before  20.5     23.6 Mpps
after   21.8     26.6 Mpps

Old programs with BPF_LL_OFF will still work as-is.

We can now undo most of the earlier workaround commit:
a166151cbe ("bpf: fix bpf helpers to use skb->mac_header relative offsets")

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-07 02:01:33 -07:00
Eric Dumazet 98da81a426 tcp: remove redundant checks II
For same reasons than in commit 12e25e1041 ("tcp: remove redundant
checks"), we can remove redundant checks done for timewait sockets.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-07 01:55:01 -07:00
Alexander Aring ed65963ba0 mac802154: remove unneeded vif struct
This patch removes the virtual interface structure from sub if data
struct, because it isn't used anywhere. This structure could be useful
for give per interface information at softmac driver layer. Nevertheless
there exist no use case currently and it contains the interface type
information currently. This information is also stored inside wpan dev
which is now used to check on the wpan dev interface type.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reviewed-by: Varka Bhadram <varkabhadram@gmail.com>
Acked-by: Varka Bhadram <varkabhadram@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-06-07 09:13:32 +02:00
Eric Dumazet 90c337da15 inet: add IP_BIND_ADDRESS_NO_PORT to overcome bind(0) limitations
When an application needs to force a source IP on an active TCP socket
it has to use bind(IP, port=x).

As most applications do not want to deal with already used ports, x is
often set to 0, meaning the kernel is in charge to find an available
port.
But kernel does not know yet if this socket is going to be a listener or
be connected.
It has very limited choices (no full knowledge of final 4-tuple for a
connect())

With limited ephemeral port range (about 32K ports), it is very easy to
fill the space.

This patch adds a new SOL_IP socket option, asking kernel to ignore
the 0 port provided by application in bind(IP, port=0) and only
remember the given IP address.

The port will be automatically chosen at connect() time, in a way
that allows sharing a source port as long as the 4-tuples are unique.

This new feature is available for both IPv4 and IPv6 (Thanks Neal)

Tested:

Wrote a test program and checked its behavior on IPv4 and IPv6.

strace(1) shows sequences of bind(IP=127.0.0.2, port=0) followed by
connect().
Also getsockname() show that the port is still 0 right after bind()
but properly allocated after connect().

socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 5
setsockopt(5, SOL_IP, IP_BIND_ADDRESS_NO_PORT, [1], 4) = 0
bind(5, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.2")}, 16) = 0
getsockname(5, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.2")}, [16]) = 0
connect(5, {sa_family=AF_INET, sin_port=htons(53174), sin_addr=inet_addr("127.0.0.3")}, 16) = 0
getsockname(5, {sa_family=AF_INET, sin_port=htons(38050), sin_addr=inet_addr("127.0.0.2")}, [16]) = 0

IPv6 test :

socket(PF_INET6, SOCK_STREAM, IPPROTO_IP) = 7
setsockopt(7, SOL_IP, IP_BIND_ADDRESS_NO_PORT, [1], 4) = 0
bind(7, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
getsockname(7, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
connect(7, {sa_family=AF_INET6, sin6_port=htons(57300), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
getsockname(7, {sa_family=AF_INET6, sin6_port=htons(60964), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0

I was able to bind()/connect() a million concurrent IPv4 sockets,
instead of ~32000 before patch.

lpaa23:~# ulimit -n 1000010
lpaa23:~# ./bind --connect --num-flows=1000000 &
1000000 sockets

lpaa23:~# grep TCP /proc/net/sockstat
TCP: inuse 2000063 orphan 0 tw 47 alloc 2000157 mem 66

Check that a given source port is indeed used by many different
connections :

lpaa23:~# ss -t src :40000 | head -10
State      Recv-Q Send-Q   Local Address:Port          Peer Address:Port
ESTAB      0      0           127.0.0.2:40000         127.0.202.33:44983
ESTAB      0      0           127.0.0.2:40000         127.2.27.240:44983
ESTAB      0      0           127.0.0.2:40000           127.2.98.5:44983
ESTAB      0      0           127.0.0.2:40000        127.0.124.196:44983
ESTAB      0      0           127.0.0.2:40000         127.2.139.38:44983
ESTAB      0      0           127.0.0.2:40000          127.1.59.80:44983
ESTAB      0      0           127.0.0.2:40000          127.3.6.228:44983
ESTAB      0      0           127.0.0.2:40000          127.0.38.53:44983
ESTAB      0      0           127.0.0.2:40000         127.1.197.10:44983

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-06 23:57:12 -07:00
Loic Poulain 9380f9eacf Bluetooth: Reorder HCI user channel socket release
The hci close method needs to know if we are in user channel context.
Only add the index to mgmt once close is performed.

Signed-off-by: Loic Poulain <loic.poulain@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-06-06 20:49:04 +02:00
Jaganath Kanakkassery 951b6a0717 Bluetooth: Fix potential NULL dereference in RFCOMM bind callback
addr can be NULL and it should not be dereferenced before NULL checking.

Signed-off-by: Jaganath Kanakkassery <jaganath.k@samsung.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-06-06 08:44:33 +02:00
Trond Myklebust 0d2a970d0a SUNRPC: Fix a backchannel race
We need to allow the server to send a new request immediately after we've
replied to the previous one. Right now, there is a window between the
send and the release of the old request in rpc_put_task(), where the
server could send us a new backchannel RPC call, and we have no
request to service it.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-05 11:15:43 -04:00
Trond Myklebust 1dddda86c0 SUNRPC: Clean up allocation and freeing of back channel requests
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-05 11:15:43 -04:00
Trond Myklebust 0f41979164 SUNRPC: Remove unused argument 'tk_ops' in rpc_run_bc_task
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-05 11:15:42 -04:00
Tom Herbert b3baa0fbd0 mpls: Add MPLS entropy label in flow_keys
In flow dissector if an MPLS header contains an entropy label this is
saved in the new keyid field of flow_keys. The entropy label is
then represented in the flow hash function input.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-04 15:44:31 -07:00
Tom Herbert 1fdd512c92 net: Add GRE keyid in flow_keys
In flow dissector if a GRE header contains a keyid this is saved in the
new keyid field of flow_keys. The GRE keyid is then represented
in the flow hash function input.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-04 15:44:31 -07:00
Tom Herbert 87ee9e52ff net: Add IPv6 flow label to flow_keys
In flow_dissector set the flow label in flow_keys for IPv6. This also
removes the shortcircuiting of flow dissection when a non-zero label
is present, the flow label can be considered to provide additional
entropy for a hash.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-04 15:44:31 -07:00
Tom Herbert d34af823ff net: Add VLAN ID to flow_keys
In flow_dissector set vlan_id in flow_keys when VLAN is found.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-04 15:44:31 -07:00
Tom Herbert 45b47fd00c net: Get rid of IPv6 hash addresses flow keys
We don't need to return the IPv6 address hash as part of flow keys.
In general, using the IPv6 address hash is risky in a hash value
since the underlying use of xor provides no entropy. If someone
really needs the hash value they can get it from the full IPv6
addresses in flow keys (e.g. from flow_get_u32_src).

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-04 15:44:31 -07:00
Tom Herbert 9f24908901 net: Add keys for TIPC address
Add a new flow key for TIPC addresses.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-04 15:44:31 -07:00
Tom Herbert c3f8324188 net: Add full IPv6 addresses to flow_keys
This patch adds full IPv6 addresses into flow_keys and uses them as
input to the flow hash function. The implementation supports either
IPv4 or IPv6 addresses in a union, and selector is used to determine
how may words to input to jhash2.

We also add flow_get_u32_dst and flow_get_u32_src functions which are
used to get a u32 representation of the source and destination
addresses. For IPv6, ipv6_addr_hash is called. These functions retain
getting the legacy values of src and dst in flow_keys.

With this patch, Ethertype and IP protocol are now included in the
flow hash input.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-04 15:44:30 -07:00
Tom Herbert 42aecaa9bb net: Get skb hash over flow_keys structure
This patch changes flow hashing to use jhash2 over the flow_keys
structure instead just doing jhash_3words over src, dst, and ports.
This method will allow us take more input into the hashing function
so that we can include full IPv6 addresses, VLAN, flow labels etc.
without needing to resort to xor'ing which makes for a poor hash.

Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-04 15:44:30 -07:00
Tom Herbert c468efe2c7 net: Remove superfluous setting of key_basic
key_basic is set twice in __skb_flow_dissect which seems unnecessary.
Remove second one.

Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-04 15:44:30 -07:00
Tom Herbert ce3b535547 net: Simplify GRE case in flow_dissector
Do break when we see routing flag or a non-zero version number in GRE
header.

Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-04 15:44:30 -07:00
Chuck Lever ffe1f0df58 rpcrdma: Merge svcrdma and xprtrdma modules into one
Bi-directional RPC support means code in svcrdma.ko invokes a bit of
code in xprtrdma.ko, and vice versa. To avoid loader/linker loops,
merge the server and client side modules together into a single
module.

When backchannel capabilities are added, the combined module will
register all needed transport capabilities so that Upper Layer
consumers automatically have everything needed to create a
bi-directional transport connection.

Module aliases are added for backwards compatibility with user
space, which still may expect svcrdma.ko or xprtrdma.ko to be
present.

This commit reverts commit 2e8c12e1b7 ("xprtrdma: add separate
Kconfig options for NFSoRDMA client and server support") and
provides a single CONFIG option for enabling the new module.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-06-04 16:56:02 -04:00
Chuck Lever 0380a3f375 svcrdma: Add a separate "max data segs macro for svcrdma
The server and client maximum are architecturally independent.
Allow changing one without affecting the other.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-06-04 16:56:01 -04:00
Chuck Lever b7e0b9a965 svcrdma: Replace GFP_KERNEL in a loop with GFP_NOFAIL
At the 2015 LSF/MM, it was requested that memory allocation
call sites that request GFP_KERNEL allocations in a loop should be
annotated with __GFP_NOFAIL.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-06-04 16:56:00 -04:00
Chuck Lever 30b7e246a6 svcrdma: Keep rpcrdma_msg fields in network byte-order
Fields in struct rpcrdma_msg are __be32. Don't byte-swap these
fields when decoding RPC calls and then swap them back for the
reply. For the most part, they can be left alone.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-06-04 16:55:59 -04:00
Chuck Lever 70747c25a7 svcrdma: Fix byte-swapping in svc_rdma_sendto.c
In send_write_chunks(), we have:

	for (xdr_off = rqstp->rq_res.head[0].iov_len, chunk_no = 0;
	     xfer_len && chunk_no < arg_ary->wc_nchunks;
	     chunk_no++) {
		 . . .
	}

Note that arg_ary->wc_nchunk is in network byte-order. For the
comparison to work correctly, both have to be in native byte-order.

In send_reply_chunks, we have:

	write_len = min(xfer_len, htonl(ch->rs_length));

xfer_len is in native byte-order, and ch->rs_length is in
network byte-order. be32_to_cpu() is the correct byte swap
for ch->rs_length.

As an additional clean up, replace ntohl() with be32_to_cpu() in
a few other places.

This appears to address a problem with large rsize hangs while
using PHYSICAL memory registration. I suspect that is the only
registration mode that uses more than one chunk element.

BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=248
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-06-04 16:55:58 -04:00
Alexei Starovoitov 94db13fe5f bpf: fix build due to missing tc_verd
fix build error:
net/core/filter.c: In function 'bpf_clone_redirect':
net/core/filter.c:1429:18: error: 'struct sk_buff' has no member named 'tc_verd'
  if (G_TC_AT(skb2->tc_verd) & AT_INGRESS)

Fixes: 3896d655f4 ("bpf: introduce bpf_clone_redirect() helper")
Reported-by: Or Gerlitz <gerlitz.or@gmail.com>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-04 11:45:59 -07:00
Varka Bhadram 133be0264f nl802154: export supported commands
This patch will export the supported commands by the devices
to the userspace. This will be useful to check if HardMAC
drivers can support a specific command or not.

Signed-off-by: Varka Bhadram <varkab@cdac.in>
Acked-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-06-04 12:27:15 +02:00
Lennert Buytenhek 8a70cefa30 ieee802154: Fix sockaddr_ieee802154 implicit padding information leak.
The AF_IEEE802154 sockaddr looks like this:

	struct sockaddr_ieee802154 {
		sa_family_t family; /* AF_IEEE802154 */
		struct ieee802154_addr_sa addr;
	};

	struct ieee802154_addr_sa {
		int addr_type;
		u16 pan_id;
		union {
			u8 hwaddr[IEEE802154_ADDR_LEN];
			u16 short_addr;
		};
	};

On most architectures there will be implicit structure padding here,
in two different places:

* In struct sockaddr_ieee802154, two bytes of padding between 'family'
  (unsigned short) and 'addr', so that 'addr' starts on a four byte
  boundary.

* In struct ieee802154_addr_sa, two bytes at the end of the structure,
  to make the structure 16 bytes.

When calling recvmsg(2) on a PF_IEEE802154 SOCK_DGRAM socket, the
ieee802154 stack constructs a struct sockaddr_ieee802154 on the
kernel stack without clearing these padding fields, and, depending
on the addr_type, between four and ten bytes of uncleared kernel
stack will be copied to userspace.

We can't just insert two 'u16 __pad's in the right places and zero
those before copying an address to userspace, as not all architectures
insert this implicit padding -- from a quick test it seems that avr32,
cris and m68k don't insert this padding, while every other architecture
that I have cross compilers for does insert this padding.

The easiest way to plug the leak is to just memset the whole struct
sockaddr_ieee802154 before filling in the fields we want to fill in,
and that's what this patch does.

Cc: stable@vger.kernel.org
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Acked-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-06-04 12:26:58 +02:00
Varka Bhadram 07bd77fa4c cfg802154: fix rdev-ops naming convension and format specifiers
This patch make to use the same naming convention that mac802154
tracing follows and fixes the format specifier for extended addr.

Signed-off-by: Varka Bhadram <varkab@cdac.in>
Acked-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-06-04 12:26:58 +02:00
Wei Liu c39c4c6abb tcp: double default TSQ output bytes limit
Xen virtual network driver has higher latency than a physical NIC.
Having only 128K as limit for TSQ introduced 30% regression in guest
throughput.

This patch raises the limit to 256K. This reduces the regression to 8%.
This buys us more time to work out a proper solution in the long run.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-04 01:09:36 -07:00
Eric Dumazet 12e25e1041 tcp: remove redundant checks
tcp_v4_rcv() checks the following before calling tcp_v4_do_rcv():

if (th->doff < sizeof(struct tcphdr) / 4)
    goto bad_packet;
if (!pskb_may_pull(skb, th->doff * 4))
    goto discard_it;

So following check in tcp_v4_do_rcv() is redundant
and "goto csum_err;" is wrong anyway.

if (skb->len < tcp_hdrlen(skb) || ...)
	goto csum_err;

A second check can be removed after no_tcp_socket label for same reason.

Same tests can be removed in tcp_v6_do_rcv()

Note : short tcp frames are not properly accounted in tcpInErrs MIB,
because pskb_may_pull() failure simply drops incoming skb, we might
fix this in a separate patch.

Signed-off-by: Eric Dumazet  <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-04 01:04:40 -07:00
Shawn Bohrer 6e54030932 ipv4/udp: Verify multicast group is ours in upd_v4_early_demux()
421b3885bf "udp: ipv4: Add udp early
demux" introduced a regression that allowed sockets bound to INADDR_ANY
to receive packets from multicast groups that the socket had not joined.
For example a socket that had joined 224.168.2.9 could also receive
packets from 225.168.2.9 despite not having joined that group if
ip_early_demux is enabled.

Fix this by calling ip_check_mc_rcu() in udp_v4_early_demux() to verify
that the multicast packet is indeed ours.

Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
Reported-by: Yurij M. Plotnikov <Yurij.Plotnikov@oktetlabs.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-04 00:46:26 -07:00
Martin Willi b08b6b7791 xfrm: Define ChaCha20-Poly1305 AEAD XFRM algo for IPsec users
Signed-off-by: Martin Willi <martin@strongswan.org>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-06-04 15:04:55 +08:00
Scott Feldman 7616dcbb21 switchdev: documentation: use switchdev_port_obj_xxx for IPv4 FIB add/modify/delete ops
Clarify in documentation and code that IPV4 FIB add operation is used for
both adding a new FIB entry to the device and for modifying an existing FIB
entry on the device.

Also, remove left-over references to ipv4_fib ops and replace with details
on SWITCHDEV_PORT_IPV4_FIB object.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-03 23:47:23 -07:00
David S. Miller cf71f43e44 Included changes:
- code re-arrangement for better reading and understanding
 - code style fixups
 - comments corrections
 - remove unnecessary NULL check in batadv_iv_ogm_update_seqnos()
 - make boolean functions explicitly return a bool result
 - remove unnecessary variables in algo_register() and algo_select()
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJVbwnkAAoJEOb/4TMchkvf6iAP/3MyKJDF9ubmOLkZpWKyBq+/
 MsTEN4PFRxQQ7Q2+Cct1MshuD0DBBznG+Nu1UwYUB5ahUPUmpntJ8hQoD982jT3u
 K4h/tHlyEtRVxPzYwW79woE/Q+hjdGqE745eKMHury0K+SkNR4jX3yJ7bjVRwQiC
 Sdk6uProCCgK5JHX++bxjbTnJobCvqCSy045hjMxuwFuTG4S+5le60m+tVe21D3C
 tnyT3y6L4OdbhKpBRMMAFkxYUzQONxiEWMYffubM6gk+ziIAttAJemLyE+ViHAH4
 Y7ItGd9Z/5+mPaO0OF3Q3jfN1jhGf3IxoYgKy9rL5JWIy6qomx0TTfPoPTDRYFR+
 2iQX59FIayaa9CgYbauHopEiDOJQ/nQ437haPO25xT9ICZbnPNWshdv9Z+zLNV/A
 uuUQrN+aWNLo9j40iD01s7AfPcYNDYklqygb9hSLTa7yeH/rPCG/RqJJ7zse4IQa
 /QMl1lUl484gPHFqMTVB7/75KL5G5B+KQdwON3AqnyRR3RrlOm7NbtcvuDTDheeW
 BAU5g7y/RG3DSoGtwPvFG6MyyPK8C2+niLY7EWUrs1EBWc5DGH+/oeVBR6SL46Fv
 KY1TiFrzvczjUKA0NyLw3w/jeE3SGxiVEBGN2Wv7veVwuV2Jc3MLGxNZGoKPum/k
 Vz7vG3ghIRM3aA1dO6Nx
 =m9Yd
 -----END PGP SIGNATURE-----

Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge

Antonio Quartulli says:

====================
pull request: batman-adv 20150603

here you have our second batch of patches intended for net-next.

In this patchset you won't find any new features, but quite some code
cleanup work, a bunch of code style fixes and also comments corrections
by Markus Pargmann.

Moreover you have a patch from Sven Eckelmann removing an unnecessary
NULL check in batadv_iv_ogm_update_seqnos().

Please pull or let me know of any problem!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-03 20:22:46 -07:00
Alexei Starovoitov 3896d655f4 bpf: introduce bpf_clone_redirect() helper
Allow eBPF programs attached to classifier/actions to call
bpf_clone_redirect(skb, ifindex, flags) helper which will
mirror or redirect the packet by dynamic ifindex selection
from within the program to a target device either at ingress
or at egress. Can be used for various scenarios, for example,
to load balance skbs into veths, split parts of the traffic
to local taps, etc.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-03 20:16:58 -07:00
Jiri Benc 640b2b107c openvswitch: disable LRO
Currently, openvswitch tries to disable LRO from the user space. This does
not work correctly when the device added is a vlan interface, though.
Instead of dealing with possibly complex stacked cross name space relations
in the user space, do the same as bridging does and call dev_disable_lro in
the kernel.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Flavio Leitner <fbl@redhat.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-03 19:39:35 -07:00
Chuck Lever da7049f834 svcrdma: Remove svc_rdma_xdr_decode_deferred_req()
svc_rdma_xdr_decode_deferred_req() indexes an array with an
un-byte-swapped value off the wire. Fortunately this function
isn't used anywhere, so simply remove it.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-06-03 15:15:23 -04:00
Chuck Lever 3f87d5d6ac SUNRPC: Move EXPORT_SYMBOL for svc_process
Clean up.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-06-03 15:15:22 -04:00
Markus Pargmann f372d09059 batman-adv: Remove unnecessary ret variable in algo_register
Remove ret variable and all jumps.

Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2015-06-03 15:57:25 +02:00
Markus Pargmann 9fb6c6519b batman-adv: Remove unnecessary ret variable
We can avoid this indirect return variable by directly returning the
error values.

Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2015-06-03 15:57:24 +02:00
Markus Pargmann f2d5cf2add batman-adv: main, batadv_compare_eth return bool
Declare the returntype of batadv_compare_eth as bool.
The function called inside this helper function
(ether_addr_equal_unaligned) also uses bool as return value, so there is
no need to return int.

Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2015-06-03 15:57:24 +02:00
Markus Pargmann e8ad3b1acf batman-adv: main, Convert is_my_mac() to bool
It is much clearer to see a bool type as return value than 'int' for
functions that are supposed to return true or false.

Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2015-06-03 15:57:24 +02:00
Sven Eckelmann a0c77227ff batman-adv: Remove unnecessary check for orig_ifinfo not NULL
orig_ifinfo is dereferenced multiple times in batadv_iv_ogm_update_seqnos
before the check for NULL is done. The function also exists at the
beginning when orig_ifinfo would have been NULL. This makes the check at
the end unnecessary and only confuses the reader/code analyzers.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2015-06-03 15:57:23 +02:00
Markus Pargmann 21102626da batman-adv: types, Fix comment on bcast_own
batadv_orig_bat_iv->bcast_own is actually not a bitfield, it is an
array. Adjust the comment accordingly.

Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2015-06-03 15:57:23 +02:00
Markus Pargmann d491dbb68b batman-adv: iv_ogm, fix comment function name
This is a small copy paste fix for batadv_ing_buffer_avg.

Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2015-06-03 10:58:32 +02:00
Markus Pargmann 6c4a1622e2 batman-adv: iv_ogm, fix coding style
The kernel coding style says, that there should not be multiple
assignments in one row.

Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2015-06-03 10:58:31 +02:00
Markus Pargmann 9f52ee19c3 batman-adv: iv_ogm, Fix dup_status comment
Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2015-06-03 10:58:31 +02:00
Markus Pargmann 23badd6dbe batman-adv: iv_ogm_orig_update, style, add missing brackets
CodingStyle describes that either none or both branches of a conditional
have to have brackets.

Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2015-06-03 10:58:31 +02:00
Markus Pargmann 564891510e batman-adv: iv_ogm_queue_add, Simplify expressions
Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2015-06-03 10:58:30 +02:00
Markus Pargmann 940d156f52 batman-adv: iv_ogm_aggregate_new, simplify error handling
It is just a bit easier to put the error handling at one place and let
multiple error paths use the same calls.

Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2015-06-03 10:58:30 +02:00
Johannes Berg c526a46767 mac80211: rename single hw-scan flag to follow naming convention
The naming convention is to always have the flags prefixed with
IEEE80211_HW_ so they're 'namespaced', make this flag follow it.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-06-02 20:32:00 +02:00
Johannes Berg ea1b2b45f5 mac80211: remove short slot/short preamble incapable flags
There are no drivers setting IEEE80211_HW_2GHZ_SHORT_SLOT_INCAPABLE
or IEEE80211_HW_2GHZ_SHORT_PREAMBLE_INCAPABLE, so any code using the
two flags is dead; it's also exceedingly unlikely that any new driver
could ever need to set these flags.

The wcn36xx code is almost certainly broken, but this preserves the
previous behaviour.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-06-02 20:28:58 +02:00
Chuck Lever 632dda833e SUNRPC: Clean up bc_send()
Clean up: Merge bc_send() into bc_svc_process().

Note: even thought this touches svc.c, it is a client-side change.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-02 13:30:35 -04:00
Trond Myklebust 1193d58f75 SUNRPC: Backchannel handle socket nospace
If the socket was busy due to a socket nospace error, then we should
retry the send.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-02 13:30:35 -04:00
Varka Bhadram 0ecc4e688b mac802154: add trace functionality for driver ops
This patch adds trace events for driver operations.

Signed-off-by: Varka Bhadram <varkab@cdac.in>
Acked-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-06-02 19:21:09 +02:00
Alexander Aring 1caf6f476e ieee802154: 6lowpan: set ackreq when needed
This patch sets the acknowledge request bit inside the 802.15.4 mac
header when frame retries is 0 or above. The other frame retries value
which is -1 indicates that the transmitter doesn't care about an
acknowledge frame which will be ignored after transmitting if the node
sends anyway an ack frame after receiving. This is currently unnecessary
traffic if the max frame retries parameter is -1.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-06-02 17:09:35 +02:00
Doug Ledford b806ef3bbe Merge branch 'for-4.2-misc' into k.o/for-4.2 2015-06-02 09:33:22 -04:00
Wengang Wang d655a9fbc8 rds: re-entry of rds_ib_xmit/rds_iw_xmit
The BUG_ON at line 452/453 is triggered in function rds_send_xmit.

 441                         while (ret) {
 442                                 tmp = min_t(int, ret, sg->length -
 443                                                       conn->c_xmit_data_off);
 444                                 conn->c_xmit_data_off += tmp;
 445                                 ret -= tmp;
 446                                 if (conn->c_xmit_data_off == sg->length) {
 447                                         conn->c_xmit_data_off = 0;
 448                                         sg++;
 449                                         conn->c_xmit_sg++;
 450                                         if (ret != 0 && conn->c_xmit_sg == rm->data.op_nents)
 451                                                 printk(KERN_ERR "conn %p rm %p sg %p ret %d\n", conn, rm, sg, ret);
 452                                         BUG_ON(ret != 0 &&
 453                                                conn->c_xmit_sg == rm->data.op_nents);
 454                                 }
 455                         }

it is complaining the total sent length is bigger that we want to send.

rds_ib_xmit() is wrong for the second entry for the same rds_message returning
wrong value.

the sg and off passed by rds_send_xmit to rds_ib_xmit is based on
scatterlist.offset/length, but the rds_ib_xmit action is based on
scatterlist.dma_address/dma_length. in case dma_length is larger than length
there is problem. for the 2nd and later entries of rds_ib_xmit for same
rds_message, at least one of the following two is wrong:

1) the scatterlist to start with,  the choosen one can far beyond the correct
   one.
2) the offset to start with within the scatterlist.

fix:
add op_dmasg and op_dmaoff to rm_data_op structure indicating the scatterlist
and offset within the it to start with for rds_ib_xmit respectively. op_dmasg
and op_dmaoff are initialized to zero when doing dma mapping for the first see
of the message and are changed when filling send slots.

the same applies to rds_iw_xmit too.

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-06-02 09:22:31 -04:00
Trond Myklebust 88de6af24f SUNRPC: Fix a memory leak in the backchannel code
req->rq_private_buf isn't initialised when xprt_setup_backchannel calls
xprt_free_allocation.

Fixes: fb7a0b9add ("nfs41: New backchannel helper routines")
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-02 08:55:28 -04:00
Stefan Hajnoczi 9300fdba25 SUNRPC: drop stale doc comments in xprtsock.c
Several functions have outdated arguments listed in the doc comments.
Drop documentation for arguments that no longer exist.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-02 08:55:28 -04:00
Johannes Berg 3b79af973c mac80211: stop using pointers as userspace cookies
Even if the pointers are really only accessible to root and used
pretty much only by wpa_supplicant, this is still not great; even
for debugging it'd be easier to have something that's easier to
read and guaranteed to never get reused.

With the recent change to make mac80211 create an ack_skb for the
mgmt-tx path this becomes possible, only the client probe method
needs to also allocate an ack_skb, and we can store the cookie in
that skb.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-06-02 13:07:59 +02:00
Johannes Berg b2eb0ee6d0 mac80211: copy nl80211 mgmt TX SKB for status
When we return the TX status for an nl80211 mgmt TX SKB, we
should also return the original frame with the status to
allow userspace to match up the submission (it could also
use the cookie but both ways are permissible.)

As TX SKBs could be encrypted, at least in the case of ANQP
while associated with the AP, copy the original SKB, store
it with an ACK frame ID and restructure the status path to
use that to return status with the original SKB. Otherwise,
userspace (in particular wpa_supplicant) will get confused.

Reported-by: Matti Gottlieb <matti.gottlieb@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-06-02 13:07:55 +02:00
Johannes Berg db388a567f mac80211: move TX PN to public part of key struct
For drivers supporting TSO or similar features, but that still have
PN assignment in software, there's a need to have some memory to
store the current PN value. As mac80211 already stores this and it's
somewhat complicated to add a per-driver area to the key struct (due
to the dynamic sizing thereof) it makes sense to just move the TX PN
to the keyconf, i.e. the public part of the key struct.

As TKIP is more complicated and we won't able to offload it in this
way right now (fast-xmit is skipped for TKIP unless the HW does it
all, and our hardware needs MMIC calculation in software) I've not
moved that for now - it's possible but requires exposing a lot of
the internal TKIP state.

As an bonus side effect, we can remove a lot of code by assuming the
keyseq struct has a certain layout - with BUILD_BUG_ON to verify it.

This might also improve performance, since now TX and RX no longer
share a cacheline.

Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-06-02 11:16:35 +02:00
David S. Miller dda922c831 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/phy/amd-xgbe-phy.c
	drivers/net/wireless/iwlwifi/Kconfig
	include/net/mac80211.h

iwlwifi/Kconfig and mac80211.h were both trivial overlapping
changes.

The drivers/net/phy/amd-xgbe-phy.c file got removed in 'net-next' and
the bug fix that happened on the 'net' side is already integrated
into the rest of the amd-xgbe driver.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01 22:51:30 -07:00
David S. Miller e453581dd5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter fix for net

The following patch reverts the ebtables chunk that enforces counters that was
introduced in the recently applied d26e2c9ffa ('Revert "netfilter: ensure
number of counters is >0 in do_replace()"') since this breaks ebtables.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01 16:56:43 -07:00
Toshiaki Makita 66e5133f19 vlan: Add GRO support for non hardware accelerated vlan
Currently packets with non-hardware-accelerated vlan cannot be handled
by GRO. This causes low performance for 802.1ad and stacked vlan, as their
vlan tags are currently not stripped by hardware.

This patch adds GRO support for non-hardware-accelerated vlan and
improves receive performance of them.

Test Environment:
 vlan device (.1Q) on vlan device (.1ad) on ixgbe (82599)

Result:

- Before

$ netperf -t TCP_STREAM -H 192.168.20.2 -l 60
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    60.00    5233.17

Rx side CPU usage:
  %usr      %sys      %irq     %soft     %idle
  0.27     58.03      0.00     41.70      0.00

- After

$ netperf -t TCP_STREAM -H 192.168.20.2 -l 60
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    60.00    7586.85

Rx side CPU usage:
  %usr      %sys      %irq     %soft     %idle
  0.50     25.83      0.00     59.53     14.14

[ Register VLAN offloads with priority 10 -DaveM ]

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01 16:50:52 -07:00
Steffen Klassert ccd740cbc6 vti6: Add pmtu handling to vti6_xmit.
We currently rely on the PMTU discovery of xfrm.
However if a packet is localy sent, the PMTU mechanism
of xfrm tries to to local socket notification what
might not work for applications like ping that don't
check for this. So add pmtu handling to vti6_xmit to
report MTU changes immediately.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01 16:03:43 -07:00
Neil McKee ccea74457b openvswitch: include datapath actions with sampled-packet upcall to userspace
If new optional attribute OVS_USERSPACE_ATTR_ACTIONS is added to an
OVS_ACTION_ATTR_USERSPACE action, then include the datapath actions
in the upcall.

This Directly associates the sampled packet with the path it takes
through the virtual switch. Path information currently includes mangling,
encapsulation and decapsulation actions for tunneling protocols GRE,
VXLAN, Geneve, MPLS and QinQ, but this extension requires no further
changes to accommodate datapath actions that may be added in the
future.

Adding path information enhances visibility into complex virtual
networks.

Signed-off-by: Neil McKee <neil.mckee@inmon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01 15:05:40 -07:00
David S. Miller bdef7de4b8 net: Add priority to packet_offload objects.
When we scan a packet for GRO processing, we want to see the most
common packet types in the front of the offload_base list.

So add a priority field so we can handle this properly.

IPv4/IPv6 get the highest priority with the implicit zero priority
field.

Next comes ethernet with a priority of 10, and then we have the MPLS
types with a priority of 15.

Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Suggested-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01 14:56:09 -07:00
David S. Miller 18ec898ee5 Revert "net: core: 'ethtool' issue with querying phy settings"
This reverts commit f96dee13b8.

It isn't right, ethtool is meant to manage one PHY instance
per netdevice at a time, and this is selected by the SET
command.  Therefore by definition the GET command must only
return the settings for the configured and selected PHY.

Reported-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01 14:43:50 -07:00
Bernhard Thaler d26e2c9ffa Revert "netfilter: ensure number of counters is >0 in do_replace()"
This partially reverts commit 1086bbe97a ("netfilter: ensure number of
counters is >0 in do_replace()") in net/bridge/netfilter/ebtables.c.

Setting rules with ebtables does not work any more with 1086bbe97a place.

There is an error message and no rules set in the end.

e.g.

~# ebtables -t nat -A POSTROUTING --src 12:34:56:78:9a:bc -j DROP
Unable to update the kernel. Two possible causes:
1. Multiple ebtables programs were executing simultaneously. The ebtables
   userspace tool doesn't by default support multiple ebtables programs
running

Reverting the ebtables part of 1086bbe97a makes this work again.

Signed-off-by: Bernhard Thaler <bernhard.thaler@wvnet.at>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-06-01 19:45:47 +02:00
Johannes Berg c9c99f8938 mac80211: act upon and report deauth while associating
When trying to associate, the AP could send a deauth frame instead.
Currently mac80211 drops that frame and doesn't report it to the
supplicant, which, in some versions and/or in certain circumstances
will simply keep trying to associate over and over again instead of
trying authentication again.

Fix this by reacting to deauth frames while associating, reporting
them to the supplicant and dropping the association attempt (which
is bound to fail.)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-06-01 14:10:27 +02:00
Florian Fainelli 24595346d7 net: dsa: Properly propagate errors from dsa_switch_setup_one
While shuffling some code around, dsa_switch_setup_one() was introduced,
and it was modified to return either an error code using ERR_PTR() or a
NULL pointer when running out of memory or failing to setup a switch.

This is a problem for its caler: dsa_switch_setup() which uses IS_ERR()
and expects to find an error code, not a NULL pointer, so we still try
to proceed with dsa_switch_setup() and operate on invalid memory
addresses. This can be easily reproduced by having e.g: the bcm_sf2
driver built-in, but having no such switch, such that drv->setup will
fail.

Fix this by using PTR_ERR() consistently which is both more informative
and avoids for the caller to use IS_ERR_OR_NULL().

Fixes: df197195a5 ("net: dsa: split dsa_switch_setup into two functions")
Reported-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-31 21:50:34 -07:00
Neal Cardwell 9f950415e4 tcp: fix child sockets to use system default congestion control if not set
Linux 3.17 and earlier are explicitly engineered so that if the app
doesn't specifically request a CC module on a listener before the SYN
arrives, then the child gets the system default CC when the connection
is established. See tcp_init_congestion_control() in 3.17 or earlier,
which says "if no choice made yet assign the current value set as
default". The change ("net: tcp: assign tcp cong_ops when tcp sk is
created") altered these semantics, so that children got their parent
listener's congestion control even if the system default had changed
after the listener was created.

This commit returns to those original semantics from 3.17 and earlier,
since they are the original semantics from 2007 in 4d4d3d1e8 ("[TCP]:
Congestion control initialization."), and some Linux congestion
control workflows depend on that.

In summary, if a listener socket specifically sets TCP_CONGESTION to
"x", or the route locks the CC module to "x", then the child gets
"x". Otherwise the child gets current system default from
net.ipv4.tcp_congestion_control. That's the behavior in 3.17 and
earlier, and this commit restores that.

Fixes: 55d8694fa8 ("net: tcp: assign tcp cong_ops when tcp sk is created")
Cc: Florian Westphal <fw@strlen.de>
Cc: Daniel Borkmann <dborkman@redhat.com>
Cc: Glenn Judd <glenn.judd@morganstanley.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-31 21:49:14 -07:00
Sowmini Varadhan 8ba38460f3 net/rds Add getsockopt support for SO_RDS_TRANSPORT
The currently attached transport for a PF_RDS socket may be obtained
from user space by invoking getsockopt(2) using the SO_RDS_TRANSPORT
option at the SOL_RDS level. The integer optval returned will be one
of the RDS_TRANS_* constants defined in linux/rds.h.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-31 21:47:23 -07:00
Sowmini Varadhan d97dac54bf net/rds: Add setsockopt support for SO_RDS_TRANSPORT
An application may deterministically attach the underlying transport for
a PF_RDS socket by invoking setsockopt(2) with the SO_RDS_TRANSPORT
option at the SOL_RDS level. The integer argument to setsockopt must be
one of the RDS_TRANS_* transport types, e.g., RDS_TRANS_TCP. The option
must be specified before invoking bind(2) on the socket, and may only
be used once on the socket. An attempt to set the option on a bound
socket, or to invoke the option after a successful SO_RDS_TRANSPORT
attachment, will return EOPNOTSUPP.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-31 21:47:23 -07:00
Sowmini Varadhan a28c257c9e net/rds: Declare SO_RDS_TRANSPORT and RDS_TRANS_* constants in uapi/linux/rds.h
User space applications that desire to explicitly select the
underlying transport for a PF_RDS socket may do so by using the
SO_RDS_TRANSPORT socket option at the SOL_RDS level before bind().
The integer argument provided to the socket option would be one
of the RDS_TRANS_* values, e.g., RDS_TRANS_TCP. This commit exports
the constant values need by such applications via <linux/rds.h>

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-31 21:47:23 -07:00