Commit Graph

41357 Commits

Author SHA1 Message Date
Linus Torvalds 814a2bf957 Merge branch 'akpm' (patches from Andrew)
Merge second patch-bomb from Andrew Morton:

 - a couple of hotfixes

 - the rest of MM

 - a new timer slack control in procfs

 - a couple of procfs fixes

 - a few misc things

 - some printk tweaks

 - lib/ updates, notably to radix-tree.

 - add my and Nick Piggin's old userspace radix-tree test harness to
   tools/testing/radix-tree/.  Matthew said it was a godsend during the
   radix-tree work he did.

 - a few code-size improvements, switching to __always_inline where gcc
   screwed up.

 - partially implement character sets in sscanf

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (118 commits)
  sscanf: implement basic character sets
  lib/bug.c: use common WARN helper
  param: convert some "on"/"off" users to strtobool
  lib: add "on"/"off" support to kstrtobool
  lib: update single-char callers of strtobool()
  lib: move strtobool() to kstrtobool()
  include/linux/unaligned: force inlining of byteswap operations
  include/uapi/linux/byteorder, swab: force inlining of some byteswap operations
  include/asm-generic/atomic-long.h: force inlining of some atomic_long operations
  usb: common: convert to use match_string() helper
  ide: hpt366: convert to use match_string() helper
  ata: hpt366: convert to use match_string() helper
  power: ab8500: convert to use match_string() helper
  power: charger_manager: convert to use match_string() helper
  drm/edid: convert to use match_string() helper
  pinctrl: convert to use match_string() helper
  device property: convert to use match_string() helper
  lib/string: introduce match_string() helper
  radix-tree tests: add test for radix_tree_iter_next
  radix-tree tests: add regression3 test
  ...
2016-03-18 19:26:54 -07:00
Sowmini Varadhan a3382e408b RDS: TCP: Remove unused constant
RDS_TCP_DEFAULT_BUFSIZE has been unused since commit 1edd6a14d2
("RDS-TCP: Do not bloat sndbuf/rcvbuf in rds_tcp_tune").

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-18 22:25:41 -04:00
Sowmini Varadhan c6a58ffed5 RDS: TCP: Add sysctl tunables for sndbuf/rcvbuf on rds-tcp socket
Add per-net sysctl tunables to set the size of sndbuf and
rcvbuf on the kernel tcp socket.

The tunables are added at /proc/sys/net/rds/tcp/rds_tcp_sndbuf
and /proc/sys/net/rds/tcp/rds_tcp_rcvbuf.

These values must be set before accept() or connect(),
and there may be an arbitrary number of existing rds-tcp
sockets when the tunable is modified. To make sure that all
connections in the netns pick up the same value for the tunable,
we reset existing rds-tcp connections in the netns, so that
they can reconnect with the new parameters.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-18 22:25:41 -04:00
Daniel Borkmann fca5fdf67d ip_tunnels, bpf: define IP_TUNNEL_OPTS_MAX and use it
eBPF defines this as BPF_TUNLEN_MAX and OVS just uses the hard-coded
value inside struct sw_flow_key. Thus, add and use IP_TUNNEL_OPTS_MAX
for this, which makes the code a bit more generic and allows to remove
BPF_TUNLEN_MAX from eBPF code.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-18 19:38:46 -04:00
Daniel Borkmann 808c1b697c bpf, dst: add and use dst_tclassid helper
We can just add a small helper dst_tclassid() for retrieving the
dst->tclassid value. It makes the code a bit better in that we can
get rid of the ifdef from filter.c by moving this into the header.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-18 19:38:46 -04:00
Daniel Borkmann 09c37a2c5b bpf: make skb->tc_classid also readable
Currently, the tc_classid from eBPF skb context is write-only, but there's
no good reason for tc programs to limit it to write-only. For example,
it can be used to transfer its state via tail calls where the resulting
tc_classid gets filled gradually.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-18 19:38:46 -04:00
Daniel Borkmann 3a461da1d0 cls_bpf: reset class and reuse major in da
There are two issues with the current code. First one is that we need
to set res->class to 0 in case we use non-default classid matching.

This is important for the case where cls_bpf was initially set up with
an optional binding to a default class with tcf_bind_filter(), where
the underlying qdisc implements bind_tcf() that fills res->class and
tests for it later on when doing the classification. Convention for
these cases is that after tc_classify() was called, such qdiscs (atm,
drr, qfq, cbq, hfsc, htb) first test class, and if 0, then they lookup
based on classid.

Second, there's a bug with da mode, where res->classid is only assigned
a 16 bit minor, but it needs to expand to the full 32 bit major/minor
combination instead, therefore we need to expand with the bound major.
This is fine as classes belonging to a classful qdisc must share the
same major.

Fixes: 045efa82ff ("cls_bpf: introduce integrated actions")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-18 19:35:21 -04:00
Simon Horman fe3a5f6c79 openvswitch: allow output of MPLS packets on tunnel vports
Currently output of MPLS packets on tunnel vports is not allowed by Open
vSwitch. This is because historically encapsulation was done in such a way
that the inner_protocol field of the skb needed to hold the inner protocol
for both MPLS and tunnel encapsulation in order for GSO segmentation to be
performed correctly.

Since b2acd1dc39 ("openvswitch: Use regular GRE net_device instead of
vport") Open vSwitch makes use of lwt to output to tunnel netdevs which
perform encapsulation. As no drivers expose support for MPLS offloads this
means that GSO packets are segmented in software by validate_xmit_skb(),
which is called from __dev_queue_xmit(), before tunnel encapsulation occurs.
This means that the inner protocol of MPLS is no longer needed by the time
encapsulation occurs and the contention on the inner_protocol field of the
skb no longer occurs.

Thus it is now safe to output MPLS to tunnel vports.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jesse Gross <jesse@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-18 18:26:38 -04:00
Wu Fengguang e014e84685 ovs: internal_set_rx_headroom() can be static
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-18 17:50:36 -04:00
Wu Fengguang b73f96fcb4 net: dst_cache_per_cpu_dst_set() can be static
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-18 17:45:08 -04:00
Linus Torvalds 9ea4463520 Initial roundup of 4.6 merge window patches
- cxgb4 updates
 - nes updates
 - unification of iwarp portmapper code to core
 - add drain_cq API
 - various ib_core updates
 - minor ipoib updates
 - minor mlx4 updates
 - more significant mlx5 updates (including a minor merge conflict with
   net-next tree...merge is simple to resolve and Stephen's resolution was
   confirmed by Mellanox)
 - trivial net/9p rdma conversion
 - ocrdma RoCEv2 update
 - srpt updates
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJW6aTEAAoJELgmozMOVy/dlAEQAKgT0VwBi6Zd4PihP2UQgsfH
 LUmbGhCzBpcao1eJ7piOOEYQGSb3slN3Cnup4qBJak+y2mhtErxNkLOIhGRrvcHk
 XCym7N9uAhp4j++OnUBp6Cpr0hZNmBEBKm6nKqdEcdaxLaVa0ezdcxAOkVlHhZ77
 NnhTHvPy8pu4kC8NZCvCIJK+fqW+5Xj+ojAcVKGPV+Y3zf9lfaDCXCSdD2m6+dFX
 /KV3V/CNUSdYTWrPZSIDhqoYix2AGl5Fg17mfsgBWQB/T405fiwZkd0FEXkqXDkR
 bOhS5PnuCN+ScwsxMDHCbzqtaOb06sKttg9IE3s0qdFpOwGtbyoU+lLUh1qbjKLP
 vtEiySZq2Mhlr41ajuUuDSgNbqCTL7+52/HUf8qcjFFiSBlZRaTO8rVJ5tABKRiW
 SkxkHbR6orx8okKtaWRskKRtYSNkA2uexdIQ/wzc4fJVqzqJUh6Elcxp3dPq/KSN
 lkrYXNJ5X4ux72QfHRobBX1pBjT0P2+avoFri3763k9ZrsWwY9tXgDUB/OdX11IF
 gAadgUNw2pHgY10jqCZBOw22F+foB2qx8ZkaNSGYE0h3uQrp+iiCnfeU9rWNCWVv
 MelRGpfGa7VF3RTDojc7Dq7JpWRUChMx9BY+XrQPmV08Z+JGoVuRT20Q7twgillz
 Yb3aGRKZNtqYehj9fM4n
 =kTkT
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull rdma updates from Doug Ledford:
 "Initial roundup of 4.6 merge window patches.

  This is the first of two pull requests.  It is the smaller request,
  but touches for more different things (this is everything but what is
  in or going into staging).  The pull request for the code in
  staging/rdma is on hold until after we decide what to do on the
  write/writev API issue and may be partially deferred until 4.7 as a
  result.

  Summary:

   - cxgb4 updates
   - nes updates
   - unification of iwarp portmapper code to core
   - add drain_cq API
   - various ib_core updates
   - minor ipoib updates
   - minor mlx4 updates
   - more significant mlx5 updates (including a minor merge conflict
     with net-next tree...merge is simple to resolve and Stephen's
     resolution was confirmed by Mellanox)
   - trivial net/9p rdma conversion
   - ocrdma RoCEv2 update
   - srpt updates"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (85 commits)
  iwpm: crash fix for large connections test
  iw_cxgb3: support for iWARP port mapping
  iw_cxgb4: remove port mapper related code
  iw_nes: remove port mapper related code
  iwcm: common code for port mapper
  net/9p: convert to new CQ API
  IB/mlx5: Add support for don't trap rules
  net/mlx5_core: Introduce forward to next priority action
  net/mlx5_core: Create anchor of last flow table
  iser: Accept arbitrary sg lists mapping if the device supports it
  mlx5: Add arbitrary sg list support
  IB/core: Add arbitrary sg_list support
  IB/mlx5: Expose correct max_fast_reg_page_list_len
  IB/mlx5: Make coding style more consistent
  IB/mlx5: Convert UMR CQ to new CQ API
  IB/ocrdma: Skip using unneeded intermediate variable
  IB/ocrdma: Skip using unneeded intermediate variable
  IB/ocrdma: Delete unnecessary variable initialisations in 11 functions
  IB/core: Documentation fix in the MAD header file
  IB/core: trivial prink cleanup.
  ...
2016-03-18 09:39:22 -07:00
Linus Torvalds 49dc2b7173 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
  drivers/rtc: broken link fix
  drm/i915 Fix typos in i915_gem_fence.c
  Docs: fix missing word in REPORTING-BUGS
  lib+mm: fix few spelling mistakes
  MAINTAINERS: add git URL for APM driver
  treewide: Fix typo in printk
2016-03-17 21:38:27 -07:00
Eric Dumazet e316ea62e3 tcp/dccp: remove obsolete WARN_ON() in icmp handlers
Now SYN_RECV request sockets are installed in ehash table, an ICMP
handler can find a request socket while another cpu handles an incoming
packet transforming this SYN_RECV request socket into an ESTABLISHED
socket.

We need to remove the now obsolete WARN_ON(req->sk), since req->sk
is set when a new child is created and added into listener accept queue.

If this race happens, the ICMP will do nothing special.

Fixes: 079096f103 ("tcp/dccp: install syn_recv requests into ehash table")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Ben Lazarus <blazarus@google.com>
Reported-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-17 21:06:40 -04:00
Eric Dumazet f6773c5e95 vlan: propagate gso_max_segs
vlan drivers lack proper propagation of gso_max_segs from
lower device.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-17 21:05:01 -04:00
Joonsoo Kim fe896d1878 mm: introduce page reference manipulation functions
The success of CMA allocation largely depends on the success of
migration and key factor of it is page reference count.  Until now, page
reference is manipulated by direct calling atomic functions so we cannot
follow up who and where manipulate it.  Then, it is hard to find actual
reason of CMA allocation failure.  CMA allocation should be guaranteed
to succeed so finding offending place is really important.

In this patch, call sites where page reference is manipulated are
converted to introduced wrapper function.  This is preparation step to
add tracepoint to each page reference manipulation function.  With this
facility, we can easily find reason of CMA allocation failure.  There is
no functional change in this patch.

In addition, this patch also converts reference read sites.  It will
help a second step that renames page._count to something else and
prevents later attempt to direct access to it (Suggested by Andrew).

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17 15:09:34 -07:00
Linus Torvalds 96b9b1c956 TTY/Serial patches for 4.6-rc1
Here's the big tty/serial driver pull request for 4.6-rc1.
 
 Lots of changes in here, Peter has been on a tear again, with lots of
 refactoring and bugs fixes, many thanks to the great work he has been
 doing.  Lots of driver updates and fixes as well, full details in the
 shortlog.
 
 All have been in linux-next for a while with no reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iEYEABECAAYFAlbp8z8ACgkQMUfUDdst+ym1vwCgnOOCORaZyeQ4QrcxPAK5pHFn
 VrMAoNHvDgNYtG+Hmzv25Lgp3HnysPin
 =MLRG
 -----END PGP SIGNATURE-----

Merge tag 'tty-4.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial updates from Greg KH:
 "Here's the big tty/serial driver pull request for 4.6-rc1.

  Lots of changes in here, Peter has been on a tear again, with lots of
  refactoring and bugs fixes, many thanks to the great work he has been
  doing.  Lots of driver updates and fixes as well, full details in the
  shortlog.

  All have been in linux-next for a while with no reported issues"

* tag 'tty-4.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (220 commits)
  serial: 8250: describe CONFIG_SERIAL_8250_RSA
  serial: samsung: optimize UART rx fifo access routine
  serial: pl011: add mark/space parity support
  serial: sa1100: make sa1100_register_uart_fns a function
  tty: serial: 8250: add MOXA Smartio MUE boards support
  serial: 8250: convert drivers to use up_to_u8250p()
  serial: 8250/mediatek: fix building with SERIAL_8250=m
  serial: 8250/ingenic: fix building with SERIAL_8250=m
  serial: 8250/uniphier: fix modular build
  Revert "drivers/tty/serial: make 8250/8250_ingenic.c explicitly non-modular"
  Revert "drivers/tty/serial: make 8250/8250_mtk.c explicitly non-modular"
  serial: mvebu-uart: initial support for Armada-3700 serial port
  serial: mctrl_gpio: Add missing module license
  serial: ifx6x60: avoid uninitialized variable use
  tty/serial: at91: fix bad offset for UART timeout register
  tty/serial: at91: restore dynamic driver binding
  serial: 8250: Add hardware dependency to RT288X option
  TTY, devpts: document pty count limiting
  tty: goldfish: support platform_device with id -1
  drivers: tty: goldfish: Add device tree bindings
  ...
2016-03-17 13:53:25 -07:00
NeilBrown a6ab1e8126 sunrpc/cache: drop reference when sunrpc_cache_pipe_upcall() detects a race
sunrpc_cache_pipe_upcall() can detect a race if CACHE_PENDING is no longer
set.  In this case it aborts the queuing of the upcall.
However it has already taken a new counted reference on "h" and
doesn't "put" it, even though it frees the data structure holding the reference.

So let's delay the "cache_get" until we know we need it.

Fixes: f9e1aedc6c ("sunrpc/cache: remove races with queuing an upcall.")
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-17 14:57:15 -04:00
Linus Torvalds 70477371dc Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto update from Herbert Xu:
 "Here is the crypto update for 4.6:

  API:
   - Convert remaining crypto_hash users to shash or ahash, also convert
     blkcipher/ablkcipher users to skcipher.
   - Remove crypto_hash interface.
   - Remove crypto_pcomp interface.
   - Add crypto engine for async cipher drivers.
   - Add akcipher documentation.
   - Add skcipher documentation.

  Algorithms:
   - Rename crypto/crc32 to avoid name clash with lib/crc32.
   - Fix bug in keywrap where we zero the wrong pointer.

  Drivers:
   - Support T5/M5, T7/M7 SPARC CPUs in n2 hwrng driver.
   - Add PIC32 hwrng driver.
   - Support BCM6368 in bcm63xx hwrng driver.
   - Pack structs for 32-bit compat users in qat.
   - Use crypto engine in omap-aes.
   - Add support for sama5d2x SoCs in atmel-sha.
   - Make atmel-sha available again.
   - Make sahara hashing available again.
   - Make ccp hashing available again.
   - Make sha1-mb available again.
   - Add support for multiple devices in ccp.
   - Improve DMA performance in caam.
   - Add hashing support to rockchip"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (116 commits)
  crypto: qat - remove redundant arbiter configuration
  crypto: ux500 - fix checks of error code returned by devm_ioremap_resource()
  crypto: atmel - fix checks of error code returned by devm_ioremap_resource()
  crypto: qat - Change the definition of icp_qat_uof_regtype
  hwrng: exynos - use __maybe_unused to hide pm functions
  crypto: ccp - Add abstraction for device-specific calls
  crypto: ccp - CCP versioning support
  crypto: ccp - Support for multiple CCPs
  crypto: ccp - Remove check for x86 family and model
  crypto: ccp - memset request context to zero during import
  lib/mpi: use "static inline" instead of "extern inline"
  lib/mpi: avoid assembler warning
  hwrng: bcm63xx - fix non device tree compatibility
  crypto: testmgr - allow rfc3686 aes-ctr variants in fips mode.
  crypto: qat - The AE id should be less than the maximal AE number
  lib/mpi: Endianness fix
  crypto: rockchip - add hash support for crypto engine in rk3288
  crypto: xts - fix compile errors
  crypto: doc - add skcipher API documentation
  crypto: doc - update AEAD AD handling
  ...
2016-03-17 11:22:54 -07:00
Ben Hutchings 793cf87de9 ethtool: Set cmd field in ETHTOOL_GLINKSETTINGS response to wrong nwords
When the ETHTOOL_GLINKSETTINGS implementation finds that userland is
using the wrong number of words of link mode bitmaps (or is trying to
find out the right numbers) it sets the cmd field to 0 in the response
structure.

This is inconsistent with the implementation of every other ethtool
command, so let's remove that inconsistency before it gets into a
stable release.

Fixes: 3f1ac7a700 ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-16 19:26:04 -04:00
Nicholas Mc Guire 489ce5f441 sctp: consolidate local_bh_disable/enable + spin_lock/unlock to _bh variant
local_bh_disable() + spin_lock() is equivalent to spin_lock_bh(), same for
the unlock/enable case, so replace the calls by the appropriate wrappers.

Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-16 19:20:50 -04:00
Trond Myklebust 1425075e72 NFS: NFSoRDMA Client Side Changes
These patches include several bugfixes and cleanups for the NFSoRDMA client.
 This includes bugfixes for NFS v4.1, proper RDMA_ERROR handling, and fixes
 from the recent workqueue swicchover.  These patches also switch xprtrdma to
 use the new CQ API
 
 Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJW5wl+AAoJENfLVL+wpUDrH7sP/i3lZ7n6Pr+Flrb/Z+ywjmEA
 mvZ0u3O3eviFxDqr82J/WL7fgDUGbBd9urtUu5xZGscc0HNs3LQ8izm6Dy3gmrVF
 MFh69jNGL5djmfymYMWRbdoKLuOw4V70EBtGnYqH7Bh7gwpiIl3EiVcBBJup/vKV
 rgvx6NnSGYpYPYpBFC4Ql4qZx7m5j4cxsThRScbu7wMMjDknls+7ZDM1B5mGO00+
 1vYObpYGXOXovGZyHAHspQityWp6jvUcEMnJzWbMUFDqxkOmmGK3t54MfvRXZiFE
 vUkgg5nGxhMejEIfMywuf6czKGfXc4HZT2yF9eSZeA4IW+7QkeeXeCcIANBDY6Ga
 oKXu4fgM6T7SnCpefwkXRhinwtEM04tGAlxo1X3UcKrMz7Q3di3/NtgaWijfL8gy
 9Nd1lt2kqI375h7+OZccURl33lnQBSjO4F2pt/pFk6wYwGh0F8co9bIp6QIEVQ0f
 N2l8KU9vgLofhrD9JzZeu1l3+TCdDU9YaJLSjbkZ71BTjNtkNdUcd9Tk+bMSxept
 mq7mNKe1oBHAGgR8+7zYBUKEt85rdpNovPoU+Hz/QKbV3zikGSPmL3e9gpEvhmH4
 MKD7Vs61Fi8volfv7wHmzZF8Tk68qc1MAZXSbgzVQxwF/uBjqEuSO+n4Kii/gfkG
 MJjlzqmqlO01O1fn2XHu
 =aHja
 -----END PGP SIGNATURE-----

Merge tag 'nfs-rdma-4.6-1' of git://git.linux-nfs.org/projects/anna/nfs-rdma

NFS: NFSoRDMA Client Side Changes

These patches include several bugfixes and cleanups for the NFSoRDMA client.
This includes bugfixes for NFS v4.1, proper RDMA_ERROR handling, and fixes
from the recent workqueue swicchover.  These patches also switch xprtrdma to
use the new CQ API

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

* tag 'nfs-rdma-4.6-1' of git://git.linux-nfs.org/projects/anna/nfs-rdma: (787 commits)
  xprtrdma: Use new CQ API for RPC-over-RDMA client send CQs
  xprtrdma: Use an anonymous union in struct rpcrdma_mw
  xprtrdma: Use new CQ API for RPC-over-RDMA client receive CQs
  xprtrdma: Serialize credit accounting again
  xprtrdma: Properly handle RDMA_ERROR replies
  rpcrdma: Add RPCRDMA_HDRLEN_ERR
  xprtrdma: Do not wait if ib_post_send() fails
  xprtrdma: Segment head and tail XDR buffers on page boundaries
  xprtrdma: Clean up dprintk format string containing a newline
  xprtrdma: Clean up physical_op_map()
  xprtrdma: Clean up unused RPCRDMA_INLINE_PAD_THRESH macro
2016-03-16 16:25:09 -04:00
Peter Zijlstra 25528213fe tags: Fix DEFINE_PER_CPU expansions
$ make tags
  GEN     tags
ctags: Warning: drivers/acpi/processor_idle.c:64: null expansion of name pattern "\1"
ctags: Warning: drivers/xen/events/events_2l.c:41: null expansion of name pattern "\1"
ctags: Warning: kernel/locking/lockdep.c:151: null expansion of name pattern "\1"
ctags: Warning: kernel/rcu/rcutorture.c:133: null expansion of name pattern "\1"
ctags: Warning: kernel/rcu/rcutorture.c:135: null expansion of name pattern "\1"
ctags: Warning: kernel/workqueue.c:323: null expansion of name pattern "\1"
ctags: Warning: net/ipv4/syncookies.c:53: null expansion of name pattern "\1"
ctags: Warning: net/ipv6/syncookies.c:44: null expansion of name pattern "\1"
ctags: Warning: net/rds/page.c:45: null expansion of name pattern "\1"

Which are all the result of the DEFINE_PER_CPU pattern:

  scripts/tags.sh:200:	'/\<DEFINE_PER_CPU([^,]*, *\([[:alnum:]_]*\)/\1/v/'
  scripts/tags.sh:201:	'/\<DEFINE_PER_CPU_SHARED_ALIGNED([^,]*, *\([[:alnum:]_]*\)/\1/v/'

The below cures them. All except the workqueue one are within reasonable
distance of the 80 char limit. TJ do you have any preference on how to
fix the wq one, or shall we just not care its too long?

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
David S. Miller 1cdba55055 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter/IPVS/OVS updates for net-next

The following patchset contains Netfilter/IPVS fixes and OVS NAT
support, more specifically this batch is composed of:

1) Fix a crash in ipset when performing a parallel flush/dump with
   set:list type, from Jozsef Kadlecsik.

2) Make sure NFACCT_FILTER_* netlink attributes are in place before
   accessing them, from Phil Turnbull.

3) Check return error code from ip_vs_fill_iph_skb_off() in IPVS SIP
   helper, from Arnd Bergmann.

4) Add workaround to IPVS to reschedule existing connections to new
   destination server by dropping the packet and wait for retransmission
   of TCP syn packet, from Julian Anastasov.

5) Allow connection rescheduling in IPVS when in CLOSE state, also
   from Julian.

6) Fix wrong offset of SIP Call-ID in IPVS helper, from Marco Angaroni.

7) Validate IPSET_ATTR_ETHER netlink attribute length, from Jozsef.

8) Check match/targetinfo netlink attribute size in nft_compat,
   patch from Florian Westphal.

9) Check for integer overflow on 32-bit systems in x_tables, from
   Florian Westphal.

Several patches from Jarno Rajahalme to prepare the introduction of
NAT support to OVS based on the Netfilter infrastructure:

10) Schedule IP_CT_NEW_REPLY definition for removal in
    nf_conntrack_common.h.

11) Simplify checksumming recalculation in nf_nat.

12) Add comments to the openvswitch conntrack code, from Jarno.

13) Update the CT state key only after successful nf_conntrack_in()
    invocation.

14) Find existing conntrack entry after upcall.

15) Handle NF_REPEAT case due to templates in nf_conntrack_in().

16) Call the conntrack helper functions once the conntrack has been
    confirmed.

17) And finally, add the NAT interface to OVS.

The batch closes with:

18) Cleanup to use spin_unlock_wait() instead of
    spin_lock()/spin_unlock(), from Nicholas Mc Guire.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-14 22:10:25 -04:00
Nicholas Mc Guire e39365be03 netfilter: nf_conntrack: consolidate lock/unlock into unlock_wait
The spin_lock()/spin_unlock() is synchronizing on the
nf_conntrack_locks_all_lock which is equivalent to
spin_unlock_wait() but the later should be more efficient.

Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-03-15 01:10:42 +01:00
Eric Dumazet acffb584cd net: diag: add a scheduling point in inet_diag_dump_icsk()
On loaded TCP servers, looking at millions of sockets can hold
cpu for many seconds, if the lookup condition is very narrow.

(eg : ss dst 1.2.3.4 )

Better add a cond_resched() to allow other processes to access
the cpu.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-14 19:38:09 -04:00
Jarno Rajahalme 05752523e5 openvswitch: Interface with NAT.
Extend OVS conntrack interface to cover NAT.  New nested
OVS_CT_ATTR_NAT attribute may be used to include NAT with a CT action.
A bare OVS_CT_ATTR_NAT only mangles existing and expected connections.
If OVS_NAT_ATTR_SRC or OVS_NAT_ATTR_DST is included within the nested
attributes, new (non-committed/non-confirmed) connections are mangled
according to the rest of the nested attributes.

The corresponding OVS userspace patch series includes test cases (in
tests/system-traffic.at) that also serve as example uses.

This work extends on a branch by Thomas Graf at
https://github.com/tgraf/ovs/tree/nat.

Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-03-14 23:47:29 +01:00
Jarno Rajahalme 28b6e0c1ac openvswitch: Delay conntrack helper call for new connections.
There is no need to help connections that are not confirmed, so we can
delay helping new connections to the time when they are confirmed.
This change is needed for NAT support, and having this as a separate
patch will make the following NAT patch a bit easier to review.

Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-03-14 23:47:29 +01:00
Jarno Rajahalme 5b6b929376 openvswitch: Handle NF_REPEAT in conntrack action.
Repeat the nf_conntrack_in() call when it returns NF_REPEAT.  This
avoids dropping a SYN packet re-opening an existing TCP connection.

Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-03-14 23:47:29 +01:00
Jarno Rajahalme 289f225349 openvswitch: Find existing conntrack entry after upcall.
Add a new function ovs_ct_find_existing() to find an existing
conntrack entry for which this packet was already applied to.  This is
only to be called when there is evidence that the packet was already
tracked and committed, but we lost the ct reference due to an
userspace upcall.

ovs_ct_find_existing() is called from skb_nfct_cached(), which can now
hide the fact that the ct reference may have been lost due to an
upcall.  This allows ovs_ct_commit() to be simplified.

This patch is needed by later "openvswitch: Interface with NAT" patch,
as we need to be able to pass the packet through NAT using the
original ct reference also after the reference is lost after an
upcall.

Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-03-14 23:47:28 +01:00
Jarno Rajahalme 394e910e90 openvswitch: Update the CT state key only after nf_conntrack_in().
Only a successful nf_conntrack_in() call can effect a connection state
change, so it suffices to update the key only after the
nf_conntrack_in() returns.

This change is needed for the later NAT patches.

Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-03-14 23:47:28 +01:00
Jarno Rajahalme 9f13ded8d3 openvswitch: Add commentary to conntrack.c
This makes the code easier to understand and the following patches
more focused.

Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-03-14 23:47:27 +01:00
Jarno Rajahalme 264619055b netfilter: Allow calling into nat helper without skb_dst.
NAT checksum recalculation code assumes existence of skb_dst, which
becomes a problem for a later patch in the series ("openvswitch:
Interface with NAT.").  Simplify this by removing the check on
skb_dst, as the checksum will be dealt with later in the stack.

Suggested-by: Pravin Shelar <pshelar@nicira.com>
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-03-14 23:47:27 +01:00
Jarno Rajahalme bfa3f9d7f3 netfilter: Remove IP_CT_NEW_REPLY definition.
Remove the definition of IP_CT_NEW_REPLY from the kernel as it does
not make sense.  This allows the definition of IP_CT_NUMBER to be
simplified as well.

Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-03-14 23:47:27 +01:00
Vivien Didelot 6debb68a2d net: dsa: refine netdev event notifier
Rework the netdev event handler, similar to what the Mellanox Spectrum
driver does, to easily welcome more events later (for example
NETDEV_PRECHANGEUPPER) and use netdev helpers (such as
netif_is_bridge_master).

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-14 16:05:32 -04:00
Vivien Didelot 16bfa7024e net: dsa: make port_bridge_leave return void
netdev_upper_dev_unlink() which notifies NETDEV_CHANGEUPPER, returns
void, as well as del_nbp(). So there's no advantage to catch an eventual
error from the port_bridge_leave routine at the DSA level.

Make this routine void for the DSA layer and its existing drivers.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-14 16:05:31 -04:00
Vivien Didelot 71327a4e7d net: dsa: rename port_*_bridge routines
Rename DSA port_join_bridge and port_leave_bridge routines to
respectively port_bridge_join and port_bridge_leave in order to respect
an implicit Port::Bridge namespace.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-14 16:05:31 -04:00
Florian Westphal 8626c56c82 bridge: fix potential use-after-free when hook returns QUEUE or STOLEN verdict
Zefir Kurtisi reported kernel panic with an openwrt specific patch.
However, it turns out that mainline has a similar bug waiting to happen.

Once NF_HOOK() returns the skb is in undefined state and must not be
used.   Moreover, the okfn must consume the skb to support async
processing (NF_QUEUE).

Current okfn in this spot doesn't consume it and caller assumes that
NF_HOOK return value tells us if skb was freed or not, but thats wrong.

It "works" because no in-tree user registers a NFPROTO_BRIDGE hook at
LOCAL_IN that returns STOLEN or NF_QUEUE verdicts.

Once we add NF_QUEUE support for nftables bridge this will break --
NF_QUEUE holds the skb for async processing, caller will erronoulsy
return RX_HANDLER_PASS and on reinject netfilter will access free'd skb.

Fix this by pushing skb up the stack in the okfn instead.

NB: It also seems dubious to use LOCAL_IN while bypassing PRE_ROUTING
completely in this case but this is how its been forever so it seems
preferable to not change this.

Cc: Felix Fietkau <nbd@openwrt.org>
Cc: Zefir Kurtisi <zefir.kurtisi@neratec.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Tested-by: Zefir Kurtisi <zefir.kurtisi@neratec.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-14 15:46:41 -04:00
David S. Miller 180a2c542c Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg says:

====================
pull request: bluetooth-next 2016-03-12

Here's the last bluetooth-next pull request for the 4.6 kernel.

 - New USB ID for AR3012 in btusb
 - New BCM2E55 ACPI ID
 - Buffer overflow fix for the Add Advertising command
 - Support for a new Bluetooth LE limited privacy mode
 - Fix for firmware activation in btmrvl_sdio
 - Cleanups to mac802154 & 6lowpan code

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-14 15:44:56 -04:00
Andrew Lunn 5bcbe0f35f phy: fixed: Fix removal of phys.
The fixed phys delete function simply removed the fixed phy from the
internal linked list and freed the memory. It however did not
unregister the associated phy device. This meant it was still possible
to find the phy device on the mdio bus.

Make fixed_phy_del() an internal function and add a
fixed_phy_unregister() to unregisters the phy device and then uses
fixed_phy_del() to free resources.

Modify DSA to use this new API function, so we don't leak phys.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-14 15:43:11 -04:00
Andrew Lunn ec777e6b42 dsa: dsa: Fix freeing of fixed-phys from user ports.
All ports types can have a fixed PHY associated with it. Remove the
check which limits removal to only CPU and DSA ports.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-14 15:43:11 -04:00
Andrew Lunn 3a44514ff9 dsa: Destroy fixed link phys after the phy has been disconnected
The phy is disconnected from the slave in dsa_slave_destroy(). Don't
destroy fixed link phys until after this, since there can be fixed
linked phys connected to ports.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-14 15:43:11 -04:00
Andrew Lunn b71be352f7 dsa: slave: Don't reference NULL pointer during phy_disconnect
When the phy is disconnected, the parent pointer to the netdev it was
attached to is set to NULL. The code then tries to suspend the phy,
but dsa_slave_fixed_link_update needs the parent pointer to determine
which switch the phy is connected to. So it dereferenced a NULL
pointer. Check for this condition.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-14 15:43:10 -04:00
Chuck Lever 2fa8f88d88 xprtrdma: Use new CQ API for RPC-over-RDMA client send CQs
Calling ib_poll_cq() to sort through WCs during a completion is a
common pattern amongst RDMA consumers. Since commit 14d3a3b249
("IB: add a proper completion queue abstraction"), WC sorting can
be handled by the IB core.

By converting to this new API, xprtrdma is made a better neighbor to
other RDMA consumers, as it allows the core to schedule the delivery
of completions more fairly amongst all active consumers.

Because each ib_cqe carries a pointer to a completion method, the
core can now post its own operations on a consumer's QP, and handle
the completions itself, without changes to the consumer.

Send completions were previously handled entirely in the completion
upcall handler (ie, deferring to a process context is unneeded).
Thus IB_POLL_SOFTIRQ is a direct replacement for the current
xprtrdma send code path.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-03-14 14:56:08 -04:00
Chuck Lever c882a655b7 xprtrdma: Use an anonymous union in struct rpcrdma_mw
Clean up: Make code more readable.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-03-14 14:56:06 -04:00
Chuck Lever 552bf22528 xprtrdma: Use new CQ API for RPC-over-RDMA client receive CQs
Calling ib_poll_cq() to sort through WCs during a completion is a
common pattern amongst RDMA consumers. Since commit 14d3a3b249
("IB: add a proper completion queue abstraction"), WC sorting can
be handled by the IB core.

By converting to this new API, xprtrdma is made a better neighbor to
other RDMA consumers, as it allows the core to schedule the delivery
of completions more fairly amongst all active consumers.

Because each ib_cqe carries a pointer to a completion method, the
core can now post its own operations on a consumer's QP, and handle
the completions itself, without changes to the consumer.

xprtrdma's reply processing is already handled in a work queue, but
there is some initial order-dependent processing that is done in the
soft IRQ context before a work item is scheduled.

IB_POLL_SOFTIRQ is a direct replacement for the current xprtrdma
receive code path.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-03-14 14:56:04 -04:00
Chuck Lever 23826c7aea xprtrdma: Serialize credit accounting again
Commit fe97b47cd6 ("xprtrdma: Use workqueue to process RPC/RDMA
replies") replaced the reply tasklet with a workqueue that allows
RPC replies to be processed in parallel. Thus the credit values in
RPC-over-RDMA replies can be applied in a different order than in
which the server sent them.

To fix this, revert commit eba8ff660b ("xprtrdma: Move credit
update to RPC reply handler"). Reverting is done by hand to
accommodate code changes that have occurred since then.

Fixes: fe97b47cd6 ("xprtrdma: Use workqueue to process . . .")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-03-14 14:56:01 -04:00
Chuck Lever 59aa1f9a3c xprtrdma: Properly handle RDMA_ERROR replies
These are shorter than RPCRDMA_HDRLEN_MIN, and they need to
complete the waiting RPC.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-03-14 14:55:59 -04:00
Chuck Lever b892a699ce xprtrdma: Do not wait if ib_post_send() fails
If ib_post_send() in ro_unmap_sync() fails, the WRs have not been
posted, no completions will fire, and wait_for_completion() will
wait forever. Skip the wait in that case.

To ensure the MRs are invalid, disconnect.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-03-14 14:55:55 -04:00
Chuck Lever 821c791a0b xprtrdma: Segment head and tail XDR buffers on page boundaries
A single memory allocation is used for the pair of buffers wherein
the RPC client builds an RPC call message and decodes its matching
reply. These buffers are sized based on the maximum possible size
of the RPC call and reply messages for the operation in progress.

This means that as the call buffer increases in size, the start of
the reply buffer is pushed farther into the memory allocation.

RPC requests are growing in size. It used to be that both the call
and reply buffers fit inside a single page.

But these days, thanks to NFSv4 (and especially security labels in
NFSv4.2) the maximum call and reply sizes are large. NFSv4.0 OPEN,
for example, now requires a 6KB allocation for a pair of call and
reply buffers, and NFSv4 LOOKUP is not far behind.

As the maximum size of a call increases, the reply buffer is pushed
far enough into the buffer's memory allocation that a page boundary
can appear in the middle of it.

When the maximum possible reply size is larger than the client's
RDMA receive buffers (currently 1KB), the client has to register a
Reply chunk for the server to RDMA Write the reply into.

The logic in rpcrdma_convert_iovs() assumes that xdr_buf head and
tail buffers would always be contained on a single page. It supplies
just one segment for the head and one for the tail.

FMR, for example, registers up to a page boundary (only a portion of
the reply buffer in the OPEN case above). But without additional
segments, it doesn't register the rest of the buffer.

When the server tries to write the OPEN reply, the RDMA Write fails
with a remote access error since the client registered only part of
the Reply chunk.

rpcrdma_convert_iovs() must split the XDR buffer into multiple
segments, each of which are guaranteed not to contain a page
boundary. That way fmr_op_map is given the proper number of segments
to register the whole reply buffer.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-03-14 14:55:53 -04:00
Chuck Lever af0f16e825 xprtrdma: Clean up dprintk format string containing a newline
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-03-14 14:55:50 -04:00
Chuck Lever 4d97291b6f xprtrdma: Clean up physical_op_map()
physical_op_unmap{_sync} don't use mr_nsegs, so don't bother to set
it in physical_op_map.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-03-14 14:55:47 -04:00
Martin KaFai Lau a44d6eacda tcp: Add RFC4898 tcpEStatsPerfDataSegsOut/In
Per RFC4898, they count segments sent/received
containing a positive length data segment (that includes
retransmission segments carrying data).  Unlike
tcpi_segs_out/in, tcpi_data_segs_out/in excludes segments
carrying no data (e.g. pure ack).

The patch also updates the segs_in in tcp_fastopen_add_skb()
so that segs_in >= data_segs_in property is kept.

Together with retransmission data, tcpi_data_segs_out
gives a better signal on the rxmit rate.

v6: Rebase on the latest net-next

v5: Eric pointed out that checking skb->len is still needed in
tcp_fastopen_add_skb() because skb can carry a FIN without data.
Hence, instead of open coding segs_in and data_segs_in, tcp_segs_in()
helper is used.  Comment is added to the fastopen case to explain why
segs_in has to be reset and tcp_segs_in() has to be called before
__skb_pull().

v4: Add comment to the changes in tcp_fastopen_add_skb()
and also add remark on this case in the commit message.

v3: Add const modifier to the skb parameter in tcp_segs_in()

v2: Rework based on recent fix by Eric:
commit a9d99ce28e ("tcp: fix tcpi_segs_in after connection establishment")

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Chris Rapier <rapier@psc.edu>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Marcelo Ricardo Leitner <mleitner@redhat.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-14 14:55:26 -04:00
Arnd Bergmann 8e0cc8c326 net: caif: fix misleading indentation
gcc points out code that is not indented the way it is
interpreted:

net/caif/cfpkt_skbuff.c: In function 'cfpkt_setlen':
net/caif/cfpkt_skbuff.c:289:4: error: statement is indented as if it were guarded by... [-Werror=misleading-indentation]
    return cfpkt_getlen(pkt);
    ^~~~~~
net/caif/cfpkt_skbuff.c:286:3: note: ...this 'else' clause, but it is not
   else
   ^~~~

It is clear from the context that not returning here would be
a bug, as we'd end up passing a negative length into a function
that takes a u16 length, so it is not missing curly braces
here, and I'm assuming that the indentation is the only part
that's wrong about it.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-14 13:09:50 -04:00
Arnaldo Carvalho de Melo 34b88a68f2 net: Fix use after free in the recvmmsg exit path
The syzkaller fuzzer hit the following use-after-free:

  Call Trace:
   [<ffffffff8175ea0e>] __asan_report_load8_noabort+0x3e/0x40 mm/kasan/report.c:295
   [<ffffffff851cc31a>] __sys_recvmmsg+0x6fa/0x7f0 net/socket.c:2261
   [<     inline     >] SYSC_recvmmsg net/socket.c:2281
   [<ffffffff851cc57f>] SyS_recvmmsg+0x16f/0x180 net/socket.c:2270
   [<ffffffff86332bb6>] entry_SYSCALL_64_fastpath+0x16/0x7a
  arch/x86/entry/entry_64.S:185

And, as Dmitry rightly assessed, that is because we can drop the
reference and then touch it when the underlying recvmsg calls return
some packets and then hit an error, which will make recvmmsg to set
sock->sk->sk_err, oops, fix it.

Reported-and-Tested-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Fixes: a2e2725541 ("net: Introduce recvmmsg socket syscall")
http://lkml.kernel.org/r/20160122211644.GC2470@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-14 12:41:49 -04:00
Richard Alpe 9bd160bfa2 tipc: make sure IPv6 header fits in skb headroom
Expand headroom further in order to be able to fit the larger IPv6
header. Prior to this patch this caused a skb under panic for certain
tipc packets when using IPv6 UDP bearer(s).

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-14 12:23:12 -04:00
Gregory CLEMENT 8cb2d8bf57 net: add a hardware buffer management helper API
This basic implementation allows to share code between driver using
hardware buffer management. As the code is hardware agnostic, there is
few helpers, most of the optimization brought by the an HW BM has to be
done at driver level.

Tested-by: Sebastian Careba <nitroshift@yahoo.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-14 12:19:46 -04:00
Alexander Duyck 0833482495 GSO/UDP: Use skb->len instead of udph->len to determine length of original skb
It is possible for tunnels to end up generating IP or IPv6 datagrams that
are larger than 64K and expecting to be segmented.  As such we need to deal
with length values greater than 64K.  In order to accommodate this we need
to update the code to work with a 32b length value instead of a 16b one.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-13 23:55:14 -04:00
Alexander Duyck 1e94082963 ipv6: Pass proto to csum_ipv6_magic as __u8 instead of unsigned short
This patch updates csum_ipv6_magic so that it correctly recognizes that
protocol is a unsigned 8 bit value.

This will allow us to better understand what limitations may or may not be
present in how we handle the data.  For example there are a number of
places that call htonl on the protocol value.  This is likely not necessary
and can be replaced with a multiplication by ntohl(1) which will be
converted to a shift by the compiler.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-13 23:55:13 -04:00
David S. Miller fbd40ea018 ipv4: Don't do expensive useless work during inetdev destroy.
When an inetdev is destroyed, every address assigned to the interface
is removed.  And in this scenerio we do two pointless things which can
be very expensive if the number of assigned interfaces is large:

1) Address promotion.  We are deleting all addresses, so there is no
   point in doing this.

2) A full nf conntrack table purge for every address.  We only need to
   do this once, as is already caught by the existing
   masq_dev_notifier so masq_inet_event() can skip this.

Reported-by: Solar Designer <solar@openwall.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tested-by: Cyrill Gorcunov <gorcunov@openvz.org>
2016-03-13 23:28:35 -04:00
David S. Miller f4fa6e6d88 NFC 4.6 pull request
This is a very small one this time, with only 5 patches.
 There are a couple of big items that could not be merged/finished
 on time.
 
 We have:
 
 - 2 LLCP fixes for a race and a potential OOM.
 - 2 cleanups for the pn544 and microread drivers.
 - 1 Maintainer addition for the s3fwrn5 driver.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJW4ygDAAoJEIqAPN1PVmxKQ5gQAIN5MaT6gcu2/p1kd7FzPKyC
 KRb2EU907uVQObNLyZgU6liLYQ2prAvK7Dr9SuL10sl+zJkdU2TUuPfK+G4I50qQ
 nY7dHN94S0d1F6vBZVKFslh9+mrJKZCqhyN7WrCdXT2i7W5bYO8lw419nADImAq9
 fF4k7Ww+t9ZHGlzYun9WmNl8lGYz27w/sjjqnEbD8G+SZlkrqRNwccILP7rmOso2
 orGvzxouGiWmp4XcXno2X3ydfFVTl5LdVeaX6oyKxEBpcikz0QFJUmTcqqy0GqGo
 NeAGUip36MaJETXVgqKqeZERrST2Eminps3nTEwftd/ukOE+Sa8tX3spe2seN7mm
 o9ZBk8a0VChlhfGmIqnYZfksRyacEmVaLFV2W5EQdumL+G+jelCmpPgQKo0ZxEEX
 WTYULG5JiSIu34RjtTAP9tOPQ9XTV9hcArRYR355aki5EXGh/ntY1eVtV2nCSHjA
 vNZ0hpJNwfcx/0UlVULEJqxwDYYtl/QzJFTQO4l0zFYgf5SK9DXbOGDcZfoKgFf8
 7jzLn/hDzHNjCPocHDkXOq50H/pPHwdV65U4hPMnemcGVuVrNzHr5CyKBDgj7nXN
 NYP3S6fphg8W/DIlQFaTRAuBCqanZE5DIqFUgptSSdAmBLKagZKkaypUjCscb7Ee
 GinQKHm7YgZWG+YeAs4O
 =W1gM
 -----END PGP SIGNATURE-----

Merge tag 'nfc-next-4.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next

Samuel Ortiz says:

====================
NFC 4.6 pull request

This is a very small one this time, with only 5 patches.
There are a couple of big items that could not be merged/finished
on time.

We have:

- 2 LLCP fixes for a race and a potential OOM.
- 2 cleanups for the pn544 and microread drivers.
- 1 Maintainer addition for the s3fwrn5 driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-13 22:43:01 -04:00
liping.zhang f3c986908c net: socket: use pr_info_once to tip the obsolete usage of PF_PACKET
There is no need to use the static variable here, pr_info_once is more
concise.

Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-13 22:37:50 -04:00
Jesper Dangaard Brouer 885eb0a516 net: adjust napi_consume_skb to handle non-NAPI callers
Some drivers reuse/share code paths that free SKBs between NAPI
and non-NAPI calls. Adjust napi_consume_skb to handle this
use-case.

Before, calls from netpoll (w/ IRQs disabled) was handled and
indicated with a budget zero indication.  Use the same zero
indication to handle calls not originating from NAPI/softirq.
Simply handled by using dev_consume_skb_any().

This adds an extra branch+call for the netpoll case (checking
in_irq() + irqs_disabled()), but that is okay as this is a slowpath.

Suggested-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-13 22:35:35 -04:00
Marcelo Ricardo Leitner cea8768f33 sctp: allow sctp_transmit_packet and others to use gfp
Currently sctp_sendmsg() triggers some calls that will allocate memory
with GFP_ATOMIC even when not necessary. In the case of
sctp_packet_transmit it will allocate a linear skb that will be used to
construct the packet and this may cause sends to fail due to ENOMEM more
often than anticipated specially with big MTUs.

This patch thus allows it to inherit gfp flags from upper calls so that
it can use GFP_KERNEL if it was triggered by a sctp_sendmsg call or
similar. All others, like retransmits or flushes started from BH, are
still allocated using GFP_ATOMIC.

In netperf tests this didn't result in any performance drawbacks when
memory is not too fragmented and made it trigger ENOMEM way less often.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-13 22:29:07 -04:00
Samuel Gauthier 6f15cdbf8a ovs: allow nl 'flow set' to use ufid without flow key
When we want to change a flow using netlink, we have to identify it to
be able to perform a lookup. Both the flow key and unique flow ID
(ufid) are valid identifiers, but we always have to specify the flow
key in the netlink message. When both attributes are there, the ufid
is used. The flow key is used to validate the actions provided by
the userland.

This commit allows to use the ufid without having to provide the flow
key, as it is already done in the netlink 'flow get' and 'flow del'
path. The flow key remains mandatory when an action is provided.

Signed-off-by: Samuel Gauthier <samuel.gauthier@6wind.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-13 22:18:26 -04:00
Zhang Shengju 136ba622de netconf: add macro to represent all attributes
This patch adds macro NETCONFA_ALL to represent all type of netconf
attributes for IPv4 and IPv6.

Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-13 21:54:44 -04:00
Xin Long 39d2adebf1 sctp: fix the transports round robin issue when init is retransmitted
prior to this patch, at the beginning if we have two paths in one assoc,
they may have the same params other than the last_time_heard, it will try
the paths like this:

1st cycle
  try trans1 fail.
  then trans2 is selected.(cause it's last_time_heard is after trans1).

2nd cycle:
  try  trans2 fail
  then trans2 is selected.(cause it's last_time_heard is after trans1).

3rd cycle:
  try  trans2 fail
  then trans2 is selected.(cause it's last_time_heard is after trans1).

....

trans1 will never have change to be selected, which is not what we expect.
we should keeping round robin all the paths if they are just added at the
beginning.

So at first every tranport's last_time_heard should be initialized 0, so
that we ensure they have the same value at the beginning, only by this,
all the transports could get equal chance to be selected.

Then for sctp_trans_elect_best, it should return the trans_next one when
*trans == *trans_next, so that we can try next if it fails,  but now it
always return trans. so we can fix it by exchanging these two params when
we calls sctp_trans_elect_tie().

Fixes: 4c47af4d5e ('net: sctp: rework multihoming retransmission path selection to rfc4960')
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-13 21:52:50 -04:00
David Howells dad8aff754 rxrpc: Replace all unsigned with unsigned int
Replace all "unsigned" types with "unsigned int" types.

Reported-by: David Miller <davem@davemloft.net>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-13 15:14:57 -04:00
Alexander Duyck c194cf93c1 gro: Defer clearing of flush bit in tunnel paths
This patch updates the GRO handlers for GRE, VXLAN, GENEVE, and FOU so that
we do not clear the flush bit until after we have called the next level GRO
handler.  Previously this was being cleared before parsing through the list
of frames, however this resulted in several paths where either the bit
needed to be reset but wasn't as in the case of FOU, or cases where it was
being set as in GENEVE.  By just deferring the clearing of the bit until
after the next level protocol has been parsed we can avoid any unnecessary
bit twiddling and avoid bugs.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-13 15:01:00 -04:00
Florian Westphal d157bd7615 netfilter: x_tables: check for size overflow
Ben Hawkes says:
 integer overflow in xt_alloc_table_info, which on 32-bit systems can
 lead to small structure allocation and a copy_from_user based heap
 corruption.

Reported-by: Ben Hawkes <hawkes@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-03-12 11:55:01 +01:00
Daniel Borkmann 4018ab1875 bpf: support flow label for bpf_skb_{set, get}_tunnel_key
This patch extends bpf_tunnel_key with a tunnel_label member, that maps
to ip_tunnel_key's label so underlying backends like vxlan and geneve
can propagate the label to udp_tunnel6_xmit_skb(), where it's being set
in the IPv6 header. It allows for having 20 more bits to encode/decode
flow related meta information programmatically. Tested with vxlan and
geneve.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-11 15:14:27 -05:00
Daniel Borkmann 134611446d ip_tunnel: add support for setting flow label via collect metadata
This patch extends udp_tunnel6_xmit_skb() to pass in the IPv6 flow label
from call sites. Currently, there's no such option and it's always set to
zero when writing ip6_flow_hdr(). Add a label member to ip_tunnel_key, so
that flow-based tunnels via collect metadata frontends can make use of it.
vxlan and geneve will be converted to add flow label support separately.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-11 15:14:26 -05:00
Stephen Hemminger 4c656c13b2 bridge: allow zero ageing time
This fixes a regression in the bridge ageing time caused by:
commit c62987bbd8 ("bridge: push bridge setting ageing_time down to switchdev")

There are users of Linux bridge which use the feature that if ageing time
is set to 0 it causes entries to never expire. See:
  https://www.linuxfoundation.org/collaborate/workgroups/networking/bridge

For a pure software bridge, it is unnecessary for the code to have
arbitrary restrictions on what values are allowable.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-11 14:58:58 -05:00
Amir Vadai 8208d21bf3 net/flower: Fix pointer cast
Cast pointer to unsigned long instead of u64, to fix compilation warning
on 32 bit arch, spotted by 0day build.

Fixes: 5b33f48 ("net/flower: Introduce hardware offload support")
Signed-off-by: Amir Vadai <amir@vadai.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-11 12:04:37 -05:00
Johan Hedberg 6a0e78072c Bluetooth: Fix potential buffer overflow with Add Advertising
The Add Advertising command handler does the appropriate checks for
the AD and Scan Response data, however fails to take into account the
general length of the mgmt command itself, which could lead to
potential buffer overflows. This patch adds the necessary check that
the mgmt command length is consistent with the given ad and scan_rsp
lengths.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: stable@vger.kernel.org
2016-03-11 16:28:43 +01:00
Johan Hedberg 6a19cc8c89 Bluetooth: Fix setting correct flags in AD
A recent change added MGMT_ADV_FLAG_DISCOV to the flags returned by
get_adv_instance_flags(), however failed to take into account limited
discoverable mode. This patch fixes the issue by setting the correct
discoverability flag in the AD data.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2016-03-11 16:28:24 +01:00
Florian Westphal f0716cd6eb netfilter: nft_compat: check match/targetinfo attr size
We copy according to ->target|matchsize, so check that the netlink attribute
(which can include padding and might be larger) contains enough data.

Reported-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-03-11 11:37:56 +01:00
Pablo Neira Ayuso d387eaf51f Merge tag 'ipvs-fixes-for-v4.5' of https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs
Simon Horman says:

====================
please consider these IPVS fixes for v4.5 or
if it is too late please consider them for v4.6.

* Arnd Bergman has corrected an error whereby the SIP persistence engine
  may incorrectly access protocol fields
* Julian Anastasov has corrected a problem reported by Jiri Bohac with the
  connection rescheduling mechanism added in 3.10 when new SYNs in
  connection to dead real server can be redirected to another real server.
* Marco Angaroni resolved a problem in the SIP persistence engine
  whereby the Call-ID could not be found if it was at the beginning of a
  SIP message.
====================

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-03-11 11:37:35 +01:00
Christoph Hellwig 7cf20fc624 net/9p: convert to new CQ API
Trivial conversion to the new RDMA CQ API.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Dominique Martinet <dominique.martinet@cea.fr>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-10 20:54:09 -05:00
Amir Vadai 8de2d793da net/flow_dissector: Make dissector_uses_key() and skb_flow_dissector_target() public
Will be used in a following patch to query if a key is being used, and
what it's value in the target object.

Acked-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Amir Vadai <amir@vadai.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-10 16:24:02 -05:00
Amir Vadai 5b33f48842 net/flower: Introduce hardware offload support
This patch is based on a patch made by John Fastabend.
It adds support for offloading cls_flower.
when NETIF_F_HW_TC is on:
  flags = 0       => Rule will be processed twice - by hardware, and if
                     still relevant, by software.
  flags = SKIP_HW => Rull will be processed by software only

If hardware fail/not capabale to apply the rule, operation will NOT
fail. Filter will be processed by SW only.

Acked-by: Jiri Pirko <jiri@mellanox.com>
Suggested-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Amir Vadai <amir@vadai.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-10 16:24:02 -05:00
Neil Armstrong 04761890a7 net: dsa: Fix cleanup resources upon module removal
The initial commit badly merged into the dsa_resume method instead
of the dsa_remove_dst method.
As consequence, the dst->master_netdev->dsa_ptr is not set to NULL on
removal and re-bind of the dsa device fails with error -17.

Fixes: b0dc635d92 ("net: dsa: cleanup resources upon module removal ")
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Acked-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-10 16:20:17 -05:00
Johan Hedberg 9c176cec42 Bluetooth: Increment management interface revision
Increment the mgmt revision due to the recently added limited
privacy mode.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2016-03-10 19:51:30 +01:00
Johan Hedberg 82a37adeed Bluetooth: Add support for limited privacy mode
Introduce a limited privacy mode indicated by value 0x02 to the mgmt
Set Privacy command.

With value 0x02 the kernel will use privacy mode with a resolvable
private address. In case the controller is bondable and discoverable
the identity address will be used.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2016-03-10 19:51:30 +01:00
Johan Hedberg d43efbd0d5 Bluetooth: Fix adding discoverable to adv instance flags
When lookup up the advertising instance flags for the default
advertising instance (0) the discoverable flag should be filled in
based on the HCI_DISCOVERABLE flag.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2016-03-10 19:51:29 +01:00
Johan Hedberg eec7a01dc8 Bluetooth: Move memset closer to where it's needed
Minor fix to not do the memset until the variable it clears is
actually used.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2016-03-10 19:51:29 +01:00
Alexander Aring 24c4a81d76 6lowpan: iphc: fix SAM/DAM bit comment
This patch fixes the comments for SAM/DAM value.

Signed-off-by: Alexander Aring <aar@pengutronix.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2016-03-10 19:51:29 +01:00
Alexander Aring 6aaf37b41a 6lowpan: debugfs: add missing static
This patch solves the sparse warning:

net/6lowpan/debugfs.c:164:30: warning: symbol 'lowpan_ctx_pfx_fops' was
not declared. Should it be static?
net/6lowpan/debugfs.c:241:30: warning: symbol 'lowpan_context_fops' was
not declared. Should it be static?

Signed-off-by: Alexander Aring <aar@pengutronix.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2016-03-10 19:51:29 +01:00
Pablo Neira Ayuso 793882bfc3 Merge branch 'master' of git://blackhole.kfki.hu/nf
Jozsef Kadlecsik says:

====================
Please apply the next patch against the nf tree:

- Deniz Eren reported that parallel flush/dump of list:set type of sets
  can lead to kernel crash. The bug was due to non-RCU compatible
  flushing, listing in the set type, fixed by me.

- Julia Lawall pointed out that IPSET_ATTR_ETHER netlink attribute
  length was not checked explicitly. The patch adds the missing
  checkings.
====================

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-03-10 17:34:00 +01:00
Willem de Bruijn 9ed988cd59 packet: validate variable length ll headers
Replace link layer header validation check ll_header_truncate with
more generic dev_validate_header.

Validation based on hard_header_len incorrectly drops valid packets
in variable length protocols, such as AX25. dev_validate_header
calls header_ops.validate for such protocols to ensure correctness
below hard_header_len.

See also http://comments.gmane.org/gmane.linux.network/401064

Fixes 9c7077622d ("packet: make packet_snd fail on len smaller than l2 header")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-09 22:13:01 -05:00
Willem de Bruijn ea47781c26 ax25: add link layer header validation function
As variable length protocol, AX25 fails link layer header validation
tests based on a minimum length. header_ops.validate allows protocols
to validate headers that are shorter than hard_header_len. Implement
this callback for AX25.

See also http://comments.gmane.org/gmane.linux.network/401064

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-09 22:13:01 -05:00
Tom Herbert 29152a34f7 kcm: Add receive message timeout
This patch adds receive timeout for message assembly on the attached TCP
sockets. The timeout is set when a new messages is started and the whole
message has not been received by TCP (not in the receive queue). If the
completely message is subsequently received the timer is cancelled, if the
timer expires the RX side is aborted.

The timeout value is taken from the socket timeout (SO_RCVTIMEO) that is
set on a TCP socket (i.e. set by get sockopt before attaching a TCP socket
to KCM.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-09 16:36:15 -05:00
Tom Herbert 7ced95ef52 kcm: Add memory limit for receive message construction
Message assembly is performed on the TCP socket. This is logically
equivalent of an application that performs a peek on the socket to find
out how much memory is needed for a receive buffer. The receive socket
buffer also provides the maximum message size which is checked.

The receive algorithm is something like:

   1) Receive the first skbuf for a message (or skbufs if multiple are
      needed to determine message length).
   2) Check the message length against the number of bytes in the TCP
      receive queue (tcp_inq()).
	- If all the bytes of the message are in the queue (incluing the
	  skbuf received), then proceed with message assembly (it should
	  complete with the tcp_read_sock)
        - Else, mark the psock with the number of bytes needed to
	  complete the message.
   3) In TCP data ready function, if the psock indicates that we are
      waiting for the rest of the bytes of a messages, check the number
      of queued bytes against that.
        - If there are still not enough bytes for the message, just
	  return
        - Else, clear the waiting bytes and proceed to receive the
	  skbufs.  The message should now be received in one
	  tcp_read_sock

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-09 16:36:15 -05:00
Tom Herbert f29698fc6b kcm: Sendpage support
Implement kcm_sendpage. Set in sendpage to kcm_sendpage in both
dgram and seqpacket ops.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-09 16:36:15 -05:00
Tom Herbert 91687355b9 kcm: Splice support
Implement kcm_splice_read. This is supported only for seqpacket.
Add kcm_seqpacket_ops and set splice read to kcm_splice_read.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-09 16:36:15 -05:00
Tom Herbert cd6e111bf5 kcm: Add statistics and proc interfaces
This patch adds various counters for KCM. These include counters for
messages and bytes received or sent, as well as counters for number of
attached/unattached TCP sockets and other error or edge events.

The statistics are exposed via a proc interface. /proc/net/kcm provides
statistics per KCM socket and per psock (attached TCP sockets).
/proc/net/kcm_stats provides aggregate statistics.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-09 16:36:14 -05:00
Tom Herbert ab7ac4eb98 kcm: Kernel Connection Multiplexor module
This module implements the Kernel Connection Multiplexor.

Kernel Connection Multiplexor (KCM) is a facility that provides a
message based interface over TCP for generic application protocols.
With KCM an application can efficiently send and receive application
protocol messages over TCP using datagram sockets.

For more information see the included Documentation/networking/kcm.txt

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-09 16:36:14 -05:00
Tom Herbert 473bd239b8 tcp: Add tcp_inq to get available receive bytes on socket
Create a common kernel function to get the number of bytes available
on a TCP socket. This is based on code in INQ getsockopt and we now call
the function for that getsockopt.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-09 16:36:14 -05:00
Tom Herbert fa9835e52e net: Walk fragments in __skb_splice_bits
Add walking of fragments in __skb_splice_bits.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-09 16:36:14 -05:00
Tom Herbert f092276d85 net: Add MSG_BATCH flag
Add a new msg flag called MSG_BATCH. This flag is used in sendmsg to
indicate that more messages will follow (i.e. a batch of messages is
being sent). This is similar to MSG_MORE except that the following
messages are not merged into one packet, they are sent individually.
sendmmsg is updated so that each contained message except for the
last one is marked as MSG_BATCH.

MSG_BATCH is a performance optimization in cases where a socket
implementation can benefit by transmitting packets in a batch.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-09 16:36:13 -05:00
Tom Herbert 28a94d8fb3 net: Allow MSG_EOR in each msghdr of sendmmsg
This patch allows setting MSG_EOR in each individual msghdr passed
in sendmmsg. This allows a sendmmsg to send multiple messages when
using SOCK_SEQPACKET.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-09 16:36:13 -05:00