Commit Graph

38216 Commits

Author SHA1 Message Date
Linus Torvalds 9090fdb9c9 Changes for 4.2-rc
- Mainly fix-ups for the various 4.2 items
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVpWPTAAoJELgmozMOVy/dRjkQALGpY+m0Q+dfS4geU+JvvH0/
 tTNJevWA1lu5xIgzk8aW+RgNOt+IQV2K6XnKOdKWcdE2wg9Yg9/8Y7atxeNSLoB6
 mdUfstmTHoDFRXea1wog45QVVv9ABqSXIOmFd/PQSf2pEHvwgmVaxO2KW5n0n71H
 0G9nw28jx+Igd1IjX3UezbB4Rm5jeG6exQEM5KKQ/l/cD2Pt/DIr9OmTmkMhj4Ft
 4ppC+JsDHZMT2mQljEB1UL6Va1BGgULn4JwZviXRGgTEFiS+rtUkWXy9CVeNIV+A
 bo6aHEBlO8zbj9xg0POkHGBP2XCkDpsc5EP6/zon30MLpOTpnnTp2zq/bhGpyr5w
 ytH1x6V8Od1Or9BrHygy1yLGaqaVthIuZMReLq0Bu26px8mTECdu8bUjwxEvbvep
 u51vQ75h99td5XtXBHULcd/I8mUb3JqbEHqig0ChLOcnOMh72KLql8qfhLvzrYSv
 gFMvnqUZWq4WYET8cKcWypj5+cIME4objOim6T/TG+zTCmgbPnhwBsgvujLhfhDy
 JhUjUfS+31S99I4smL7ceKQpfA/7awLuZf/C20zdLZO2wXnx7VmbNlhQA3mvssyt
 ZOXeLkLGcXDIT3egNj5VWQR/KlVtvUS9FNVslxqo4ItyZZwJtbHTwRAgATp3XRwD
 Po7E7bWoUQqpanhgiX4Q
 =N/5G
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull rdma fixes from Doug Ledford:
 "Mainly fix-ups for the various 4.2 items"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (24 commits)
  IB/core: Destroy ocrdma_dev_id IDR on module exit
  IB/core: Destroy multcast_idr on module exit
  IB/mlx4: Optimize do_slave_init
  IB/mlx4: Fix memory leak in do_slave_init
  IB/mlx4: Optimize freeing of items on error unwind
  IB/mlx4: Fix use of flow-counters for process_mad
  IB/ipath: Convert use of __constant_<foo> to <foo>
  IB/ipoib: Set MTU to max allowed by mode when mode changes
  IB/ipoib: Scatter-Gather support in connected mode
  IB/ucm: Fix bitmap wrap when devnum > IB_UCM_MAX_DEVICES
  IB/ipoib: Prevent lockdep warning in __ipoib_ib_dev_flush
  IB/ucma: Fix lockdep warning in ucma_lock_files
  rds: rds_ib_device.refcount overflow
  RDMA/nes: Fix for incorrect recording of the MAC address
  RDMA/nes: Fix for resolving the neigh
  RDMA/core: Fixes for port mapper client registration
  IB/IPoIB: Fix bad error flow in ipoib_add_port()
  IB/mlx4: Do not attemp to report HCA clock offset on VFs
  IB/cm: Do not queue work to a device that's going away
  IB/srp: Avoid using uninitialized variable
  ...
2015-07-15 17:03:03 -07:00
Wengang Wang 4fabb59449 rds: rds_ib_device.refcount overflow
Fixes: 3e0249f9c0 ("RDS/IB: add refcount tracking to struct rds_ib_device")

There lacks a dropping on rds_ib_device.refcount in case rds_ib_alloc_fmr
failed(mr pool running out). this lead to the refcount overflow.

A complain in line 117(see following) is seen. From vmcore:
s_ib_rdma_mr_pool_depleted is 2147485544 and rds_ibdev->refcount is -2147475448.
That is the evidence the mr pool is used up. so rds_ib_alloc_fmr is very likely
to return ERR_PTR(-EAGAIN).

115 void rds_ib_dev_put(struct rds_ib_device *rds_ibdev)
116 {
117         BUG_ON(atomic_read(&rds_ibdev->refcount) <= 0);
118         if (atomic_dec_and_test(&rds_ibdev->refcount))
119                 queue_work(rds_wq, &rds_ibdev->free_work);
120 }

fix is to drop refcount when rds_ib_alloc_fmr failed.

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-07-14 13:20:11 -04:00
Linus Torvalds f760b87f8f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Missing list head init in bluetooth hidp session creation, from Tedd
    Ho-Jeong An.

 2) Don't leak SKB in bridge netfilter error paths, from Florian
    Westphal.

 3) ipv6 netdevice private leak in netfilter bridging, fixed by Julien
    Grall.

 4) Fix regression in IP over hamradio bpq encapsulation, from Ralf
    Baechle.

 5) Fix race between rhashtable resize events and table walks, from Phil
    Sutter.

 6) Missing validation of IFLA_VF_INFO netlink attributes, fix from
    Daniel Borkmann.

 7) Missing security layer socket state initialization in tipc code,
    from Stephen Smalley.

 8) Fix shared IRQ handling in boomerang 3c59x interrupt handler, from
    Denys Vlasenko.

 9) Missing minor_idr destroy on module unload on macvtap driver, from
    Johannes Thumshirn.

10) Various pktgen kernel thread races, from Oleg Nesterov.

11) Fix races that can cause packets to be processed in the backlog even
    after a device attached to that SKB has been fully unregistered.
    From Julian Anastasov.

12) bcmgenet driver doesn't account packet drops vs.  errors properly,
    fix from Petri Gynther.

13) Array index validation and off by one fix in DSA layer from Florian
    Fainelli

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (66 commits)
  can: replace timestamp as unique skb attribute
  ARM: dts: dra7x-evm: Prevent glitch on DCAN1 pinmux
  can: c_can: Fix default pinmux glitch at init
  can: rcar_can: unify error messages
  can: rcar_can: print request_irq() error code
  can: rcar_can: fix typo in error message
  can: rcar_can: print signed IRQ #
  can: rcar_can: fix IRQ check
  net: dsa: Fix off-by-one in switch address parsing
  net: dsa: Test array index before use
  net: switchdev: don't abort unsupported operations
  net: bcmgenet: fix accounting of packet drops vs errors
  cdc_ncm: update specs URL
  Doc: z8530book: Fix typo in API-z8530-sync-txdma-open.html
  net: inet_diag: always export IPV6_V6ONLY sockopt for listening sockets
  bridge: mdb: allow the user to delete mdb entry if there's a querier
  net: call rcu_read_lock early in process_backlog
  net: do not process device backlog during unregistration
  bridge: fix potential crash in __netdev_pick_tx()
  net: axienet: Fix devm_ioremap_resource return value check
  ...
2015-07-13 11:18:25 -07:00
David S. Miller cee9f6d018 linux-can-fixes-for-4.2-20150712
-----BEGIN PGP SIGNATURE-----
 
 iQEcBAABCgAGBQJVorxbAAoJEP5prqPJtc/H5zgH/2nvkmT3aQ1gBKdU8L+Ten5D
 LIKYyDJ67agNoMfEC/5xWOm7P3X8Qi8cGc326/9AZE22QLE5x6fMCq/SmEC4MA25
 GKu2ocwosdPVhIDQOY33IYZHxxtV8UZjt5KdNbQnNO6+iqE6tX5CbgueMlcWusry
 WmPCOlezTqHydXo0TYYLPjmqJ66RlrQMWYKvcB0SaxLYQfkBGbkW+fei3wdE4kW1
 t7cocRj6Ievv59wCeF+G+fpviTVQVR5bknGA0J3lku9onATOYfdArEAD9FRajygG
 KhA/eNr0YEA40VO/baUInuEAJ/YvDqdk9LoyJ1DOXgUv1ysxYSxu/44G/IcZsOU=
 =+zVD
 -----END PGP SIGNATURE-----

Merge tag 'linux-can-fixes-for-4.2-20150712' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2015-07-12

this is a pull request of 8 patchs for net/master.

Sergei Shtylyov contributes 5 patches for the rcar_can driver, fixing the IRQ
check and several info and error messages. There are two patches by J.D.
Schroeder and Roger Quadros for the c_can driver and dra7x-evm device tree,
which precent a glitch in the DCAN1 pinmux. Oliver Hartkopp provides a better
approach to make the CAN skbs unique, the timestamp is replaced by a counter.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-12 22:24:01 -07:00
Oliver Hartkopp d3b58c47d3 can: replace timestamp as unique skb attribute
Commit 514ac99c64 "can: fix multiple delivery of a single CAN frame for
overlapping CAN filters" requires the skb->tstamp to be set to check for
identical CAN skbs.

Without timestamping to be required by user space applications this timestamp
was not generated which lead to commit 36c01245eb "can: fix loss of CAN frames
in raw_rcv" - which forces the timestamp to be set in all CAN related skbuffs
by introducing several __net_timestamp() calls.

This forces e.g. out of tree drivers which are not using alloc_can{,fd}_skb()
to add __net_timestamp() after skbuff creation to prevent the frame loss fixed
in mainline Linux.

This patch removes the timestamp dependency and uses an atomic counter to
create an unique identifier together with the skbuff pointer.

Btw: the new skbcnt element introduced in struct can_skb_priv has to be
initialized with zero in out-of-tree drivers which are not using
alloc_can{,fd}_skb() too.

Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2015-07-12 21:13:22 +02:00
Florian Fainelli c8cf89f73f net: dsa: Fix off-by-one in switch address parsing
cd->sw_addr is used as a MDIO bus address, which cannot exceed
PHY_MAX_ADDR (32), our check was off-by-one.

Fixes: 5e95329b70 ("dsa: add device tree bindings to register DSA switches")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-11 23:25:16 -07:00
Florian Fainelli 8f5063e97f net: dsa: Test array index before use
port_index is used an index into an array, and this information comes
from Device Tree, make sure that port_index is not equal to the array
size before using it. Move the check against port_index earlier in the
loop.

Fixes: 5e95329b701c: ("dsa: add device tree bindings to register DSA switches")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-11 23:25:16 -07:00
Vivien Didelot 2ee94014d9 net: switchdev: don't abort unsupported operations
There is no need to abort attribute setting or object addition, if the
prepare phase returned operation not supported.

Thus, abort these two transactions only if the error is not -EOPNOTSUPP.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-11 21:29:55 -07:00
Phil Sutter 8220ea2324 net: inet_diag: always export IPV6_V6ONLY sockopt for listening sockets
Reconsidering my commit 20462155 "net: inet_diag: export IPV6_V6ONLY
sockopt", I am not happy with the limitations it causes for socket
analysing code in userspace. Exporting the value only if it is set makes
it hard for userspace to decide whether the option is not set or the
kernel does not support exporting the option at all.

>From an auditor's perspective, the interesting question for listening
AF_INET6 sockets is: "Does it NOT have IPV6_V6ONLY set?" Because it is
the unexpected case. This patch allows to answer this question reliably.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-10 23:25:24 -07:00
Satish Ashok 51ed7f3e7d bridge: mdb: allow the user to delete mdb entry if there's a querier
Until now when a querier was present static entries couldn't be deleted.
Fix this and allow the user to manipulate the mdb with or without a
querier.

Signed-off-by: Satish Ashok <sashok@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-10 18:18:00 -07:00
Julian Anastasov 2c17d27c36 net: call rcu_read_lock early in process_backlog
Incoming packet should be either in backlog queue or
in RCU read-side section. Otherwise, the final sequence of
flush_backlog() and synchronize_net() may miss packets
that can run without device reference:

CPU 1                  CPU 2
                       skb->dev: no reference
                       process_backlog:__skb_dequeue
                       process_backlog:local_irq_enable

on_each_cpu for
flush_backlog =>       IPI(hardirq): flush_backlog
                       - packet not found in backlog

                       CPU delayed ...
synchronize_net
- no ongoing RCU
read-side sections

netdev_run_todo,
rcu_barrier: no
ongoing callbacks
                       __netif_receive_skb_core:rcu_read_lock
                       - too late
free dev
                       process packet for freed dev

Fixes: 6e583ce524 ("net: eliminate refcounting in backlog queue")
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-10 18:16:36 -07:00
Julian Anastasov e9e4dd3267 net: do not process device backlog during unregistration
commit 381c759d99 ("ipv4: Avoid crashing in ip_error")
fixes a problem where processed packet comes from device
with destroyed inetdev (dev->ip_ptr). This is not expected
because inetdev_destroy is called in NETDEV_UNREGISTER
phase and packets should not be processed after
dev_close_many() and synchronize_net(). Above fix is still
required because inetdev_destroy can be called for other
reasons. But it shows the real problem: backlog can keep
packets for long time and they do not hold reference to
device. Such packets are then delivered to upper levels
at the same time when device is unregistered.
Calling flush_backlog after NETDEV_UNREGISTER_FINAL still
accounts all packets from backlog but before that some packets
continue to be delivered to upper levels long after the
synchronize_net call which is supposed to wait the last
ones. Also, as Eric pointed out, processed packets, mostly
from other devices, can continue to add new packets to backlog.

Fix the problem by moving flush_backlog early, after the
device driver is stopped and before the synchronize_net() call.
Then use netif_running check to make sure we do not add more
packets to backlog. We have to do it in enqueue_to_backlog
context when the local IRQ is disabled. As result, after the
flush_backlog and synchronize_net sequence all packets
should be accounted.

Thanks to Eric W. Biederman for the test script and his
valuable feedback!

Reported-by: Vittorio Gambaletta <linuxbugs@vittgam.net>
Fixes: 6e583ce524 ("net: eliminate refcounting in backlog queue")
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-10 18:16:36 -07:00
Eric Dumazet a7d35f9d73 bridge: fix potential crash in __netdev_pick_tx()
Commit c29390c6df ("xps: must clear sender_cpu before forwarding")
fixed an issue in normal forward path, caused by sender_cpu & napi_id
skb fields being an union.

Bridge is another point where skb can be forwarded, so we need
the same cure.

Bug triggers if packet was received on a NIC using skb_mark_napi_id()

Fixes: 2bd82484bb ("xps: fix xps for stacked devices")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Bob Liu <bob.liu@oracle.com>
Tested-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-09 22:48:42 -07:00
Oleg Nesterov 1fbe4b46ca net: pktgen: kill the "Wait for kthread_stop" code in pktgen_thread_worker()
pktgen_thread_worker() doesn't need to wait for kthread_stop(), it
can simply exit. Just pktgen_create_thread() and pg_net_exit() should
do get_task_struct()/put_task_struct(). kthread_stop(dead_thread) is
fine.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-09 15:05:32 -07:00
Oleg Nesterov fecdf8be2d net: pktgen: fix race between pktgen_thread_worker() and kthread_stop()
pktgen_thread_worker() is obviously racy, kthread_stop() can come
between the kthread_should_stop() check and set_current_state().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Jan Stancek <jstancek@redhat.com>
Reported-by: Marcelo Leitner <mleitner@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-09 15:05:32 -07:00
Ilya Dryomov c44bd69c0c libceph: treat sockaddr_storage with uninitialized family as blank
addr_is_blank() should return true if family is neither AF_INET nor
AF_INET6.  This is what its counterpart entity_addr_t::is_blank_ip() is
doing and it is the right thing to do: in process_banner() we check if
our address is blank and if it is "learn" it from our peer.  As it is,
we never learn our address and always send out a blank one.  This goes
way back to ceph.git commit dd732cbfc1c9 ("use sockaddr_storage; and
some ipv6 support groundwork") from 2009.

While at at, do not open-code ipv6_addr_any() and use INADDR_ANY
constant instead of 0.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2015-07-09 20:30:34 +03:00
Ilya Dryomov 757856d2b9 libceph: enable ceph in a non-default network namespace
Grab a reference on a network namespace of the 'rbd map' (in case of
rbd) or 'mount' (in case of ceph) process and use that to open sockets
instead of always using init_net and bailing if network namespace is
anything but init_net.  Be careful to not share struct ceph_client
instances between different namespaces and don't add any code in the
!CONFIG_NET_NS case.

This is based on a patch from Hong Zhiguo <zhiguohong@tencent.com>.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2015-07-09 20:30:34 +03:00
David S. Miller ace15bbb39 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for your net tree. This batch
mostly comes with patches to address fallout from the previous merge window
cycle, they are:

1) Use entry->state.hook_list from nf_queue() instead of the global nf_hooks
   which is not valid when used from NFPROTO_NETDEV, this should cause no
   problems though since we have no userspace queueing for that family, but
   let's fix this now for the sake of correctness. Patch from Eric W. Biederman.

2) Fix compilation breakage in bridge netfilter if CONFIG_NF_DEFRAG_IPV4 is not
   set, from Bernhard Thaler.

3) Use percpu jumpstack in arptables too, now that there's a single copy of the
   rule blob we can't store the return address there anymore. Patch from
   Florian Westphal.

4) Fix a skb leak in the xmit path of bridge netfilter, problem there since
   2.6.37 although it should be not possible to hit invalid traffic there, also
   from Florian.

5) Eric Leblond reports that when loading a large ruleset with many missing
   modules after a fresh boot, nf_tables can take long time commit it. Fix this
   by processing the full batch until the end, even on missing modules, then
   abort only once and restart processing.

6) Add bridge netfilter files to the MAINTAINER files.

7) Fix a net_device refcount leak in the new IPV6 bridge netfilter code, from
   Julien Grall.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-09 00:03:10 -07:00
Andy Gospodarek 974d7af5fc ipv4: add support for linkdown sysctl to netconf
This kernel patch exports the value of the new
ignore_routes_with_linkdown via netconf.

v2: changes to notify userspace via netlink when sysctl values change
and proposed for 'net' since this could be considered a bugfix

Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Suggested-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-08 23:34:53 -07:00
Nikolay Aleksandrov f1158b74e5 bridge: mdb: zero out the local br_ip variable before use
Since commit b0e9a30dd6 ("bridge: Add vlan id to multicast groups")
there's a check in br_ip_equal() for a matching vlan id, but the mdb
functions were not modified to use (or at least zero it) so when an
entry was added it would have a garbage vlan id (from the local br_ip
variable in __br_mdb_add/del) and this would prevent it from being
matched and also deleted. So zero out the whole local ip var to protect
ourselves from future changes and also to fix the current bug, since
there's no vlan id support in the mdb uapi - use always vlan id 0.
Example before patch:
root@debian:~# bridge mdb add dev br0 port eth1 grp 239.0.0.1 permanent
root@debian:~# bridge mdb
dev br0 port eth1 grp 239.0.0.1 permanent
root@debian:~# bridge mdb del dev br0 port eth1 grp 239.0.0.1 permanent
RTNETLINK answers: Invalid argument

After patch:
root@debian:~# bridge mdb add dev br0 port eth1 grp 239.0.0.1 permanent
root@debian:~# bridge mdb
dev br0 port eth1 grp 239.0.0.1 permanent
root@debian:~# bridge mdb del dev br0 port eth1 grp 239.0.0.1 permanent
root@debian:~# bridge mdb

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Fixes: b0e9a30dd6 ("bridge: Add vlan id to multicast groups")
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-08 16:10:40 -07:00
Stephen Smalley fdd75ea8df net/tipc: initialize security state for new connection socket
Calling connect() with an AF_TIPC socket would trigger a series
of error messages from SELinux along the lines of:
SELinux: Invalid class 0
type=AVC msg=audit(1434126658.487:34500): avc:  denied  { <unprintable> }
  for pid=292 comm="kworker/u16:5" scontext=system_u:system_r:kernel_t:s0
  tcontext=system_u:object_r:unlabeled_t:s0 tclass=<unprintable>
  permissive=0

This was due to a failure to initialize the security state of the new
connection sock by the tipc code, leaving it with junk in the security
class field and an unlabeled secid.  Add a call to security_sk_clone()
to inherit the security state from the parent socket.

Reported-by: Tim Shearer <tim.shearer@overturenetworks.com>
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: Paul Moore <paul@paul-moore.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-08 16:08:23 -07:00
Timo Teräs fc24f2b209 ip_tunnel: fix ipv4 pmtu check to honor inner ip header df
Frag needed should be sent only if the inner header asked
to not fragment. Currently fragmentation is broken if the
tunnel has df set, but df was not asked in the original
packet. The tunnel's df needs to be still checked to update
internally the pmtu cache.

Commit 23a3647bc4 broke it, and this commit fixes
the ipv4 df check back to the way it was.

Fixes: 23a3647bc4 ("ip_tunnels: Use skb-len to PMTU check.")
Cc: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-08 16:03:09 -07:00
Daniel Borkmann 4f7d2cdfdd rtnetlink: verify IFLA_VF_INFO attributes before passing them to driver
Jason Gunthorpe reported that since commit c02db8c629 ("rtnetlink: make
SR-IOV VF interface symmetric"), we don't verify IFLA_VF_INFO attributes
anymore with respect to their policy, that is, ifla_vfinfo_policy[].

Before, they were part of ifla_policy[], but they have been nested since
placed under IFLA_VFINFO_LIST, that contains the attribute IFLA_VF_INFO,
which is another nested attribute for the actual VF attributes such as
IFLA_VF_MAC, IFLA_VF_VLAN, etc.

Despite the policy being split out from ifla_policy[] in this commit,
it's never applied anywhere. nla_for_each_nested() only does basic nla_ok()
testing for struct nlattr, but it doesn't know about the data context and
their requirements.

Fix, on top of Jason's initial work, does 1) parsing of the attributes
with the right policy, and 2) using the resulting parsed attribute table
from 1) instead of the nla_for_each_nested() loop (just like we used to
do when still part of ifla_policy[]).

Reference: http://thread.gmane.org/gmane.linux.network/368913
Fixes: c02db8c629 ("rtnetlink: make SR-IOV VF interface symmetric")
Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Cc: Greg Rose <gregory.v.rose@intel.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Rony Efraim <ronye@mellanox.com>
Cc: Vlad Zolotarov <vladz@cloudius-systems.com>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-08 16:01:52 -07:00
Nicolas Dichtel 95ec655bc4 Revert "dev: set iflink to 0 for virtual interfaces"
This reverts commit e1622baf54.

The side effect of this commit is to add a '@NONE' after each virtual
interface name with a 'ip link'. It may break existing scripts.

Reported-by: Olivier Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Tested-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-08 15:52:33 -07:00
Eric Dumazet d339727c2b net: graceful exit from netif_alloc_netdev_queues()
User space can crash kernel with

ip link add ifb10 numtxqueues 100000 type ifb

We must replace a BUG_ON() by proper test and return -EINVAL for
crazy values.

Fixes: 60877a32bc ("net: allow large number of tx queues")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-08 15:46:17 -07:00
Satish Ashok f7e2965db1 bridge: mdb: start delete timer for temp static entries
Start the delete timer when adding temp static entries so they can expire.

Signed-off-by: Satish Ashok <sashok@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Fixes: ccb1c31a7a ("bridge: add flags to distinguish permanent mdb entires")
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-08 14:52:58 -07:00
Eric Dumazet 32f675bbc9 net_sched: gen_estimator: extend pps limit
rate estimators are limited to 4 Mpps, which was fine years ago, but
too small with current hardware generation.

Lets use 2^5 scaling instead of 2^10 to get 128 Mpps new limit.

On 64bit arch, use an "unsigned long" for temp storage and remove limit.
(We do not expect 32bit arches to be able to reach this point)

Tested:

tc -s -d filter sh dev eth0 parent ffff:

filter protocol ip pref 1 u32
filter protocol ip pref 1 u32 fh 800: ht divisor 1
filter protocol ip pref 1 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:15
  match 07000000/ff000000 at 12
	action order 1: gact action drop
	 random type none pass val 0
	 index 1 ref 1 bind 1 installed 166 sec
 	Action statistics:
	Sent 39734251496 bytes 863788076 pkt (dropped 863788117, overlimits 0 requeues 0)
	rate 4067Mbit 11053596pps backlog 0b 0p requeues 0

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-08 13:59:20 -07:00
Julien Grall 86e8971800 netfilter: bridge: Use __in6_dev_get rather than in6_dev_get in br_validate_ipv6
The commit efb6de9b4b "netfilter: bridge:
forward IPv6 fragmented packets" introduced a new function
br_validate_ipv6 which take a reference on the inet6 device. Although,
the reference is not released at the end.

This will result to the impossibility to destroy any netdevice using
ipv6 and bridge.

It's possible to directly retrieve the inet6 device without taking a
reference as all netfilter hooks are protected by rcu_read_lock via
nf_hook_slow.

Spotted while trying to destroy a Xen guest on the upstream Linux:
"unregister_netdevice: waiting for vif1.0 to become free. Usage count = 1"

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Cc: Bernhard Thaler <bernhard.thaler@wvnet.at>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: fw@strlen.de
Cc: ian.campbell@citrix.com
Cc: wei.liu2@citrix.com
Cc: Bob Liu <bob.liu@oracle.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-07-08 11:02:16 +02:00
Linus Torvalds 1dc51b8288 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull more vfs updates from Al Viro:
 "Assorted VFS fixes and related cleanups (IMO the most interesting in
  that part are f_path-related things and Eric's descriptor-related
  stuff).  UFS regression fixes (it got broken last cycle).  9P fixes.
  fs-cache series, DAX patches, Jan's file_remove_suid() work"

[ I'd say this is much more than "fixes and related cleanups".  The
  file_table locking rule change by Eric Dumazet is a rather big and
  fundamental update even if the patch isn't huge.   - Linus ]

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (49 commits)
  9p: cope with bogus responses from server in p9_client_{read,write}
  p9_client_write(): avoid double p9_free_req()
  9p: forgetting to cancel request on interrupted zero-copy RPC
  dax: bdev_direct_access() may sleep
  block: Add support for DAX reads/writes to block devices
  dax: Use copy_from_iter_nocache
  dax: Add block size note to documentation
  fs/file.c: __fget() and dup2() atomicity rules
  fs/file.c: don't acquire files->file_lock in fd_install()
  fs:super:get_anon_bdev: fix race condition could cause dev exceed its upper limitation
  vfs: avoid creation of inode number 0 in get_next_ino
  namei: make set_root_rcu() return void
  make simple_positive() public
  ufs: use dir_pages instead of ufs_dir_pages()
  pagemap.h: move dir_pages() over there
  remove the pointless include of lglock.h
  fs: cleanup slight list_entry abuse
  xfs: Correctly lock inode when removing suid and file capabilities
  fs: Call security_ops->inode_killpriv on truncate
  fs: Provide function telling whether file_remove_privs() will do anything
  ...
2015-07-04 19:36:06 -07:00
Linus Torvalds 9b284cbdb5 bluetooth: fix list handling
Commit 835a6a2f86 ("Bluetooth: Stop sabotaging list poisoning")
thought that the code was sabotaging the list poisoning when NULL'ing
out the list pointers and removed it.

But what was going on was that the bluetooth code was using NULL
pointers for the list as a way to mark it empty, and that commit just
broke it (and replaced the test with NULL with a "list_empty()" test on
a uninitialized list instead, breaking things even further).

So fix it all up to use the regular and real list_empty() handling
(which does not use NULL, but a pointer to itself), also making sure to
initialize the list properly (the previous NULL case was initialized
implicitly by the session being allocated with kzalloc())

This is a combination of patches by Marcel Holtmann and Tedd Ho-Jeong
An.

[ I would normally expect to get this through the bt tree, but I'm going
  to release -rc1, so I'm just committing this directly   - Linus ]

Reported-and-tested-by: Jörg Otte <jrg.otte@gmail.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Original-by: Tedd Ho-Jeong An <tedd.an@intel.com>
Original-by: Marcel Holtmann <marcel@holtmann.org>:
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-07-04 19:11:33 -07:00
Al Viro 0f1db7dee2 9p: cope with bogus responses from server in p9_client_{read,write}
if server claims to have written/read more than we'd told it to,
warn and cap the claimed byte count to avoid advancing more than
we are ready to.
2015-07-04 16:17:39 -04:00
Al Viro 67e808fbb0 p9_client_write(): avoid double p9_free_req()
Braino in "9p: switch p9_client_write() to passing it struct iov_iter *";
if response is impossible to parse and we discard the request, get the
out of the loop right there.

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-04 16:11:05 -04:00
Al Viro a84b69cb6e 9p: forgetting to cancel request on interrupted zero-copy RPC
If we'd already sent a request and decide to abort it, we *must*
issue TFLUSH properly and not just blindly reuse the tag, or
we'll get seriously screwed when response eventually arrives
and we confuse it for response to later request that had reused
the same tag.

Cc: stable@vger.kernel.org # v3.2 and later
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-04 16:04:19 -04:00
Angga 4c938d22c8 ipv6: Make MLD packets to only be processed locally
Before commit daad151263 ("ipv6: Make ipv6_is_mld() inline and use it
from ip6_mc_input().") MLD packets were only processed locally. After the
change, a copy of MLD packet goes through ip6_mr_input, causing
MRT6MSG_NOCACHE message to be generated to user space.

Make MLD packet only processed locally.

Fixes: daad151263 ("ipv6: Make ipv6_is_mld() inline and use it from ip6_mc_input().")
Signed-off-by: Hermin Anggawijaya <hermin.anggawijaya@alliedtelesis.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-03 09:52:38 -07:00
Markus Elfring 92b80eb33c netlink: Delete an unnecessary check before the function call "module_put"
The module_put() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-03 09:27:43 -07:00
Markus Elfring f6e1c91666 net-RDS: Delete an unnecessary check before the function call "module_put"
The module_put() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-03 09:27:42 -07:00
Markus Elfring 87775312a8 net-ipv6: Delete an unnecessary check before the function call "free_percpu"
The free_percpu() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-03 09:27:42 -07:00
Nikolay Aleksandrov 462e1ead92 bridge: vlan: fix usage of vlan 0 and 4095 again
Vlan ids 0 and 4095 were disallowed by commit:
8adff41c3d ("bridge: Don't use VID 0 and 4095 in vlan filtering")
but then the check was removed when vlan ranges were introduced by:
bdced7ef78 ("bridge: support for multiple vlans and vlan ranges in setlink and dellink requests")
So reintroduce the vlan range check.
Before patch:
[root@testvm ~]# bridge vlan add vid 0 dev eth0 master
(succeeds)
After Patch:
[root@testvm ~]# bridge vlan add vid 0 dev eth0 master
RTNETLINK answers: Invalid argument

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Fixes: bdced7ef78 ("bridge: support for multiple vlans and vlan ranges in setlink and dellink requests")
Acked-by: Toshiaki Makita <toshiaki.makita1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-02 12:19:17 -07:00
David S. Miller c4555d16d9 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Johan Hedberg says:

====================
pull request: bluetooth 2015-07-02

A couple of regressions crept in because of a patch to use proper list
APIs rather than manually reading & writing the next/prev pointers
(commit 835a6a2f86). Turns out this was
masking a few bugs: a missing INIT_LIST_HEAD() call and incorrectly
using list_del() rather than list_del_init(). The two patches in this
set fix these, and it'd be nice they could still make it to 4.2-rc1 to
avoid new bug reports from users.

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-02 12:17:11 -07:00
Linus Torvalds 0c76c6ba24 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph updates from Sage Weil:
 "We have a pile of bug fixes from Ilya, including a few patches that
  sync up the CRUSH code with the latest from userspace.

  There is also a long series from Zheng that fixes various issues with
  snapshots, inline data, and directory fsync, some simplification and
  improvement in the cap release code, and a rework of the caching of
  directory contents.

  To top it off there are a few small fixes and cleanups from Benoit and
  Hong"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (40 commits)
  rbd: use GFP_NOIO in rbd_obj_request_create()
  crush: fix a bug in tree bucket decode
  libceph: Fix ceph_tcp_sendpage()'s more boolean usage
  libceph: Remove spurious kunmap() of the zero page
  rbd: queue_depth map option
  rbd: store rbd_options in rbd_device
  rbd: terminate rbd_opts_tokens with Opt_err
  ceph: fix ceph_writepages_start()
  rbd: bump queue_max_segments
  ceph: rework dcache readdir
  crush: sync up with userspace
  crush: fix crash from invalid 'take' argument
  ceph: switch some GFP_NOFS memory allocation to GFP_KERNEL
  ceph: pre-allocate data structure that tracks caps flushing
  ceph: re-send flushing caps (which are revoked) in reconnect stage
  ceph: send TID of the oldest pending caps flush to MDS
  ceph: track pending caps flushing globally
  ceph: track pending caps flushing accurately
  libceph: fix wrong name "Ceph filesystem for Linux"
  ceph: fix directory fsync
  ...
2015-07-02 11:35:00 -07:00
Linus Torvalds 8688d9540c NFS client updates for Linux 4.2
Highlights include:
 
 Stable patches:
 - Fix a crash in the NFSv4 file locking code.
 - Fix an fsync() regression, where we were failing to retry I/O in some
   circumstances.
 - Fix an infinite loop in NFSv4.0 OPEN stateid recovery
 - Fix a memory leak when an attempted pnfs fails.
 - Fix a memory leak in the backchannel code
 - Large hostnames were not supported correctly in NFSv4.1
 - Fix a pNFS/flexfiles bug that was impeding error reporting on I/O.
 - Fix a couple of credential issues in pNFS/flexfiles
 
 Bugfixes + cleanups:
 - Open flag sanity checks in the NFSv4 atomic open codepath
 - More NFSv4 delegation related bugfixes
 - Various NFSv4.1 backchannel bugfixes and cleanups
 - Fix the NFS swap socket code
 - Various cleanups of the NFSv4 SETCLIENTID and EXCHANGE_ID code
 - Fix a UDP transport deadlock issue
 
 Features:
 - More RDMA client transport improvements
 - NFSv4.2 LAYOUTSTATS functionality for pnfs flexfiles.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVlWQgAAoJEGcL54qWCgDyXtcP/2Y3HJ9xu5qU3Bo/jzCAw4E1
 jPPMSFAz4kqy/LGoslyc1cNDEiKGzJYWU8TtCGI3KAyNxb6n3pT1mEE1tvIsSdis
 D8bpV13M452PPpZYrBawIf4+OuohXmuYHpFiVNSpLbH3Uo7dthvFFnbqCGaGlnqY
 rXYZHAnx637OGBcJsT4AXCUz12ILvxMYRnqwW6Xn+j9JmwR1coQX3v8W8e7SMf6i
 J+zOny7Uetjrg1U9C9uQB6ZvIoxUMo9QOVmtGCwsBl8lM3fLmzaQfcUf9fm76pMT
 yTrKJs4jBLvVf00bRHFDv9EHWCy97oqCkeQEw1EY2lnxp/lmM5SiI4zQqjbf0QTW
 5VQScT1MK6xwHoUbuI/sYdXXR8KGDVT1xCFFHUNcg69CvgqdgWslPQY7xLJMvUJZ
 vBWfWDd8ppdCw2ZVX4ae/bnhfc+/mVh4wRPF7tgVAjT0pobBV9xMOeMkF4mo76Wa
 pvo/nTRMt68hpESVSvq9dYEMVhy5haqFhPrSbyAGOpT4SE2V3RCCZQfhu15TMKdW
 BdvItG+mdAVPbIHqhx7vRdAudcOEZKyxbFA+l3E5FyCAXLV7XS3M8CEl3P1w7gmm
 Ccr8DW9abKFJf1RAKdX3stexIoJLGTwciSMR5smsbup/xNcx/fRgx2f1w31JMPxb
 kG3Izfk25w9uGSsbR39D
 =AREr
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-4.2-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Highlights include:

  Stable patches:
   - Fix a crash in the NFSv4 file locking code.
   - Fix an fsync() regression, where we were failing to retry I/O in
     some circumstances.
   - Fix an infinite loop in NFSv4.0 OPEN stateid recovery
   - Fix a memory leak when an attempted pnfs fails.
   - Fix a memory leak in the backchannel code
   - Large hostnames were not supported correctly in NFSv4.1
   - Fix a pNFS/flexfiles bug that was impeding error reporting on I/O.
   - Fix a couple of credential issues in pNFS/flexfiles

  Bugfixes + cleanups:
   - Open flag sanity checks in the NFSv4 atomic open codepath
   - More NFSv4 delegation related bugfixes
   - Various NFSv4.1 backchannel bugfixes and cleanups
   - Fix the NFS swap socket code
   - Various cleanups of the NFSv4 SETCLIENTID and EXCHANGE_ID code
   - Fix a UDP transport deadlock issue

  Features:
   - More RDMA client transport improvements
   - NFSv4.2 LAYOUTSTATS functionality for pnfs flexfiles"

* tag 'nfs-for-4.2-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (87 commits)
  nfs: Remove invalid tk_pid from debug message
  nfs: Remove invalid NFS_ATTR_FATTR_V4_REFERRAL checking in nfs4_get_rootfh
  nfs: Drop bad comment in nfs41_walk_client_list()
  nfs: Remove unneeded micro checking of CONFIG_PROC_FS
  nfs: Don't setting FILE_CREATED flags always
  nfs: Use remove_proc_subtree() instead remove_proc_entry()
  nfs: Remove unused argument in nfs_server_set_fsinfo()
  nfs: Fix a memory leak when meeting an unsupported state protect
  nfs: take extra reference to fl->fl_file when running a LOCKU operation
  NFSv4: When returning a delegation, don't reclaim an incompatible open mode.
  NFSv4.2: LAYOUTSTATS is optional to implement
  NFSv4.2: Fix up a decoding error in layoutstats
  pNFS/flexfiles: Fix the reset of struct pgio_header when resending
  pNFS/flexfiles: Turn off layoutcommit for servers that don't need it
  pnfs/flexfiles: protect ktime manipulation with mirror lock
  nfs: provide pnfs_report_layoutstat when NFS42 is disabled
  nfs: verify open flags before allowing open
  nfs: always update creds in mirror, even when we have an already connected ds
  nfs: fix potential credential leak in ff_layout_update_mirror_cred
  pnfs/flexfiles: report layoutstat regularly
  ...
2015-07-02 11:32:23 -07:00
Linus Torvalds 9d90f03531 Replace module_init with appropriate alternate initcall in non modules.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVkO6nAAoJEOvOhAQsB9HWpHMP/Aknc+lmX2dZeIn96gdkP+UK
 1qL24C5oq2sm/9yTZLdoXbyApLaaTbAJHS9O4kolaOU6uOs3JrgtXqL1697PVp1R
 qV4f4DOzXmmEHaE2oO21afAri3tXIVQNqA2NQl2TmKfwz0Atu01Vj5RJPu/ZOBPl
 dONXcFnE6nO2p7AEFRP/GfDZwkng4xALyZPhwL7tJDAeGaBpqG/n2hCuq+Szn9g8
 wjTFACBdad/mRrYsL6YsWZ1e+LKI8vsArQbdPTam+jPaEUlK7yjFReFKCJVzL2JP
 xfQoTcCgFztzTUV0JTGR9sqeYA3WH9AkJOFDxNE/eIili4xiTh789WbEpHLVECSX
 1LsW025I3DkRWBPT4L+9ZP805ha71kNXDFc5N3XJkzrCYaFvD2BgsUzxi6FXj7aC
 9lEVKt6xO04FFG5SwTKnO0f8PEhPemZH3BDnVvjBDWQYLjUcPSNz7bfyHUhif0G5
 ulOGVB0ncJJF9iP8PyZs1RA/F8kKxXWnhYMIHzvl0f0vLUA7rAKsACnhBgq8s9ZQ
 uM5YjzU91Z/4pe5C2E5MmQIZ84b79ZPsee1lF0GJdjK5W3PDvnCjIdXfQ5M/f3S8
 76cssXWNhS78/P+19YqirLeb0u7Zw0jf73m9t9ywRgcByWfY5ZUDm0DFpQnWKkoR
 QY/aFO/yHKTO3VHj8Ril
 =KDJO
 -----END PGP SIGNATURE-----

Merge tag 'module_init-alternate_initcall-v4.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux

Pull module_init replacement part two from Paul Gortmaker:
 "Replace module_init with appropriate alternate initcall in non
  modules.

  This series converts non-modular code that is using the module_init()
  call to hook itself into the system to instead use one of our
  alternate priority initcalls.

  Unlike the previous series that used device_initcall and hence was a
  runtime no-op, these commits change to one of the alternate initcalls,
  because (a) we have them and (b) it seems like the right thing to do.

  For example, it would seem logical to use arch_initcall for arch
  specific setup code and fs_initcall for filesystem setup code.

  This does mean however, that changes in the init ordering will be
  taking place, and so there is a small risk that some kind of implicit
  init ordering issue may lie uncovered.  But I think it is still better
  to give these ones sensible priorities than to just assign them all to
  device_initcall in order to exactly preserve the old ordering.

  Thad said, we have already made similar changes in core kernel code in
  commit c96d6660dc ("kernel: audit/fix non-modular users of
  module_init in core code") without any regressions reported, so this
  type of change isn't without precedent.  It has also got the same
  local testing and linux-next coverage as all the other pull requests
  that I'm sending for this merge window have got.

  Once again, there is an unused module_exit function removal that shows
  up as an outlier upon casual inspection of the diffstat"

* tag 'module_init-alternate_initcall-v4.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux:
  x86: perf_event_intel_pt.c: use arch_initcall to hook in enabling
  x86: perf_event_intel_bts.c: use arch_initcall to hook in enabling
  mm/page_owner.c: use late_initcall to hook in enabling
  lib/list_sort: use late_initcall to hook in self tests
  arm: use subsys_initcall in non-modular pl320 IPC code
  powerpc: don't use module_init for non-modular core hugetlb code
  powerpc: use subsys_initcall for Freescale Local Bus
  x86: don't use module_init for non-modular core bootflag code
  netfilter: don't use module_init/exit in core IPV4 code
  fs/notify: don't use module_init for non-modular inotify_user code
  mm: replace module_init usages with subsys_initcall in nommu.c
2015-07-02 10:36:29 -07:00
Pablo Neira Ayuso 6742b9e310 netfilter: nfnetlink: keep going batch handling on missing modules
After a fresh boot with no modules in place at all and a large rulesets, the
existing nfnetlink_rcv_batch() funcion can take long time to commit the ruleset
due to the many abort path. This is specifically a problem for the existing
client of this code, ie. nf_tables, since it results in several
synchronize_rcu() call in a row.

This patch changes the policy to keep full batch processing on missing modules
errors so we abort only once.

Reported-by: Eric Leblond <eric@regit.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-07-02 17:59:33 +02:00
Florian Westphal dd302b59bd netfilter: bridge: don't leak skb in error paths
br_nf_dev_queue_xmit must free skb in its error path.
NF_DROP is misleading -- its an okfn, not a netfilter hook.

Fixes: 462fb2af97 ("bridge : Sanitize skb before it enters the IP stack")
Fixes: efb6de9b4b ("netfilter: bridge: forward IPv6 fragmented packets")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-07-02 17:59:26 +02:00
Florian Westphal 3bd229976f netfilter: arptables: use percpu jumpstack
commit 482cfc3185 ("netfilter: xtables: avoid percpu ruleset duplication")

Unlike ip and ip6tables, arp tables were never converted to use the percpu
jump stack.

It still uses the rule blob to store return address, which isn't safe
anymore since we now share this blob among all processors.

Because there is no TEE support for arptables, we don't need to cope
with reentrancy, so we can use loocal variable to hold stack offset.

Fixes: 482cfc3185 ("netfilter: xtables: avoid percpu ruleset duplication")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-07-02 17:58:59 +02:00
Bernhard Thaler a1bc1b356a netfilter: bridge: fix CONFIG_NF_DEFRAG_IPV4/6 related warnings/errors
br_nf_ip_fragment() is not needed when neither CONFIG_NF_DEFRAG_IPV4 nor
CONFIG_NF_DEFRAG_IPV6 is set.

struct brnf_frag_data must be available if either CONFIG_NF_DEFRAG_IPV4
or CONFIG_NF_DEFRAG_IPV6 is set.

Fixes: efb6de9b4b ("netfilter: bridge: forward IPv6 fragmented packets")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Bernhard Thaler <bernhard.thaler@wvnet.at>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-07-02 15:03:13 +02:00
Eric W. Biederman f307170d6e netfilter: nf_queue: Don't recompute the hook_list head
If someone sends packets from one of the netdevice ingress hooks to
the a userspace queue, and then userspace later accepts the packet,
the netfilter code can enter an infinite loop as the list head will
never be found.

Pass in the saved list_head to avoid this.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-07-02 15:03:13 +02:00
Linus Torvalds 47ebed96ff Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) mlx4 driver bug fixes (TX queue wakeups, csum complete indications)
    from Ido Shamay, Eran Ben Elisha, and Or Gerlitz.

 2) Missing unlock in error path of PTP support in renesas driver, from
    Dan Carpenter.

 3) Add Vitesse 8641 phy IDs to vitesse PHY driver, from Shaohui Xie.

 4) Bnx2x driver bug fixes (linearization of encap packets, scratchpad
    parity error notifications, flow-control and speed settings) from
    Yuval Mintz, Manish Chopra, Shahed Shaikh, and Ariel Elior.

 5) ipv6 extension header parsing in the igb chip has a HW errata,
    disable it.  Frm Todd Fujinaka.

 6) Fix PCI link state locking issue in e1000e driver, from Yanir
    Lubetkin.

 7) Cure panics during MTU change in i40e, from Mitch Williams.

 8) Don't leak promisc refs in DSA slave driver, from Gilad Ben-Yossef.

 9) Add missing HAS_DMA dep to VIA Rhine driver, from Geery
    Uytterhoeven.

10) Make sure DMA map/unmap calls are symmetric in bnx2x driver, from
    Michal Schmidt.

11) Workaround for MDIO access problems in bcm7xxx devices, from FLorian
    Fainelli.

12) Fix races in SCTP protocol between OTTB responses and route
    removals, from Alexander Sverdlin.

13) Fix jumbo frame checksum issue with some mvneta devices, from Simon
    Guinot.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (58 commits)
  sock_diag: don't broadcast kernel sockets
  net: mvneta: disable IP checksum with jumbo frames for Armada 370
  ARM: mvebu: update Ethernet compatible string for Armada XP
  net: mvneta: introduce compatible string "marvell, armada-xp-neta"
  api: fix compatibility of linux/in.h with netinet/in.h
  net: icplus: fix typo in constant name
  sis900: Trivial: Fix typos in enums
  stmmac: Trivial: fix typo in constant name
  sctp: Fix race between OOTB responce and route removal
  net-Liquidio: Delete unnecessary checks before the function call "vfree"
  vmxnet3: Bump up driver version number
  amd-xgbe: Add the __GFP_NOWARN flag to Rx buffer allocation
  net: phy: mdio-bcm-unimac: workaround initial read failures for integrated PHYs
  net: bcmgenet: workaround initial read failures for integrated PHYs
  net: phy: bcm7xxx: workaround MDIO management controller initial read
  bnx2x: fix DMA API usage
  net: via: VIA_RHINE and VIA_VELOCITY should depend on HAS_DMA
  net/phy: tune get_phy_c45_ids to support more c45 phy
  bnx2x: fix lockdep splat
  net: fec: don't access RACC register when not available
  ...
2015-07-01 14:58:07 -07:00
Linus Torvalds 02201e3f1b Minor merge needed, due to function move.
Main excitement here is Peter Zijlstra's lockless rbtree optimization to
 speed module address lookup.  He found some abusers of the module lock
 doing that too.
 
 A little bit of parameter work here too; including Dan Streetman's breaking
 up the big param mutex so writing a parameter can load another module (yeah,
 really).  Unfortunately that broke the usual suspects, !CONFIG_MODULES and
 !CONFIG_SYSFS, so those fixes were appended too.
 
 Cheers,
 Rusty.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVkgKHAAoJENkgDmzRrbjxQpwQAJVmBN6jF3SnwbQXv9vRixjH
 58V33sb1G1RW+kXxQ3/e8jLX/4VaN479CufruXQp+IJWXsN/CH0lbC3k8m7u50d7
 b1Zeqd/Yrh79rkc11b0X1698uGCSMlzz+V54Z0QOTEEX+nSu2ZZvccFS4UaHkn3z
 rqDo00lb7rxQz8U25qro2OZrG6D3ub2q20TkWUB8EO4AOHkPn8KWP2r429Axrr0K
 wlDWDTTt8/IsvPbuPf3T15RAhq1avkMXWn9nDXDjyWbpLfTn8NFnWmtesgY7Jl4t
 GjbXC5WYekX3w2ZDB9KaT/DAMQ1a7RbMXNSz4RX4VbzDl+yYeSLmIh2G9fZb1PbB
 PsIxrOgy4BquOWsJPm+zeFPSC3q9Cfu219L4AmxSjiZxC3dlosg5rIB892Mjoyv4
 qxmg6oiqtc4Jxv+Gl9lRFVOqyHZrTC5IJ+xgfv1EyP6kKMUKLlDZtxZAuQxpUyxR
 HZLq220RYnYSvkWauikq4M8fqFM8bdt6hLJnv7bVqllseROk9stCvjSiE3A9szH5
 OgtOfYV5GhOeb8pCZqJKlGDw+RoJ21jtNCgOr6DgkNKV9CX/kL/Puwv8gnA0B0eh
 dxCeB7f/gcLl7Cg3Z3gVVcGlgak6JWrLf5ITAJhBZ8Lv+AtL2DKmwEWS/iIMRmek
 tLdh/a9GiCitqS0bT7GE
 =tWPQ
 -----END PGP SIGNATURE-----

Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull module updates from Rusty Russell:
 "Main excitement here is Peter Zijlstra's lockless rbtree optimization
  to speed module address lookup.  He found some abusers of the module
  lock doing that too.

  A little bit of parameter work here too; including Dan Streetman's
  breaking up the big param mutex so writing a parameter can load
  another module (yeah, really).  Unfortunately that broke the usual
  suspects, !CONFIG_MODULES and !CONFIG_SYSFS, so those fixes were
  appended too"

* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (26 commits)
  modules: only use mod->param_lock if CONFIG_MODULES
  param: fix module param locks when !CONFIG_SYSFS.
  rcu: merge fix for Convert ACCESS_ONCE() to READ_ONCE() and WRITE_ONCE()
  module: add per-module param_lock
  module: make perm const
  params: suppress unused variable error, warn once just in case code changes.
  modules: clarify CONFIG_MODULE_COMPRESS help, suggest 'N'.
  kernel/module.c: avoid ifdefs for sig_enforce declaration
  kernel/workqueue.c: remove ifdefs over wq_power_efficient
  kernel/params.c: export param_ops_bool_enable_only
  kernel/params.c: generalize bool_enable_only
  kernel/module.c: use generic module param operaters for sig_enforce
  kernel/params: constify struct kernel_param_ops uses
  sysfs: tightened sysfs permission checks
  module: Rework module_addr_{min,max}
  module: Use __module_address() for module_address_lookup()
  module: Make the mod_tree stuff conditional on PERF_EVENTS || TRACING
  module: Optimize __module_address() using a latched RB-tree
  rbtree: Implement generic latch_tree
  seqlock: Introduce raw_read_seqcount_latch()
  ...
2015-07-01 10:49:25 -07:00
Ilya Dryomov 82cd003a77 crush: fix a bug in tree bucket decode
struct crush_bucket_tree::num_nodes is u8, so ceph_decode_8_safe()
should be used.  -Wconversion catches this, but I guess it went
unnoticed in all the noise it spews.  The actual problem (at least for
common crushmaps) isn't the u32 -> u8 truncation though - it's the
advancement by 4 bytes instead of 1 in the crushmap buffer.

Fixes: http://tracker.ceph.com/issues/2759

Cc: stable@vger.kernel.org
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2015-07-01 00:46:35 +03:00