Commit Graph

34760 Commits

Author SHA1 Message Date
Linus Torvalds 1afcb6ed0d NFS client bugfixes for Linux 3.18
Highlights include:
 
 - Stable patches to fix NFSv4.x delegation reclaim error paths
 - Fix a bug whereby we were advertising NFSv4.1 but using NFSv4.2 features
 - Fix a use-after-free problem with pNFS block layouts
 - Fix a memory leak in the pNFS files O_DIRECT code
 - Replace an intrusive and Oops-prone performance fix in the NFSv4 atomic
   open code with a safer one-line version and revert the two original patches.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUZol9AAoJEGcL54qWCgDyNQQQALnngvpPR51BoO/iTz9ruXol
 fGZy0SRIlTUKm1ArsQsQ+HGbV5K0hgP3Tg+z2AtEEZ8u/2Fi2Bqdl6+eNY12tKHd
 uUctDdM5TXLrETAn1UULrnd2eX1cvPMBfOlXlAdNHHsGEgC7w7YQ+rzGwnls+HDy
 LYXzY7Y3jYGdTMaRgZc5YRdtd8JBpCxciRvPEQLDIobwP0JnZC1afTLe1XInqB2I
 TZ4NTHT+DEWA+Ou1P2deL7+RuJNEAeWWBvULJy76n4BqKvN/HNedOO5HyBYXrwSd
 3UX3wbx9CWRxN1F0EqNKxjxZ/597JwqBeNoTDRcofLsqumUfAOtlbym1EahcD3Ls
 pykopNfgUhGuhxolStmuHdS6CnyQPERpR5lFZcDp7XtcwSq4FcwD8DRzLJMZW5dg
 N1lkfFlwQN3rqdk/NEHL+IxS41Hlk4HXjMoP6MNbRtqzIN6tW9tvC4MtAWd1aYxO
 YuUW281pbWxXQ731s0kTIrMUdQ9vGSRBMcbnO9rL3o+xkh8y5SPVkx9lhdhJN0UD
 VbQ5Ws/xZ54bD1PfyYb+Yx659lI8MSFOsDuMuLmDtfYnVicHwCA3H63StvQ3ihf/
 q0gu8Iex9YbNNjf7IfYGuWPmPn3gwPBoURPC0bcZvMPdY6DXodU6Oj4BRTQ5VCie
 9N0pt2wp2eRjaSzD7r5A
 =8YN6
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.18-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
 "Highlights include:

   - stable patches to fix NFSv4.x delegation reclaim error paths
   - fix a bug whereby we were advertising NFSv4.1 but using NFSv4.2
     features
   - fix a use-after-free problem with pNFS block layouts
   - fix a memory leak in the pNFS files O_DIRECT code
   - replace an intrusive and Oops-prone performance fix in the NFSv4
     atomic open code with a safer one-line version and revert the two
     original patches"

* tag 'nfs-for-3.18-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  sunrpc: fix sleeping under rcu_read_lock in gss_stringify_acceptor
  NFS: Don't try to reclaim delegation open state if recovery failed
  NFSv4: Ensure that we call FREE_STATEID when NFSv4.x stateids are revoked
  NFSv4: Fix races between nfs_remove_bad_delegation() and delegation return
  NFSv4.1: nfs41_clear_delegation_stateid shouldn't trust NFS_DELEGATED_STATE
  NFSv4: Ensure that we remove NFSv4.0 delegations when state has expired
  NFS: SEEK is an NFS v4.2 feature
  nfs: Fix use of uninitialized variable in nfs_getattr()
  nfs: Remove bogus assignment
  nfs: remove spurious WARN_ON_ONCE in write path
  pnfs/blocklayout: serialize GETDEVICEINFO calls
  nfs: fix pnfs direct write memory leak
  Revert "NFS: nfs4_do_open should add negative results to the dcache."
  Revert "NFS: remove BUG possibility in nfs4_open_and_get_state"
  NFSv4: Ensure nfs_atomic_open set the dentry verifier on ENOENT
2014-11-15 14:15:16 -08:00
Linus Torvalds 5cf5203704 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) sunhme driver lacks DMA mapping error checks, based upon a report by
    Meelis Roos.

 2) Fix memory leak in mvpp2 driver, from Sudip Mukherjee.

 3) DMA memory allocation sizes are wrong in systemport ethernet driver,
    fix from Florian Fainelli.

 4) Fix use after free in mac80211 defragmentation code, from Johannes
    Berg.

 5) Some networking uapi headers missing from Kbuild file, from Stephen
    Hemminger.

 6) TUN driver gets csum_start offset wrong when VLAN accel is enabled,
    and macvtap has a similar bug, from Herbert Xu.

 7) Adjust several tunneling drivers to set dev->iflink after registry,
    because registry sets that to -1 overwriting whatever we did.  From
    Steffen Klassert.

 8) Geneve forgets to set inner tunneling type, causing GSO segmentation
    to fail on some NICs.  From Jesse Gross.

 9) Fix several locking bugs in stmmac driver, from Fabrice Gasnier and
    Giuseppe CAVALLARO.

10) Fix spurious timeouts with NewReno on low traffic connections, from
    Marcelo Leitner.

11) Fix descriptor updates in enic driver, from Govindarajulu
    Varadarajan.

12) PPP calls bpf_prog_create() with locks held, which isn't kosher.
    Fix from Takashi Iwai.

13) Fix NULL deref in SCTP with malformed INIT packets, from Daniel
    Borkmann.

14) psock_fanout selftest accesses past the end of the mmap ring, fix
    from Shuah Khan.

15) Fix PTP timestamping for VLAN packets, from Richard Cochran.

16) netlink_unbind() calls in netlink pass wrong initial argument, from
    Hiroaki SHIMODA.

17) vxlan socket reuse accidently reuses a socket when the address
    family is different, so we have to explicitly check this, from
    Marcelo Lietner.

18) Fix missing include in nft_reject_bridge.c breaking the build on ppc
    and other architectures, from Guenter Roeck.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (75 commits)
  vxlan: Do not reuse sockets for a different address family
  smsc911x: power-up phydev before doing a software reset.
  lib: rhashtable - Remove weird non-ASCII characters from comments
  net/smsc911x: Fix delays in the PHY enable/disable routines
  net/smsc911x: Fix rare soft reset timeout issue due to PHY power-down mode
  netlink: Properly unbind in error conditions.
  net: ptp: fix time stamp matching logic for VLAN packets.
  cxgb4 : dcb open-lldp interop fixes
  selftests/net: psock_fanout seg faults in sock_fanout_read_ring()
  net: bcmgenet: apply MII configuration in bcmgenet_open()
  net: bcmgenet: connect and disconnect from the PHY state machine
  net: qualcomm: Fix dependency
  ixgbe: phy: fix uninitialized status in ixgbe_setup_phy_link_tnx
  net: phy: Correctly handle MII ioctl which changes autonegotiation.
  ipv6: fix IPV6_PKTINFO with v4 mapped
  net: sctp: fix memory leak in auth key management
  net: sctp: fix NULL pointer dereference in af->from_addr_param on malformed packet
  net: ppp: Don't call bpf_prog_create() in ppp_lock
  net/mlx4_en: Advertize encapsulation offloads features only when VXLAN tunnel is set
  cxgb4 : Fix bug in DCB app deletion
  ...
2014-11-13 17:54:08 -08:00
Ilya Dryomov cc9f1f518c libceph: change from BUG to WARN for __remove_osd() asserts
No reason to use BUG_ON for osd request list assertions.

Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-11-13 22:26:34 +03:00
Ilya Dryomov ba9d114ec5 libceph: clear r_req_lru_item in __unregister_linger_request()
kick_requests() can put linger requests on the notarget list.  This
means we need to clear the much-overloaded req->r_req_lru_item in
__unregister_linger_request() as well, or we get an assertion failure
in ceph_osdc_release_request() - !list_empty(&req->r_req_lru_item).

AFAICT the assumption was that registered linger requests cannot be on
any of req->r_req_lru_item lists, but that's clearly not the case.

Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-11-13 22:21:14 +03:00
Ilya Dryomov a390de0208 libceph: unlink from o_linger_requests when clearing r_osd
Requests have to be unlinked from both osd->o_requests (normal
requests) and osd->o_linger_requests (linger requests) lists when
clearing req->r_osd.  Otherwise __unregister_linger_request() gets
confused and we trip over a !list_empty(&osd->o_linger_requests)
assert in __remove_osd().

MON=1 OSD=1:

    # cat remove-osd.sh
    #!/bin/bash
    rbd create --size 1 test
    DEV=$(rbd map test)
    ceph osd out 0
    sleep 3
    rbd map dne/dne # obtain a new osdmap as a side effect
    rbd unmap $DEV & # will block
    sleep 3
    ceph osd in 0

Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-11-13 22:21:13 +03:00
Ilya Dryomov aaef31703a libceph: do not crash on large auth tickets
Large (greater than 32k, the value of PAGE_ALLOC_COSTLY_ORDER) auth
tickets will have their buffers vmalloc'ed, which leads to the
following crash in crypto:

[   28.685082] BUG: unable to handle kernel paging request at ffffeb04000032c0
[   28.686032] IP: [<ffffffff81392b42>] scatterwalk_pagedone+0x22/0x80
[   28.686032] PGD 0
[   28.688088] Oops: 0000 [#1] PREEMPT SMP
[   28.688088] Modules linked in:
[   28.688088] CPU: 0 PID: 878 Comm: kworker/0:2 Not tainted 3.17.0-vm+ #305
[   28.688088] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[   28.688088] Workqueue: ceph-msgr con_work
[   28.688088] task: ffff88011a7f9030 ti: ffff8800d903c000 task.ti: ffff8800d903c000
[   28.688088] RIP: 0010:[<ffffffff81392b42>]  [<ffffffff81392b42>] scatterwalk_pagedone+0x22/0x80
[   28.688088] RSP: 0018:ffff8800d903f688  EFLAGS: 00010286
[   28.688088] RAX: ffffeb04000032c0 RBX: ffff8800d903f718 RCX: ffffeb04000032c0
[   28.688088] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8800d903f750
[   28.688088] RBP: ffff8800d903f688 R08: 00000000000007de R09: ffff8800d903f880
[   28.688088] R10: 18df467c72d6257b R11: 0000000000000000 R12: 0000000000000010
[   28.688088] R13: ffff8800d903f750 R14: ffff8800d903f8a0 R15: 0000000000000000
[   28.688088] FS:  00007f50a41c7700(0000) GS:ffff88011fc00000(0000) knlGS:0000000000000000
[   28.688088] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   28.688088] CR2: ffffeb04000032c0 CR3: 00000000da3f3000 CR4: 00000000000006b0
[   28.688088] Stack:
[   28.688088]  ffff8800d903f698 ffffffff81392ca8 ffff8800d903f6e8 ffffffff81395d32
[   28.688088]  ffff8800dac96000 ffff880000000000 ffff8800d903f980 ffff880119b7e020
[   28.688088]  ffff880119b7e010 0000000000000000 0000000000000010 0000000000000010
[   28.688088] Call Trace:
[   28.688088]  [<ffffffff81392ca8>] scatterwalk_done+0x38/0x40
[   28.688088]  [<ffffffff81392ca8>] scatterwalk_done+0x38/0x40
[   28.688088]  [<ffffffff81395d32>] blkcipher_walk_done+0x182/0x220
[   28.688088]  [<ffffffff813990bf>] crypto_cbc_encrypt+0x15f/0x180
[   28.688088]  [<ffffffff81399780>] ? crypto_aes_set_key+0x30/0x30
[   28.688088]  [<ffffffff8156c40c>] ceph_aes_encrypt2+0x29c/0x2e0
[   28.688088]  [<ffffffff8156d2a3>] ceph_encrypt2+0x93/0xb0
[   28.688088]  [<ffffffff8156d7da>] ceph_x_encrypt+0x4a/0x60
[   28.688088]  [<ffffffff8155b39d>] ? ceph_buffer_new+0x5d/0xf0
[   28.688088]  [<ffffffff8156e837>] ceph_x_build_authorizer.isra.6+0x297/0x360
[   28.688088]  [<ffffffff8112089b>] ? kmem_cache_alloc_trace+0x11b/0x1c0
[   28.688088]  [<ffffffff8156b496>] ? ceph_auth_create_authorizer+0x36/0x80
[   28.688088]  [<ffffffff8156ed83>] ceph_x_create_authorizer+0x63/0xd0
[   28.688088]  [<ffffffff8156b4b4>] ceph_auth_create_authorizer+0x54/0x80
[   28.688088]  [<ffffffff8155f7c0>] get_authorizer+0x80/0xd0
[   28.688088]  [<ffffffff81555a8b>] prepare_write_connect+0x18b/0x2b0
[   28.688088]  [<ffffffff81559289>] try_read+0x1e59/0x1f10

This is because we set up crypto scatterlists as if all buffers were
kmalloc'ed.  Fix it.

Cc: stable@vger.kernel.org
Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2014-11-13 22:21:12 +03:00
Jeff Layton b3ecba0967 sunrpc: fix sleeping under rcu_read_lock in gss_stringify_acceptor
Bruce reported that he was seeing the following BUG pop:

    BUG: sleeping function called from invalid context at mm/slab.c:2846
    in_atomic(): 0, irqs_disabled(): 0, pid: 4539, name: mount.nfs
    2 locks held by mount.nfs/4539:
    #0:  (nfs_clid_init_mutex){+.+.+.}, at: [<ffffffffa01c0a9a>] nfs4_discover_server_trunking+0x4a/0x2f0 [nfsv4]
    #1:  (rcu_read_lock){......}, at: [<ffffffffa00e3185>] gss_stringify_acceptor+0x5/0xb0 [auth_rpcgss]
    Preemption disabled at:[<ffffffff81a4f082>] printk+0x4d/0x4f

    CPU: 3 PID: 4539 Comm: mount.nfs Not tainted 3.18.0-rc1-00013-g5b095e9 #3393
    Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
    ffff880021499390 ffff8800381476a8 ffffffff81a534cf 0000000000000001
    0000000000000000 ffff8800381476c8 ffffffff81097854 00000000000000d0
    0000000000000018 ffff880038147718 ffffffff8118e4f3 0000000020479f00
    Call Trace:
    [<ffffffff81a534cf>] dump_stack+0x4f/0x7c
    [<ffffffff81097854>] __might_sleep+0x114/0x180
    [<ffffffff8118e4f3>] __kmalloc+0x1a3/0x280
    [<ffffffffa00e31d8>] gss_stringify_acceptor+0x58/0xb0 [auth_rpcgss]
    [<ffffffffa00e3185>] ? gss_stringify_acceptor+0x5/0xb0 [auth_rpcgss]
    [<ffffffffa006b438>] rpcauth_stringify_acceptor+0x18/0x30 [sunrpc]
    [<ffffffffa01b0469>] nfs4_proc_setclientid+0x199/0x380 [nfsv4]
    [<ffffffffa01b04d0>] ? nfs4_proc_setclientid+0x200/0x380 [nfsv4]
    [<ffffffffa01bdf1a>] nfs40_discover_server_trunking+0xda/0x150 [nfsv4]
    [<ffffffffa01bde45>] ? nfs40_discover_server_trunking+0x5/0x150 [nfsv4]
    [<ffffffffa01c0acf>] nfs4_discover_server_trunking+0x7f/0x2f0 [nfsv4]
    [<ffffffffa01c8e24>] nfs4_init_client+0x104/0x2f0 [nfsv4]
    [<ffffffffa01539b4>] nfs_get_client+0x314/0x3f0 [nfs]
    [<ffffffffa0153780>] ? nfs_get_client+0xe0/0x3f0 [nfs]
    [<ffffffffa01c83aa>] nfs4_set_client+0x8a/0x110 [nfsv4]
    [<ffffffffa0069708>] ? __rpc_init_priority_wait_queue+0xa8/0xf0 [sunrpc]
    [<ffffffffa01c9b2f>] nfs4_create_server+0x12f/0x390 [nfsv4]
    [<ffffffffa01c1472>] nfs4_remote_mount+0x32/0x60 [nfsv4]
    [<ffffffff81196489>] mount_fs+0x39/0x1b0
    [<ffffffff81166145>] ? __alloc_percpu+0x15/0x20
    [<ffffffff811b276b>] vfs_kern_mount+0x6b/0x150
    [<ffffffffa01c1396>] nfs_do_root_mount+0x86/0xc0 [nfsv4]
    [<ffffffffa01c1784>] nfs4_try_mount+0x44/0xc0 [nfsv4]
    [<ffffffffa01549b7>] ? get_nfs_version+0x27/0x90 [nfs]
    [<ffffffffa0161a2d>] nfs_fs_mount+0x47d/0xd60 [nfs]
    [<ffffffff81a59c5e>] ? mutex_unlock+0xe/0x10
    [<ffffffffa01606a0>] ? nfs_remount+0x430/0x430 [nfs]
    [<ffffffffa01609c0>] ? nfs_clone_super+0x140/0x140 [nfs]
    [<ffffffff81196489>] mount_fs+0x39/0x1b0
    [<ffffffff81166145>] ? __alloc_percpu+0x15/0x20
    [<ffffffff811b276b>] vfs_kern_mount+0x6b/0x150
    [<ffffffff811b5830>] do_mount+0x210/0xbe0
    [<ffffffff811b54ca>] ? copy_mount_options+0x3a/0x160
    [<ffffffff811b651f>] SyS_mount+0x6f/0xb0
    [<ffffffff81a5c852>] system_call_fastpath+0x12/0x17

Sleeping under the rcu_read_lock is bad. This patch fixes it by dropping
the rcu_read_lock before doing the allocation and then reacquiring it
and redoing the dereference before doing the copy. If we find that the
string has somehow grown in the meantime, we'll reallocate and try again.

Cc: <stable@vger.kernel.org> # v3.17+
Reported-by: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-11-13 13:15:49 -05:00
Hiroaki SHIMODA 6251edd932 netlink: Properly unbind in error conditions.
Even if netlink_kernel_cfg::unbind is implemented the unbind() method is
not called, because cfg->unbind is omitted in __netlink_kernel_create().
And fix wrong argument of test_bit() and off by one problem.

At this point, no unbind() method is implemented, so there is no real
issue.

Fixes: 4f52090052 ("netlink: have netlink per-protocol bind function return an error code.")
Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
Cc: Richard Guy Briggs <rgb@redhat.com>
Acked-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-12 15:12:06 -05:00
Eric Dumazet 5337b5b75c ipv6: fix IPV6_PKTINFO with v4 mapped
Use IS_ENABLED(CONFIG_IPV6), to enable this code if IPv6 is
a module.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: c8e6ad0829 ("ipv6: honor IPV6_PKTINFO with v4 mapped addresses on sendmsg")
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11 15:32:45 -05:00
Daniel Borkmann 4184b2a79a net: sctp: fix memory leak in auth key management
A very minimal and simple user space application allocating an SCTP
socket, setting SCTP_AUTH_KEY setsockopt(2) on it and then closing
the socket again will leak the memory containing the authentication
key from user space:

unreferenced object 0xffff8800837047c0 (size 16):
  comm "a.out", pid 2789, jiffies 4296954322 (age 192.258s)
  hex dump (first 16 bytes):
    01 00 00 00 04 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff816d7e8e>] kmemleak_alloc+0x4e/0xb0
    [<ffffffff811c88d8>] __kmalloc+0xe8/0x270
    [<ffffffffa0870c23>] sctp_auth_create_key+0x23/0x50 [sctp]
    [<ffffffffa08718b1>] sctp_auth_set_key+0xa1/0x140 [sctp]
    [<ffffffffa086b383>] sctp_setsockopt+0xd03/0x1180 [sctp]
    [<ffffffff815bfd94>] sock_common_setsockopt+0x14/0x20
    [<ffffffff815beb61>] SyS_setsockopt+0x71/0xd0
    [<ffffffff816e58a9>] system_call_fastpath+0x12/0x17
    [<ffffffffffffffff>] 0xffffffffffffffff

This is bad because of two things, we can bring down a machine from
user space when auth_enable=1, but also we would leave security sensitive
keying material in memory without clearing it after use. The issue is
that sctp_auth_create_key() already sets the refcount to 1, but after
allocation sctp_auth_set_key() does an additional refcount on it, and
thus leaving it around when we free the socket.

Fixes: 65b07e5d0d ("[SCTP]: API updates to suport SCTP-AUTH extensions.")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11 15:19:11 -05:00
Daniel Borkmann e40607cbe2 net: sctp: fix NULL pointer dereference in af->from_addr_param on malformed packet
An SCTP server doing ASCONF will panic on malformed INIT ping-of-death
in the form of:

  ------------ INIT[PARAM: SET_PRIMARY_IP] ------------>

While the INIT chunk parameter verification dissects through many things
in order to detect malformed input, it misses to actually check parameters
inside of parameters. E.g. RFC5061, section 4.2.4 proposes a 'set primary
IP address' parameter in ASCONF, which has as a subparameter an address
parameter.

So an attacker may send a parameter type other than SCTP_PARAM_IPV4_ADDRESS
or SCTP_PARAM_IPV6_ADDRESS, param_type2af() will subsequently return 0
and thus sctp_get_af_specific() returns NULL, too, which we then happily
dereference unconditionally through af->from_addr_param().

The trace for the log:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
IP: [<ffffffffa01e9c62>] sctp_process_init+0x492/0x990 [sctp]
PGD 0
Oops: 0000 [#1] SMP
[...]
Pid: 0, comm: swapper Not tainted 2.6.32-504.el6.x86_64 #1 Bochs Bochs
RIP: 0010:[<ffffffffa01e9c62>]  [<ffffffffa01e9c62>] sctp_process_init+0x492/0x990 [sctp]
[...]
Call Trace:
 <IRQ>
 [<ffffffffa01f2add>] ? sctp_bind_addr_copy+0x5d/0xe0 [sctp]
 [<ffffffffa01e1fcb>] sctp_sf_do_5_1B_init+0x21b/0x340 [sctp]
 [<ffffffffa01e3751>] sctp_do_sm+0x71/0x1210 [sctp]
 [<ffffffffa01e5c09>] ? sctp_endpoint_lookup_assoc+0xc9/0xf0 [sctp]
 [<ffffffffa01e61f6>] sctp_endpoint_bh_rcv+0x116/0x230 [sctp]
 [<ffffffffa01ee986>] sctp_inq_push+0x56/0x80 [sctp]
 [<ffffffffa01fcc42>] sctp_rcv+0x982/0xa10 [sctp]
 [<ffffffffa01d5123>] ? ipt_local_in_hook+0x23/0x28 [iptable_filter]
 [<ffffffff8148bdc9>] ? nf_iterate+0x69/0xb0
 [<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0
 [<ffffffff8148bf86>] ? nf_hook_slow+0x76/0x120
 [<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0
[...]

A minimal way to address this is to check for NULL as we do on all
other such occasions where we know sctp_get_af_specific() could
possibly return with NULL.

Fixes: d6de309759 ("[SCTP]: Add the handling of "Set Primary IP Address" parameter to INIT")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11 15:19:10 -05:00
Jesse Gross cfdf1e1ba5 udptunnel: Add SKB_GSO_UDP_TUNNEL during gro_complete.
When doing GRO processing for UDP tunnels, we never add
SKB_GSO_UDP_TUNNEL to gso_type - only the type of the inner protocol
is added (such as SKB_GSO_TCPV4). The result is that if the packet is
later resegmented we will do GSO but not treat it as a tunnel. This
results in UDP fragmentation of the outer header instead of (i.e.) TCP
segmentation of the inner header as was originally on the wire.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-10 15:09:45 -05:00
David S. Miller 1f5623106f Merge tag 'master-2014-11-04' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
John W. Linville says:

====================
pull request: wireless 2014-11-06

Please pull this batch of fixes intended for the 3.18 stream...

For the mac80211 bits, Johannes says:

"This contains another small set of fixes for 3.18, these are all
over the place and most of the bugs are old, one even dates back
to the original mac80211 we merged into the kernel."

For the iwlwifi bits, Emmanuel says:

"I fix here two issues that are related to the firmware
loading flow. A user reported that he couldn't load the
driver because the rfkill line was pulled up while we
were running the calibrations. This was happening while
booting the system: systemd was restoring the "disable
wifi settings" and that raised an RFKILL interrupt during
the calibration. Our driver didn't handle that properly
and this is now fixed."

Please let me know if there are problems!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-06 22:15:20 -05:00
Andrew Lunn b31f65fb43 net: dsa: slave: Fix autoneg for phys on switch MDIO bus
When the ports phys are connected to the switches internal MDIO bus,
we need to connect the phy to the slave netdev, otherwise
auto-negotiation etc, does not work.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-06 15:06:28 -05:00
Marcelo Leitner 1f37bf87aa tcp: zero retrans_stamp if all retrans were acked
Ueki Kohei reported that when we are using NewReno with connections that
have a very low traffic, we may timeout the connection too early if a
second loss occurs after the first one was successfully acked but no
data was transfered later. Below is his description of it:

When SACK is disabled, and a socket suffers multiple separate TCP
retransmissions, that socket's ETIMEDOUT value is calculated from the
time of the *first* retransmission instead of the *latest*
retransmission.

This happens because the tcp_sock's retrans_stamp is set once then never
cleared.

Take the following connection:

                      Linux                    remote-machine
                        |                           |
         send#1---->(*1)|--------> data#1 --------->|
                  |     |                           |
                 RTO    :                           :
                  |     |                           |
                 ---(*2)|----> data#1(retrans) ---->|
                  | (*3)|<---------- ACK <----------|
                  |     |                           |
                  |     :                           :
                  |     :                           :
                  |     :                           :
                16 minutes (or more)                :
                  |     :                           :
                  |     :                           :
                  |     :                           :
                  |     |                           |
         send#2---->(*4)|--------> data#2 --------->|
                  |     |                           |
                 RTO    :                           :
                  |     |                           |
                 ---(*5)|----> data#2(retrans) ---->|
                  |     |                           |
                  |     |                           |
                RTO*2   :                           :
                  |     |                           |
                  |     |                           |
      ETIMEDOUT<----(*6)|                           |

(*1) One data packet sent.
(*2) Because no ACK packet is received, the packet is retransmitted.
(*3) The ACK packet is received. The transmitted packet is acknowledged.

At this point the first "retransmission event" has passed and been
recovered from. Any future retransmission is a completely new "event".

(*4) After 16 minutes (to correspond with retries2=15), a new data
packet is sent. Note: No data is transmitted between (*3) and (*4).

The socket's timeout SHOULD be calculated from this point in time, but
instead it's calculated from the prior "event" 16 minutes ago.

(*5) Because no ACK packet is received, the packet is retransmitted.
(*6) At the time of the 2nd retransmission, the socket returns
ETIMEDOUT.

Therefore, now we clear retrans_stamp as soon as all data during the
loss window is fully acked.

Reported-by: Ueki Kohei
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Tested-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 16:59:49 -05:00
Jesse Gross d3ca9eafc0 geneve: Unregister pernet subsys on module unload.
The pernet ops aren't ever unregistered, which causes a memory
leak and an OOPs if the module is ever reinserted.

Fixes: 0b5e8b8eea ("net: Add Geneve tunneling protocol driver")
CC: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 15:00:51 -05:00
Jesse Gross 45cac46e51 geneve: Set GSO type on transmit.
Geneve does not currently set the inner protocol type when
transmitting packets. This causes GSO segmentation to fail on NICs
that do not support Geneve offloading.

CC: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 15:00:51 -05:00
John W. Linville 0c9a67c8f1 This contains another small set of fixes for 3.18, these are all
over the place and most of the bugs are old, one even dates back
 to the original mac80211 we merged into the kernel.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJUWJSYAAoJEDBSmw7B7bqr17IP/3RFbqI4S/ceZzrVNLtvGDUd
 MIlkGhngDdhpFhSxTdOH4opFM/j9bkwndk0F35d4r94mCeB5eJQKrtUfeon/7aft
 AKaRa3CNsEVQgCempCYOKGwlJZQQ86IL6IvU4CW5CTNHENUBLA83KHqX+6Aoumhm
 mdJxhSzmB53Qn1bteIJXyJmOjgxQvBZggBIF/25Xnosb3FBH3hvPsH0qbIKZaicy
 PlD5JWk9UseySjLNwk1/jriQ4koF5Dy/BVRyQ/0fRYswdmS3o2EiC4JOWjsOfIUi
 NE9Ax+DAKvHHGYNcsX/hXsPJTc6fYgq3INEZBvnK04GHVFVGLq1WoEIfOeLugK7o
 j7OIEJbkKAQjJSnEpB9Y6YHO/jPXEokJjUNT7VuZJqLElp4Hd8K9jnhKD9jkZBA6
 TGjNO5NJqgGdlxnq3nu4+XFh9StAam6J1Ey1TWarc6Kxd8Gtg3Ymkj3cO46rHcQU
 JX3i3RGlYqibEQ0NVtZ4EfnGjtcGx0Vbf+yAc9ZpWzKFvX9YKS1wuOd5i/eZI8bb
 hxMjHFwmViV3Ifk9GjBNKioXkCpEfk9Q3pKzRllHQn56ueTu1mBvAfIe93PRm9kR
 y/giIZvHEhs8VH2PHVuHzT16YMVnNfQniAi+BK73QWC3zAhj1ss3xN33+Q8FfpMM
 xw/prlY9IAH2A9zis1Vz
 =8uwd
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-for-john-2014-11-04' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg <johannes@sipsolutions.net> says:

"This contains another small set of fixes for 3.18, these are all
over the place and most of the bugs are old, one even dates back
to the original mac80211 we merged into the kernel."

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-11-04 15:56:33 -05:00
Linus Torvalds ce1928da84 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull ceph fixes from Sage Weil:
 "There is a GFP flag fix from Mike Christie, an error code fix from
  Jan, and fixes for two unnecessary allocations (kmalloc and workqueue)
  from Ilya.  All are well tested.

  Ilya has one other fix on the way but it didn't get tested in time"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  libceph: eliminate unnecessary allocation in process_one_ticket()
  rbd: Fix error recovery in rbd_obj_read_sync()
  libceph: use memalloc flags for net IO
  rbd: use a single workqueue for all devices
2014-11-03 15:04:26 -08:00
Steffen Klassert f03eb128e3 gre6: Move the setting of dev->iflink into the ndo_init functions.
Otherwise it gets overwritten by register_netdev().

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-03 15:42:24 -05:00
Steffen Klassert ebe084aafb sit: Use ipip6_tunnel_init as the ndo_init function.
ipip6_tunnel_init() sets the dev->iflink via a call to
ipip6_tunnel_bind_dev(). After that, register_netdevice()
sets dev->iflink = -1. So we loose the iflink configuration
for ipv6 tunnels. Fix this by using ipip6_tunnel_init() as the
ndo_init function. Then ipip6_tunnel_init() is called after
dev->iflink is set to -1 from register_netdevice().

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-03 15:42:24 -05:00
Steffen Klassert 16a0231bf7 vti6: Use vti6_dev_init as the ndo_init function.
vti6_dev_init() sets the dev->iflink via a call to
vti6_link_config(). After that, register_netdevice()
sets dev->iflink = -1. So we loose the iflink configuration
for vti6 tunnels. Fix this by using vti6_dev_init() as the
ndo_init function. Then vti6_dev_init() is called after
dev->iflink is set to -1 from register_netdevice().

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-03 15:42:24 -05:00
Steffen Klassert 6c6151daaf ip6_tunnel: Use ip6_tnl_dev_init as the ndo_init function.
ip6_tnl_dev_init() sets the dev->iflink via a call to
ip6_tnl_link_config(). After that, register_netdevice()
sets dev->iflink = -1. So we loose the iflink configuration
for ipv6 tunnels. Fix this by using ip6_tnl_dev_init() as the
ndo_init function. Then ip6_tnl_dev_init() is called after
dev->iflink is set to -1 from register_netdevice().

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-03 15:42:24 -05:00
Guenter Roeck c1207c049b netfilter: nft_reject_bridge: Fix powerpc build error
Fix:
net/bridge/netfilter/nft_reject_bridge.c:
In function 'nft_reject_br_send_v6_unreach':
net/bridge/netfilter/nft_reject_bridge.c:240:3:
	error: implicit declaration of function 'csum_ipv6_magic'
   csum_ipv6_magic(&nip6h->saddr, &nip6h->daddr,
   ^
make[3]: *** [net/bridge/netfilter/nft_reject_bridge.o] Error 1

Seen with powerpc:allmodconfig.

Fixes: 523b929d54 ("netfilter: nft_reject_bridge: don't use IP stack to reject traffic")
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-03 12:12:34 -05:00
Johannes Berg b8fff407a1 mac80211: fix use-after-free in defragmentation
Upon receiving the last fragment, all but the first fragment
are freed, but the multicast check for statistics at the end
of the function refers to the current skb (the last fragment)
causing a use-after-free bug.

Since multicast frames cannot be fragmented and we check for
this early in the function, just modify that check to also
do the accounting to fix the issue.

Cc: stable@vger.kernel.org
Reported-by: Yosef Khyal <yosefx.khyal@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-03 14:28:50 +01:00
Linus Torvalds 4cb8c3593b irda: stop calling sk_prot->disconnect() on connection failure
The sk_prot is irda's own set of protocol handlers, so irda should
statically know what that function is anyway, without using an indirect
pointer.  And as it happens, we know *exactly* what that pointer is
statically: it's NULL, because irda doesn't define a disconnect
operation.

So calling that function is doubly wrong, and will just cause an oops.

Reported-by: Martin Lang <mlg.hessigheim@gmail.com>
Cc: Samuel Ortiz <samuel@sortiz.org>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-11-02 10:20:26 -08:00
Ilya Dryomov e9226d7c9f libceph: eliminate unnecessary allocation in process_one_ticket()
Commit c27a3e4d66 ("libceph: do not hard code max auth ticket len")
while fixing a buffer overlow tried to keep the same as much of the
surrounding code as possible and introduced an unnecessary kmalloc() in
the unencrypted ticket path.  It is likely to fail on huge tickets, so
get rid of it.

Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2014-10-31 23:43:08 +03:00
Guenter Roeck e0fb6fb6d5 net: ethtool: Return -EOPNOTSUPP if user space tries to read EEPROM with lengh 0
If a driver supports reading EEPROM but no EEPROM is installed in the system,
the driver's get_eeprom_len function returns 0. ethtool will subsequently
try to read that zero-length EEPROM anyway. If the driver does not support
EEPROM access at all, this operation will return -EOPNOTSUPP. If the driver
does support EEPROM access but no EEPROM is installed, the operation will
return -EINVAL. Return -EOPNOTSUPP in both cases for consistency.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-31 16:12:34 -04:00
Pravin B Shelar de05c400f7 mpls: Allow mpls_gso to be built as module
Kconfig already allows mpls to be built as module. Following patch
fixes Makefile to do same.

CC: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-31 15:47:21 -04:00
Pravin B Shelar f7065f4bd3 mpls: Fix mpls_gso handler.
mpls gso handler needs to pull skb after segmenting skb.

CC: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-31 15:47:21 -04:00
David S. Miller e3a88f9c4f Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
netfilter/ipvs fixes for net

The following patchset contains fixes for netfilter/ipvs. This round of
fixes is larger than usual at this stage, specifically because of the
nf_tables bridge reject fixes that I would like to see in 3.18. The
patches are:

1) Fix a null-pointer dereference that may occur when logging
   errors. This problem was introduced by 4a4739d56b ("ipvs: Pull
   out crosses_local_route_boundary logic") in v3.17-rc5.

2) Update hook mask in nft_reject_bridge so we can also filter out
   packets from there. This fixes 36d2af5 ("netfilter: nf_tables: allow
   to filter from prerouting and postrouting"), which needs this chunk
   to work.

3) Two patches to refactor common code to forge the IPv4 and IPv6
   reject packets from the bridge. These are required by the nf_tables
   reject bridge fix.

4) Fix nft_reject_bridge by avoiding the use of the IP stack to reject
   packets from the bridge. The idea is to forge the reject packets and
   inject them to the original port via br_deliver() which is now
   exported for that purpose.

5) Restrict nft_reject_bridge to bridge prerouting and input hooks.
   the original skbuff may cloned after prerouting when the bridge stack
   needs to flood it to several bridge ports, it is too late to reject
   the traffic.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-31 12:29:42 -04:00
Pablo Neira Ayuso 127917c29a netfilter: nft_reject_bridge: restrict reject to prerouting and input
Restrict the reject expression to the prerouting and input bridge
hooks. If we allow this to be used from forward or any other later
bridge hook, if the frame is flooded to several ports, we'll end up
sending several reject packets, one per cloned packet.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-31 12:50:09 +01:00
Pablo Neira Ayuso 523b929d54 netfilter: nft_reject_bridge: don't use IP stack to reject traffic
If the packet is received via the bridge stack, this cannot reject
packets from the IP stack.

This adds functions to build the reject packet and send it from the
bridge stack. Comments and assumptions on this patch:

1) Validate the IPv4 and IPv6 headers before further processing,
   given that the packet comes from the bridge stack, we cannot assume
   they are clean. Truncated packets are dropped, we follow similar
   approach in the existing iptables match/target extensions that need
   to inspect layer 4 headers that is not available. This also includes
   packets that are directed to multicast and broadcast ethernet
   addresses.

2) br_deliver() is exported to inject the reject packet via
   bridge localout -> postrouting. So the approach is similar to what
   we already do in the iptables reject target. The reject packet is
   sent to the bridge port from which we have received the original
   packet.

3) The reject packet is forged based on the original packet. The TTL
   is set based on sysctl_ip_default_ttl for IPv4 and per-net
   ipv6.devconf_all hoplimit for IPv6.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-31 12:50:08 +01:00
Pablo Neira Ayuso 8bfcdf6671 netfilter: nf_reject_ipv6: split nf_send_reset6() in smaller functions
That can be reused by the reject bridge expression to build the reject
packet. The new functions are:

* nf_reject_ip6_tcphdr_get(): to sanitize and to obtain the TCP header.
* nf_reject_ip6hdr_put(): to build the IPv6 header.
* nf_reject_ip6_tcphdr_put(): to build the TCP header.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-31 12:49:57 +01:00
Pablo Neira Ayuso 052b9498ee netfilter: nf_reject_ipv4: split nf_send_reset() in smaller functions
That can be reused by the reject bridge expression to build the reject
packet. The new functions are:

* nf_reject_ip_tcphdr_get(): to sanitize and to obtain the TCP header.
* nf_reject_iphdr_put(): to build the IPv4 header.
* nf_reject_ip_tcphdr_put(): to build the TCP header.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-31 12:49:05 +01:00
Pablo Neira Ayuso 4d87716cd0 netfilter: nf_tables_bridge: update hook_mask to allow {pre,post}routing
Fixes: 36d2af5 ("netfilter: nf_tables: allow to filter from prerouting and postrouting")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-31 12:44:56 +01:00
Ben Hutchings 5188cd44c5 drivers/net, ipv6: Select IPv6 fragment idents for virtio UFO packets
UFO is now disabled on all drivers that work with virtio net headers,
but userland may try to send UFO/IPv6 packets anyway.  Instead of
sending with ID=0, we should select identifiers on their behalf (as we
used to).

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Fixes: 916e4cf46d ("ipv6: reuse ip6_frag_id from ip6_ufo_append_data")
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-30 20:01:18 -04:00
Eric Dumazet 39bb5e6286 net: skb_fclone_busy() needs to detect orphaned skb
Some drivers are unable to perform TX completions in a bound time.
They instead call skb_orphan()

Problem is skb_fclone_busy() has to detect this case, otherwise
we block TCP retransmits and can freeze unlucky tcp sessions on
mostly idle hosts.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 1f3279ae0c ("tcp: avoid retransmits of TCP packets hanging in host queues")
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-30 19:58:30 -04:00
Tom Herbert 14051f0452 gre: Use inner mac length when computing tunnel length
Currently, skb_inner_network_header is used but this does not account
for Ethernet header for ETH_P_TEB. Use skb_inner_mac_header which
handles TEB and also should work with IP encapsulation in which case
inner mac and inner network headers are the same.

Tested: Ran TCP_STREAM over GRE, worked as expected.

Signed-off-by: Tom Herbert <therbert@google.com>
Acked-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-30 19:51:56 -04:00
Nicolas Cavallari fa19c2b050 ipv4: Do not cache routing failures due to disabled forwarding.
If we cache them, the kernel will reuse them, independently of
whether forwarding is enabled or not.  Which means that if forwarding is
disabled on the input interface where the first routing request comes
from, then that unreachable result will be cached and reused for
other interfaces, even if forwarding is enabled on them.  The opposite
is also true.

This can be verified with two interfaces A and B and an output interface
C, where B has forwarding enabled, but not A and trying
ip route get $dst iif A from $src && ip route get $dst iif B from $src

Signed-off-by: Nicolas Cavallari <nicolas.cavallari@green-communications.fr>
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-30 19:20:40 -04:00
Johannes Berg 46238845bd mac80211: properly flush delayed scan work on interface removal
When an interface is deleted, an ongoing hardware scan is canceled and
the driver must abort the scan, at the very least reporting completion
while the interface is removed.

However, if it scheduled the work that might only run after everything
is said and done, which leads to cfg80211 warning that the scan isn't
reported as finished yet; this is no fault of the driver, it already
did, but mac80211 hasn't processed it.

To fix this situation, flush the delayed work when the interface being
removed is the one that was executing the scan.

Cc: stable@vger.kernel.org
Reported-by: Sujith Manoharan <sujith@msujith.org>
Tested-by: Sujith Manoharan <sujith@msujith.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-30 15:48:32 +01:00
Mike Christie 89baaa570a libceph: use memalloc flags for net IO
This patch has ceph's lib code use the memalloc flags.

If the VM layer needs to write data out to free up memory to handle new
allocation requests, the block layer must be able to make forward progress.
To handle that requirement we use structs like mempools to reserve memory for
objects like bios and requests.

The problem is when we send/receive block layer requests over the network
layer, net skb allocations can fail and the system can lock up.
To solve this, the memalloc related flags were added. NBD, iSCSI
and NFS uses these flags to tell the network/vm layer that it should
use memory reserves to fullfill allcation requests for structs like
skbs.

I am running ceph in a bunch of VMs in my laptop, so this patch was
not tested very harshly.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
2014-10-30 13:11:50 +03:00
Nikolay Aleksandrov d70127e8a9 inet: frags: remove the WARN_ON from inet_evict_bucket
The WARN_ON in inet_evict_bucket can be triggered by a valid case:
inet_frag_kill and inet_evict_bucket can be running in parallel on the
same queue which means that there has been at least one more ref added
by a previous inet_frag_find call, but inet_frag_kill can delete the
timer before inet_evict_bucket which will cause the WARN_ON() there to
trigger since we'll have refcnt!=1. Now, this case is valid because the
queue is being "killed" for some reason (removed from the chain list and
its timer deleted) so it will get destroyed in the end by one of the
inet_frag_put() calls which reaches 0 i.e. refcnt is still valid.

CC: Florian Westphal <fw@strlen.de>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Patrick McLean <chutzpah@gentoo.org>

Fixes: b13d3cbfb8 ("inet: frag: move eviction of queues to work queue")
Reported-by: Patrick McLean <chutzpah@gentoo.org>
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-29 15:21:30 -04:00
Nikolay Aleksandrov 65ba1f1ec0 inet: frags: fix a race between inet_evict_bucket and inet_frag_kill
When the evictor is running it adds some chosen frags to a local list to
be evicted once the chain lock has been released but at the same time
the *frag_queue can be running for some of the same queues and it
may call inet_frag_kill which will wait on the chain lock and
will then delete the queue from the wrong list since it was added in the
eviction one. The fix is simple - check if the queue has the evict flag
set under the chain lock before deleting it, this is safe because the
evict flag is set only under that lock and having the flag set also means
that the queue has been detached from the chain list, so no need to delete
it again.
An important note to make is that we're safe w.r.t refcnt because
inet_frag_kill and inet_evict_bucket will sync on the del_timer operation
where only one of the two can succeed (or if the timer is executing -
none of them), the cases are:
1. inet_frag_kill succeeds in del_timer
 - then the timer ref is removed, but inet_evict_bucket will not add
   this queue to its expire list but will restart eviction in that chain
2. inet_evict_bucket succeeds in del_timer
 - then the timer ref is kept until the evictor "expires" the queue, but
   inet_frag_kill will remove the initial ref and will set
   INET_FRAG_COMPLETE which will make the frag_expire fn just to remove
   its ref.
In the end all of the queue users will do an inet_frag_put and the one
that reaches 0 will free it. The refcount balance should be okay.

CC: Florian Westphal <fw@strlen.de>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Patrick McLean <chutzpah@gentoo.org>

Fixes: b13d3cbfb8 ("inet: frag: move eviction of queues to work queue")
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Reported-by: Patrick McLean <chutzpah@gentoo.org>
Tested-by: Patrick McLean <chutzpah@gentoo.org>
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-29 15:21:30 -04:00
Lubomir Rintel b2ed64a974 ipv6: notify userspace when we added or changed an ipv6 token
NetworkManager might want to know that it changed when the router advertisement
arrives.

Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-29 14:35:32 -04:00
WANG Cong d56109020d sch_pie: schedule the timer after all init succeed
Cc: Vijay Subramanian <vijaynsu@cisco.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
2014-10-29 14:28:01 -04:00
Luciano Coelho ff1e417c7c mac80211: schedule the actual switch of the station before CSA count 0
Due to the time it takes to process the beacon that started the CSA
process, we may be late for the switch if we try to reach exactly
beacon 0.  To avoid that, use count - 1 when calculating the switch time.

Cc: stable@vger.kernel.org
Reported-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-29 16:37:54 +01:00
Luciano Coelho 84469a45a1 mac80211: use secondary channel offset IE also beacons during CSA
If we are switching from an HT40+ to an HT40- channel (or vice-versa),
we need the secondary channel offset IE to specify what is the
post-CSA offset to be used.  This applies both to beacons and to probe
responses.

In ieee80211_parse_ch_switch_ie() we were ignoring this IE from
beacons and using the *current* HT information IE instead.  This was
causing us to use the same offset as before the switch.

Fix that by using the secondary channel offset IE also for beacons and
don't ever use the pre-switch offset.  Additionally, remove the
"beacon" argument from ieee80211_parse_ch_switch_ie(), since it's not
needed anymore.

Cc: stable@vger.kernel.org
Reported-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-29 16:37:45 +01:00
Felix Fietkau 10b6848786 mac80211: flush keys for AP mode on ieee80211_do_stop
Userspace can add keys to an AP mode interface before start_ap has been
called. If there have been no calls to start_ap/stop_ap in the mean
time, the keys will still be around when the interface is brought down.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
[adjust comments, fix AP_VLAN case]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-29 16:33:37 +01:00
David S. Miller 25946f20b7 Merge tag 'master-2014-10-27' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
John W. Linville says:

====================
pull request: wireless 2014-10-28

Please pull this batch of fixes intended for the 3.18 stream!

For the mac80211 bits, Johannes says:

"Here are a few fixes for the wireless stack: one fixes the
RTS rate, one for a debugfs file, one to return the correct
channel to userspace, a sanity check for a userspace value
and the remaining two are just documentation fixes."

For the iwlwifi bits, Emmanuel says:

"I revert here a patch that caused interoperability issues.
dvm gets a fix for a bug that was reported by many users.
Two minor fixes for BT Coex and platform power fix that helps
reducing latency when the PCIe link goes to low power states."

In addition...

Felix Fietkau adds a couple of ath code fixes related to regulatory
rule enforcement.

Hauke Mehrtens fixes a build break with bcma when CONFIG_OF_ADDRESS
is not set.

Karsten Wiese provides a trio of minor fixes for rtl8192cu.

Kees Cook prevents a potential information leak in rtlwifi.

Larry Finger also brings a trio of minor fixes for rtlwifi.

Rafał Miłecki adds a device ID to the bcma bus driver.

Rickard Strandqvist offers some strn* -> strl* changes in brcmfmac
to eliminate non-terminated string issues.

Sujith Manoharan avoids some ath9k stalls by enabling HW queue control
only for MCC.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-28 15:30:15 -04:00