Commit Graph

42294 Commits

Author SHA1 Message Date
Linus Torvalds 68093c43f3 A fix for a long-standing bug in the incremental osdmap handling code
that caused misdirected requests, tagged for stable.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJXk4kSAAoJEEp/3jgCEfOLvpsIAKJTs1ELIQ5RmfvwdvqyqI0N
 DoSA6rBYIwQvjqBjevJw72w5HKR7hhJoxaEjXEFrw9zbRmXMNnlk5xZpgD8vy5E3
 1iCA8LtscFp4ld4ZNWIus45mUpf6/a5ugPd9Mr3V5C4J05LWqZeXufpAHNHyFbII
 ++hTu6J/RAg8DddEUhBcDl7c65tQpc8ai0h8ll0pLRYNFLPeCoYO3yTitEYax4fR
 i6erB3+7pNWnZIsPnUTrXS4B2NG5kPmflVkD7UH9i14PwdQ4QO85LSXD1o8xYrpa
 Occ9EvgFuT8zTJHckCEcT2Y0dINz2uHiE05DUea3Udz82keV9zKeZhZUDwJ95RE=
 =P1qk
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-4.7-rc8' of git://github.com/ceph/ceph-client

Pull ceph fix from Ilya Dryomov:
 "A fix for a long-standing bug in the incremental osdmap handling code
  that caused misdirected requests, tagged for stable"

  The tag is signed with a brand new key - Sage is on vacation and I
  didn't anticipate this"

* tag 'ceph-for-4.7-rc8' of git://github.com/ceph/ceph-client:
  libceph: apply new_state before new_up_client on incrementals
2016-07-24 10:00:31 +09:00
Ilya Dryomov 930c532869 libceph: apply new_state before new_up_client on incrementals
Currently, osd_weight and osd_state fields are updated in the encoding
order.  This is wrong, because an incremental map may look like e.g.

    new_up_client: { osd=6, addr=... } # set osd_state and addr
    new_state: { osd=6, xorstate=EXISTS } # clear osd_state

Suppose osd6's current osd_state is EXISTS (i.e. osd6 is down).  After
applying new_up_client, osd_state is changed to EXISTS | UP.  Carrying
on with the new_state update, we flip EXISTS and leave osd6 in a weird
"!EXISTS but UP" state.  A non-existent OSD is considered down by the
mapping code

2087    for (i = 0; i < pg->pg_temp.len; i++) {
2088            if (ceph_osd_is_down(osdmap, pg->pg_temp.osds[i])) {
2089                    if (ceph_can_shift_osds(pi))
2090                            continue;
2091
2092                    temp->osds[temp->size++] = CRUSH_ITEM_NONE;

and so requests get directed to the second OSD in the set instead of
the first, resulting in OSD-side errors like:

[WRN] : client.4239 192.168.122.21:0/2444980242 misdirected client.4239.1:2827 pg 2.5df899f2 to osd.4 not [1,4,6] in e680/680

and hung rbds on the client:

[  493.566367] rbd: rbd0: write 400000 at 11cc00000 (0)
[  493.566805] rbd: rbd0:   result -6 xferred 400000
[  493.567011] blk_update_request: I/O error, dev rbd0, sector 9330688

The fix is to decouple application from the decoding and:
- apply new_weight first
- apply new_state before new_up_client
- twiddle osd_state flags if marking in
- clear out some of the state if osd is destroyed

Fixes: http://tracker.ceph.com/issues/14901

Cc: stable@vger.kernel.org # 3.15+: 6dd74e44dc1d: libceph: set 'exists' flag for newly up osd
Cc: stable@vger.kernel.org # 3.15+
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2016-07-22 15:17:40 +02:00
Soheil Hassas Yeganeh f8e7718cc0 packet: propagate sock_cmsg_send() error
sock_cmsg_send() can return different error codes and not only
-EINVAL, and we should properly propagate them.

Fixes: c14ac9451c ("sock: enable timestamping using control messages")
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-22 01:41:48 -04:00
Yoshihiro Shimoda edbe774623 packet: fix second argument of sock_tx_timestamp()
This patch fixes an issue that a syscall (e.g. sendto syscall) cannot
work correctly. Since the sendto syscall doesn't have msg_control buffer,
the sock_tx_timestamp() in packet_snd() cannot work correctly because
the socks.tsflags is set to 0.
So, this patch sets the socks.tsflags to sk->sk_tsflags as default.

Fixes: c14ac9451c ("sock: enable timestamping using control messages")
Reported-by: Kazuya Mizuguchi <kazuya.mizuguchi.ks@renesas.com>
Reported-by: Keita Kobayashi <keita.kobayashi.ym@renesas.com>
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19 21:00:50 -07:00
Willem de Bruijn c74bfbdba0 sctp: load transport header after sk_filter
Do not cache pointers into the skb linear segment across sk_filter.
The function call can trigger pskb_expand_head.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-18 22:46:52 -07:00
Konstantin Khlebnikov 0564bf0afa net/sched/sch_htb: clamp xstats tokens to fit into 32-bit int
In kernel HTB keeps tokens in signed 64-bit in nanoseconds. In netlink
protocol these values are converted into pshed ticks (64ns for now) and
truncated to 32-bit. In struct tc_htb_xstats fields "tokens" and "ctokens"
are declared as unsigned 32-bit but they could be negative thus tool 'tc'
prints them as signed. Big values loose higher bits and/or become negative.

This patch clamps tokens in xstat into range from INT_MIN to INT_MAX.
In this way it's easier to understand what's going on here.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-18 22:44:31 -07:00
Paolo Abeni 18d3df3eab vlan: use a valid default mtu value for vlan over macsec
macsec can't cope with mtu frames which need vlan tag insertion, and
vlan device set the default mtu equal to the underlying dev's one.
By default vlan over macsec devices use invalid mtu, dropping
all the large packets.
This patch adds a netif helper to check if an upper vlan device
needs mtu reduction. The helper is used during vlan devices
initialization to set a valid default and during mtu updating to
forbid invalid, too bit, mtu values.
The helper currently only check if the lower dev is a macsec device,
if we get more users, we need to update only the helper (possibly
reserving an additional IFF bit).

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-16 20:15:02 -07:00
Jason Baron 083ae30828 tcp: enable per-socket rate limiting of all 'challenge acks'
The per-socket rate limit for 'challenge acks' was introduced in the
context of limiting ack loops:

commit f2b2c582e8 ("tcp: mitigate ACK loops for connections as tcp_sock")

And I think it can be extended to rate limit all 'challenge acks' on a
per-socket basis.

Since we have the global tcp_challenge_ack_limit, this patch allows for
tcp_challenge_ack_limit to be set to a large value and effectively rely on
the per-socket limit, or set tcp_challenge_ack_limit to a lower value and
still prevents a single connections from consuming the entire challenge ack
quota.

It further moves in the direction of eliminating the global limit at some
point, as Eric Dumazet has suggested. This a follow-up to:
Subject: tcp: make challenge acks less predictable

Cc: Eric Dumazet <edumazet@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Yue Cao <ycao009@ucr.edu>
Signed-off-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-15 14:18:29 -07:00
Willem de Bruijn 4f0c40d944 dccp: limit sk_filter trim to payload
Dccp verifies packet integrity, including length, at initial rcv in
dccp_invalid_packet, later pulls headers in dccp_enqueue_skb.

A call to sk_filter in-between can cause __skb_pull to wrap skb->len.
skb_copy_datagram_msg interprets this as a negative value, so
(correctly) fails with EFAULT. The negative length is reported in
ioctl SIOCINQ or possibly in a DCCP_WARN in dccp_close.

Introduce an sk_receive_skb variant that caps how small a filter
program can trim packets, and call this in dccp with the header
length. Excessively trimmed packets are now processed normally and
queued for reception as 0B payloads.

Fixes: 7c657876b6 ("[DCCP]: Initial implementation")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-13 11:53:41 -07:00
Willem de Bruijn f4979fcea7 rose: limit sk_filter trim to payload
Sockets can have a filter program attached that drops or trims
incoming packets based on the filter program return value.

Rose requires data packets to have at least ROSE_MIN_LEN bytes. It
verifies this on arrival in rose_route_frame and unconditionally pulls
the bytes in rose_recvmsg. The filter can trim packets to below this
value in-between, causing pull to fail, leaving the partial header at
the time of skb_copy_datagram_msg.

Place a lower bound on the size to which sk_filter may trim packets
by introducing sk_filter_trim_cap and call this for rose packets.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-13 11:53:40 -07:00
David S. Miller 92a03eb012 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter/IPVS fixes for net

The following patchset contains Netfilter/IPVS fixes for your net tree.
they are:

1) Fix leak in the error path of nft_expr_init(), from Liping Zhang.

2) Tracing from nf_tables cannot be disabled, also from Zhang.

3) Fix an integer overflow on 32bit archs when setting the number of
   hashtable buckets, from Florian Westphal.

4) Fix configuration of ipvs sync in backup mode with IPv6 address,
   from Quentin Armitage via Simon Horman.

5) Fix incorrect timeout calculation in nft_ct NFT_CT_EXPIRATION,
   from Florian Westphal.

6) Skip clash resolution in conntrack insertion races if NAT is in
   place.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-12 10:21:27 -07:00
Pablo Neira Ayuso 590b52e10d netfilter: conntrack: skip clash resolution if nat is in place
The clash resolution is not easy to apply if the NAT table is
registered. Even if no NAT rules are installed, the nul-binding ensures
that a unique tuple is used, thus, the packet that loses race gets a
different source port number, as described by:

http://marc.info/?l=netfilter-devel&m=146818011604484&w=2

Clash resolution with NAT is also problematic if addresses/port range
ports are used since the conntrack that wins race may describe a
different mangling that we may have earlier applied to the packet via
nf_nat_setup_info().

Fixes: 71d8c47fc6 ("netfilter: conntrack: introduce clash resolution on insertion race")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Tested-by: Marc Dionne <marc.c.dionne@gmail.com>
2016-07-12 16:28:41 +02:00
Jon Paul Maloy 1fc07f3e15 tipc: reset all unicast links when broadcast send link fails
In test situations with many nodes and a heavily stressed system we have
observed that the transmission broadcast link may fail due to an
excessive number of retransmissions of the same packet. In such
situations we need to reset all unicast links to all peers, in order to
reset and re-synchronize the broadcast link.

In this commit, we add a new function tipc_bearer_reset_all() to be used
in such situations. The function scans across all bearers and resets all
their pertaining links.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-11 22:42:12 -07:00
Jon Paul Maloy a71eb72035 tipc: ensure correct broadcast send buffer release when peer is lost
After a new receiver peer has been added to the broadcast transmission
link, we allow immediate transmission of new broadcast packets, trusting
that the new peer will not accept the packets until it has received the
previously sent unicast broadcast initialiation message. In the same
way, the sender must not accept any acknowledges until it has itself
received the broadcast initialization from the peer, as well as
confirmation of the reception of its own initialization message.

Furthermore, when a receiver peer goes down, the sender has to produce
the missing acknowledges from the lost peer locally, in order ensure
correct release of the buffers that were expected to be acknowledged by
the said peer.

In a highly stressed system we have observed that contact with a peer
may come up and be lost before the above mentioned broadcast initial-
ization and confirmation have been received. This leads to the locally
produced acknowledges being rejected, and the non-acknowledged buffers
to linger in the broadcast link transmission queue until it fills up
and the link goes into permanent congestion.

In this commit, we remedy this by temporarily setting the corresponding
broadcast receive link state to ESTABLISHED and the 'bc_peer_is_up'
state to true before we issue the local acknowledges. This ensures that
those acknowledges will always be accepted. The mentioned state values
are restored immediately afterwards when the link is reset.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-11 22:42:12 -07:00
Jon Paul Maloy 2d18ac4ba7 tipc: extend broadcast link initialization criteria
At first contact between two nodes, an endpoint might sometimes have
time to send out a LINK_PROTOCOL/STATE packet before it has received
the broadcast initialization packet from the peer, i.e., before it has
received a valid broadcast packet number to add to the 'bc_ack' field
of the protocol message.

This means that the peer endpoint will receive a protocol packet with an
invalid broadcast acknowledge value of 0. Under unlucky circumstances
this may lead to the original, already received acknowledge value being
overwritten, so that the whole broadcast link goes stale after a while.

We fix this by delaying the setting of the link field 'bc_peer_is_up'
until we know that the peer really has received our own broadcast
initialization message. The latter is always sent out as the first
unicast message on a link, and always with seqeunce number 1. Because
of this, we only need to look for a non-zero unicast acknowledge value
in the arriving STATE messages, and once that is confirmed we know we
are safe and can set the mentioned field. Before this moment, we must
ignore all broadcast acknowledges from the peer.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-11 22:42:12 -07:00
Soheil Hassas Yeganeh 779f1edec6 sock: ignore SCM_RIGHTS and SCM_CREDENTIALS in __sock_cmsg_send
Sergei Trofimovich reported that pulse audio sends SCM_CREDENTIALS
as a control message to TCP. Since __sock_cmsg_send does not
support SCM_RIGHTS and SCM_CREDENTIALS, it returns an error and
hence breaks pulse audio over TCP.

SCM_RIGHTS and SCM_CREDENTIALS are sent on the SOL_SOCKET layer
but they semantically belong to SOL_UNIX. Since all
cmsg-processing functions including sock_cmsg_send ignore control
messages of other layers, it is best to ignore SCM_RIGHTS
and SCM_CREDENTIALS for consistency (and also for fixing pulse
audio over TCP).

Fixes: c14ac9451c ("sock: enable timestamping using control messages")
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Reported-by: Sergei Trofimovich <slyfox@gentoo.org>
Tested-by: Sergei Trofimovich <slyfox@gentoo.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-11 14:32:44 -07:00
Julian Anastasov 80610229ef ipv4: reject RTNH_F_DEAD and RTNH_F_LINKDOWN from user space
Vegard Nossum is reporting for a crash in fib_dump_info
when nh_dev = NULL and fib_nhs == 1:

Pid: 50, comm: netlink.exe Not tainted 4.7.0-rc5+
RIP: 0033:[<00000000602b3d18>]
RSP: 0000000062623890  EFLAGS: 00010202
RAX: 0000000000000000 RBX: 000000006261b800 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000024 RDI: 000000006245ba00
RBP: 00000000626238f0 R08: 000000000000029c R09: 0000000000000000
R10: 0000000062468038 R11: 000000006245ba00 R12: 000000006245ba00
R13: 00000000625f96c0 R14: 00000000601e16f0 R15: 0000000000000000
Kernel panic - not syncing: Kernel mode fault at addr 0x2e0, ip 0x602b3d18
CPU: 0 PID: 50 Comm: netlink.exe Not tainted 4.7.0-rc5+ #581
Stack:
 626238f0 960226a02 00000400 000000fe
 62623910 600afca7 62623970 62623a48
 62468038 00000018 00000000 00000000
Call Trace:
 [<602b3e93>] rtmsg_fib+0xd3/0x190
 [<602b6680>] fib_table_insert+0x260/0x500
 [<602b0e5d>] inet_rtm_newroute+0x4d/0x60
 [<60250def>] rtnetlink_rcv_msg+0x8f/0x270
 [<60267079>] netlink_rcv_skb+0xc9/0xe0
 [<60250d4b>] rtnetlink_rcv+0x3b/0x50
 [<60265400>] netlink_unicast+0x1a0/0x2c0
 [<60265e47>] netlink_sendmsg+0x3f7/0x470
 [<6021dc9a>] sock_sendmsg+0x3a/0x90
 [<6021e0d0>] ___sys_sendmsg+0x300/0x360
 [<6021fa64>] __sys_sendmsg+0x54/0xa0
 [<6021fac0>] SyS_sendmsg+0x10/0x20
 [<6001ea68>] handle_syscall+0x88/0x90
 [<600295fd>] userspace+0x3fd/0x500
 [<6001ac55>] fork_handler+0x85/0x90

$ addr2line -e vmlinux -i 0x602b3d18
include/linux/inetdevice.h:222
net/ipv4/fib_semantics.c:1264

Problem happens when RTNH_F_LINKDOWN is provided from user space
when creating routes that do not use the flag, catched with
netlink fuzzer.

Currently, the kernel allows user space to set both flags
to nh_flags and fib_flags but this is not intentional, the
assumption was that they are not set. Fix this by rejecting
both flags with EINVAL.

Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Fixes: 0eeb075fad ("net: ipv4 sysctl option to ignore routes when nexthop link is down")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Cc: Andy Gospodarek <gospo@cumulusnetworks.com>
Cc: Dinesh Dutt <ddutt@cumulusnetworks.com>
Cc: Scott Feldman <sfeldma@gmail.com>
Reviewed-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-11 13:41:09 -07:00
Eric Dumazet 75ff39ccc1 tcp: make challenge acks less predictable
Yue Cao claims that current host rate limiting of challenge ACKS
(RFC 5961) could leak enough information to allow a patient attacker
to hijack TCP sessions. He will soon provide details in an academic
paper.

This patch increases the default limit from 100 to 1000, and adds
some randomization so that the attacker can no longer hijack
sessions without spending a considerable amount of probes.

Based on initial analysis and patch from Linus.

Note that we also have per socket rate limiting, so it is tempting
to remove the host limit in the future.

v2: randomize the count of challenge acks per second, not the period.

Fixes: 282f23c6ee ("tcp: implement RFC 5961 3.2")
Reported-by: Yue Cao <ycao009@ucr.edu>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-11 13:33:35 -07:00
Michal Kubeček a612769774 udp: prevent bugcheck if filter truncates packet too much
If socket filter truncates an udp packet below the length of UDP header
in udpv6_queue_rcv_skb() or udp_queue_rcv_skb(), it will trigger a
BUG_ON in skb_pull_rcsum(). This BUG_ON (and therefore a system crash if
kernel is configured that way) can be easily enforced by an unprivileged
user which was reported as CVE-2016-6162. For a reproducer, see
http://seclists.org/oss-sec/2016/q3/8

Fixes: e6afc8ace6 ("udp: remove headers from UDP packets before queueing")
Reported-by: Marco Grassi <marco.gra@gmail.com>
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-11 12:43:15 -07:00
David S. Miller 7d32eb8781 Here are a couple batman-adv bugfix patches, all by Sven Eckelmann:
- Fix possible NULL pointer dereference for vlan_insert_tag (two patches)
 
  - Fix reference handling in some features, which may lead to reference
    leaks or invalid memory access (four patches)
 
  - Fix speedy join: DHCP packets handled by the gateway feature should
    be sent with 4-address unicast instead of 3-address unicast to make
    speedy join work. This fixes/speeds up DHCP assignment for clients
    which join a mesh for the first time. (one patch)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCgAGBQJXf3SCAAoJEKEr45hCkp6hAaAQAJxKFavGbXHWvj1M1VxqVFkN
 AlxP7JZ6OHgnWxBT3drk4ZRaxIA7v/2VkRYrCbxoYjIENiyrmNz+93SAzaBcTKxE
 nnUntdDbQWYE3MOGC1lUBIoPgjvs4DQRejyq5dvG9CYEcK9hE4pDKV7FUfeBgmgL
 dG5+9ht8JEjMYZq48FQp4SQwkQGpWRiS4fekZEUmcO1pIQpx0uOYTMfMZ/HpqpCN
 im1QhUXlAGCBcOIJwztqVb/04LKcuTS8Du+b50BFF5uITmCZdK0NmG5yBH+1Nn8K
 uKYanY3dHYUE4eGw3NAqnJ0uSiMQFlhk3gqKgHY8uu/KoMiqZ3tjBkNp+3fF3KqH
 0AnXEPPsQPU8RJ5WAHH6TR/UNnoCrfqU6AjbIclHNq7l3WY6u0fD2uKHCGlaV13M
 8XolPWECum8iLEptmYDlhYZrh5D9kteGDV7kt3XtQY8Hpv/UE1Jh1/iGrhNjtbdX
 7P6NsZdi/cnkGPhIaRnoEQaWHZVmbO4Rl8Q2Yb3Ze2LEUuLdrkmBjTBKqiOFMnMe
 7ltA3JL7ip/alRPeNsuiHOY28uNaog3YuEHg8QYiyTs449Os/TjWoh9pzD44dhkB
 auIxmiy/IyVdYwlQwfBHDJupVK7WncUq+iF/rv3TfTmY25FO4FC+EV+PsBZdWsc+
 co+amJR57ZOAygd0GgU2
 =7Z04
 -----END PGP SIGNATURE-----

Merge tag 'batadv-net-for-davem-20160708' of git://git.open-mesh.org/linux-merge

Simon Wunderlich says:

====================
Here are a couple batman-adv bugfix patches, all by Sven Eckelmann:

 - Fix possible NULL pointer dereference for vlan_insert_tag (two patches)

 - Fix reference handling in some features, which may lead to reference
   leaks or invalid memory access (four patches)

 - Fix speedy join: DHCP packets handled by the gateway feature should
   be sent with 4-address unicast instead of 3-address unicast to make
   speedy join work. This fixes/speeds up DHCP assignment for clients
   which join a mesh for the first time. (one patch)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-11 12:28:44 -07:00
Pablo Neira Ayuso 4edfa9d0bf Merge tag 'ipvs-fixes2-for-v4.7' of https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs
Simon Horman says:

====================
Second Round of IPVS Fixes for v4.7

The fix from Quentin Armitage allows the backup sync daemon to
be bound to a link-local mcast IPv6 address as is already the case
for IPv4.
====================

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-07-11 11:58:33 +02:00
Eric Dumazet 95556a8838 dccp: avoid deadlock in dccp_v4_ctl_send_reset
In the prep work I did before enabling BH while handling socket backlog,
I missed two points in DCCP :

1) dccp_v4_ctl_send_reset() uses bh_lock_sock(), assuming BH were
blocked. It is not anymore always true.

2) dccp_v4_route_skb() was using __IP_INC_STATS() instead of
  IP_INC_STATS()

A similar fix was done for TCP, in commit 47dcc20a39
("ipv4: tcp: ip_send_unicast_reply() is not BH safe")

Fixes: 7309f8821f ("dccp: do not assume DCCP code is non preemptible")
Fixes: 5413d1babe ("net: do not block BH while processing socket backlog")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-09 18:14:17 -04:00
David S. Miller 5b58d83617 Two more fixes:
* handle allocation failures in new(ish) A-MSDU decapsulation
  * don't leak memory on nl80211 ACL parse errors
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCgAGBQJXfPjfAAoJEGt7eEactAAd9HMQAJ+KQipCHO+TE/b1tH40LFAO
 1lptZfe3BSM24nkk0mYTi0a2ylzd9hsiuqFXsVBB32B7TARR3/MOCo/DsYd66PdN
 guEg5l/blMOiOoSmdnehvssboZTRwofFebDxnRv8UFWhyFxhbe/xZLUgYjX8JHhD
 +NvzTG0bpTlN8n8e4IsGsVzQkG0G9ianiOir8xSgj1ahHug3S/phR8PDCkbeEYA/
 fgFk3bFhkOOoKbSDMAfeEN/Xc/k6IFkVS89ZI8QSTSrMZklvZqfFLLmxeRuxegk5
 x5VDfPiCXSBez/OoxjlZgHmipGhWZOA9o5S3JxlFl8JSBWhJeVNvW4C1uAEaLRzF
 BZSWynA/a9HecSjzdjswOvx9bTXVNp3D3QldmjuxchdBrmFtrBRMSH3xmARG5F1X
 TslEJlyj6YkjsEjBgpwjxaE0wW9sAhd3GQDj6rvyz3RoRO1lDsu50Z8RVZ1uDbgY
 Q2/nspgPDf1hqQolsOYvaYhYPj75ZcncFG7FqO3mV5wqrSh+n0eyntUz1HW23T/J
 tm7nvS911V+jLGtbow0yHNu3jRSPW8mVXpI7eL1oP6DKJyxjn5CR4MtOFL8BB/lp
 NVtRzI88IgloyMc52/3A5dMELQigIbC2VheRZKNRX7FVYJnqPck6SNhcOvH9emBZ
 Ji/SbZ8oZNba2hvyQmMj
 =pfmq
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-for-davem-2016-07-06' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
Two more fixes:
 * handle allocation failures in new(ish) A-MSDU decapsulation
 * don't leak memory on nl80211 ACL parse errors
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-08 23:53:41 -04:00
Florian Westphal c8607e0200 netfilter: nft_ct: fix expiration getter
We need to compute timeout.expires - jiffies, not the other way around.
Add a helper, another patch can then later change more places in
conntrack code where we currently open-code this.

Will allow us to only change one place later when we remove per-ct timer.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-07-08 14:55:14 +02:00
Quentin Armitage 3777ed688f ipvs: fix bind to link-local mcast IPv6 address in backup
When using HEAD from
https://git.kernel.org/cgit/utils/kernel/ipvsadm/ipvsadm.git/,
the command:
ipvsadm --start-daemon backup --mcast-interface eth0.60 \
    --mcast-group ff02::1:81
fails with the error message:
Argument list too long

whereas both:
ipvsadm --start-daemon master --mcast-interface eth0.60 \
    --mcast-group ff02::1:81
and:
ipvsadm --start-daemon backup --mcast-interface eth0.60 \
    --mcast-group 224.0.0.81
are successful.

The error message "Argument list too long" isn't helpful. The error occurs
because an IPv6 address is given in backup mode.

The error is in make_receive_sock() in net/netfilter/ipvs/ip_vs_sync.c,
since it fails to set the interface on the address or the socket before
calling inet6_bind() (via sock->ops->bind), where the test
'if (!sk->sk_bound_dev_if)' failed.

Setting sock->sk->sk_bound_dev_if on the socket before calling
inet6_bind() resolves the issue.

Fixes: d33288172e ("ipvs: add more mcast parameters for the sync daemon")
Signed-off-by: Quentin Armitage <quentin@armitage.org.uk>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2016-07-07 20:21:32 +02:00
Sven Eckelmann d1fe176ca5 batman-adv: Fix speedy join in gateway client mode
Speedy join only works when the received packet is either broadcast or an
4addr unicast packet. Thus packets converted from broadcast to unicast via
the gateway handling code have to be converted to 4addr packets to allow
the receiving gateway server to add the sender address as temporary entry
to the translation table.

Not doing it will make the batman-adv gateway server drop the DHCP response
in many situations because it doesn't yet have the TT entry for the
destination of the DHCP response.

Fixes: 371351731e ("batman-adv: change interface_rx to get orig node")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Acked-by: Antonio Quartulli <a@unstable.cc>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2016-07-06 16:03:40 +02:00
Gregory Greenman 16a910a672 cfg80211: handle failed skb allocation
Handle the case when dev_alloc_skb returns NULL.

Cc: stable@vger.kernel.org
Fixes: 2b67f944f8 ("cfg80211: reuse existing page fragments in A-MSDU rx")
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
2016-07-06 13:52:18 +02:00
Purushottam Kushwaha 6e8ef84222 nl80211: Move ACL parsing later to avoid a possible memory leak
No support for pbss results in a memory leak for the acl_data
(if parse_acl_data succeeds). Fix this by moving the ACL parsing later.

Cc: stable@vger.kernel.org
Fixes: 34d505193b ("cfg80211: basic support for PBSS network type")
Signed-off-by: Purushottam Kushwaha <pkushwah@qti.qualcomm.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
2016-07-06 13:09:02 +02:00
Martin KaFai Lau 903ce4abdf ipv6: Fix mem leak in rt6i_pcpu
It was first reported and reproduced by Petr (thanks!) in
https://bugzilla.kernel.org/show_bug.cgi?id=119581

free_percpu(rt->rt6i_pcpu) used to always happen in ip6_dst_destroy().

However, after fixing a deadlock bug in
commit 9c7370a166 ("ipv6: Fix a potential deadlock when creating pcpu rt"),
free_percpu() is not called before setting non_pcpu_rt->rt6i_pcpu to NULL.

It is worth to note that rt6i_pcpu is protected by table->tb6_lock.

kmemleak somehow did not report it.  We nailed it down by
observing the pcpu entries in /proc/vmallocinfo (first suggested
by Hannes, thanks!).

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Fixes: 9c7370a166 ("ipv6: Fix a potential deadlock when creating pcpu rt")
Reported-by: Petr Novopashenniy <pety@rusnet.ru>
Tested-by: Petr Novopashenniy <pety@rusnet.ru>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Petr Novopashenniy <pety@rusnet.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-05 14:09:23 -07:00
Vegard Nossum ab58298cf4 net: fix decnet rtnexthop parsing
dn_fib_count_nhs() could enter an infinite loop if nhp->rtnh_len == 0
(i.e. if userspace passes a malformed netlink message).

Let's use the helpers from net/nexthop.h which take care of all this
stuff. We can do exactly the same as e.g. fib_count_nexthops() and
fib_get_nhs() from net/ipv4/fib_semantics.c.

This fixes the softlockup for me.

Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-05 14:08:47 -07:00
Sven Eckelmann cbef1e1020 batman-adv: Free last_bonding_candidate on release of orig_node
The orig_ifinfo reference counter for last_bonding_candidate in
batadv_orig_node has to be reduced when an originator node is released.
Otherwise the orig_ifinfo is leaked and the reference counter the netdevice
is not reduced correctly.

Fixes: f3b3d90189 ("batman-adv: add bonding again")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2016-07-05 12:43:58 +02:00
Sven Eckelmann 15c2ed753c batman-adv: Fix reference leak in batadv_find_router
The replacement of last_bonding_candidate in batadv_orig_node has to be an
atomic operation. Otherwise it is possible that the reference counter of a
batadv_orig_ifinfo is reduced which was no longer the
last_bonding_candidate when the new candidate is added. This can either
lead to an invalid memory access or to reference leaks which make it
impossible to an interface which was added to batman-adv.

Fixes: f3b3d90189 ("batman-adv: add bonding again")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2016-07-05 12:43:52 +02:00
Sven Eckelmann 3db0decf11 batman-adv: Fix non-atomic bla_claim::backbone_gw access
The pointer batadv_bla_claim::backbone_gw can be changed at any time.
Therefore, access to it must be protected to ensure that two function
accessing the same backbone_gw are actually accessing the same. This is
especially important when the crc_lock is used or when the backbone_gw of a
claim is exchanged.

Not doing so leads to invalid memory access and/or reference leaks.

Fixes: 23721387c4 ("batman-adv: add basic bridge loop avoidance code")
Fixes: 5a1dd8a477 ("batman-adv: lock crc access in bridge loop avoidance")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2016-07-05 12:43:21 +02:00
Sven Eckelmann 33fbb1f3db batman-adv: Fix orig_node_vlan leak on orig_node_release
batadv_orig_node_new uses batadv_orig_node_vlan_new to allocate a new
batadv_orig_node_vlan and add it to batadv_orig_node::vlan_list. References
to this list have also to be cleaned when the batadv_orig_node is removed.

Fixes: 7ea7b4a142 ("batman-adv: make the TT CRC logic VLAN specific")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2016-07-05 12:43:10 +02:00
Sven Eckelmann 60154a1e04 batman-adv: Avoid nullptr dereference in dat after vlan_insert_tag
vlan_insert_tag can return NULL on errors. The distributed arp table code
therefore has to check the return value of vlan_insert_tag for NULL before
it can safely operate on this pointer.

Fixes: be1db4f661 ("batman-adv: make the Distributed ARP Table vlan aware")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2016-07-05 12:40:01 +02:00
Sven Eckelmann 10c78f5854 batman-adv: Avoid nullptr dereference in bla after vlan_insert_tag
vlan_insert_tag can return NULL on errors. The bridge loop avoidance code
therefore has to check the return value of vlan_insert_tag for NULL before
it can safely operate on this pointer.

Fixes: 23721387c4 ("batman-adv: add basic bridge loop avoidance code")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2016-07-05 12:40:01 +02:00
Vegard Nossum 3dad5424ad RDS: fix rds_tcp_init() error path
If register_pernet_subsys() fails, we shouldn't try to call
unregister_pernet_subsys().

Fixes: 467fa15356 ("RDS-TCP: Support multiple RDS-TCP listen endpoints, one per netns.")
Cc: stable@vger.kernel.org
Cc: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-04 16:09:49 -07:00
Richard Alpe 55e77a3e82 tipc: fix nl compat regression for link statistics
Fix incorrect use of nla_strlcpy() where the first NLA_HDRLEN bytes
of the link name where left out.

Making the output of tipc-config -ls look something like:
Link statistics:
dcast-link
1:data0-1.1.2:data0
1:data0-1.1.3:data0

Also, for the record, the patch that introduce this regression
claims "Sending the whole object out can cause a leak". Which isn't
very likely as this is a compat layer, where the data we are parsing
is generated by us and we know the string to be NULL terminated. But
you can of course never be to secure.

Fixes: 5d2be1422e (tipc: fix an infoleak in tipc_nl_compat_link_dump)
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01 16:47:38 -04:00
WANG Cong 82a31b9231 net_sched: fix mirrored packets checksum
Similar to commit 9b368814b3 ("net: fix bridge multicast packet checksum validation")
we need to fixup the checksum for CHECKSUM_COMPLETE when
pushing skb on RX path. Otherwise we get similar splats.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01 16:19:34 -04:00
David S. Miller eb70db8756 packet: Use symmetric hash for PACKET_FANOUT_HASH.
People who use PACKET_FANOUT_HASH want a symmetric hash, meaning that
they want packets going in both directions on a flow to hash to the
same bucket.

The core kernel SKB hash became non-symmetric when the ipv6 flow label
and other entities were incorporated into the standard flow hash order
to increase entropy.

But there are no users of PACKET_FANOUT_HASH who want an assymetric
hash, they all want a symmetric one.

Therefore, use the flow dissector to compute a flat symmetric hash
over only the protocol, addresses and ports.  This hash does not get
installed into and override the normal skb hash, so this change has
no effect whatsoever on the rest of the stack.

Reported-by: Eric Leblond <eric@regit.org>
Tested-by: Eric Leblond <eric@regit.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01 16:07:50 -04:00
Florian Westphal 9cc1c73ad6 netfilter: conntrack: avoid integer overflow when resizing
Can overflow so we might allocate very small table when bucket count is
high on a 32bit platform.

Note: resize is only possible from init_netns.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-07-01 16:02:33 +02:00
Shmulik Ladkani fedbb6b4ff ipv4: Fix ip_skb_dst_mtu to use the sk passed by ip_finish_output
ip_skb_dst_mtu uses skb->sk, assuming it is an AF_INET socket (e.g. it
calls ip_sk_use_pmtu which casts sk as an inet_sk).

However, in the case of UDP tunneling, the skb->sk is not necessarily an
inet socket (could be AF_PACKET socket, or AF_UNSPEC if arriving from
tun/tap).

OTOH, the sk passed as an argument throughout IP stack's output path is
the one which is of PMTU interest:
 - In case of local sockets, sk is same as skb->sk;
 - In case of a udp tunnel, sk is the tunneling socket.

Fix, by passing ip_finish_output's sk to ip_skb_dst_mtu.
This augments 7026b1ddb6 'netfilter: Pass socket pointer down through okfn().'

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-30 09:02:48 -04:00
Linus Torvalds 32826ac41f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "I've been traveling so this accumulates more than week or so of bug
  fixing.  It perhaps looks a little worse than it really is.

   1) Fix deadlock in ath10k driver, from Ben Greear.

   2) Increase scan timeout in iwlwifi, from Luca Coelho.

   3) Unbreak STP by properly reinjecting STP packets back into the
      stack.  Regression fix from Ido Schimmel.

   4) Mediatek driver fixes (missing malloc failure checks, leaking of
      scratch memory, wrong indexing when mapping TX buffers, etc.) from
      John Crispin.

   5) Fix endianness bug in icmpv6_err() handler, from Hannes Frederic
      Sowa.

   6) Fix hashing of flows in UDP in the ruseport case, from Xuemin Su.

   7) Fix netlink notifications in ovs for tunnels, delete link messages
      are never emitted because of how the device registry state is
      handled.  From Nicolas Dichtel.

   8) Conntrack module leaks kmemcache on unload, from Florian Westphal.

   9) Prevent endless jump loops in nft rules, from Liping Zhang and
      Pablo Neira Ayuso.

  10) Not early enough spinlock initialization in mlx4, from Eric
      Dumazet.

  11) Bind refcount leak in act_ipt, from Cong WANG.

  12) Missing RCU locking in HTB scheduler, from Florian Westphal.

  13) Several small MACSEC bug fixes from Sabrina Dubroca (missing RCU
      barrier, using heap for SG and IV, and erroneous use of async flag
      when allocating AEAD conext.)

  14) RCU handling fix in TIPC, from Ying Xue.

  15) Pass correct protocol down into ipv4_{update_pmtu,redirect}() in
      SIT driver, from Simon Horman.

  16) Socket timer deadlock fix in TIPC from Jon Paul Maloy.

  17) Fix potential deadlock in team enslave, from Ido Schimmel.

  18) Memory leak in KCM procfs handling, from Jiri Slaby.

  19) ESN generation fix in ipv4 ESP, from Herbert Xu.

  20) Fix GFP_KERNEL allocations with locks held in act_ife, from Cong
      WANG.

  21) Use after free in netem, from Eric Dumazet.

  22) Uninitialized last assert time in multicast router code, from Tom
      Goff.

  23) Skip raw sockets in sock_diag destruction broadcast, from Willem
      de Bruijn.

  24) Fix link status reporting in thunderx, from Sunil Goutham.

  25) Limit resegmentation of retransmit queue so that we do not
      retransmit too large GSO frames.  From Eric Dumazet.

  26) Delay bpf program release after grace period, from Daniel
      Borkmann"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (141 commits)
  openvswitch: fix conntrack netlink event delivery
  qed: Protect the doorbell BAR with the write barriers.
  neigh: Explicitly declare RCU-bh read side critical section in neigh_xmit()
  e1000e: keep VLAN interfaces functional after rxvlan off
  cfg80211: fix proto in ieee80211_data_to_8023 for frames without LLC header
  qlcnic: use the correct ring in qlcnic_83xx_process_rcv_ring_diag()
  bpf, perf: delay release of BPF prog after grace period
  net: bridge: fix vlan stats continue counter
  tcp: do not send too big packets at retransmit time
  ibmvnic: fix to use list_for_each_safe() when delete items
  net: thunderx: Fix TL4 configuration for secondary Qsets
  net: thunderx: Fix link status reporting
  net/mlx5e: Reorganize ethtool statistics
  net/mlx5e: Fix number of PFC counters reported to ethtool
  net/mlx5e: Prevent adding the same vxlan port
  net/mlx5e: Check for BlueFlame capability before allocating SQ uar
  net/mlx5e: Change enum to better reflect usage
  net/mlx5: Add ConnectX-5 PCIe 4.0 to list of supported devices
  net/mlx5: Update command strings
  net: marvell: Add separate config ANEG function for Marvell 88E1111
  ...
2016-06-29 11:50:42 -07:00
David S. Miller 751ad819b0 Just two small fixes
* fix mesh peer link counter, decrement wasn't always done at all
  * fix ethertype (length) for packets without RFC 1042 or bridge
    tunnel header
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCgAGBQJXc5wQAAoJEGt7eEactAAd2HQP/RS3lZaNn97spZp9Vb9Swqd6
 UjhT4vHL2wHPMhPals2M2ztRhL8df/55tOhSuCwUki0gTsAfP4+Jz1PZ4okFUSBn
 6EHvRtAgSAzjBElZVEsNO2sM/qm0UOnlqm4XpOXcKzpnRzDSlIa6vpnctp2MJMBI
 PB46x8BPQoi192aTzleMqgBwjrMX1ohsM6rgOIEf7rvLhftx6LRRm7/uKaiwenVT
 ZxkP6jIOs5ofkUHLHS4OPyId4yReO8oMMbRRA4fW1z2T8hME9PH3jTZL3Znaj7VF
 s2LLSqwUlv06k0WFu3SzJh750jCJ6cnJiIqggDlPChrb0/DPicqgWR7/tJAvNVfj
 WJqugeNtEVI+4ElGNxFzFAvjAKqO8fHY5U8Ko4LtZuwhjpo5GDvA+FErFESRXvCG
 c647cDMhn+6OXwpUwsggQK4c+AS4QX/PzHjmhW5tKORPXlYVXAAz+JDXn195WnzV
 w4UAO4ZWai9XY6oSc63uudaHMt8Xkmq8PRQsH1hHG20CAJ+7bGugaWVTci5fzeNh
 JC4X+Xh6t+5qq1jEWgr+KguVKXtP3pVetqGy9kYQrJov0fTe2Z6deEeY08sH1Ffz
 xdnQUsZphvqSkosTw76JtNPXz2ajxZCD3KNWjcpkL/NNs9lnIZPZc/mnzKP7ltro
 ZEYw0HbcXVt/uFhHlwdG
 =dAbs
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-for-davem-2016-06-29-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
Just two small fixes
 * fix mesh peer link counter, decrement wasn't always done at all
 * fix ethertype (length) for packets without RFC 1042 or bridge
   tunnel header
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-29 08:33:46 -04:00
Samuel Gauthier d913d3a763 openvswitch: fix conntrack netlink event delivery
Only the first and last netlink message for a particular conntrack are
actually sent. The first message is sent through nf_conntrack_confirm when
the conntrack is committed. The last one is sent when the conntrack is
destroyed on timeout. The other conntrack state change messages are not
advertised.

When the conntrack subsystem is used from netfilter, nf_conntrack_confirm
is called for each packet, from the postrouting hook, which in turn calls
nf_ct_deliver_cached_events to send the state change netlink messages.

This commit fixes the problem by calling nf_ct_deliver_cached_events in the
non-commit case as well.

Fixes: 7f8a436eaa ("openvswitch: Add conntrack action")
CC: Joe Stringer <joestringer@nicira.com>
CC: Justin Pettit <jpettit@nicira.com>
CC: Andy Zhou <azhou@nicira.com>
CC: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Samuel Gauthier <samuel.gauthier@6wind.com>
Acked-by: Joe Stringer <joe@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-29 08:13:59 -04:00
David Barroso b560f03ddf neigh: Explicitly declare RCU-bh read side critical section in neigh_xmit()
neigh_xmit() expects to be called inside an RCU-bh read side critical
section, and while one of its two current callers gets this right, the
other one doesn't.

More specifically, neigh_xmit() has two callers, mpls_forward() and
mpls_output(), and while both callers call neigh_xmit() under
rcu_read_lock(), this provides sufficient protection for neigh_xmit()
only in the case of mpls_forward(), as that is always called from
softirq context and therefore doesn't need explicit BH protection,
while mpls_output() can be called from process context with softirqs
enabled.

When mpls_output() is called from process context, with softirqs
enabled, we can be preempted by a softirq at any time, and RCU-bh
considers the completion of a softirq as signaling the end of any
pending read-side critical sections, so if we do get a softirq
while we are in the part of neigh_xmit() that expects to be run inside
an RCU-bh read side critical section, we can end up with an unexpected
RCU grace period running right in the middle of that critical section,
making things go boom.

This patch fixes this impedance mismatch in the callee, by making
neigh_xmit() always take rcu_read_{,un}lock_bh() around the code that
expects to be treated as an RCU-bh read side critical section, as this
seems a safer option than fixing it in the callers.

Fixes: 4fd3d7d9e8 ("neigh: Add helper function neigh_xmit")
Signed-off-by: David Barroso <dbarroso@fastly.com>
Signed-off-by: Lennert Buytenhek <lbuytenhek@fastly.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Robert Shearman <rshearma@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-29 07:58:28 -04:00
Felix Fietkau c041778c96 cfg80211: fix proto in ieee80211_data_to_8023 for frames without LLC header
The PDU length of incoming LLC frames is set to the total skb payload size
in __ieee80211_data_to_8023() of net/wireless/util.c which incorrectly
includes the length of the IEEE 802.11 header.

The resulting LLC frame header has a too large PDU length, causing the
llc_fixup_skb() function of net/llc/llc_input.c to reject the incoming
skb, effectively breaking STP.

Solve the problem by properly substracting the IEEE 802.11 frame header size
from the PDU length, allowing the LLC processor to pick up the incoming
control messages.

Special thanks to Gerry Rozema for tracking down the regression and proposing
a suitable patch.

Fixes: 2d1c304cb2 ("cfg80211: add function for 802.3 conversion with separate output buffer")
Cc: stable@vger.kernel.org
Reported-by: Gerry Rozema <gerryr@rozeware.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
2016-06-29 11:50:33 +02:00
Nikolay Aleksandrov 565ce8f32a net: bridge: fix vlan stats continue counter
I made a dumb off-by-one mistake when I added the vlan stats counter
dumping code. The increment should happen before the check, not after
otherwise we miss one entry when we continue dumping.

Fixes: a60c090361 ("bridge: netlink: export per-vlan stats")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-29 05:33:35 -04:00
Eric Dumazet a3d2e9f8eb tcp: do not send too big packets at retransmit time
Arjun reported a bug in TCP stack and bisected it to a recent commit.

In case where we process SACK, we can coalesce multiple skbs
into fat ones (tcp_shift_skb_data()), to lower write queue
overhead, because we do not expect to retransmit these packets.

However, SACK reneging can happen, forcing the sender to retransmit
all these packets. If skb->len is above 64KB, we then send buggy
IP packets that could hang TSO engine on cxgb4.

Neal suggested to use tcp_tso_autosize() instead of tp->gso_segs
so that we cook packets of optimal size vs TCP/pacing.

Thanks to Arjun for reporting the bug and running the tests !

Fixes: 10d3be5692 ("tcp-tso: do not split TSO packets at retransmit time")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Arjun V <arjun@chelsio.com>
Tested-by: Arjun V <arjun@chelsio.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-29 05:25:11 -04:00
Sven Eckelmann 420cb1b764 batman-adv: Clean up untagged vlan when destroying via rtnl-link
The untagged vlan object is only destroyed when the interface is removed
via the legacy sysfs interface. But it also has to be destroyed when the
standard rtnl-link interface is used.

Fixes: 5d2c05b213 ("batman-adv: add per VLAN interface attribute framework")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Acked-by: Antonio Quartulli <a@unstable.cc>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-29 04:01:48 -04:00