I broke them in this commit:
commit 1be374a051
Author: Andy Lutomirski <luto@amacapital.net>
Date: Wed May 22 14:07:44 2013 -0700
net: Block MSG_CMSG_COMPAT in send(m)msg and recv(m)msg
This patch adds __sys_sendmsg and __sys_sendmsg as common helpers that accept
MSG_CMSG_COMPAT and blocks MSG_CMSG_COMPAT at the syscall entrypoints. It
also reverts some unnecessary checks in sys_socketcall.
Apparently I was suffering from underscore blindness the first time around.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Tested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
1) Fix timeouts with direct mode authentication in mac80211, from
Stanislaw Gruszka.
2) Aggregation sessions can deadlock in ath9k, from Felix Fietkau.
3) Netfilter's xt_addrtype doesn't work with ipv6 due to route lookups
creating undesirable cache entries, from Florian Westphal.
4) Fix netfilter's ipt_ULOG from generating non-NULL terminated
strings.
5) Fix netdev transmit queue crashes in mac80211, from Johannes Berg.
6) Fix copy and paste error in 802.11 stack that broke reporting of
64-bit station tx statistics, from Felix Fietkau.
7) When qlge_probe fails, it leaks the netdev. Fix from Wei Yongjun.
8) SKB control block (where we store the IP options information,
amongst other things) must be cleared properly otherwise ICMP
sending can crash for IP tunnels. Fix from Eric Dumazet.
9) Verification of Energy Efficient Ether support was coded wrongly,
the test was inversed. Fix from Giuseppe CAVALLARO.
10) TCP handles redirects improperly because the wrong flow key is used
for the route lookup. From Michal Kubecek.
11) Don't interpret MSG_CMSG_COMPAT from userspace, fix from Andy
Lutomirski.
12) The new AF_VSOCK was missing from the lockdep string table, fix from
Federico Vaga.
13) be2net doesn't handle checksumming of IP fragments properly, from
Somnath Kotur.
14) Fix several bugs in the device address list code that lead to
crashes and other misbehaviors. From Jay Vosburgh.
15) Fix ipv6 segmentation handling of fragmented GRE tunnel traffic,
from Pravin B Shalr.
16) Fix usage of stale policies in IPSEC layer, from Paul Moore.
17) Fix team driver dump of ports when there are a large number of them,
from Jiri Pirko.
18) Fix softlockups in UDP ipv4 socket lookup causes by and error in the
hlist_nulls_for_each_entry_rcu() macro. From Eric Dumazet.
19) Fix several regressions added by the high rate accuracy changes to
the htb packet scheduler. From Eric Dumazet.
20) Fix DMA'ing onto the stack in esd_usb2 and peak_usb CAN drivers,
from Olivier Sobrie and Marc Kleine-Budde.
21) Fix unremovable network devices due to missing route pointer
installation in the per-device ipv6 address list entries. From Gao
feng.
22) Apply the tg3 5719 DMA workaround on 5720 chips as well, otherwise
we get stalls. From Nithin Sujir.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (68 commits)
net_sched: htb: do not mix 1ns and 64ns time units
net: fix sk_buff head without data area
tg3: Add read dma workaround for 5720
net: ethernet: xilinx_emaclite: set protocol selector bits when writing ANAR
bnx2x: Fix bridged GSO for 57710/57711 chips
net: fec: add fallback to random MAC address
bnx2x: fix TCP offload for tunneling ipv4 over ipv6
ipv6: assign rt6_info to inet6_ifaddr in init_loopback
net/mlx4_core: Keep VF assigned MAC in the PF admin table
net/mlx4_en: Handle unassigned VF MAC address correctly
net/mlx4_core: Return -EPROBE_DEFER when a VF is probed before PF is sufficiently initialized
net/mlx4_en: Fix adaptive moderation cq update
net: can: peak_usb: Do not do dma on the stack
net: can: esd_usb2: Do not do dma on the stack
net: can: kvaser_usb: fix reception on "USBcan Pro" and "USBcan R" type hardware.
net_sched: restore "overhead xxx" handling
net: force a reload of first item in hlist_nulls_for_each_entry_rcu
hyperv: Fix vlan_proto setting in netvsc_recv_callback()
team: fix port list dump for big number of ports
list: introduce list_first_entry_or_null
...
commit 56b765b79 ("htb: improved accuracy at high rates") added another
regression for low rates, because it mixes 1ns and 64ns time units.
So the maximum delay (mbuffer) was not 60 second, but 937 ms.
Lets convert all time fields to 1ns as 64bit arches are becoming the
norm.
Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet spotted that we have to check skb->head instead
of skb->data as skb->head points to the beginning of the
data area of the skbuff. Similarly, we have to initialize the
skb->head pointer, not skb->data in __alloc_skb_head.
After this fix, netlink crashes in the release path of the
sk_buff, so let's fix that as well.
This bug was introduced in (0ebd0ac net: add function to
allocate sk_buff head without data area).
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 25fb6ca4ed
"net IPv6 : Fix broken IPv6 routing table after loopback down-up"
forgot to assign rt6_info to the inet6_ifaddr.
When disable the net device, the rt6_info which allocated
in init_loopback will not be destroied in __ipv6_ifa_notify.
This will trigger the waring message below
[23527.916091] unregister_netdevice: waiting for tap0 to become free. Usage count = 1
Reported-by: Arkadiusz Miskiewicz <a.miskiewicz@gmail.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 56b765b79 ("htb: improved accuracy at high rates")
broke the "overhead xxx" handling, as well as the "linklayer atm"
attribute.
tc class add ... htb rate X ceil Y linklayer atm overhead 10
This patch restores the "overhead xxx" handling, for htb, tbf
and act_police
The "linklayer atm" thing needs a separate fix.
Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Vimalkumar <j.vimal@gmail.com>
Cc: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In some cases after deleting a policy from the SPD the policy would
remain in the dst/flow/route cache for an extended period of time
which caused problems for SELinux as its dynamic network access
controls key off of the number of XFRM policy and state entries.
This patch corrects this problem by forcing a XFRM garbage collection
whenever a policy is sucessfully removed.
Reported-by: Ondrej Moris <omoris@redhat.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
udp6 over GRE tunnel does not work after to GRE tso changes. GRE
tso handler passes inner packet but keeps track of outer header
start in SKB_GSO_CB(skb)->mac_offset. udp6 fragment need to
take care of outer header, which start at the mac_offset, while
adding fragment header.
This bug is introduced by commit 68c3316311 (GRE: Add TCP
segmentation offload for GRE).
Reported-by: Dmitry Kravkov <dkravkov@gmail.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Tested-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The dev_mc_sync_multiple function is currently calling
__hw_addr_sync, and not __hw_addr_sync_multiple. This will result in
addresses only being synced to the first device from the set.
Corrected by calling the _multiple variant.
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Reviewed-by: Vlad Yasevich <vyasevic@redhat.com>
Tested-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, __hw_addr_sync_one is called in a loop by
__hw_addr_sync_multiple to sync each of a "from" device's hw addresses
to a "to" device. __hw_addr_sync_one calls __hw_addr_add_ex to attempt
to add each address. __hw_addr_add_ex is called with global=false, and
sync=true.
__hw_addr_add_ex checks to see if the new address matches an
address already on the list. If so, it tests global and sync. In this
case, sync=true, and it then checks if the address is already synced,
and if so, returns 0.
This 0 return causes __hw_addr_sync_one to increment the sync_cnt
and refcount for the "from" list's address entry, even though the address
is already synced and has a reference and sync_cnt. This will cause
the sync_cnt and refcount to increment without bound every time an
addresses is added to the "from" device and synced to the "to" device.
The fix here has two parts:
First, when __hw_addr_add_ex finds the address already exists
and is synced, return -EEXIST instead of 0.
Second, __hw_addr_sync_one checks the error return for -EEXIST,
and if so, it (a) does not add a refcount/sync_cnt, and (b) returns 0
itself so that __hw_addr_sync_multiple will not return an error.
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Reviewed-by: Vlad Yasevich <vyasevic@redhat.com>
Tested-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When an address is added to a subordinate interface (the "to"
list), the address entry in the "from" list is not marked "synced" as
the entry added to the "to" list is.
When performing the unsync operation (e.g., dev_mc_unsync),
__hw_addr_unsync_one calls __hw_addr_del_entry with the "synced"
parameter set to true for the case when the address reference is being
released from the "from" list. This causes a test inside to fail,
with the result being that the reference count on the "from" address
is not properly decremeted and the address on the "from" list will
never be freed.
Correct this by having __hw_addr_unsync_one call the
__hw_addr_del_entry function with the "sync" flag set to false for the
"remove from the from list" case.
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Reviewed-by: Vlad Yasevich <vyasevic@redhat.com>
Tested-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The sync_cnt field is not being initialized, which can result
in arbitrary values in the field. Fixed by initializing it to zero.
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Reviewed-by: Vlad Yasevich <vyasevic@redhat.com>
Tested-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This stat is not relevant in IPv6, there is no checksum in IPv6 header.
Just leave a comment to explain the hole.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John W. Linville says:
====================
Please pull this batch of fixes intended for the 3.10 stream...
Regarding the NFC bits, Samuel says:
"This is the first batch of NFC fixes for 3.10, and it contains:
- 3 fixes for the NFC MEI support:
* We now depend on the correct Kconfig symbol.
* We register an MEI event callback whenever we enable an NFC device,
otherwise we fail to read anything after an enable/disable cycle.
* We only disable an MEI device from its disable mey_phy_ops,
preventing useless consecutive disable calls.
- An NFC Makefile cleanup, as I forgot to remove a commented out line when
moving the LLCP code to the NFC top level directory."
As for the mac80211 bits, Johannes says:
"This time I have a fix from Stanislaw for a stupid mistake I made in the
auth/assoc timeout changes, a fix from Felix for 64-bit traffic counters
and one from Helmut for address mask handling in mac80211. I also have a
few fixes myself for four different crashes reported by a few people."
And Johannes says this about the iwlwifi bit:
"This fixes a brown paper-bag bug that we really should've caught in
review. More details in the changelog for the fix."
On top of that...
Arend van Spriel and Hante Meuleman cooperate to send a series of AP
and P2P mode fixes for brcmfmac.
Gabor Juhos corrects a register offset for AR9550, avoiding a bus
error.
Dan Carpenter provides a fixup to some dmesg output in the atmel
driver.
And, finally...
Felix Fietkau not only gives us a trio of small AR934x fixes, but
also refactors the ath9k aggregation session start/stop handling
(using the generic mac80211 support) in order to avoid a deadlock.
Please let me know if there are problems!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull nfsd fixes from Bruce Fields:
"A couple minor fixes for the (new to 3.10) gss-proxy code.
And one regression from user-namespace changes. (XBMC clients were
doing something admittedly weird--sending -1 gid's--but something that
we used to allow.)"
* 'for-3.10' of git://linux-nfs.org/~bfields/linux:
svcrpc: fix failures to handle -1 uid's and gid's
svcrpc: implement O_NONBLOCK behavior for use-gss-proxy
svcauth_gss: fix error code in use_gss_proxy()
Pablo Neira Ayuso says:
====================
The following patchset contains Netfilter/IPVS fixes for 3.10-rc3,
they are:
* fix xt_addrtype with IPv6, from Florian Westphal. This required
a new hook for IPv6 functions in the netfilter core to avoid
hard dependencies with the ipv6 subsystem when this match is
only used for IPv4.
* fix connection reuse case in IPVS. Currently, if an reused
connection are directed to the same server. If that server is
down, those connection would fail. Therefore, clear the
connection and choose a new server among the available ones.
* fix possible non-nul terminated string sent to user-space if
ipt_ULOG is used as the default netfilter logging stub, from
Chen Gang.
* fix mark logging of IPv6 packets in xt_LOG, from Michal Kubecek.
This bug has been there since 2.6.26.
* Fix breakage ip_vs_sh due to incorrect structure layout for
RCU, from Jan Beulich.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
kfree_rcu() requires offsetof(..., rcu_head) < 4096, which can
get violated with a sufficiently high CONFIG_IP_VS_SH_TAB_BITS.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
As of f025adf191 "sunrpc: Properly decode
kuids and kgids in RPC_AUTH_UNIX credentials" any rpc containing a -1
(0xffff) uid or gid would fail with a badcred error.
Reported symptoms were xmbc clients failing on upgrade of the NFS
server; examination of the network trace showed them sending -1 as the
gid.
Reported-by: Julian Sikorski <belegdol@gmail.com>
Tested-by: Julian Sikorski <belegdol@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
In dump_ipv6_packet(), the "recurse" parameter is zero only if
dumping contents of a packet embedded into an ICMPv6 error
message. Therefore we want to log packet mark if recurse is
non-zero, not when it is zero.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The three arrays of strings: af_family_key_strings,
af_family_slock_key_strings and af_family_clock_key_strings have not
VSOCK's string
Signed-off-by: Federico Vaga <federico.vaga@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To: linux-kernel@vger.kernel.org
Cc: x86@kernel.org, trinity@vger.kernel.org, Andy Lutomirski <luto@amacapital.net>, netdev@vger.kernel.org, "David S.
Miller" <davem@davemloft.net>
Subject: [PATCH 5/5] net: Block MSG_CMSG_COMPAT in send(m)msg and recv(m)msg
MSG_CMSG_COMPAT is (AFAIK) not intended to be part of the API --
it's a hack that steals a bit to indicate to other networking code
that a compat entry was used. So don't allow it from a non-compat
syscall.
This prevents an oops when running this code:
int main()
{
int s;
struct sockaddr_in addr;
struct msghdr *hdr;
char *highpage = mmap((void*)(TASK_SIZE_MAX - 4096), 4096,
PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
if (highpage == MAP_FAILED)
err(1, "mmap");
s = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
if (s == -1)
err(1, "socket");
addr.sin_family = AF_INET;
addr.sin_port = htons(1);
addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
if (connect(s, (struct sockaddr*)&addr, sizeof(addr)) != 0)
err(1, "connect");
void *evil = highpage + 4096 - COMPAT_MSGHDR_SIZE;
printf("Evil address is %p\n", evil);
if (syscall(__NR_sendmmsg, s, evil, 1, MSG_CMSG_COMPAT) < 0)
err(1, "sendmmsg");
return 0;
}
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Somebody noticed LTP was complaining about O_NONBLOCK opens of
/proc/net/rpc/use-gss-proxy succeeding and then a following read
hanging.
I'm not convinced LTP really has any business opening random proc files
and expecting them to behave a certain way. Maybe this isn't really a
bug.
But in any case the O_NONBLOCK behavior could be useful for someone that
wants to test whether gss-proxy is up without waiting.
Reported-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Unlike ipv4_redirect() and ipv4_sk_redirect(), ip_do_redirect()
doesn't call __build_flow_key() directly but via
ip_rt_build_flow_key() wrapper. This leads to __build_flow_key()
getting pointer to IPv4 header of the ICMP redirect packet
rather than pointer to the embedded IPv4 header of the packet
initiating the redirect.
As a result, handling of ICMP redirects initiated by TCP packets
is broken. Issue was introduced by
4895c771c ("ipv4: Add FIB nexthop exceptions.")
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Expire cached connection for new TCP/SCTP connection if real
server is down. Otherwise, IPVS uses the dead server for the
reused connection, instead of a new working one.
Signed-off-by: Grzegorz Lyczba <grzegorz.lyczba@gmail.com>
Acked-by: Hans Schillstrom <hans@schillstrom.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
When changing the MAC address of a single vif mac80211 will check if
the new address fits into the address mask specified by the driver.
This only needs to be done when using multiple BSSIDs. Hence, check
the new address only against all other vifs.
Also fix the MAC address assignment on new interfaces if the user
changed the address of a vif such that perm_addr is not covered by
addr_mask anymore.
Resolves:
https://bugzilla.kernel.org/show_bug.cgi?id=57371
Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: Jakub Kicinski <kubakici@wp.pl>
Reported-by: Alessandro Lannocca <alessandro.lannocca@gmail.com>
Cc: Alessandro Lannocca <alessandro.lannocca@gmail.com>
Cc: Bruno Randolf <br1@thinktube.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Since Eric's commit efe117ab8 ("Speedup ieee80211_remove_interfaces")
there's a bug in mac80211 when it unregisters with AP_VLAN interfaces
up. If the AP_VLAN interface was registered after the AP it belongs
to (which is the typical case) and then we get into this code path,
unregister_netdevice_many() will crash because it isn't prepared to
deal with interfaces being closed in the middle of it. Exactly this
happens though, because we iterate the list, find the AP master this
AP_VLAN belongs to and dev_close() the dependent VLANs. After this,
unregister_netdevice_many() won't pick up the fact that the AP_VLAN
is already down and will do it again, causing a crash.
Cc: stable@vger.kernel.org [2.6.33+]
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
A lot of code in mac80211 assumes that the hw queues are
set up correctly for all interfaces (except for monitor)
but this isn't true for AP_VLAN interfaces. Fix this by
copying the AP master configuration when an AP VLAN is
brought up, after this the AP interface can't change its
configuration any more and needs to be brought down to
change it, which also forces AP_VLAN interfaces down, so
just copying in open() is sufficient.
Reported-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
- Stable fix to prevent an rpc_task wakeup race
- Fix a NFSv4.1 session drain deadlock
- Fix a NFSv4/v4.1 mount regression when not running rpc.gssd
- Ensure auth_gss pipe detection works in namespaces
- Fix SETCLIENTID fallback if rpcsec_gss is not available
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (GNU/Linux)
iQIcBAABAgAGBQJRomGNAAoJEGcL54qWCgDyzUIQALj4hsOtdcmCCAWu0m+Pr+Gz
/6rpO2xJ5cLWyTYS2J9wmPkU+mb/g/OB/OhANyQQsDbfoW3e27TGaXRB8hUs29vk
LHTkFcrBTi4ohlw2Gyb6LWDF6cyHkC7cpH7tfIjLDff77V/qK1uqo41MtYpqUvy3
41tMoFOhvpwsy7HuKqVyYtNlwxnXrdJ5XvK0ycaQYQ9t8om9Hxu/scCgLfl/qwRr
eVhIesC2/Y1eC46UK7/TWCC7aTLkvi85UQY+fywMkajt/gPzjjh6nuhc45aOT9d7
pc0sXgIs8fLEjC+CRa0ojrziJZCHZY93U1kzA+CotRqPq76nPeFeG/1VhViKKb4C
1AU9IA4ntViUqNDQiGrCxE6ZeMewTOKksrCErwSS16Hv9Gqh4SHq84R/bF6MFgBc
dqQsexUbf/vaWHmuosARgbyc/QcBXDwWwbkq0hXSWbHA8vpc8/Lw1wkp49eyKz0V
0zwJ9xInakXr5/DIC72IKEF0Dg26L+GTXWLmDZVcdpVBp5A403trxUKckcsNa/Xf
FXaGMhyv2ZfV4AohW1Z8klkLiMHt4dr+YB7SQAgR/gb81p3odgKgMz51PfmHxFqs
4nxwRQqWplBTGiLrOFSgaRC50jLqEloUweWC2qf56KYEb9N6M6XDIXhllLIMyWLD
94cvihpKj+/ecKH5kQWF
=B8go
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-3.10-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
- Stable fix to prevent an rpc_task wakeup race
- Fix a NFSv4.1 session drain deadlock
- Fix a NFSv4/v4.1 mount regression when not running rpc.gssd
- Ensure auth_gss pipe detection works in namespaces
- Fix SETCLIENTID fallback if rpcsec_gss is not available
* tag 'nfs-for-3.10-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFS: Fix SETCLIENTID fallback if GSS is not available
SUNRPC: Prevent an rpc_task wakeup race
NFSv4.1 Fix a pNFS session draining deadlock
SUNRPC: Convert auth_gss pipe detection to work in namespaces
SUNRPC: Faster detection if gssd is actually running
SUNRPC: Fix a bug in gss_create_upcall
Daniel Petre reported crashes in icmp_dst_unreach() with following call
graph:
#3 [ffff88003fc03938] __stack_chk_fail at ffffffff81037f77
#4 [ffff88003fc03948] icmp_send at ffffffff814d5fec
#5 [ffff88003fc03ae8] ipv4_link_failure at ffffffff814a1795
#6 [ffff88003fc03af8] ipgre_tunnel_xmit at ffffffff814e7965
#7 [ffff88003fc03b78] dev_hard_start_xmit at ffffffff8146e032
#8 [ffff88003fc03bc8] sch_direct_xmit at ffffffff81487d66
#9 [ffff88003fc03c08] __qdisc_run at ffffffff81487efd
#10 [ffff88003fc03c48] dev_queue_xmit at ffffffff8146e5a7
#11 [ffff88003fc03c88] ip_finish_output at ffffffff814ab596
Daniel found a similar problem mentioned in
http://lkml.indiana.edu/hypermail/linux/kernel/1007.0/00961.html
And indeed this is the root cause : skb->cb[] contains data fooling IP
stack.
We must clear IPCB in ip_tunnel_xmit() sooner in case dst_link_failure()
is called. Or else skb->cb[] might contain garbage from GSO segmentation
layer.
A similar fix was tested on linux-3.9, but gre code was refactored in
linux-3.10. I'll send patches for stable kernels as well.
Many thanks to Daniel for providing reports, patches and testing !
Reported-by: Daniel Petre <daniel.petre@rcs-rds.ro>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
"It's been a while since my last pull request so quite a few fixes have
piled up."
Indeed.
1) Fix nf_{log,queue} compilation with PROC_FS disabled, from Pablo
Neira Ayuso.
2) Fix data corruption on some tg3 chips with TSO enabled, from Michael
Chan.
3) Fix double insertion of VLAN tags in be2net driver, from Sarveshwar
Bandi.
4) Don't have TCP's MD5 support pass > PAGE_SIZE page offsets in
scatter-gather entries into the crypto layer, the crypto layer can't
handle that. From Eric Dumazet.
5) Fix lockdep splat in 802.1Q MRP code, also from Eric Dumazet.
6) Fix OOPS in netfilter log module when called from conntrack, from
Hans Schillstrom.
7) FEC driver needs to use netif_tx_{lock,unlock}_bh() rather than the
non-BH disabling variants. From Fabio Estevam.
8) TCP GSO can generate out-of-order packets, fix from Eric Dumazet.
9) vxlan driver doesn't update 'used' field of fdb entries when it
should, from Sridhar Samudrala.
10) ipv6 should use kzalloc() to allocate inet6 socket cork options,
otherwise we can OOPS in ip6_cork_release(). From Eric Dumazet.
11) Fix races in bonding set mode, from Nikolay Aleksandrov.
12) Fix checksum generation regression added by "r8169: fix 8168evl
frame padding.", from Francois Romieu.
13) ip_gre can look at stale SKB data pointer, fix from Eric Dumazet.
14) Fix checksum handling when GSO is enabled in bnx2x driver with
certain chips, from Yuval Mintz.
15) Fix double free in batman-adv, from Martin Hundebøll.
16) Fix device startup synchronization with firmware in tg3 driver, from
Nithin Sujit.
17) perf networking dropmonitor doesn't work at all due to mixed up
trace parameter ordering, from Ben Hutchings.
18) Fix proportional rate reduction handling in tcp_ack(), from Nandita
Dukkipati.
19) IPSEC layer doesn't return an error when a valid state is detected,
causing an OOPS. Fix from Timo Teräs.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (85 commits)
be2net: bug fix on returning an invalid nic descriptor
tcp: xps: fix reordering issues
net: Revert unused variable changes.
xfrm: properly handle invalid states as an error
virtio_net: enable napi for all possible queues during open
tcp: bug fix in proportional rate reduction.
net: ethernet: sun: drop unused variable
net: ethernet: korina: drop unused variable
net: ethernet: apple: drop unused variable
qmi_wwan: Added support for Cinterion's PLxx WWAN Interface
perf: net_dropmonitor: Remove progress indicator
perf: net_dropmonitor: Use bisection in symbol lookup
perf: net_dropmonitor: Do not assume ordering of dictionaries
perf: net_dropmonitor: Fix symbol-relative addresses
perf: net_dropmonitor: Fix trace parameter order
net: fec: use a more proper compatible string for MVF type device
qlcnic: Fix updating netdev->features
qlcnic: remove netdev->trans_start updates within the driver
qlcnic: Return proper error codes from probe failure paths
tg3: Update version to 3.132
...
commit 3853b5841c ("xps: Improvements in TX queue selection")
introduced ooo_okay flag, but the condition to set it is slightly wrong.
In our traces, we have seen ACK packets being received out of order,
and RST packets sent in response.
We should test if we have any packets still in host queue.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Copy & paste mistake - STATION_INFO_TX_BYTES64 is the name of the flag,
not NL80211_STA_INFO_TX_BYTES64.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The code I added in "mac80211: don't start new netdev queues
if driver stopped" crashes for monitor and AP VLAN interfaces
because while they have a netdev, they don't have queues set
up by the driver.
To fix the crash, exclude these from queue accounting here
and just start their netdev queues unconditionally.
For monitor, this is the best we can do, as we can redirect
frames there to any other interface and don't know which one
that will since it can be different for each frame.
For AP VLAN interfaces, we can do better later and actually
properly track the queue status. Not doing this is really a
separate bug though.
Reported-by: Ilan Peer <ilan.peer@intel.com>
Reported-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If a P2P-Device is present and another virtual interface triggers
the connection work, the system crash because it tries to check
if the P2P-Device's netdev (which doesn't exist) is up. Skip any
wdevs that have no netdev to fix this.
Cc: stable@vger.kernel.org
Reported-by: YanBo <dreamfly281@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If nf_log uses ipt_ULOG as logging output, we can deliver non-null
terminated strings to user-space since the maximum length of the
prefix that is passed by nf_log is NF_LOG_PREFIXLEN but pm->prefix
is 32 bytes long (ULOG_PREFIX_LEN).
This is actually happening already from nf_conntrack_tcp if ipt_ULOG
is used, since it is passing strings longer than 32 bytes.
Signed-off-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Quoting https://bugzilla.netfilter.org/show_bug.cgi?id=812:
[ ip6tables -m addrtype ]
When I tried to use in the nat/PREROUTING it messes up the
routing cache even if the rule didn't matched at all.
[..]
If I remove the --limit-iface-in from the non-working scenario, so just
use the -m addrtype --dst-type LOCAL it works!
This happens when LOCAL type matching is requested with --limit-iface-in,
and the default ipv6 route is via the interface the packet we test
arrived on.
Because xt_addrtype uses ip6_route_output, the ipv6 routing implementation
creates an unwanted cached entry, and the packet won't make it to the
real/expected destination.
Silently ignoring --limit-iface-in makes the routing work but it breaks
rule matching (--dst-type LOCAL with limit-iface-in is supposed to only
match if the dst address is configured on the incoming interface;
without --limit-iface-in it will match if the address is reachable
via lo).
The test should call ipv6_chk_addr() instead. However, this would add
a link-time dependency on ipv6.
There are two possible solutions:
1) Revert the commit that moved ipt_addrtype to xt_addrtype,
and put ipv6 specific code into ip6t_addrtype.
2) add new "nf_ipv6_ops" struct to register pointers to ipv6 functions.
While the former might seem preferable, Pablo pointed out that there
are more xt modules with link-time dependeny issues regarding ipv6,
so lets go for 2).
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The error exit path needs err explicitly set. Otherwise it
returns success and the only caller, xfrm_output_resume(),
would oops in skb_dst(skb)->ops derefence as skb_dst(skb) is
NULL.
Bug introduced in commit bb65a9cb (xfrm: removes a superfluous
check and add a statistic).
Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Cc: Li RongQing <roy.qing.li@gmail.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is a fix for a bug triggering newly_acked_sacked < 0
in tcp_ack(.).
The bug is triggered by sacked_out decreasing relative to prior_sacked,
but packets_out remaining the same as pior_packets. This is because the
snapshot of prior_packets is taken after tcp_sacktag_write_queue() while
prior_sacked is captured before tcp_sacktag_write_queue(). The problem
is: tcp_sacktag_write_queue (tcp_match_skb_to_sack() -> tcp_fragment)
adjusts the pcount for packets_out and sacked_out (MSS change or other
reason). As a result, this delta in pcount is reflected in
(prior_sacked - sacked_out) but not in (prior_packets - packets_out).
This patch does the following:
1) initializes prior_packets at the start of tcp_ack() so as to
capture the delta in packets_out created by tcp_fragment.
2) introduces a new "previous_packets_out" variable that snapshots
packets_out right before tcp_clean_rtx_queue, so pkts_acked can be
correctly computed as before.
3) Computes pkts_acked using previous_packets_out, and computes
newly_acked_sacked using prior_packets.
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The lockless RPC_IS_QUEUED() test in __rpc_execute means that we need to
be careful about ordering the calls to rpc_test_and_set_running(task) and
rpc_clear_queued(task). If we get the order wrong, then we may end up
testing the RPC_TASK_RUNNING flag after __rpc_execute() has looped
and changed the state of the rpc_task.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
On errors in batadv_mesh_init(), bat_counters will be freed in both
batadv_mesh_free() and batadv_softif_init_late(). This patch fixes this
by returning earlier from batadv_softif_init_late() in case of errors in
batadv_mesh_init() and by setting bat_counters to NULL after freeing.
Signed-off-by: Martin Hundebøll <martin@hundeboll.net>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
The Kconfig symbol NFC_LLCP was removed in commit 30cc458765 ("NFC: Move
LLCP code to the NFC top level diirectory"). But the reference to its
macro in this Makefile was only commented out. Remove it now.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Another fix needed in ipgre_err(), as parse_gre_header() might change
skb->head.
Bug added in commit c544193214 (GRE: Refactor GRE tunneling code.)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ERROR: "memcpy_fromiovec" [drivers/vhost/vhost_scsi.ko] undefined!
That function is only present with CONFIG_NET. Turns out that
crypto/algif_skcipher.c also uses that outside net, but it actually
needs sockets anyway.
In addition, commit 6d4f0139d6 added
CONFIG_NET dependency to CONFIG_VMCI for memcpy_toiovec, so hoist
that function and revert that commit too.
socket.h already includes uio.h, so no callers need updating; trying
only broke things fo x86_64 randconfig (thanks Fengguang!).
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
'discovery->data.info' length is 22, NICKNAME_MAX_LEN is 21, so the
strncpy() will always left the last byte of 'discovery->data.info'
uninitialized.
When 'text' length is longer than 21 (NICKNAME_MAX_LEN), if still left
the last byte of 'discovery->data.info' uninitialized, the next
strlen() will cause issue.
Also 'discovery->data' is 'struct irda_device_info' which defined in
"include/uapi/...", it may copy to user mode, so need whole initialized.
All together, need use kzalloc() instead of kmalloc() to initialize all
members firstly.
Signed-off-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The net/netlabel/netlabel_domainhash.c:netlbl_domhsh_add() function
does not properly validate new domain hash entries resulting in
potential problems when an administrator attempts to add an invalid
entry. One such problem, as reported by Vlad Halilov, is a kernel
BUG (found in netlabel_domainhash.c:netlbl_domhsh_audit_add()) when
adding an IPv6 outbound mapping with a CIPSO configuration.
This patch corrects this problem by adding the necessary validation
code to netlbl_domhsh_add() via the newly created
netlbl_domhsh_validate() function.
Ideally this patch should also be pushed to the currently active
-stable trees.
Reported-by: Vlad Halilov <vlad.halilov@gmail.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 0178b695fd ("ipv6: Copy cork options in ip6_append_data")
added some code duplication and bad error recovery, leading to potential
crash in ip6_cork_release() as kfree() could be called with garbage.
use kzalloc() to make sure this wont happen.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Neal Cardwell <ncardwell@google.com>
We send direct probe to broadcast address, as some APs do not respond to
unicast PROBE frames when unassociated. Broadcast frames are not acked,
so we can not use that for trigger MLME state machine, but we need to
use old timeout mechanism.
This fixes authentication timed out like below:
[ 1024.671974] wlan6: authenticate with 54:e6:fc:98:63:fe
[ 1024.694125] wlan6: direct probe to 54:e6:fc:98:63:fe (try 1/3)
[ 1024.695450] wlan6: direct probe to 54:e6:fc:98:63:fe (try 2/3)
[ 1024.700586] wlan6: send auth to 54:e6:fc:98:63:fe (try 3/3)
[ 1024.701441] wlan6: authentication with 54:e6:fc:98:63:fe timed out
With fix, we have:
[ 4524.198978] wlan6: authenticate with 54:e6:fc:98:63:fe
[ 4524.220692] wlan6: direct probe to 54:e6:fc:98:63:fe (try 1/3)
[ 4524.421784] wlan6: send auth to 54:e6:fc:98:63:fe (try 2/3)
[ 4524.423272] wlan6: authenticated
[ 4524.423811] wlan6: associate with 54:e6:fc:98:63:fe (try 1/3)
[ 4524.427492] wlan6: RX AssocResp from 54:e6:fc:98:63:fe (capab=0x431 status=0 aid=1)
Cc: stable@vger.kernel.org # 3.9
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>