Commit Graph

23096 Commits

Author SHA1 Message Date
Florian Westphal 7a909ac70f netfilter: limit, hashlimit: avoid duplicated inline
credit_cap can be set to credit, which avoids inlining user2credits
twice. Also, remove inline keyword and let compiler decide.

old:
    684     192       0     876     36c net/netfilter/xt_limit.o
   4927     344      32    5303    14b7 net/netfilter/xt_hashlimit.o
now:
    668     192       0     860     35c net/netfilter/xt_limit.o
   4793     344      32    5169    1431 net/netfilter/xt_hashlimit.o

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-05-09 12:54:06 +02:00
Hans Schillstrom cf308a1fae netfilter: add xt_hmark target for hash-based skb marking
The target allows you to create rules in the "raw" and "mangle" tables
which set the skbuff mark by means of hash calculation within a given
range. The nfmark can influence the routing method (see "Use netfilter
MARK value as routing key") and can also be used by other subsystems to
change their behaviour.

[ Part of this patch has been refactorized and modified by Pablo Neira Ayuso ]

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-05-09 12:54:05 +02:00
Hans Schillstrom 84018f55ab netfilter: ip6_tables: add flags parameter to ipv6_find_hdr()
This patch adds the flags parameter to ipv6_find_hdr. This flags
allows us to:

* know if this is a fragment.
* stop at the AH header, so the information contained in that header
  can be used for some specific packet handling.

This patch also adds the offset parameter for inspection of one
inner IPv6 header that is contained in error messages.

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-05-09 12:53:47 +02:00
Ashok Nagarajan 70c33eaae7 {nl,cfg,mac}80211: Allow user to see/configure HT protection mode
This patch introduces a new mesh configuration parameter "ht_opmode" and will
allow user to check the current HT protection mode selected. Users could
configure the protection mode by the command "iw mesh_iface set mesh_param
mesh_ht_protection_mode=2". The default protection mode of mesh is set to
non-HT mixed mode.

Signed-off-by: Ashok Nagarajan <ashok@cozybit.com>
Reviewed-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-05-08 21:53:59 -04:00
Ashok Nagarajan b91e64aad2 mac80211: Allow nonHT/HT peering in mesh
Now that we have protection enabled, allow non-HT and HT20 stations to peer
with HT40+/- stations. Peering is still disallowed for HT40+/- mismatch.

Signed-off-by: Ashok Nagarajan <ashok@cozybit.com>
Reviewed-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-05-08 21:53:58 -04:00
Ashok Nagarajan 57aac7c51c mac80211: Implement HT mixed protection mode
Section 9.23.3.5 of IEEE 80211s standard describes the protection rules for
HT mesh STA in a MBSS. Three HT protection modes are supported for now:

non-HT mixed mode - is selected if any non-HT peers are present in our MBSS.
20MHz-protection mode - is selected if all peers in our 20/40MHz MBSS support
			HT and atleast one HT20 peer is present.
no-protection mode - is selected otherwise.

This is a limited implementation of 9.23.3.5, which only considers mesh peers
when determining the HT protection mode. Station's channel_type needs to be
maintained.

Signed-off-by: Ashok Nagarajan <ashok@cozybit.com>
Reviewed-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-05-08 21:53:57 -04:00
Ashok Nagarajan 431e315423 mac80211: Advertise HT protection mode in IEs
Signed-off-by: Ashok Nagarajan <ashok@cozybit.com>
Reviewed-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-05-08 21:53:57 -04:00
Emmanuel Grumbach ef9456a85d cfg80211: fix BSS comparison
Since the BSS table is organized in a RB tree, the BSSs need to be
comparable. This means that we must define a < and > operator to
the BSS object.
compare_ethr_addr isn't enough since it returns only a binary value.

Since Felix's

cfg80211: use compare_ether_addr on MAC addresses instead of memcmp

    Because of the constant size and guaranteed 16 bit alignment, the inline
    compare_ether_addr function is much cheaper than calling memcmp.

    Signed-off-by: Felix Fietkau <nbd@openwrt.org>
    Signed-off-by: John W. Linville <linville@tuxdriver.com>

The BSS table is corrupted: rb_find_bss can't find the bss.
As a result BSSes are duplicated in the BSS table, and we get stuck
while probing an AP before associating (in STA mode).

Change-Id: I85928756f4328028230832c1565ece7f412f3843
CC: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Acked-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-05-08 21:53:56 -04:00
Ben Greear 3073a7c20c mac80211: Add more ethtools stats: survey, rates, etc
The signal and noise are forced to be positive since ethtool
deals in unsigned 64-bit values and this number should be human
readable.  This gives easy access to some of the data formerly
exposed in the deprecated /proc/net/wireless file.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-05-08 21:53:52 -04:00
Ben Greear e352114fd6 mac80211: Framework to get wifi-driver stats via ethtool.
This adds hooks to call into the driver to get additional
stats for the ethtool API.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-05-08 21:53:51 -04:00
Ben Greear b1ab79255c mac80211: Support getting sta_info stats via ethtool.
This lets ethtool print out stats related to stations
connected to the interface.  Does not yet get stats
from the underlying driver.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-05-08 21:53:50 -04:00
Ben Greear d61992182e cfg80211: Add framework to support ethtool stats.
Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-05-08 21:53:49 -04:00
Thomas Pedersen c7d258288f mac80211: don't transmit 40MHz frames to 20MHz peer
If a mesh peer indicates it is operating as 20MHz-only in its HT
operation IE, have the rate control algorithm respect this by disabling
the equivalent bit in the ieee80211_sta HT capabilities.

Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-05-08 21:53:42 -04:00
Thomas Pedersen e87278e730 mac80211: insert mesh peer after init
Drivers need the station rate info when inserting a new sta_info. The
patch "mac80211: refactor mesh peer initialization" wrongly assumed the
rate info could be applied after insertion. After further review, this
is clearly not the case.

This fixes a regression where HT parameters were not applied before
inserting the sta_info, causing performance degradation.

Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-05-08 21:53:42 -04:00
Rajkumar Manoharan 91a0099c19 mac80211: fix rate control update on 2040 bss change
The rate control updation never be called on 2040 BSS change.
The station should update its rate control on receiving beacon
with different HT mode in the HT operation IE. Not doing so,
leads to sending frames with higher(ht40) rates whereas AP is
operating in lower mode (ht20).

Signed-off-by: Rajkumar Manoharan <rmanohar@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-05-08 21:53:39 -04:00
David S. Miller c5baa80ac8 Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch 2012-05-08 19:31:32 -04:00
Pablo Neira Ayuso d16cf20e2f netfilter: remove ip_queue support
This patch removes ip_queue support which was marked as obsolete
years ago. The nfnetlink_queue modules provides more advanced
user-space packet queueing mechanism.

This patch also removes capability code included in SELinux that
refers to ip_queue. Otherwise, we break compilation.

Several warning has been sent regarding this to the mailing list
in the past month without anyone rising the hand to stop this
with some strong argument.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-05-08 20:25:42 +02:00
Pablo Neira Ayuso 6714cf5465 netfilter: nf_conntrack: fix explicit helper attachment and NAT
Explicit helper attachment via the CT target is broken with NAT
if non-standard ports are used. This problem was hidden behind
the automatic helper assignment routine. Thus, it becomes more
noticeable now that we can disable the automatic helper assignment
with Eric Leblond's:

9e8ac5a netfilter: nf_ct_helper: allow to disable automatic helper assignment

Basically, nf_conntrack_alter_reply asks for looking up the helper
up if NAT is enabled. Unfortunately, we don't have the conntrack
template at that point anymore.

Since we don't want to rely on the automatic helper assignment,
we can skip the second look-up and stick to the helper that was
attached by iptables. With the CT target, the user is in full
control of helper attachment, thus, the policy is to trust what
the user explicitly configures via iptables (no automatic magic
anymore).

Interestingly, this bug was hidden by the automatic helper look-up
code. But it can be easily trigger if you attach the helper in
a non-standard port, eg.

iptables -I PREROUTING -t raw -p tcp --dport 8888 \
	-j CT --helper ftp

And you disabled the automatic helper assignment.

I added the IPS_HELPER_BIT that allows us to differenciate between
a helper that has been explicitly attached and those that have been
automatically assigned. I didn't come up with a better solution
(having backward compatibility in mind).

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-05-08 19:44:42 +02:00
Kelvie Wong 9768e1ace4 netfilter: nf_ct_expect: partially implement ctnetlink_change_expect
This refreshes the "timeout" attribute in existing expectations if one is
given.

The use case for this would be for userspace helpers to extend the lifetime
of the expectation when requested, as this is not possible right now
without deleting/recreating the expectation.

I use this specifically for forwarding DCERPC traffic through:

DCERPC has a port mapper daemon that chooses a (seemingly) random port for
future traffic to go to. We expect this traffic (with a reasonable
timeout), but sometimes the port mapper will tell the client to continue
using the same port. This allows us to extend the expectation accordingly.

Signed-off-by: Kelvie Wong <kelvie@ieee.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-05-08 19:40:59 +02:00
Hans Schillstrom 6d8ebc8a27 net: export sysctl_[r|w]mem_max symbols needed by ip_vs_sync
To build ip_vs as a module sysctl_rmem_max and sysctl_wmem_max
needs to be exported.

The dependency was added by "ipvs: wakeup master thread" patch.

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-05-08 19:40:56 +02:00
H Hartley Sweeten 068d522067 ipvs: ip_vs_proto: local functions should not be exposed globally
Functions not referenced outside of a source file should be marked
static to prevent it from being exposed globally.

This quiets the sparse warnings:

warning: symbol '__ipvs_proto_data_get' was not declared. Should it be static?

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-05-08 19:40:54 +02:00
H Hartley Sweeten d5cce20874 ipvs: ip_vs_ftp: local functions should not be exposed globally
Functions not referenced outside of a source file should be marked
static to prevent it from being exposed globally.

This quiets the sparse warnings:

warning: symbol 'ip_vs_ftp_init' was not declared. Should it be static?

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-05-08 19:40:52 +02:00
Pablo Neira Ayuso 6b324dbfc3 ipvs: optimize the use of flags in ip_vs_bind_dest
cp->flags is marked volatile but ip_vs_bind_dest
can safely modify the flags, so save some CPU cycles by
using temp variable.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-05-08 19:40:49 +02:00
Pablo Neira Ayuso f73181c828 ipvs: add support for sync threads
Allow master and backup servers to use many threads
for sync traffic. Add sysctl var "sync_ports" to define the
number of threads. Every thread will use single UDP port,
thread 0 will use the default port 8848 while last thread
will use port 8848+sync_ports-1.

	The sync traffic for connections is scheduled to many
master threads based on the cp address but one connection is
always assigned to same thread to avoid reordering of the
sync messages.

	Remove ip_vs_sync_switch_mode because this check
for sync mode change is still risky. Instead, check for mode
change under sync_buff_lock.

	Make sure the backup socks do not block on reading.

Special thanks to Aleksey Chudov for helping in all tests.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Tested-by: Aleksey Chudov <aleksey.chudov@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-05-08 19:40:33 +02:00
Julian Anastasov 749c42b620 ipvs: reduce sync rate with time thresholds
Add two new sysctl vars to control the sync rate with the
main idea to reduce the rate for connection templates because
currently it depends on the packet rate for controlled connections.
This mechanism should be useful also for normal connections
with high traffic.

sync_refresh_period: in seconds, difference in reported connection
	timer that triggers new sync message. It can be used to
	avoid sync messages for the specified period (or half of
	the connection timeout if it is lower) if connection state
	is not changed from last sync.

sync_retries: integer, 0..3, defines sync retries with period of
	sync_refresh_period/8. Useful to protect against loss of
	sync messages.

	Allow sysctl_sync_threshold to be used with
sysctl_sync_period=0, so that only single sync message is sent
if sync_refresh_period is also 0.

	Add new field "sync_endtime" in connection structure to
hold the reported time when connection expires. The 2 lowest
bits will represent the retry count.

	As the sysctl_sync_period now can be 0 use ACCESS_ONCE to
avoid division by zero.

	Special thanks to Aleksey Chudov for being patient with me,
for his extensive reports and helping in all tests.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Tested-by: Aleksey Chudov <aleksey.chudov@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-05-08 19:40:10 +02:00
Pablo Neira Ayuso 1c003b1580 ipvs: wakeup master thread
High rate of sync messages in master can lead to
overflowing the socket buffer and dropping the messages.
Fixed sleep of 1 second without wakeup events is not suitable
for loaded masters,

	Use delayed_work to schedule sending for queued messages
and limit the delay to IPVS_SYNC_SEND_DELAY (20ms). This will
reduce the rate of wakeups but to avoid sending long bursts we
wakeup the master thread after IPVS_SYNC_WAKEUP_RATE (8) messages.

	Add hard limit for the queued messages before sending
by using "sync_qlen_max" sysctl var. It defaults to 1/32 of
the memory pages but actually represents number of messages.
It will protect us from allocating large parts of memory
when the sending rate is lower than the queuing rate.

	As suggested by Pablo, add new sysctl var
"sync_sock_size" to configure the SNDBUF (master) or
RCVBUF (slave) socket limit. Default value is 0 (preserve
system defaults).

	Change the master thread to detect and block on
SNDBUF overflow, so that we do not drop messages when
the socket limit is low but the sync_qlen_max limit is
not reached. On ENOBUFS or other errors just drop the
messages.

	Change master thread to enter TASK_INTERRUPTIBLE
state early, so that we do not miss wakeups due to messages or
kthread_should_stop event.

Thanks to Pablo Neira Ayuso for his valuable feedback!

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-05-08 19:39:53 +02:00
Julian Anastasov cdcc5e905d ipvs: always update some of the flags bits in backup
As the goal is to mirror the inactconns/activeconns
counters in the backup server, make sure the cp->flags are
updated even if cp is still not bound to dest. If cp->flags
are not updated ip_vs_bind_dest will rely only on the initial
flags when updating the counters. To avoid mistakes and
complicated checks for protocol state rely only on the
IP_VS_CONN_F_INACTIVE bit when updating the counters.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Tested-by: Aleksey Chudov <aleksey.chudov@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-05-08 19:38:31 +02:00
Julian Anastasov 882a844bd5 ipvs: fix ip_vs_try_bind_dest to rebind app and transmitter
Initially, when the synced connection is created we
use the forwarding method provided by master but once we
bind to destination it can be changed. As result, we must
update the application and the transmitter.

	As ip_vs_try_bind_dest is called always for connections
that require dest binding, there is no need to validate the
cp and dest pointers.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-05-08 19:38:28 +02:00
Julian Anastasov 06611f82cc ipvs: remove check for IP_VS_CONN_F_SYNC from ip_vs_bind_dest
As the IP_VS_CONN_F_INACTIVE bit is properly set
in cp->flags for all kind of connections we do not need to
add special checks for synced connections when updating
the activeconns/inactconns counters for first time. Now
logic will look just like in ip_vs_unbind_dest.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-05-08 19:38:26 +02:00
Julian Anastasov 82cfc06278 ipvs: ignore IP_VS_CONN_F_NOOUTPUT in backup server
As IP_VS_CONN_F_NOOUTPUT is derived from the
forwarding method we should get it from conn_flags just
like we do it for IP_VS_CONN_F_FWD_MASK bits when binding
to real server.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-05-08 19:38:24 +02:00
Sasha Levin 9615e61e6f ipvs: use GFP_KERNEL allocation where possible
Use GFP_KERNEL instead of GFP_ATOMIC when registering an ipvs protocol.

This is safe since it will always run from a process context.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-05-08 19:38:22 +02:00
Julian Anastasov d6318f08e8 ipvs: SH scheduler does not need GFP_ATOMIC allocation
Schedulers are initialized and bound to services only
on commands.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-05-08 19:37:28 +02:00
Julian Anastasov 45d4e71a39 ipvs: LBLCR scheduler does not need GFP_ATOMIC allocation on init
Schedulers are initialized and bound to services only
on commands.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-05-08 19:37:26 +02:00
Julian Anastasov 4f2a94dcb6 ipvs: WRR scheduler does not need GFP_ATOMIC allocation
Schedulers are initialized and bound to services only
on commands.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-05-08 19:37:22 +02:00
Julian Anastasov 4beddbe38c ipvs: DH scheduler does not need GFP_ATOMIC allocation
Schedulers are initialized and bound to services only
on commands.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-05-08 19:37:20 +02:00
Julian Anastasov 748d845ca9 ipvs: LBLC scheduler does not need GFP_ATOMIC allocation on init
Schedulers are initialized and bound to services only
on commands.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-05-08 19:37:17 +02:00
Julian Anastasov 41cff6d5c9 ipvs: timeout tables do not need GFP_ATOMIC allocation
They are called only on initialization.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-05-08 19:37:09 +02:00
Pablo Neira Ayuso 4981682cc1 netfilter: bridge: optionally set indev to vlan
if net.bridge.bridge-nf-filter-vlan-tagged sysctl is enabled, bridge
netfilter removes the vlan header temporarily and then feeds the packet
to ip(6)tables.

When the new "bridge-nf-pass-vlan-input-device" sysctl is on
(default off), then bridge netfilter will also set the
in-interface to the vlan interface; if such an interface exists.

This is needed to make iptables REDIRECT target work with
"vlan-on-top-of-bridge" setups and to allow use of "iptables -i" to
match the vlan device name.

Also update Documentation with current brnf default settings.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Bart De Schuymer <bdschuym@pandora.be>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-05-08 19:36:47 +02:00
Eric Leblond a900689264 netfilter: nf_ct_helper: allow to disable automatic helper assignment
This patch allows you to disable automatic conntrack helper
lookup based on TCP/UDP ports, eg.

echo 0 > /proc/sys/net/netfilter/nf_conntrack_helper

[ Note: flows that already got a helper will keep using it even
  if automatic helper assignment has been disabled ]

Once this behaviour has been disabled, you have to explicitly
use the iptables CT target to attach helper to flows.

There are good reasons to stop supporting automatic helper
assignment, for further information, please read:

http://www.netfilter.org/news.html#2012-04-03

This patch also adds one message to inform that automatic helper
assignment is deprecated and it will be removed soon (this is
spotted only once, with the first flow that gets a helper attached
to make it as less annoying as possible).

Signed-off-by: Eric Leblond <eric@regit.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-05-08 19:35:18 +02:00
Tony Zelenoff 031d7709f2 netfilter: nf_ct_ecache: refactor notifier registration
* ret variable initialization removed as useless
* similar code strings concatenated and functions code
  flow became more plain

Signed-off-by: Tony Zelenoff <antonz@parallels.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-05-08 19:17:23 +02:00
David S. Miller 0d6c4a2e46 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/intel/e1000e/param.c
	drivers/net/wireless/iwlwifi/iwl-agn-rx.c
	drivers/net/wireless/iwlwifi/iwl-trans-pcie-rx.c
	drivers/net/wireless/iwlwifi/iwl-trans.h

Resolved the iwlwifi conflict with mainline using 3-way diff posted
by John Linville and Stephen Rothwell.  In 'net' we added a bug
fix to make iwlwifi report a more accurate skb->truesize but this
conflicted with RX path changes that happened meanwhile in net-next.

In e1000e a conflict arose in the validation code for settings of
adapter->itr.  'net-next' had more sophisticated logic so that
logic was used.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-07 23:35:40 -04:00
Jiri Pirko 3a084ddb4b net: IP_MULTICAST_IF setsockopt now recognizes struct mreq
Until now, struct mreq has not been recognized and it was worked with
as with struct in_addr. That means imr_multiaddr was copied to
imr_address. So do recognize struct mreq here and copy that correctly.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-07 23:03:22 -04:00
Pravin B Shelar 072ae6314a openvswitch: Validation of IPv6 set port action uses IPv4 header
When the kernel validates set TCP/UDP port actions, it looks at
the ports in the existing flow to make sure that the L4 header exists.
However, these actions always use the IPv4 version of the struct.
Following patch fixes this by checking for flow ip protocol first.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2012-05-07 17:23:10 -07:00
Alexander Duyck ec47ea8247 skb: Add inline helper for getting the skb end offset from head
With the recent changes for how we compute the skb truesize it occurs to me
we are probably going to have a lot of calls to skb_end_pointer -
skb->head.  Instead of running all over the place doing that it would make
more sense to just make it a separate inline skb_end_offset(skb) that way
we can return the correct value without having gcc having to do all the
optimization to cancel out skb->head - skb->head.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-06 13:13:19 -04:00
Alexander Duyck 3e24591a19 skb: Drop "fastpath" variable for skb_cloned check in pskb_expand_head
Since there is now only one spot that actually uses "fastpath" there isn't
much point in carrying it.  Instead we can just use a check for skb_cloned
to verify if we can perform the fast-path free for the head or not.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-06 13:13:19 -04:00
Alexander Duyck 9202e31d46 skb: Drop bad code from pskb_expand_head
The fast-path for pskb_expand_head contains a check where the size plus the
unaligned size of skb_shared_info is compared against the size of the data
buffer.  This code path has two issues.  First is the fact that after the
recent changes by Eric Dumazet to __alloc_skb and build_skb the shared info
is always placed in the optimal spot for a buffer size making this check
unnecessary.  The second issue is the fact that the check doesn't take into
account the aligned size of shared info.  As a result the code burns cycles
doing a memcpy with nothing actually being shifted.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-06 13:13:19 -04:00
Eric Dumazet bd14b1b2e2 tcp: be more strict before accepting ECN negociation
It appears some networks play bad games with the two bits reserved for
ECN. This can trigger false congestion notifications and very slow
transferts.

Since RFC 3168 (6.1.1) forbids SYN packets to carry CT bits, we can
disable TCP ECN negociation if it happens we receive mangled CT bits in
the SYN packet.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Perry Lorier <perryl@google.com>
Cc: Matt Mathis <mattmathis@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Wilmer van der Gaast <wilmer@google.com>
Cc: Ankur Jain <jankur@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Dave Täht <dave.taht@bufferbloat.net>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-04 12:05:27 -04:00
Eric Dumazet 1704575519 net: sched: factorize code (qdisc_drop())
Use qdisc_drop() helper where possible.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-04 11:50:05 -04:00
Ansis Atteka 4cb6e116bb openvswitch: Release rtnl_lock if ovs_vport_cmd_build_info() failed.
This patch fixes a possible lock-up bug where rtnl_lock might not
get released.

Signed-off-by: Ansis Atteka <aatteka@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2012-05-03 18:40:38 -07:00
Linus Torvalds c42f1d4b52 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Transfer padding was wrong for full-speed USB in ASIX driver, fix
    from Ingo van Lil.

 2) Propagate the negative packet offset fix into the PowerPC BPF JIT.
    From Jan Seiffert.

 3) dl2k driver's private ioctls were letting unprivileged tasks make
    MII writes and other ugly bits like that.  Fix from Jeff Mahoney.

 4) Fix TX VLAN and RX packet drops in ucc_geth, from Joakim Tjernlund.

 5) OOPS and network namespace fixes in IPVS from Hans Schillstrom and
    Julian Anastasov.

 6) Fix races and sleeping in locked context bugs in drop_monitor, from
    Neil Horman.

 7) Fix link status indication in smsc95xx driver, from Paolo Pisati.

 8) Fix bridge netfilter OOPS, from Peter Huang.

 9) L2TP sendmsg can return on error conditions with the socket lock
    held, oops.  Fix from Sasha Levin.

10) udp_diag should return meaningful values for socket memory usage,
    from Shan Wei.

11) Eric Dumazet is so awesome he gets his own section:

       Socket memory cgroup code (I never should have applied those
       patches, grumble...) made erroneous changes to
       sk_sockets_allocated_read_positive().  It was changed to
       use percpu_counter_sum_positive (which requires BH disabling)
       instead of percpu_counter_read_positive (which does not).
       Revert back to avoid crashes and lockdep warnings.

       Adjust the default tcp_adv_win_scale and tcp_rmem[2] values
       to fix throughput regressions.  This is necessary as a result
       of our more precise skb->truesize tracking.

       Fix SKB leak in netem packet scheduler.

12) New device IDs for various bluetooth devices, from Manoj Iyer,
    AceLan Kao, and Steven Harms.

13) Fix command completion race in ipw2200, from Stanislav Yakovlev.

14) Fix rtlwifi oops on unload, from Larry Finger.

15) Fix hard_mtu when adjusting hard_header_len in smsc95xx driver.
    From Stephane Fillod.

16) ehea driver registers it's IRQ before all the necessary state is
    setup, resulting in crashes.  Fix from Thadeu Lima de Souza
    Cascardo.

17) Fix PHY connection failures in davinci_emac driver, from Anatolij
    Gustschin.

18) Missing break; in switch statement in bluetooth's
    hci_cmd_complete_evt().  Fix from Szymon Janc.

19) Fix queue programming in iwlwifi, from Johannes Berg.

20) Interrupt throttling defaults not being actually programmed into the
    hardware, fix from Jeff Kirsher and Ying Cai.

21) TLAN driver SKB encoding in descriptor busted on 64-bit, fix from
    Benjamin Poirier.

22) Fix blind status block RX producer pointer deref in TG3 driver, from
    Matt Carlson.

23) Promisc and multicast are busted on ehea, fixes from Thadeu Lima de
    Souza Cascardo.

24) Fix crashes in 6lowpan, from Alexander Smirnov.

25) tcp_complete_cwr() needs to be careful to not rewind the CWND to
    ssthresh if ssthresh has the "infinite" value.  Fix from Yuchung
    Cheng.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (81 commits)
  sungem: Fix WakeOnLan
  tcp: change tcp_adv_win_scale and tcp_rmem[2]
  net: l2tp: unlock socket lock before returning from l2tp_ip_sendmsg
  drop_monitor: prevent init path from scheduling on the wrong cpu
  usbnet: fix failure handling in usbnet_probe
  usbnet: fix leak of transfer buffer of dev->interrupt
  ucc_geth: Add 16 bytes to max TX frame for VLANs
  net: ucc_geth, increase no. of HW RX descriptors
  netem: fix possible skb leak
  sky2: fix receive length error in mixed non-VLAN/VLAN traffic
  sky2: propogate rx hash when packet is copied
  net: fix two typos in skbuff.h
  cxgb3: Don't call cxgb_vlan_mode until q locks are initialized
  ixgbe: fix calling skb_put on nonlinear skb assertion bug
  ixgbe: Fix a memory leak in IEEE DCB
  igbvf: fix the bug when initializing the igbvf
  smsc75xx: enable mac to detect speed/duplex from phy
  smsc75xx: declare smsc75xx's MII as GMII capable
  smsc75xx: fix phy interrupt acknowledge
  smsc75xx: fix phy init reset loop
  ...
2012-05-03 17:10:39 -07:00
Alexander Duyck 3a7c1ee4ab skb: Add skb_head_is_locked helper function
This patch adds support for a skb_head_is_locked helper function.  It is
meant to be used any time we are considering transferring the head from
skb->head to a paged frag.  If the head is locked it means we cannot remove
the head from the skb so it must be copied or we must take the skb as a
whole.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-03 13:18:37 -04:00
Eric Dumazet 715dc1f342 net: Fix truesize accounting in skb_gro_receive()
GRO is very optimistic in skb truesize estimates, only taking into
account the used part of fragments.

Be conservative, and use more precise computation, so that bloated GRO
skbs can be collapsed eventually.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-03 13:18:37 -04:00
Steve Dickson 7bdf7415a6 auth_gss: the list of pseudoflavors not being parsed correctly
gss_mech_list_pseudoflavors() parses a list of registered mechanisms.
On that list contains a list of pseudo flavors which was not being
parsed correctly, causing only the first pseudo flavor to be found.

Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-03 12:35:33 -04:00
Alexander Duyck 34a802a5b9 tcp: move stats merge to the end of tcp_try_coalesce
This change cleans up the last bits of tcp_try_coalesce so that we only
need one goto which jumps to the end of the function.  The idea is to make
the code more readable by putting things in a linear order so that we start
execution at the top of the function, and end it at the bottom.

I also made a slight tweak to the code for handling frags when we are a
clone.  Instead of making it an if (clone) loop else nr_frags = 0 I changed
the logic so that if (!clone) we just set the number of frags to 0 which
disables the for loop anyway.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-03 04:21:33 -04:00
Alexander Duyck 57b55a7ec6 tcp: Move code related to head frag in tcp_try_coalesce
This change reorders the code related to the use of an skb->head_frag so it
is placed before we check the rest of the frags.  This allows the code to
read more linearly instead of like some sort of loop.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-03 04:21:33 -04:00
Alexander Duyck c73c3d9c49 tcp: Fix truesize accounting in tcp_try_coalesce
This patch addresses several issues in the way we were tracking the
truesize in tcp_try_coalesce.

First it was using ksize which prevents us from having a 0 sized head frag
and getting a usable result.  To resolve that this patch uses the end
pointer which is set based off either ksize, or the frag_size supplied in
build_skb.  This allows us to compute the original truesize of the entire
buffer and remove that value leaving us with just what was added as pages.

The second issue was the use of skb->len if there is a mergeable head frag.
We should only need to remove the size of an data aligned sk_buff from our
current skb->truesize to compute the delta for a buffer with a reused head.
By using skb->len the value of truesize was being artificially reduced
which means that head frags could use more memory than buffers using
standard allocations.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-03 04:21:33 -04:00
David S. Miller 8c1ae10d79 net: Add missing linux/prefetch.h include to net/core/sock.c
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-03 02:25:55 -04:00
Alexander Duyck 2996d31f9f net: Stop decapitating clones that have a head_frag
This change is meant ot prevent stealing the skb->head to use as a page in
the event that the skb->head was cloned.  This allows the other clones to
track each other via shinfo->dataref.

Without this we break down to two methods for tracking the reference count,
one being dataref, the other being the page count.  As a result it becomes
difficult to track how many references there are to skb->head.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-03 01:34:37 -04:00
Eric Dumazet b081f85c29 net: implement tcp coalescing in tcp_queue_rcv()
Extend tcp coalescing implementing it from tcp_queue_rcv(), the main
receiver function when application is not blocked in recvmsg().

Function tcp_queue_rcv() is moved a bit to allow its call from
tcp_data_queue()

This gives good results especially if GRO could not kick, and if skb
head is a fragment.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-02 21:11:11 -04:00
Eric Dumazet 923dd347b8 net: take care of cloned skbs in tcp_try_coalesce()
Before stealing fragments or skb head, we must make sure skbs are not
cloned.

Alexander was worried about destination skb being cloned : In bridge
setups, a driver could be fooled if skb->data_len would not match skb
nr_frags.

If source skb is cloned, we must take references on pages instead.

Bug happened using tcpdump (if not using mmap())

Introduce kfree_skb_partial() helper to cleanup code.

Reported-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-02 21:11:11 -04:00
Eric Dumazet b49960a05e tcp: change tcp_adv_win_scale and tcp_rmem[2]
tcp_adv_win_scale default value is 2, meaning we expect a good citizen
skb to have skb->len / skb->truesize ratio of 75% (3/4)

In 2.6 kernels we (mis)accounted for typical MSS=1460 frame :
1536 + 64 + 256 = 1856 'estimated truesize', and 1856 * 3/4 = 1392.
So these skbs were considered as not bloated.

With recent truesize fixes, a typical MSS=1460 frame truesize is now the
more precise :
2048 + 256 = 2304. But 2304 * 3/4 = 1728.
So these skb are not good citizen anymore, because 1460 < 1728

(GRO can escape this problem because it build skbs with a too low
truesize.)

This also means tcp advertises a too optimistic window for a given
allocated rcvspace : When receiving frames, sk_rmem_alloc can hit
sk_rcvbuf limit and we call tcp_prune_queue()/tcp_collapse() too often,
especially when application is slow to drain its receive queue or in
case of losses (netperf is fast, scp is slow). This is a major latency
source.

We should adjust the len/truesize ratio to 50% instead of 75%

This patch :

1) changes tcp_adv_win_scale default to 1 instead of 2

2) increase tcp_rmem[2] limit from 4MB to 6MB to take into account
better truesize tracking and to allow autotuning tcp receive window to
reach same value than before. Note that same amount of kernel memory is
consumed compared to 2.6 kernels.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-02 21:08:58 -04:00
Sasha Levin 84768edbb2 net: l2tp: unlock socket lock before returning from l2tp_ip_sendmsg
l2tp_ip_sendmsg could return without releasing socket lock, making it all the
way to userspace, and generating the following warning:

[  130.891594] ================================================
[  130.894569] [ BUG: lock held when returning to user space! ]
[  130.897257] 3.4.0-rc5-next-20120501-sasha #104 Tainted: G        W
[  130.900336] ------------------------------------------------
[  130.902996] trinity/8384 is leaving the kernel with locks still held!
[  130.906106] 1 lock held by trinity/8384:
[  130.907924]  #0:  (sk_lock-AF_INET){+.+.+.}, at: [<ffffffff82b9503f>] l2tp_ip_sendmsg+0x2f/0x550

Introduced by commit 2f16270 ("l2tp: Fix locking in l2tp_ip.c").

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-02 21:04:33 -04:00
Neil Horman 4fdcfa1284 drop_monitor: prevent init path from scheduling on the wrong cpu
I just noticed after some recent updates, that the init path for the drop
monitor protocol has a minor error.  drop monitor maintains a per cpu structure,
that gets initalized from a single cpu.  Normally this is fine, as the protocol
isn't in use yet, but I recently made a change that causes a failed skb
allocation to reschedule itself .  Given the current code, the implication is
that this workqueue reschedule will take place on the wrong cpu.  If drop
monitor is used early during the boot process, its possible that two cpus will
access a single per-cpu structure in parallel, possibly leading to data
corruption.

This patch fixes the situation, by storing the cpu number that a given instance
of this per-cpu data should be accessed from.  In the case of a need for a
reschedule, the cpu stored in the struct is assigned the rescheule, rather than
the currently executing cpu

Tested successfully by myself.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: David Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-02 21:02:48 -04:00
Yuchung Cheng 750ea2bafa tcp: early retransmit: delayed fast retransmit
Implementing the advanced early retransmit (sysctl_tcp_early_retrans==2).
Delays the fast retransmit by an interval of RTT/4. We borrow the
RTO timer to implement the delay. If we receive another ACK or send
a new packet, the timer is cancelled and restored to original RTO
value offset by time elapsed.  When the delayed-ER timer fires,
we enter fast recovery and perform fast retransmit.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-02 20:56:10 -04:00
Yuchung Cheng eed530b6c6 tcp: early retransmit
This patch implements RFC 5827 early retransmit (ER) for TCP.
It reduces DUPACK threshold (dupthresh) if outstanding packets are
less than 4 to recover losses by fast recovery instead of timeout.

While the algorithm is simple, small but frequent network reordering
makes this feature dangerous: the connection repeatedly enter
false recovery and degrade performance. Therefore we implement
a mitigation suggested in the appendix of the RFC that delays
entering fast recovery by a small interval, i.e., RTT/4. Currently
ER is conservative and is disabled for the rest of the connection
after the first reordering event. A large scale web server
experiment on the performance impact of ER is summarized in
section 6 of the paper "Proportional Rate Reduction for TCP”,
IMC 2011. http://conferences.sigcomm.org/imc/2011/docs/p155.pdf

Note that Linux has a similar feature called THIN_DUPACK. The
differences are THIN_DUPACK do not mitigate reorderings and is only
used after slow start. Currently ER is disabled if THIN_DUPACK is
enabled. I would be happy to merge THIN_DUPACK feature with ER if
people think it's a good idea.

ER is enabled by sysctl_tcp_early_retrans:
  0: Disables ER

  1: Reduce dupthresh to packets_out - 1 when outstanding packets < 4.

  2: (Default) reduce dupthresh like mode 1. In addition, delay
     entering fast recovery by RTT/4.

Note: mode 2 is implemented in the third part of this patch series.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-02 20:56:10 -04:00
Yuchung Cheng 1fbc340514 tcp: early retransmit: tcp_enter_recovery()
This a prepartion patch that refactors the code to enter recovery
into a new function tcp_enter_recovery(). It's needed to implement
the delayed fast retransmit in ER.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-02 20:56:09 -04:00
John W. Linville 076e7779c0 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem 2012-05-01 14:14:05 -04:00
Eric Dumazet 116a0fc31c netem: fix possible skb leak
skb_checksum_help(skb) can return an error, we must free skb in this
case. qdisc_drop(skb, sch) can also be feeded with a NULL skb (if
skb_unshare() failed), so lets use this generic helper.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-01 13:40:48 -04:00
Eric Dumazet e4ae004b84 netem: add ECN capability
Add ECN (Explicit Congestion Notification) marking capability to netem

tc qdisc add dev eth0 root netem drop 0.5 ecn

Instead of dropping packets, try to ECN mark them.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Hagen Paul Pfeifer <hagen@jauu.net>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-01 09:39:48 -04:00
Eric Dumazet e4cbb02a10 net: add a prefetch in socket backlog processing
TCP or UDP stacks have big enough latencies that prefetching next
pointer is worth it.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-01 09:39:48 -04:00
James Chapman 5dac94e109 l2tp: let iproute2 create L2TPv3 IP tunnels using IPv6
The netlink API lets users create unmanaged L2TPv3 tunnels using
iproute2. Until now, a request to create an unmanaged L2TPv3 IP
encapsulation tunnel over IPv6 would be rejected with
EPROTONOSUPPORT. Now that l2tp_ip6 implements sockets for L2TP IP
encapsulation over IPv6, we can add support for that tunnel type.

Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-01 09:30:55 -04:00
Chris Elston a32e0eec70 l2tp: introduce L2TPv3 IP encapsulation support for IPv6
L2TPv3 defines an IP encapsulation packet format where data is carried
directly over IP (no UDP). The kernel already has support for L2TP IP
encapsulation over IPv4 (l2tp_ip). This patch introduces support for
L2TP IP encapsulation over IPv6.

The implementation is derived from ipv6/raw and ipv4/l2tp_ip.

Signed-off-by: Chris Elston <celston@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-01 09:30:55 -04:00
Chris Elston a495f8364e ipv6: Export ipv6 functions for use by other protocols
For implementing other protocols on top of IPv6, such as L2TPv3's IP
encapsulation over ipv6, we'd like to call some IPv6 functions which
are not currently exported. This patch exports them.

Signed-off-by: Chris Elston <celston@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-01 09:30:55 -04:00
Chris Elston f9bac8df90 l2tp: netlink api for l2tpv3 ipv6 unmanaged tunnels
This patch adds support for unmanaged L2TPv3 tunnels over IPv6 using
the netlink API. We already support unmanaged L2TPv3 tunnels over
IPv4. A patch to iproute2 to make use of this feature will be
submitted separately.

Signed-off-by: Chris Elston <celston@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-01 09:30:55 -04:00
Chris Elston 2121c3f571 l2tp: show IPv6 addresses in l2tp debugfs file
If an L2TP tunnel uses IPv6, make sure the l2tp debugfs file shows the
IPv6 address correctly.

Signed-off-by: Chris Elston <celston@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-01 09:30:55 -04:00
James Chapman b79585f537 l2tp: pppol2tp_connect() handles ipv6 sockaddr variants
Userspace uses connect() to associate a pppol2tp socket with a tunnel
socket. This needs to allow the caller to supply the new IPv6
sockaddr_pppol2tp structures if IPv6 is used.

Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-01 09:30:55 -04:00
James Chapman c8657fd50a l2tp: remove unused stats from l2tp_ip socket
The l2tp_ip socket currently maintains packet/byte stats in its
private socket structure. But these counters aren't exposed to
userspace and so serve no purpose. The counters were also
smp-unsafe. So this patch just gets rid of the stats.

While here, change a couple of internal __u32 variables to u32.

Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-01 09:30:54 -04:00
James Chapman de3c7a1827 l2tp: Use ip4_datagram_connect() in l2tp_ip_connect()
Cleanup the l2tp_ip code to make use of an existing ipv4 support function.

Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-01 09:30:54 -04:00
James Chapman 5de7aee541 l2tp: fix locking of 64-bit counters for smp
L2TP uses 64-bit counters but since these are not updated atomically,
we need to make them safe for smp. This patch addresses that.

Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-01 09:30:54 -04:00
David S. Miller b6d151bb82 Merge branch 'tipc_net-next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux 2012-04-30 21:42:30 -04:00
Eric Dumazet 1d0c0b328a net: makes skb_splice_bits() aware of skb->head_frag
__skb_splice_bits() can check if skb to be spliced has its skb->head
mapped to a page fragment, instead of a kmalloc() area.

If so we can avoid a copy of the skb head and get a reference on
underlying page.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: Matt Carlson <mcarlson@broadcom.com>
Cc: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-30 21:35:50 -04:00
Eric Dumazet 329033f645 tcp: makes tcp_try_coalesce aware of skb->head_frag
TCP coalesce can check if skb to be merged has its skb->head mapped to a
page fragment, instead of a kmalloc() area.

We had to disable coalescing in this case, for performance reasons.

We 'upgrade' skb->head as a fragment in itself.

This reduces number of cache misses when user makes its copies, since a
less sk_buff are fetched.

This makes receive and ofo queues shorter and thus reduce cache line
misses in TCP stack.

This is a followup of patch "net: allow skb->head to be a page fragment"

Tested with tg3 nic, with GRO on or off. We can see "TCPRcvCoalesce"
counter being incremented.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: Matt Carlson <mcarlson@broadcom.com>
Cc: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-30 21:35:49 -04:00
Eric Dumazet d7e8883cfc net: make GRO aware of skb->head_frag
GRO can check if skb to be merged has its skb->head mapped to a page
fragment, instead of a kmalloc() area.

We 'upgrade' skb->head as a fragment in itself

This avoids the frag_list fallback, and permits to build true GRO skb
(one sk_buff and up to 16 fragments), using less memory.

This reduces number of cache misses when user makes its copy, since a
single sk_buff is fetched.

This is a followup of patch "net: allow skb->head to be a page fragment"

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: Matt Carlson <mcarlson@broadcom.com>
Cc: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-30 21:35:49 -04:00
Eric Dumazet d3836f21b0 net: allow skb->head to be a page fragment
skb->head is currently allocated from kmalloc(). This is convenient but
has the drawback the data cannot be converted to a page fragment if
needed.

We have three spots were it hurts :

1) GRO aggregation

 When a linear skb must be appended to another skb, GRO uses the
frag_list fallback, very inefficient since we keep all struct sk_buff
around. So drivers enabling GRO but delivering linear skbs to network
stack aren't enabling full GRO power.

2) splice(socket -> pipe).

 We must copy the linear part to a page fragment.
 This kind of defeats splice() purpose (zero copy claim)

3) TCP coalescing.

 Recently introduced, this permits to group several contiguous segments
into a single skb. This shortens queue lengths and save kernel memory,
and greatly reduce probabilities of TCP collapses. This coalescing
doesnt work on linear skbs (or we would need to copy data, this would be
too slow)

Given all these issues, the following patch introduces the possibility
of having skb->head be a fragment in itself. We use a new skb flag,
skb->head_frag to carry this information.

build_skb() is changed to accept a frag_size argument. Drivers willing
to provide a page fragment instead of kmalloc() data will set a non zero
value, set to the fragment size.

Then, on situations we need to convert the skb head to a frag in itself,
we can check if skb->head_frag is set and avoid the copies or various
fallbacks we have.

This means drivers currently using frags could be updated to avoid the
current skb->head allocation and reduce their memory footprint (aka skb
truesize). (thats 512 or 1024 bytes saved per skb). This also makes
bpf/netfilter faster since the 'first frag' will be part of skb linear
part, no need to copy data.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: Matt Carlson <mcarlson@broadcom.com>
Cc: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-30 21:35:11 -04:00
Paul Gortmaker 617d3c7a50 tipc: compress out gratuitous extra carriage returns
Some of the comment blocks are floating in limbo between two
functions, or between blocks of code.  Delete the extra line
feeds between any comment and its associated following block
of code, to be consistent with the majority of the rest of
the kernel.  Also delete trailing newlines at EOF and fix
a couple trivial typos in existing comments.

This is a 100% cosmetic change with no runtime impact.  We get
rid of over 500 lines of non-code, and being blank line deletes,
they won't even show up as noise in git blame.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-04-30 15:53:56 -04:00
Felix Fietkau 66f2c99af3 mac80211: fix AP mode EAP tx for VLAN stations
EAP frames for stations in an AP VLAN are sent on the main AP interface
to avoid race conditions wrt. moving stations.
For that to work properly, sta_info_get_bss must be used instead of
sta_info_get when sending EAP packets.
Previously this was only done for cooked monitor injected packets, so
this patch adds a check for tx->skb->protocol to the same place.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Cc: stable@vger.kernel.org
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-30 14:40:05 -04:00
Yuchung Cheng 1cebce36d6 tcp: fix infinite cwnd in tcp_complete_cwr()
When the cwnd reduction is done, ssthresh may be infinite
if TCP enters CWR via ECN or F-RTO. If cwnd is not undone, i.e.,
undo_marker is set, tcp_complete_cwr() falsely set cwnd to the
infinite ssthresh value. The correct operation is to keep cwnd
intact because it has been updated in ECN or F-RTO.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-30 13:44:39 -04:00
Herbert Xu bb63f1f8a0 bridge: Fix fatal typo in setup of multicast_querier_expired
Unfortunately it seems that I didn't properly test the case of
an expired external querier in the recent multicast bridge series.

The setup of the timer in that case is completely broken and leads
to a NULL-pointer dereference.  This patch fixes it.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-30 13:30:56 -04:00
David S. Miller 5414fc12e3 Merge branch 'master' of git://1984.lsi.us.es/net 2012-04-30 13:23:22 -04:00
David S. Miller d499bd2ee9 l2tp: Add missing net/net/ip6_checksum.h include.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-30 13:21:28 -04:00
Trond Myklebust cbbb34498f SUNRPC: RPC client must use the current utsname hostname string
Now that the rpc client is namespace aware, it needs to use the
utsname of the process that created it instead of using the
init_utsname. Both rpc_new_client and rpc_clone_client need to
be fixed.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
2012-04-30 11:58:51 -04:00
Pablo Neira Ayuso 6cf5185248 netfilter: xt_CT: fix wrong checking in the timeout assignment path
The current checking always succeeded. We have to check the first
character of the string to check that it's empty, thus, skipping
the timeout path.

This fixes the use of the CT target without the timeout option.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-04-30 10:40:36 +02:00
Hans Schillstrom 8537de8a7a ipvs: kernel oops - do_ip_vs_get_ctl
Change order of init so netns init is ready
when register ioctl and netlink.

Ver2
	Whitespace fixes and __init added.

Reported-by: "Ryan O'Hara" <rohara@redhat.com>
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-04-30 10:40:35 +02:00
Hans Schillstrom 582b8e3ead ipvs: take care of return value from protocol init_netns
ip_vs_create_timeout_table() can return NULL
All functions protocol init_netns is affected of this patch.

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-04-30 10:40:35 +02:00
Hans Schillstrom 4b984cd50b ipvs: null check of net->ipvs in lblc(r) shedulers
Avoid crash when registering shedulers after
the IPVS core initialization for netns fails. Do this by
checking for present core (net->ipvs).

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-04-30 10:40:14 +02:00
Benjamin LaHaise d2cf336167 net/l2tp: add support for L2TP over IPv6 UDP
Now that encap_rcv() works on IPv6 UDP sockets, wire L2TP up to IPv6.
Support has been tested with and without hardware offloading.  This
version fixes the L2TP over localhost issue with incorrect checksums
being reported.

Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-28 22:21:51 -04:00
Benjamin LaHaise d7f3f62167 net/ipv6/udp: UDP encapsulation: introduce encap_rcv hook into IPv6
Now that the sematics of udpv6_queue_rcv_skb() match IPv4's
udp_queue_rcv_skb(), introduce the UDP encap_rcv() hook for IPv6.

Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-28 22:21:51 -04:00
Benjamin LaHaise cb80ef463d net/ipv6/udp: UDP encapsulation: move socket locking into udpv6_queue_rcv_skb()
In order to make sure that when the encap_rcv() hook is introduced it is
not called with the socket lock held, move socket locking from callers into
udpv6_queue_rcv_skb(), matching what happens in IPv4.

Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-28 22:21:51 -04:00
Benjamin LaHaise f7ad74fef3 net/ipv6/udp: UDP encapsulation: break backlog_rcv into __udpv6_queue_rcv_skb
This is the first step in reworking the IPv6 UDP code to be structured more
like the IPv4 UDP code.  This patch creates __udpv6_queue_rcv_skb() with
the equivalent sematics to __udp_queue_rcv_skb(), and wires it up to the
backlog_rcv method.

Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-28 22:21:50 -04:00
Jeffrin Jose cb75a36c8a net: Fixed a coding style issue related to spaces.
Fixed a coding style issue relating to spaces
in net/core/sock.c

Signed-off-by: Jeffrin Jose <ahiliation@yahoo.co.in>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-28 21:45:00 -04:00
Neil Horman 3885ca785a drop_monitor: Make updating data->skb smp safe
Eric Dumazet pointed out to me that the drop_monitor protocol has some holes in
its smp protections.  Specifically, its possible to replace data->skb while its
being written.  This patch corrects that by making data->skb an rcu protected
variable.  That will prevent it from being overwritten while a tracepoint is
modifying it.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: David Miller <davem@davemloft.net>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-28 02:18:48 -04:00
Neil Horman cde2e9a651 drop_monitor: fix sleeping in invalid context warning
Eric Dumazet pointed out this warning in the drop_monitor protocol to me:

[   38.352571] BUG: sleeping function called from invalid context at kernel/mutex.c:85
[   38.352576] in_atomic(): 1, irqs_disabled(): 0, pid: 4415, name: dropwatch
[   38.352580] Pid: 4415, comm: dropwatch Not tainted 3.4.0-rc2+ #71
[   38.352582] Call Trace:
[   38.352592]  [<ffffffff8153aaf0>] ? trace_napi_poll_hit+0xd0/0xd0
[   38.352599]  [<ffffffff81063f2a>] __might_sleep+0xca/0xf0
[   38.352606]  [<ffffffff81655b16>] mutex_lock+0x26/0x50
[   38.352610]  [<ffffffff8153aaf0>] ? trace_napi_poll_hit+0xd0/0xd0
[   38.352616]  [<ffffffff810b72d9>] tracepoint_probe_register+0x29/0x90
[   38.352621]  [<ffffffff8153a585>] set_all_monitor_traces+0x105/0x170
[   38.352625]  [<ffffffff8153a8ca>] net_dm_cmd_trace+0x2a/0x40
[   38.352630]  [<ffffffff8154a81a>] genl_rcv_msg+0x21a/0x2b0
[   38.352636]  [<ffffffff810f8029>] ? zone_statistics+0x99/0xc0
[   38.352640]  [<ffffffff8154a600>] ? genl_rcv+0x30/0x30
[   38.352645]  [<ffffffff8154a059>] netlink_rcv_skb+0xa9/0xd0
[   38.352649]  [<ffffffff8154a5f0>] genl_rcv+0x20/0x30
[   38.352653]  [<ffffffff81549a7e>] netlink_unicast+0x1ae/0x1f0
[   38.352658]  [<ffffffff81549d76>] netlink_sendmsg+0x2b6/0x310
[   38.352663]  [<ffffffff8150824f>] sock_sendmsg+0x10f/0x130
[   38.352668]  [<ffffffff8150abe0>] ? move_addr_to_kernel+0x60/0xb0
[   38.352673]  [<ffffffff81515f04>] ? verify_iovec+0x64/0xe0
[   38.352677]  [<ffffffff81509c46>] __sys_sendmsg+0x386/0x390
[   38.352682]  [<ffffffff810ffaf9>] ? handle_mm_fault+0x139/0x210
[   38.352687]  [<ffffffff8165b5bc>] ? do_page_fault+0x1ec/0x4f0
[   38.352693]  [<ffffffff8106ba4d>] ? set_next_entity+0x9d/0xb0
[   38.352699]  [<ffffffff81310b49>] ? tty_ldisc_deref+0x9/0x10
[   38.352703]  [<ffffffff8106d363>] ? pick_next_task_fair+0x63/0x140
[   38.352708]  [<ffffffff8150b8d4>] sys_sendmsg+0x44/0x80
[   38.352713]  [<ffffffff8165f8e2>] system_call_fastpath+0x16/0x1b

It stems from holding a spinlock (trace_state_lock) while attempting to register
or unregister tracepoint hooks, making in_atomic() true in this context, leading
to the warning when the tracepoint calls might_sleep() while its taking a mutex.
Since we only use the trace_state_lock to prevent trace protocol state races, as
well as hardware stat list updates on an rcu write side, we can just convert the
spinlock to a mutex to avoid this problem.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: David Miller <davem@davemloft.net>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-28 02:18:48 -04:00
John W. Linville 4dcc0637fc Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth 2012-04-27 15:16:43 -04:00
Stanislav Kinsbursky ea8cfa0679 SUNRPC: traverse clients tree on PipeFS event
v2: recursion was replaced by loop

If client is a clone, then it's parent can not be in the list.
But parent's Pipefs dentries have to be created and destroyed.

Note: event skip helper for clients introduced

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 14:10:00 -04:00
Stanislav Kinsbursky 37629b572c SUNRPC: set per-net PipeFS superblock before notification
There can be a case, when on MOUNT event RPC client (after it's dentries were
created) is not longer hold by anyone except notification callback.
I.e. on release this client will be destoroyed. And it's dentries have to be
destroyed as well. Which in turn requires per-net PipeFS superblock to be set.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 14:10:00 -04:00
Stanislav Kinsbursky 7aab449e5a SUNRPC: skip clients with program without PipeFS entries
1) This is sane.
2) Otherwise there will be soft lockup:

do {
	rpc_get_client_for_event (clnt->cl_dentry == NULL ==> choose)
	__rpc_pipefs_event (clnt->cl_program->pipe_dir_name == NULL ==> return)
} while (1)

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 14:09:59 -04:00
Stanislav Kinsbursky a4dff1bc49 SUNRPC: skip dead but not buried clients on PipeFS events
These clients can't be safely dereferenced if their counter in 0.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 14:09:59 -04:00
Neal Cardwell 651913ce9d tcp: clean up use of jiffies in tcp_rcv_rtt_measure()
Clean up a reference to jiffies in tcp_rcv_rtt_measure() that should
instead reference tcp_time_stamp. Since the result of the subtraction
is passed into a function taking u32, this should not change any
behavior (and indeed the generated assembly does not change on
x86_64). However, it seems worth cleaning this up for consistency and
clarity (and perhaps to avoid bugs if this is copied and pasted
somewhere else).

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-27 12:34:39 -04:00
Allan Stephens aad585473f tipc: Reject payload messages with invalid message type
Adds check to ensure TIPC sockets reject incoming payload messages
that have an unrecognized message type.

Remove the old open question about whether TIPC_ERR_NO_PORT is
the proper return value.  It is appropriate here since there are
valid instances where another node can make use of the reply,
and at this point in time the host is already broadcasting TIPC
data, so there are no real security concerns.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-04-27 10:08:00 -04:00
Eric Dumazet 8298193012 net: cleanups in sock_setsockopt()
Use min_t()/max_t() macros, reformat two comments, use !!test_bit() to
match !!sock_flag()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-27 02:14:21 -04:00
hartleys feb50ac19e crush: include header for global symbols
Include the header to pickup the definitions of the global symbols.

Quiets the following sparse warnings:

warning: symbol 'crush_find_rule' was not declared. Should it be static?
warning: symbol 'crush_do_rule' was not declared. Should it be static?

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Cc: Sage Weil <sage@newdream.net>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-27 00:03:34 -04:00
Eric Dumazet 6746960140 ipv6: RTAX_FEATURE_ALLFRAG causes inefficient TCP segment sizing
Quoting Tore Anderson from :
https://bugzilla.kernel.org/show_bug.cgi?id=42572

When RTAX_FEATURE_ALLFRAG is set on a route, the effective TCP segment
size does not take into account the size of the IPv6 Fragmentation
header that needs to be included in outbound packets, causing every
transmitted TCP segment to be fragmented across two IPv6 packets, the
latter of which will only contain 8 bytes of actual payload.

RTAX_FEATURE_ALLFRAG is typically set on a route in response to
receving a ICMPv6 Packet Too Big message indicating a Path MTU of less
than 1280 bytes. 1280 bytes is the minimum IPv6 MTU, however ICMPv6
PTBs with MTU < 1280 are still valid, in particular when an IPv6
packet is sent to an IPv4 destination through a stateless translator.
Any ICMPv4 Need To Fragment packets originated from the IPv4 part of
the path will be translated to ICMPv6 PTB which may then indicate an
MTU of less than 1280.

The Linux kernel refuses to reduce the effective MTU to anything below
1280 bytes, instead it sets it to exactly 1280 bytes, and
RTAX_FEATURE_ALLFRAG is also set. However, the TCP segment size appears
to be set to 1240 bytes (1280 Path MTU - 40 bytes of IPv6 header),
instead of 1232 (additionally taking into account the 8 bytes required
by the IPv6 Fragmentation extension header).

This in turn results in rather inefficient transmission, as every
transmitted TCP segment now is split in two fragments containing
1232+8 bytes of payload.

After this patch, all the outgoing packets that includes a
Fragmentation header all are "atomic" or "non-fragmented" fragments,
i.e., they both have Offset=0 and More Fragments=0.

With help from David S. Miller

Reported-by: Tore Anderson <tore@fud.no>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Tom Herbert <therbert@google.com>
Tested-by: Tore Anderson <tore@fud.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-27 00:03:34 -04:00
Allan Stephens 8f17789693 tipc: Enhance error checking of published names
Consolidates validation of scope and name sequence range values into
a single routine where it applies both to local name publications
and to name publications issued by other nodes in the network. This
change means that the scope value for non-local publications is now
validated and the name sequence range for local publications is now
validated only once. Additionally, a publication attempt that fails
validation now creates an entry in the system log file only if debugging
capabilities have been enabled; this prevents the system log from being
cluttered up with messages caused by a defective application or network
node.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-04-26 18:15:48 -04:00
Allan Stephens f7fb9d20ad tipc: Create helper routine to delete unused name sequence structure
Replaces two identical chunks of code that delete an unused name
sequence structure from TIPC's name table with calls to a new routine
that performs this operation.

This change is cosmetic and doesn't impact the operation of TIPC.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-04-26 18:15:47 -04:00
Allan Stephens bbe6a295d0 tipc: remove redundant memset and stale comment from subscr.c
Eliminate code to zero-out the main topology service structure,
which is already zeroed-out.

Get rid of a comment documenting a field of the main topology
service structure that no longer exists.

Both are cosmetic changes with no impact on runtime behaviour.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-04-26 18:15:46 -04:00
Allan Stephens 2d98abb9fe tipc: Optimize initialization of network topology service
Initialization now occurs in the calling thread of control,
rather than being deferred to the TIPC tasklet.  With the
current codebase, the deferral is no longer necessary.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-04-26 18:15:45 -04:00
Allan Stephens eb3865a99d tipc: Enhance re-initialization of network topology service
Streamlines the job of re-initializing TIPC's network topology service
when a node's network address is first assigned. Rather than destroying
the topology server port and breaking its connections to existing
subscribers, TIPC now simply lets the service continue running (since
the change to the port identifier of each port used by the topology
service no longer impacts the flow of messages between the service and
its subscribers).

This enhancement means that applications that utilize the topology
service prior to the assignment of TIPC's network address no longer need
to re-establish their subscriptions when the address is finally assigned.

However, it is worth noting that any subsequent events for existing
subscriptions report the new port identifier of the publishing port,
rather than the original port identifier. (For example, a name that was
previously reported as being published by <0.0.0:ref> may be subsequently
withdrawn by <Z.C.N:ref>.)

This doesn't impact any of the existing known userspace in tipc-utils,
since (a) TIPC continues to treat references to the original port ID
correctly and (b) normal use cases assign an address before active use.

However if there does happen to be some rare/custom application out
there that was relying on this, they can simply bypass the enhancement
by issuing a subscription to {0,0} and break its connection to the
topology service, if an associated withdrawal event occurs.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-04-26 18:15:40 -04:00
Allan Stephens eb323b075a tipc: Optimize termination of configuration service
Termination no longer tests to see if the configuration service
port was successfully created or not. In the unlikely event that the
port was not created, attempting to delete the non-existent port is
detected gracefully and causes no harm.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-04-26 17:19:43 -04:00
Allan Stephens 861d3a0e5b tipc: Optimize initialization of configuration service
Initialization now occurs in the calling thread of control,
rather than being deferred to the TIPC tasklet.  With the
current codebase, the deferral is no longer necessary.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-04-26 17:19:42 -04:00
Allan Stephens a2cfd45b52 tipc: Optimize re-initialization of configuration service
Streamlines the job of re-initializing TIPC's configuration service
when a node's network address is first assigned. Rather than destroying
the configuration server port and then recreating it, TIPC now simply
withdraws the existing {0,<0.0.0>} name publication and creates a new
{0,<Z.C.N>} name publication that identifies the node's network address
to interested subscribers.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-04-26 17:19:07 -04:00
David S. Miller a85c9bb895 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next 2012-04-26 15:54:45 -04:00
John W. Linville d9b8ae6bd8 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem
Conflicts:
	drivers/net/wireless/iwlwifi/iwl-testmode.c
2012-04-26 15:03:48 -04:00
Pavel Emelyanov de248a75c3 tcp repair: Fix unaligned access when repairing options (v2)
Don't pick __u8/__u16 values directly from raw pointers, but instead use
an array of structures of code:value pairs. This is OK, since the buffer
we take options from is not an skb memory, but a user-to-kernel one.

For those options which don't require any value now, require this to be
zero (for potential future extension of this API).

v2: Changed tcp_repair_opt to use two __u32-s as spotted by David Laight.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-26 06:13:51 -04:00
alex.bluesman.smirnov@gmail.com 2d319508a3 6lowpan: duplicate definition of IEEE802154_ALEN
The same macros is defined in 'include/net/af_ieee802154.h' and is called
IEEE802154_ADDR_LEN. No need another one, so remove it.

Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-26 06:01:09 -04:00
alex.bluesman.smirnov@gmail.com c2e94d73ea 6lowpan: move frame allocation code to a separate function
Separate frame allocation routine from data processing function.
This makes code more human readable and easier for understanding.

Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-26 06:01:09 -04:00
alex.bluesman.smirnov@gmail.com 768f7c7c12 6lowpan: add missing spin_lock_init()
Add missing spin_lock_init() for frames list lock.

Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-26 05:32:55 -04:00
alex.bluesman.smirnov@gmail.com 8deff4af87 6lowpan: clean up fragments list if module unloaded
Clean all the pending fragments and relative timers if 6lowpan link
is going to be deleted.

Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-26 05:32:55 -04:00
alex.bluesman.smirnov@gmail.com 0848e40430 6lowpan: fix segmentation fault caused by mlme request
Add nescesary mlme callbacks to satisfy "iz list" request from user space.
Due to 6lowpan device doesn't have its own phy, mlme implemented as a pipe
to a real phy to which 6lowpan is attached.

Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-26 05:32:55 -04:00
Julian Anastasov 39f618b4fd ipvs: reset ipvs pointer in netns
Make sure net->ipvs is reset on netns cleanup or failed
initialization. It is needed for IPVS applications to know that
IPVS core is not loaded in netns.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-04-26 15:26:35 +09:00
Julian Anastasov 8d08d71ce5 ipvs: add check in ftp for initialized core
Avoid crash when registering ip_vs_ftp after
the IPVS core initialization for netns fails. Do this by
checking for present core (net->ipvs).

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-04-26 15:26:35 +09:00
Linus Torvalds 2300fd67b4 NFS client bugfixes for Linux 3.4
Highlights include:
 - Fix NFSv4 infinite loops on open(O_TRUNC)
 - Fix an Oops and an infinite loop in the NFSv4 flock code
 - Don't register the PipeFS filesystem until it has been set up
 - Fix an Oops in nfs_try_to_update_request
 - Don't reuse NFSv4 open owners: fixes a bad sequence id storm.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJPlbzwAAoJEGcL54qWCgDy24oQALZE67vBft7M2j0BiWhVbV15
 YLbCf6x/h+0BJAkKWdrBaw7N6GX6OYBOX2SsmrBkzYf5mgHeju5+dH0CmRAR5xib
 5d+Lwxif1l+rABfdzzJf8gY1L1THyJCnfmarKKyYEJ5OC1pJyulKLanXSPzPfzlm
 APV5Jf6NM2WRgkCqzP6zf61NG0HbDSR7C//HQ3k21Sdt9XDLf5qLHBSuPIQ+BlZY
 EvpbERTtJgp7rPJsLQv1F2dgasDUQNg8G+tmZatGcqEiNxVyQ2YqwshaldOVqftv
 3Kocs6OW5C1ESj1dFJZmeMZ/+GSHjRJx8fpqHJjmCsh4kPGgFviQDdYwu4FDhhPI
 FZslC5nVi8JMTPNJAFmfvbwPQId/TSRPCWYO5PtW1LSfRT/+25b6M5duro1eGIbJ
 /FDoOCYQmepNOfobU9Q3roDWyNSLYFaUaMJUrccRcAuS3S2NEXisTAT49kmqa1Vm
 ZArOJBnXTgmGi30nKhqqLJ43P61ekhX0AQ6PycZAXkjeRlkQs7AAQbMJZMB2X0r9
 KtRCDPiH2NuR0FwxNMkMP4BXdsaY7Sz/xiSZXLOUf1SeWBiBtYoDdrQ3z67SGOeG
 qxI3qXXl0KC2+l2jnezcWhBf4CDpxftGIBi+rKWJt8stoYzbemB/M1lkoTCwrVzq
 8Gwyy0QTVzE9VkY77oVW
 =hQAK
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.4-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
 - Fix NFSv4 infinite loops on open(O_TRUNC)
 - Fix an Oops and an infinite loop in the NFSv4 flock code
 - Don't register the PipeFS filesystem until it has been set up
 - Fix an Oops in nfs_try_to_update_request
 - Don't reuse NFSv4 open owners: fixes a bad sequence id storm.

* tag 'nfs-for-3.4-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFSv4: Keep dropped state owners on the LRU list for a while
  NFSv4: Ensure that we don't drop a state owner more than once
  NFSv4: Ensure we do not reuse open owner names
  nfs: Enclose hostname in brackets when needed in nfs_do_root_mount
  NFS: put open context on error in nfs_flush_multi
  NFS: put open context on error in nfs_pagein_multi
  NFSv4: Fix open(O_TRUNC) and ftruncate() error handling
  NFSv4: Ensure that we check lock exclusive/shared type against open modes
  NFSv4: Ensure that the LOCK code sets exception->inode
  NFS: check for req==NULL in nfs_try_to_update_request cleanup
  SUNRPC: register PipeFS file system after pernet sybsystem
2012-04-25 21:38:44 -07:00
Shan Wei 8dcf01fc00 net: sock_diag_handler structs can be const
read only, so change it to const.

Signed-off-by: Shan Wei <davidshan@tencent.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-25 20:46:59 -04:00
Shan Wei 62ad6fcd74 udp_diag: implement idiag_get_info for udp/udplite to get queue information
When we use netlink to monitor queue information for udp socket,
idiag_rqueue and idiag_wqueue of inet_diag_msg are returned with 0.

Keep consistent with netstat, just return back allocated rmem/wmem size.

Signed-off-by: Shan Wei <davidshan@tencent.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-25 20:43:01 -04:00
Eric Dumazet 808db80a7e ipv6: call consume_skb() in frag/reassembly
Some kfree_skb() calls should be replaced by consume_skb() to avoid
drop_monitor/dropwatch false positives.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-25 20:39:46 -04:00
John Fastabend 081579840b net: dcb: add CEE notify calls
This adds code to trigger CEE events when an APP change or setall
command is made from user space. This simplifies user space code
significantly by creating a single interface to listen on that
works with both firmware and userland agents.

And if we end up with multiple agents this keeps every thing in
sync userland agents, firmware agents, and kernel notifier consumers.

For an example agent that listens for these events see:

https://github.com/jrfastab/cgdcbxd

cgdcbxd is a daemon used to monitor DCB netlink events and manage
the net_prio control group sub-system.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Shmulik Ravid <shmulikr@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-25 19:47:17 -04:00
John W. Linville 395836282f Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem 2012-04-25 13:41:25 -04:00
Julian Anastasov 8f9b9a2fad ipvs: fix crash in ip_vs_control_net_cleanup on unload
commit 14e405461e (2.6.39)
("Add __ip_vs_control_{init,cleanup}_sysctl()")
introduced regression due to wrong __net_init for
__ip_vs_control_cleanup_sysctl. This leads to crash when
the ip_vs module is unloaded.

	Fix it by changing __net_init to __net_exit for
the function that is already renamed to ip_vs_control_net_cleanup_sysctl.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-04-25 11:16:30 +02:00
Sasha Levin 7118c07a84 ipvs: Verify that IP_VS protocol has been registered
The registration of a protocol might fail, there were no checks
and all registrations were assumed to be correct. This lead to
NULL ptr dereferences when apps tried registering.

For example:

[ 1293.226051] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
[ 1293.227038] IP: [<ffffffff822aacb0>] tcp_register_app+0x60/0xb0
[ 1293.227038] PGD 391de067 PUD 6c20b067 PMD 0
[ 1293.227038] Oops: 0000 [#1] PREEMPT SMP
[ 1293.227038] CPU 1
[ 1293.227038] Pid: 19609, comm: trinity Tainted: G        W    3.4.0-rc1-next-20120405-sasha-dirty #57
[ 1293.227038] RIP: 0010:[<ffffffff822aacb0>]  [<ffffffff822aacb0>] tcp_register_app+0x60/0xb0
[ 1293.227038] RSP: 0018:ffff880038c1dd18  EFLAGS: 00010286
[ 1293.227038] RAX: ffffffffffffffc0 RBX: 0000000000001500 RCX: 0000000000010000
[ 1293.227038] RDX: 0000000000000000 RSI: ffff88003a2d5888 RDI: 0000000000000282
[ 1293.227038] RBP: ffff880038c1dd48 R08: 0000000000000000 R09: 0000000000000000
[ 1293.227038] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88003a2d5668
[ 1293.227038] R13: ffff88003a2d5988 R14: ffff8800696a8ff8 R15: 0000000000000000
[ 1293.227038] FS:  00007f01930d9700(0000) GS:ffff88007ce00000(0000) knlGS:0000000000000000
[ 1293.227038] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1293.227038] CR2: 0000000000000018 CR3: 0000000065dfc000 CR4: 00000000000406e0
[ 1293.227038] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1293.227038] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1293.227038] Process trinity (pid: 19609, threadinfo ffff880038c1c000, task ffff88002dc73000)
[ 1293.227038] Stack:
[ 1293.227038]  ffff880038c1dd48 00000000fffffff4 ffff8800696aada0 ffff8800694f5580
[ 1293.227038]  ffffffff8369f1e0 0000000000001500 ffff880038c1dd98 ffffffff822a716b
[ 1293.227038]  0000000000000000 ffff8800696a8ff8 0000000000000015 ffff8800694f5580
[ 1293.227038] Call Trace:
[ 1293.227038]  [<ffffffff822a716b>] ip_vs_app_inc_new+0xdb/0x180
[ 1293.227038]  [<ffffffff822a7258>] register_ip_vs_app_inc+0x48/0x70
[ 1293.227038]  [<ffffffff822b2fea>] __ip_vs_ftp_init+0xba/0x140
[ 1293.227038]  [<ffffffff821c9060>] ops_init+0x80/0x90
[ 1293.227038]  [<ffffffff821c90cb>] setup_net+0x5b/0xe0
[ 1293.227038]  [<ffffffff821c9416>] copy_net_ns+0x76/0x100
[ 1293.227038]  [<ffffffff810dc92b>] create_new_namespaces+0xfb/0x190
[ 1293.227038]  [<ffffffff810dca21>] unshare_nsproxy_namespaces+0x61/0x80
[ 1293.227038]  [<ffffffff810afd1f>] sys_unshare+0xff/0x290
[ 1293.227038]  [<ffffffff8187622e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 1293.227038]  [<ffffffff82665539>] system_call_fastpath+0x16/0x1b
[ 1293.227038] Code: 89 c7 e8 34 91 3b 00 89 de 66 c1 ee 04 31 de 83 e6 0f 48 83 c6 22 48 c1 e6 04 4a 8b 14 26 49 8d 34 34 48 8d 42 c0 48 39 d6 74 13 <66> 39 58 58 74 22 48 8b 48 40 48 8d 41 c0 48 39 ce 75 ed 49 8d
[ 1293.227038] RIP  [<ffffffff822aacb0>] tcp_register_app+0x60/0xb0
[ 1293.227038]  RSP <ffff880038c1dd18>
[ 1293.227038] CR2: 0000000000000018
[ 1293.379284] ---[ end trace 364ab40c7011a009 ]---
[ 1293.381182] Kernel panic - not syncing: Fatal exception in interrupt

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-04-25 11:16:12 +02:00
Andrei Emeltchenko 94c514fe24 mac80211: Adds clean sdata helper
Adds hepler to clean sdata ieee80211_clean_sdata similar way as
ieee80211_setup_sdata is implemented. The function will be used by other
interfaces later.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-24 14:56:10 -04:00
Felix Fietkau 7e3ed02c6e mac80211: fix num_mcast_sta counting issues
Moving a STA to an AP VLAN prevents num_mcast_sta from being decremented
once the STA leaves, because sta->sdata changes. Fix this by checking
for AP VLANs as well.

Also exclude 4-addr VLAN stations from num_mcast_sta - remote 4-addr
stations ignore 3-address multicast frames anyway. In a typical bridge
configuration they receive the same packets as 4-address unicast.

This patch also fixes clearing the sdata->u.vlan.sta pointer when the
STA is removed from a 4-addr VLAN.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-24 14:54:28 -04:00
Felix Fietkau 030ef8f8a5 mac80211: rename AP variable num_sta_authorized to num_mcast_sta
It is only used to test for BSS multicast receivers.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-24 14:54:28 -04:00
Wey-Yi Guy be6bcabc79 mac80211: check for non-managed interface
Average beacon signal only keep tracked by managed interface,
give warning and return 0 for the others.

Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-24 14:54:27 -04:00
Eliad Peller afa762f687 mac80211: call ieee80211_mgd_stop() on interface stop
ieee80211_mgd_teardown() is called on netdev removal, which
occurs after the vif was already removed from the low-level
driver, resulting in the following warning:

[ 4809.014734] ------------[ cut here ]------------
[ 4809.019861] WARNING: at net/mac80211/driver-ops.h:12 ieee80211_bss_info_change_notify+0x200/0x2c8 [mac80211]()
[ 4809.030388] wlan0:  Failed check-sdata-in-driver check, flags: 0x4
[ 4809.036862] Modules linked in: wlcore_sdio(-) wl12xx wlcore mac80211 cfg80211 [last unloaded: cfg80211]
[ 4809.046849] [<c001bd4c>] (unwind_backtrace+0x0/0x12c)
[ 4809.055937] [<c047cf1c>] (dump_stack+0x20/0x24)
[ 4809.065385] [<c003e334>] (warn_slowpath_common+0x5c/0x74)
[ 4809.075589] [<c003e408>] (warn_slowpath_fmt+0x40/0x48)
[ 4809.088291] [<bf033630>] (ieee80211_bss_info_change_notify+0x200/0x2c8 [mac80211])
[ 4809.102844] [<bf067f84>] (ieee80211_destroy_auth_data+0x80/0xa4 [mac80211])
[ 4809.116276] [<bf068004>] (ieee80211_mgd_teardown+0x5c/0x74 [mac80211])
[ 4809.129331] [<bf043f18>] (ieee80211_teardown_sdata+0xb0/0xd8 [mac80211])
[ 4809.141595] [<c03b5e58>] (rollback_registered_many+0x228/0x2f0)
[ 4809.153056] [<c03b5f48>] (unregister_netdevice_many+0x28/0x50)
[ 4809.165696] [<bf041ea8>] (ieee80211_remove_interfaces+0xb4/0xdc [mac80211])
[ 4809.179151] [<bf032174>] (ieee80211_unregister_hw+0x50/0xf0 [mac80211])
[ 4809.191043] [<bf0bebb4>] (wlcore_remove+0x5c/0x7c [wlcore])
[ 4809.201491] [<c02c6918>] (platform_drv_remove+0x24/0x28)
[ 4809.212029] [<c02c4d50>] (__device_release_driver+0x8c/0xcc)
[ 4809.222738] [<c02c4e84>] (device_release_driver+0x30/0x3c)
[ 4809.233099] [<c02c4258>] (bus_remove_device+0x10c/0x128)
[ 4809.242620] [<c02c26f8>] (device_del+0x11c/0x17c)
[ 4809.252150] [<c02c6de0>] (platform_device_del+0x28/0x68)
[ 4809.263051] [<bf0df49c>] (wl1271_remove+0x3c/0x50 [wlcore_sdio])
[ 4809.273590] [<c03806b0>] (sdio_bus_remove+0x48/0xf8)
[ 4809.283754] [<c02c4d50>] (__device_release_driver+0x8c/0xcc)
[ 4809.293729] [<c02c4e2c>] (driver_detach+0x9c/0xc4)
[ 4809.303163] [<c02c3d7c>] (bus_remove_driver+0xc4/0xf4)
[ 4809.312973] [<c02c5a98>] (driver_unregister+0x70/0x7c)
[ 4809.323220] [<c03809c4>] (sdio_unregister_driver+0x24/0x2c)
[ 4809.334213] [<bf0df458>] (wl1271_exit+0x14/0x1c [wlcore_sdio])
[ 4809.344930] [<c009b1a4>] (sys_delete_module+0x228/0x2a8)
[ 4809.354734] ---[ end trace 515290ccf5feb522 ]---

Rename ieee80211_mgd_teardown() to ieee80211_mgd_stop(),
and call it on ieee80211_do_stop().

Signed-off-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-24 14:42:42 -04:00
Szymon Janc 16cde9931b Bluetooth: Fix missing break in hci_cmd_complete_evt
Command complete event for HCI_OP_USER_PASSKEY_NEG_REPLY would result
in calling handler function also for HCI_OP_LE_SET_SCAN_PARAM. This
could result in undefined behaviour.

Signed-off-by: Szymon Janc <szymon.janc@tieto.com>
Signed-off-by: Gustavo Padovan <gustavo@padovan.org>
2012-04-24 11:38:41 -03:00
Paul Gortmaker 872f24dbc6 tipc: remove inline instances from C source files.
Untie gcc's hands and let it do what it wants within the
individual source files.  There are two files, node.c and
port.c -- only the latter effectively changes (gcc-4.5.2).
Objdump shows gcc deciding to not inline port_peernode().

Suggested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-24 00:41:03 -04:00
Eric Dumazet bfb253c9b2 af_netlink: drop_monitor/dropwatch friendly
Need to consume_skb() instead of kfree_skb() in netlink_dump() and
netlink_unicast_kernel() to avoid false dropwatch positives.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-24 00:35:14 -04:00
Eric Dumazet 658cb354ed af_netlink: cleanups
netlink_destroy_callback() move to avoid forward reference

CodingStyle cleanups

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-24 00:35:14 -04:00
Eric Dumazet 38ba0a65fa net: skb_can_coalesce returns a boolean
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-24 00:18:02 -04:00
Peter Huang (Peng) a881e963c7 set fake_rtable's dst to NULL to avoid kernel Oops
bridge: set fake_rtable's dst to NULL to avoid kernel Oops

when bridge is deleted before tap/vif device's delete, kernel may
encounter an oops because of NULL reference to fake_rtable's dst.
Set fake_rtable's dst to NULL before sending packets out can solve
this problem.

v4 reformat, change br_drop_fake_rtable(skb) to {}

v3 enrich commit header

v2 introducing new flag DST_FAKE_RTABLE to dst_entry struct.

[ Use "do { } while (0)" for nop br_drop_fake_rtable()
  implementation -DaveM ]

Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Peter Huang <peter.huangpeng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-24 00:16:24 -04:00
Eric Dumazet 783c175f90 tcp: tcp_try_coalesce returns a boolean
This clarifies code intention, as suggested by David.

Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-23 23:36:58 -04:00