Commit Graph

313951 Commits

Author SHA1 Message Date
Yuchung Cheng aab4874355 net-tcp: Fast Open client - detecting SYN-data drops
On paths with firewalls dropping SYN with data or experimental TCP options,
Fast Open connections will experience SYN timeouts and bad performance.
The solution is to track such incidents in the cookie cache and disable
Fast Open temporarily.

Since only the original SYN includes data and/or Fast Open option, the
SYN-ACK has some tell-tale sign (tcp_rcv_fastopen_synack()) to detect
such drops. If a path has recurring Fast Open SYN drops, Fast Open is
disabled for 2^(recurring_losses) minutes, starting from four minutes up to
roughly one and a half days. sendmsg() with the MSG_FASTOPEN flag will succeed,
but it behaves as connect() then write().
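
The backoff arithmetic can be sketched in a few lines of C. This is an
illustration only, not the kernel's code: the function name, the threshold of
two losses, and the cap of 11 (2^11 = 2048 minutes, about 34 hours) are
assumptions chosen to match the "four minutes up to roughly one and a half
days" range.

```c
#include <assert.h>

/* Assumed bounds, derived from the range quoted in the commit message. */
#define FO_MIN_LOSSES 2u    /* 2^2  = 4 minutes is the first backoff */
#define FO_MAX_LOSSES 11u   /* 2^11 = 2048 minutes, about 1.4 days */

/*
 * Hypothetical helper: minutes for which Fast Open stays disabled on a
 * path after `recurring_losses` recurring SYN-data drops.
 */
unsigned int fastopen_backoff_minutes(unsigned int recurring_losses)
{
    if (recurring_losses < FO_MIN_LOSSES)
        return 0;                       /* not recurring yet */
    if (recurring_losses > FO_MAX_LOSSES)
        recurring_losses = FO_MAX_LOSSES;
    return 1u << recurring_losses;      /* 2^(recurring_losses) minutes */
}
```

Under these assumptions, two recurring losses give a 4-minute pause and each
further loss doubles it until the cap is reached.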

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 11:02:03 -07:00
Yuchung Cheng cf60af03ca net-tcp: Fast Open client - sendmsg(MSG_FASTOPEN)
sendmsg() (or sendto()) with MSG_FASTOPEN is a combo of connect(2)
and write(2). The application should replace connect() with it to
send data in the opening SYN packet.

For a blocking socket, sendmsg() blocks until all the data is buffered
locally and the handshake is completed, like a connect() call. It
returns an errno similar to connect() if the TCP handshake fails.

For a non-blocking socket, it returns the number of bytes queued (and
transmitted in the SYN-data packet) if a cookie is available. If a cookie
is not available, it transmits a data-less SYN packet with a Fast Open
cookie request option and returns -EINPROGRESS like connect().

Using MSG_FASTOPEN on a connecting or connected socket will result in
an errno similar to that of repeated connect() calls. Therefore the
application should only use this flag on new sockets.

The buffer size of sendmsg() is independent of the MSS of the connection.
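
From the application side, the change might look like the sketch below. The
wrapper name tfo_send_first() is hypothetical; sendto() and MSG_FASTOPEN are
the interfaces named above. The fallback #define covers older libc headers
and uses the value Linux assigns to the flag.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

#ifndef MSG_FASTOPEN
#define MSG_FASTOPEN 0x20000000 /* Linux value, for older libc headers */
#endif

/*
 * Send the first chunk of data on a new, unconnected TCP socket.
 * sendto() with MSG_FASTOPEN stands in for the usual connect() + write().
 * Returns the number of bytes queued, or -1 with errno set; on a
 * non-blocking socket without a cached cookie, errno is EINPROGRESS.
 */
ssize_t tfo_send_first(int fd, const void *buf, size_t len,
                       const struct sockaddr *dst, socklen_t dst_len)
{
    assert(buf != NULL && dst != NULL);
    return sendto(fd, buf, len, MSG_FASTOPEN, dst, dst_len);
}
```

Later writes on the same socket go through plain send() or write(), since the
flag is only meaningful on a socket that has not started connecting.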

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 11:02:03 -07:00
Yuchung Cheng 8e4178c1c7 net-tcp: Fast Open client - receiving SYN-ACK
On receiving the SYN-ACK after SYN-data, the client needs to
a) update the cached MSS and cookie (if included in SYN-ACK)
b) retransmit the data not yet acknowledged by the SYN-ACK in the final ACK of
   the handshake.
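
Step b) comes down to sequence-number arithmetic, sketched below. The helper
name and parameters are hypothetical; the only protocol fact relied on is
that the SYN consumes one sequence number, so SYN-data starts at ISS + 1.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: bytes of SYN-data the client still has to resend.
 *   iss      - initial sequence number carried by the SYN
 *   data_len - payload bytes sent in the SYN
 *   ack      - acknowledgment number in the SYN-ACK
 */
uint32_t syn_data_unacked(uint32_t iss, uint32_t data_len, uint32_t ack)
{
    uint32_t acked = ack - (iss + 1);   /* modulo 2^32, wraparound safe */

    assert(acked <= data_len);          /* cannot ack data never sent */
    return data_len - acked;
}
```

A SYN-ACK that acknowledges only ISS + 1 covers the SYN but none of the data,
so the whole payload is resent.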

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 11:02:03 -07:00
Yuchung Cheng 783237e8da net-tcp: Fast Open client - sending SYN-data
This patch implements sending SYN-data in tcp_connect(). The data is
from tcp_sendmsg() with flag MSG_FASTOPEN (implemented in a later patch).

The length of the cookie in tcp_fastopen_req, init'd to 0, controls the
type of the SYN. If the cookie is not cached (len == 0), the host sends a
data-less SYN with a Fast Open cookie request option to solicit a cookie
from the remote. If the cookie is available (len > 0), the host sends
a SYN-data with the Fast Open cookie option. If the cookie length is negative,
the SYN will not include any Fast Open option (for fallback operations).
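
The three cases keyed on the cookie length can be written as a small
classifier. The enum and function names are hypothetical and exist only to
illustrate the decision; this is not the code in tcp_connect().

```c
#include <assert.h>

enum syn_kind {
    SYN_COOKIE_REQUEST,     /* data-less SYN asking for a cookie */
    SYN_DATA_WITH_COOKIE,   /* SYN carrying data plus the cached cookie */
    SYN_PLAIN               /* regular SYN, no Fast Open option */
};

/*
 * Hypothetical classifier on the cookie length:
 * 0 = nothing cached, > 0 = cookie cached, < 0 = Fast Open suppressed.
 */
enum syn_kind classify_syn(int cookie_len)
{
    if (cookie_len < 0)
        return SYN_PLAIN;
    if (cookie_len == 0)
        return SYN_COOKIE_REQUEST;
    return SYN_DATA_WITH_COOKIE;
}
```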

To deal with middleboxes that may drop SYN with data or experimental TCP
option, the SYN-data is only sent once. SYN retransmits do not include
data or Fast Open options. The connection will fall back to regular TCP
handshake.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 11:02:03 -07:00
Yuchung Cheng 1fe4c481ba net-tcp: Fast Open client - cookie cache
With help from Eric Dumazet, add Fast Open metrics in tcp metrics cache.
The basic ones are MSS and the cookies. A later patch will cache more to
handle unfriendly middleboxes.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 10:55:36 -07:00
Yuchung Cheng 2100c8d2d9 net-tcp: Fast Open base
This patch implements the common code for both the client and server.

1. TCP Fast Open option processing. Since Fast Open does not have an
   option number assigned by IANA yet, it shares the experimental option
   code 254 by implementing draft-ietf-tcpm-experimental-options
   with a 16-bit magic number 0xF989. This enables global experiments
   without clashing over the scarce (2) experimental options available for TCP.

   When the draft status becomes standard (maybe), the client should
   switch to the new option number assigned while the server supports
   both numbers for transition.

2. The new sysctl tcp_fastopen

3. A place holder init function
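
The option layout from item 1 (kind 254, a length byte, then the 16-bit magic
number) can be sketched as an encoder. The function and macro names are
hypothetical, and treating an empty cookie as a cookie request is an
assumption of this sketch rather than something stated above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TCPOPT_EXP_KIND 254     /* shared experimental option code */
#define FASTOPEN_MAGIC  0xF989  /* 16-bit magic number */
#define EXP_HEADER_LEN  4       /* kind + length + 2-byte magic */

/*
 * Hypothetical encoder: writes kind, length, the magic number in network
 * byte order, and the cookie into out[]. Returns the option length, or 0
 * if it does not fit in out_size or in the one-byte length field.
 */
size_t build_fastopen_exp_option(uint8_t *out, size_t out_size,
                                 const uint8_t *cookie, size_t cookie_len)
{
    size_t total = EXP_HEADER_LEN + cookie_len;

    assert(out != NULL);
    if (total > out_size || total > 255)
        return 0;
    out[0] = TCPOPT_EXP_KIND;
    out[1] = (uint8_t)total;
    out[2] = (uint8_t)(FASTOPEN_MAGIC >> 8);
    out[3] = (uint8_t)(FASTOPEN_MAGIC & 0xFF);
    if (cookie_len > 0)
        memcpy(out + EXP_HEADER_LEN, cookie, cookie_len);
    return total;
}
```

A receiver would check the two bytes after the length against 0xF989 before
interpreting the rest, which is what lets several experiments share code 254.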

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 10:55:36 -07:00
Thadeu Lima de Souza Cascardo 4cce66cdd1 mlx4_en: map entire pages to increase throughput
In its receive path, mlx4_en driver maps each page chunk that it pushes
to the hardware and unmaps it when pushing it up the stack. This limits
throughput to about 3Gbps on a Power7 8-core machine.

One solution is to map the entire allocated page at once. However, this
requires that we keep track of every page fragment we give to a
descriptor. We also need to work with the discipline that all fragments will
be released (in the sense that they will not be reused by the driver
anymore) in the order they are allocated to the driver.

This requires that we don't reuse any fragments; every single one of
them must be reallocated. We do that by releasing all the fragments that
are processed, and only after we have finished processing the descriptors
do we start the refill.

We also must somehow guarantee that we either refill all fragments in a
descriptor or none at all, without resorting to giving up a page
fragment that we would have already given. Otherwise, we would break the
discipline of only releasing the fragments in the order they were
allocated.

This has passed page allocation fault injections (restricted to the
driver by using required-start and required-end) and device hotplug
while 16 TCP streams were able to deliver more than 9Gbps.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 10:53:13 -07:00
Michal Schmidt a9ec6bd1f7 sfc: initialize dynamic sysfs attributes for lockdep
Dynamically allocated sysfs attributes must be initialized using
sysfs_attr_init(), otherwise lockdep complains:
BUG: key <address> not in .data!

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Acked-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 10:48:08 -07:00
stephen hemminger 8427b2acfd bridge: update documentation references
Update the references to bridge utilities and web pages
to current locations

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 10:48:07 -07:00
Bjørn Mork 8b0d2f9ed3 net: e100: ucode is optional in some cases
commit 9ac32e1b firmware: convert e100 driver to request_firmware()

did a straight conversion of the in-driver ucode to external
files.  This introduced the possibility of the driver failing
to enable an interface due to missing ucode. There was no
evaluation of the importance of the ucode at the time.

Based on comments in earlier versions of this driver, and in
the source code for the FreeBSD fxp driver, we can assume that
the ucode implements the "CPU Cycle Saver" feature on supported
adapters.  Although generally wanted, this is an optional
feature. The ucode source is not available, preventing it from
being included in free distributions. This creates unnecessary
problems for the end users. Doing a network install based on a
free distribution installer requires the user to download and
insert the ucode into the installer.

Making the ucode optional when possible improves the user
experience and driver usability.

The ucode for some adapters includes a bugfix, making it
essential.  We continue to fail for these adapters unless the
ucode is available.

Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 10:48:07 -07:00
Christian Riesch 215029375c asix: AX88172A driver depends on phylib
Since commit 16626b0cc3 the asix
driver depends on the phylib. Select phylib when the asix driver is
selected.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: kernel-janitors@vger.kernel.org
Signed-off-by: Christian Riesch <christian.riesch@omicron.at>
Tested-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 10:48:07 -07:00
Christian Riesch cb7b24cdc6 asix: Add support for programming the EEPROM
This patch adds the asix_set_eeprom() function to provide support for
programming the configuration EEPROM via ethtool.

Signed-off-by: Christian Riesch <christian.riesch@omicron.at>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 10:48:07 -07:00
Christian Riesch ceb02c91dd asix: Rework reading from EEPROM
The current code for reading the EEPROM via ethtool in the asix
driver has a few issues. It cannot handle odd length values
(accesses must be aligned at 16 bit boundaries) and interprets the
offset provided by ethtool as a 16 bit word offset instead of as a byte offset.

The new code for asix_get_eeprom() introduced by this patch is
modeled after the code in
drivers/net/ethernet/atheros/atl1e/atl1e_ethtool.c
and provides read access to the entire EEPROM with arbitrary
offsets and lengths.
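
The byte-versus-word addressing issue can be modelled in user space as
follows. This is a sketch, not the driver's asix_get_eeprom(): the function
name is hypothetical, the EEPROM is simulated by an array of 16-bit words,
and the low byte of each word is assumed to come first in the byte stream.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of a word-addressed EEPROM: copy `len` bytes starting
 * at byte `offset` out of `nwords` 16-bit words. Odd offsets and odd
 * lengths are handled by picking the right half of each word.
 * Returns 0 on success, -1 if the range is out of bounds.
 */
int eeprom_read_bytes(const uint16_t *words, size_t nwords,
                      size_t offset, size_t len, uint8_t *out)
{
    size_t i;

    assert(words != NULL && out != NULL);
    if (offset > nwords * 2 || len > nwords * 2 - offset)
        return -1;
    for (i = 0; i < len; i++) {
        size_t byte = offset + i;
        uint16_t w = words[byte >> 1];  /* word holding this byte */

        out[i] = (byte & 1) ? (uint8_t)(w >> 8) : (uint8_t)(w & 0xFF);
    }
    return 0;
}
```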

Signed-off-by: Christian Riesch <christian.riesch@omicron.at>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 10:48:07 -07:00
Dinh Nguyen 84c9f8c41d net: stmmac: Add ip version to dts bindings
Because there are multiple variants to the stmmac/dwmac driver, the
dts bindings should be updated to include version of the IP used.

Signed-off-by: Dinh Nguyen <dinguyen@altera.com>
Acked-by: Stefan Roese <sr@denx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 10:48:00 -07:00
brenohl@br.ibm.com 1d962ecf1e cxgb3: Set vlan_feature on net_device
The cxgb3 interface has bad performance when VLAN is set. On my current
setup, a PowerLinux 7R2, I am able to get around 7 Gbps on a TCP_STREAM
(8 instances, 4k message).
With this patch, I am able to reach 9.5 Gbps.

Signed-off-by: Breno Leitao <brenohl@br.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 10:48:00 -07:00
stephen hemminger 83bd1b793e ipx: move peII functions
The Ethernet II wrapper is only used by the IPX protocol; it may once have
been used by Appletalk, but not currently. Therefore it makes sense to
move it to the IPX dust bin and drop the exports.

Build tested only.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 10:48:00 -07:00
David S. Miller d8f1641b58 net: Fix warnings in dst_ops.h
include/net/dst_ops.h:28:20: warning: ‘struct sock’ declared inside parameter list

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 10:43:03 -07:00
Eric Dumazet be9f4a44e7 ipv4: tcp: remove per net tcp_sock
tcp_v4_send_reset() and tcp_v4_send_ack() use a single socket
per network namespace.

This leads to bad behavior on multiqueue NICs, because many cpus
contend for the socket lock and once the socket lock is acquired, extra
false sharing on various socket fields slows down the operations.

To better resist attacks, we use a percpu socket. Each cpu can
run without contention, using appropriate memory (local node)

Additional features :

1) We also mirror the queue_mapping of the incoming skb, so that
answers use the same queue if possible.

2) Setting the SOCK_USE_WRITE_QUEUE socket flag speeds up sock_wfree()

3) We now limit the number of in-flight RST/ACK [1] packets
per cpu, instead of per namespace, and we honor the sysctl_wmem_default
limit dynamically. (Prior to this patch, sysctl_wmem_default value was
copied at boot time, so any further change would not affect tcp_sock
limit)

[1] These packets are only generated when no socket was matched for
the incoming packet.

Reported-by: Bill Sommerfeld <wsommerfeld@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 10:35:30 -07:00
Julian Anastasov aee06da672 ipv4: use seqlock for nh_exceptions
Use global seqlock for the nh_exceptions. Call
fnhe_oldest with the right hash chain. Correct the diff
value for dst_set_expires.

v2: after suggestions from Eric Dumazet:
* get rid of spin lock fnhe_lock, rearrange update_or_create_fnhe
* continue daddr search in rt_bind_exception

v3:
* remove the daddr check before seqlock in rt_bind_exception
* restart lookup in rt_bind_exception on detected seqlock change,
as suggested by David Miller

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 10:30:14 -07:00
Duan Jiong 36eb22e97a libertas: firmware.c: remove duplicated include
Signed-off-by: Duan Jiong <djduanjiong@gmail.com>
2012-07-19 12:36:34 -04:00
John W. Linville 3e497e0215 Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next 2012-07-19 12:35:00 -04:00
David S. Miller 7fed84f622 ipv4: Fix time difference calculation in rt_bind_exception().
Reported-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 08:46:59 -07:00
Amir Vadai 1eb8c695bd net/mlx4_en: Add accelerated RFS support
Use RFS infrastructure and flow steering in HW to keep CPU
affinity of rx interrupts and application per TCP stream.

A flow steering filter is added to the HW whenever the RFS
ndo callback is invoked by core networking code.

Because the invocation takes place in interrupt context, the
actual setup of HW is done using a workqueue. Whenever a new filter
is added, the driver checks for expiry of existing filters.

Since there is a window in time between the point where the core
RFS code invokes the ndo callback and the point where the HW
is configured from the workqueue context, the 2nd, 3rd etc
packets from that stream will cause the net core to invoke
the callback again and again.

To prevent inefficient/double configuration of the HW, the filters
are kept in a database which is indexed using hash function to enable
fast access.
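
The find-before-add idea can be sketched with a small chained hash table.
Everything here is hypothetical (struct names, the flow key fields, the
bucket count, the hash function); it only illustrates how repeated callbacks
for the same stream are filtered out before any hardware work is scheduled.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define FILTER_HASH_SIZE 16u    /* assumed bucket count, power of two */

struct flow_key {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
};

struct filter {
    struct flow_key key;
    struct filter *next;        /* next filter in the same bucket */
};

struct filter_db {
    struct filter *buckets[FILTER_HASH_SIZE];
};

static unsigned int flow_hash(const struct flow_key *k)
{
    uint32_t h = k->src_ip ^ k->dst_ip ^
                 (((uint32_t)k->src_port << 16) | k->dst_port);

    h ^= h >> 16;               /* fold high bits into the bucket index */
    h ^= h >> 8;
    h ^= h >> 4;
    return h & (FILTER_HASH_SIZE - 1);
}

static int flow_equal(const struct flow_key *a, const struct flow_key *b)
{
    return a->src_ip == b->src_ip && a->dst_ip == b->dst_ip &&
           a->src_port == b->src_port && a->dst_port == b->dst_port;
}

/*
 * Returns 1 if the flow was new and a filter was recorded (the caller
 * would then queue the hardware setup), 0 if a filter already exists and
 * the repeated callback can be ignored, -1 on allocation failure.
 */
int filter_find_or_add(struct filter_db *db, const struct flow_key *key)
{
    unsigned int b;
    struct filter *f;

    assert(db != NULL && key != NULL);
    b = flow_hash(key);
    for (f = db->buckets[b]; f != NULL; f = f->next)
        if (flow_equal(&f->key, key))
            return 0;

    f = malloc(sizeof(*f));
    if (f == NULL)
        return -1;
    f->key = *key;
    f->next = db->buckets[b];
    db->buckets[b] = f;
    return 1;
}
```

Expiry of old filters, which the driver also performs, is left out of this
sketch.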

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 08:34:37 -07:00
Amir Vadai d9236c3f10 {NET,IB}/mlx4: Add rmap support to mlx4_assign_eq
Enable callers of mlx4_assign_eq to supply a pointer to cpu_rmap.
If supplied, the assigned IRQ is tracked using rmap infrastructure.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 08:34:37 -07:00
Amir Vadai 122733a189 net/rps: Protect cpu_rmap.h from double inclusion
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Acked-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 08:34:37 -07:00
Amir Vadai af22d9de45 net/mlx4: Move MAC_MASK to a common place
Define this macro in one common place instead of duplicating it throughout the code

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 08:34:37 -07:00
Julian Anastasov 0cc535a299 ipv4: fix address selection in fib_compute_spec_dst
ip_options_compile can be called for forwarded packets,
make sure the specific-destination address is a local one as
specified in RFC 1812, 4.2.2.2 Addresses in Options

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 08:30:49 -07:00
Julian Anastasov 6255e5ead0 ipv4: optimize fib_compute_spec_dst call in ip_options_echo
Move fib_compute_spec_dst at the only place where it
is needed.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 08:30:49 -07:00
Linus Torvalds 3e4b9459fb md: 3 bugfixes for 3.5-rc
One of the bugs was introduced in 3.5-rc1.  Others have
 been there for longer.

Merge tag 'md-3.5-fixes' of git://neil.brown.name/md

Pull three md bugfixes from NeilBrown:
 "One of the bugs was introduced in 3.5-rc1.  Others have been there for
  longer."

* tag 'md-3.5-fixes' of git://neil.brown.name/md:
  md/raid1: close some possible races on write errors during resync
  md: avoid crash when stopping md array races with closing other open fds.
  md: fix bug in handling of new_data_offset
2012-07-19 08:27:13 -07:00
Linus Torvalds 309d4b000b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking changes from David Miller:
 "Ok, we should be good to go now"

1) We have to statically initialize the init_net device list head rather
   than do so in an initcall, otherwise netprio_cgroup crashes if it's
   built statically rather than modular (Mark D.  Rustad)

2) Fix SKB null oopser in CIPSO ipv4 option processing (Paul Moore)

3) Qlogic maintainers update (Anirban Chakraborty)

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  net: Statically initialize init_net.dev_base_head
  MAINTAINERS: Changes in qlcnic and qlge maintainers list
  cipso: don't follow a NULL pointer when setsockopt() is called
2012-07-19 08:21:13 -07:00
Linus Torvalds 61c901c569 Merge branch 'upstream-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
Pull HID update from Jiri Kosina:
 "A final round of changes for HID for 3.5: just device ID additions."

* 'upstream-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
  HID: hid-multitouch: add support for Zytronic panels
  HID: add Sennheiser BTD500USB device support
  HID: add battery quirk for Apple Wireless ANSI
2012-07-19 08:15:55 -07:00
Ezequiel Garcia 380e99fc44 cx25821: Remove bad strcpy to read-only char*
The strcpy was being used to set the name of the board.  Since the
destination char* was read-only and the name is set statically at
compile time; this was both wrong and redundant.

The type of char* is changed to const char* to prevent future errors.
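
The pattern being fixed can be illustrated outside the driver. The struct,
table, and board names below are hypothetical stand-ins, not the cx25821
definitions; the point is that a name set statically through a const char*
needs no strcpy(), and the const qualifier lets the compiler reject writes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical board descriptor; the name is fixed at compile time. */
struct board_info {
    const char *name;   /* const: the string literal is never written */
};

static const struct board_info boards[] = {
    { .name = "UNKNOWN/GENERIC" },
    { .name = "Example board" },
};

/* Return the static name; no strcpy() into the literal is needed. */
const char *board_name(size_t idx)
{
    assert(idx < sizeof(boards) / sizeof(boards[0]));
    return boards[idx].name;
}
```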

Reported-by: Radek Masin <radek@masin.eu>
Signed-off-by: Ezequiel Garcia <elezegarcia@gmail.com>
[ Taking directly due to vacations   - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-19 08:15:33 -07:00
Benjamin Tissoires e9a09aed3e HID: hid-multitouch: add support for Zytronic panels
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@enac.fr>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-07-19 13:56:16 +02:00
NeilBrown 58e94ae184 md/raid1: close some possible races on write errors during resync
commit 4367af5561
   md/raid1: clear bad-block record when write succeeds.

Added a 'reschedule_retry' call possibility at the end of
end_sync_write, but didn't add matching code at the end of
sync_request_write.  So if the writes complete very quickly, or
scheduling makes it seem that way, then we can miss rescheduling
the request and the resync could hang.

Also commit 73d5c38a95
    md: avoid races when stopping resync.

Fixed a race condition in this same code in end_sync_write but didn't
make the change in sync_request_write.

This patch updates sync_request_write to fix both of those.
Patch is suitable for 3.1 and later kernels.

Reported-by: Alexander Lyakas <alex.bolshoy@gmail.com>
Original-version-by: Alexander Lyakas <alex.bolshoy@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-19 15:59:18 +10:00
NeilBrown a05b7ea03d md: avoid crash when stopping md array races with closing other open fds.
md will refuse to stop an array if any other fd (or mounted fs) is
using it.
When any fs is unmounted or when the last open fd is closed, all
pending IO will be flushed (e.g. sync_blockdev call in __blkdev_put)
so there will be no pending IO to worry about when the array is
stopped.

However in order to send the STOP_ARRAY ioctl to stop the array one
must first get an open fd on the block device.
If some fd is being used to write to the block device and it is closed
after mdadm opens the block device, but before mdadm issues the
STOP_ARRAY ioctl, then there will be no last-close on the md device so
__blkdev_put will not call sync_blockdev.

If this happens, then IO can still be in-flight while md tears down
the array and bad things can happen (use-after-free and subsequent
havoc).

So in the case where do_md_stop is being called from an open file
descriptor, call sync_blockdev after taking the mutex to ensure there
will be no new openers.

This is needed when setting a read-write device to read-only too.

Cc: stable@vger.kernel.org
Reported-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-19 15:59:18 +10:00
NeilBrown 25f7fd470b md: fix bug in handling of new_data_offset
commit c6563a8c38
    md: add possibility to change data-offset for devices.

introduced a 'new_data_offset' attribute which should normally
be the same as 'data_offset', but can be explicitly set to a different
value to allow a reshape operation to move the data.

Unfortunately when the 'data_offset' is explicitly set through
sysfs, the new_data_offset is not also set, so the two would become
out-of-sync incorrectly.

One result of this is that trying to set the 'size' after the
'data_offset' would fail because it is not permitted to set the size
when the 'data_offset' and 'new_data_offset' are different - as that
can be confusing.
Consequently when mdadm tried to do this while assembling an IMSM
array it would fail.

This bug was introduced in 3.5-rc1.

Reported-by: Brian Downing <bdowning@lavos.net>
Bisected-by: Brian Downing <bdowning@lavos.net>
Tested-by: Brian Downing <bdowning@lavos.net>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-19 15:59:18 +10:00
Linus Torvalds 8a7298b780 Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull target fixes from Nicholas Bellinger:
 "This includes a bugfix from MDR to address a NULL pointer OOPs with
  FCoE aborts, along with a WRITE_SAME emulation bugfix for NOLB=0
  cases, and persistent reservation return cleanups from Roland.

  All three patches are CC'ed to stable."

* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  target: Fix range calculation in WRITE SAME emulation when num blocks == 0
  target: Clean up returning errors in PR handling code
  tcm_fc: Fix crash seen with aborts and large reads
2012-07-18 18:40:38 -07:00
Olaf Hering b1bdd2eb31 kexec: update URL of kexec homepage
The referenced html file does not exist anymore. Replace the URL with
the current project homepage.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-18 18:35:57 -07:00
Yoichi Yuasa 893a0574de mips: fix bug.h build regression
Commit 377780887 ("bug.h: need linux/kernel.h for TAINT_WARN.") broke
all MIPS builds:

    CC      arch/mips/kernel/machine_kexec.o
  include/linux/log2.h: In function '__ilog2_u32':
  include/linux/log2.h:34:2: error: implicit declaration of function 'fls' [-Werror=implicit-function-declaration]
  include/linux/log2.h: In function '__ilog2_u64':
  include/linux/log2.h:42:2: error: implicit declaration of function 'fls64' [-Werror=implicit-function-declaration]
  ...

Signed-off-by: Yoichi Yuasa <yuasa@linux-mips.org>
Tested-by: John Crispin <blogic@openwrt.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Daney <ddaney@caviumnetworks.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-18 18:35:57 -07:00
Linus Torvalds eea03c20ae Make wait_for_device_probe() also do scsi_complete_async_scans()
Commit a7a20d1039 ("sd: limit the scope of the async probe domain")
made the SCSI device probing run device discovery in its own async
domain.

However, as a result, the partition detection was no longer synchronized
by async_synchronize_full() (which, despite the name, only synchronizes
the global async space, not all of them).  Which in turn meant that
"wait_for_device_probe()" would not wait for the SCSI partitions to be
parsed.

And "wait_for_device_probe()" was what the boot time init code relied on
for mounting the root filesystem.

Now, most people never noticed this, because not only is it
timing-dependent, but modern distributions all use initrd.  So the root
filesystem isn't actually on a disk at all.  And then before they
actually mount the final disk filesystem, they will have loaded the
scsi-wait-scan module, which not only does the expected
wait_for_device_probe(), but also does scsi_complete_async_scans().

[ Side note: scsi_complete_async_scans() had also been partially broken,
  but that was fixed in commit 43a8d39d01 ("fix async probe
  regression"), so that same commit a7a20d1039 had actually broken
  setups even if you used scsi-wait-scan explicitly ]

Solve this problem by just moving the scsi_complete_async_scans() call
into wait_for_device_probe().  Everybody who wants to wait for device
probing to finish really wants the SCSI probing to complete, so there's
no reason not to do this.

So now "wait_for_device_probe()" really does what the name implies, and
properly waits for device probing to finish.  This also removes the now
unnecessary extra calls to scsi_complete_async_scans().

Reported-and-tested-by: Artem S. Tashkinov <t.artem@mailcity.com>
Cc: Dan Williams <dan.j.williams@gmail.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: James Bottomley <jbottomley@parallels.com>
Cc: Borislav Petkov <bp@amd64.org>
Cc: linux-scsi <linux-scsi@vger.kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-18 18:15:46 -07:00
Linus Torvalds e2f3b78557 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull SELinux regression fixes from James Morris.

Andrew Morton has a box that hit that open perms problem.

I also renamed the "epollwakeup" selinux name for the new capability to
be "block_suspend", to match the rename done by commit d9914cf661
("PM: Rename CAP_EPOLLWAKEUP to CAP_BLOCK_SUSPEND").

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  SELinux: do not check open perms if they are not known to policy
  SELinux: include definition of new capabilities
2012-07-18 13:42:44 -07:00
Rustad, Mark D 734b65417b net: Statically initialize init_net.dev_base_head
This change eliminates an initialization-order hazard most
recently seen when netprio_cgroup is built into the kernel.

With thanks to Eric Dumazet for catching a bug.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-18 13:32:27 -07:00
Alexander Duyck a16a0d2fb8 ixgbe: Cleanup holes in flags after removing several of them
This change is just meant to defragment the flags as there are several holes
that have been introduced since several features, or the flags for them,
have been removed.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2012-07-18 13:30:08 -07:00
Alexander Duyck fbe7ca7f9b ixgbe: Retire RSS enabled and capable flags
All of our hardware supports RSS even if it is only for a single queue.  So
instead of toting around the RSS enable flag I am updating the code so that
all devices are enabled and if we want to disable RSS it is indicated via
the RSS mask.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2012-07-18 13:29:05 -07:00
Alexander Duyck 73079ea041 ixgbe: Add support for SR-IOV w/ DCB or RSS
This change essentially makes it so that we can enable almost all of the
features all at once.  This patch allows for the combination of SR-IOV,
DCB, and FCoE in the case of the x540.  It also beefs up the SR-IOV by
adding support for RSS to the PF.

The testing matrix gets to be very complex for this patch as there are a
number of different features and subsets for queueing options.  I tried to
narrow these down a bit by restricting the PF to only supporting 4TC DCB
when it is enabled in addition to SR-IOV.

Cc: Greg Rose <gregory.v.rose@intel.com>
Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2012-07-18 13:26:22 -07:00
Alexander Duyck 435b19f621 ixgbe: Update configure virtualization to allow for multiple PF pools
This change allows all pools from the default pool forward to be enabled via
ixgbe_configure_virtualization.  This is needed as we are planning to use
queues belonging to adjacent pools for FCoE when SR-IOV and FCoE are both
enabled.

In addition this patch contains some minor formatting changes as there were
a few spots that seemed to be in need of some cleanup.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2012-07-18 13:21:56 -07:00
Alexander Duyck eb022d058f ixgbevf: Fix multiple issues in ixgbevf_get/set_ringparam
In ixgbevf_get_ringparam we could run into a NULL pointer dereference
if the rings were not allocated when we attempted the call.  To prevent
that we can just access the tx/rx_ring_count values instead of attempting
to access the rings to get the count.

This change corrects a memory leak and memory corruption in
ixgbevf_set_ringparam.

The memory leak was due to us not freeing the resources from the ring
before overwriting them.  This change corrects the memory leak by making
certain to call ixgbe_free_tx/rx_resources on the rings prior to freeing
them.

The memory corruption was because we were replacing the rings but not
updating the q_vectors.  It addresses the memory corruption by leaving the
rings in place and instead just copying the contents of the new rings into
the existing rings.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2012-07-18 13:18:24 -07:00
Alexander Duyck 70a10e258c ixgbevf: Consolidate Tx context descriptor creation code
There is a good bit of redundancy between the Tx checksum and segmentation
offloads.  In order to reduce some of this I am moving the code for
creating a context descriptor into a separate function.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2012-07-18 13:16:37 -07:00
Alexander Duyck fb40195cc9 ixgbevf: Add netdev to ring structure
This change adds the netdev to the ring structure.  This allows for a
quicker transition from ring to netdev without having to go from ring to
adapter to netdev.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2012-07-18 13:14:36 -07:00
Alexander Duyck 18c6308971 ixgbevf: Do not rewind the Rx ring before bumping tail
The driver is going back one step from its previous location before
bumping tail. This is incorrect.  We should just be writing the value of
next_to_use into the tail register.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2012-07-18 13:12:08 -07:00