Commit Graph

334538 Commits

David Daney 748e787eb6 MIPS: Optimize TLB refill for RI/XI configurations.
We don't have to do a separate shift to eliminate the software bits,
just rotate them into the fill and they will be ignored.
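
As a rough user-space sketch of the idea (the soft-bit count is
hypothetical; the real refill handler emits a single rotate on the PTE
to produce EntryLo):

    #include <stdint.h>

    #define SOFT_BITS 2  /* hypothetical count of software-only low PTE bits */

    /* Rotate instead of shift: the software bits wrap around into the
     * high "fill" bits of EntryLo, which the TLB hardware ignores, so
     * no separate instruction is needed to eliminate them. */
    static inline uint64_t pte_to_entrylo(uint64_t pte)
    {
            return (pte >> SOFT_BITS) | (pte << (64 - SOFT_BITS));
    }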

Signed-off-by: David Daney <ddaney@caviumnetworks.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/4294/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-10-11 11:11:20 +02:00
Ralf Baechle 981ef0de49 MIPS: proc: Cleanup printing of ASEs.
The number of %s was just getting ridiculous.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-10-11 11:10:43 +02:00
Ralf Baechle 475032564e MIPS: Hardwire detection of DSP ASE Rev 2 for systems, as required.
Most supported systems currently hardwire cpu_has_dsp to 0, so we can
also disable support for cpu_has_dsp2, resulting in a slightly smaller
kernel.
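
The shape of the change, sketched as a hypothetical platform override
header (the file path and exact macro spellings are assumptions):

    /* arch/mips/include/asm/mach-xyz/cpu-feature-overrides.h (hypothetical) */
    #define cpu_has_dsp     0
    #define cpu_has_dsp2    0       /* constant 0 lets the compiler
                                       discard all DSP ASE Rev 2 code */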

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-10-11 11:10:43 +02:00
Steven J. Hill ee80f7c73d MIPS: Add detection of DSP ASE Revision 2.
[ralf@linux-mips.org: This patch really only detects the ASE and passes its
existence on to userland via /proc/cpuinfo.  The DSP ASE Rev 2 adds new
resources, but none that would need management by the kernel.]

Signed-off-by: Steven J. Hill <sjhill@mips.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/4165/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-10-11 11:05:03 +02:00
David Daney f59a2d22a0 MIPS: Optimize pgd_init and pmd_init
On a dual-issue processor GCC generates code that saves a couple of
clock cycles per loop if we rearrange things slightly.  Checking for
p != end saves an SLTU per loop, and moving the increment to the middle
lets it dual-issue on multi-issue processors.
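
A sketch of the rearranged loop shape (the unroll factor here is
illustrative):

    do {
            p[0] = entry;
            p[1] = entry;
            p[2] = entry;
            p[3] = entry;
            p[4] = entry;
            p += 8;          /* increment mid-loop so it can dual-issue
                                with the neighbouring stores */
            p[-3] = entry;
            p[-2] = entry;
            p[-1] = entry;
    } while (p != end);      /* "!=" compiles to a BNE; "p < end"
                                would cost an SLTU every iteration */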

Signed-off-by: David Daney <ddaney@caviumnetworks.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/4249/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-10-11 11:04:35 +02:00
Al Cooper a7911a8fd1 MIPS: perf: Add perf functionality for BMIPS5000
Add hardware performance counter support to the kernel "perf" code for
BMIPS5000. The BMIPS5000 performance counters are similar to those of
MIPS MTI cores, so the changes were mostly made in perf_event_mipsxx.c,
which is typically used for MTI cores.

Signed-off-by: Al Cooper <alcooperx@gmail.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/4109/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-10-11 11:04:34 +02:00
Al Cooper 399aaa2568 MIPS: perf: Split the Kconfig option CONFIG_MIPS_MT_SMP
Split the Kconfig option CONFIG_MIPS_MT_SMP into CONFIG_MIPS_MT_SMP
and CONFIG_MIPS_PERF_SHARED_TC_COUNTERS so some of the code used
for performance counters that are shared between threads can be used
for MIPS cores that are not MT_SMP.

Signed-off-by: Al Cooper <alcooperx@gmail.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/4108/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-10-11 11:04:34 +02:00
Al Cooper ecb8ee8a89 MIPS: perf: Remove unnecessary #ifdef
The #ifdef for CONFIG_HW_PERF_EVENTS is not needed because the
Makefile will only compile the module if this config option is set.
This means that the code under #else would never be compiled. This
may have been done to leave the original broken code around for
reference, but the FIXME comment above the code already shows the
broken code.

Signed-off-by: Al Cooper <alcooperx@gmail.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/4107/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-10-11 11:04:34 +02:00
Al Cooper da4b62cd67 MIPS: perf: Add cpu feature bit for PCI (performance counter interrupt)
The PCI (Performance Counter Interrupt) bit in the "cause" register
is mandatory for MIPS32R2 cores, but has also been added to some R1
cores (BMIPS5000). This change adds a cpu feature bit to make it
easier to check for and use this feature.
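
A sketch of the usual cpu-feature-bit pattern (the macro and flag names
are assumptions):

    /* cpu-features.h style definition (names hypothetical) */
    #define cpu_has_perf_cntr_intr_bit  (cpu_data[0].options & MIPS_CPU_PCI)

    /* perf code can then test the feature instead of matching CPU types:
     * read Cause.PCI to see whether the counters raised this interrupt */
    if (cpu_has_perf_cntr_intr_bit)
            pending = read_c0_cause() & CAUSEF_PCI;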

Signed-off-by: Al Cooper <alcooperx@gmail.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/4106/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-10-11 11:04:34 +02:00
Al Cooper c5600b2dd9 MIPS: perf: Change the "mips_perf_event" table unsupported indicator.
Change the indicator from 0xffffffff in the "event_id" member to
zero in the "cntr_mask" member. This removes the need to initialize
entries that are unsupported. It also solves a problem where the
number of entries in the table was increased based on a global enum
used for all platforms, but the new unsupported entries were not added
for MIPS. This left new table entries of all zeros that were not
marked UNSUPPORTED.
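
The effect on the tables, sketched (struct layout abridged):

    struct mips_perf_event {
            unsigned int event_id;
            unsigned int cntr_mask;         /* 0 now means "unsupported" */
    };

    #define MAX_EVENTS 4                    /* illustrative */

    static const struct mips_perf_event example_map[MAX_EVENTS] = {
            /* Before, every unsupported slot needed an explicit
             * event_id = 0xffffffff initializer.  Now omitted slots
             * are zero-filled by C, and a zero cntr_mask already
             * reads as unsupported. */
            [0] = { 0x02, 0xff },           /* hypothetical supported event */
    };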

Signed-off-by: Al Cooper <alcooperx@gmail.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/4110/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-10-11 11:02:41 +02:00
David Daney 485172b3df MIPS: Align swapper_pg_dir to 64K for better TLB Refill code.
We can save an instruction in the TLB Refill path for kernel mappings
by aligning swapper_pg_dir on a 64K boundary.  The address of
swapper_pg_dir can be generated with a single LUI instead of a
LUI/{D}ADDIU pair.
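
A sketch of the placement (type, section and attribute spellings
assumed):

    /* 64K alignment makes the low 16 bits of the address zero, so a
     * single LUI can materialize it in the refill handler. */
    pgd_t swapper_pg_dir[PTRS_PER_PGD]
            __attribute__((aligned(0x10000),
                           section(".bss..swapper_pg_dir")));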

The alignment of __init_end is bumped up to 64K so there are no holes
between it and swapper_pg_dir, which is placed at the very beginning
of .bss.

The alignment of invalid_pmd_table and invalid_pte_table can be
relaxed to PAGE_SIZE.  We do this by using __page_aligned_bss, which
has the added benefit of eliminating alignment holes in .bss.

Signed-off-by: David Daney <david.daney@cavium.com>
Cc: linux-mips@linux-mips.org
Cc: linux-arch@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Acked-by: Arnd Bergmann <arnd@arndb.de>
Patchwork: https://patchwork.linux-mips.org/patch/4220/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-10-11 11:02:40 +02:00
David Daney c87728ca82 vmlinux.lds.h: Allow architectures to add sections to the front of .bss
Follow-on MIPS patch will put an object here that needs 64K alignment
to minimize padding.

For those architectures that don't define BSS_FIRST_SECTIONS, there is
no change.
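
The hook pattern, sketched (an abridged take on the vmlinux.lds.h
macro; the real BSS() macro carries more sections):

    #ifndef BSS_FIRST_SECTIONS
    #define BSS_FIRST_SECTIONS      /* empty: no change for most arches */
    #endif

    #define BSS(bss_align)                                          \
            . = ALIGN(bss_align);                                   \
            .bss : {                                                \
                    BSS_FIRST_SECTIONS                              \
                    *(.bss..page_aligned)                           \
                    *(.bss)                                         \
            }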

Signed-off-by: David Daney <david.daney@cavium.com>
Cc: linux-mips@linux-mips.org
Cc: linux-arch@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Acked-by: Arnd Bergmann <arnd@arndb.de>
Patchwork: https://patchwork.linux-mips.org/patch/4221/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-10-11 11:02:37 +02:00
Joshua Kinard b4f2a17ba9 Improve atomic.h robustness
I've maintained this patch, originally from Thiemo Seufer in 2004, for a
really long time, but I think it's time for it to get a look for possible
inclusion.  I have had no problems with it across various SGI systems
over the years.

To quote the post here:
http://www.linux-mips.org/archives/linux-mips/2004-12/msg00000.html

"the atomic functions use so far memory references for the inline
assembler to access the semaphore. This can lead to additional
instructions in the ll/sc loop, because newer compilers don't
expand the memory reference any more but leave it to the assembler.

The appended patch uses registers instead, and makes the ll/sc
arguments more explicit. In some cases it will lead also to better
register scheduling because the register isn't bound to an output
any more."

Signed-off-by: Joshua Kinard <kumba@gentoo.org>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/4029/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-10-11 11:02:36 +02:00
Dmitry Torokhov 0cc8d6a9d2 Merge branch 'next' into for-linus
Prepare second set of updates for 3.7 merge window (Wacom driver update
and patches extending number of input minors).
2012-10-11 00:45:21 -07:00
Andrei Emeltchenko 716e4ab5c9 Bluetooth: AMP: Handle AMP_LINK case in conn_put
Handle AMP link when setting up disconnect timeout.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-10-11 14:34:46 +08:00
Andrei Emeltchenko bd1eb66ba4 Bluetooth: AMP: Handle AMP_LINK connection
AMP_LINK represents physical link between AMP controllers.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-10-11 14:34:24 +08:00
Andrei Emeltchenko 76ef7cf772 Bluetooth: AMP: Handle number of compl blocks for AMP_LINK
Add handling blocks count for AMP link.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-10-11 14:34:02 +08:00
Andrei Emeltchenko 42c4e53e7a Bluetooth: AMP: Add handle to hci_chan structure
hci_chan will be identified by the handle used in the logical link
creation process. This handle is used in the AMP ACL-U packet handle field.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-10-11 14:33:05 +08:00
Andrei Emeltchenko 53502d69be Bluetooth: AMP: Handle AMP_LINK timeout
When an AMP_LINK times out, execute HCI_OP_DISCONN_PHY_LINK as the
analog of HCI_OP_DISCONNECT for ACL_LINK.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-10-11 14:30:58 +08:00
Andrei Emeltchenko 12d5978165 Bluetooth: Allow to set flush timeout
Enable setting the flush timeout via setsockopt.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-10-11 14:29:02 +08:00
Syam Sidhardhan 5bcb80944d Bluetooth: Use __constant modifier for RFCOMM PSM
Since RFCOMM_PSM is constant, __constant_cpu_to_le16() is
the right choice here.

Signed-off-by: Syam Sidhardhan <s.syam@samsung.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-10-11 14:26:28 +08:00
Syam Sidhardhan d8aece2af3 Bluetooth: Use __constant modifier for L2CAP SMP CID
Since L2CAP_CID_SMP is constant, __constant_cpu_to_le16() is
the right choice here.
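
The pattern behind this and the previous commit, sketched (the compare
site and handler name are illustrative; L2CAP_CID_SMP is 0x0006 per the
Bluetooth spec):

    /* __constant_cpu_to_le16() folds the byte swap at build time,
     * which plain cpu_to_le16() is not guaranteed to do here. */
    if (cid == __constant_cpu_to_le16(L2CAP_CID_SMP))
            smp_sig_channel(conn, skb);     /* illustrative handler */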

Signed-off-by: Syam Sidhardhan <s.syam@samsung.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-10-11 14:25:58 +08:00
Andrei Emeltchenko 78c1b8e822 Bluetooth: btmrvl: Use %*ph specifier instead of print_hex_dump_bytes
Use the standard print specifier and remove the print_hex_dump_bytes()
call. This makes the output more sensible:

...
[18809.401218] 00000000: 0b 00 00 fe 5b fc 01 f2 00 00 00    ....[......
...

would be changed to

...
[18809.401218] Bluetooth: hex: 0b 00 00 fe 5b fc 01 f2 00 00 00
...
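
The substitution, sketched (%*ph is the printk extension that renders a
small buffer as hex bytes; the surrounding call site is assumed):

    /* before: a separate multi-line hex dump */
    print_hex_dump_bytes("", DUMP_PREFIX_NONE, buf, len);

    /* after: one line via the %*ph format extension */
    BT_DBG("hex: %*ph", (int)len, buf);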

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-10-11 14:21:15 +08:00
NeilBrown 72f36d5972 md: refine reporting of resync/reshape delays.
If 'resync_max' is set to 0 (as is often done when starting a
reshape, so that mdadm can remain in control during a sensitive
period), and if the reshape request is initially delayed because
another array using the same devices is resyncing or reshaping etc.,
then user-space cannot easily tell when the delay stops being due to a
conflicting reshape and starts being due to resync_max = 0.

So introduce a new state: (curr_resync == 3) to reflect this, make
sure it is visible both via /proc/mdstat and via the "sync_completed"
sysfs attribute, and ensure that the event transition from one delay
state to the other is properly notified.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-11 14:25:57 +11:00
NeilBrown e56108d65f md/raid5: be careful not to resize_stripes too big.
When a RAID5 is reshaping, conf->raid_disks is increased
before mddev->delta_disks becomes zero.
This can result in check_reshape calling resize_stripes with a
number that is too large.  This particularly happens
when md_check_recovery calls ->check_reshape().

If we use ->previous_raid_disks, we don't risk this.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-11 14:24:13 +11:00
NeilBrown db07d85ef6 md: make sure manual changes to recovery checkpoint are saved.
If you make an array bigger but suppress resync of the new region with
  mdadm --grow /dev/mdX --size=max --assume-clean

then stop the array before anything is written to it, the effect of
the "--assume-clean" is lost and the array will resync the new space
when restarted.
So ensure that we update the metadata in this case.

Reported-by: Sebastian Riemer <sebastian.riemer@profitbricks.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-11 14:22:17 +11:00
Dan Carpenter 91502f099d md/raid10: use correct limit variable
Clang complains that we are assigning a variable to itself.  This should
be using bad_sectors like the similar earlier check does.

Bug has been present since 3.1-rc1.  It is minor but could
conceivably cause corruption or other bad behaviour.

Cc: stable@vger.kernel.org
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-11 14:20:58 +11:00
NeilBrown 48c26ddc9f md: writing to sync_action should clear the read-auto state.
In some cases arrays are started in a 'read-auto' state in which
nothing gets written to any device until the array is written
to.  The purpose of this is to make accidental auto-assembly
of the wrong arrays less of a risk, and to allow arrays to be
started to read suspend-to-disk images without actually changing
anything (as might happen if the array were dirty and a
resync seemed necessary).

Explicitly writing the 'sync_action' for a read-auto array currently
doesn't clear the read-auto state, so the sync action doesn't
happen, which can be confusing.

So allow any successful write to sync_action to clear any read-auto
state.

Reported-by: Alexander Kühn <alexander.kuehn@nagilum.de>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-11 14:19:39 +11:00
Jianpeng Ma 7f7583d420 md: change resync_mismatches to atomic64_t to avoid races
Now that multiple threads can handle stripes, it is safer to
use an atomic64_t for resync_mismatches, to avoid update races.

Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-11 14:17:59 +11:00
Hiroaki SHIMODA 8edc0e624d e1000e: Change wthresh to 1 to avoid possible Tx stalls
This patch originated from Hiroaki SHIMODA but has been modified
by Intel with some minor cleanups and additional commit log text.

Denys Fedoryshchenko and others reported Tx stalls on e1000e with
BQL enabled.  The issue was root caused to hardware delays: some of
the e1000e hardware with transmit writeback bursting enabled waits
until the driver does an explicit flush OR there are WTHRESH
descriptors to write back.

Sometimes the delays in question were on the order of seconds,
causing visible lag for ssh sessions and unacceptable tx
completion latency, especially for BQL enabled kernels.

To avoid possible Tx stalls, change WTHRESH back to 1.

The current plan is to investigate a method for re-enabling
WTHRESH while not harming BQL, but those patches will be later
for net-next if they work.

Please enqueue for stable since v3.3, as this bug was introduced in
commit 3f0cfa3bc1:
Author: Tom Herbert <therbert@google.com>
Date:   Mon Nov 28 16:33:16 2011 +0000

    e1000e: Support for byte queue limits

    Changes to e1000e to use byte queue limits.

Reported-by: Denys Fedoryshchenko <denys@visp.net.lb>
Tested-by: Denys Fedoryshchenko <denys@visp.net.lb>
Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
CC: eric.dumazet@gmail.com
CC: therbert@google.com
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-10 22:59:18 -04:00
David S. Miller 959859d2fe Merge branch 'uapi-for-3.7' of git://gitorious.org/linux-can/linux-can
Marc Kleine-Budde says:

====================
this pull request for net, i.e. the v3.7 release cycle, contains the patch by
David Howells to move the UAPI related headers for the CAN subsystem.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-10 22:57:16 -04:00
stephen hemminger 68aaed54e7 ipv4: fix route mark sparse warning
Sparse complains about RTA_MARK, which should be host order according
to the include file and its usage in iproute.

net/ipv4/route.c:2223:46: warning: incorrect type in argument 3 (different base types)
net/ipv4/route.c:2223:46:    expected restricted __be32 [usertype] value
net/ipv4/route.c:2223:46:    got unsigned int [unsigned] [usertype] flowic_mark
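
A sketch of the class of fix (the exact call site is assumed): the mark
is host order, so the plain u32 netlink helper is the one to use.

    /* before: sparse sees a host-order u32 passed where __be32 is due */
    nla_put_be32(skb, RTA_MARK, fl4->flowi4_mark);

    /* after: host-order helper, no sparse warning */
    nla_put_u32(skb, RTA_MARK, fl4->flowi4_mark);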

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-10 22:54:59 -04:00
Ian Campbell 6a8ed462f1 xen: netback: handle compound page fragments on transmit.
An SKB paged fragment can consist of a compound page with order > 0.
However the netchannel protocol deals only in PAGE_SIZE frames.

Handle this in netbk_gop_frag_copy and xen_netbk_count_skb_slots by
iterating over the frames which make up the page.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Konrad Rzeszutek Wilk <konrad@kernel.org>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-10 22:50:45 -04:00
Sarveshwar Bandi 6caab7b054 bridge: Pull ip header into skb->data before looking into ip header.
If the lower-layer driver leaves the IP header in the skb fragment, it
needs to be pulled into skb->data first, before inspecting the IP header
length or IP version number.

Signed-off-by: Sarveshwar Bandi <sarveshwar.bandi@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-10 22:50:45 -04:00
Dan Carpenter 435f08a721 isdn: fix a wrapping bug in isdn_ppp_ioctl()
"protos" is an array of unsigned longs and "i" is the number of bits in
an unsigned long so we need to use 1UL as well to prevent the shift
from wrapping around.
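
The wrap in miniature, as a standalone sketch (assumes an LP64 target
where long is 64 bits and int is 32):

    #include <stdio.h>

    int main(void)
    {
            unsigned long protos[2] = { 0 };
            int i = 40;                /* >= 32: too wide for a plain int */

            protos[0] |= 1UL << i;     /* correct: 64-bit shift */
            /* protos[0] |= 1 << i;       broken: "1" is a 32-bit int,
                                          so the shift wraps around */
            printf("%lx\n", protos[0]);
            return 0;
    }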

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-10 22:50:45 -04:00
NeilBrown 1ed850f356 md/raid5: make sure to_read and to_write never go negative.
to_read and to_write are part of the result of analysing
a stripe before handling it.
Their use is to avoid some loops and tests if the values are
known to be zero.  Thus it is not a problem if they are a
little bit larger than they should be.

So decrementing them in handle_failed_stripe serves little purpose, and
due to races it could cause some loops to be skipped incorrectly.

So remove those decrements.

Reported-by: "Jianpeng Ma" <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-11 13:50:13 +11:00
Alexander Lyakas a7854487cd md: When RAID5 is dirty, force reconstruct-write instead of read-modify-write.
Signed-off-by: Alex Lyakas <alex@zadarastorage.com>
Suggested-by: Yair Hershko <yair@zadarastorage.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-11 13:50:12 +11:00
NeilBrown b97390aec4 md/raid5: protect debug message against NULL dereference.
The pr_debug in add_stripe_bio could race with something
changing *bip, so it is best to hold the lock until
after the pr_debug.

Reported-by: "Jianpeng Ma" <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-11 13:50:12 +11:00
NeilBrown 143c4d0573 md/raid5: add some missing locking in handle_failed_stripe.
We really should hold the stripe_lock while accessing
'toread' else we could race with add_stripe_bio and corrupt
a list.

Reported-by: "Jianpeng Ma" <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-11 13:50:12 +11:00
Shaohua Li 9e44476851 MD: raid5 avoid unnecessary zero page for trim
We want to avoid zeroing the discarded dev page, because the zeroing is
useless for discard. But if we don't zero it, a later read/write that
hits such a page in the cache will get inconsistent data.

To avoid zeroing the page, we don't set the R5_UPTODATE flag after
construction is done. This way the discard write request is still issued
and finished, but a read will not hit the page. If the stripe gets
accessed soon, we need to reread it, but since the chance is low, the
reread isn't a big deal.

Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-11 13:49:49 +11:00
Shaohua Li 620125f2bf MD: raid5 trim support
Discard for raid4/5/6 has limitations. If the discard request size is
small, we do a discard for one disk, but we still need to calculate
parity and write the parity disk.  To calculate parity correctly,
zero_after_discard must be guaranteed. Even if it is, we would do a
discard for one disk but write the other disks, which makes the parity
disks wear out fast. This doesn't make sense. So an efficient discard
for raid4/5/6 should discard all data disks and parity disks, which
requires the write pattern to be (A, A+chunk_size, A+chunk_size*2...).
If A's size is smaller than chunk_size, such a pattern is almost
impossible in practice. So in this patch, I only handle the case where
A's size equals chunk_size. That is, the discard request should be
aligned to the stripe size and its size should be a multiple of the
stripe size.

Since we can only handle requests with a specific alignment and size (or
the part of a request that fits stripes), we can't guarantee
zero_after_discard even if zero_after_discard is true in the low-level
drives.

The block layer doesn't send down correctly aligned requests even when
the correct discard alignment is set, so I must filter them out.

For raid4/5/6 parity calculation, if the data is 0, the parity is 0. So
if zero_after_discard is true for all disks, the data is consistent
after a discard.  Otherwise, data might be lost. Consider a scenario:
we discard a stripe, then write data to one disk and write the parity
disk. The stripe could still be inconsistent at that point, depending on
whether data from the other data disks or from the parity disks is used
to calculate the new parity. If a disk is broken, we can't restore it.
So in this patch, we only enable discard support if all disks have
zero_after_discard.

If a discard fails on one disk, we face a similar inconsistency issue.
The patch makes discard follow the same path as a normal write request:
if the discard fails, a resync will be scheduled to make the data
consistent. It isn't good to have the extra writes, but data
consistency is important.

If a subsequent read/write request hits the raid5 cache of a discarded
stripe, the discarded dev page should be zero-filled, so the data is
consistent. This patch always zeroes the dev page for a discarded
request stripe. This isn't optimal because a discard request doesn't
need such a payload. The next patch will avoid it.

Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-11 13:49:05 +11:00
Jianpeng Ma 582e2e056a md/bitmap: Don't use IS_ERR to judge alloc_page().
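
The fix class, sketched: alloc_page() signals failure with NULL, not an
ERR_PTR, so an IS_ERR() test never fires for it.

    struct page *page = alloc_page(GFP_KERNEL);

    if (!page)                      /* correct NULL check; IS_ERR(page)
                                       would let a failure through */
            return -ENOMEM;
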
Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-11 13:45:36 +11:00
NeilBrown 7ad4d4a68a md/raid1: Don't release reference to device while handling read error.
When we get a read error, we arrange for raid1d to handle it.
Currently we release the reference on the device.  This can result
in
   conf->mirrors[read_disk].rdev
being NULL in fix_read_error, if the device happens to get removed
before the read error is handled.

So instead keep the reference until the read error has been fully
handled.

Reported-by: hank <pyu@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-11 13:44:30 +11:00
Michael Wang fd177481b4 raid: replace list_for_each_continue_rcu with new interface
This patch replaces list_for_each_continue_rcu() with
list_for_each_entry_continue_rcu() to save a few lines
of code and allow removing list_for_each_continue_rcu().
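
The replacement, sketched (the list head and member names follow md
usage but are assumptions here):

    struct md_rdev *rdev;

    /* The entry-based iterator folds the list_entry()/container_of()
     * step into the macro and hands back the containing struct. */
    list_for_each_entry_continue_rcu(rdev, &mddev->disks, same_set)
            process(rdev);          /* illustrative loop body */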

Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-11 13:43:21 +11:00
Jan Beulich af7cf25dd1 add further __init annotations to crypto/xor.c
In particular, allow do_xor_speed() to be discarded post-init.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-11 13:42:32 +11:00
Jonathan Brassow 761becff01 DM RAID: Fix for "sync" directive ineffectiveness
There are two table arguments that can be given to a DM RAID target
that control whether the array is forced to (re)synchronize or skip
initialization: "sync" and "nosync".  When "sync" is given, we set
mddev->recovery_cp to 0 in order to cause the device to resynchronize.
This is insufficient if there is a bitmap in use, because the array
will simply look at the bitmap and see that there is no recovery
necessary.

The fix is to skip over the loading of the superblocks when "sync" is
given, causing new superblocks to be written that will force the array
to go through initialization (i.e. synchronization).

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-11 13:42:19 +11:00
stephen hemminger 34e02aa1fb vxlan: fix oops when given unknown ifindex
If a vxlan device is created and an ifindex is passed, there are two
cases which the existing code handles incorrectly: the ifindex could be
zero (i.e. no device), or there could be no device with that ifindex.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-10 22:41:22 -04:00
stephen hemminger d97c00a321 vxlan: fix receive checksum handling
Vxlan was trying to use postpull_rcsum to allow receive checksum
offload to work on drivers using the CHECKSUM_COMPLETE method, but this
doesn't work correctly. Just force a full receive checksum on the
received packet.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-10 22:41:22 -04:00
stephen hemminger 2840bf2286 vxlan: add additional headroom
Tell upper-layer protocols to allocate skbs with additional headroom.
This avoids an allocation and copy in local packet sends.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-10 22:41:22 -04:00
stephen hemminger 05f47d69c4 vxlan: allow configuring port range
VXLAN bases the source UDP port on the flow to help the receiver
load balance based on the outer header flow.

This patch restricts the port range to the normal UDP local
ports, and allows overriding it via configuration.

It also uses a jhash of the Ethernet header when looking at flows
without a known L3 header.
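
A sketch of the source-port selection (the helper name and details are
assumptions): hash the flow, falling back to a jhash of the Ethernet
header when no L3 header is known, then fold the hash into the
configured range.

    /* hypothetical helper illustrating the range fold */
    static u16 vxlan_src_port_sketch(u32 flow_hash, u16 port_min, u16 port_max)
    {
            u32 range = (u32)port_max - port_min + 1;

            return port_min + (u16)(((u64)flow_hash * range) >> 32);
    }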

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-10 22:41:21 -04:00