Commit Graph

276216 Commits

Author SHA1 Message Date
Christoph Hellwig 6f21475576 target: remove the unused se_dev_list
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-06 06:00:57 +00:00
Sebastian Andrzej Siewior 9649fa1b87 target/file: walk properly over sg list
This patch changes fileio to use for_each_sg() when walking se_task->task_sg
memory passed into from loopback LLD struct scsi_cmnd scatterlist memory.

This addresses an issue where FILEIO backends with loopback where hitting the
following OOPs with mkfs.ext2:

|kernel BUG at include/linux/scatterlist.h:97!
|invalid opcode: 0000 [#1] PREEMPT SMP
|Modules linked in: sd_mod tcm_loop target_core_stgt scsi_tgt target_core_pscsi target_core_file target_core_iblock target_core_mod configfs scsi_mod
|
|Pid: 671, comm: LIO_fileio Not tainted 3.1.0-rc10+ #139 Bochs Bochs
|EIP: 0060:[<e0afd746>] EFLAGS: 00010202 CPU: 0
|EIP is at fd_do_task+0x396/0x420 [target_core_file]
| [<e0aa7884>] __transport_execute_tasks+0xd4/0x190 [target_core_mod]
| [<e0aa797c>] transport_execute_tasks+0x3c/0xf0 [target_core_mod]
|EIP: [<e0afd746>] fd_do_task+0x396/0x420 [target_core_file] SS:ESP 0068:dea47e90

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: stable@kernel.org
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-06 06:00:57 +00:00
Jörn Engel 5c73b678f7 target: remove unused struct fields
Some are never used, some are set but never read, dev_hoq_count is
incremented and decremented, but never read.

Signed-off-by: Joern Engel <joern@logfs.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-06 06:00:56 +00:00
Roland Dreier 1289a0571c target: Fix page length in emulated INQUIRY VPD page 86h
The LSB of the page length is at offset 3, not 2.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Cc: stable@kernel.org
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-06 06:00:56 +00:00
Roland Dreier 9b5cd7f37e target: Handle 0 correctly in transport_get_sectors_6()
SBC-3 says:

    A TRANSFER LENGTH field set to zero specifies that 256 logical
    blocks shall be written.  Any other value specifies the number
    of logical blocks that shall be written.

The old code was always just returning the value in the TRANSFER LENGTH
byte.  Fix this to return 256 if the byte is 0.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Cc: stable@kernel.org
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-06 06:00:55 +00:00
Roland Dreier 410f670202 target: Don't return an error status for 0-length READ and WRITE
IO commands with a TRANSFER LENGTH of 0 are not an error; for example,
for READ (10) and WRITE (10), SBC-3 says:

    A TRANSFER LENGTH field set to zero specifies that no logical blocks
    shall be read. This condition shall not be considered an error.

In case we have nothing to do, just complete the command with good status.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Cc: stable@kernel.org
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-06 06:00:55 +00:00
Thomas Meyer 1c3d5794fc iscsi-target: Use kmemdup rather than duplicating its implementation
The semantic patch that makes this change is available
in scripts/coccinelle/api/memdup.cocci.

Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-06 06:00:55 +00:00
Nicholas Bellinger 7ae0b1038f iscsi-target: Add missing F_BIT for iscsi_tm_rsp
This patch sets the missing ISCSI_FLAG_CMD_FINAL bit in
iscsit_send_task_mgt_rsp() for a struct iscsi_tm_rsp PDU.

This usage is hardcoded for all TM response PDUs in RFC-3720
section 10.6.

Reported-by: whucecil <whucecil1999@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-06 06:00:54 +00:00
Nicholas Bellinger 7e46cf0268 iscsi-target: Fix residual count hanlding + remove iscsi_cmd->residual_count
This patch fixes iscsi-target handling of underflow where residual data is
causing an OOPs by using the incorrect iscsi_cmd_t->data_length initially
assigned in iscsit_allocate_se_cmd().  It resets iscsi_cmd_t->data_length
from se_cmd_t->data_length after transport_generic_allocate_tasks()
has been invoked in iscsit_handle_scsi_cmd() RX context, and converts
iscsi_cmd->residual_count usage to access iscsi_cmd->se_cmd.residual_count
to get the proper residual count set by target-core.

Reported-by: <lists@internyc.net>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Andy Grover <agrover@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-06 06:00:54 +00:00
Nicholas Bellinger fef58a6096 target: Reject SCSI data overflow for fabrics using transport_generic_map_mem_to_cmd
This patch changes transport_generic_map_mem_to_cmd() to reject SCSI data
overflow and to send exception status with CHECK_CONDITION + TCM_INVALID_CDB_FIELD
for fabrics that are passing a pre-populated struct scatterlist (eg: tcm_loop
and iscsi-target) being mapped into se_cmd->t_data_sg and se_cmd->t_data_nents.

This addresses an OOPs where transport_allocate_data_tasks() would walk
the incorrect post OVERFLOW cmd->data_length value beyond the end of
the passed scatterlist.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Andy Grover <agrover@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-06 06:00:53 +00:00
Christoph Hellwig 6fd126ffeb target: remove the unused t_task_pt_sgl and t_task_pt_sgl_num se_cmd fields
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-06 06:00:53 +00:00
Christoph Hellwig 33c3fafc43 target: remove the t_tasks_bidi se_cmd field
And use a SCF_BIDI flag instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-06 06:00:53 +00:00
Christoph Hellwig 2d3a4b51df target: remove the t_tasks_fua se_cmd field
And use a SCF_FUA flag instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-06 06:00:52 +00:00
Christoph Hellwig aad13ca20d target: remove the se_ordered_node se_cmd field
We never walk ordered_cmd_list in the se_device, so remove all code related
to supporting it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-06 06:00:52 +00:00
Christoph Hellwig 58a2801a4b target: remove the se_obj_ptr and se_orig_obj_ptr se_cmd fields
We already have a perfectly valid se_device pointer in the command, so
remove the mostly useless duplicates.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-06 06:00:52 +00:00
Nicholas Bellinger 6297b07cbc target: Drop config_item_name usage in fabric TFO->free_wwn()
This patch removes config_item_name() informational usage of
TFO->free_wwn() treewide in loopback, tcm_fc, ib_srpt and
tcm_vhost module code.

Using v4 target_core_fabric_configfs.c logic, a fabric call for
config_item_name() in TFO->drop_wwn() context returns NULL as
target_fabric_drop_wwn() invoking config_item_put() ->
config_group_put() will release fabric_port->port_wwn.wwn_group
before the last config_item_put() -> TFO->drop_wwn() is
invoked.

Reported-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-06 06:00:51 +00:00
Roland Dreier 97c34f3b04 target: Get rid of unused se_cmd_cache
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-06 06:00:51 +00:00
Bart Van Assche 5f655e8d2a target: Avoid compiler warnings about signed one-bit bitfields
Convert to unsigned bit fields for active I/O shutdown fields.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-06 06:00:50 +00:00
Bart Van Assche 330694a50f target: Improve system responsivity during I/O
While testing ib_srpt I noticed that the target system became
rather unresponsive during intensive I/O. The patch below made
my target system responsive again during I/O without decreasing
performance.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-06 06:00:50 +00:00
Nicholas Bellinger 0957627a99 iscsi-target: Fix sess allocation leak in iscsi_login_zero_tsih_s1
This patch adds missing kfree() for an allocation in iscsi_login_zero_tsih_s1()
code, and make transport_init_session() check for IS_ERR() returns.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-06 06:00:49 +00:00
Nicholas Bellinger 03e98c9eb9 target: Address legacy PYX_TRANSPORT_* return code breakage
This patch removes legacy usage of PYX_TRANSPORT_* return codes in a number
of locations and addresses cases where transport_generic_request_failure()
was returning the incorrect sense upon CHECK_CONDITION status after the
v3.1 converson to use errno return codes.

This includes the conversion of transport_generic_request_failure() to
process cmd->scsi_sense_reason and handle extra TCM_RESERVATION_CONFLICT
before calling transport_send_check_condition_and_sense() to queue up
response status.  It also drops PYX_TRANSPORT_OUT_OF_MEMORY_RESOURCES legacy
usgae, and returns TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE w/ a response
for these cases.

transport_generic_allocate_tasks(), transport_generic_new_cmd(), backend
SCF_SCSI_DATA_SG_IO_CDB ->do_task(), and emulated ->execute_task() have
all been updated to set se_cmd->scsi_sense_reason and return errno codes
universally upon failure.  This includes cmd->scsi_sense_reason assignment
in target_core_alua.c, target_core_pr.c and target_core_cdb.c emulation code.

Finally it updates fabric modules to remove the legacy usage, and for
TFO->new_cmd_map() callers forwards return values outside of fabric code.
iscsi-target has also been updated to remove a handful of special cases
related to the cleanup and signaling QUEUE_FULL handling w/ ft_write_pending()

(v2: Drop extra SCF_SCSI_CDB_EXCEPTION check during failure from
     transport_generic_new_cmd, and re-add missing task->task_error_status
     assignment in transport_complete_task)

Cc: Christoph Hellwig <hch@lst.de>
Cc: stable@kernel.org
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-06 06:00:49 +00:00
Linus Torvalds 5611cc4572 Linux 3.2-rc4 2011-12-01 14:56:01 -08:00
Linus Torvalds 0a4ebed781 Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: (31 commits)
  ocfs2: avoid unaligned access to dqc_bitmap
  ocfs2: Use filemap_write_and_wait() instead of write_inode_now()
  ocfs2: honor O_(D)SYNC flag in fallocate
  ocfs2: Add a missing journal credit in ocfs2_link_credits() -v2
  ocfs2: send correct UUID to cleancache initialization
  ocfs2: Commit transactions in error cases -v2
  ocfs2: make direntry invalid when deleting it
  fs/ocfs2/dlm/dlmlock.c: free kmem_cache_zalloc'd data using kmem_cache_free
  ocfs2: Avoid livelock in ocfs2_readpage()
  ocfs2: serialize unaligned aio
  ocfs2: Implement llseek()
  ocfs2: Fix ocfs2_page_mkwrite()
  ocfs2: Add comment about orphan scanning
  ocfs2: Clean up messages in the fs
  ocfs2/cluster: Cluster up now includes network connections too
  ocfs2/cluster: Add new function o2net_fill_node_map()
  ocfs2/cluster: Fix output in file elapsed_time_in_ms
  ocfs2/dlm: dlmlock_remote() needs to account for remastery
  ocfs2/dlm: Take inflight reference count for remotely mastered resources too
  ocfs2/dlm: Cleanup dlm_wait_for_node_death() and dlm_wait_for_node_recovery()
  ...
2011-12-01 14:55:34 -08:00
Akinobu Mita 939255798a ocfs2: avoid unaligned access to dqc_bitmap
The dqc_bitmap field of struct ocfs2_local_disk_chunk is 32-bit aligned,
but not 64-bit aligned.  The dqc_bitmap is accessed by ocfs2_set_bit(),
ocfs2_clear_bit(), ocfs2_test_bit(), or ocfs2_find_next_zero_bit().  These
are wrapper macros for ext2_*_bit() which need to take an unsigned long
aligned address (though some architectures are able to handle unaligned
address correctly)

So some 64bit architectures may not be able to access the dqc_bitmap
correctly.

This avoids such unaligned access by using another wrapper functions for
ext2_*_bit().  The code is taken from fs/ext4/mballoc.c which also need to
handle unaligned bitmap access.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Joel Becker <jlbec@evilplan.org>
2011-12-01 14:39:32 -08:00
Linus Torvalds 3b120ab762 Merge branch 'fixes' of http://ftp.arm.linux.org.uk/pub/linux/arm/kernel/git-cur/linux-2.6-arm
* 'fixes' of http://ftp.arm.linux.org.uk/pub/linux/arm/kernel/git-cur/linux-2.6-arm:
  ARM: 7182/1: ARM cpu topology: fix warning
  ARM: 7181/1: Restrict kprobes probing SWP instructions to ARMv5 and below
  ARM: 7180/1: Change kprobes testcase with unpredictable STRD instruction
  ARM: 7177/1: GIC: avoid skipping non-existent PPIs in irq_start calculation
  ARM: 7176/1: cpu_pm: register GIC PM notifier only once
  ARM: 7175/1: add subname parameter to mfp_set_groupg callers
  ARM: 7174/1: Fix build error in kprobes test code on Thumb2 kernels
  ARM: 7172/1: dma: Drop GFP_COMP for DMA memory allocations
  ARM: 7171/1: unwind: add unwind directives to bitops assembly macros
  ARM: 7170/2: fix compilation breakage in entry-armv.S
  ARM: 7168/1: use cache type functions for arch_get_unmapped_area
  ARM: perf: check that we have a platform device when reserving PMU
  ARM: 7166/1: Use PMD_SHIFT instead of PGDIR_SHIFT in dma-consistent.c
  ARM: 7165/2: PL330: Fix typo in _prepare_ccr()
  ARM: 7163/2: PL330: Only register usable channels
  ARM: 7162/1: errata: tidy up Kconfig options for PL310 errata workarounds
  ARM: 7161/1: errata: no automatic store buffer drain
  ARM: perf: initialise used_mask for fake PMU during validation
  ARM: PMU: remove pmu_init declaration
  ARM: PMU: re-export release_pmu symbol to modules
2011-12-01 11:53:54 -08:00
Linus Torvalds b930c26416 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: fix meta data raid-repair merge problem
  Btrfs: skip allocation attempt from empty cluster
  Btrfs: skip block groups without enough space for a cluster
  Btrfs: start search for new cluster at the beginning
  Btrfs: reset cluster's max_size when creating bitmap
  Btrfs: initialize new bitmaps' list
  Btrfs: fix oops when calling statfs on readonly device
  Btrfs: Don't error on resizing FS to same size
  Btrfs: fix deadlock on metadata reservation when evicting a inode
  Fix URL of btrfs-progs git repository in docs
  btrfs scrub: handle -ENOMEM from init_ipath()
2011-12-01 08:28:53 -08:00
Jan Schmidt f4a8e6563e Btrfs: fix meta data raid-repair merge problem
Commit 4a54c8c16 introduced raid-repair, killing the individual
readpage_io_failed_hook entries from inode.c and disk-io.c. Commit
4bb31e92 introduced new readahead code, adding a readpage_io_failed_hook to
disk-io.c.

The raid-repair commit had logic to disable raid-repair, if
readpage_io_failed_hook is set. Thus, the readahead commit effectively
disabled raid-repair for meta data.

This commit changes the logic to always attempt raid-repair when needed and
call the readpage_io_failed_hook in case raid-repair fails. This is much
more straight forward and should have been like that from the beginning.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Reported-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-12-01 09:30:36 -05:00
Linus Torvalds 11d814a201 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  IB: Fix RCU lockdep splats
  IB/ipoib: Prevent hung task or softlockup processing multicast response
  IB/qib: Fix over-scheduling of QSFP work
  RDMA/cxgb4: Fix retry with MPAv1 logic for MPAv2
  RDMA/cxgb4: Fix iw_cxgb4 count_rcqes() logic
  IB/qib: Don't use schedule_work()
2011-11-30 16:25:02 -08:00
Linus Torvalds c290b2f2b0 Merge branch 'dt-for-linus' of git://sources.calxeda.com/kernel/linux
* 'dt-for-linus' of git://sources.calxeda.com/kernel/linux:
  of: Add Silicon Image vendor prefix
  of/irq: of_irq_init: add check for parent equal to child node
2011-11-30 16:24:43 -08:00
Linus Torvalds d6e92d360c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: twl: fix twl4030 support for smps regulators
  regulator: fix use after free bug
  regulator: aat2870: Fix the logic of checking if no id is matched in aat2870_get_regulator
2011-11-30 16:24:24 -08:00
Linus Torvalds cd5b49bce3 Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (45 commits)
  ARM: ux500: update defconfig
  ARM: u300: update defconfig
  ARM: at91: enable additional boards in existing soc defconfig files
  ARM: at91: refresh soc defconfig files for 3.2
  ARM: at91: rename defconfig files appropriately
  ARM: OMAP2+: Fix Compilation error when omap_l3_noc built as module
  ARM: OMAP2+: Remove empty io.h
  ARM: OMAP2: select ARM_AMBA if OMAP3_EMU is defined
  ARM: OMAP: smartreflex: fix IRQ handling bug
  ARM: OMAP: PM: only register TWL with voltage layer when device is present
  ARM: OMAP: hwmod: Fix the addr space, irq, dma count APIs
  arm: mx28: fix bit operation in clock setting
  ARM: imx: export imx_ioremap
  ARM: imx/mm-imx3: conditionally compile i.MX31 and i.MX35 code
  ARM: mx5: Fix checkpatch warnings in cpu-imx5.c
  MAINTAINERS: Add missing directory
  ARM: imx: drop 'ARCH_MX31' and 'ARCH_MX35'
  ARM: imx6q: move clock register map to machine_desc.map_io
  ARM: pxa168/gplugd: add the correct SSP device
  ARM: Update mach-types to fix mxs build breakage
  ...
2011-11-30 16:23:59 -08:00
Vincent Guittot 4cbd6b167f ARM: 7182/1: ARM cpu topology: fix warning
kernel/sched.c:7354:2: warning: initialization from incompatible pointer type

Align cpu_coregroup_mask prototype interface with sched_domain_mask_f typedef
use int cpu instead of unsigned int cpu

Cc: <stable@vger.kernel.org>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-11-30 23:55:21 +00:00
Jon Medhurst (Tixy) b5bed7fe80 ARM: 7181/1: Restrict kprobes probing SWP instructions to ARMv5 and below
The SWP instruction is deprecated on ARMv6 and with ARMv7 it will be
UNDEFINED when CONFIG_SWP_EMULATE is selected. In this case, probing a
SWP instruction will cause an oops when the kprobes emulation code
executes an undefined instruction.

As the SWP instruction should be rare or non-existent in kernels for
ARMv6 and later, we can simply avoid these problems by not allowing
probing of these.

Reported-by: Leif Lindholm <leif.lindholm@arm.com>
Tested-by: Leif Lindholm <leif.lindholm@arm.com>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-11-30 23:54:54 +00:00
Jon Medhurst (Tixy) 14383c295a ARM: 7180/1: Change kprobes testcase with unpredictable STRD instruction
There is a kprobes testcase for the instruction "strd r2, [r3], r4".
This has unpredictable behaviour as it uses r3 for register writeback
addressing and also stores it to memory.

On a cortex A9, this testcase would fail because the instruction writes
the updated value of r3 to memory, whereas the kprobes emulation code
writes the original value.

Fix this by changing testcase to used r5 instead of r3.

Reported-by: Leif Lindholm <leif.lindholm@arm.com>
Tested-by: Leif Lindholm <leif.lindholm@arm.com>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-11-30 23:54:53 +00:00
Alexandre Oliva be064d1139 Btrfs: skip allocation attempt from empty cluster
If we don't have a cluster, don't bother trying to allocate from it,
jumping right away to the attempt to allocate a new cluster.

Signed-off-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-30 13:43:00 -05:00
Alexandre Oliva 425d83156c Btrfs: skip block groups without enough space for a cluster
We test whether a block group has enough free space to hold the
requested block, but when we're doing clustered allocation, we can
save some cycles by testing whether it has enough room for the cluster
upfront, otherwise we end up attempting to set up a cluster and
failing.  Only in the NO_EMPTY_SIZE loop do we attempt an unclustered
allocation, and by then we'll have zeroed the cluster size, so this
patch won't stop us from using the block group as a last resort.

Signed-off-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-30 13:43:00 -05:00
Alexandre Oliva 1b22bad779 Btrfs: start search for new cluster at the beginning
Instead of starting at zero (offset is always zero), request a cluster
starting at search_start, that denotes the beginning of the current
block group.

Signed-off-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-30 13:43:00 -05:00
Alexandre Oliva b78d09bceb Btrfs: reset cluster's max_size when creating bitmap
The field that indicates the size of the largest contiguous chunk of
free space in the cluster is not initialized when setting up bitmaps,
it's only increased when we find a larger contiguous chunk.  We end up
retaining a larger value than appropriate for highly-fragmented
clusters, which may cause pointless searches for large contiguous
groups, and even cause clusters that do not meet the density
requirements to be set up.

Signed-off-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-30 13:43:00 -05:00
Alexandre Oliva f2d0f6765d Btrfs: initialize new bitmaps' list
We're failing to create clusters with bitmaps because
setup_cluster_no_bitmap checks that the list is empty before inserting
the bitmap entry in the list for setup_cluster_bitmap, but the list
field is only initialized when it is restored from the on-disk free
space cache, or when it is written out to disk.

Besides a potential race condition due to the multiple use of the list
field, filesystem performance severely degrades over time: as we use
up all non-bitmap free extents, the try-to-set-up-cluster dance is
done at every metadata block allocation.  For every block group, we
fail to set up a cluster, and after failing on them all up to twice,
we fall back to the much slower unclustered allocation.

To make matters worse, before the unclustered allocation, we try to
create new block groups until we reach the 1% threshold, which
introduces additional bitmaps and thus block groups that we'll iterate
over at each metadata block request.
2011-11-30 18:46:06 +01:00
Li Zefan b772a86ea6 Btrfs: fix oops when calling statfs on readonly device
To reproduce this bug:

  # dd if=/dev/zero of=img bs=1M count=256
  # mkfs.btrfs img
  # losetup -r /dev/loop1 img
  # mount /dev/loop1 /mnt
  OOPS!!

It triggered BUG_ON(!nr_devices) in btrfs_calc_avail_data_space().

To fix this, instead of checking write-only devices, we check all open
deivces:

  # df -h /dev/loop1
  Filesystem            Size  Used Avail Use% Mounted on
  /dev/loop1            250M   28K  238M   1% /mnt

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
2011-11-30 18:46:05 +01:00
Mike Fleetwood ece7d20e8b Btrfs: Don't error on resizing FS to same size
It seems overly harsh to fail a resize of a btrfs file system to the
same size when a shrink or grow would succeed.  User app GParted trips
over this error.  Allow it by bypassing the shrink or grow operation.

Signed-off-by: Mike Fleetwood <mike.fleetwood@googlemail.com>
2011-11-30 18:46:04 +01:00
Miao Xie aa38a711a8 Btrfs: fix deadlock on metadata reservation when evicting a inode
When I ran the xfstests, I found the test tasks was blocked on meta-data
reservation.

By debugging, I found the reason of this bug:
   start transaction
        |
	v
   reserve meta-data space
	|
	v
   flush delay allocation -> iput inode -> evict inode
	^					|
	|					v
   wait for delay allocation flush <- reserve meta-data space

And besides that, the flush on evicting inode will block the thread, which
is reclaiming the memory, and make oom happen easily.

Fix this bug by skipping the flush step when evicting inode.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
2011-11-30 18:46:03 +01:00
Arnd Hannemann b52f75a595 Fix URL of btrfs-progs git repository in docs
The location of the btrfs-progs repository has been changed.
	This patch updates the documentation accordingly.

Signed-off-by: Arnd Hannemann <arnd@arndnet.de>
2011-11-30 18:46:02 +01:00
Dan Carpenter 26bdef541d btrfs scrub: handle -ENOMEM from init_ipath()
init_ipath() can return an ERR_PTR(-ENOMEM).

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
2011-11-30 18:46:01 +01:00
Roland Dreier a493f1a24a Merge branches 'cxgb4', 'ipoib', 'misc' and 'qib' into for-next 2011-11-29 18:01:53 -08:00
Linus Torvalds 8cd7920370 Merge branch 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
* 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM: Update comments describing device power management callbacks
  PM / Sleep: Update documentation related to system wakeup
  PM / Runtime: Make documentation follow the new behavior of irq_safe
  PM / Sleep: Correct inaccurate information in devices.txt
  PM / Domains: Document how PM domains are used by the PM core
  PM / Hibernate: Do not leak memory in error/test code paths
2011-11-29 14:43:22 -08:00
Eric Dumazet 580da35a31 IB: Fix RCU lockdep splats
Commit f2c31e32b3 ("net: fix NULL dereferences in check_peer_redir()")
forgot to take care of infiniband uses of dst neighbours.

Many thanks to Marc Aurele who provided a nice bug report and feedback.

Reported-by: Marc Aurele La France <tsi@ualberta.ca>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Miller <davem@davemloft.net>
Cc: <stable@kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-11-29 13:37:11 -08:00
Mike Marciniszyn 3874397c0b IB/ipoib: Prevent hung task or softlockup processing multicast response
This following can occur with ipoib when processing a multicast reponse:

    BUG: soft lockup - CPU#0 stuck for 67s! [ib_mad1:982]
    Modules linked in: ...
    CPU 0:
    Modules linked in: ...
    Pid: 982, comm: ib_mad1 Not tainted 2.6.32-131.0.15.el6.x86_64 #1 ProLiant DL160 G5
    RIP: 0010:[<ffffffff814ddb27>]  [<ffffffff814ddb27>] _spin_unlock_irqrestore+0x17/0x20
    RSP: 0018:ffff8802119ed860  EFLAGS: 00000246
    0000000000000004 RBX: ffff8802119ed860 RCX: 000000000000a299
    RDX: ffff88021086c700 RSI: 0000000000000246 RDI: 0000000000000246
    RBP: ffffffff8100bc8e R08: ffff880210ac229c R09: 0000000000000000
    R10: ffff88021278aab8 R11: 0000000000000000 R12: ffff8802119ed860
    R13: ffffffff8100be6e R14: 0000000000000001 R15: 0000000000000003
    FS:  0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
    CR2: 00000000006d4840 CR3: 0000000209aa5000 CR4: 00000000000406f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Call Trace:
    [<ffffffffa032c247>] ? ipoib_mcast_send+0x157/0x480 [ib_ipoib]
    [<ffffffff8100bc8e>] ? apic_timer_interrupt+0xe/0x20
    [<ffffffff8100bc8e>] ? apic_timer_interrupt+0xe/0x20
    [<ffffffffa03283d4>] ? ipoib_path_lookup+0x124/0x2d0 [ib_ipoib]
    [<ffffffffa03286fc>] ? ipoib_start_xmit+0x17c/0x430 [ib_ipoib]
    [<ffffffff8141e758>] ? dev_hard_start_xmit+0x2c8/0x3f0
    [<ffffffff81439d0a>] ? sch_direct_xmit+0x15a/0x1c0
    [<ffffffff81423098>] ? dev_queue_xmit+0x388/0x4d0
    [<ffffffffa032d6b7>] ? ipoib_mcast_join_finish+0x2c7/0x510 [ib_ipoib]
    [<ffffffffa032dab8>] ? ipoib_mcast_sendonly_join_complete+0x1b8/0x1f0 [ib_ipoib]
    [<ffffffffa02a0946>] ? mcast_work_handler+0x1a6/0x710 [ib_sa]
    [<ffffffffa015f01e>] ? ib_send_mad+0xfe/0x3c0 [ib_mad]
    [<ffffffffa00f6c93>] ? ib_get_cached_lmc+0xa3/0xb0 [ib_core]
    [<ffffffffa02a0f9b>] ? join_handler+0xeb/0x200 [ib_sa]
    [<ffffffffa029e4fc>] ? ib_sa_mcmember_rec_callback+0x5c/0xa0 [ib_sa]
    [<ffffffffa029e79c>] ? recv_handler+0x3c/0x70 [ib_sa]
    [<ffffffffa01603a4>] ? ib_mad_completion_handler+0x844/0x9d0 [ib_mad]
    [<ffffffffa015fb60>] ? ib_mad_completion_handler+0x0/0x9d0 [ib_mad]
    [<ffffffff81088830>] ? worker_thread+0x170/0x2a0
    [<ffffffff8108e160>] ? autoremove_wake_function+0x0/0x40
    [<ffffffff810886c0>] ? worker_thread+0x0/0x2a0
    [<ffffffff8108ddf6>] ? kthread+0x96/0xa0
    [<ffffffff8100c1ca>] ? child_rip+0xa/0x20

Coinciding with stack trace is the following message:

    ib0: ib_address_create failed

The code below in ipoib_mcast_join_finish() will note the above
failure in the address handle but otherwise continue:

                ah = ipoib_create_ah(dev, priv->pd, &av);
                if (!ah) {
                        ipoib_warn(priv, "ib_address_create failed\n");
                } else {

The while loop at the bottom of ipoib_mcast_join_finish() will attempt
to send queued multicast packets in mcast->pkt_queue and eventually
end up in ipoib_mcast_send():

        if (!mcast->ah) {
                if (skb_queue_len(&mcast->pkt_queue) < IPOIB_MAX_MCAST_QUEUE)
                        skb_queue_tail(&mcast->pkt_queue, skb);
                else {
                        ++dev->stats.tx_dropped;
                        dev_kfree_skb_any(skb);
                }

My read is that the code will requeue the packet and return to the
ipoib_mcast_join_finish() while loop and the stage is set for the
"hung" task diagnostic as the while loop never sees a non-NULL ah, and
will do nothing to resolve.

There are GFP_ATOMIC allocates in the provider routines, so this is
possible and should be dealt with.

The test that induced the failure is associated with a host SM on the
same server during a shutdown.

This patch causes ipoib_mcast_join_finish() to exit with an error
which will flush the queued mcast packets.  Nothing is done to unwind
the QP attached state so that subsequent sends from above will retry
the join.

Reviewed-by: Ram Vepa <ram.vepa@qlogic.com>
Reviewed-by: Gary Leshner <gary.leshner@qlogic.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-11-29 13:20:02 -08:00
Linus Torvalds 57db53b074 Merge branch 'slab/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux
* 'slab/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux:
  slub: avoid potential NULL dereference or corruption
  slub: use irqsafe_cpu_cmpxchg for put_cpu_partial
  slub: move discard_slab out of node lock
  slub: use correct parameter to add a page to partial list tail
2011-11-29 11:13:22 -08:00
Linus Torvalds 883381d9f1 Merge branch 'dev' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'dev' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: fix racy use-after-free in ext4_end_io_dio()
2011-11-29 08:59:12 -08:00