linux

Commit Graph

Author	SHA1	Message	Date
Tejun Heo	cd0aca2d55	block: fix queue bounce limit setting Impact: don't set GFP_DMA in q->bounce_gfp unnecessarily All DMA address limits are expressed in terms of the last addressable unit (byte or page) instead of one plus that. However, when determining bounce_gfp for 64bit machines in blk_queue_bounce_limit(), it compares the specified limit against 0x100000000UL to determine whether it's below 4G ending up falsely setting GFP_DMA in q->bounce_gfp. As DMA zone is very small on x86_64, this makes larger SG_IO transfers very eager to trigger OOM killer. Fix it. While at it, rename the parameter to @dma_mask for clarity and convert comment to proper winged style. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-04-22 08:35:09 +02:00
Alan Cox	8feb4d20b4	pata_artop: typo Fix a typo (this was in the original patch but was not merged when the code fixes were for some reason) Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2009-04-06 20:00:29 -04:00
FUJITA Tomonori	18af8b2ca3	block: use min_not_zero in blk_queue_stack_limits zero is invalid for max_phys_segments, max_hw_segments, and max_segment_size. It's better to use use min_not_zero instead of min. min() works though (because the commit `0e435ac` makes sure that these values are set to the default values, non zero, if a queue is initialized properly). With this patch, blk_queue_stack_limits does the almost same thing that dm's combine_restrictions_low() does. I think that it's easy to remove dm's combine_restrictions_low. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-12-29 08:29:51 +01:00
Milan Broz	0e435ac26e	block: fix setting of max_segment_size and seg_boundary mask Fix setting of max_segment_size and seg_boundary mask for stacked md/dm devices. When stacking devices (LVM over MD over SCSI) some of the request queue parameters are not set up correctly in some cases by default, namely max_segment_size and and seg_boundary mask. If you create MD device over SCSI, these attributes are zeroed. Problem become when there is over this mapping next device-mapper mapping - queue attributes are set in DM this way: request_queue max_segment_size seg_boundary_mask SCSI 65536 0xffffffff MD RAID1 0 0 LVM 65536 -1 (64bit) Unfortunately bio_add_page (resp. bio_phys_segments) calculates number of physical segments according to these parameters. During the generic_make_request() is segment cout recalculated and can increase bio->bi_phys_segments count over the allowed limit. (After bio_clone() in stack operation.) Thi is specially problem in CCISS driver, where it produce OOPS here BUG_ON(creq->nr_phys_segments > MAXSGENTRIES); (MAXSEGENTRIES is 31 by default.) Sometimes even this command is enough to cause oops: dd iflag=direct if=/dev/<vg>/<lv> of=/dev/null bs=128000 count=10 This command generates bios with 250 sectors, allocated in 32 4k-pages (last page uses only 1024 bytes). For LVM layer, it allocates bio with 31 segments (still OK for CCISS), unfortunatelly on lower layer it is recalculated to 32 segments and this violates CCISS restriction and triggers BUG_ON(). The patch tries to fix it by: * initializing attributes above in queue request constructor blk_queue_make_request() * make sure that blk_queue_stack_limits() inherits setting (DM uses its own function to set the limits because it blk_queue_stack_limits() was introduced later. It should probably switch to use generic stack limit function too.) * sets the default seg_boundary value in one place (blkdev.h) * use this mask as default in DM (instead of -1, which differs in 64bit) Bugs related to this: https://bugzilla.redhat.com/show_bug.cgi?id=471639 http://bugzilla.kernel.org/show_bug.cgi?id=8672 Signed-off-by: Milan Broz <mbroz@redhat.com> Reviewed-by: Alasdair G Kergon <agk@redhat.com> Cc: Neil Brown <neilb@suse.de> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Tejun Heo <htejun@gmail.com> Cc: Mike Miller <mike.miller@hp.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-12-03 12:55:55 +01:00
Peter Zijlstra	713ada9ba9	block: move q->unplug_work initialization modprobe loop; rmmod loop effectively creates a blk_queue and destroys it which results in q->unplug_work being canceled without it ever being initialized. Therefore, move the initialization of q->unplug_work from blk_queue_make_request() to blk_alloc_queue*(). Reported-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-17 08:46:57 +02:00
Kiyoshi Ueda	ef9e3facdf	block: add lld busy state exporting interface This patch adds an new interface, blk_lld_busy(), to check lld's busy state from the block layer. blk_lld_busy() calls down into low-level drivers for the checking if the drivers set q->lld_busy_fn() using blk_queue_lld_busy(). This resolves a performance problem on request stacking devices below. Some drivers like scsi mid layer stop dispatching request when they detect busy state on its low-level device like host/target/device. It allows other requests to stay in the I/O scheduler's queue for a chance of merging. Request stacking drivers like request-based dm should follow the same logic. However, there is no generic interface for the stacked device to check if the underlying device(s) are busy. If the request stacking driver dispatches and submits requests to the busy underlying device, the requests will stay in the underlying device's queue without a chance of merging. This causes performance problem on burst I/O load. With this patch, busy state of the underlying device is exported via q->lld_busy_fn(). So the request stacking driver can check it and stop dispatching requests if busy. The underlying device driver must return the busy state appropriately: 1: when the device driver can't process requests immediately. 0: when the device driver can process requests immediately, including abnormal situations where the device driver needs to kill all requests. Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:20 +02:00
Jens Axboe	242f9dcb8b	block: unify request timeout handling Right now SCSI and others do their own command timeout handling. Move those bits to the block layer. Instead of having a timer per command, we try to be a bit more clever and simply have one per-queue. This avoids the overhead of having to tear down and setup a timer for each command, so it will result in a lot less timer fiddling. Signed-off-by: Mike Anderson <andmike@linux.vnet.ibm.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:13 +02:00
Harvey Harrison	aeb3d3a81e	block: kmalloc args reversed, small function definition fixes Noticed by sparse: block/blk-softirq.c:156:12: warning: symbol 'blk_softirq_init' was not declared. Should it be static? block/genhd.c:583:28: warning: function 'bdget_disk' with external linkage has definition block/genhd.c:659:17: warning: incorrect type in argument 1 (different base types) block/genhd.c:659:17: expected unsigned int [unsigned] [usertype] size block/genhd.c:659:17: got restricted gfp_t block/genhd.c:659:29: warning: incorrect type in argument 2 (different base types) block/genhd.c:659:29: expected restricted gfp_t [usertype] flags block/genhd.c:659:29: got unsigned int block: kmalloc args reversed Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:11 +02:00
Jens Axboe	c7c22e4d5c	block: add support for IO CPU affinity This patch adds support for controlling the IO completion CPU of either all requests on a queue, or on a per-request basis. We export a sysfs variable (rq_affinity) which, if set, migrates completions of requests to the CPU that originally submitted it. A bio helper (bio_set_completion_cpu()) is also added, so that queuers can ask for completion on that specific CPU. In testing, this has been show to cut the system time by as much as 20-40% on synthetic workloads where CPU affinity is desired. This requires a little help from the architecture, so it'll only work as designed for archs that are using the new generic smp helper infrastructure. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:09 +02:00
Randy Dunlap	710027a48e	Add some block/ source files to the kernel-api docbook. Fix kernel-doc notation in them as needed. Fix changed function parameter names. Fix typos/spellos. In comments, change REQ_SPECIAL to REQ_TYPE_SPECIAL and REQ_BLOCK_PC to REQ_TYPE_BLOCK_PC. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:03 +02:00
David Woodhouse	fb2dce862d	Add 'discard' request handling Some block devices benefit from a hint that they can forget the contents of certain sectors. Add basic support for this to the block core, along with a 'blkdev_issue_discard()' helper function which issues such requests. The caller doesn't get to provide an end_io functio, since blkdev_issue_discard() will automatically split the request up into multiple bios if appropriate. Neither does the function wait for completion -- it's expected that callers won't care about when, or even _if_, the request completes. It's only a hint to the device anyway. By definition, the file system doesn't _care_ about these sectors any more. [With feedback from OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> and Jens Axboe <jens.axboe@oracle.com] Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:01 +02:00
FUJITA Tomonori	27f8221af4	block: add blk_queue_update_dma_pad This adds blk_queue_update_dma_pad to prevent LLDs from overwriting the dma pad mask wrongly (we added blk_queue_update_dma_alignment due to the same reason). This also converts libata to use blk_queue_update_dma_pad instead of blk_queue_dma_pad. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Tejun Heo <htejun@gmail.com> Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-07-04 09:52:13 +02:00
Neil Brown	e7e72bf641	Remove blkdev warning triggered by using md As setting and clearing queue flags now requires that we hold a spinlock on the queue, and as blk_queue_stack_limits is called without that lock, get the lock inside blk_queue_stack_limits. For blk_queue_stack_limits to be able to find the right lock, each md personality needs to set q->queue_lock to point to the appropriate lock. Those personalities which didn't previously use a spin_lock, us q->__queue_lock. So always initialise that lock when allocated. With this in place, setting/clearing of the QUEUE_FLAG_PLUGGED bit will no longer cause warnings as it will be clear that the proper lock is held. Thanks to Dan Williams for review and fixing the silly bugs. Signed-off-by: NeilBrown <neilb@suse.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Alistair John Strachan <alistair@devzero.co.uk> Cc: Nick Piggin <npiggin@suse.de> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Jacek Luczak <difrost.kernel@gmail.com> Cc: Prakash Punnoor <prakash@punnoor.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-05-14 19:11:15 -07:00
Harvey Harrison	24c03d47d0	block: remove remaining __FUNCTION__ occurrences __FUNCTION__ is gcc specific, use __func__ Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-05-01 08:04:02 -07:00
Nick Piggin	75ad23bc0f	block: make queue flags non-atomic We can save some atomic ops in the IO path, if we clearly define the rules of how to modify the queue flags. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-04-29 14:48:33 +02:00
Adrian Bunk	657e93be35	unexport blk_max_pfn blk_max_pfn can now be unexported. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-04-29 09:50:34 +02:00
Andrea Arcangeli	00d61e3e8c	Fix bounce setting for 64-bit Looking a bit closer into this regression the reason this can't be right is that dma_addr common default is BLK_BOUNCE_HIGH and most machines have less than 4G. So if you do: if (b_pfn <= (min_t(u64, 0xffffffff, BLK_BOUNCE_HIGH) >> PAGE_SHIFT)) dma = 1 that will translate to: if (BLK_BOUNCE_HIGH <= BLK_BOUNCE_HIGH) dma = 1 So for 99% of hardware this will trigger unnecessary GFP_DMA allocations and isa pooling operations. Also note how the 32bit code still does b_pfn < blk_max_low_pfn. I guess this is what you were looking after. I didn't verify but as far as I can tell, this will stop the regression with isa dma operations at boot for 99% of blkdev/memory combinations out there and I guess this fixes the setups with >4G of ram and 32bit pci cards as well (this also retains symmetry with the 32bit code). Signed-off-by: Andrea Arcangeli <andrea@qumranet.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-04-02 09:06:44 +02:00
Harvey Harrison	448da4d262	block: remove extern on function definition Intoduced between 2.6.25-rc2 and -rc3 block/blk-settings.c:319:12: warning: function 'blk_queue_dma_drain' with external linkage has definition Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-03-04 11:30:18 +01:00
Yang Shi	419c434c35	Fix DMA access of block device in 64-bit kernel on some non-x86 systems with 4GB or upper 4GB memory For some non-x86 systems with 4GB or upper 4GB memory, we need increase the range of addresses that can be used for direct DMA in 64-bit kernel. Signed-off-by: Yang Shi <yang.shi@windriver.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-03-04 11:20:51 +01:00
Tejun Heo	e3790c7d42	block: separate out padding from alignment Block layer alignment was used for two different purposes - memory alignment and padding. This causes problems in lower layers because drivers which only require memory alignment ends up with adjusted rq->data_len. Separate out padding such that padding occurs iff driver explicitly requests it. Tomo: restorethe code to update bio in blk_rq_map_user introduced by the commit `40b01b9bbd` according to padding alignment. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-03-04 11:18:17 +01:00
Randy Dunlap	5d87a052c7	block: fix kernel-docbook parameters and files kernel-doc for block/: - add missing parameters - fix one function's parameter list (remove blank line) - add 2 source files to docbook for non-exported kernel-doc functions Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-03-04 11:14:39 +01:00
Tejun Heo	2fb98e8414	block: implement request_queue->dma_drain_needed Draining shouldn't be done for commands where overflow may indicate data integrity issues. Add dma_drain_needed callback to request_queue. Drain buffer is appened iff this function returns non-zero. Signed-off-by: Tejun Heo <htejun@gmail.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-02-19 11:36:53 +01:00
Adrian Bunk	52ff4cae65	make blk_settings_init() static blk_settings_init() can become static. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Jens Axboe <axboe@carl.home.kernel.dk>	2008-02-19 10:04:00 +01:00
Jens Axboe	6728cb0e63	block: make core bits checkpatch compliant Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-02-01 09:26:33 +01:00
Jens Axboe	86db1e2977	block: continue ll_rw_blk.c splitup Adds files for barrier handling, rq execution, io context handling, mapping data to requests, and queue settings. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-01-29 21:55:08 +01:00

25 Commits