linux/drivers
Hugh Dickins 164fc5dcd6 scsi: fix sense_slab/bio swapping livelock
Since 2.6.25-rc7, I've been seeing an occasional livelock on one x86_64
machine, copying kernel trees to tmpfs, paging out to swap.

Signature: 6000 pages under writeback but never getting written; most
tasks of interest trying to reclaim, but each get_swap_bio waiting for a
bio in mempool_alloc's io_schedule_timeout(5*HZ); every five seconds an
atomic page allocation failure report from kblockd failing to allocate a
sense_buffer in __scsi_get_command.

__scsi_get_command has a (one item) free_list to protect against this,
but rc1's "[SCSI] use dynamically allocated sense buffer" (commit
de25deb180) upset that slightly.  When it
fails to allocate from the separate sense_slab, instead of giving up, it
must fall back to the command free_list, which is sure to have a
sense_buffer attached.
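
The fallback described above can be sketched in userspace C.  This is a
minimal model, not the kernel code: every toy_* name, the
toy_sense_slab_empty switch and the TOY_SENSE_SIZE value are hypothetical
stand-ins, and plain calloc/free take the place of the slab allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_SENSE_SIZE 96   /* illustrative size only */

/* Hypothetical stand-ins; these are not the kernel's structures. */
struct toy_cmnd {
    unsigned char *sense_buffer;
};

struct toy_host {
    struct toy_cmnd *reserved;  /* one-item free_list, sense buffer attached */
};

/* Set to nonzero to simulate the separate sense slab failing. */
static int toy_sense_slab_empty;

static unsigned char *toy_sense_alloc(void)
{
    return toy_sense_slab_empty ? NULL : calloc(1, TOY_SENSE_SIZE);
}

/* Hand out the reserved command, if it has not been taken already. */
static struct toy_cmnd *toy_take_reserved(struct toy_host *host)
{
    struct toy_cmnd *cmd = host->reserved;

    /* The reserve is only useful because it carries a sense buffer. */
    assert(cmd == NULL || cmd->sense_buffer != NULL);
    host->reserved = NULL;
    return cmd;
}

static struct toy_cmnd *toy_get_command(struct toy_host *host)
{
    struct toy_cmnd *cmd = calloc(1, sizeof(*cmd));

    if (cmd) {
        cmd->sense_buffer = toy_sense_alloc();
        if (!cmd->sense_buffer) {
            /* Sense allocation failed: drop the bare command ... */
            free(cmd);
            cmd = NULL;
        }
    }
    /* ... and fall back to the reserve instead of giving up. */
    if (!cmd)
        cmd = toy_take_reserved(host);
    return cmd;
}
```

With toy_sense_slab_empty set, the first toy_get_command() call returns
the reserved command and a second call returns NULL until the reserve is
put back, which mirrors a host working through its queue one command at
a time.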

Either my earlier -rc testing missed this, or there's some recent
contributory factor.  One very significant factor is SLUB, which merges
slab caches when it can, and on 64-bit happens to merge both bio cache
and sense_slab cache into kmalloc's 128-byte cache: so that under this
swapping load, bios above are liable to gobble up all the slots needed
for scsi_cmnd sense_buffers below.
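
The effect of merging can be illustrated with a toy model.  This is a
sketch under heavy simplification: the toy_* names are hypothetical, and
each pool is a fixed count of slots, whereas the real failure comes from
atomic allocations being unable to grow the slab under memory pressure.

```c
#include <stdbool.h>

/* Toy pool: only tracks how many slots are still free. */
struct toy_pool {
    int free_slots;
};

/* A toy cache is just a reference to the pool that backs it. */
struct toy_cache {
    struct toy_pool *pool;
};

/* Take one slot from the backing pool; false when none are left. */
static bool toy_cache_alloc(struct toy_cache *c)
{
    if (c->pool->free_slots == 0)
        return false;
    c->pool->free_slots--;
    return true;
}

/* Try n allocations in a row; return how many succeeded. */
static int toy_alloc_many(struct toy_cache *c, int n)
{
    int got = 0;

    while (got < n && toy_cache_alloc(c))
        got++;
    return got;
}
```

When two caches point at the same toy_pool (the merged case), a burst of
allocations through one leaves nothing for the other; with separate
pools the second cache is unaffected.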

That's disturbing behaviour, and I tried a few things to fix it.  Adding
a no-op constructor to the sense_slab inhibits SLUB from merging it, and
stops all the allocation failures I was seeing; but it's rather a hack,
and perhaps in different configurations we have other caches on the
swapout path which are ill-merged.

Another alternative is to revert the separate sense_slab, using
cache-line-aligned sense_buffer allocated beyond scsi_cmnd from the one
kmem_cache; but that might waste more memory, and is only a way of
diverting around the known problem.

While I don't like seeing the allocation failures, and hate the idea of
all those bios piled up above a scsi host working one by one, it does
seem to emerge fairly soon with the livelock fix.  So lacking better
ideas, stick with that one clear fix for now.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Peter Zijlstra <a.p.ziljstra@chello.nl>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-06 16:10:08 -07:00
acorn/char
acpi Revert "ACPI: Ignore _BQC object when registering backlight device" 2008-04-05 12:14:13 -07:00
amba
ata pata_ali: disable ATAPI DMA 2008-04-04 02:43:38 -04:00
atm [ATM] drivers/atm/iphase.c: compilation warning fix 2008-04-02 00:03:00 -07:00
auxdisplay
base driver core: fix small mem leak in driver_add_kobj() 2008-03-28 14:45:23 -07:00
block nbd: prevent sock_xmit from attempting to use a NULL socket 2008-04-02 15:28:19 -07:00
bluetooth Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 2008-03-12 13:08:09 -07:00
cdrom
char x86: revert assign IRQs to hpet timer 2008-04-04 18:36:49 +02:00
clocksource
connector connector: convert to single-threaded workqueue 2008-03-23 21:51:12 -07:00
cpufreq [CPUFREQ] fix section mismatch warnings 2008-03-05 14:45:31 -05:00
cpuidle cpuidle: fix 100% C0 statistics regression 2008-03-26 00:58:19 -04:00
crypto drivers/crypto/hifn_795x.c trivial endianness annotations 2008-03-30 14:20:24 -07:00
dca
dio
dma [POWERPC] fsldma: Use compatiable binding as spec 2008-03-31 11:45:41 -05:00
edac
eisa
firewire firewire: fw-ohci: plug dma memory leak in AR handler 2008-03-27 21:01:14 +01:00
firmware ipmi: change device node ordering to reflect probe order 2008-04-04 14:46:26 -07:00
gpio gpio/pca953x bugfix: mark as can_sleep 2008-03-10 18:01:19 -07:00
hid HID: update key codes for Apple aluminium 2008-03-18 11:20:33 +01:00
hwmon hwmon: (w83781d) Fix I/O resource conflict with PNP 2008-03-27 08:40:41 -04:00
i2c i2c: Fix docbook problem 2008-03-23 20:28:20 +01:00
ide ide: use ->ata_input_data in ide_driveid_update() 2008-04-02 21:22:05 +02:00
ieee1394 ieee1394: sbp2: fix for SYM13FW500 bridge (Datafab disk) 2008-03-14 00:56:59 +01:00
infiniband trivial endianness annotations: infiniband core 2008-03-30 14:20:24 -07:00
input Input: appletouch - add product IDs for the 4th generation MacBooks 2008-04-02 10:14:29 -04:00
isdn Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 2008-03-12 13:08:09 -07:00
leds leds: Remove incorrect use of preempt_count() from leds-gpio 2008-03-31 23:31:13 +01:00
lguest misc __user misannotations (pointless casts to long) 2008-03-30 14:20:23 -07:00
macintosh [POWERPC] Fix build of modular drivers/macintosh/apm_emu.c 2008-03-13 10:09:27 +11:00
mca
md dm io: write error bits form long not int 2008-03-28 14:45:23 -07:00
media V4L/DVB (7486): radio-cadet: wrap PNP probe code in #ifdef CONFIG_PNP 2008-04-01 19:35:47 -03:00
memstick memstick: suppress uninitialized-var warning 2008-03-28 14:45:23 -07:00
message [SCSI] mpt fusion: Power Management fixes for MPT SAS PCI-E controllers 2008-03-18 15:13:40 -05:00
mfd mfd/asic3: ioread/iowrite take pointer, not unsigned long 2008-03-30 14:20:24 -07:00
misc NULL noise: drivers/misc 2008-03-30 14:18:41 -07:00
mmc mmc: use sysfs groups to handle conditional attributes 2008-03-22 17:02:20 -07:00
mtd mtd: fix broken state in CFI driver caused by FL_SHUTDOWN 2008-04-04 14:46:26 -07:00
net Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc 2008-04-03 15:41:10 -07:00
nubus
of
oprofile
parisc [PARISC] make ptr_to_pide() static 2008-03-15 19:17:12 -07:00
parport parport_pc: make sure to release IO ports after probing for IT87XX 2008-04-04 14:30:31 -07:00
pci pci: revert SMBus unhide on HP Compaq nx6110 2008-03-28 14:45:22 -07:00
pcmcia
pnp pnpacpi: reduce printk severity for "pnpacpi: exceeded the max number of ..." 2008-03-26 14:22:20 -04:00
power
ps3
rapidio
rtc rtc-at91sam9 fixes 2008-03-19 18:53:37 -07:00
s390 [S390] zcrypt: fix ap_device_list handling 2008-03-05 12:37:19 +01:00
sbus
scsi scsi: fix sense_slab/bio swapping livelock 2008-04-06 16:10:08 -07:00
serial atmel_serial: fix uart/console concurrent access 2008-04-02 15:28:19 -07:00
sh
sn ioc3.c: replace remaining __FUNCTION__ occurrences 2008-03-17 08:11:48 -04:00
spi spi_bitbang: short transfer status fix 2008-03-13 13:11:43 -07:00
ssb
tc
telephony
thermal thermal: delete "default y" 2008-03-18 01:22:10 -04:00
uio UIO: add pgprot_noncached() to UIO mmap code 2008-03-24 22:33:49 -07:00
usb USB: ohci: fix 2 timers to fire at jiffies + 1s 2008-04-02 15:06:09 -07:00
video blackfin video driver: fix bug when opening/reading/mmaping BF54x and BF52x framebuffer simultaneously 2008-03-28 14:45:22 -07:00
virtio virtio_pci iomem annotations 2008-03-30 14:20:23 -07:00
w1
watchdog [WATCHDOG] Fix it8712f_wdt.c wrong byte order accessing WDT_TIMEOUT 2008-04-01 11:31:05 -07:00
xen xen: fix grant table bug 2008-04-04 18:36:46 +02:00
zorro
Kconfig
Makefile