Hugh Dickins 164fc5dcd6 scsi: fix sense_slab/bio swapping livelock
Since 2.6.25-rc7, I've been seeing an occasional livelock on one x86_64
machine, copying kernel trees to tmpfs, paging out to swap.

Signature: 6000 pages under writeback but never getting written; most
tasks of interest trying to reclaim, but each get_swap_bio waiting for a
bio in mempool_alloc's io_schedule_timeout(5*HZ); every five seconds an
atomic page allocation failure report from kblockd failing to allocate a
sense_buffer in __scsi_get_command.

__scsi_get_command has a (one item) free_list to protect against this,
but rc1's [SCSI] use dynamically allocated sense buffer
de25deb18016f66dcdede165d07654559bb332bc upset that slightly.  When it
fails to allocate from the separate sense_slab, instead of giving up, it
must fall back to the command free_list, which is sure to have a
sense_buffer attached.
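
A rough user-space sketch of that fallback, for illustration only; it is
not the kernel code.  malloc stands in for the two slab caches, and the
names struct cmd, struct host, sense_alloc_fails, host_init, get_command
and put_command are all hypothetical, as is the 96-byte buffer size used
here.  The point is only the ordering: when the sense buffer cannot be
allocated, drop the half-built command and take the reserved one, which
already has a sense buffer attached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define SENSE_SIZE 96            /* assumed size, for the sketch only */

struct cmd {
	unsigned char *sense_buffer;
};

struct host {
	struct cmd *free_list;   /* one reserved command, or NULL if in use */
};

/* Hypothetical switch: pretend the separate sense cache is exhausted. */
static bool sense_alloc_fails;

static unsigned char *sense_alloc(void)
{
	return sense_alloc_fails ? NULL : malloc(SENSE_SIZE);
}

/* Set up the reserve: a command with its sense buffer already attached. */
static int host_init(struct host *h)
{
	struct cmd *c = malloc(sizeof(*c));

	if (!c)
		return -1;
	c->sense_buffer = malloc(SENSE_SIZE);
	if (!c->sense_buffer) {
		free(c);
		return -1;
	}
	h->free_list = c;
	return 0;
}

static struct cmd *get_command(struct host *h)
{
	struct cmd *c = malloc(sizeof(*c));

	if (c) {
		c->sense_buffer = sense_alloc();
		if (!c->sense_buffer) {
			/* Do not give up here: fall through to the reserve. */
			free(c);
			c = NULL;
		}
	}
	if (!c) {
		/* Either allocation failed: use the reserved command. */
		c = h->free_list;
		h->free_list = NULL;
	}
	return c;                /* NULL only if the reserve is also taken */
}

static void put_command(struct host *h, struct cmd *c)
{
	assert(c && c->sense_buffer);
	if (!h->free_list) {
		/* Refill the reserve first, keeping its sense buffer. */
		h->free_list = c;
		return;
	}
	free(c->sense_buffer);
	free(c);
}
```

With sense_alloc_fails set, the first get_command() still returns a
usable command from the reserve, and a second call returns NULL until
put_command() hands the reserved one back: the host works one command at
a time rather than stalling.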

Either my earlier -rc testing missed this, or there's some recent
contributory factor.  One very significant factor is SLUB, which merges
slab caches when it can, and on 64-bit happens to merge both bio cache
and sense_slab cache into kmalloc's 128-byte cache: so that under this
swapping load, bios above are liable to gobble up all the slots needed
for scsi_cmnd sense_buffers below.

That's disturbing behaviour, and I tried a few things to fix it.  Adding
a no-op constructor to the sense_slab inhibits SLUB from merging it, and
stops all the allocation failures I was seeing; but it's rather a hack,
and perhaps in different configurations we have other caches on the
swapout path which are ill-merged.

Another alternative is to revert the separate sense_slab, using
cache-line-aligned sense_buffer allocated beyond scsi_cmnd from the one
kmem_cache; but that might waste more memory, and is only a way of
diverting around the known problem.

While I don't like seeing the allocation failures, and hate the idea of
all those bios piled up above a scsi host working one by one, the system
does seem to emerge from that state fairly soon with the livelock fix.
So lacking better ideas, stick with that one clear fix for now.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Peter Zijlstra <a.p.ziljstra@chello.nl>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-06 16:10:08 -07:00