linux/mm
Hugh Dickins 1b1b32f2c6 tmpfs: fix shmem_swaplist races
Intensive swapoff testing shows shmem_unuse spinning on an entry in
shmem_swaplist pointing to itself: how does that come about?  Days pass...

First guess is this: shmem_delete_inode tests list_empty without taking the
global mutex (so the swapping case doesn't slow down the common case); but
there's an instant in shmem_unuse_inode's list_move_tail when the list entry
may appear empty (a rare case, because it's actually moving the head not the
the list member).  So there's a danger of leaving the inode on the swaplist
when it's freed, then reinitialized to point to itself when reused.  Fix that
by skipping the list_move_tail when it's a no-op, which happens to plug this.

But this same spinning then surfaces on another machine.  Ah, I'd never
suspected it, but shmem_writepage's swaplist manipulation is unsafe: though we
still hold page lock, which would hold off inode deletion if the page were in
pagecache, it doesn't hold off once it's in swapcache (free_swap_and_cache
doesn't wait on locked pages).  Hmm: we could put the the inode on swaplist
earlier, but then shmem_unuse_inode could never prune unswapped inodes.

Fix this with an igrab before dropping info->lock, as in shmem_unuse_inode;
though I am a little uneasy about the iput which has to follow - it works, and
I see nothing wrong with it, but it is surprising that shmem inode deletion
may now occur below shmem_writepage.  Revisit this fix later?

And while we're looking at these races: the way shmem_unuse tests swapped
without holding info->lock looks unsafe, if we've more than one swap area: a
racing shmem_writepage on another page of the same inode could be putting it
in swapcache, just as we're deciding to remove the inode from swaplist -
there's a danger of going on swap without being listed, so a later swapoff
would hang, being unable to locate the entry.  Move that test and removal down
into shmem_unuse_inode, once info->lock is held.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 09:44:16 -08:00
..
Kconfig sh: Bump number of quicklists for SH-5. 2008-01-28 13:18:55 +09:00
Makefile
allocpercpu.c
backing-dev.c mm/backing-dev.c: fix percpu_counter_destroy call bug in bdi_init 2007-12-05 09:21:18 -08:00
bootmem.c
bounce.c
fadvise.c
filemap.c fix writev regression: pan hanging unkillable and un-straceable 2008-02-03 07:55:39 +11:00
filemap_xip.c Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user 2008-02-05 09:44:13 -08:00
fremap.c sys_remap_file_pages: fix ->vm_file accounting 2008-02-05 09:44:07 -08:00
highmem.c
hugetlb.c fix hugepages leak due to pagetable page sharing 2008-01-24 08:07:27 -08:00
internal.h
madvise.c
memory.c swapin needs gfp_mask for loop on tmpfs 2008-02-05 09:44:14 -08:00
memory_hotplug.c Add IORESOUCE_BUSY flag for System RAM 2007-11-14 18:45:39 -08:00
mempolicy.c Migration: find correct vma in new_vma_page() 2007-11-14 18:45:38 -08:00
mempool.c
migrate.c Typo fixes retrun -> return 2007-10-20 02:13:26 +02:00
mincore.c
mlock.c
mmap.c vm audit: add VM_DONTEXPAND to mmap for drivers that need it 2008-02-04 07:55:38 -08:00
mmzone.c
mprotect.c fix mprotect vma_wants_writenotify prot 2007-10-23 08:32:06 -07:00
mremap.c
msync.c
nommu.c vmalloc: add const to void* parameters 2008-02-05 09:44:14 -08:00
oom_kill.c sched: sched_rt_entity 2008-01-25 21:08:27 +01:00
page-writeback.c Revert "writeback: introduce writeback_control.more_io to indicate more io" 2008-01-14 21:21:29 -08:00
page_alloc.c mm: fix section mismatch warning in page_alloc.c 2008-01-17 15:38:58 -08:00
page_io.c
page_isolation.c memory hotremove: unset migrate type "ISOLATE" after removal 2007-11-14 18:45:38 -08:00
pdflush.c
prio_tree.c
quicklist.c quicklists: Only consider memory that can be used with GFP_KERNEL 2008-01-14 08:52:22 -08:00
readahead.c
rmap.c [S390] Optimize storage key handling for anonymous pages 2007-11-20 11:13:46 +01:00
shmem.c tmpfs: fix shmem_swaplist races 2008-02-05 09:44:16 -08:00
shmem_acl.c
slab.c cpu-hotplug: replace per-subsystem mutexes with get_online_cpus() 2008-01-25 21:08:02 +01:00
slob.c Avoid double memclear() in SLOB/SLUB 2007-12-09 10:17:52 -08:00
slub.c SLUB: Do not upset lockdep 2008-02-04 10:56:02 -08:00
sparse-vmemmap.c memory hotplug fix: fix section mismatch in vmammap_allock_block() 2007-11-29 09:24:54 -08:00
sparse.c is_vmalloc_addr(): Check if an address is within the vmalloc boundaries 2008-02-05 09:44:14 -08:00
swap.c
swap_state.c tmpfs: move swap swizzling into shmem 2008-02-05 09:44:15 -08:00
swapfile.c tmpfs: open a window in shmem_unuse_inode 2008-02-05 09:44:15 -08:00
thrash.c
tiny-shmem.c
truncate.c Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user 2008-02-05 09:44:13 -08:00
util.c fix mm/util.c:krealloc() 2007-11-14 18:45:41 -08:00
vmalloc.c vmalloc: clean up page array indexing 2008-02-05 09:44:14 -08:00
vmscan.c
vmstat.c vmstat: fix section mismatch warning 2007-11-14 18:45:42 -08:00