linux

Naoya Horiguchi 54b9dd14d0 mm/memory-failure.c: shift page lock from head page to tail page after thp split	2014-01-23 16:36:52 -08:00

After thp split in hwpoison_user_mappings(), we hold page lock on the raw error page only between try_to_unmap, hence we are in danger of race condition.

I found in the RHEL7 MCE-relay testing that we have "bad page" error when a memory error happens on a thp tail page used by qemu-kvm:

  Triggering MCE exception on CPU 10
  mce: [Hardware Error]: Machine check events logged
  MCE exception done on CPU 10
  MCE 0x38c535: Killing qemu-kvm:8418 due to hardware memory corruption
  MCE 0x38c535: dirty LRU page recovery: Recovered
  qemu-kvm[8418]: segfault at 20 ip 00007ffb0f0f229a sp 00007fffd6bc5240 error 4 in qemu-kvm[7ffb0ef14000+420000]
  BUG: Bad page state in process qemu-kvm pfn:38c400
  page:ffffea000e310000 count:0 mapcount:0 mapping: (null) index:0x7ffae3c00
  page flags: 0x2fffff0008001d(locked|referenced|uptodate|dirty|swapbacked)
  Modules linked in: hwpoison_inject mce_inject vhost_net macvtap macvlan ...
  CPU: 0 PID: 8418 Comm: qemu-kvm Tainted: G M -------------- 3.10.0-54.0.1.el7.mce_test_fixed.x86_64 #1
  Hardware name: NEC NEC Express5800/R120b-1 [N8100-1719F]/MS-91E7-001, BIOS 4.6.3C19 02/10/2011
  Call Trace:
    dump_stack+0x19/0x1b
    bad_page.part.59+0xcf/0xe8
    free_pages_prepare+0x148/0x160
    free_hot_cold_page+0x31/0x140
    free_hot_cold_page_list+0x46/0xa0
    release_pages+0x1c1/0x200
    free_pages_and_swap_cache+0xad/0xd0
    tlb_flush_mmu.part.46+0x4c/0x90
    tlb_finish_mmu+0x55/0x60
    exit_mmap+0xcb/0x170
    mmput+0x67/0xf0
    vhost_dev_cleanup+0x231/0x260 [vhost_net]
    vhost_net_release+0x3f/0x90 [vhost_net]
    __fput+0xe9/0x270
    ____fput+0xe/0x10
    task_work_run+0xc4/0xe0
    do_exit+0x2bb/0xa40
    do_group_exit+0x3f/0xa0
    get_signal_to_deliver+0x1d0/0x6e0
    do_signal+0x48/0x5e0
    do_notify_resume+0x71/0xc0
    retint_signal+0x48/0x8c

The reason of this bug is that a page fault happens before unlocking the head page at the end of memory_failure(). This strange page fault is trying to access to address 0x20 and I'm not sure why qemu-kvm does this, but anyway as a result the SIGSEGV makes qemu-kvm exit and on the way we catch the bad page bug/warning because we try to free a locked page (which was the former head page.)

To fix this, this patch suggests to shift page lock from head page to tail page just after thp split. SIGSEGV still happens, but it affects only error affected VMs, not a whole system.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org> [3.9+] # `a3e0f9e47d` "mm/memory-failure.c: transfer page count from head page to tail page after split thp"
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
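
The lock hand-off described in the commit message can be illustrated with a small user-space model. This is only a sketch and not the kernel patch itself: `struct mock_page`, `mock_lock_page()`, `mock_unlock_page()` and `shift_lock_after_split()` are hypothetical names invented for this example, and the ordering shown (lock the raw error page before unlocking the former head page) is an assumption about how such a hand-off would avoid leaving a window with neither page locked. The reference-count transfer mirrors the idea named in the title of the cited commit `a3e0f9e47d`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the kernel's struct page. */
struct mock_page {
    bool locked;   /* models the PG_locked flag */
    int refcount;  /* models the page reference count */
};

/* Hypothetical helpers; the real kernel lock_page() can sleep. */
static void mock_lock_page(struct mock_page *p)
{
    assert(!p->locked);
    p->locked = true;
}

static void mock_unlock_page(struct mock_page *p)
{
    assert(p->locked);
    p->locked = false;
}

/*
 * Model of the hand-off after a huge page has been split: the caller
 * holds a pin and the page lock on the former head page, but from now
 * on only the raw error page matters. Returns the page that the caller
 * must unlock at the end of error handling.
 */
static struct mock_page *shift_lock_after_split(struct mock_page *head,
                                                struct mock_page *err)
{
    if (head == err)
        return head;        /* the error hit the head page: nothing to move */

    /* Move the pin from the former head page to the raw error page. */
    head->refcount--;
    err->refcount++;

    /* Assumed order: take the new lock first, then drop the old one. */
    mock_lock_page(err);
    mock_unlock_page(head);
    return err;
}
```

In this model the former head page ends up unlocked, so freeing it later (as happens when the process exits) would no longer trip a "locked page" check, while the error page stays locked until handling finishes.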
backing-dev.c
balloon_compaction.c	mm: print more details for bad_page()	2014-01-23 16:36:50 -08:00
bootmem.c	mm/bootmem.c: remove unused local `map'	2013-11-13 12:09:09 +09:00
bounce.c
cleancache.c	mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE	2014-01-23 16:36:50 -08:00
compaction.c	mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE	2014-01-23 16:36:50 -08:00
debug-pagealloc.c
dmapool.c
fadvise.c
failslab.c
filemap_xip.c	seqcount: Add lockdep functionality to seqcount/seqlock structures	2013-11-06 12:40:26 +01:00
filemap.c	mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE	2014-01-23 16:36:50 -08:00
fremap.c	mm: fix use-after-free in sys_remap_file_pages	2014-01-02 14:40:30 -08:00
frontswap.c
highmem.c
huge_memory.c	mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE	2014-01-23 16:36:50 -08:00
hugetlb_cgroup.c	mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE	2014-01-23 16:36:50 -08:00
hugetlb.c	mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE	2014-01-23 16:36:50 -08:00
hwpoison-inject.c	mm/hwpoison: add '#' to hwpoison_inject	2014-01-21 16:19:48 -08:00
init-mm.c
internal.h	mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE	2014-01-23 16:36:50 -08:00
interval_tree.c
Kconfig	mm: add missing dependency in Kconfig	2013-12-18 19:04:52 -08:00
Kconfig.debug
kmemcheck.c
kmemleak-test.c
kmemleak.c	mm: kmemleak: avoid false negatives on vmalloc'ed objects	2013-11-13 12:09:07 +09:00
ksm.c	mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE	2014-01-23 16:36:50 -08:00
list_lru.c	mm: list_lru: fix almost infinite loop causing effective livelock	2013-10-30 12:57:46 -07:00
maccess.c
madvise.c
Makefile
memblock.c	mm: free memblock.memory in free_all_bootmem	2014-01-23 16:36:51 -08:00
memcontrol.c	memcg: rework memcg_update_kmem_limit synchronization	2014-01-23 16:36:51 -08:00
memory_hotplug.c	mm: print more details for bad_page()	2014-01-23 16:36:50 -08:00
memory-failure.c	mm/memory-failure.c: shift page lock from head page to tail page after thp split	2014-01-23 16:36:52 -08:00
memory.c	mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE	2014-01-23 16:36:50 -08:00
mempolicy.c	numa: add a sysctl for numa_balancing	2014-01-23 16:36:51 -08:00
mempool.c
migrate.c	mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE	2014-01-23 16:36:50 -08:00
mincore.c
mlock.c	mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE	2014-01-23 16:36:50 -08:00
mm_init.c	mm: numa: Change page last {nid,pid} into {cpu,pid}	2013-10-09 14:47:45 +02:00
mmap.c	mm/mmap.c: add mlock_future_check() helper	2014-01-21 16:19:44 -08:00
mmu_context.c
mmu_notifier.c
mmzone.c	mm: numa: Change page last {nid,pid} into {cpu,pid}	2013-10-09 14:47:45 +02:00
mprotect.c	mm: numa: do not automatically migrate KSM pages	2014-01-21 16:19:48 -08:00
mremap.c	mm: revert mremap pud_free anti-fix	2013-10-16 21:35:53 -07:00
msync.c
nobootmem.c	mm: free memblock.memory in free_all_bootmem	2014-01-23 16:36:51 -08:00
nommu.c	mm: add overcommit_kbytes sysctl variable	2014-01-21 16:19:44 -08:00
oom_kill.c	oom_kill: add rcu_read_lock() into find_lock_task_mm()	2014-01-21 16:19:46 -08:00
page_alloc.c	mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE	2014-01-23 16:36:50 -08:00
page_cgroup.c	Merge branch 'akpm' (incoming from Andrew)	2014-01-21 19:05:45 -08:00
page_io.c	mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE	2014-01-23 16:36:50 -08:00
page_isolation.c
page-writeback.c	writeback: fix negative bdi max pause	2013-10-16 21:35:53 -07:00
pagewalk.c	mm/pagewalk.c: fix walk_page_range() access of wrong PTEs	2013-10-30 14:27:03 -07:00
percpu-km.c
percpu-vm.c
percpu.c	Merge branch 'akpm' (incoming from Andrew)	2014-01-21 19:05:45 -08:00
pgtable-generic.c	mm: fix TLB flush race between migration, and change_protection_range	2013-12-18 19:04:51 -08:00
process_vm_access.c
quicklist.c
readahead.c	readahead: fix sequential read cache miss detection	2013-11-13 12:09:09 +09:00
rmap.c	mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE	2014-01-23 16:36:50 -08:00
shmem.c	mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE	2014-01-23 16:36:50 -08:00
slab_common.c	slab: do not panic if we fail to create memcg cache	2014-01-23 16:36:51 -08:00
slab.c	Merge branch 'slab/next' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux	2013-11-22 08:10:34 -08:00
slab.h	memcg, slab: RCU protect memcg_params for root caches	2014-01-23 16:36:51 -08:00
slob.c
slub.c	mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE	2014-01-23 16:36:50 -08:00
sparse-vmemmap.c	mm/sparse: use memblock apis for early memory allocations	2014-01-21 16:19:47 -08:00
sparse.c	mm/sparse: use memblock apis for early memory allocations	2014-01-21 16:19:47 -08:00
swap_state.c	mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE	2014-01-23 16:36:50 -08:00
swap.c	mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE	2014-01-23 16:36:50 -08:00
swapfile.c	mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE	2014-01-23 16:36:50 -08:00
truncate.c
util.c	mm: add overcommit_kbytes sysctl variable	2014-01-21 16:19:44 -08:00
vmalloc.c	mm/vmalloc: interchage the implementation of vmalloc_to_{pfn,page}	2014-01-21 16:19:44 -08:00
vmpressure.c	memcg: make cgroup_event deal with mem_cgroup instead of cgroup_subsys_state	2013-11-22 18:20:43 -05:00
vmscan.c	mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE	2014-01-23 16:36:50 -08:00
vmstat.c	mm: numa: return the number of base pages altered by protection changes	2013-11-13 12:09:11 +09:00
zbud.c
zswap.c	mm/zswap.c: change params from hidden to ro	2014-01-23 16:36:50 -08:00