linux

History

Michal Hocko eb48c07146 mm: hugetlbfs: correctly populate shared pmd

Each page mapped in a process's address space must be correctly accounted for in _mapcount. Normally the rules for this are straightforward but hugetlbfs page table sharing is different. The page table pages at the PMD level are reference counted while the mapcount remains the same.

If this accounting is wrong, it causes bugs like this one reported by Larry Woodman:

  kernel BUG at mm/filemap.c:135!
  invalid opcode: 0000 [#1] SMP
  CPU 22
  Modules linked in: bridge stp llc sunrpc binfmt_misc dcdbas microcode pcspkr acpi_pad acpi]
  Pid: 18001, comm: mpitest Tainted: G W 3.3.0+ #4 Dell Inc. PowerEdge R620/07NDJ2
  RIP: 0010:[<ffffffff8112cfed>] [<ffffffff8112cfed>] __delete_from_page_cache+0x15d/0x170
  Process mpitest (pid: 18001, threadinfo ffff880428972000, task ffff880428b5cc20)
  Call Trace:
    delete_from_page_cache+0x40/0x80
    truncate_hugepages+0x115/0x1f0
    hugetlbfs_evict_inode+0x18/0x30
    evict+0x9f/0x1b0
    iput_final+0xe3/0x1e0
    iput+0x3e/0x50
    d_kill+0xf8/0x110
    dput+0xe2/0x1b0
    __fput+0x162/0x240

During fork(), copy_hugetlb_page_range() detects if huge_pte_alloc() shared page tables with the check dst_pte == src_pte. The logic is if the PMD page is the same, they must be shared. This assumes that the sharing is between the parent and child. However, if the sharing is with a different process entirely then this check fails as in this diagram:

  parent
    |
    ------------>pmd
                 src_pte----------> data page
                                        ^
  other--------->pmd--------------------|
                  ^
  child-----------|
                 dst_pte

For this situation to occur, it must be possible for Parent and Other to have faulted and failed to share page tables with each other. This is possible due to the following style of race.

  PROC A                                PROC B
  copy_hugetlb_page_range               copy_hugetlb_page_range
    src_pte == huge_pte_offset            src_pte == huge_pte_offset
    !src_pte so no sharing                !src_pte so no sharing

  (time passes)

  hugetlb_fault                         hugetlb_fault
    huge_pte_alloc                        huge_pte_alloc
      huge_pmd_share                        huge_pmd_share
        LOCK(i_mmap_mutex)
        find nothing, no sharing
        UNLOCK(i_mmap_mutex)
                                              LOCK(i_mmap_mutex)
                                              find nothing, no sharing
                                              UNLOCK(i_mmap_mutex)
      pmd_alloc                             pmd_alloc
      LOCK(instantiation_mutex)
      fault
      UNLOCK(instantiation_mutex)
                                            LOCK(instantiation_mutex)
                                            fault
                                            UNLOCK(instantiation_mutex)

These two processes are now pointing to the same data page but are not sharing page tables because the opportunity was missed. When either process later forks, the src_pte == dst_pte check is potentially insufficient. As the check falls through, the wrong PTE information is copied in (harmless but wrong) and the mapcount is bumped for a page mapped by a shared page table, leading to the BUG_ON.

This patch addresses the issue by moving pmd_alloc into huge_pmd_share, which guarantees that the shared pud is populated in the same critical section as the pmd. This also means that the huge_pte_offset test in huge_pmd_share is now serialized correctly, which in turn means that sharing succeeds more often because the racing tasks see the pud and pmd populated together.

Race identified and changelog written mostly by Mel Gorman.
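
The ordering problem described above can be illustrated with a small userspace sketch. This is an analogy only, not the kernel code from this commit: `struct mapping`, `lookup()`, `populate()`, `racy_order()` and `fixed_order()` are hypothetical names, and the two "processes" are replayed step by step in a single thread rather than run concurrently. It shows that when both lookups complete before either populate step, two separate page-table pages are allocated, whereas keeping lookup and populate together as one step lets the second faulter find and share the first one's page.

```c
/* Userspace analogy only; every name here is hypothetical, not a kernel API. */
#include <stddef.h>
#include <stdlib.h>

struct pmd_page {
    int refcount;               /* number of mappers sharing this page */
};

struct mapping {
    struct pmd_page *populated; /* what a later lookup would find */
    int pages_allocated;        /* how many page-table pages were created */
};

/* Search step: find an already populated page-table page, if any. */
static struct pmd_page *lookup(const struct mapping *m)
{
    return m->populated;
}

/* Populate step: share what the earlier lookup found, else allocate. */
static struct pmd_page *populate(struct mapping *m, struct pmd_page *found)
{
    struct pmd_page *p;

    if (found != NULL) {
        found->refcount++;
        return found;
    }
    p = calloc(1, sizeof(*p));
    if (p == NULL)
        return NULL;
    p->refcount = 1;
    m->pages_allocated++;
    if (m->populated == NULL)
        m->populated = p;
    return p;
}

/* Both lookups finish before either populate: the sharing chance is missed. */
int racy_order(void)
{
    struct mapping m = {0};
    struct pmd_page *seen_a = lookup(&m);
    struct pmd_page *seen_b = lookup(&m);
    struct pmd_page *a = populate(&m, seen_a);
    struct pmd_page *b = populate(&m, seen_b);
    int n = m.pages_allocated;

    free(a);
    if (b != a)
        free(b);
    return n;
}

/* Lookup and populate form one step per faulter: the second one shares. */
int fixed_order(void)
{
    struct mapping m = {0};
    struct pmd_page *a = populate(&m, lookup(&m));
    struct pmd_page *b = populate(&m, lookup(&m));
    int n = m.pages_allocated;

    free(a);
    if (b != a)
        free(b);
    return n;
}
```

Calling `racy_order()` and `fixed_order()` and comparing the returned allocation counts shows the difference between the two orderings. In the kernel the grouping of the two steps is enforced by a lock rather than by program order; the sketch only models the resulting interleavings.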

[akpm@linux-foundation.org: attempt to make the huge_pmd_share() comment comprehensible, clean up coding style]
Reported-by: Larry Woodman <lwoodman@redhat.com>
Tested-by: Larry Woodman <lwoodman@redhat.com>
Reviewed-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Ken Chen <kenchen@google.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>		2012-08-21 16:45:02 -07:00
..
alpha	alpha: Fix fall-out from disintegrating asm/system.h	2012-08-19 08:41:19 -07:00
arm	Merge branch 'fixes' of git://git.linaro.org/people/rmk/linux-arm	2012-08-18 16:20:05 -07:00
avr32	ipc: use Kconfig options for __ARCH_WANT_[COMPAT_]IPC_PARSE_VERSION	2012-07-30 17:25:21 -07:00
blackfin	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k	2012-08-03 10:52:41 -07:00
c6x	Enable atomic64 ops in C6X	2012-08-17 08:10:12 -07:00
cris	ipc: use Kconfig options for __ARCH_WANT_[COMPAT_]IPC_PARSE_VERSION	2012-07-30 17:25:21 -07:00
frv	Merge branch 'akpm' (Andrew's patch-bomb)	2012-07-30 17:25:34 -07:00
h8300	ipc: use Kconfig options for __ARCH_WANT_[COMPAT_]IPC_PARSE_VERSION	2012-07-30 17:25:21 -07:00
hexagon	Merge branch 'trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild	2012-07-30 11:24:53 -07:00
ia64	[IA64] defconfig: Remove CONFIG_MISC_DEVICES	2012-08-20 13:04:29 -07:00
m32r	ipc: use Kconfig options for __ARCH_WANT_[COMPAT_]IPC_PARSE_VERSION	2012-07-30 17:25:21 -07:00
m68k	m68k: select CONFIG_GENERIC_ATOMIC64 for all m68k CPU types	2012-08-17 10:04:24 +10:00
microblaze	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k	2012-08-03 10:52:41 -07:00
mips	Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus	2012-08-01 16:47:15 -07:00
mn10300	Merge branch 'akpm' (Andrew's patch-bomb)	2012-07-30 17:25:34 -07:00
openrisc	Remove useless wrappers of asm-generic/rmap.h	2012-06-28 11:29:26 +02:00
parisc	PCI changes for the 3.6 merge window:	2012-07-24 16:17:07 -07:00
powerpc	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2012-08-01 10:26:23 -07:00
s390	s390/compat: fix mmap compat system calls	2012-08-08 07:32:57 -07:00
score	new helper: signal_delivered()	2012-06-01 12:58:52 -04:00
sh	Merge branches 'sh/urgent' and 'sh/gpiolib' into sh-latest	2012-08-09 13:21:13 +09:00
sparc	sparc64: Be less verbose during vmemmap population.	2012-08-15 00:37:29 -07:00
tile	memcg: rename config variables	2012-07-31 18:42:43 -07:00
um	Merge branch 'for-linus-3.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml	2012-08-01 16:45:02 -07:00
unicore32	PCI changes for the 3.6 merge window:	2012-07-24 16:17:07 -07:00
x86	mm: hugetlbfs: correctly populate shared pmd	2012-08-21 16:45:02 -07:00
xtensa	xtensa: select generic atomic64_t support	2012-07-31 18:42:39 -07:00
.gitignore	…
Kconfig	ipc: use Kconfig options for __ARCH_WANT_[COMPAT_]IPC_PARSE_VERSION	2012-07-30 17:25:21 -07:00