linux/mm
Vlastimil Babka e47483bca2 mm, page_alloc: fix premature OOM when racing with cpuset mems update
Ganapatrao Kulkarni reported that the LTP test cpuset01 in stress mode
triggers OOM killer in few seconds, despite lots of free memory.  The
test attempts to repeatedly fault in memory in one process in a cpuset,
while changing allowed nodes of the cpuset between 0 and 1 in another
process.

The problem comes from insufficient protection against cpuset changes,
which can cause get_page_from_freelist() to consider all zones as
non-eligible due to nodemask and/or current->mems_allowed.  This was
masked in the past by sufficient retries, but since commit 682a3385e7
("mm, page_alloc: inline the fast path of the zonelist iterator") we fix
the preferred_zoneref once, and don't iterate over the whole zonelist in
further attempts, thus the only eligible zones might be placed in the
zonelist before our starting point and we always miss them.

A previous patch fixed this problem for current->mems_allowed.  However,
cpuset changes also update the task's mempolicy nodemask.  The fix has
two parts.  We have to repeat the preferred_zoneref search when we
detect cpuset update by way of seqcount, and we have to check the
seqcount before considering OOM.

[akpm@linux-foundation.org: fix typo in comment]
Link: http://lkml.kernel.org/r/20170120103843.24587-5-vbabka@suse.cz
Fixes: c33d6c06f6 ("mm, page_alloc: avoid looking up the first zone in a zonelist twice")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Ganapatrao Kulkarni <gpkulkarni@gmail.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-24 16:26:14 -08:00
..
kasan Power management material for v4.10-rc1 2016-12-13 10:41:53 -08:00
Kconfig mm: THP page cache support for ppc64 2016-12-12 18:55:08 -08:00
Kconfig.debug PM / Hibernate: allow hibernation with PAGE_POISONING_ZERO 2016-09-13 02:35:27 +02:00
Makefile Disable the __builtin_return_address() warning globally after all 2016-10-12 10:23:41 -07:00
backing-dev.c writeback: track if we're sleeping on progress in balance_dirty_pages() 2016-11-08 08:28:55 -07:00
balloon_compaction.c
bootmem.c mm: kmemleak: avoid using __va() on addresses that don't have a lowmem mapping 2016-10-11 15:06:33 -07:00
cleancache.c
cma.c mm/cma.c: check the max limit for cma allocation 2016-11-11 08:12:37 -08:00
cma.h
cma_debug.c
compaction.c mm, compaction: allow compaction for GFP_NOFS requests 2016-12-14 16:04:07 -08:00
debug.c mm, debug: print raw struct page data in __dump_page() 2016-12-12 18:55:08 -08:00
debug_page_ref.c
dmapool.c
early_ioremap.c
fadvise.c mm: fadvise: avoid expensive remote LRU cache draining after FADV_DONTNEED 2016-12-20 09:48:46 -08:00
failslab.c
filemap.c dax: fix deadlock with DAX 4k holes 2017-01-10 18:31:54 -08:00
frame_vector.c mm: replace get_vaddr_frames() write/force parameters with gup_flags 2016-10-19 08:11:24 -07:00
frontswap.c
gup.c mm: unexport __get_user_pages_unlocked() 2016-12-14 16:04:09 -08:00
highmem.c
huge_memory.c mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thp 2017-01-24 16:26:14 -08:00
hugetlb.c mm/hugetlb.c: fix reservation race when freeing surplus pages 2017-01-10 18:31:55 -08:00
hugetlb_cgroup.c
hwpoison-inject.c
init-mm.c mm: Add a user_ns owner to mm_struct and fix ptrace permission checks 2016-11-22 11:49:48 -06:00
internal.h mm: add PageWaiters indicating tasks are waiting for a page bit 2016-12-25 11:54:48 -08:00
interval_tree.c
khugepaged.c mm: get rid of __GFP_OTHER_NODE 2017-01-10 18:31:55 -08:00
kmemcheck.c
kmemleak-test.c
kmemleak.c kmemleak: fix reference to Documentation 2016-12-12 18:55:07 -08:00
ksm.c mm,ksm: add __GFP_HIGH to the allocation in alloc_stable_node() 2016-10-07 18:46:29 -07:00
list_lru.c mm/list_lru.c: avoid error-path NULL pointer deref 2016-10-27 18:43:42 -07:00
maccess.c
madvise.c mm: add tlb_remove_check_page_size_change to track page size change 2016-12-12 18:55:07 -08:00
memblock.c mm: kmemleak: avoid using __va() on addresses that don't have a lowmem mapping 2016-10-11 15:06:33 -07:00
memcontrol.c mm, memcg: do not retry precharge charges 2017-01-24 16:26:14 -08:00
memory-failure.c mm: Use owner_priv bit for PageSwapCache, valid when PageSwapBacked 2016-12-25 11:54:48 -08:00
memory.c dax: wrprotect pmd_t in dax_mapping_entry_mkclean 2017-01-10 18:31:54 -08:00
memory_hotplug.c memory_hotplug: make zone_can_shift() return a boolean value 2017-01-24 16:26:14 -08:00
mempolicy.c mm/mempolicy.c: do not put mempolicy before using its nodemask 2017-01-24 16:26:14 -08:00
mempool.c
memtest.c
migrate.c mm: Use owner_priv bit for PageSwapCache, valid when PageSwapBacked 2016-12-25 11:54:48 -08:00
mincore.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
mlock.c thp: fix corner case of munlock() of PTE-mapped THPs 2016-11-30 16:32:52 -08:00
mm_init.c
mmap.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
mmu_context.c
mmu_notifier.c
mmzone.c
mprotect.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
mremap.c mremap: move_ptes: check pte dirty after its removal 2016-11-29 08:20:24 -08:00
msync.c
nobootmem.c mm: kmemleak: avoid using __va() on addresses that don't have a lowmem mapping 2016-10-11 15:06:33 -07:00
nommu.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
oom_kill.c oom: print nodemask in the oom report 2016-10-07 18:46:29 -07:00
page-writeback.c radix-tree: delete radix_tree_range_tag_if_tagged() 2016-12-14 16:04:10 -08:00
page_alloc.c mm, page_alloc: fix premature OOM when racing with cpuset mems update 2017-01-24 16:26:14 -08:00
page_counter.c
page_ext.c mm/page_ext: support extra space allocation by page_ext user 2016-10-07 18:46:27 -07:00
page_idle.c
page_io.c writeback: add wbc_to_write_flags() 2016-11-02 10:24:03 -06:00
page_isolation.c mm/page_isolation: fix typo: "paes" -> "pages" 2016-10-07 18:46:29 -07:00
page_owner.c mm/page_owner: don't define fields on struct page_ext by hard-coding 2016-10-07 18:46:27 -07:00
page_poison.c
pagewalk.c
percpu-km.c
percpu-vm.c
percpu.c Merge branch 'for-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu 2016-12-13 12:34:47 -08:00
pgtable-generic.c
process_vm_access.c mm: unexport __get_user_pages_unlocked() 2016-12-14 16:04:09 -08:00
quicklist.c
readahead.c mm: don't cap request size based on read-ahead setting 2016-12-12 18:55:08 -08:00
rmap.c mm, rmap: handle anon_vma_prepare() common case inline 2016-12-12 18:55:08 -08:00
shmem.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
slab.c mm/slab.c: fix SLAB freelist randomization duplicate entries 2017-01-10 18:31:55 -08:00
slab.h mm, slab: maintain total slab count instead of active count 2016-12-12 18:55:07 -08:00
slab_common.c mm/slab_common.c: check kmem_create_cache flags are common 2016-12-12 18:55:06 -08:00
slob.c slub: move synchronize_sched out of slab_mutex on shrink 2016-12-12 18:55:06 -08:00
slub.c mm/slub.c: trace free objects at KERN_INFO 2017-01-24 16:26:14 -08:00
sparse-vmemmap.c
sparse.c
swap.c mm: add PageWaiters indicating tasks are waiting for a page bit 2016-12-25 11:54:48 -08:00
swap_cgroup.c
swap_state.c mm, swap: use offset of swap entry as key of swap cache 2016-10-07 18:46:28 -07:00
swapfile.c mm: support anonymous stable page 2017-01-10 18:31:55 -08:00
truncate.c mm: Invalidate DAX radix tree entries only if appropriate 2016-12-26 20:29:24 -08:00
usercopy.c mm: usercopy: Check for module addresses 2016-09-20 16:07:39 -07:00
userfaultfd.c
util.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
vmacache.c mm: unrig VMA cache hit ratio 2016-10-07 18:46:27 -07:00
vmalloc.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
vmpressure.c
vmscan.c mm, memcg: fix the active list aging for lowmem requests when memcg is enabled 2017-01-10 18:31:55 -08:00
vmstat.c mm/vmstat: Convert to hotplug state machine 2016-12-02 00:52:35 +01:00
workingset.c mm: workingset: fix use-after-free in shadow node shrinker 2017-01-07 18:22:40 -08:00
z3fold.c
zbud.c
zpool.c
zsmalloc.c mm/zsmalloc: Convert to hotplug state machine 2016-12-02 00:52:36 +01:00
zswap.c mm/zswap: Convert pool to hotplug state machine 2016-12-02 00:52:36 +01:00