linux/mm
Johannes Weiner 739f79fc9d mm: memcontrol: fix NULL pointer crash in test_clear_page_writeback()
Jaegeuk and Brad report a NULL pointer crash when writeback ending tries
to update the memcg stats:

    BUG: unable to handle kernel NULL pointer dereference at 00000000000003b0
    IP: test_clear_page_writeback+0x12e/0x2c0
    [...]
    RIP: 0010:test_clear_page_writeback+0x12e/0x2c0
    Call Trace:
     <IRQ>
     end_page_writeback+0x47/0x70
     f2fs_write_end_io+0x76/0x180 [f2fs]
     bio_endio+0x9f/0x120
     blk_update_request+0xa8/0x2f0
     scsi_end_request+0x39/0x1d0
     scsi_io_completion+0x211/0x690
     scsi_finish_command+0xd9/0x120
     scsi_softirq_done+0x127/0x150
     __blk_mq_complete_request_remote+0x13/0x20
     flush_smp_call_function_queue+0x56/0x110
     generic_smp_call_function_single_interrupt+0x13/0x30
     smp_call_function_single_interrupt+0x27/0x40
     call_function_single_interrupt+0x89/0x90
    RIP: 0010:native_safe_halt+0x6/0x10

    (gdb) l *(test_clear_page_writeback+0x12e)
    0xffffffff811bae3e is in test_clear_page_writeback (./include/linux/memcontrol.h:619).
    614		mod_node_page_state(page_pgdat(page), idx, val);
    615		if (mem_cgroup_disabled() || !page->mem_cgroup)
    616			return;
    617		mod_memcg_state(page->mem_cgroup, idx, val);
    618		pn = page->mem_cgroup->nodeinfo[page_to_nid(page)];
    619		this_cpu_add(pn->lruvec_stat->count[idx], val);
    620	}
    621
    622	unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
    623							gfp_t gfp_mask,

The issue is that writeback doesn't hold a page reference and the page
might get freed after PG_writeback is cleared (and the mapping is
unlocked) in test_clear_page_writeback().  The stat functions looking up
the page's node or zone are safe, as those attributes are static across
allocation and free cycles.  But page->mem_cgroup is not, and it will
get cleared if we race with truncation or migration.

This race window appears to have been around for a while, but it was
less likely to trigger when the memcg stats were updated first thing
after PG_writeback is cleared.  Recent changes reshuffled this code to
update the global node stats before the memcg ones, though, stretching
the race window far enough that people can reproduce the problem.

Update test_clear_page_writeback() to look up and pin page->mem_cgroup
before clearing PG_writeback, and to use only the pinned memcg, never
page->mem_cgroup, afterward.  This is a partial revert of 62cccb8c8e
("mm: simplify lock_page_memcg()") that leaves alone the
pageref-holding callsites, which aren't affected.
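
The ordering change described above can be illustrated with a small
userspace model.  This is only a sketch under stated assumptions, not the
kernel patch: struct memcg, struct page, simulate_free() and both
end_writeback_*() helpers are hypothetical stand-ins, the "pin" is a plain
counter rather than the locking done by lock_page_memcg(), and the
concurrent free is modelled as a callback that runs right after the
writeback flag is cleared.  The racy variant is also simplified to report
-1 at the point where the stale page->mem_cgroup would be used.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; NOT the kernel's real definitions. */
struct memcg {
	long writeback_stat;	/* models the per-memcg writeback counter */
	int pins;		/* models "pinned" state, a plain counter here */
};

struct page {
	struct memcg *mem_cgroup;
	int writeback;		/* models PG_writeback */
};

/* Models a concurrent free/truncate: the page loses its memcg binding. */
static void simulate_free(struct page *page)
{
	page->mem_cgroup = NULL;
}

/*
 * Racy order: clear the flag first, then go through page->mem_cgroup.
 * Returns -1 where the stale pointer would have been used.
 */
static int end_writeback_racy(struct page *page, void (*racer)(struct page *))
{
	page->writeback = 0;
	if (racer)
		racer(page);	/* the page may be freed from here on */
	if (!page->mem_cgroup)
		return -1;	/* stat update lost; the kernel oopsed around here */
	page->mem_cgroup->writeback_stat--;
	return 0;
}

/*
 * Fixed order: look up and pin the memcg before clearing the flag, and
 * touch only the local pointer afterward.
 */
static int end_writeback_pinned(struct page *page, void (*racer)(struct page *))
{
	struct memcg *memcg = page->mem_cgroup;	/* looked up while still safe */

	if (memcg)
		memcg->pins++;
	page->writeback = 0;
	if (racer)
		racer(page);	/* page->mem_cgroup may be NULL now */
	if (memcg) {
		memcg->writeback_stat--;	/* uses the pinned pointer only */
		memcg->pins--;
	}
	return 0;
}
```

Calling end_writeback_racy(&pg, simulate_free) on a page bound to a memcg
returns -1 and leaves the counter untouched, while end_writeback_pinned()
with the same racer still decrements the counter and drops its pin.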

Link: http://lkml.kernel.org/r/20170809183825.GA26387@cmpxchg.org
Fixes: 62cccb8c8e ("mm: simplify lock_page_memcg()")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Jaegeuk Kim <jaegeuk@kernel.org>
Tested-by: Jaegeuk Kim <jaegeuk@kernel.org>
Reported-by: Bradley Bolen <bradleybolen@gmail.com>
Tested-by: Brad Bolen <bradleybolen@gmail.com>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: <stable@vger.kernel.org>	[4.6+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-18 15:32:01 -07:00
kasan kasan: avoid -Wmaybe-uninitialized warning 2017-08-02 16:34:46 -07:00
backing-dev.c bdi: Drop 'parent' argument from bdi_register[_va]() 2017-04-20 12:09:55 -06:00
balloon_compaction.c mm/balloon_compaction.c: don't zero ballooned pages 2017-08-10 15:54:07 -07:00
bootmem.c mm/bootmem.c: cosmetic improvement of code readability 2017-02-22 16:41:29 -08:00
cleancache.c fs: switch ->s_uuid to uuid_t 2017-06-05 16:59:12 +02:00
cma_debug.c cma: Store a name in the cma structure 2017-04-18 20:41:12 +02:00
cma.c cma: fix calculation of aligned offset 2017-07-10 16:32:32 -07:00
cma.h cma: Store a name in the cma structure 2017-04-18 20:41:12 +02:00
compaction.c mm, compaction: skip over holes in __reset_isolation_suitable 2017-07-06 16:24:32 -07:00
debug_page_ref.c
debug.c mm: make tlb_flush_pending global 2017-08-10 15:54:07 -07:00
dmapool.c lib/vsprintf.c: remove %Z support 2017-02-27 18:43:47 -08:00
early_ioremap.c
fadvise.c
failslab.c
filemap.c mm: hugetlb: return immediately for hugetlb page in __delete_from_page_cache() 2017-07-10 16:32:30 -07:00
frame_vector.c treewide: use kv[mz]alloc* rather than opencoded variants 2017-05-08 17:15:13 -07:00
frontswap.c
gup.c mm, gup: ensure real head page is ref-counted when using hugepages 2017-07-06 16:24:34 -07:00
highmem.c
huge_memory.c Revert "mm: numa: defer TLB flush for THP migration as long as possible" 2017-08-10 15:54:07 -07:00
hugetlb_cgroup.c
hugetlb.c userfaultfd: hugetlbfs: remove superfluous page unlock in VM_SHARED case 2017-08-10 15:54:06 -07:00
hwpoison-inject.c mm: hwpoison: call shake_page() unconditionally 2017-05-03 15:52:12 -07:00
init-mm.c
internal.h mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries 2017-08-02 16:34:46 -07:00
interval_tree.c
Kconfig mm/kasan: add support for memory hotplug 2017-07-10 16:32:33 -07:00
Kconfig.debug mm: enable page poisoning early at boot 2017-05-03 15:52:10 -07:00
khugepaged.c mm: make PR_SET_THP_DISABLE immediately active 2017-07-10 16:32:31 -07:00
kmemcheck.c mm: Rename SLAB_DESTROY_BY_RCU to SLAB_TYPESAFE_BY_RCU 2017-04-18 11:42:36 -07:00
kmemleak-test.c
kmemleak.c mm: kmemleak: treat vm_struct as alternative reference to vmalloc'ed objects 2017-07-06 16:24:34 -07:00
ksm.c mm: fix KSM data corruption 2017-08-10 15:54:07 -07:00
list_lru.c mm/list_lru.c: fix list_lru_count_node() to be race free 2017-07-10 16:32:33 -07:00
maccess.c
madvise.c mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries 2017-08-02 16:34:46 -07:00
Makefile percpu: expose statistics about percpu memory via debugfs 2017-06-20 15:31:38 -04:00
memblock.c mm, memory_hotplug: move movable_node to the hotplug proper 2017-07-06 16:24:35 -07:00
memcontrol.c mm: memcontrol: fix NULL pointer crash in test_clear_page_writeback() 2017-08-18 15:32:01 -07:00
memory_hotplug.c mm/memory-hotplug: switch locking to a percpu rwsem 2017-07-10 16:32:33 -07:00
memory-failure.c mm, hugetlb, soft_offline: use new_page_nodemask for soft offline migration 2017-07-10 16:32:32 -07:00
memory.c mm: fix MADV_[FREE|DONTNEED] TLB flush miss problem 2017-08-10 15:54:07 -07:00
mempolicy.c mm, migration: do not trigger OOM killer when migrating memory 2017-07-12 16:26:04 -07:00
mempool.c sched/wait: Rename wait_queue_t => wait_queue_entry_t 2017-06-20 12:18:27 +02:00
memtest.c
migrate.c Revert "mm: numa: defer TLB flush for THP migration as long as possible" 2017-08-10 15:54:07 -07:00
mincore.c mm: remove shmem_mapping() shmem_zero_setup() duplicates 2017-02-24 17:46:56 -08:00
mlock.c mlock: fix mlock count can not decrease in race condition 2017-06-02 15:07:38 -07:00
mm_init.c
mmap.c mm: fix overflow check in expand_upwards() 2017-07-14 15:05:12 -07:00
mmu_context.c sched/headers: Prepare to move the task_lock()/unlock() APIs to <linux/sched/task.h> 2017-03-02 08:42:38 +01:00
mmu_notifier.c mm: Use static initialization for "srcu" 2017-04-18 11:38:22 -07:00
mmzone.c mm/mmzone.c: swap likely to unlikely as code logic is different for next_zones_zonelist() 2017-02-22 16:41:29 -08:00
mprotect.c mm: migrate: prevent racy access to tlb_flush_pending 2017-08-10 15:54:07 -07:00
mremap.c userfaultfd: non-cooperative: notify about unmap of destination during mremap 2017-08-02 16:34:46 -07:00
msync.c
nobootmem.c mm/nobootmem.c: return 0 when start_pfn equals end_pfn 2017-07-06 16:24:31 -07:00
nommu.c mm, vmalloc: use __GFP_HIGHMEM implicitly 2017-05-08 17:15:13 -07:00
oom_kill.c mm/oom_kill.c: add tracepoints for oom reaper-related events 2017-07-10 16:32:32 -07:00
page_alloc.c mm: ratelimit PFNs busy info message 2017-08-10 15:54:06 -07:00
page_counter.c
page_ext.c mm: enable page poisoning early at boot 2017-05-03 15:52:10 -07:00
page_idle.c mm: make rmap_one boolean function 2017-05-03 15:52:10 -07:00
page_io.c mm/page_io.c: fix oops during block io poll in swapin path 2017-08-02 17:16:11 -07:00
page_isolation.c mm: unify new_node_page and alloc_migrate_target 2017-07-10 16:32:31 -07:00
page_owner.c mm: avoid taking zone lock in pagetypeinfo_showmixed() 2017-07-10 16:32:32 -07:00
page_poison.c mm: enable page poisoning early at boot 2017-05-03 15:52:10 -07:00
page_vma_mapped.c mm/hugetlb: add size parameter to huge_pte_offset() 2017-07-06 16:24:34 -07:00
page-writeback.c mm: memcontrol: fix NULL pointer crash in test_clear_page_writeback() 2017-08-18 15:32:01 -07:00
pagewalk.c mm/hugetlb: add size parameter to huge_pte_offset() 2017-07-06 16:24:34 -07:00
percpu-internal.h percpu: fix early calls for spinlock in pcpu_stats 2017-06-21 13:53:52 -04:00
percpu-km.c percpu: fix static checker warnings in pcpu_destroy_chunk 2017-06-29 11:23:38 -04:00
percpu-stats.c percpu: expose statistics about percpu memory via debugfs 2017-06-20 15:31:38 -04:00
percpu-vm.c percpu: fix static checker warnings in pcpu_destroy_chunk 2017-06-29 11:23:38 -04:00
percpu.c percpu: resolve err may not be initialized in pcpu_alloc 2017-06-21 12:00:45 -04:00
pgtable-generic.c mm: convert generic code to 5-level paging 2017-03-09 11:48:47 -08:00
process_vm_access.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/mm.h> 2017-03-02 08:42:28 +01:00
quicklist.c
readahead.c
rmap.c rmap: do not call mmu_notifier_invalidate_page() under ptl 2017-08-10 15:54:07 -07:00
rodata_test.c mm: remove rodata_test_data export, add pr_fmt 2017-05-03 15:52:09 -07:00
shmem.c mm: fix list corruptions on shmem shrinklist 2017-08-10 15:54:07 -07:00
slab_common.c mm: allow slab_nomerge to be set at build time 2017-07-06 16:24:31 -07:00
slab.c mm: memcontrol: account slab stats per lruvec 2017-07-06 16:24:35 -07:00
slab.h mm: memcontrol: account slab stats per lruvec 2017-07-06 16:24:35 -07:00
slob.c mm: Rename SLAB_DESTROY_BY_RCU to SLAB_TYPESAFE_BY_RCU 2017-04-18 11:42:36 -07:00
slub.c mm: memcontrol: account slab stats per lruvec 2017-07-06 16:24:35 -07:00
sparse-vmemmap.c mm, tree wide: replace __GFP_REPEAT by __GFP_RETRY_MAYFAIL with more useful semantic 2017-07-12 16:26:03 -07:00
sparse.c mm, memory_hotplug: do not associate hotadded memory to zones until online 2017-07-06 16:24:32 -07:00
swap_cgroup.c mm, THP, swap: delay splitting THP during swap out 2017-07-06 16:24:31 -07:00
swap_slots.c mm/swap_slots.c: don't disable preemption while taking the per-CPU cache 2017-07-10 16:32:32 -07:00
swap_state.c swap: add block io poll in swapin path 2017-07-10 16:32:30 -07:00
swap.c mm: swap: provide lru_add_drain_all_cpuslocked() 2017-07-10 16:32:33 -07:00
swapfile.c swap: add block io poll in swapin path 2017-07-10 16:32:30 -07:00
truncate.c mm/truncate.c: fix THP handling in invalidate_mapping_pages() 2017-07-10 16:32:32 -07:00
usercopy.c mm/usercopy: Drop extra is_vmalloc_or_module() check 2017-04-05 12:30:18 -07:00
userfaultfd.c mm: convert generic code to 5-level paging 2017-03-09 11:48:47 -08:00
util.c mm: fix global NR_SLAB_.*CLAIMABLE counter reads 2017-08-10 15:54:06 -07:00
vmacache.c sched/headers: Prepare to move 'init_task' and 'init_thread_union' from <linux/sched.h> to <linux/sched/task.h> 2017-03-02 08:42:38 +01:00
vmalloc.c mm, tree wide: replace __GFP_REPEAT by __GFP_RETRY_MAYFAIL with more useful semantic 2017-07-12 16:26:03 -07:00
vmpressure.c mm, vmpressure: pass-through notification support 2017-07-10 16:32:31 -07:00
vmscan.c mm, tree wide: replace __GFP_REPEAT by __GFP_RETRY_MAYFAIL with more useful semantic 2017-07-12 16:26:03 -07:00
vmstat.c mm: avoid taking zone lock in pagetypeinfo_showmixed() 2017-07-10 16:32:32 -07:00
workingset.c mm: memcontrol: per-lruvec stats infrastructure 2017-07-06 16:24:35 -07:00
z3fold.c z3fold: fix page locking in z3fold_alloc() 2017-04-13 18:24:20 -07:00
zbud.c
zpool.c
zsmalloc.c zram: do not free pool->size_class 2017-08-02 16:34:47 -07:00
zswap.c mm/zswap.c: delete an error message for a failed memory allocation in zswap_dstmem_prepare() 2017-07-06 16:24:35 -07:00