linux/mm
Linus Torvalds f3c64eda3e mm: avoid early COW write protect games during fork()
In commit 70e806e4e6 ("mm: Do early cow for pinned pages during fork()
for ptes") we write-protected the PTE before doing the page pinning
check, in order to avoid a race with concurrent fast-GUP pinning (which
doesn't take the mm semaphore or the page table lock).

That trick doesn't actually work - it doesn't handle memory ordering
properly, and doing so would be prohibitively expensive.

It also isn't really needed.  While we're moving in the direction of
allowing and supporting page pinning without marking the pinned area
with MADV_DONTFORK, the fact is that we've never really supported this
kind of odd "concurrent fork() and page pinning", and doing the
serialization on a pte level is just wrong.

We can add serialization with a per-mm sequence counter, so we know how
to solve that race properly, but we'll do that at a more appropriate
time.  Right now this just removes the write protect games.

It also turns out that the write protect games actually break on Power,
as reported by Aneesh Kumar:

 "Architecture like ppc64 expects set_pte_at to be not used for updating
  a valid pte. This is further explained in commit 56eecdb912 ("mm:
  Use ptep/pmdp_set_numa() for updating _PAGE_NUMA bit")"

and the code triggered a warning there:

  WARNING: CPU: 0 PID: 30613 at arch/powerpc/mm/pgtable.c:185 set_pte_at+0x2a8/0x3a0 arch/powerpc/mm/pgtable.c:185
  Call Trace:
    copy_present_page mm/memory.c:857 [inline]
    copy_present_pte mm/memory.c:899 [inline]
    copy_pte_range mm/memory.c:1014 [inline]
    copy_pmd_range mm/memory.c:1092 [inline]
    copy_pud_range mm/memory.c:1127 [inline]
    copy_p4d_range mm/memory.c:1150 [inline]
    copy_page_range+0x1f6c/0x2cc0 mm/memory.c:1212
    dup_mmap kernel/fork.c:592 [inline]
    dup_mm+0x77c/0xab0 kernel/fork.c:1355
    copy_mm kernel/fork.c:1411 [inline]
    copy_process+0x1f00/0x2740 kernel/fork.c:2070
    _do_fork+0xc4/0x10b0 kernel/fork.c:2429

Link: https://lore.kernel.org/lkml/CAHk-=wiWr+gO0Ro4LvnJBMs90OiePNyrE3E+pJvc9PzdBShdmw@mail.gmail.com/
Link: https://lore.kernel.org/linuxppc-dev/20201008092541.398079-1-aneesh.kumar@linux.ibm.com/
Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Tested-by: Leon Romanovsky <leonro@nvidia.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Kirill Shutemov <kirill@shutemov.name>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-08 10:11:32 -07:00
..
kasan Kbuild updates for v5.9 2020-08-09 14:10:26 -07:00
Kconfig mm/sparse: cleanup the code surrounding memory_present() 2020-08-07 11:33:27 -07:00
Kconfig.debug
Makefile mm: move lib/ioremap.c to mm/ 2020-08-07 11:33:26 -07:00
backing-dev.c writeback: remove struct bdi_writeback_congested 2020-07-08 17:05:53 -06:00
balloon_compaction.c
cleancache.c
cma.c cma: don't quit at first error when activating reserved areas 2020-08-12 10:57:57 -07:00
cma.h mm: cma: fix the name of CMA areas 2020-08-12 10:57:57 -07:00
cma_debug.c debugfs: make sure we can remove u32_array files cleanly 2020-07-10 13:54:00 -07:00
compaction.c mm: replace hpage_nr_pages with thp_nr_pages 2020-08-14 19:56:56 -07:00
debug.c mm, dump_page: do not crash with bad compound_mapcount() 2020-08-07 11:33:23 -07:00
debug_page_ref.c
debug_vm_pgtable.c Documentation/mm: add descriptions for arch page table helpers 2020-08-07 11:33:23 -07:00
dmapool.c
early_ioremap.c
fadvise.c
failslab.c
filemap.c io_uring-5.9-2020-10-02 2020-10-02 14:38:10 -07:00
frame_vector.c
frontswap.c mm/frontswap: mark various intentional data races 2020-08-14 19:56:56 -07:00
gup.c mm: do not rely on mm == current->mm in __get_user_pages_locked 2020-09-28 09:21:50 -07:00
gup_benchmark.c
highmem.c
hmm.c mm: do page fault accounting in handle_mm_fault 2020-08-12 10:58:02 -07:00
huge_memory.c mm/thp: Split huge pmds/puds if they're pinned when fork() 2020-09-27 11:21:35 -07:00
hugetlb.c mm/hugetlb: fix a race between hugetlb sysctl handlers 2020-09-05 12:14:30 -07:00
hugetlb_cgroup.c hugetlb_cgroup: convert comma to semicolon 2020-08-21 09:52:52 -07:00
hwpoison-inject.c
init-mm.c
internal.h mm: replace hpage_nr_pages with thp_nr_pages 2020-08-14 19:56:56 -07:00
interval_tree.c
ioremap.c mm: move p?d_alloc_track to separate header file 2020-08-07 11:33:26 -07:00
khugepaged.c mm/khugepaged.c: fix khugepaged's request size in collapse_file 2020-09-05 12:14:30 -07:00
kmemleak-test.c
kmemleak.c mm/kmemleak: silence KCSAN splats in checksum 2020-08-14 19:56:56 -07:00
ksm.c ksm: reinstate memcg charge on copied pages 2020-09-19 13:13:38 -07:00
list_lru.c mm/list_lru: fix a data race in list_lru_count_one 2020-08-14 19:56:57 -07:00
maccess.c uaccess: add force_uaccess_{begin,end} helpers 2020-08-12 10:57:59 -07:00
madvise.c mm: validate pmd after splitting 2020-09-26 10:48:08 -07:00
mapping_dirty_helpers.c
memblock.c mm/memblock: expose only miminal interface to add/walk physmem 2020-07-10 15:08:09 +02:00
memcontrol.c mm: memcontrol: fix missing suffix of workingset_restore 2020-09-26 10:33:57 -07:00
memfd.c
memory-failure.c mm/migrate: introduce a standard migration target allocation function 2020-08-12 10:58:02 -07:00
memory.c mm: avoid early COW write protect games during fork() 2020-10-08 10:11:32 -07:00
memory_hotplug.c mm: don't rely on system state to detect hot-plug operations 2020-09-26 10:33:57 -07:00
mempolicy.c mm: replace hpage_nr_pages with thp_nr_pages 2020-08-14 19:56:56 -07:00
mempool.c mm/mempool: fix a data race in mempool_free() 2020-08-14 19:56:57 -07:00
memremap.c memremap: rename MEMORY_DEVICE_DEVDAX to MEMORY_DEVICE_GENERIC 2020-09-04 09:59:59 +02:00
memtest.c
migrate.c mm/migrate: correct thp migration stats 2020-09-26 10:33:57 -07:00
mincore.c
mlock.c mlock: fix unevictable_pgs event counts on THP 2020-09-19 13:13:38 -07:00
mm_init.c mm: adjust vm_committed_as_batch according to vm overcommit policy 2020-08-07 11:33:26 -07:00
mmap.c mm: remove unnecessary wrapper function do_mmap_pgoff() 2020-08-07 11:33:27 -07:00
mmu_gather.c
mmu_notifier.c mm: mmu_notifier: fix and extend kerneldoc 2020-08-12 10:57:57 -07:00
mmzone.c
mprotect.c
mremap.c mm/mremap: start addresses are properly aligned 2020-08-07 11:33:27 -07:00
msync.c
nommu.c mm/nommu.c: delete duplicated words 2020-08-12 10:57:58 -07:00
oom_kill.c mm, oom: show process exiting information in __oom_kill_process() 2020-08-12 10:57:56 -07:00
page-writeback.c mm: remove vm_total_pages 2020-08-07 11:33:28 -07:00
page_alloc.c mm/page_alloc: handle a missing case for memalloc_nocma_{save/restore} APIs 2020-10-03 11:28:12 -07:00
page_counter.c mm/page_counter: fix various data races at memsw 2020-08-14 19:56:57 -07:00
page_ext.c
page_idle.c
page_io.c mm/page_io: mark various intentional data races 2020-08-14 19:56:56 -07:00
page_isolation.c mm/memory_hotplug: drain per-cpu pages again during memory offline 2020-09-19 13:13:39 -07:00
page_owner.c
page_poison.c
page_reporting.c
page_reporting.h
page_vma_mapped.c mm: replace hpage_nr_pages with thp_nr_pages 2020-08-14 19:56:56 -07:00
pagewalk.c
percpu-internal.h mm: memcg/percpu: account percpu memory to memory cgroups 2020-08-12 10:57:55 -07:00
percpu-km.c mm: memcg/percpu: account percpu memory to memory cgroups 2020-08-12 10:57:55 -07:00
percpu-stats.c mm: memcg/percpu: account percpu memory to memory cgroups 2020-08-12 10:57:55 -07:00
percpu-vm.c mm: memcg/percpu: account percpu memory to memory cgroups 2020-08-12 10:57:55 -07:00
percpu.c percpu: fix first chunk size calculation for populated bitmap 2020-09-17 17:34:39 +00:00
pgalloc-track.h mm: move p?d_alloc_track to separate header file 2020-08-07 11:33:26 -07:00
pgtable-generic.c
process_vm_access.c mm/gup: remove task_struct pointer for all gup code 2020-08-12 10:58:04 -07:00
ptdump.c
readahead.c
rmap.c mm/rmap: fixup copying of soft dirty and uffd ptes 2020-09-05 12:14:30 -07:00
rodata_test.c mm/rodata_test.c: fix missing function declaration 2020-08-21 09:52:53 -07:00
shmem.c tmpfs: restore functionality of nr_inodes=0 2020-09-19 13:13:38 -07:00
shuffle.c mm/shuffle: remove dynamic reconfiguration 2020-08-07 11:33:29 -07:00
shuffle.h mm/shuffle: remove dynamic reconfiguration 2020-08-07 11:33:29 -07:00
slab.c mm: slab: fix potential double free in ___cache_free 2020-09-26 10:15:01 -07:00
slab.h mm: slab: rename (un)charge_slab_page() to (un)account_slab_page() 2020-08-07 11:33:25 -07:00
slab_common.c mm/slab_common.c: delete duplicated word 2020-08-12 10:57:58 -07:00
slob.c mm: memcg: convert vmstat slab counters to bytes 2020-08-07 11:33:24 -07:00
slub.c mm, slub: restore initial kmem_cache flags 2020-10-03 11:28:12 -07:00
sparse-vmemmap.c mm/sparse: only sub-section aligned range would be populated 2020-08-07 11:33:27 -07:00
sparse.c mm/sparse: cleanup the code surrounding memory_present() 2020-08-07 11:33:27 -07:00
swap.c mlock: fix unevictable_pgs event counts on THP 2020-09-19 13:13:38 -07:00
swap_cgroup.c
swap_slots.c mm/swap_slots.c: remove redundant check for swap_slot_cache_initialized 2020-08-07 11:33:24 -07:00
swap_state.c mm/swap_state: mark various intentional data races 2020-08-14 19:56:57 -07:00
swapfile.c mm, THP, swap: fix allocating cluster for swapfile by mistake 2020-09-26 10:33:57 -07:00
truncate.c
usercopy.c mm/usercopy.c: delete duplicated word 2020-08-12 10:57:58 -07:00
userfaultfd.c mm/vmscan: protect the workingset on anonymous LRU 2020-08-12 10:57:55 -07:00
util.c mm: remove unnecessary wrapper function do_mmap_pgoff() 2020-08-07 11:33:27 -07:00
vmacache.c
vmalloc.c mm/vunmap: add cond_resched() in vunmap_pmd_range 2020-08-21 09:52:53 -07:00
vmpressure.c
vmscan.c mm: fix check_move_unevictable_pages() on THP 2020-09-19 13:13:38 -07:00
vmstat.c Merge branch 'simplify-do_wp_page' 2020-09-04 09:31:54 -07:00
workingset.c mm: replace hpage_nr_pages with thp_nr_pages 2020-08-14 19:56:56 -07:00
z3fold.c
zbud.c
zpool.c mm/zpool.c: delete duplicated word and fix grammar 2020-08-12 10:57:58 -07:00
zsmalloc.c mm/zsmalloc.c: fix duplicated words 2020-08-12 10:57:58 -07:00
zswap.c