linux/mm
Peter Zijlstra 8b1b436dd1 mm, locking: Rework {set,clear,mm}_tlb_flush_pending()
Commit:

  af2c1401e6 ("mm: numa: guarantee that tlb_flush_pending updates are visible before page table updates")

added smp_mb__before_spinlock() to set_tlb_flush_pending(). I think we
can solve the same problem without this barrier.

If we instead mandate that mm_tlb_flush_pending() is only used while
holding the PTL, we are guaranteed to observe prior
set_tlb_flush_pending() instances.
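
The argument above can be illustrated with a small userspace model. This
is only a sketch under stated assumptions, not the kernel code: a pthread
mutex stands in for the PTL, a relaxed C11 atomic stands in for the
barrier-free flag store, and every model_* name is hypothetical. The
point it shows is that a reader which takes the lock and sees the updated
entry also sees the pending flag, because the flag store precedes the
writer's unlock.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the few mm_struct fields that matter here. */
struct model_mm {
	pthread_mutex_t ptl;           /* models the page-table lock (PTL) */
	atomic_bool tlb_flush_pending; /* models mm->tlb_flush_pending */
	int pte;                       /* models a single page-table entry */
};

/* What a reader sees while it holds the lock. */
struct model_observation {
	int pte;
	bool flush_pending;
};

static void model_mm_init(struct model_mm *mm)
{
	pthread_mutex_init(&mm->ptl, NULL);
	atomic_init(&mm->tlb_flush_pending, false);
	mm->pte = 0;
}

/*
 * Writer, first half: mark the flush as pending with a relaxed store
 * (no explicit barrier), then change the entry under the lock. The
 * unlock is a release operation, so the flag store is ordered before it.
 */
static void model_update_begin(struct model_mm *mm, int new_pte)
{
	atomic_store_explicit(&mm->tlb_flush_pending, true,
			      memory_order_relaxed);
	pthread_mutex_lock(&mm->ptl);
	mm->pte = new_pte;
	pthread_mutex_unlock(&mm->ptl);
}

/*
 * Writer, second half: the TLB invalidate would happen here (it is not
 * modelled), after which the pending flag is cleared.
 */
static void model_update_end(struct model_mm *mm)
{
	atomic_store_explicit(&mm->tlb_flush_pending, false,
			      memory_order_relaxed);
}

/*
 * Reader: test the flag only while holding the lock. Taking the lock
 * after the writer's unlock makes the earlier flag store visible.
 */
static struct model_observation model_fault_path(struct model_mm *mm)
{
	struct model_observation obs;

	pthread_mutex_lock(&mm->ptl);
	obs.pte = mm->pte;
	obs.flush_pending = atomic_load_explicit(&mm->tlb_flush_pending,
						 memory_order_relaxed);
	pthread_mutex_unlock(&mm->ptl);
	return obs;
}
```

In this model, calling model_fault_path() between model_update_begin()
and model_update_end() reports the new entry together with a pending
flush. Once model_update_end() has run, the flag reads as clear again.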

For this to work we need to rework migrate_misplaced_transhuge_page()
a little and move the test up into do_huge_pmd_numa_page().

NOTE: this relies on flush_tlb_range() guaranteeing that:

   (1) prior page table updates are visible to the page table
       walker, and
   (2) subsequent memory accesses are only made visible after the
       invalidation has completed.

This is required for architectures that implement TRANSPARENT_HUGEPAGE
(arc, arm, arm64, mips, powerpc, s390, sparc, x86) or otherwise use
mm_tlb_flush_pending() in their page-table operations (arm, arm64,
x86).

This appears true for:

 - arm (DSB ISB before and after),
 - arm64 (DSB ISHST before, and DSB ISH after),
 - powerpc (PTESYNC before and after),
 - s390 and x86 (TLB invalidates are serializing instructions)

But I failed to understand the situation for:

 - arc, mips, sparc

Now SPARC64 is a wee bit special in that flush_tlb_range() is a no-op
and it flushes the TLBs using arch_{enter,leave}_lazy_mmu_mode()
inside the PTL. It still needs to guarantee the PTL unlock happens
_after_ the invalidate completes.

Vineet, Ralf and Dave, could you guys please have a look?

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10 12:29:01 +02:00
kasan Merge branch 'linus' into locking/core, to pick up fixes 2017-08-10 12:20:53 +02:00
Kconfig mm/kasan: add support for memory hotplug 2017-07-10 16:32:33 -07:00
Kconfig.debug mm: enable page poisoning early at boot 2017-05-03 15:52:10 -07:00
Makefile percpu: expose statistics about percpu memory via debugfs 2017-06-20 15:31:38 -04:00
backing-dev.c bdi: Drop 'parent' argument from bdi_register[_va]() 2017-04-20 12:09:55 -06:00
balloon_compaction.c mm/balloon_compaction.c: enqueue zero page to balloon device 2017-07-10 16:32:32 -07:00
bootmem.c mm/bootmem.c: cosmetic improvement of code readability 2017-02-22 16:41:29 -08:00
cleancache.c fs: switch ->s_uuid to uuid_t 2017-06-05 16:59:12 +02:00
cma.c cma: fix calculation of aligned offset 2017-07-10 16:32:32 -07:00
cma.h cma: Store a name in the cma structure 2017-04-18 20:41:12 +02:00
cma_debug.c cma: Store a name in the cma structure 2017-04-18 20:41:12 +02:00
compaction.c mm, compaction: skip over holes in __reset_isolation_suitable 2017-07-06 16:24:32 -07:00
debug.c mm, debug: print raw struct page data in __dump_page() 2016-12-12 18:55:08 -08:00
debug_page_ref.c
dmapool.c lib/vsprintf.c: remove %Z support 2017-02-27 18:43:47 -08:00
early_ioremap.c
fadvise.c mm: fadvise: avoid expensive remote LRU cache draining after FADV_DONTNEED 2016-12-20 09:48:46 -08:00
failslab.c
filemap.c mm: hugetlb: return immediately for hugetlb page in __delete_from_page_cache() 2017-07-10 16:32:30 -07:00
frame_vector.c treewide: use kv[mz]alloc* rather than opencoded variants 2017-05-08 17:15:13 -07:00
frontswap.c
gup.c mm, gup: ensure real head page is ref-counted when using hugepages 2017-07-06 16:24:34 -07:00
highmem.c
huge_memory.c mm, locking: Rework {set,clear,mm}_tlb_flush_pending() 2017-08-10 12:29:01 +02:00
hugetlb.c mm/hugetlb.c: __get_user_pages ignores certain follow_hugetlb_page errors 2017-08-02 16:34:46 -07:00
hugetlb_cgroup.c
hwpoison-inject.c mm: hwpoison: call shake_page() unconditionally 2017-05-03 15:52:12 -07:00
init-mm.c mm: Add a user_ns owner to mm_struct and fix ptrace permission checks 2016-11-22 11:49:48 -06:00
internal.h mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries 2017-08-02 16:34:46 -07:00
interval_tree.c
khugepaged.c mm: make PR_SET_THP_DISABLE immediately active 2017-07-10 16:32:31 -07:00
kmemcheck.c mm: Rename SLAB_DESTROY_BY_RCU to SLAB_TYPESAFE_BY_RCU 2017-04-18 11:42:36 -07:00
kmemleak-test.c
kmemleak.c mm: kmemleak: treat vm_struct as alternative reference to vmalloc'ed objects 2017-07-06 16:24:34 -07:00
ksm.c ksm: optimize refile of stable_node_dup at the head of the chain 2017-07-06 16:24:31 -07:00
list_lru.c mm/list_lru.c: fix list_lru_count_node() to be race free 2017-07-10 16:32:33 -07:00
maccess.c
madvise.c mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries 2017-08-02 16:34:46 -07:00
memblock.c mm, memory_hotplug: move movable_node to the hotplug proper 2017-07-06 16:24:35 -07:00
memcontrol.c mm, memcg: fix potential undefined behavior in mem_cgroup_event_ratelimit() 2017-07-10 16:32:32 -07:00
memory-failure.c mm, hugetlb, soft_offline: use new_page_nodemask for soft offline migration 2017-07-10 16:32:32 -07:00
memory.c mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries 2017-08-02 16:34:46 -07:00
memory_hotplug.c mm/memory-hotplug: switch locking to a percpu rwsem 2017-07-10 16:32:33 -07:00
mempolicy.c mm, migration: do not trigger OOM killer when migrating memory 2017-07-12 16:26:04 -07:00
mempool.c sched/wait: Rename wait_queue_t => wait_queue_entry_t 2017-06-20 12:18:27 +02:00
memtest.c
migrate.c mm, locking: Rework {set,clear,mm}_tlb_flush_pending() 2017-08-10 12:29:01 +02:00
mincore.c mm: remove shmem_mapping() shmem_zero_setup() duplicates 2017-02-24 17:46:56 -08:00
mlock.c mlock: fix mlock count can not decrease in race condition 2017-06-02 15:07:38 -07:00
mm_init.c
mmap.c mm: fix overflow check in expand_upwards() 2017-07-14 15:05:12 -07:00
mmu_context.c sched/headers: Prepare to move the task_lock()/unlock() APIs to <linux/sched/task.h> 2017-03-02 08:42:38 +01:00
mmu_notifier.c mm: Use static initialization for "srcu" 2017-04-18 11:38:22 -07:00
mmzone.c mm/mmzone.c: swap likely to unlikely as code logic is different for next_zones_zonelist() 2017-02-22 16:41:29 -08:00
mprotect.c mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries 2017-08-02 16:34:46 -07:00
mremap.c userfaultfd: non-cooperative: notify about unmap of destination during mremap 2017-08-02 16:34:46 -07:00
msync.c
nobootmem.c mm/nobootmem.c: return 0 when start_pfn equals end_pfn 2017-07-06 16:24:31 -07:00
nommu.c mm, vmalloc: use __GFP_HIGHMEM implicitly 2017-05-08 17:15:13 -07:00
oom_kill.c mm/oom_kill.c: add tracepoints for oom reaper-related events 2017-07-10 16:32:32 -07:00
page-writeback.c writeback: rework wb_[dec|inc]_stat family of functions 2017-07-12 16:26:05 -07:00
page_alloc.c mm: take memory hotplug lock within numa_zonelist_order_handler() 2017-08-02 17:16:11 -07:00
page_counter.c
page_ext.c mm: enable page poisoning early at boot 2017-05-03 15:52:10 -07:00
page_idle.c mm: make rmap_one boolean function 2017-05-03 15:52:10 -07:00
page_io.c mm/page_io.c: fix oops during block io poll in swapin path 2017-08-02 17:16:11 -07:00
page_isolation.c mm: unify new_node_page and alloc_migrate_target 2017-07-10 16:32:31 -07:00
page_owner.c mm: avoid taking zone lock in pagetypeinfo_showmixed() 2017-07-10 16:32:32 -07:00
page_poison.c mm: enable page poisoning early at boot 2017-05-03 15:52:10 -07:00
page_vma_mapped.c mm/hugetlb: add size parameter to huge_pte_offset() 2017-07-06 16:24:34 -07:00
pagewalk.c mm/hugetlb: add size parameter to huge_pte_offset() 2017-07-06 16:24:34 -07:00
percpu-internal.h percpu: fix early calls for spinlock in pcpu_stats 2017-06-21 13:53:52 -04:00
percpu-km.c percpu: fix static checker warnings in pcpu_destroy_chunk 2017-06-29 11:23:38 -04:00
percpu-stats.c percpu: expose statistics about percpu memory via debugfs 2017-06-20 15:31:38 -04:00
percpu-vm.c percpu: fix static checker warnings in pcpu_destroy_chunk 2017-06-29 11:23:38 -04:00
percpu.c percpu: resolve err may not be initialized in pcpu_alloc 2017-06-21 12:00:45 -04:00
pgtable-generic.c mm: convert generic code to 5-level paging 2017-03-09 11:48:47 -08:00
process_vm_access.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/mm.h> 2017-03-02 08:42:28 +01:00
quicklist.c
readahead.c mm: don't cap request size based on read-ahead setting 2016-12-12 18:55:08 -08:00
rmap.c mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries 2017-08-02 16:34:46 -07:00
rodata_test.c mm: remove rodata_test_data export, add pr_fmt 2017-05-03 15:52:09 -07:00
shmem.c mm: make PR_SET_THP_DISABLE immediately active 2017-07-10 16:32:31 -07:00
slab.c mm: memcontrol: account slab stats per lruvec 2017-07-06 16:24:35 -07:00
slab.h mm: memcontrol: account slab stats per lruvec 2017-07-06 16:24:35 -07:00
slab_common.c mm: allow slab_nomerge to be set at build time 2017-07-06 16:24:31 -07:00
slob.c mm: Rename SLAB_DESTROY_BY_RCU to SLAB_TYPESAFE_BY_RCU 2017-04-18 11:42:36 -07:00
slub.c mm: memcontrol: account slab stats per lruvec 2017-07-06 16:24:35 -07:00
sparse-vmemmap.c mm, tree wide: replace __GFP_REPEAT by __GFP_RETRY_MAYFAIL with more useful semantic 2017-07-12 16:26:03 -07:00
sparse.c mm, memory_hotplug: do not associate hotadded memory to zones until online 2017-07-06 16:24:32 -07:00
swap.c mm: swap: provide lru_add_drain_all_cpuslocked() 2017-07-10 16:32:33 -07:00
swap_cgroup.c mm, THP, swap: delay splitting THP during swap out 2017-07-06 16:24:31 -07:00
swap_slots.c mm/swap_slots.c: don't disable preemption while taking the per-CPU cache 2017-07-10 16:32:32 -07:00
swap_state.c swap: add block io poll in swapin path 2017-07-10 16:32:30 -07:00
swapfile.c swap: add block io poll in swapin path 2017-07-10 16:32:30 -07:00
truncate.c mm/truncate.c: fix THP handling in invalidate_mapping_pages() 2017-07-10 16:32:32 -07:00
usercopy.c mm/usercopy: Drop extra is_vmalloc_or_module() check 2017-04-05 12:30:18 -07:00
userfaultfd.c mm: convert generic code to 5-level paging 2017-03-09 11:48:47 -08:00
util.c Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-07-15 12:00:42 -07:00
vmacache.c sched/headers: Prepare to move 'init_task' and 'init_thread_union' from <linux/sched.h> to <linux/sched/task.h> 2017-03-02 08:42:38 +01:00
vmalloc.c mm, tree wide: replace __GFP_REPEAT by __GFP_RETRY_MAYFAIL with more useful semantic 2017-07-12 16:26:03 -07:00
vmpressure.c mm, vmpressure: pass-through notification support 2017-07-10 16:32:31 -07:00
vmscan.c mm, tree wide: replace __GFP_REPEAT by __GFP_RETRY_MAYFAIL with more useful semantic 2017-07-12 16:26:03 -07:00
vmstat.c mm: avoid taking zone lock in pagetypeinfo_showmixed() 2017-07-10 16:32:32 -07:00
workingset.c mm: memcontrol: per-lruvec stats infrastructure 2017-07-06 16:24:35 -07:00
z3fold.c z3fold: fix page locking in z3fold_alloc() 2017-04-13 18:24:20 -07:00
zbud.c
zpool.c
zsmalloc.c zram: do not free pool->size_class 2017-08-02 16:34:47 -07:00
zswap.c mm/zswap.c: delete an error message for a failed memory allocation in zswap_dstmem_prepare() 2017-07-06 16:24:35 -07:00