linux/mm
Linus Torvalds be83bbf806 mmap: introduce sane default mmap limits
The internal VM "mmap()" interfaces are based on the mmap target doing
everything using page indexes rather than byte offsets, because
traditionally (ie 32-bit) we had the situation that the byte offset
didn't fit in a register.  So while the mmap virtual address was limited
by the word size of the architecture, the backing store was not.

So we're basically passing "pgoff" around as a page index, in order to
be able to describe backing store locations that are much bigger than
the word size (think files larger than 4GB etc).

But while this all makes a ton of sense conceptually, we've been dogged
by various drivers that don't really understand this, and internally
work with byte offsets, and then try to work with the page index by
turning it into a byte offset with "pgoff << PAGE_SHIFT".

Which obviously can overflow.

Adding the size of the mapping to it to get the byte offset of the end
of the backing store just exacerbates the problem, and if you then use
this overflow-prone value to check various limits of your device driver
mmap capability, you're just setting yourself up for problems.

The correct thing for drivers to do is to do their limit math in page
indices, the way the interface is designed.  Because the generic mmap
code _does_ test that the index doesn't overflow, since that's what the
mmap code really cares about.

HOWEVER.

Finding and fixing various random drivers is a sisyphean task, so let's
just see if we can just make the core mmap() code do the limiting for
us.  Realistically, the only "big" backing stores we need to care about
are regular files and block devices, both of which are known to do this
properly, and which have nice well-defined limits for how much data they
can access.

So let's special-case just those two known cases, and then limit other
random mmap users to a backing store that still fits in "unsigned long".
Realistically, that's not much of a limit at all on 64-bit, and on
32-bit architectures the only worry might be the GPU drivers, which can
have big physical address spaces.

To make it possible for drivers like that to say that they are 64-bit
clean, this patch does repurpose the "FMODE_UNSIGNED_OFFSET" bit in the
file flags to allow drivers to mark their file descriptors as safe in
the full 64-bit mmap address space.

[ The timing for doing this is less than optimal, and this should really
  go in a merge window. But realistically, this needs wide testing more
  than it needs anything else, and being main-line is the only way to do
  that.

  So the earlier the better, even if it's outside the proper development
  cycle        - Linus ]

Cc: Kees Cook <keescook@chromium.org>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Dave Airlie <airlied@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-11 09:52:01 -07:00
..
kasan slab, slub: skip unnecessary kasan_cache_shutdown() 2018-04-05 21:36:24 -07:00
Kconfig treewide: simplify Kconfig dependencies for removed archs 2018-03-26 15:55:57 +02:00
Kconfig.debug
Makefile mm/swap_slots.c: use conditional compilation 2018-04-05 21:36:24 -07:00
backing-dev.c bdi: Fix use after free bug in debugfs_remove() 2018-05-03 09:36:24 -06:00
balloon_compaction.c
bootmem.c mm: docs: fix parameter names mismatch 2018-02-06 18:32:48 -08:00
cleancache.c
cma.c mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE 2018-04-11 10:28:32 -07:00
cma.h
cma_debug.c
compaction.c mm/cma: remove ALLOC_CMA 2018-04-11 10:28:32 -07:00
debug.c
debug_page_ref.c
dmapool.c
early_ioremap.c
fadvise.c mm: add ksys_fadvise64_64() helper; remove in-kernel call to sys_fadvise64_64() 2018-04-02 20:16:10 +02:00
failslab.c mm: make should_failslab always available for fault injection 2018-04-05 21:36:26 -07:00
filemap.c mm/filemap.c: fix NULL pointer in page_cache_tree_insert() 2018-04-20 17:18:36 -07:00
frame_vector.c
frontswap.c
gup.c mm/gup.c: document return value 2018-04-13 17:10:27 -07:00
gup_benchmark.c mm/gup_benchmark: handle gup failures 2018-04-13 17:10:27 -07:00
highmem.c
hmm.c mm/hmm.c: remove superfluous RCU protection around radix tree lookup 2018-04-11 10:28:31 -07:00
huge_memory.c mm: enable thp migration for shmem thp 2018-04-20 17:18:35 -07:00
hugetlb.c mm, hugetlbfs: introduce ->pagesize() to vm_operations_struct 2018-04-05 21:36:26 -07:00
hugetlb_cgroup.c
hwpoison-inject.c
init-mm.c
internal.h mm/cma: remove ALLOC_CMA 2018-04-11 10:28:32 -07:00
interval_tree.c
khugepaged.c page cache: use xa_lock 2018-04-11 10:28:39 -07:00
kmemleak-test.c
kmemleak.c mm: kernel-doc: add missing parameter descriptions 2018-04-05 21:36:27 -07:00
ksm.c mm/ksm.c: fix inconsistent accounting of zero pages 2018-04-11 10:28:31 -07:00
list_lru.c mm: make counting of list_lru_one::nr_items lockless 2018-04-05 21:36:27 -07:00
maccess.c mm: docs: fix parameter names mismatch 2018-02-06 18:32:48 -08:00
madvise.c
memblock.c powerpc updates for 4.17 2018-04-07 12:08:19 -07:00
memcontrol.c mm: memcg: add __GFP_NOWARN in __memcg_schedule_kmem_cache_create() 2018-04-20 17:18:36 -07:00
memory-failure.c mm, migrate: remove reason argument from new_page_t 2018-04-11 10:28:32 -07:00
memory.c mm: swap: unify cluster-based and vma-based swap readahead 2018-04-05 21:36:25 -07:00
memory_hotplug.c mm: unclutter THP migration 2018-04-11 10:28:32 -07:00
mempolicy.c mm: unclutter THP migration 2018-04-11 10:28:32 -07:00
mempool.c
memtest.c
migrate.c mm: enable thp migration for shmem thp 2018-04-20 17:18:35 -07:00
mincore.c
mlock.c mm, mlock, vmscan: no more skipping pagevecs 2018-02-21 15:35:42 -08:00
mm_init.c
mmap.c mmap: introduce sane default mmap limits 2018-05-11 09:52:01 -07:00
mmu_context.c
mmu_notifier.c
mmzone.c
mprotect.c sched/numa: avoid trapping faults and attempting migration of file-backed dirty pages 2018-04-11 10:28:31 -07:00
mremap.c
msync.c
nobootmem.c
nommu.c mm/nommu: remove description of alloc_vm_area 2018-04-05 21:36:26 -07:00
oom_kill.c mm,oom_reaper: check for MMF_OOM_SKIP before complaining 2018-04-05 21:36:27 -07:00
page-writeback.c writeback: safer lock nesting 2018-04-20 17:18:35 -07:00
page_alloc.c xen, mm: allow deferred page initialization for xen pv domains 2018-04-11 10:28:38 -07:00
page_counter.c
page_ext.c
page_idle.c mm: thp: fix potential clearing to referenced flag in page_idle_clear_pte_refs_one() 2018-04-05 21:36:25 -07:00
page_io.c
page_isolation.c mm, migrate: remove reason argument from new_page_t 2018-04-11 10:28:32 -07:00
page_owner.c mm/page_owner.c: make early_page_owner_param() __init 2018-04-05 21:36:26 -07:00
page_poison.c mm/page_poison.c: make early_page_poison_param() __init 2018-04-05 21:36:26 -07:00
page_vma_mapped.c
pagewalk.c mm: kernel-doc: add missing parameter descriptions 2018-04-05 21:36:27 -07:00
percpu-internal.h
percpu-km.c percpu: allow select gfp to be passed to underlying allocators 2018-02-18 05:33:01 -08:00
percpu-stats.c mm: reuse DEFINE_SHOW_ATTRIBUTE() macro 2018-04-05 21:36:25 -07:00
percpu-vm.c percpu: allow select gfp to be passed to underlying allocators 2018-02-18 05:33:01 -08:00
percpu.c arch: remove obsolete architecture ports 2018-04-02 20:20:12 -07:00
pgtable-generic.c
process_vm_access.c mm: docs: add blank lines to silence sphinx "Unexpected indentation" errors 2018-02-06 18:32:48 -08:00
quicklist.c
readahead.c page cache: use xa_lock 2018-04-11 10:28:39 -07:00
rmap.c mm: enable thp migration for shmem thp 2018-04-20 17:18:35 -07:00
rodata_test.c
shmem.c page cache: use xa_lock 2018-04-11 10:28:39 -07:00
slab.c mm, slab: reschedule cache_reap() on the same CPU 2018-04-13 17:10:27 -07:00
slab.h slab, slub: skip unnecessary kasan_cache_shutdown() 2018-04-05 21:36:24 -07:00
slab_common.c mm: make should_failslab always available for fault injection 2018-04-05 21:36:26 -07:00
slob.c
slub.c kasan, slub: fix handling of kasan_slab_free hook 2018-04-11 10:28:32 -07:00
sparse-vmemmap.c
sparse.c mm/memory_hotplug: optimize memory hotplug 2018-04-05 21:36:25 -07:00
swap.c mm/swap.c: remove @cold parameter description for release_pages() 2018-04-05 21:36:26 -07:00
swap_cgroup.c
swap_slots.c mm/swap_slots.c: use conditional compilation 2018-04-05 21:36:24 -07:00
swap_state.c page cache: use xa_lock 2018-04-11 10:28:39 -07:00
swapfile.c mm/swapfile.c: make pointer swap_avail_heads static 2018-04-11 10:28:32 -07:00
truncate.c page cache: use xa_lock 2018-04-11 10:28:39 -07:00
usercopy.c
userfaultfd.c
util.c mm/gup.c: document return value 2018-04-13 17:10:27 -07:00
vmacache.c
vmalloc.c vmalloc: fix __GFP_HIGHMEM usage for vmalloc_32 on 32b systems 2018-02-21 15:35:43 -08:00
vmpressure.c
vmscan.c mm,vmscan: Allow preallocating memory for register_shrinker(). 2018-04-16 02:06:47 -04:00
vmstat.c mm: introduce NR_INDIRECTLY_RECLAIMABLE_BYTES 2018-04-11 10:28:29 -07:00
workingset.c page cache: use xa_lock 2018-04-11 10:28:39 -07:00
z3fold.c mm/z3fold.c: use gfpflags_allow_blocking 2018-04-11 10:28:31 -07:00
zbud.c mm: docs: fix parameter names mismatch 2018-02-06 18:32:48 -08:00
zpool.c mm/zpool.c: zpool_evictable: fix mismatch in parameter name and kernel-doc 2018-02-21 15:35:43 -08:00
zsmalloc.c mm: kernel-doc: add missing parameter descriptions 2018-04-05 21:36:27 -07:00
zswap.c mm, swap, frontswap: fix THP swap if frontswap enabled 2018-02-21 15:35:43 -08:00