linux/mm
Tang Chen c5320926e3 mem-hotplug: introduce movable_node boot option
The hot-Pluggable field in SRAT specifies which memory is hotpluggable.
As we mentioned before, if hotpluggable memory is used by the kernel, it
cannot be hot-removed.  So memory hotplug users may want to set all
hotpluggable memory in ZONE_MOVABLE so that the kernel won't use it.

Memory hotplug users may also set a node as movable node, which has
ZONE_MOVABLE only, so that the whole node can be hot-removed.

But the kernel cannot use memory in ZONE_MOVABLE.  By doing this, the
kernel cannot use memory in movable nodes.  This will cause NUMA
performance down.  And other users may be unhappy.

So we need a way to allow users to enable and disable this functionality.
In this patch, we introduce movable_node boot option to allow users to
choose to not to consume hotpluggable memory at early boot time and later
we can set it as ZONE_MOVABLE.

To achieve this, the movable_node boot option will control the memblock
allocation direction.  That said, after memblock is ready, before SRAT is
parsed, we should allocate memory near the kernel image as we explained in
the previous patches.  So if movable_node boot option is set, the kernel
does the following:

1. After memblock is ready, make memblock allocate memory bottom up.
2. After SRAT is parsed, make memblock behave as default, allocate memory
   top down.

Users can specify "movable_node" in kernel commandline to enable this
functionality.  For those who don't use memory hotplug or who don't want
to lose their NUMA performance, just don't specify anything.  The kernel
will work as before.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Suggested-by: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Suggested-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Toshi Kani <toshi.kani@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Thomas Renninger <trenn@suse.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13 12:09:09 +09:00
..
Kconfig mem-hotplug: introduce movable_node boot option 2013-11-13 12:09:09 +09:00
Kconfig.debug mm: more intensive memory corruption debugging 2012-01-10 16:30:42 -08:00
Makefile list: add a new LRU list type 2013-09-10 18:56:30 -04:00
backing-dev.c mm/backing-dev.c: check user buffer length before copying data to the related user buffer 2013-09-11 15:58:03 -07:00
balloon_compaction.c mm: introduce a common interface for balloon pages mobility 2012-12-11 17:22:26 -08:00
bootmem.c mm: use pgdat_end_pfn() to simplify the code in others 2013-11-13 12:09:03 +09:00
bounce.c mm/bounce.c: fix a regression where MS_SNAP_STABLE (stable pages snapshotting) was ignored 2013-09-30 14:31:02 -07:00
cleancache.c mm: cleancache: clean up cleancache_enabled 2013-04-30 17:04:01 -07:00
compaction.c mm/compaction.c: update comment about zone lock in isolate_freepages_block 2013-11-13 12:09:03 +09:00
debug-pagealloc.c mm, x86: Remove debug_pagealloc_enabled 2011-12-06 09:24:07 +01:00
dmapool.c dmapool: make DMAPOOL_DEBUG detect corruption of free marker 2012-12-11 17:22:24 -08:00
fadvise.c teach SYSCALL_DEFINE<n> how to deal with long long/unsigned long long 2013-03-03 22:46:22 -05:00
failslab.c switch debugfs to umode_t 2012-01-03 22:54:56 -05:00
filemap.c mm: memcg: handle non-error OOM situations more gracefully 2013-10-16 21:35:53 -07:00
filemap_xip.c lift sb_start_write() out of ->write() 2013-04-09 14:12:56 -04:00
fremap.c mm: save soft-dirty bits on file pages 2013-08-13 17:57:48 -07:00
frontswap.c frontswap: fix incorrect zeroing and allocation size for frontswap_map 2013-06-12 16:29:46 -07:00
highmem.c Some nice cleanups, and even a patch my wife did as a "live" demo for 2012-12-20 08:37:05 -08:00
huge_memory.c mm: thp: khugepaged: add policy for finding target node 2013-11-13 12:09:06 +09:00
hugetlb.c mm: hugetlb: initialize PG_reserved for tail pages of gigantic compound pages 2013-10-16 21:35:52 -07:00
hugetlb_cgroup.c cgroup: pass around cgroup_subsys_state instead of cgroup in file methods 2013-08-08 20:11:24 -04:00
hwpoison-inject.c mm/hwpoison: fix the lack of one reference count against poisoned page 2013-09-30 14:31:03 -07:00
init-mm.c atomic: use <linux/atomic.h> 2011-07-26 16:49:47 -07:00
internal.h mm: vmscan: fix do_try_to_free_pages() livelock 2013-09-11 15:58:01 -07:00
interval_tree.c mm: add CONFIG_DEBUG_VM_RB build option 2012-10-09 16:22:42 +09:00
kmemcheck.c kmemcheck: Fix build errors due to missing slab.h 2010-03-30 22:02:32 +09:00
kmemleak-test.c kmemleak: remove memset by using kzalloc 2011-01-27 18:31:51 +00:00
kmemleak.c mm: kmemleak: avoid false negatives on vmalloc'ed objects 2013-11-13 12:09:07 +09:00
ksm.c ksm: remove redundant __GFP_ZERO from kcalloc 2013-11-13 12:09:02 +09:00
list_lru.c mm: list_lru: fix almost infinite loop causing effective livelock 2013-10-30 12:57:46 -07:00
maccess.c mm: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
madvise.c mm/hwpoison: fix traversal of hugetlbfs pages to avoid printk flood 2013-09-30 14:31:02 -07:00
memblock.c mm/memblock.c: introduce bottom-up allocation mode 2013-11-13 12:09:08 +09:00
memcontrol.c memcg: support hierarchical memory.numa_stats 2013-11-13 12:09:06 +09:00
memory-failure.c mm/memory-failure.c: move set_migratetype_isolate() outside get_any_page() 2013-11-13 12:09:04 +09:00
memory.c mm: remove obsolete comments about page table lock 2013-11-13 12:09:03 +09:00
memory_hotplug.c mem-hotplug: introduce movable_node boot option 2013-11-13 12:09:09 +09:00
mempolicy.c mm/mempolicy: use NUMA_NO_NODE 2013-11-13 12:09:06 +09:00
mempool.c mm/mempool.c: convert kmalloc_node(...GFP_ZERO...) to kzalloc_node(...) 2013-09-11 15:58:14 -07:00
migrate.c Merge branch 'linus' into sched/core 2013-11-01 08:24:41 +01:00
mincore.c swap: make each swap partition have one address_space 2013-02-23 17:50:17 -08:00
mlock.c mm/mlock.c: prevent walking off the end of a pagetable in no-pmd configuration 2013-09-30 14:31:02 -07:00
mm_init.c mm: numa: Change page last {nid,pid} into {cpu,pid} 2013-10-09 14:47:45 +02:00
mmap.c mmap: arch_get_unmapped_area(): use proper mmap base for bottom up direction 2013-11-13 12:09:08 +09:00
mmu_context.c mm: remove old aio use_mm() comment 2013-05-07 18:38:27 -07:00
mmu_notifier.c treewide: relase -> release 2013-06-28 14:34:33 +02:00
mmzone.c mm: numa: Change page last {nid,pid} into {cpu,pid} 2013-10-09 14:47:45 +02:00
mprotect.c Merge branch 'linus' into sched/core 2013-11-01 08:24:41 +01:00
mremap.c mm: revert mremap pud_free anti-fix 2013-10-16 21:35:53 -07:00
msync.c sanitize vfs_fsync calling conventions 2010-05-21 18:31:21 -04:00
nobootmem.c mm/nobootmem.c: have __free_pages_memory() free in larger chunks. 2013-11-13 12:09:04 +09:00
nommu.c mm: remove free_area_cache 2013-07-10 18:11:34 -07:00
oom_kill.c mm: memcg: handle non-error OOM situations more gracefully 2013-10-16 21:35:53 -07:00
page-writeback.c writeback: fix negative bdi max pause 2013-10-16 21:35:53 -07:00
page_alloc.c mm/page_alloc.c: remove unused marco LONG_ALIGN 2013-11-13 12:09:07 +09:00
page_cgroup.c memcontrol: use N_MEMORY instead N_HIGH_MEMORY 2012-12-12 17:38:32 -08:00
page_io.c aio: Kill aio_rw_vect_retry() 2013-07-30 11:53:12 -04:00
page_isolation.c mm: memory-hotplug: enable memory hotplug to handle hugepage 2013-09-11 15:57:48 -07:00
pagewalk.c mm/pagewalk.c: fix walk_page_range() access of wrong PTEs 2013-10-30 14:27:03 -07:00
percpu-km.c percpu: clear memory allocated with the km allocator 2010-10-02 10:28:42 +03:00
percpu-vm.c mm: fix kernel-doc warnings 2012-06-20 14:39:36 -07:00
percpu.c mm, percpu: Make sure percpu_alloc early parameter has an argument 2012-12-02 06:23:04 -08:00
pgtable-generic.c mm: move pgtable related functions to right place 2013-09-11 15:57:30 -07:00
process_vm_access.c Fix: compat_rw_copy_check_uvector() misuse in aio, readv, writev, and security keys 2013-03-12 11:05:45 -07:00
quicklist.c mm: delete various needless include <linux/module.h> 2011-10-31 09:20:11 -04:00
readahead.c mm/readahead.c:do_readhead(): don't check for ->readpage 2013-11-13 12:09:02 +09:00
rmap.c thp: account anon transparent huge pages into NR_ANON_PAGES 2013-09-12 15:38:03 -07:00
shmem.c initmpfs: make rootfs use tmpfs when CONFIG_TMPFS enabled 2013-09-11 15:59:37 -07:00
slab.c kernel: delete __cpuinit usage from all core kernel files 2013-07-14 19:36:59 -04:00
slab.h memcg: check that kmem_cache has memcg_params before accessing it 2013-08-28 19:26:38 -07:00
slab_common.c slab_common: Do not check for duplicate slab names 2013-09-28 09:47:41 +03:00
slob.c mm/sl[aou]b: Move kmallocXXX functions to common code 2013-09-04 20:51:33 +03:00
slub.c Merge branch 'slab/next' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux 2013-09-15 07:15:06 -04:00
sparse-vmemmap.c sparse-vmemmap: specify vmemmap population range in bytes 2013-04-29 15:54:35 -07:00
sparse.c mm/sparsemem: fix a bug in free_map_bootmem when CONFIG_SPARSEMEM_VMEMMAP 2013-11-13 12:09:06 +09:00
swap.c mm: make lru_add_drain_all() selective 2013-09-12 15:38:02 -07:00
swap_state.c lib/radix-tree.c: make radix_tree_node_alloc() work correctly within interrupt 2013-09-11 15:59:36 -07:00
swapfile.c frontswap: enable call to invalidate area on swapoff 2013-11-13 12:09:07 +09:00
truncate.c truncate: drop 'oldsize' truncate_pagecache() parameter 2013-09-12 15:38:02 -07:00
util.c swap: clean-up #ifdef in page_mapping() 2013-09-11 15:57:31 -07:00
vmalloc.c mm: kmemleak: avoid false negatives on vmalloc'ed objects 2013-11-13 12:09:07 +09:00
vmpressure.c Merge branch 'for-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2013-09-03 18:25:03 -07:00
vmscan.c mm/vmscan.c: don't forget to free shrinker->nr_deferred 2013-10-16 21:35:52 -07:00
vmstat.c mm: vmscan: fix do_try_to_free_pages() livelock 2013-09-11 15:58:01 -07:00
zbud.c mm/zbud: fix some trivial typos in comments 2013-09-11 15:57:35 -07:00
zswap.c mm/zswap: avoid unnecessary page scanning 2013-11-13 12:09:08 +09:00