linux/mm
Tang Chen c5320926e3 mem-hotplug: introduce movable_node boot option
The hot-Pluggable field in SRAT specifies which memory is hotpluggable.
As we mentioned before, if hotpluggable memory is used by the kernel, it
cannot be hot-removed.  So memory hotplug users may want to set all
hotpluggable memory in ZONE_MOVABLE so that the kernel won't use it.

Memory hotplug users may also set a node as movable node, which has
ZONE_MOVABLE only, so that the whole node can be hot-removed.

But the kernel cannot use memory in ZONE_MOVABLE.  By doing this, the
kernel cannot use memory in movable nodes.  This will cause NUMA
performance down.  And other users may be unhappy.

So we need a way to allow users to enable and disable this functionality.
In this patch, we introduce movable_node boot option to allow users to
choose to not to consume hotpluggable memory at early boot time and later
we can set it as ZONE_MOVABLE.

To achieve this, the movable_node boot option will control the memblock
allocation direction.  That said, after memblock is ready, before SRAT is
parsed, we should allocate memory near the kernel image as we explained in
the previous patches.  So if movable_node boot option is set, the kernel
does the following:

1. After memblock is ready, make memblock allocate memory bottom up.
2. After SRAT is parsed, make memblock behave as default, allocate memory
   top down.

Users can specify "movable_node" in kernel commandline to enable this
functionality.  For those who don't use memory hotplug or who don't want
to lose their NUMA performance, just don't specify anything.  The kernel
will work as before.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Suggested-by: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Suggested-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Toshi Kani <toshi.kani@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Thomas Renninger <trenn@suse.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13 12:09:09 +09:00
..
backing-dev.c
balloon_compaction.c
bootmem.c mm: use pgdat_end_pfn() to simplify the code in others 2013-11-13 12:09:03 +09:00
bounce.c
cleancache.c
compaction.c mm/compaction.c: update comment about zone lock in isolate_freepages_block 2013-11-13 12:09:03 +09:00
debug-pagealloc.c
dmapool.c
fadvise.c
failslab.c
filemap_xip.c
filemap.c mm: memcg: handle non-error OOM situations more gracefully 2013-10-16 21:35:53 -07:00
fremap.c
frontswap.c
highmem.c
huge_memory.c mm: thp: khugepaged: add policy for finding target node 2013-11-13 12:09:06 +09:00
hugetlb_cgroup.c
hugetlb.c mm: hugetlb: initialize PG_reserved for tail pages of gigantic compound pages 2013-10-16 21:35:52 -07:00
hwpoison-inject.c
init-mm.c
internal.h
interval_tree.c
Kconfig mem-hotplug: introduce movable_node boot option 2013-11-13 12:09:09 +09:00
Kconfig.debug
kmemcheck.c
kmemleak-test.c
kmemleak.c mm: kmemleak: avoid false negatives on vmalloc'ed objects 2013-11-13 12:09:07 +09:00
ksm.c ksm: remove redundant __GFP_ZERO from kcalloc 2013-11-13 12:09:02 +09:00
list_lru.c mm: list_lru: fix almost infinite loop causing effective livelock 2013-10-30 12:57:46 -07:00
maccess.c
madvise.c
Makefile
memblock.c mm/memblock.c: introduce bottom-up allocation mode 2013-11-13 12:09:08 +09:00
memcontrol.c memcg: support hierarchical memory.numa_stats 2013-11-13 12:09:06 +09:00
memory_hotplug.c mem-hotplug: introduce movable_node boot option 2013-11-13 12:09:09 +09:00
memory-failure.c mm/memory-failure.c: move set_migratetype_isolate() outside get_any_page() 2013-11-13 12:09:04 +09:00
memory.c mm: remove obsolete comments about page table lock 2013-11-13 12:09:03 +09:00
mempolicy.c mm/mempolicy: use NUMA_NO_NODE 2013-11-13 12:09:06 +09:00
mempool.c
migrate.c Merge branch 'linus' into sched/core 2013-11-01 08:24:41 +01:00
mincore.c
mlock.c
mm_init.c mm: numa: Change page last {nid,pid} into {cpu,pid} 2013-10-09 14:47:45 +02:00
mmap.c mmap: arch_get_unmapped_area(): use proper mmap base for bottom up direction 2013-11-13 12:09:08 +09:00
mmu_context.c
mmu_notifier.c
mmzone.c mm: numa: Change page last {nid,pid} into {cpu,pid} 2013-10-09 14:47:45 +02:00
mprotect.c Merge branch 'linus' into sched/core 2013-11-01 08:24:41 +01:00
mremap.c mm: revert mremap pud_free anti-fix 2013-10-16 21:35:53 -07:00
msync.c
nobootmem.c mm/nobootmem.c: have __free_pages_memory() free in larger chunks. 2013-11-13 12:09:04 +09:00
nommu.c
oom_kill.c mm: memcg: handle non-error OOM situations more gracefully 2013-10-16 21:35:53 -07:00
page_alloc.c mm/page_alloc.c: remove unused marco LONG_ALIGN 2013-11-13 12:09:07 +09:00
page_cgroup.c
page_io.c
page_isolation.c
page-writeback.c writeback: fix negative bdi max pause 2013-10-16 21:35:53 -07:00
pagewalk.c mm/pagewalk.c: fix walk_page_range() access of wrong PTEs 2013-10-30 14:27:03 -07:00
percpu-km.c
percpu-vm.c
percpu.c
pgtable-generic.c
process_vm_access.c
quicklist.c
readahead.c mm/readahead.c:do_readhead(): don't check for ->readpage 2013-11-13 12:09:02 +09:00
rmap.c
shmem.c
slab_common.c
slab.c
slab.h
slob.c
slub.c
sparse-vmemmap.c
sparse.c mm/sparsemem: fix a bug in free_map_bootmem when CONFIG_SPARSEMEM_VMEMMAP 2013-11-13 12:09:06 +09:00
swap_state.c
swap.c
swapfile.c frontswap: enable call to invalidate area on swapoff 2013-11-13 12:09:07 +09:00
truncate.c
util.c
vmalloc.c mm: kmemleak: avoid false negatives on vmalloc'ed objects 2013-11-13 12:09:07 +09:00
vmpressure.c
vmscan.c mm/vmscan.c: don't forget to free shrinker->nr_deferred 2013-10-16 21:35:52 -07:00
vmstat.c
zbud.c
zswap.c mm/zswap: avoid unnecessary page scanning 2013-11-13 12:09:08 +09:00