Michal Hocko aaf14e40a3 mm, hugetlb: unclutter hugetlb allocation layers
Patch series "mm, hugetlb: allow proper node fallback dequeue".

While working on a hugetlb migration issue addressed in a separate
patchset[1] I have noticed that the hugetlb allocations from the
preallocated pool are quite suboptimal.

 [1] http://lkml.kernel.org/r/20170608074553.22152-1-mhocko@kernel.org

There is no fallback mechanism implemented and no notion of a preferred
node.  I have tried to work around it but Vlastimil was right to push
back for a more robust solution.  It seems that such a solution is to
reuse the zonelist approach we use for the page allocator.

This series has 3 patches.  The first one tries to make the hugetlb
allocation layers clearer.  The second one implements the zonelist
hugetlb pool allocation and introduces a preferred node semantic which
is used by the migration callbacks.  The last patch is a cleanup.

This patch (of 3):

The hugetlb allocation path for fresh huge pages is unnecessarily complex
and it mixes different interfaces between layers.

__alloc_buddy_huge_page is the central place to perform a new
allocation.  It checks for the hugetlb overcommit and then relies on
__hugetlb_alloc_buddy_huge_page to invoke the page allocator.  This is
all good except that __alloc_buddy_huge_page pushes vma and address down
the callchain and so __hugetlb_alloc_buddy_huge_page has to deal with
two different allocation modes - one for memory policy requests and the
other for node specific (or, to make it more obscure, node non-specific)
requests.

This just screams for a reorganization.

This patch pulls all the vma specific handling up to
__alloc_buddy_huge_page_with_mpol where it belongs.
__alloc_buddy_huge_page will get a nodemask argument and
__hugetlb_alloc_buddy_huge_page will become a trivial wrapper over the
page allocator.

In short:
__alloc_buddy_huge_page_with_mpol - memory policy handling
  __alloc_buddy_huge_page - overcommit handling and accounting
    __hugetlb_alloc_buddy_huge_page - page allocator layer
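
The layering above can be sketched as a small userspace model.  This is
not the kernel code: every mock_* name, the bitmask nodemask and the
per-node free counters are assumptions made up for illustration.  It only
shows how each layer keeps a single responsibility.

```c
#include <assert.h>

#define MOCK_NODES 4

/* Hypothetical userspace model; none of these are kernel types. */
struct mock_state {
	unsigned int free_pages[MOCK_NODES]; /* free huge-page-sized blocks per node */
	unsigned int surplus;                /* pages allocated beyond the pool */
	unsigned int overcommit_limit;       /* maximum surplus allowed */
};

struct mock_policy {
	int preferred_nid;     /* node to try first */
	unsigned int nodemask; /* bit n set: node n is allowed */
};

/* Layer 3: page allocator layer.  Knows nothing about policy or accounting. */
static int mock_page_alloc(struct mock_state *s, int nid, unsigned int nodemask)
{
	assert(nid >= 0 && nid < MOCK_NODES);
	for (int i = 0; i < MOCK_NODES; i++) {
		int n = (nid + i) % MOCK_NODES; /* preferred node first, then fall back */

		if (!(nodemask & (1u << n)) || s->free_pages[n] == 0)
			continue;
		s->free_pages[n]--;
		return n;
	}
	return -1;
}

/* Layer 2: overcommit check and accounting around the allocator call. */
static int mock_alloc_surplus(struct mock_state *s, int nid, unsigned int nodemask)
{
	int n;

	if (s->surplus >= s->overcommit_limit)
		return -1;
	n = mock_page_alloc(s, nid, nodemask);
	if (n >= 0)
		s->surplus++;
	return n;
}

/* Layer 1: policy handling.  Resolves the policy to a node and a nodemask. */
static int mock_alloc_with_policy(struct mock_state *s, const struct mock_policy *pol)
{
	return mock_alloc_surplus(s, pol->preferred_nid, pol->nodemask);
}
```

In this model each function returns the node the page came from, or -1 on
failure, so only the top layer ever sees a policy and only the middle
layer touches the surplus counter.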

Also note that the cpuset retry loop in __hugetlb_alloc_buddy_huge_page
is not really needed because the page allocator already handles
cpuset updates.

Finally, __hugetlb_alloc_buddy_huge_page had a special case for node
specific allocations (when no policy is applied and there is a node
given).  This has relied on __GFP_THISNODE to not fall back to a different
node.  alloc_huge_page_node is the only caller which relies on this
behavior so move the __GFP_THISNODE there.
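
The effect of moving the flag to the caller can be sketched in a userspace
model.  MOCK_THISNODE and both functions are hypothetical stand-ins, not
the kernel API: the shared allocator falls back by default and only the
node specific wrapper asks for no fallback.

```c
#include <assert.h>

#define MOCK_NODES    4
#define MOCK_THISNODE 0x1u /* hypothetical flag modelled on __GFP_THISNODE */

/* Shared allocation path: falls back to other nodes unless told not to. */
static int mock_alloc(unsigned int free_pages[MOCK_NODES], unsigned int flags, int nid)
{
	assert(nid >= 0 && nid < MOCK_NODES);
	for (int i = 0; i < MOCK_NODES; i++) {
		int n = (nid + i) % MOCK_NODES; /* requested node first */

		if (free_pages[n] > 0) {
			free_pages[n]--;
			return n;
		}
		if (flags & MOCK_THISNODE)
			break; /* no fallback: give up after the requested node */
	}
	return -1;
}

/* Node specific caller: the only place that adds the no-fallback flag. */
static int mock_alloc_on_node(unsigned int free_pages[MOCK_NODES], int nid)
{
	return mock_alloc(free_pages, MOCK_THISNODE, nid);
}
```

With free pages only on node 1, mock_alloc_on_node() for node 0 fails
while a plain mock_alloc() for node 0 falls back to node 1.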

Not only does this remove quite some code, it should also make those
layers easier to follow and clearer wrt responsibilities.

Link: http://lkml.kernel.org/r/20170622193034.28972-2-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Tested-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-10 16:32:32 -07:00
Documentation mm, memory_hotplug: support movable_node for hotpluggable nodes 2017-07-10 16:32:31 -07:00
arch Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-07-09 11:21:31 -07:00
block Merge branch 'misc.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-07-06 20:57:13 -07:00
certs scripts/spelling.txt: add "intialise(d)" pattern and fix typo instances 2017-05-08 17:15:13 -07:00
crypto dmaengine updates for 4.13-rc1 2017-07-08 12:36:50 -07:00
drivers zram: use __sysfs_match_string() helper 2017-07-10 16:32:31 -07:00
firmware firmware/Makefile: force recompilation if makefile changes 2017-05-08 17:15:10 -07:00
fs mm: hwpoison: dissolve in-use hugepage in unrecoverable memory error 2017-07-10 16:32:30 -07:00
include mm, hugetlb: unclutter hugetlb allocation layers 2017-07-10 16:32:32 -07:00
init mm: allow slab_nomerge to be set at build time 2017-07-06 16:24:31 -07:00
ipc mqueue: fix a use-after-free in sys_mq_notify() 2017-07-09 14:37:19 -07:00
kernel mm: make PR_SET_THP_DISABLE immediately active 2017-07-10 16:32:31 -07:00
lib dmaengine updates for 4.13-rc1 2017-07-08 12:36:50 -07:00
mm mm, hugetlb: unclutter hugetlb allocation layers 2017-07-10 16:32:32 -07:00
net mpls: fix uninitialized in_label var warning in mpls_getroute 2017-07-08 11:26:41 +01:00
samples bpf: fix return in load_bpf_file 2017-07-05 09:05:28 +01:00
scripts Kbuild thin archives updates for v4.13 2017-07-07 15:11:12 -07:00
security Merge branch 'work.memdup_user' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-07-05 16:05:24 -07:00
sound main drm pull for v4.13 2017-07-09 18:48:37 -07:00
tools Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-07-09 10:49:47 -07:00
usr ramfs: clarify help text that compression applies to ramfs as well as legacy ramdisk. 2017-07-06 16:24:30 -07:00
virt PPC: 2017-07-06 18:38:31 -07:00
.cocciconfig scripts: add Linux .cocciconfig for coccinelle 2016-07-22 12:13:39 +02:00
.get_maintainer.ignore Add hch to .get_maintainer.ignore 2015-08-21 14:30:10 -07:00
.gitattributes .gitattributes: set git diff driver for C source code files 2016-10-07 18:46:30 -07:00
.gitignore kbuild: Add support to generate LLVM assembly files 2017-04-25 08:13:52 +09:00
.mailmap power supply and reset changes for the v4.12 series (part 2) 2017-05-12 12:02:21 -07:00
COPYING
CREDITS avr32: remove support for AVR32 architecture 2017-05-01 09:27:15 +02:00
Kbuild kbuild: Consolidate header generation from ASM offset information 2017-04-13 05:43:37 +09:00
Kconfig
MAINTAINERS main drm pull for v4.13 2017-07-09 18:48:37 -07:00
Makefile Kbuild thin archives updates for v4.13 2017-07-07 15:11:12 -07:00
README README: add a new README file, pointing to the Documentation/ 2016-10-24 08:12:35 -02:00

README

Linux kernel
============

This file was moved to Documentation/admin-guide/README.rst

Please notice that there are several guides for kernel developers and users.
These guides can be rendered in a number of formats, like HTML and PDF.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.
See Documentation/00-INDEX for a list of what is contained in each file.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result from upgrading your kernel.