linux

Go to file

Mel Gorman 2c83362734 sched/fair: Consider SD_NUMA when selecting the most idle group to schedule on find_idlest_group() compares a local group with each other group to select the one that is most idle. When comparing groups in different NUMA domains, a very slight imbalance is enough to select a remote NUMA node even if the runnable load on both groups is 0 or close to 0. This ignores the cost of remote accesses entirely and is a problem when selecting the CPU for a newly forked task to run on. This is problematic when a forking server is almost guaranteed to run on a remote node incurring numerous remote accesses and potentially causing automatic NUMA balancing to try migrate the task back or migrate the data to another node. Similar weirdness is observed if a basic shell command pipes output to another as each process in the pipeline is likely to start on different nodes and then get adjusted later by wake_affine(). This patch adds imbalance to remote domains when considering whether to select CPUs from remote domains. If the local domain is selected, imbalance will still be used to try select a CPU from a lower scheduler domain's group instead of stacking tasks on the same CPU. A variety of workloads and machines were tested and as expected, there is no difference on UMA. The difference on NUMA can be dramatic. This is a comparison of elapsed times running the git regression test suite. It's fork-intensive with short-lived processes: 4.15.0 4.15.0 noexit-v1r23 sdnuma-v1r23 Elapsed min 1706.06 ( 0.00%) 1435.94 ( 15.83%) Elapsed mean 1709.53 ( 0.00%) 1436.98 ( 15.94%) Elapsed stddev 2.16 ( 0.00%) 1.01 ( 53.38%) Elapsed coeffvar 0.13 ( 0.00%) 0.07 ( 44.54%) Elapsed max 1711.59 ( 0.00%) 1438.01 ( 15.98%) 4.15.0 4.15.0 noexit-v1r23 sdnuma-v1r23 User 5434.12 5188.41 System 4878.77 3467.09 Elapsed 10259.06 8624.21 That shows a considerable reduction in elapsed times. It's important to note that automatic NUMA balancing does not affect this load as processes are too short-lived. There is also a noticable impact on hackbench such as this example using processes and pipes: hackbench-process-pipes 4.15.0 4.15.0 noexit-v1r23 sdnuma-v1r23 Amean 1 1.0973 ( 0.00%) 0.9393 ( 14.40%) Amean 4 1.3427 ( 0.00%) 1.3730 ( -2.26%) Amean 7 1.4233 ( 0.00%) 1.6670 ( -17.12%) Amean 12 3.0250 ( 0.00%) 3.3013 ( -9.13%) Amean 21 9.0860 ( 0.00%) 9.5343 ( -4.93%) Amean 30 14.6547 ( 0.00%) 13.2433 ( 9.63%) Amean 48 22.5447 ( 0.00%) 20.4303 ( 9.38%) Amean 79 29.2010 ( 0.00%) 26.7853 ( 8.27%) Amean 110 36.7443 ( 0.00%) 35.8453 ( 2.45%) Amean 141 45.8533 ( 0.00%) 42.6223 ( 7.05%) Amean 172 55.1317 ( 0.00%) 50.6473 ( 8.13%) Amean 203 64.4420 ( 0.00%) 58.3957 ( 9.38%) Amean 234 73.2293 ( 0.00%) 67.1047 ( 8.36%) Amean 265 80.5220 ( 0.00%) 75.7330 ( 5.95%) Amean 296 88.7567 ( 0.00%) 82.1533 ( 7.44%) It's not a universal win as there are occasions when spreading wide and quickly is a benefit but it's more of a win than it is a loss. For other workloads, there is little difference but netperf is interesting. Without the patch, the server and client starts on different nodes but quickly get migrated due to wake_affine. Hence, the difference is overall performance is marginal but detectable: 4.15.0 4.15.0 noexit-v1r23 sdnuma-v1r23 Hmean send-64 349.09 ( 0.00%) 354.67 ( 1.60%) Hmean send-128 699.16 ( 0.00%) 702.91 ( 0.54%) Hmean send-256 1316.34 ( 0.00%) 1350.07 ( 2.56%) Hmean send-1024 5063.99 ( 0.00%) 5124.38 ( 1.19%) Hmean send-2048 9705.19 ( 0.00%) 9687.44 ( -0.18%) Hmean send-3312 14359.48 ( 0.00%) 14577.64 ( 1.52%) Hmean send-4096 16324.20 ( 0.00%) 16393.62 ( 0.43%) Hmean send-8192 26112.61 ( 0.00%) 26877.26 ( 2.93%) Hmean send-16384 37208.44 ( 0.00%) 38683.43 ( 3.96%) Hmean recv-64 349.09 ( 0.00%) 354.67 ( 1.60%) Hmean recv-128 699.16 ( 0.00%) 702.91 ( 0.54%) Hmean recv-256 1316.34 ( 0.00%) 1350.07 ( 2.56%) Hmean recv-1024 5063.99 ( 0.00%) 5124.38 ( 1.19%) Hmean recv-2048 9705.16 ( 0.00%) 9687.43 ( -0.18%) Hmean recv-3312 14359.42 ( 0.00%) 14577.59 ( 1.52%) Hmean recv-4096 16323.98 ( 0.00%) 16393.55 ( 0.43%) Hmean recv-8192 26111.85 ( 0.00%) 26876.96 ( 2.93%) Hmean recv-16384 37206.99 ( 0.00%) 38682.41 ( 3.97%) However, what is very interesting is how automatic NUMA balancing behaves. Each netperf instance runs long enough for balancing to activate: NUMA base PTE updates 4620 1473 NUMA huge PMD updates 0 0 NUMA page range updates 4620 1473 NUMA hint faults 4301 1383 NUMA hint local faults 1309 451 NUMA hint local percent 30 32 NUMA pages migrated 1335 491 AutoNUMA cost 21% 6% There is an unfortunate number of remote faults although tracing indicated that the vast majority are in shared libraries. However, the tendency to start tasks on the same node if there is capacity means that there were far fewer PTE updates and faults incurred overall. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Giovanni Gherdovich <ggherdovich@suse.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180213133730.24064-6-mgorman@techsingularity.net Signed-off-by: Ingo Molnar <mingo@kernel.org>		2018-02-21 08:49:43 +01:00
Documentation	ACPI updates for v4.16-rc2	2018-02-15 14:50:32 -08:00
LICENSES	LICENSES: Add MPL-1.1 license	2018-01-06 10:59:44 -07:00
arch	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	2018-02-18 12:56:41 -08:00
block	blk: optimization for classic polling	2018-02-13 09:12:04 -07:00
certs	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
crypto	Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6	2018-02-12 08:57:21 -08:00
drivers	Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	2018-02-18 12:22:04 -08:00
firmware	kbuild: remove all dummy assignments to obj-	2017-11-18 11:46:06 +09:00
fs	for-4.16-rc1-tag	2018-02-16 09:26:18 -08:00
include	Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	2018-02-18 11:54:22 -08:00
init	membarrier: Provide core serializing command, *_SYNC_CORE	2018-02-05 21:35:03 +01:00
ipc	vfs: do bulk POLL* -> EPOLL* replacement	2018-02-11 14:34:03 -08:00
kernel	sched/fair: Consider SD_NUMA when selecting the most idle group to schedule on	2018-02-21 08:49:43 +01:00
lib	dma-direct: comment the dma_direct_free calling convention	2018-02-12 15:59:07 +00:00
mm	mm: hide a #warning for COMPILE_TEST	2018-02-16 09:41:36 -08:00
net	virtio: bugfixes	2018-02-15 14:29:27 -08:00
samples	sample/bpf: fix erspan metadata	2018-02-06 11:32:49 -05:00
scripts	Kbuild updates for v4.16 (2nd)	2018-02-09 19:32:41 -08:00
security	vfs: do bulk POLL* -> EPOLL* replacement	2018-02-11 14:34:03 -08:00
sound	ALSA: hda/realtek: PCI quirk for Fujitsu U7x7	2018-02-14 12:02:26 +01:00
tools	perf/core improvements and fixes:	2018-02-16 09:10:09 +01:00
usr	initramfs: fix initramfs rebuilds w/ compression after disabling	2017-11-03 07:39:19 -07:00
virt	vfs: do bulk POLL* -> EPOLL* replacement	2018-02-11 14:34:03 -08:00
.cocciconfig	scripts: add Linux .cocciconfig for coccinelle	2016-07-22 12:13:39 +02:00
.get_maintainer.ignore	…
.gitattributes	.gitattributes: set git diff driver for C source code files	2016-10-07 18:46:30 -07:00
.gitignore	scripts/package: snap-pkg target	2017-12-13 00:00:18 +09:00
.mailmap	mailmap: update Mark Yao's email address	2018-01-04 16:45:09 -08:00
COPYING	…
CREDITS	MAINTAINERS: update TPM driver infrastructure changes	2017-11-09 17:58:40 -08:00
Kbuild	Kbuild updates for v4.15	2017-11-17 17:45:29 -08:00
Kconfig	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
MAINTAINERS	Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	2018-02-14 17:02:15 -08:00
Makefile	Linux 4.16-rc2	2018-02-18 17:29:42 -08:00
README	README: add a new README file, pointing to the Documentation/	2016-10-24 08:12:35 -02:00

README

Linux kernel
============

This file was moved to Documentation/admin-guide/README.rst

Please notice that there are several guides for kernel developers and users.
These guides can be rendered in a number of formats, like HTML and PDF.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.
See Documentation/00-INDEX for a list of what is contained in each file.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.