linux/drivers
Kemi Wang 3a321d2a3d mm: change the call sites of numa statistics items
Patch series "Separate NUMA statistics from zone statistics", v2.

Each page allocation updates a set of per-zone statistics with a call to
zone_statistics().  As discussed in 2017 MM summit, these are a
substantial source of overhead in the page allocator and are very rarely
consumed.  This significant overhead in cache bouncing caused by zone
counters (NUMA associated counters) update in parallel in multi-threaded
page allocation (pointed out by Dave Hansen).

A link to the MM summit slides:
  http://people.netfilter.org/hawk/presentations/MM-summit2017/MM-summit2017-JesperBrouer.pdf

To mitigate this overhead, this patchset separates NUMA statistics from
zone statistics framework, and update NUMA counter threshold to a fixed
size of MAX_U16 - 2, as a small threshold greatly increases the update
frequency of the global counter from local per cpu counter (suggested by
Ying Huang).  The rationality is that these statistics counters don't
need to be read often, unlike other VM counters, so it's not a problem
to use a large threshold and make readers more expensive.

With this patchset, we see 31.3% drop of CPU cycles(537-->369, see
below) for per single page allocation and reclaim on Jesper's
page_bench03 benchmark.  Meanwhile, this patchset keeps the same style
of virtual memory statistics with little end-user-visible effects (only
move the numa stats to show behind zone page stats, see the first patch
for details).

I did an experiment of single page allocation and reclaim concurrently
using Jesper's page_bench03 benchmark on a 2-Socket Broadwell-based
server (88 processors with 126G memory) with different size of threshold
of pcp counter.

Benchmark provided by Jesper D Brouer(increase loop times to 10000000):
  https://github.com/netoptimizer/prototype-kernel/tree/master/kernel/mm/bench

   Threshold   CPU cycles    Throughput(88 threads)
      32        799         241760478
      64        640         301628829
      125       537         358906028 <==> system by default
      256       468         412397590
      512       428         450550704
      4096      399         482520943
      20000     394         489009617
      30000     395         488017817
      65533     369(-31.3%) 521661345(+45.3%) <==> with this patchset
      N/A       342(-36.3%) 562900157(+56.8%) <==> disable zone_statistics

This patch (of 3):

In this patch, NUMA statistics is separated from zone statistics
framework, all the call sites of NUMA stats are changed to use
numa-stats-specific functions, it does not have any functionality change
except that the number of NUMA stats is shown behind zone page stats
when users *read* the zone info.

E.g. cat /proc/zoneinfo
    ***Base***                           ***With this patch***
nr_free_pages 3976                         nr_free_pages 3976
nr_zone_inactive_anon 0                    nr_zone_inactive_anon 0
nr_zone_active_anon 0                      nr_zone_active_anon 0
nr_zone_inactive_file 0                    nr_zone_inactive_file 0
nr_zone_active_file 0                      nr_zone_active_file 0
nr_zone_unevictable 0                      nr_zone_unevictable 0
nr_zone_write_pending 0                    nr_zone_write_pending 0
nr_mlock     0                             nr_mlock     0
nr_page_table_pages 0                      nr_page_table_pages 0
nr_kernel_stack 0                          nr_kernel_stack 0
nr_bounce    0                             nr_bounce    0
nr_zspages   0                             nr_zspages   0
numa_hit 0                                *nr_free_cma  0*
numa_miss 0                                numa_hit     0
numa_foreign 0                             numa_miss    0
numa_interleave 0                          numa_foreign 0
numa_local   0                             numa_interleave 0
numa_other   0                             numa_local   0
*nr_free_cma 0*                            numa_other 0
    ...                                        ...
vm stats threshold: 10                     vm stats threshold: 10
    ...                                        ...

The next patch updates the numa stats counter size and threshold.

[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/1503568801-21305-2-git-send-email-kemi.wang@intel.com
Signed-off-by: Kemi Wang <kemi.wang@intel.com>
Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Ying Huang <ying.huang@intel.com>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Tim Chen <tim.c.chen@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-08 18:26:47 -07:00
..
accessibility
acpi Device properties framework updates for v4.14-rc1 2017-09-05 12:50:00 -07:00
amba
android ANDROID: binder: don't queue async transactions to thread. 2017-09-01 09:22:50 +02:00
ata Merge branch 'for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata 2017-09-06 22:41:21 -07:00
atm
auxdisplay
base mm: change the call sites of numa statistics items 2017-09-08 18:26:47 -07:00
bcma
block SCSI misc on 20170907 2017-09-07 21:11:05 -07:00
bluetooth
bus
cdrom
char Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2017-09-06 15:17:17 -07:00
clk clk: sunxi-ng: Provide a default reset hook 2017-08-30 15:03:52 +02:00
clocksource Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-09-04 13:06:34 -07:00
connector
cpufreq ACPI updates for v4.14-rc1 2017-09-05 12:45:03 -07:00
cpuidle Power management updates for v4.14-rc1 2017-09-05 12:19:08 -07:00
crypto dmaengine updates for 4.14-rc1 2017-09-07 14:03:05 -07:00
dax
dca
devfreq PM / devfreq: Fix memory leak when fail to register device 2017-08-28 10:31:08 +09:00
dio
dma dmaengine updates for 4.14-rc1 2017-09-07 14:03:05 -07:00
dma-buf
edac EDAC, mce_amd: Get rid of local var in amd_filter_mce() 2017-08-21 17:59:38 +02:00
eisa
extcon extcon: max77693: Allow MHL attach notifier 2017-08-25 09:32:27 +09:00
firewire
firmware Merge branch 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-09-07 09:42:35 -07:00
fmc drivers/fmc: carrier can program FPGA on registration 2017-08-28 16:24:22 +02:00
fpga
fsi drivers/fsi/scom: Remove reset before every putscom 2017-08-28 17:15:16 +02:00
gpio - New Drivers 2017-09-07 13:51:13 -07:00
gpu - For the randstruct plugin, enable automatic randomization of structures 2017-09-07 20:30:19 -07:00
hid media updates for v4.14-rc1 2017-09-07 12:53:14 -07:00
hsi
hv Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-09-07 09:25:15 -07:00
hwmon Merge branches 'ib-mfd-arm-i2c-4.14', 'ib-mfd-arm-usb-video-4.14', 'ib-mfd-hwmon-4.14', 'ib-mfd-iio-pwm-4.14', 'ib-mfd-input-rtc-4.14', 'ib-mfd-many-4.14' and 'ib-mfd-pinctrl-regulator-4.14' into ibs-for-mfd-merged 2017-09-05 08:45:36 +01:00
hwspinlock
hwtracing stm class / intel_th: Updates for 4.14 2017-08-28 16:58:19 +02:00
i2c i2c: designware: Round down ACPI provided clk to nearest supported clk 2017-08-31 20:27:39 +02:00
ide Merge branch 'for-4.14/block' of git://git.kernel.dk/linux-block 2017-09-07 11:59:42 -07:00
idle Power management updates for v4.14-rc1 2017-09-05 12:19:08 -07:00
iio - New Drivers 2017-09-07 13:51:13 -07:00
infiniband RDMA/netlink: clean up message validity array initializer 2017-09-08 10:17:20 -07:00
input - New Drivers 2017-09-07 13:51:13 -07:00
iommu Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-09-04 12:21:28 -07:00
ipack
irqchip Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-09-04 13:08:27 -07:00
isdn
leds LED updates for 4.14 2017-09-07 14:33:13 -07:00
lightnvm
macintosh powerpc/macintosh: constify wf_sensor_ops structures 2017-09-01 16:42:54 +10:00
mailbox Just behavorial changes to a controller driver: 2017-09-07 13:23:37 -07:00
mcb Char/Misc drivers for 4.14-rc1 2017-09-05 11:08:17 -07:00
md Merge tag 'md/4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md 2017-09-07 12:41:48 -07:00
media media updates for v4.14-rc1 2017-09-07 12:53:14 -07:00
memory mfd: syscon: atmel-smc: Add helper to retrieve register layout 2017-09-05 08:46:01 +01:00
memstick
message scsi: scsi_transport_sas: switch to bsg-lib for SMP passthrough 2017-08-29 21:51:45 -04:00
mfd - New Drivers 2017-09-07 13:51:13 -07:00
misc powerpc updates for 4.14 2017-09-07 10:15:40 -07:00
mmc mmc: renesas_sdhi: Add r8a7743/5 support 2017-09-01 15:31:01 +02:00
mtd mfd: syscon: atmel-smc: Add helper to retrieve register layout 2017-09-05 08:46:01 +01:00
mux mux: make device_type const 2017-08-29 13:46:35 +02:00
net - For the randstruct plugin, enable automatic randomization of structures 2017-09-07 20:30:19 -07:00
nfc
ntb
nubus
nvdimm Merge branch 'for-4.14/block' of git://git.kernel.dk/linux-block 2017-09-07 11:59:42 -07:00
nvme Merge branch 'for-4.14/block' of git://git.kernel.dk/linux-block 2017-09-07 11:59:42 -07:00
nvmem nvmem: core: remove unneeded NULL check 2017-08-28 17:33:23 +02:00
of DeviceTree updates for 4.14: 2017-09-07 14:43:33 -07:00
oprofile
parisc parisc: Fix up devices below a PCI-PCI MegaRAID controller bridge 2017-08-24 18:46:44 +02:00
parport Char/Misc drivers for 4.14-rc1 2017-09-05 11:08:17 -07:00
pci Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-09-07 09:25:15 -07:00
pcmcia
perf
phy - New Drivers 2017-09-07 13:51:13 -07:00
pinctrl - New Drivers 2017-09-07 13:51:13 -07:00
platform Merge branch 'pm-sleep' 2017-09-04 00:06:02 +02:00
pnp
power - New Drivers 2017-09-07 13:51:13 -07:00
powercap
pps
ps3
ptp ptp: make ptp_clock_info const 2017-08-22 11:04:51 -07:00
pwm Merge branches 'ib-mfd-arm-i2c-4.14', 'ib-mfd-arm-usb-video-4.14', 'ib-mfd-hwmon-4.14', 'ib-mfd-iio-pwm-4.14', 'ib-mfd-input-rtc-4.14', 'ib-mfd-many-4.14' and 'ib-mfd-pinctrl-regulator-4.14' into ibs-for-mfd-merged 2017-09-05 08:45:36 +01:00
rapidio
ras
regulator - New Drivers 2017-09-07 13:51:13 -07:00
remoteproc
reset
rpmsg
rtc Merge branches 'ib-mfd-arm-i2c-4.14', 'ib-mfd-arm-usb-video-4.14', 'ib-mfd-hwmon-4.14', 'ib-mfd-iio-pwm-4.14', 'ib-mfd-input-rtc-4.14', 'ib-mfd-many-4.14' and 'ib-mfd-pinctrl-regulator-4.14' into ibs-for-mfd-merged 2017-09-05 08:45:36 +01:00
s390 SCSI misc on 20170907 2017-09-07 21:11:05 -07:00
sbus
scsi SCSI misc on 20170907 2017-09-07 21:11:05 -07:00
sfi
sh
sn
soc soc: ti: knav: Add a NULL pointer check for kdev in knav_pool_create 2017-08-21 09:19:50 +02:00
spi ACPI updates for v4.14-rc1 2017-09-05 12:45:03 -07:00
spmi spmi: pmic-arb: Move the ownership check to irq_chip callback 2017-08-28 13:52:22 +02:00
ssb
staging SCSI misc on 20170907 2017-09-07 21:11:05 -07:00
target Merge branch 'for-4.14/block' of git://git.kernel.dk/linux-block 2017-09-07 11:59:42 -07:00
tc
tee
thermal
thunderbolt ACPI updates for v4.14-rc1 2017-09-05 12:45:03 -07:00
tty powerpc updates for 4.14 2017-09-07 10:15:40 -07:00
uio
usb SCSI misc on 20170907 2017-09-07 21:11:05 -07:00
uwb
vfio
vhost vhost_net: correctly check tx avail during rx busy polling 2017-09-05 14:47:32 -07:00
video - Fix-ups 2017-09-07 13:55:16 -07:00
virt virt: Convert to using %pOF instead of full_name 2017-08-29 08:52:51 -05:00
virtio SCSI misc on 20170907 2017-09-07 21:11:05 -07:00
vlynq
vme
w1 drivers: w1: add hwmon temp support for w1_therm 2017-08-31 18:50:14 +02:00
watchdog mfd: twl: Move header file out of I2C realm 2017-09-04 14:41:02 +01:00
xen xen: fixes and features for 4.14 2017-09-07 10:24:21 -07:00
zorro
Kconfig
Makefile x86/lguest: Remove lguest support 2017-08-24 09:57:28 +02:00