linux/drivers/base
Kemi Wang 3a321d2a3d mm: change the call sites of numa statistics items
Patch series "Separate NUMA statistics from zone statistics", v2.

Each page allocation updates a set of per-zone statistics with a call to
zone_statistics().  As discussed in 2017 MM summit, these are a
substantial source of overhead in the page allocator and are very rarely
consumed.  This significant overhead in cache bouncing caused by zone
counters (NUMA associated counters) update in parallel in multi-threaded
page allocation (pointed out by Dave Hansen).

A link to the MM summit slides:
  http://people.netfilter.org/hawk/presentations/MM-summit2017/MM-summit2017-JesperBrouer.pdf

To mitigate this overhead, this patchset separates NUMA statistics from
zone statistics framework, and update NUMA counter threshold to a fixed
size of MAX_U16 - 2, as a small threshold greatly increases the update
frequency of the global counter from local per cpu counter (suggested by
Ying Huang).  The rationality is that these statistics counters don't
need to be read often, unlike other VM counters, so it's not a problem
to use a large threshold and make readers more expensive.

With this patchset, we see 31.3% drop of CPU cycles(537-->369, see
below) for per single page allocation and reclaim on Jesper's
page_bench03 benchmark.  Meanwhile, this patchset keeps the same style
of virtual memory statistics with little end-user-visible effects (only
move the numa stats to show behind zone page stats, see the first patch
for details).

I did an experiment of single page allocation and reclaim concurrently
using Jesper's page_bench03 benchmark on a 2-Socket Broadwell-based
server (88 processors with 126G memory) with different size of threshold
of pcp counter.

Benchmark provided by Jesper D Brouer(increase loop times to 10000000):
  https://github.com/netoptimizer/prototype-kernel/tree/master/kernel/mm/bench

   Threshold   CPU cycles    Throughput(88 threads)
      32        799         241760478
      64        640         301628829
      125       537         358906028 <==> system by default
      256       468         412397590
      512       428         450550704
      4096      399         482520943
      20000     394         489009617
      30000     395         488017817
      65533     369(-31.3%) 521661345(+45.3%) <==> with this patchset
      N/A       342(-36.3%) 562900157(+56.8%) <==> disable zone_statistics

This patch (of 3):

In this patch, NUMA statistics is separated from zone statistics
framework, all the call sites of NUMA stats are changed to use
numa-stats-specific functions, it does not have any functionality change
except that the number of NUMA stats is shown behind zone page stats
when users *read* the zone info.

E.g. cat /proc/zoneinfo
    ***Base***                           ***With this patch***
nr_free_pages 3976                         nr_free_pages 3976
nr_zone_inactive_anon 0                    nr_zone_inactive_anon 0
nr_zone_active_anon 0                      nr_zone_active_anon 0
nr_zone_inactive_file 0                    nr_zone_inactive_file 0
nr_zone_active_file 0                      nr_zone_active_file 0
nr_zone_unevictable 0                      nr_zone_unevictable 0
nr_zone_write_pending 0                    nr_zone_write_pending 0
nr_mlock     0                             nr_mlock     0
nr_page_table_pages 0                      nr_page_table_pages 0
nr_kernel_stack 0                          nr_kernel_stack 0
nr_bounce    0                             nr_bounce    0
nr_zspages   0                             nr_zspages   0
numa_hit 0                                *nr_free_cma  0*
numa_miss 0                                numa_hit     0
numa_foreign 0                             numa_miss    0
numa_interleave 0                          numa_foreign 0
numa_local   0                             numa_interleave 0
numa_other   0                             numa_local   0
*nr_free_cma 0*                            numa_other 0
    ...                                        ...
vm stats threshold: 10                     vm stats threshold: 10
    ...                                        ...

The next patch updates the numa stats counter size and threshold.

[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/1503568801-21305-2-git-send-email-kemi.wang@intel.com
Signed-off-by: Kemi Wang <kemi.wang@intel.com>
Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Ying Huang <ying.huang@intel.com>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Tim Chen <tim.c.chen@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-08 18:26:47 -07:00
..
power Merge branch 'pm-sleep' 2017-09-04 00:06:02 +02:00
regmap regmap: regmap-w1: Fix build troubles 2017-07-17 15:53:00 +01:00
test driver core: test_async: fix up typo found by 0-day 2016-11-29 22:06:42 +01:00
Kconfig arm, arm64: factorize common cpu capacity default code 2017-06-03 19:10:09 +09:00
Makefile arm, arm64: factorize common cpu capacity default code 2017-06-03 19:10:09 +09:00
arch_topology.c base: Convert to using %pOF instead of full_name 2017-08-28 17:47:55 +02:00
attribute_container.c attribute_container: Fix typo 2016-08-31 15:13:56 +02:00
base.h driver core: make device_{add|remove}_groups() public 2017-07-22 11:59:23 +02:00
bus.c driver core: bus: Fix a potential double free 2017-08-31 18:57:30 +02:00
cacheinfo.c Merge branch 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-12-22 09:25:45 -08:00
class.c driver core: remove class_attrs from struct class 2017-06-09 10:41:00 +02:00
component.c Merge 4.5-rc4 into driver-core-next 2016-02-14 14:29:55 -08:00
container.c ACPI / hotplug / driver core: Handle containers in a special way 2013-12-29 15:25:48 +01:00
core.c Do not disable driver and bus shutdown hook when class shutdown hook is set. 2017-08-28 18:02:46 +02:00
cpu.c CPU / PM: expose pm_qos_resume_latency for CPUs 2017-01-30 11:05:29 +01:00
dd.c initcall_debug: add deferred probe times 2017-08-03 17:48:49 -07:00
devcoredump.c driver core: devcoredump: convert to use class_groups 2016-11-29 21:12:12 +01:00
devres.c devres: add devm_alloc_percpu() 2016-11-15 22:34:25 -05:00
devtmpfs.c statx: Add a system call to make enhanced file info available 2017-03-02 20:51:15 -05:00
dma-coherent.c dma-coherent: introduce interface for default DMA pool 2017-07-20 16:09:10 +02:00
dma-contiguous.c cma: Store a name in the cma structure 2017-04-18 20:41:12 +02:00
dma-mapping.c dma-coherent: introduce interface for default DMA pool 2017-07-20 16:09:10 +02:00
driver.c driver core: add missing blank line after declaration 2015-03-25 14:36:30 +01:00
firmware.c
firmware_class.c Merge 4.13-rc5 into driver-core-next 2017-08-14 13:33:39 -07:00
hypervisor.c
init.c drivers: of/base: move of_init to driver_init 2015-05-26 19:55:56 -07:00
isa.c isa: Call isa_bus_init before dependent ISA bus drivers register 2016-06-17 20:47:11 -07:00
map.c drivers: base: map: Use kmalloc_array instead of kmalloc 2015-03-25 14:35:08 +01:00
memory.c mm, memory_hotplug: remove zone restrictions 2017-09-06 17:27:25 -07:00
module.c base: make module_create_drivers_dir race-free 2016-06-15 19:21:31 -07:00
node.c mm: change the call sites of numa statistics items 2017-09-08 18:26:47 -07:00
pinctrl.c driver core: fix automatic pinctrl management 2017-06-13 11:07:32 +02:00
platform-msi.c irqchip/MSI: Use irq_domain_update_bus_token instead of an open coded access 2017-06-22 18:29:17 +02:00
platform.c This is the bulk of GPIO changes for the v4.13 series: 2017-07-07 12:40:27 -07:00
property.c device property: Introduce fwnode_property_get_reference_args 2017-07-22 00:04:51 +02:00
soc.c base: soc: Allow early registration of a single SoC device 2017-03-29 21:43:26 +02:00
syscore.c genirq: Simplify wakeup mechanism 2014-09-01 13:48:59 +02:00
topology.c base: topology: constify attribute_group structures. 2017-08-28 17:49:22 +02:00
transport_class.c