linux/mm
Tim Chen 69980e3175 memcg: gix memory accounting scalability in shrink_page_list
I noticed in a multi-process parallel files reading benchmark I ran on a 8
socket machine, throughput slowed down by a factor of 8 when I ran the
benchmark within a cgroup container.  I traced the problem to the
following code path (see below) when we are trying to reclaim memory from
file cache.  The res_counter_uncharge function is called on every page
that's reclaimed and created heavy lock contention.  The patch below
allows the reclaimed pages to be uncharged from the resource counter in
batch and recovered the regression.

Tim

     40.67%           usemem  [kernel.kallsyms]                   [k] _raw_spin_lock
                      |
                      --- _raw_spin_lock
                         |
                         |--92.61%-- res_counter_uncharge
                         |          |
                         |          |--100.00%-- __mem_cgroup_uncharge_common
                         |          |          |
                         |          |          |--100.00%-- mem_cgroup_uncharge_cache_page
                         |          |          |          __remove_mapping
                         |          |          |          shrink_page_list
                         |          |          |          shrink_inactive_list
                         |          |          |          shrink_mem_cgroup_zone
                         |          |          |          shrink_zone
                         |          |          |          do_try_to_free_pages
                         |          |          |          try_to_free_pages
                         |          |          |          __alloc_pages_nodemask
                         |          |          |          alloc_pages_current

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-31 18:42:49 -07:00
..
backing-dev.c mm: prepare for removal of obsolete /proc/sys/vm/nr_pdflush_threads 2012-07-31 18:42:40 -07:00
bootmem.c bootmem: make ___alloc_bootmem_node_nopanic() really nopanic 2012-07-17 16:21:29 -07:00
bounce.c bounce: allow use of bounce pool via config option 2012-07-18 16:40:35 -04:00
cleancache.c ->encode_fh() API change 2012-05-29 23:28:33 -04:00
compaction.c mm: have order > 0 compaction start off where it left 2012-07-31 18:42:43 -07:00
debug-pagealloc.c
dmapool.c
fadvise.c mm, fadvise: don't return -EINVAL when filesystem cannot implement fadvise() 2012-07-31 18:42:42 -07:00
failslab.c
filemap_xip.c fs: introduce inode operation ->update_time 2012-06-01 12:07:25 -04:00
filemap.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-06-01 10:34:35 -07:00
fremap.c
frontswap.c mm/frontswap: cleanup doc and comment error 2012-07-23 11:16:20 -04:00
highmem.c mm: add support for direct_IO to highmem pages 2012-07-31 18:42:47 -07:00
huge_memory.c mm/memcg: apply add/del_page to lruvec 2012-05-29 16:22:28 -07:00
hugetlb_cgroup.c hugetlb/cgroup: remove exclude and wakeup rmdir calls from migrate 2012-07-31 18:42:41 -07:00
hugetlb.c hugetlb/cgroup: assign the page hugetlb cgroup when we move the page to active list. 2012-07-31 18:42:41 -07:00
hwpoison-inject.c memcg: rename config variables 2012-07-31 18:42:43 -07:00
init-mm.c
internal.h netvm: allow skb allocation to use PFMEMALLOC reserves 2012-07-31 18:42:46 -07:00
Kconfig mm: factor out memory isolate functions 2012-07-31 18:42:45 -07:00
Kconfig.debug
kmemcheck.c
kmemleak-test.c
kmemleak.c
ksm.c
maccess.c
madvise.c mm: Hold a file reference in madvise_remove 2012-07-06 10:34:38 -07:00
Makefile mm: factor out memory isolate functions 2012-07-31 18:42:45 -07:00
memblock.c mm/memblock.c:memblock_double_array(): cosmetic cleanups 2012-07-31 18:42:41 -07:00
memcontrol.c memcg: add mem_cgroup_from_css() helper 2012-07-31 18:42:49 -07:00
memory_hotplug.c mm/hotplug: free zone->pageset when a zone becomes empty 2012-07-31 18:42:44 -07:00
memory-failure.c memcg: rename config variables 2012-07-31 18:42:43 -07:00
memory.c mm/memory.c:print_vma_addr(): call up_read(&mm->mmap_sem) directly 2012-07-31 18:42:43 -07:00
mempolicy.c Merge branch 'slab/next' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux 2012-07-30 11:32:24 -07:00
mempool.c
migrate.c mm: memcg: fix compaction/migration failing due to memcg limits 2012-07-31 18:42:48 -07:00
mincore.c
mlock.c
mm_init.c
mmap.c mm: account the total_vm in the vm_stat_account() 2012-07-31 18:42:39 -07:00
mmu_context.c
mmu_notifier.c mm: mmu_notifier: fix freed page still mapped in secondary MMU 2012-07-31 18:42:49 -07:00
mmzone.c memcg: rename config variables 2012-07-31 18:42:43 -07:00
mprotect.c
mremap.c mm: account the total_vm in the vm_stat_account() 2012-07-31 18:42:39 -07:00
msync.c
nobootmem.c memblock: free allocated memblock_reserved_regions later 2012-07-11 16:04:50 -07:00
nommu.c nommu: fix compilation of nommu.c 2012-06-04 17:17:31 -04:00
oom_kill.c mm, memcg: move all oom handling to memcontrol.c 2012-07-31 18:42:45 -07:00
page_alloc.c mm: throttle direct reclaimers if PF_MEMALLOC reserves are low and swap is backed by network storage 2012-07-31 18:42:46 -07:00
page_cgroup.c memcg: rename config variables 2012-07-31 18:42:43 -07:00
page_io.c mm: add support for direct_IO to highmem pages 2012-07-31 18:42:47 -07:00
page_isolation.c memory-hotplug: fix kswapd looping forever problem 2012-07-31 18:42:45 -07:00
page-writeback.c writeback: Fix some comment errors 2012-06-09 19:54:47 +08:00
pagewalk.c mm: fix kernel-doc warnings 2012-06-20 14:39:36 -07:00
percpu-km.c
percpu-vm.c mm: fix kernel-doc warnings 2012-06-20 14:39:36 -07:00
percpu.c
pgtable-generic.c arch/tile: allow building Linux with transparent huge pages enabled 2012-05-25 12:48:21 -04:00
prio_tree.c
process_vm_access.c aio/vfs: cleanup of rw_copy_check_uvector() and compat_rw_copy_check_uvector() 2012-05-31 17:49:32 -07:00
quicklist.c
readahead.c mm: move readahead syscall to mm/readahead.c 2012-05-29 16:22:23 -07:00
rmap.c mm: remove swap token code 2012-05-29 16:22:19 -07:00
shmem.c don't pass nameidata to ->create() 2012-07-14 16:34:47 +04:00
slab_common.c mm: Fix build warning in kmem_cache_create() 2012-07-30 13:15:40 +03:00
slab.c mm: micro-optimise slab to avoid a function call 2012-07-31 18:42:46 -07:00
slab.h mm, sl[aou]b: Use a common mutex definition 2012-07-09 12:13:41 +03:00
slob.c slob: Fix early boot kernel crash 2012-07-12 10:13:22 +03:00
slub.c mm: slub: optimise the SLUB fast path to avoid pfmemalloc checks 2012-07-31 18:42:45 -07:00
sparse-vmemmap.c
sparse.c mm/sparse: remove index_init_lock 2012-07-31 18:42:49 -07:00
swap_state.c mm: add support for a filesystem to activate swap files and use direct_IO for writing swap pages 2012-07-31 18:42:47 -07:00
swap.c mm: add support for direct_IO to highmem pages 2012-07-31 18:42:47 -07:00
swapfile.c mm: swapfile: clean up unuse_pte race handling 2012-07-31 18:42:48 -07:00
truncate.c mm/fs: remove truncate_range 2012-05-29 16:22:23 -07:00
util.c new helper: vm_mmap_pgoff() 2012-06-01 10:37:18 -04:00
vmalloc.c mm: make vb_alloc() more foolproof 2012-07-31 18:42:39 -07:00
vmscan.c memcg: gix memory accounting scalability in shrink_page_list 2012-07-31 18:42:49 -07:00
vmstat.c mm: account for the number of times direct reclaimers get throttled 2012-07-31 18:42:46 -07:00