linux/mm
KAMEZAWA Hiroyuki 8c7c6e34a1 memcg: mem+swap controller core
This patch implements per cgroup limit for usage of memory+swap.  However
there are SwapCache, double counting of swap-cache and swap-entry is
avoided.

Mem+Swap controller works as following.
  - memory usage is limited by memory.limit_in_bytes.
  - memory + swap usage is limited by memory.memsw_limit_in_bytes.

This has following benefits.
  - A user can limit total resource usage of mem+swap.

    Without this, because memory resource controller doesn't take care of
    usage of swap, a process can exhaust all the swap (by memory leak.)
    We can avoid this case.

    And Swap is shared resource but it cannot be reclaimed (goes back to memory)
    until it's used. This characteristic can be trouble when the memory
    is divided into some parts by cpuset or memcg.
    Assume group A and group B.
    After some application executes, the system can be..

    Group A -- very large free memory space but occupy 99% of swap.
    Group B -- under memory shortage but cannot use swap...it's nearly full.

    Ability to set appropriate swap limit for each group is required.

Maybe someone wonder "why not swap but mem+swap ?"

  - The global LRU(kswapd) can swap out arbitrary pages. Swap-out means
    to move account from memory to swap...there is no change in usage of
    mem+swap.

    In other words, when we want to limit the usage of swap without affecting
    global LRU, mem+swap limit is better than just limiting swap.

Accounting target information is stored in swap_cgroup which is
per swap entry record.

Charge is done as following.
  map
    - charge  page and memsw.

  unmap
    - uncharge page/memsw if not SwapCache.

  swap-out (__delete_from_swap_cache)
    - uncharge page
    - record mem_cgroup information to swap_cgroup.

  swap-in (do_swap_page)
    - charged as page and memsw.
      record in swap_cgroup is cleared.
      memsw accounting is decremented.

  swap-free (swap_free())
    - if swap entry is freed, memsw is uncharged by PAGE_SIZE.

There are people work under never-swap environments and consider swap as
something bad. For such people, this mem+swap controller extension is just an
overhead.  This overhead is avoided by config or boot option.
(see Kconfig. detail is not in this patch.)

TODO:
 - maybe more optimization can be don in swap-in path. (but not very safe.)
   But we just do simple accounting at this stage.

[nishimura@mxp.nes.nec.co.jp: make resize limit hold mutex]
[hugh@veritas.com: memswap controller core swapcache fixes]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-08 08:31:05 -08:00
..
Kconfig Remove obsolete CONFIG_RESOURCES_64BIT 2009-01-06 15:59:14 -08:00
Makefile shmem: unify regular and tiny shmem 2009-01-06 15:59:08 -08:00
allocpercpu.c mm/allocpercpu.c: make 4 functions static 2008-07-26 12:00:12 -07:00
backing-dev.c Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-01-06 17:10:04 -08:00
bootmem.c bootmem: print request details before BUG_ON(them) 2009-01-06 15:59:10 -08:00
bounce.c bounce: don't rely on a zeroed bio_vec list 2008-12-29 08:29:52 +01:00
dmapool.c
fadvise.c Remove Andrew Morton's old email accounts 2008-10-16 11:21:32 -07:00
failslab.c SLUB: failslab support 2008-12-29 11:27:46 +02:00
filemap.c mm: pagecache gfp flags fix 2009-01-06 15:59:09 -08:00
filemap_xip.c badpage: remove vma from page_remove_rmap 2009-01-06 15:59:07 -08:00
fremap.c badpage: remove vma from page_remove_rmap 2009-01-06 15:59:07 -08:00
highmem.c x86, pat: avoid highmem cache attribute aliasing 2008-08-15 17:22:57 +02:00
hugetlb.c mm: hugetlb: remove redundant `if' operation 2009-01-06 15:59:10 -08:00
internal.h mm: make get_user_pages() interruptible 2009-01-06 15:59:08 -08:00
maccess.c
madvise.c madvise: update function comment of madvise_dontneed 2008-07-30 09:41:45 -07:00
memcontrol.c memcg: mem+swap controller core 2009-01-08 08:31:05 -08:00
memory.c memcg: mem+swap controller core 2009-01-08 08:31:05 -08:00
memory_hotplug.c mm: remove GFP_HIGHUSER_PAGECACHE 2009-01-06 15:59:01 -08:00
mempolicy.c Merge branch 'master' into next 2008-11-14 11:29:12 +11:00
mempool.c
migrate.c memcg: simple migration handling 2009-01-08 08:31:04 -08:00
mincore.c
mlock.c mm: make get_user_pages() interruptible 2009-01-06 15:59:08 -08:00
mm_init.c mm: mminit_loglevel cannot be __meminitdata anymore 2008-08-20 15:40:30 -07:00
mmap.c mm: check for no mmaps in exit_mmap() 2009-01-06 15:59:10 -08:00
mmu_notifier.c mmu-notifiers: core 2008-07-28 16:30:21 -07:00
mmzone.c mm: mark the correct zone as full when scanning zonelists 2008-09-13 14:41:52 -07:00
mprotect.c mm: cleanup: remove #ifdef CONFIG_MIGRATION 2009-01-06 15:59:00 -08:00
mremap.c mm: update my address 2009-01-05 17:44:42 -08:00
msync.c add a vfs_fsync helper 2009-01-05 11:54:28 -05:00
nommu.c inode->i_op is never NULL 2009-01-05 11:54:28 -05:00
oom_kill.c oom: print triggering task's cpuset and mems allowed 2009-01-06 15:58:59 -08:00
page-writeback.c mm: add dirty_background_bytes and dirty_bytes sysctls 2009-01-06 15:59:03 -08:00
page_alloc.c mm: remove CONFIG_OUT_OF_LINE_PFN_TO_PAGE 2009-01-06 15:59:10 -08:00
page_cgroup.c memcg: swap cgroup for remembering usage 2009-01-08 08:31:05 -08:00
page_io.c mm: try_to_free_swap replaces remove_exclusive_swap_page 2009-01-06 15:59:03 -08:00
page_isolation.c memory hotplug: fix page_zone() calculation in test_pages_isolated() 2008-11-06 15:41:19 -08:00
pagewalk.c pagemap: pass mm into pagewalkers 2008-06-12 18:05:41 -07:00
pdflush.c cpumask: convert mm/ 2009-01-01 10:12:29 +10:30
prio_tree.c
quicklist.c mm: size of quicklists shouldn't be proportional to the number of CPUs 2008-09-02 19:21:38 -07:00
readahead.c vmscan: split LRU lists into anon & file sets 2008-10-20 08:50:25 -07:00
rmap.c badpage: remove vma from page_remove_rmap 2009-01-06 15:59:07 -08:00
shmem.c memcg: handle swap caches 2009-01-08 08:31:05 -08:00
shmem_acl.c [PATCH] sanitize ->permission() prototype 2008-07-26 20:53:14 -04:00
slab.c cpumask: convert mm/ 2009-01-01 10:12:29 +10:30
slob.c slob: do not pass the SLAB flags as GFP in kmem_cache_create() 2008-12-15 16:27:06 -08:00
slub.c trivial: fix an -> a typos in documentation and comments 2009-01-06 11:28:07 +01:00
sparse-vmemmap.c vmemmap: warn about page_structs with remote distance 2008-11-06 15:41:19 -08:00
sparse.c meminit section warnings 2008-11-30 10:03:35 -08:00
swap.c mm: try_to_free_swap replaces remove_exclusive_swap_page 2009-01-06 15:59:03 -08:00
swap_state.c memcg: mem+swap controller core 2009-01-08 08:31:05 -08:00
swapfile.c memcg: mem+swap controller core 2009-01-08 08:31:05 -08:00
thrash.c
truncate.c mmap: handle mlocked pages during map, remap, unmap 2008-10-20 08:52:31 -07:00
util.c mm: Make generic weak get_user_pages_fast and EXPORT_GPL it 2008-08-12 17:52:53 +10:00
vmalloc.c mm: vmalloc make lazy unmapping configurable 2009-01-06 15:59:01 -08:00
vmscan.c memcg: mem+swap controller core 2009-01-08 08:31:05 -08:00
vmstat.c cpumask: convert mm/ 2009-01-01 10:12:29 +10:30