linux/mm
KAMEZAWA Hiroyuki f817ed4853 memcg: move all acccounting to parent at rmdir()
This patch provides a function to move account information of a page
between mem_cgroups and rewrite force_empty to make use of this.

This moving of page_cgroup is done under
 - lru_lock of source/destination mem_cgroup is held.
 - lock_page_cgroup() is held.

Then, a routine which touches pc->mem_cgroup without lock_page_cgroup()
should confirm pc->mem_cgroup is still valid or not.  Typical code can be
following.

(while page is not under lock_page())
	mem = pc->mem_cgroup;
	mz = page_cgroup_zoneinfo(pc)
	spin_lock_irqsave(&mz->lru_lock);
	if (pc->mem_cgroup == mem)
		...../* some list handling */
	spin_unlock_irqrestore(&mz->lru_lock);

Of course, better way is
	lock_page_cgroup(pc);
	....
	unlock_page_cgroup(pc);

But you should confirm the nest of lock and avoid deadlock.

If you treats page_cgroup from mem_cgroup's LRU under mz->lru_lock,
you don't have to worry about what pc->mem_cgroup points to.
moved pages are added to head of lru, not to tail.

Expected users of this routine is:
  - force_empty (rmdir)
  - moving tasks between cgroup (for moving account information.)
  - hierarchy (maybe useful.)

force_empty(rmdir) uses this move_account and move pages to its parent.
This "move" will not cause OOM (I added "oom" parameter to try_charge().)

If the parent is busy (not enough memory), force_empty calls try_to_free_page()
and reduce usage.

Purpose of this behavior is
  - Fix "forget all" behavior of force_empty and avoid leak of accounting.
  - By "moving first, free if necessary", keep pages on memory as much as
    possible.

Adding a switch to change behavior of force_empty to
  - free first, move if necessary
  - free all, if there is mlocked/busy pages, return -EBUSY.
is under consideration. (I'll add if someone requtests.)

This patch also removes memory.force_empty file, a brutal debug-only interface.

Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Tested-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-08 08:31:04 -08:00
..
Kconfig Remove obsolete CONFIG_RESOURCES_64BIT 2009-01-06 15:59:14 -08:00
Makefile shmem: unify regular and tiny shmem 2009-01-06 15:59:08 -08:00
allocpercpu.c mm/allocpercpu.c: make 4 functions static 2008-07-26 12:00:12 -07:00
backing-dev.c Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-01-06 17:10:04 -08:00
bootmem.c bootmem: print request details before BUG_ON(them) 2009-01-06 15:59:10 -08:00
bounce.c bounce: don't rely on a zeroed bio_vec list 2008-12-29 08:29:52 +01:00
dmapool.c dmapool: enable debugging for CONFIG_SLUB_DEBUG_ON too 2008-04-28 08:58:20 -07:00
fadvise.c Remove Andrew Morton's old email accounts 2008-10-16 11:21:32 -07:00
failslab.c SLUB: failslab support 2008-12-29 11:27:46 +02:00
filemap.c mm: pagecache gfp flags fix 2009-01-06 15:59:09 -08:00
filemap_xip.c badpage: remove vma from page_remove_rmap 2009-01-06 15:59:07 -08:00
fremap.c badpage: remove vma from page_remove_rmap 2009-01-06 15:59:07 -08:00
highmem.c x86, pat: avoid highmem cache attribute aliasing 2008-08-15 17:22:57 +02:00
hugetlb.c mm: hugetlb: remove redundant `if' operation 2009-01-06 15:59:10 -08:00
internal.h mm: make get_user_pages() interruptible 2009-01-06 15:59:08 -08:00
maccess.c kgdb: fix optional arch functions and probe_kernel_* 2008-04-17 20:05:39 +02:00
madvise.c madvise: update function comment of madvise_dontneed 2008-07-30 09:41:45 -07:00
memcontrol.c memcg: move all acccounting to parent at rmdir() 2009-01-08 08:31:04 -08:00
memory.c memcg: fix gfp_mask of callers of charge 2009-01-08 08:31:04 -08:00
memory_hotplug.c mm: remove GFP_HIGHUSER_PAGECACHE 2009-01-06 15:59:01 -08:00
mempolicy.c Merge branch 'master' into next 2008-11-14 11:29:12 +11:00
mempool.c spelling fixes: mm/ 2007-10-20 01:27:18 +02:00
migrate.c memcg: simple migration handling 2009-01-08 08:31:04 -08:00
mincore.c mm: remove nopage 2008-04-28 08:58:18 -07:00
mlock.c mm: make get_user_pages() interruptible 2009-01-06 15:59:08 -08:00
mm_init.c mm: mminit_loglevel cannot be __meminitdata anymore 2008-08-20 15:40:30 -07:00
mmap.c mm: check for no mmaps in exit_mmap() 2009-01-06 15:59:10 -08:00
mmu_notifier.c mmu-notifiers: core 2008-07-28 16:30:21 -07:00
mmzone.c mm: mark the correct zone as full when scanning zonelists 2008-09-13 14:41:52 -07:00
mprotect.c mm: cleanup: remove #ifdef CONFIG_MIGRATION 2009-01-06 15:59:00 -08:00
mremap.c mm: update my address 2009-01-05 17:44:42 -08:00
msync.c add a vfs_fsync helper 2009-01-05 11:54:28 -05:00
nommu.c inode->i_op is never NULL 2009-01-05 11:54:28 -05:00
oom_kill.c oom: print triggering task's cpuset and mems allowed 2009-01-06 15:58:59 -08:00
page-writeback.c mm: add dirty_background_bytes and dirty_bytes sysctls 2009-01-06 15:59:03 -08:00
page_alloc.c mm: remove CONFIG_OUT_OF_LINE_PFN_TO_PAGE 2009-01-06 15:59:10 -08:00
page_cgroup.c memcg: do not recalculate section unnecessarily in init_section_page_cgroup 2009-01-08 08:31:04 -08:00
page_io.c mm: try_to_free_swap replaces remove_exclusive_swap_page 2009-01-06 15:59:03 -08:00
page_isolation.c memory hotplug: fix page_zone() calculation in test_pages_isolated() 2008-11-06 15:41:19 -08:00
pagewalk.c pagemap: pass mm into pagewalkers 2008-06-12 18:05:41 -07:00
pdflush.c cpumask: convert mm/ 2009-01-01 10:12:29 +10:30
prio_tree.c spelling fixes: mm/ 2007-10-20 01:27:18 +02:00
quicklist.c mm: size of quicklists shouldn't be proportional to the number of CPUs 2008-09-02 19:21:38 -07:00
readahead.c vmscan: split LRU lists into anon & file sets 2008-10-20 08:50:25 -07:00
rmap.c badpage: remove vma from page_remove_rmap 2009-01-06 15:59:07 -08:00
shmem.c memcg: fix gfp_mask of callers of charge 2009-01-08 08:31:04 -08:00
shmem_acl.c [PATCH] sanitize ->permission() prototype 2008-07-26 20:53:14 -04:00
slab.c cpumask: convert mm/ 2009-01-01 10:12:29 +10:30
slob.c slob: do not pass the SLAB flags as GFP in kmem_cache_create() 2008-12-15 16:27:06 -08:00
slub.c trivial: fix an -> a typos in documentation and comments 2009-01-06 11:28:07 +01:00
sparse-vmemmap.c vmemmap: warn about page_structs with remote distance 2008-11-06 15:41:19 -08:00
sparse.c meminit section warnings 2008-11-30 10:03:35 -08:00
swap.c mm: try_to_free_swap replaces remove_exclusive_swap_page 2009-01-06 15:59:03 -08:00
swap_state.c mm: remove gfp_mask from add_to_swap 2009-01-06 15:59:04 -08:00
swapfile.c memcg: fix gfp_mask of callers of charge 2009-01-08 08:31:04 -08:00
thrash.c
truncate.c mmap: handle mlocked pages during map, remap, unmap 2008-10-20 08:52:31 -07:00
util.c mm: Make generic weak get_user_pages_fast and EXPORT_GPL it 2008-08-12 17:52:53 +10:00
vmalloc.c mm: vmalloc make lazy unmapping configurable 2009-01-06 15:59:01 -08:00
vmscan.c mm: stop kswapd's infinite loop at high order allocation 2009-01-06 15:59:10 -08:00
vmstat.c cpumask: convert mm/ 2009-01-01 10:12:29 +10:30