linux/kernel
Tejun Heo 2e91fa7f6d cgroup: keep zombies associated with their original cgroups
cgroup_exit() is called when a task exits and disassociates the
exiting task from its cgroups and half-attach it to the root cgroup.
This is unnecessary and undesirable.

No controller actually needs an exiting task to be disassociated with
non-root cgroups.  Both cpu and perf_event controllers update the
association to the root cgroup from their exit callbacks just to keep
consistent with the cgroup core behavior.

Also, this disassociation makes it difficult to track resources held
by zombies or determine where the zombies came from.  Currently, pids
controller is completely broken as it uncharges on exit and zombies
always escape the resource restriction.  With cgroup association being
reset on exit, fixing it is pretty painful.

There's no reason to reset cgroup membership on exit.  The zombie can
be removed from its css_set so that it doesn't show up on
"cgroup.procs" and thus can't be migrated or interfere with cgroup
removal.  It can still pin and point to the css_set so that its cgroup
membership is maintained.  This patch makes cgroup core keep zombies
associated with their cgroups at the time of exit.

* Previous patches decoupled populated_cnt tracking from css_set
  lifetime, so a dying task can be simply unlinked from its css_set
  while pinning and pointing to the css_set.  This keeps css_set
  association from task side alive while hiding it from "cgroup.procs"
  and populated_cnt tracking.  The css_set reference is dropped when
  the task_struct is freed.

* ->exit() callback no longer needs the css arguments as the
  associated css never changes once PF_EXITING is set.  Removed.

* cpu and perf_events controllers no longer need ->exit() callbacks.
  There's no reason to explicitly switch away on exit.  The final
  schedule out is enough.  The callbacks are removed.

* On traditional hierarchies, nothing changes.  "/proc/PID/cgroup"
  still reports "/" for all zombies.  On the default hierarchy,
  "/proc/PID/cgroup" keeps reporting the cgroup that the task belonged
  to at the time of exit.  If the cgroup gets removed before the task
  is reaped, " (deleted)" is appended.

v2: Build brekage due to missing dummy cgroup_free() when
    !CONFIG_CGROUP fixed.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
2015-10-15 16:41:53 -04:00
..
bpf bpf: fix out of bounds access in verifier log 2015-09-09 14:11:55 -07:00
configs kconfig: add xenconfig defconfig helper 2015-06-16 11:04:29 +01:00
debug debug: prevent entering debug mode on panic/exception. 2015-02-19 12:39:03 -06:00
events cgroup: keep zombies associated with their original cgroups 2015-10-15 16:41:53 -04:00
gcov gcov: add support for GCC 5.1 2015-06-30 19:44:57 -07:00
irq Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-09-01 14:33:35 -07:00
livepatch livepatch: Improve error handling in klp_disable_func() 2015-07-14 22:48:06 +02:00
locking Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-09-05 20:34:28 -07:00
power Merge branch 'for-4.3/core' of git://git.kernel.dk/linux-block 2015-09-02 13:10:25 -07:00
printk kexec: split kexec_load syscall from kexec core code 2015-09-10 13:29:01 -07:00
rcu rcu,locking: Privatize smp_mb__after_unlock_lock() 2015-08-04 08:49:21 -07:00
sched cgroup: keep zombies associated with their original cgroups 2015-10-15 16:41:53 -04:00
time Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-09-01 14:04:50 -07:00
trace Mostly this is just clean ups and micro optimizations. 2015-09-08 14:04:14 -07:00
.gitignore
Kconfig.freezer
Kconfig.hz
Kconfig.locks locking/qrwlock: Rename QUEUE_RWLOCK to QUEUED_RWLOCKS 2015-05-12 09:46:00 +02:00
Kconfig.preempt
Makefile sys_membarrier(): system-wide memory barrier (generic, x86) 2015-09-11 15:21:34 -07:00
acct.c acct: check FMODE_CAN_WRITE 2015-04-11 22:27:55 -04:00
async.c kernel/async.c: switch to pr_foo() 2014-10-09 22:26:04 -04:00
audit.c Merge branch 'upstream' of git://git.infradead.org/users/pcmoore/audit 2015-09-08 13:34:59 -07:00
audit.h Merge branch 'upstream' of git://git.infradead.org/users/pcmoore/audit 2015-09-08 13:34:59 -07:00
audit_fsnotify.c audit: clean simple fsnotify implementation 2015-08-06 16:14:53 -04:00
audit_tree.c Merge branch 'upstream' of git://git.infradead.org/users/pcmoore/audit 2015-09-08 13:34:59 -07:00
audit_watch.c Merge branch 'upstream' of git://git.infradead.org/users/pcmoore/audit 2015-09-08 13:34:59 -07:00
auditfilter.c audit: implement audit by executable 2015-08-06 16:17:25 -04:00
auditsc.c Merge branch 'upstream' of git://git.infradead.org/users/pcmoore/audit 2015-09-08 13:34:59 -07:00
backtracetest.c
bounds.c page-cgroup: get rid of NR_PCG_FLAGS 2014-08-08 15:57:18 -07:00
capability.c kernel: conditionally support non-root users, groups and capabilities 2015-04-15 16:35:22 -07:00
cgroup.c cgroup: keep zombies associated with their original cgroups 2015-10-15 16:41:53 -04:00
cgroup_freezer.c cgroup: allow a cgroup subsystem to reject a fork 2015-07-14 17:29:23 -04:00
cgroup_pids.c cgroup: keep zombies associated with their original cgroups 2015-10-15 16:41:53 -04:00
compat.c compat: cleanup coding in compat_get_bitmap() and compat_put_bitmap() 2015-06-04 23:57:18 +02:00
configs.c
context_tracking.c context_tracking: Inherit TIF_NOHZ through forks instead of context switches 2015-05-07 12:02:51 +02:00
cpu.c Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-08-31 20:26:22 -07:00
cpu_pm.c kernel/cpu_pm: fix cpu_cluster_pm_exit comment 2015-09-03 02:42:20 +02:00
cpuset.c cgroup: replace cgroup_has_tasks() with cgroup_is_populated() 2015-10-15 16:41:50 -04:00
crash_dump.c crash_dump: Make is_kdump_kernel() accessible from modules 2014-08-25 15:42:19 -07:00
cred.c kernel/cred.c: remove unnecessary kdebug atomic reads 2015-09-10 13:29:01 -07:00
delayacct.c delayacct: Remove braindamaged type conversions 2014-07-23 10:18:06 -07:00
dma.c
elfcore.c
exec_domain.c Remove rest of exec domains. 2015-04-12 21:03:31 +02:00
exit.c kernel: exit: fix typo in comment 2015-08-07 13:59:49 +02:00
extable.c kernel/extable.c: remove duplicated include 2015-09-10 13:29:01 -07:00
fork.c cgroup: keep zombies associated with their original cgroups 2015-10-15 16:41:53 -04:00
freezer.c freezer: remove obsolete comments in __thaw_task() 2014-10-21 23:44:20 +02:00
futex.c futex: Make should_fail_futex() static 2015-07-20 21:43:54 +02:00
futex_compat.c
groups.c kernel: conditionally support non-root users, groups and capabilities 2015-04-15 16:35:22 -07:00
hung_task.c kernel/hung_task.c: change hung_task.c to use for_each_process_thread() 2015-04-15 16:35:22 -07:00
irq_work.c percpu: Convert remaining __get_cpu_var uses in 3.18-rcX 2014-10-29 11:18:18 -04:00
jump_label.c locking/static_keys: Add selftest 2015-08-03 11:34:16 +02:00
kallsyms.c kernel/kallsyms.c: use __seq_open_private() 2014-10-14 02:18:16 +02:00
kcmp.c kcmp: fix standard comparison bug 2014-09-10 15:42:12 -07:00
kexec.c kexec: split kexec_load syscall from kexec core code 2015-09-10 13:29:01 -07:00
kexec_core.c kexec: export KERNEL_IMAGE_SIZE to vmcoreinfo 2015-09-10 13:29:01 -07:00
kexec_file.c kexec: split kexec_file syscall code to kexec_file.c 2015-09-10 13:29:01 -07:00
kexec_internal.h kexec: split kexec_file syscall code to kexec_file.c 2015-09-10 13:29:01 -07:00
kmod.c kmod: handle UMH_WAIT_PROC from system unbound workqueue 2015-09-10 13:29:01 -07:00
kprobes.c perf/x86/hw_breakpoints: Disallow kernel breakpoints unless kprobe-safe 2015-08-04 10:16:54 +02:00
ksysfs.c kexec: split kexec_load syscall from kexec core code 2015-09-10 13:29:01 -07:00
kthread.c kernel/kthread.c:kthread_create_on_node(): clarify documentation 2015-09-04 16:54:41 -07:00
latencytop.c
membarrier.c sys_membarrier(): system-wide memory barrier (generic, x86) 2015-09-11 15:21:34 -07:00
memremap.c add devm_memremap_pages 2015-08-27 19:40:58 -04:00
module-internal.h
module.c module: weaken locking assertion for oops path. 2015-07-29 06:13:22 +09:30
module_signing.c PKCS#7: Appropriately restrict authenticated attributes and content type 2015-08-12 17:01:01 +01:00
notifier.c Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-09-01 08:40:25 -07:00
nsproxy.c bury struct proc_ns in fs/proc 2014-12-04 14:34:54 -05:00
padata.c padata: use %*pb[l] to print bitmaps including cpumasks and nodemasks 2015-02-13 21:21:38 -08:00
panic.c kernel/panic/kexec: fix "crash_kexec_post_notifiers" option issue in oops path 2015-06-30 19:44:57 -07:00
params.c Minor merge needed, due to function move. 2015-07-01 10:49:25 -07:00
pid.c rcu: Rename rcu_lockdep_assert() to RCU_LOCKDEP_WARN() 2015-07-22 15:27:32 -07:00
pid_namespace.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-12-16 15:53:03 -08:00
profile.c mm: rename alloc_pages_exact_node() to __alloc_pages_node() 2015-09-08 15:35:28 -07:00
ptrace.c seccomp: add ptrace options for suspend/resume 2015-07-15 11:52:52 -07:00
range.c kernel: avoid overflow in cmp_range 2015-01-17 10:02:23 +13:00
reboot.c kexec: split kexec_load syscall from kexec core code 2015-09-10 13:29:01 -07:00
relay.c kernel/relay.c: use kvfree() in relay_free_page_array() 2015-06-30 19:44:59 -07:00
resource.c mm: enhance region_is_ram() to region_intersects() 2015-08-10 23:07:05 -04:00
seccomp.c Merge tag 'seccomp-next' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux into next 2015-07-20 17:19:19 +10:00
signal.c signal: fix information leak in copy_siginfo_to_user 2015-08-07 04:39:40 +03:00
smp.c smp: Fix error case handling in smp_call_function_*() 2015-04-19 13:19:23 -07:00
smpboot.c smpboot: allow passing the cpumask on per-cpu thread registration 2015-09-04 16:54:41 -07:00
smpboot.h
softirq.c Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-02-09 15:24:03 -08:00
stacktrace.c stacktrace: introduce snprint_stack_trace for buffer output 2014-12-13 12:42:48 -08:00
stop_machine.c stop_machine: Remove cpu_stop_work's from list in cpu_stop_park() 2015-08-03 12:21:28 +02:00
sys.c vfs: Commit to never having exectuables on proc and sysfs. 2015-07-10 10:39:25 -05:00
sys_ni.c sys_membarrier(): system-wide memory barrier (generic, x86) 2015-09-11 15:21:34 -07:00
sysctl.c sysctl: fix int -> unsigned long assignments in INT_MIN case 2015-09-10 13:29:01 -07:00
sysctl_binary.c kernel: add panic_on_warn 2014-12-10 17:41:10 -08:00
task_work.c task_work: remove fifo ordering guarantee 2015-09-05 13:46:58 -07:00
taskstats.c netlink: make nlmsg_end() and genlmsg_end() void 2015-01-18 01:03:45 -05:00
test_kprobes.c kernel/test_kprobes.c: use current logging functions 2014-08-08 15:57:18 -07:00
torture.c rcu: Convert ACCESS_ONCE() to READ_ONCE() and WRITE_ONCE() 2015-05-27 12:56:15 -07:00
tracepoint.c
tsacct.c sched: Make task->start_time nanoseconds based 2014-07-23 10:18:05 -07:00
uid16.c groups: Consolidate the setgroups permission checks 2014-12-05 17:19:27 -06:00
up.c
user-return-notifier.c scheduler: Replace __get_cpu_var with this_cpu_ptr 2014-08-26 13:45:45 -04:00
user.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2014-12-17 12:31:40 -08:00
user_namespace.c capabilities: ambient capabilities 2015-09-04 16:54:41 -07:00
utsname.c copy address of proc_ns_ops into ns_common 2014-12-04 14:34:47 -05:00
utsname_sysctl.c
watchdog.c watchdog: rename watchdog_suspend() and watchdog_resume() 2015-09-04 16:54:41 -07:00
workqueue.c Merge branch 'for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq 2015-09-02 08:02:20 -07:00
workqueue_internal.h