linux/kernel
Pavel Emelyanov cf7b708c8d Make access to task's nsproxy lighter
When someone wants to deal with some other task's namespaces it has to lock
the task and then get the desired namespace if one exists.  This is
slow on read-only paths and may be impossible in some cases.

E.g.  Oleg recently noticed a race between unshare() and the (sent for
review in cgroups) pid namespaces: when the task notifies the parent it
has to know the parent's namespace, but taking the task_lock() is
impossible there because the code runs under the write-locked tasklist lock.

On the other hand, switching the namespace on a task (daemonize) and releasing
the namespace (after the last task exits) are rather rare operations, and we
can sacrifice their speed to solve the issues above.

The access to other task namespaces is proposed to be performed
like this:

     rcu_read_lock();
     nsproxy = task_nsproxy(tsk);
     if (nsproxy != NULL) {
             /*
              * work with the namespaces here
              * e.g. get the reference on one of them
              */
     } /*
        * NULL task_nsproxy() means that this task is
        * almost dead (zombie)
        */
     rcu_read_unlock();
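
The reader pattern above, together with the slower switch/release side
described earlier, can be modelled in userspace roughly as follows.  This is
only a single-threaded sketch under stated assumptions: the model_rcu_*()
helpers are stand-in stubs rather than the kernel RCU API, and the struct
layouts, read_ns_id() and switch_ns() are hypothetical names made up for
illustration, not code from this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical layouts; the real kernel structs differ. */
struct nsproxy { int count; int id; };
struct task { struct nsproxy *nsproxy; };

static int read_depth;     /* nesting of model read-side sections */
static int grace_periods;  /* how many times the writer had to wait */

/* Stand-in stubs (assumption): they only track state for this model. */
static void model_rcu_read_lock(void)   { read_depth++; }
static void model_rcu_read_unlock(void) { read_depth--; }
static void model_synchronize_rcu(void)
{
	/* In this single-threaded model no reader may still be active. */
	assert(read_depth == 0);
	grace_periods++;
}

/* Reader: no task lock taken; returns -1 if the task has no nsproxy. */
static int read_ns_id(struct task *tsk)
{
	struct nsproxy *ns;
	int id = -1;

	model_rcu_read_lock();
	ns = tsk->nsproxy;	/* kernel code would use an RCU dereference */
	if (ns != NULL)
		id = ns->id;	/* work with the namespaces here */
	model_rcu_read_unlock();
	return id;
}

/*
 * Writer: publish the new pointer first, then wait for readers before
 * releasing the old nsproxy.  Returns 1 if the old one was released.
 */
static int switch_ns(struct task *tsk, struct nsproxy *new_ns)
{
	struct nsproxy *old = tsk->nsproxy;

	if (old == new_ns)
		return 0;
	if (new_ns != NULL)
		new_ns->count++;
	tsk->nsproxy = new_ns;	/* kernel code would use an RCU assignment */
	if (old != NULL && --old->count == 0) {
		model_synchronize_rcu();	/* the rare, slow step */
		old->id = -1;			/* stands in for freeing */
		return 1;
	}
	return 0;
}
```

In this model the reader never blocks, while the cost of waiting falls on
switch_ns(), matching the trade-off described above.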

This patch has passed review by Eric and Oleg :) and has,
of course, been tested.

[clg@fr.ibm.com: fix unshare()]
[ebiederm@xmission.com: Update get_net_ns_by_pid]
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19 11:53:37 -07:00
irq Compile handle_percpu_irq even for uniprocessor kernels 2007-10-17 08:43:00 -07:00
power Hibernation: Enter platform hibernation state in a consistent way 2007-10-18 14:37:20 -07:00
time x86: C1E late detection fix. Really switch off lapic timer 2007-10-17 20:15:13 +02:00
.gitignore
acct.c whitespace fixes: process accounting 2007-10-18 14:37:24 -07:00
audit.c whitespace fixes: system auditing 2007-10-18 14:37:25 -07:00
audit.h
auditfilter.c whitespace fixes: audit filtering 2007-10-18 14:37:24 -07:00
auditsc.c whitespace fixes: syscall auditing 2007-10-18 14:37:25 -07:00
capability.c pid namespaces: define is_global_init() and is_container_init() 2007-10-19 11:53:37 -07:00
cgroup_debug.c Task Control Groups: simple task cgroup debug info subsystem 2007-10-19 11:53:36 -07:00
cgroup.c Add cgroupstats 2007-10-19 11:53:36 -07:00
compat.c Merge ssh://master.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-hrt 2007-10-18 15:12:41 -07:00
configs.c
cpu_acct.c Task Control Groups: example CPU accounting subsystem 2007-10-19 11:53:36 -07:00
cpu.c cpu hotplug: cpu: deliver CPU_UP_CANCELED only to NOTIFY_OKed callbacks with CPU_UP_PREPARE 2007-10-18 14:37:21 -07:00
cpuset.c Task Control Groups: make cpusets a client of cgroups 2007-10-19 11:53:36 -07:00
delayacct.c Add scaled time to taskstats based process accounting 2007-10-18 14:37:28 -07:00
dma.c whitespace fixes: DMA channel allocator 2007-10-18 14:37:24 -07:00
exec_domain.c whitespace fixes: execution domains 2007-10-18 14:37:26 -07:00
exit.c Make access to task's nsproxy lighter 2007-10-19 11:53:37 -07:00
extable.c
fork.c Make access to task's nsproxy lighter 2007-10-19 11:53:37 -07:00
futex_compat.c
futex.c sparse pointer use of zero as null 2007-10-18 14:37:31 -07:00
hrtimer.c hrtimer: Rework hrtimer_nanosleep to make sys_compat_nanosleep easier 2007-10-18 22:54:18 +02:00
itimer.c whitespace fixes: interval timers 2007-10-18 14:37:26 -07:00
kallsyms.c
Kconfig.hz
Kconfig.preempt Move PREEMPT_NOTIFIERS into an always-included Kconfig 2007-10-17 08:42:55 -07:00
kexec.c pid namespaces: define is_global_init() and is_container_init() 2007-10-19 11:53:37 -07:00
kfifo.c
kmod.c
kprobes.c
ksysfs.c add-vmcore: cleanup the coding style according to Andrew's comments 2007-10-17 08:42:54 -07:00
kthread.c
latency.c
lockdep_internals.h
lockdep_proc.c
lockdep.c
Makefile cgroups: implement namespace tracking subsystem 2007-10-19 11:53:37 -07:00
module.c whitespace fixes: module loading 2007-10-18 14:37:25 -07:00
mutex-debug.c
mutex-debug.h
mutex.c
mutex.h
notifier.c Add kernel/notifier.c 2007-10-19 11:53:34 -07:00
ns_cgroup.c cgroups: implement namespace tracking subsystem 2007-10-19 11:53:37 -07:00
nsproxy.c Make access to task's nsproxy lighter 2007-10-19 11:53:37 -07:00
panic.c whitespace fixes: panic handling 2007-10-18 14:37:25 -07:00
params.c param_sysfs_builtin memchr argument fix 2007-10-18 14:37:21 -07:00
pid.c pid namespaces: define is_global_init() and is_container_init() 2007-10-19 11:53:37 -07:00
posix-cpu-timers.c
posix-timers.c hrtimer: Rework hrtimer_nanosleep to make sys_compat_nanosleep easier 2007-10-18 22:54:18 +02:00
printk.c serial: turn serial console suspend a boot rather than compile time option 2007-10-18 14:37:19 -07:00
profile.c make kernel/profile.c:time_hook static 2007-10-17 08:42:55 -07:00
ptrace.c
rcupdate.c Clean up duplicate includes in kernel/ 2007-10-17 08:42:48 -07:00
rcutorture.c Make rcutorture RNG use temporal entropy 2007-10-17 08:42:53 -07:00
relay.c whitespace fixes: relayfs 2007-10-18 14:37:24 -07:00
resource.c
rtmutex_common.h
rtmutex-debug.c kernel/rtmutex-debug.c: cleanups 2007-10-17 08:42:50 -07:00
rtmutex-debug.h
rtmutex-tester.c
rtmutex.c
rtmutex.h
rwsem.c
sched_debug.c sched: reduce schedstat variable overhead a bit 2007-10-18 21:32:56 +02:00
sched_fair.c
sched_idletask.c
sched_rt.c
sched_stats.h sched: reduce schedstat variable overhead a bit 2007-10-18 21:32:56 +02:00
sched.c Task Control Groups: example CPU accounting subsystem 2007-10-19 11:53:36 -07:00
seccomp.c
signal.c pid namespaces: define is_global_init() and is_container_init() 2007-10-19 11:53:37 -07:00
softirq.c
softlockup.c
spinlock.c
srcu.c
stacktrace.c
stop_machine.c
sys_ni.c kernel/sys_ni.c: add dummy sys_ni_syscall() prototype 2007-10-17 08:42:55 -07:00
sys.c pid namespaces: round up the API 2007-10-19 11:53:37 -07:00
sysctl_check.c V3 file capabilities: alter behavior of cap_setpcap 2007-10-18 14:37:24 -07:00
sysctl.c pid namespaces: define is_global_init() and is_container_init() 2007-10-19 11:53:37 -07:00
taskstats.c Add cgroupstats 2007-10-19 11:53:36 -07:00
time.c whitespace fixes: time syscalls 2007-10-18 14:37:24 -07:00
timer.c Add scaled time to taskstats based process accounting 2007-10-18 14:37:28 -07:00
tsacct.c Add scaled time to taskstats based process accounting 2007-10-18 14:37:28 -07:00
uid16.c
user_namespace.c
user.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched 2007-10-17 09:11:18 -07:00
utsname_sysctl.c
utsname.c
wait.c
workqueue.c