linux/kernel
Konstantin Khlebnikov e9714acf8c mm: kill vma flag VM_EXECUTABLE and mm->num_exe_file_vmas
Currently the kernel sets mm->exe_file during sys_execve() and then tracks
number of vmas with VM_EXECUTABLE flag in mm->num_exe_file_vmas, as soon
as this counter drops to zero kernel resets mm->exe_file to NULL.  Plus it
resets mm->exe_file at last mmput() when mm->mm_users drops to zero.

VMA with VM_EXECUTABLE flag appears after mapping file with flag
MAP_EXECUTABLE, such vmas can appears only at sys_execve() or after vma
splitting, because sys_mmap ignores this flag.  Usually binfmt module sets
mm->exe_file and mmaps executable vmas with this file, they hold
mm->exe_file while task is running.

comment from v2.6.25-6245-g925d1c4 ("procfs task exe symlink"),
where all this stuff was introduced:

> The kernel implements readlink of /proc/pid/exe by getting the file from
> the first executable VMA.  Then the path to the file is reconstructed and
> reported as the result.
>
> Because of the VMA walk the code is slightly different on nommu systems.
> This patch avoids separate /proc/pid/exe code on nommu systems.  Instead of
> walking the VMAs to find the first executable file-backed VMA we store a
> reference to the exec'd file in the mm_struct.
>
> That reference would prevent the filesystem holding the executable file
> from being unmounted even after unmapping the VMAs.  So we track the number
> of VM_EXECUTABLE VMAs and drop the new reference when the last one is
> unmapped.  This avoids pinning the mounted filesystem.

exe_file's vma accounting is hooked into every file mmap/unmmap and vma
split/merge just to fix some hypothetical pinning fs from umounting by mm,
which already unmapped all its executable files, but still alive.

Seems like currently nobody depends on this behaviour.  We can try to
remove this logic and keep mm->exe_file until final mmput().

mm->exe_file is still protected with mm->mmap_sem, because we want to
change it via new sys_prctl(PR_SET_MM_EXE_FILE).  Also via this syscall
task can change its mm->exe_file and unpin mountpoint explicitly.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Venkatesh Pallipadi <venki@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:22:18 +09:00
..
debug kdb: Implement disable_nmi command 2012-09-26 13:42:25 -07:00
events Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-10-02 20:25:04 -07:00
gcov
irq genirq: Export dummy_irq_chip 2012-08-21 16:14:23 +02:00
power Merge branch 'pm-qos' 2012-09-17 20:25:51 +02:00
sched Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2012-10-01 10:43:39 -07:00
time Power management updates for 3.7-rc1 2012-10-02 18:32:35 -07:00
trace Merge branch 'virtio-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux 2012-10-07 21:04:56 +09:00
.gitignore
Kconfig.freezer
Kconfig.hz
Kconfig.locks locking: Adjust spin lock inlining Kconfig options 2012-09-13 17:56:13 +02:00
Kconfig.preempt
Makefile Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2012-10-01 10:28:49 -07:00
acct.c userns: Convert bsd process accounting to use kuid and kgid where appropriate 2012-09-18 01:01:33 -07:00
async.c [SCSI] async: make async_synchronize_full() flush all work regardless of domain 2012-07-20 09:07:37 +01:00
audit.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2012-10-02 13:38:27 -07:00
audit.h userns: Convert audit to work with user namespaces enabled 2012-09-18 01:00:26 -07:00
audit_tree.c audit: clean up refcounting in audit-tree 2012-08-15 12:55:22 +02:00
audit_watch.c userns: Convert the audit loginuid to be a kuid 2012-09-17 18:08:54 -07:00
auditfilter.c userns: Convert the audit loginuid to be a kuid 2012-09-17 18:08:54 -07:00
auditsc.c mm: use mm->exe_file instead of first VM_EXECUTABLE vma->vm_file 2012-10-09 16:22:18 +09:00
backtracetest.c
bounds.c
capability.c
cgroup.c Merge branch 'for-3.7-hierarchy' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2012-10-02 10:52:28 -07:00
cgroup_freezer.c cgroup: mark subsystems with broken hierarchy support and whine if cgroups are nested for them 2012-09-14 12:01:16 -07:00
compat.c
configs.c
cpu.c CPU hotplug, debug: detect imbalance between get_online_cpus() and put_online_cpus() 2012-10-09 16:22:15 +09:00
cpu_pm.c
cpuset.c cpusets: Remove/update outdated comments 2012-07-24 13:53:28 +02:00
crash_dump.c
cred.c userns: Make credential debugging user namespace safe. 2012-08-23 22:54:18 -07:00
delayacct.c
dma.c
elfcore.c
exec_domain.c
exit.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-10-02 20:25:04 -07:00
extable.c
fork.c mm: kill vma flag VM_EXECUTABLE and mm->num_exe_file_vmas 2012-10-09 16:22:18 +09:00
freezer.c
futex.c futex: Forbid uaddr == uaddr2 in futex_wait_requeue_pi() 2012-07-24 16:02:57 +02:00
futex_compat.c
groups.c
hrtimer.c hrtimer: Update hrtimer base offsets each hrtimer_interrupt 2012-07-11 23:34:39 +02:00
hung_task.c
irq_work.c
itimer.c
jump_label.c jump_label: Export jump_label_rate_limit() 2012-08-06 19:00:35 +03:00
kallsyms.c
kcmp.c
kexec.c kdump: remove unneeded include 2012-10-06 03:05:19 +09:00
kfifo.c
kmod.c kmod: avoid deadlock from recursive kmod call 2012-07-30 17:25:20 -07:00
kprobes.c kprobes/x86: Fix to support jprobes on ftrace-based kprobe 2012-09-13 22:52:11 -04:00
ksysfs.c
kthread.c kthread: Implement park/unpark facility 2012-08-13 17:01:06 +02:00
latencytop.c
lglock.c
lockdep.c lockdep: Check if nested lock is actually held 2012-09-13 17:00:44 +02:00
lockdep_internals.h
lockdep_proc.c
lockdep_states.h
module.c
mutex-debug.c
mutex-debug.h
mutex.c
mutex.h
notifier.c
nsproxy.c
padata.c
panic.c panic: fix a possible deadlock in panic() 2012-07-30 17:25:13 -07:00
params.c
pid.c net ip6 flowlabel: Make owner a union of struct pid * and kuid_t 2012-08-14 21:49:25 -07:00
pid_namespace.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2012-10-02 11:11:09 -07:00
posix-cpu-timers.c
posix-timers.c
printk.c printk: Fix calculation of length used to discard records 2012-08-12 21:25:50 +03:00
profile.c
ptrace.c ptrace: mark __ptrace_may_access() static 2012-08-03 14:47:17 +10:00
range.c
rcu.h
rcupdate.c rcu: Add PROVE_RCU_DELAY to provoke difficult races 2012-09-23 07:42:49 -07:00
rcutiny.c rcu: Move TINY_RCU quiescent state out of extended quiescent state 2012-09-23 07:42:52 -07:00
rcutiny_plugin.h rcu: Move TINY_PREEMPT_RCU away from raw_local_irq_save() 2012-09-23 07:42:51 -07:00
rcutorture.c rcu: Prevent initialization race in rcutorture kthreads 2012-09-23 07:42:23 -07:00
rcutree.c rcu: Apply micro-optimization and int/bool fixes to RCU's idle handling 2012-09-26 15:47:18 +02:00
rcutree.h rcu: Ignore userspace extended quiescent state by default 2012-09-26 15:47:01 +02:00
rcutree_plugin.h rcu: Make RCU_FAST_NO_HZ handle adaptive ticks 2012-09-26 15:44:02 +02:00
rcutree_trace.c Merge remote-tracking branch 'tip/smp/hotplug' into next.2012.09.25b 2012-09-25 10:01:45 -07:00
relay.c
res_counter.c
resource.c kernel/resource.c: fix stack overflow in __reserve_region_with_split() 2012-10-06 03:05:31 +09:00
rtmutex-debug.c
rtmutex-debug.h
rtmutex-tester.c
rtmutex.c
rtmutex.h
rtmutex_common.h
rwsem.c
seccomp.c
semaphore.c
signal.c coredump: pass siginfo_t* to do_coredump() and below, not merely signr 2012-10-06 03:05:16 +09:00
smp.c
smpboot.c hotplug: Fix UP bug in smpboot hotplug code 2012-08-13 17:01:07 +02:00
smpboot.h smpboot: Provide infrastructure for percpu hotplug threads 2012-08-13 17:01:07 +02:00
softirq.c Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2012-10-01 10:43:39 -07:00
spinlock.c
srcu.c workqueue: deprecate system_nrt[_freezable]_wq 2012-08-20 14:51:24 -07:00
stacktrace.c
stop_machine.c
sys.c poweroff: fix bug in orderly_poweroff() 2012-10-06 03:04:48 +09:00
sys_ni.c
sysctl.c Kconfig: clean up the "#if defined(arch)" list for exception-trace sysctl entry 2012-10-09 16:22:14 +09:00
sysctl_binary.c mm: prepare for removal of obsolete /proc/sys/vm/nr_pdflush_threads 2012-07-31 18:42:40 -07:00
task_work.c task_work: task_work_add() should not succeed after exit_task_work() 2012-09-13 16:47:34 +02:00
taskstats.c taskstats: cgroupstats_user_cmd() may leak on error 2012-10-06 03:05:31 +09:00
test_kprobes.c
time.c
timeconst.pl
timer.c Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2012-10-01 10:45:16 -07:00
tracepoint.c
tsacct.c userns: Convert taskstats to handle the user and pid namespaces. 2012-09-18 01:01:32 -07:00
uid16.c
up.c
user-return-notifier.c
user.c userns: Add kprojid_t and associated infrastructure in projid.h 2012-09-18 01:01:37 -07:00
user_namespace.c userns: Add kprojid_t and associated infrastructure in projid.h 2012-09-18 01:01:37 -07:00
utsname.c
utsname_sysctl.c
wait.c
watchdog.c watchdog: Use hotplug thread infrastructure 2012-08-13 17:01:07 +02:00
workqueue.c Merge branch 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq 2012-10-02 09:54:49 -07:00
workqueue_sched.h