linux/kernel
Oleg Nesterov d80e731eca epoll: introduce POLLFREE to flush ->signalfd_wqh before kfree()
This patch is intentionally incomplete to simplify the review.
It ignores ep_unregister_pollwait() which plays with the same wqh.
See the next change.

epoll assumes that the EPOLL_CTL_ADD'ed file controls everything
f_op->poll() needs. In particular it assumes that the wait queue
can't go away until eventpoll_release(). This is not true in case
of signalfd, the task which does EPOLL_CTL_ADD uses its ->sighand
which is not connected to the file.

This patch adds the special event, POLLFREE, currently only for
epoll. It expects that init_poll_funcptr()'ed hook should do the
necessary cleanup. Perhaps it should be defined as EPOLLFREE in
eventpoll.

__cleanup_sighand() is changed to do wake_up_poll(POLLFREE) if
->signalfd_wqh is not empty, we add the new signalfd_cleanup()
helper.

ep_poll_callback(POLLFREE) simply does list_del_init(task_list).
This make this poll entry inconsistent, but we don't care. If you
share epoll fd which contains our sigfd with another process you
should blame yourself. signalfd is "really special". I simply do
not know how we can define the "right" semantics if it used with
epoll.

The main problem is, epoll calls signalfd_poll() once to establish
the connection with the wait queue, after that signalfd_poll(NULL)
returns the different/inconsistent results depending on who does
EPOLL_CTL_MOD/signalfd_read/etc. IOW: apart from sigmask, signalfd
has nothing to do with the file, it works with the current thread.

In short: this patch is the hack which tries to fix the symptoms.
It also assumes that nobody can take tasklist_lock under epoll
locks, this seems to be true.

Note:

	- we do not have wake_up_all_poll() but wake_up_poll()
	  is fine, poll/epoll doesn't use WQ_FLAG_EXCLUSIVE.

	- signalfd_cleanup() uses POLLHUP along with POLLFREE,
	  we need a couple of simple changes in eventpoll.c to
	  make sure it can't be "lost".

Reported-by: Maxime Bizon <mbizon@freebox.fr>
Cc: <stable@kernel.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-02-24 11:42:50 -08:00
..
debug module: struct module_ref should contains long fields 2012-01-13 09:32:14 +10:30
events perf: Fix double start/stop in x86_pmu_start() 2012-02-07 16:58:56 +01:00
gcov
irq module_param: make bool parameters really bool (core code) 2012-01-13 09:32:18 +10:30
power PM / Freezer: Thaw only kernel threads if freezing of kernel threads fails 2012-02-04 22:23:05 +01:00
sched sched/rt: Fix task stack corruption under __ARCH_WANT_INTERRUPTS_ON_CTXSW 2012-01-27 12:49:41 +01:00
time Merge branch 'rcu/fixes-for-v3.2' into rcu/urgent 2012-01-16 09:41:18 -08:00
trace Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2012-01-15 11:26:35 -08:00
.gitignore
acct.c Merge branch 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-01-08 12:19:57 -08:00
async.c kernel/async: remove redundant declaration. 2012-01-13 09:32:18 +10:30
audit_tree.c
audit_watch.c
audit.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit 2012-01-17 16:41:31 -08:00
audit.h audit: remove AUDIT_SETUP_CONTEXT as it isn't used 2012-01-17 16:16:57 -05:00
auditfilter.c audit: allow interfield comparison in audit rules 2012-01-17 16:17:01 -05:00
auditsc.c kernel-doc: fix new warnings in auditsc.c 2012-01-23 08:44:53 -08:00
backtracetest.c
bounds.c
capability.c Revert "capabitlies: ns_capable can use the cap helpers rather than lsm call" 2012-01-17 10:19:41 -08:00
cgroup_freezer.c Merge branch 'for-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2012-01-09 12:59:24 -08:00
cgroup.c Merge branch 'for-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2012-01-09 12:59:24 -08:00
compat.c
configs.c
cpu_pm.c
cpu.c Merge branch 'pm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm 2012-01-08 13:10:57 -08:00
cpuset.c Merge branch 'for-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2012-01-09 12:59:24 -08:00
crash_dump.c
cred.c
delayacct.c
dma.c
elfcore.c
exec_domain.c
exit.c sched: Fix ancient race in do_exit() 2012-01-27 11:55:36 +01:00
extable.c
fork.c epoll: introduce POLLFREE to flush ->signalfd_wqh before kfree() 2012-02-24 11:42:50 -08:00
freezer.c
futex_compat.c
futex.c futex: Fix uninterruptible loop due to gate_area 2011-12-31 11:48:28 -08:00
groups.c
hrtimer.c
hung_task.c hung_task: fix false positive during vfork 2012-01-03 16:14:32 -08:00
irq_work.c
itimer.c [S390] cputime: add sparse checking and cleanup 2011-12-15 14:56:19 +01:00
jump_label.c Merge remote-tracking branch 'tip/perf/core' into kvm-updates/3.3 2011-12-27 11:22:24 +02:00
kallsyms.c
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kexec.c kdump: crashk_res init check for /sys/kernel/kexec_crash_size 2012-01-12 20:13:11 -08:00
kfifo.c
kmod.c Merge branch 'pm-sleep' into pm-for-linus 2011-12-25 23:42:20 +01:00
kprobes.c kprobes: fix a memory leak in function pre_handler_kretprobe() 2012-02-03 16:16:41 -08:00
ksysfs.c
kthread.c
latencytop.c
lockdep_internals.h
lockdep_proc.c
lockdep_states.h
lockdep.c Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2012-01-06 08:02:58 -08:00
Makefile PM: Make sysrq-o be available for CONFIG_PM unset 2012-01-14 00:33:03 +01:00
module.c error: implicit declaration of function 'module_flags_taint' 2012-01-15 16:21:07 -08:00
mutex-debug.c
mutex-debug.h
mutex.c
mutex.h
notifier.c
nsproxy.c
padata.c
panic.c panic: don't print redundant backtraces on oops 2012-01-12 20:13:11 -08:00
params.c module: make module param bint handle nul value 2012-02-14 11:02:15 +10:30
pid_namespace.c sysctl: add the kernel.ns_last_pid control 2012-01-12 20:13:11 -08:00
pid.c vfs: fix panic in __d_lookup() with high dentry hashtable counts 2012-02-13 20:45:38 -05:00
posix-cpu-timers.c [S390] cputime: add sparse checking and cleanup 2011-12-15 14:56:19 +01:00
posix-timers.c
printk.c module_param: make bool parameters really bool (core code) 2012-01-13 09:32:18 +10:30
profile.c
ptrace.c Merge branch 'for-linus' of git://selinuxproject.org/~jmorris/linux-security 2012-01-14 18:36:33 -08:00
range.c
rcu.h
rcupdate.c
rcutiny_plugin.h
rcutiny.c
rcutorture.c rcu: Add missing __cpuinit annotation in rcutorture code 2012-01-16 09:44:05 -08:00
rcutree_plugin.h
rcutree_trace.c
rcutree.c
rcutree.h
relay.c relay: prevent integer overflow in relay_open() 2012-02-10 09:04:49 +01:00
res_counter.c net: introduce res_counter_charge_nofail() for socket allocations 2012-01-22 15:08:46 -05:00
resource.c
rtmutex_common.h
rtmutex-debug.c
rtmutex-debug.h
rtmutex-tester.c rtmutex-tester: convert sysdev_class to a regular subsystem 2011-12-14 14:54:22 -08:00
rtmutex.c
rtmutex.h
rwsem.c
seccomp.c seccomp: audit abnormal end to a process due to seccomp 2012-01-17 16:16:55 -05:00
semaphore.c
signal.c user namespace: make signal.c respect user namespaces 2012-01-10 16:30:54 -08:00
smp.c
softirq.c
spinlock.c
srcu.c
stacktrace.c
stop_machine.c
sys_ni.c
sys.c c/r: prctl: add PR_SET_MM codes to set up mm_struct entries 2012-01-12 20:13:13 -08:00
sysctl_binary.c binary_sysctl(): fix memory leak 2011-12-20 10:25:04 -08:00
sysctl_check.c
sysctl.c
taskstats.c
test_kprobes.c
time.c
timeconst.pl
timer.c Merge branch 'core-debugobjects-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2012-01-06 07:53:34 -08:00
tracepoint.c tracepoints/module: Fix disabling tracepoints with taint CRAP or OOT 2012-01-16 11:35:57 -05:00
tsacct.c [S390] cputime: add sparse checking and cleanup 2011-12-15 14:56:19 +01:00
uid16.c
up.c
user_namespace.c
user-return-notifier.c
user.c
utsname_sysctl.c
utsname.c
wait.c lockdep/waitqueues: Add better annotation 2011-12-21 10:07:39 +01:00
watchdog.c bugs, x86: Fix printk levels for panic, softlockups and stack dumps 2012-01-26 21:28:45 +01:00
workqueue_sched.h
workqueue.c workqueue: make alloc_workqueue() take printf fmt and args for name 2012-01-10 16:30:54 -08:00