linux/kernel
Davide Libenzi e1ad7468c7 signal/timer/event: eventfd core
This is a very simple and light file descriptor, that can be used as event
wait/dispatch by userspace (both wait and dispatch) and by the kernel
(dispatch only).  It can be used instead of pipe(2) in all cases where those
would simply be used to signal events.  Their kernel overhead is much lower
than pipes, and they do not consume two fds.  When used in the kernel, it can
offer an fd-bridge to enable, for example, functionalities like KAIO or
syslets/threadlets to signal to an fd the completion of certain operations.
But more in general, an eventfd can be used by the kernel to signal readiness,
in a POSIX poll/select way, of interfaces that would otherwise be incompatible
with it.  The API is:

int eventfd(unsigned int count);

The eventfd API accepts an initial "count" parameter, and returns an eventfd
fd.  It supports poll(2) (POLLIN, POLLOUT, POLLERR), read(2) and write(2).

The POLLIN flag is raised when the internal counter is greater than zero.

The POLLOUT flag is raised when at least a value of "1" can be written to the
internal counter.

The POLLERR flag is raised when an overflow in the counter value is detected.

The write(2) operation can never overflow the counter, since it blocks (unless
O_NONBLOCK is set, in which case -EAGAIN is returned).

But the eventfd_signal() function can do it, since it's supposed to not sleep
during its operation.

The read(2) function reads the __u64 counter value, and reset the internal
value to zero.  If the value read is equal to (__u64) -1, an overflow happened
on the internal counter (due to 2^64 eventfd_signal() posts that has never
been retired - unlickely, but possible).

The write(2) call writes an __u64 count value, and adds it to the current
counter.  The eventfd fd supports O_NONBLOCK also.

On the kernel side, we have:

struct file *eventfd_fget(int fd);
int eventfd_signal(struct file *file, unsigned int n);

The eventfd_fget() should be called to get a struct file* from an eventfd fd
(this is an fget() + check of f_op being an eventfd fops pointer).

The kernel can then call eventfd_signal() every time it wants to post an event
to userspace.  The eventfd_signal() function can be called from any context.
An eventfd() simple test and bench is available here:

http://www.xmailserver.org/eventfd-bench.c

This is the eventfd-based version of pipetest-4 (pipe(2) based):

http://www.xmailserver.org/pipetest-4.c

Not that performance matters much in the eventfd case, but eventfd-bench
shows almost as double as performance than pipetest-4.

[akpm@linux-foundation.org: fix i386 build]
[akpm@linux-foundation.org: add sys_eventfd to sys_ni.c]
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11 08:29:36 -07:00
..
irq Fix Linuxdoc comment 2007-05-09 12:30:48 -07:00
power Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc 2007-05-09 12:56:01 -07:00
time clocksource: fix resume logic 2007-05-09 12:30:56 -07:00
.gitignore
acct.c [PATCH] kernel: change uses of f_{dentry, vfsmnt} to use f_path 2006-12-08 08:28:42 -08:00
audit.c audit: add spaces on either side of case "..." operator. 2007-05-08 11:15:09 -07:00
audit.h
auditfilter.c [PATCH] minor update to rule add/delete messages (ver 2) 2007-02-17 21:30:09 -05:00
auditsc.c [PATCH] fix deadlock in audit_log_task_context() 2007-03-14 15:27:48 -07:00
capability.c [PATCH] pid: replace do/while_each_task_pid with do/while_each_pid_task 2007-02-12 09:48:32 -08:00
compat.c signal/timer/event: timerfd compat code 2007-05-11 08:29:36 -07:00
configs.c use simple_read_from_buffer in kernel/ 2007-05-09 12:30:49 -07:00
cpu.c microcode: use suspend-related CPU hotplug notifications 2007-05-09 12:30:56 -07:00
cpuset.c use simple_read_from_buffer in kernel/ 2007-05-09 12:30:49 -07:00
delayacct.c KMEM_CACHE(): simplify slab cache creation 2007-05-07 12:12:55 -07:00
die_notifier.c move die notifier handling to common code 2007-05-08 11:15:04 -07:00
dma.c [PATCH] struct seq_operations and struct file_operations constification 2006-12-07 08:39:46 -08:00
exec_domain.c
exit.c signal/timer/event: signalfd core 2007-05-11 08:29:36 -07:00
extable.c
fork.c signal/timer/event: signalfd core 2007-05-11 08:29:36 -07:00
futex_compat.c futex_requeue_pi optimization 2007-05-09 12:30:55 -07:00
futex.c FUTEX: new PRIVATE futexes 2007-05-09 12:30:55 -07:00
hrtimer.c Add suspend-related notifications for CPU hotplug 2007-05-09 12:30:56 -07:00
itimer.c The scheduled -EINVAL for invalid timevals in setitimer 2007-05-08 11:15:13 -07:00
kallsyms.c kallsyms: cleanup: use seq_release_private() where appropriate 2007-05-08 11:15:09 -07:00
Kconfig.hz [PATCH] HZ: 300Hz support 2006-12-07 08:39:36 -08:00
Kconfig.preempt Fix trivial typos in Kconfig* files 2007-05-09 07:12:20 +02:00
kexec.c kdump/kexec: calculate note size at compile time 2007-05-08 11:15:07 -07:00
kfifo.c [PATCH] Numerous fixes to kernel-doc info in source files. 2007-02-11 10:51:32 -08:00
kmod.c wait_for_helper: remove unneeded do_sigaction() 2007-05-09 12:30:53 -07:00
kprobes.c Kprobes: The ON/OFF knob thru debugfs 2007-05-08 11:15:19 -07:00
ksysfs.c remove "struct subsystem" as it is no longer needed 2007-05-02 18:57:59 -07:00
kthread.c change kernel threads to ignore signals instead of blocking them 2007-05-09 12:30:53 -07:00
latency.c [PATCH] severing module.h->sched.h 2006-12-04 02:00:22 -05:00
lockdep_internals.h [PATCH] lockdep: more chains 2006-12-07 08:39:43 -08:00
lockdep_proc.c [PATCH] remove many unneeded #includes of sched.h 2007-02-14 08:09:54 -08:00
lockdep.c lockdep: removed unused ip argument in mark_lock & mark_held_locks 2007-05-08 11:15:13 -07:00
Makefile move die notifier handling to common code 2007-05-08 11:15:04 -07:00
module.c Fix minor typoes in kernel/module.c 2007-05-09 07:26:28 +02:00
mutex-debug.c [PATCH] remove many unneeded #includes of sched.h 2007-02-14 08:09:54 -08:00
mutex-debug.h
mutex.c wrap access to thread_info 2007-05-09 12:30:56 -07:00
mutex.h
nsproxy.c Merge sys_clone()/sys_unshare() nsproxy and namespace handling 2007-05-08 11:15:00 -07:00
panic.c [PATCH] Add TAINT_USER and ability to set taint flags from userspace 2007-02-11 10:51:29 -08:00
params.c kernel/params.c: fix lying comment for param_array() 2007-05-08 11:15:08 -07:00
pid.c statically initialize struct pid for swapper 2007-05-11 08:29:35 -07:00
posix-cpu-timers.c Introduce a handy list_first_entry macro 2007-05-08 11:15:11 -07:00
posix-timers.c header cleaning: don't include smp_lock.h when not used 2007-05-08 11:15:07 -07:00
printk.c header cleaning: don't include smp_lock.h when not used 2007-05-08 11:15:07 -07:00
profile.c Add suspend-related notifications for CPU hotplug 2007-05-09 12:30:56 -07:00
ptrace.c
rcupdate.c Add suspend-related notifications for CPU hotplug 2007-05-09 12:30:56 -07:00
rcutorture.c rcutorture: Remove redundant assignment to cur_ops in for loop 2007-05-08 11:15:17 -07:00
relay.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial 2007-05-09 12:54:17 -07:00
resource.c libata/IDE: remove combined mode quirk 2007-04-28 14:15:59 -04:00
rtmutex_common.h futex_requeue_pi optimization 2007-05-09 12:30:55 -07:00
rtmutex-debug.c
rtmutex-debug.h
rtmutex-tester.c [PATCH] Add include/linux/freezer.h and move definitions from sched.h 2006-12-07 08:39:27 -08:00
rtmutex.c futex_requeue_pi optimization 2007-05-09 12:30:55 -07:00
rtmutex.h
rwsem.c Lockdep treats down_write_trylock like regular down_write 2007-05-08 11:15:09 -07:00
sched.c Add suspend-related notifications for CPU hotplug 2007-05-09 12:30:56 -07:00
seccomp.c
signal.c signal/timer/event: signalfd core 2007-05-11 08:29:36 -07:00
softirq.c Add suspend-related notifications for CPU hotplug 2007-05-09 12:30:56 -07:00
softlockup.c Add suspend-related notifications for CPU hotplug 2007-05-09 12:30:56 -07:00
spinlock.c [PATCH] lockdep: spin_lock_irqsave_nested() 2006-11-25 13:28:34 -08:00
srcu.c
stacktrace.c
stop_machine.c stop_machine() now uses hard_irq_disable 2007-05-11 08:29:34 -07:00
sys_ni.c signal/timer/event: eventfd core 2007-05-11 08:29:36 -07:00
sys.c attach_pid() with struct pid parameter 2007-05-11 08:29:35 -07:00
sysctl.c Make vm statistics update interval configurable 2007-05-09 12:30:56 -07:00
taskstats.c KMEM_CACHE(): simplify slab cache creation 2007-05-07 12:12:55 -07:00
time.c header cleaning: don't include smp_lock.h when not used 2007-05-08 11:15:07 -07:00
timer.c timer: revert parenthesis fix in tbase_get_deferrable() etc 2007-05-10 09:26:53 -07:00
tsacct.c [PATCH] time: x86_64: split x86_64/kernel/time.c up 2007-02-16 08:14:00 -08:00
uid16.c header cleaning: don't include smp_lock.h when not used 2007-05-08 11:15:07 -07:00
user.c [PATCH] slab: remove kmem_cache_t 2006-12-07 08:39:25 -08:00
utsname_sysctl.c [PATCH] sysctl: remove insert_at_head from register_sysctl 2007-02-14 08:09:59 -08:00
utsname.c Merge sys_clone()/sys_unshare() nsproxy and namespace handling 2007-05-08 11:15:00 -07:00
wait.c Fix occurrences of "the the " 2007-05-09 08:57:56 +02:00
workqueue.c Add suspend-related notifications for CPU hotplug 2007-05-09 12:30:56 -07:00