linux/kernel
Tejun Heo b2efa05265 block, cfq: unlink cfq_io_context's immediately
cic is association between io_context and request_queue.  A cic is
linked from both ioc and q and should be destroyed when either one
goes away.  As ioc and q both have their own locks, locking becomes a
bit complex - both orders work for removal from one but not from the
other.

Currently, cfq tries to circumvent this locking order issue with RCU.
ioc->lock nests inside queue_lock but the radix tree and cic's are
also protected by RCU allowing either side to walk their lists without
grabbing lock.

This rather unconventional use of RCU quickly devolves into extremely
fragile convolution.  e.g. The following is from cfqd going away too
soon after ioc and q exits raced.

 general protection fault: 0000 [#1] PREEMPT SMP
 CPU 2
 Modules linked in:
 [   88.503444]
 Pid: 599, comm: hexdump Not tainted 3.1.0-rc10-work+ #158 Bochs Bochs
 RIP: 0010:[<ffffffff81397628>]  [<ffffffff81397628>] cfq_exit_single_io_context+0x58/0xf0
 ...
 Call Trace:
  [<ffffffff81395a4a>] call_for_each_cic+0x5a/0x90
  [<ffffffff81395ab5>] cfq_exit_io_context+0x15/0x20
  [<ffffffff81389130>] exit_io_context+0x100/0x140
  [<ffffffff81098a29>] do_exit+0x579/0x850
  [<ffffffff81098d5b>] do_group_exit+0x5b/0xd0
  [<ffffffff81098de7>] sys_exit_group+0x17/0x20
  [<ffffffff81b02f2b>] system_call_fastpath+0x16/0x1b

The only real hot path here is cic lookup during request
initialization and avoiding extra locking requires very confined use
of RCU.  This patch makes cic removal from both ioc and request_queue
perform double-locking and unlink immediately.

* From q side, the change is almost trivial as ioc->lock nests inside
  queue_lock.  It just needs to grab each ioc->lock as it walks
  cic_list and unlink it.

* From ioc side, it's a bit more difficult because of inversed lock
  order.  ioc needs its lock to walk its cic_list but can't grab the
  matching queue_lock and needs to perform unlock-relock dancing.

  Unlinking is now wholly done from put_io_context() and fast path is
  optimized by using the queue_lock the caller already holds, which is
  by far the most common case.  If the ioc accessed multiple devices,
  it tries with trylock.  In unlikely cases of fast path failure, it
  falls back to full double-locking dance from workqueue.

Double-locking isn't the prettiest thing in the world but it's *far*
simpler and more understandable than RCU trick without adding any
meaningful overhead.

This still leaves a lot of now unnecessary RCU logics.  Future patches
will trim them.

-v2: Vivek pointed out that cic->q was being dereferenced after
     cic->release() was called.  Updated to use local variable @this_q
     instead.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-12-14 00:33:39 +01:00
..
debug Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux 2011-11-06 19:44:47 -08:00
events perf: Do no try to schedule task events if there are none 2011-12-07 16:31:22 +01:00
gcov gcov: disable CONSTRUCTORS for UML 2011-07-26 16:49:45 -07:00
irq genirq: Fix race condition when stopping the irq thread 2011-12-02 11:54:24 +01:00
power PM / Hibernate: Do not leak memory in error/test code paths 2011-11-23 21:03:38 +01:00
time alarmtimers: Fix time comparison 2011-12-06 11:38:32 +01:00
trace ftrace: Fix hash record accounting bug 2011-12-05 13:28:47 -05:00
.gitignore
Kconfig.freezer
Kconfig.hz
Kconfig.locks arch:Kconfig.locks Remove unused config option. 2011-04-10 17:01:05 +02:00
Kconfig.preempt sched: Isolate preempt counting in its own config option 2011-06-10 15:15:40 +02:00
Makefile Merge branch 'devel-stable' of http://ftp.arm.linux.org.uk/pub/linux/arm/kernel/git-cur/linux-2.6-arm 2011-10-28 12:02:27 -07:00
acct.c pass a struct path to vfs_statfs 2010-08-09 16:48:42 -04:00
async.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
audit.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
audit.h audit: make functions static 2010-10-30 01:42:19 -04:00
audit_tree.c audit_tree,rcu: Convert call_rcu(__put_tree) to kfree_rcu() 2011-07-20 14:10:11 -07:00
audit_watch.c kill path_lookup() 2011-03-14 09:15:23 -04:00
auditfilter.c netlink: kill loginuid/sessionid/sid members from struct netlink_skb_parms 2011-03-03 10:55:40 -08:00
auditsc.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
backtracetest.c
bounds.c memcg: remove direct page_cgroup-to-page pointer 2011-03-23 19:46:28 -07:00
capability.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
cgroup.c memcg: replace ss->id_lock with a rwlock 2011-11-02 16:07:03 -07:00
cgroup_freezer.c cgroup_freezer: fix freezing groups with stopped tasks 2011-11-24 11:58:22 -08:00
compat.c kernel: Fix files explicitly needing EXPORT_SYMBOL infrastructure 2011-10-31 19:30:05 -04:00
configs.c kernel/configs.c: include MODULE_*() when CONFIG_IKCONFIG_PROC=n 2011-07-25 20:57:15 -07:00
cpu.c Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux 2011-11-06 19:44:47 -08:00
cpu_pm.c cpu_pm: call notifiers during suspend 2011-09-23 12:05:29 +05:30
cpuset.c Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux 2011-11-06 19:44:47 -08:00
crash_dump.c Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux 2011-11-06 19:44:47 -08:00
cred.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
delayacct.c KVM: Steal time implementation 2011-07-14 12:59:14 +03:00
dma.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
elfcore.c
exec_domain.c sys_personality: remove the bogus checks in sys_personality()->__set_personality() path 2010-08-09 20:45:05 -07:00
exit.c oom: remove oom_disable_count 2011-10-31 17:30:45 -07:00
extable.c extable, core_kernel_data(): Make sure all archs define _sdata 2011-05-20 08:56:56 +02:00
fork.c block, cfq: unlink cfq_io_context's immediately 2011-12-14 00:33:39 +01:00
freezer.c Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux 2011-11-06 19:44:47 -08:00
futex.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
futex_compat.c userns: user namespaces: convert several capable() calls 2011-03-23 19:47:08 -07:00
groups.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
hrtimer.c Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2011-11-28 08:43:52 -08:00
hung_task.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
irq_work.c kernel: fix two implicit header assumptions in irq_work.c 2011-10-31 09:20:12 -04:00
itimer.c
jump_label.c jump_label: jump_label_inc may return before the code is patched 2011-12-05 13:28:46 -05:00
kallsyms.c Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2011-03-25 17:52:22 -07:00
kexec.c [S390] kdump: Add infrastructure for unmapping crashkernel memory 2011-10-30 15:16:42 +01:00
kfifo.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
kmod.c kmod: prevent kmod_loop_msg overflow in __request_module() 2011-10-26 13:10:39 +10:30
kprobes.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
ksysfs.c kernel: ksysfs.c is implicitly using stat.h 2011-10-31 09:20:13 -04:00
kthread.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
latencytop.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
lockdep.c lockdep, kmemcheck: Annotate ->lock in lockdep_init_map() 2011-12-06 18:18:13 +01:00
lockdep_internals.h
lockdep_proc.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
lockdep_states.h
module.c Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux 2011-11-06 19:44:47 -08:00
mutex-debug.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
mutex-debug.h mutex: Use p->on_cpu for the adaptive spin 2011-04-14 08:52:33 +02:00
mutex.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
mutex.h mutex: Use p->on_cpu for the adaptive spin 2011-04-14 08:52:33 +02:00
notifier.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
nsproxy.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
padata.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
panic.c module,bug: Add TAINT_OOT_MODULE flag for modules not built in-tree 2011-11-07 07:54:42 +10:30
params.c kernel: params.c needs module.h not moduleparam.h 2011-10-31 09:20:13 -04:00
pid.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
pid_namespace.c pidns: call pid_ns_prepare_proc() from create_pid_namespace() 2011-03-23 19:46:58 -07:00
posix-cpu-timers.c Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2011-10-26 16:17:32 +02:00
posix-timers.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
printk.c printk: avoid double lock acquire 2011-12-09 07:50:28 -08:00
profile.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
ptrace.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
range.c range: fix bogus misuse of module.h to get printk() 2011-10-31 09:20:11 -04:00
rcu.h rcu: Add grace-period, quiescent-state, and call_rcu trace events 2011-09-28 21:38:21 -07:00
rcupdate.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
rcutiny.c kernel: fix up module header handling in rcutiny files 2011-10-31 09:20:13 -04:00
rcutiny_plugin.h kernel: fix up module header handling in rcutiny files 2011-10-31 09:20:13 -04:00
rcutorture.c rcu: Make rcu_torture_boost() exit loops at end of test 2011-09-28 21:38:46 -07:00
rcutree.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
rcutree.h rcu: Remove rcu_needs_cpu_flush() to avoid false quiescent states 2011-09-28 21:38:48 -07:00
rcutree_plugin.h rcu: Remove rcu_needs_cpu_flush() to avoid false quiescent states 2011-09-28 21:38:48 -07:00
rcutree_trace.c rcu: Simplify quiescent-state accounting 2011-09-28 21:38:22 -07:00
relay.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
res_counter.c memcg: res_counter_read_u64(): fix potential races on 32-bit machines 2011-03-23 19:46:22 -07:00
resource.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
rtmutex-debug.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
rtmutex-debug.h
rtmutex-tester.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
rtmutex.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
rtmutex.h
rtmutex_common.h rtmutex: Simplify PI algorithm and make highest prio task get lock 2011-01-27 21:13:51 -05:00
rwsem.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
sched.c sched: Set the command name of the idle tasks in SMP kernels 2011-11-14 12:50:43 +01:00
sched_autogroup.c Fix common misspellings 2011-03-31 11:26:23 -03:00
sched_autogroup.h sched: Skip autogroup when looking for all rt sched groups 2011-07-01 10:39:08 +02:00
sched_clock.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
sched_cpupri.c sched/cpupri: Remove cpupri->pri_active 2011-08-14 12:01:11 +02:00
sched_cpupri.h sched/cpupri: Remove cpupri->pri_active 2011-08-14 12:01:11 +02:00
sched_debug.c sched: Get rid of lock_depth 2011-04-24 13:18:38 +02:00
sched_fair.c sched: Fix buglet in return_cfs_rq_runtime() 2011-11-16 08:43:45 +01:00
sched_features.h sched, rt: Provide means of disabling cross-cpu bandwidth sharing 2011-11-14 12:50:40 +01:00
sched_idletask.c sched: Drop the rq argument to sched_class::select_task_rq() 2011-04-14 08:52:36 +02:00
sched_rt.c sched, rt: Provide means of disabling cross-cpu bandwidth sharing 2011-11-14 12:50:40 +01:00
sched_stats.h locking, sched: Annotate thread_group_cputimer as raw 2011-09-13 11:11:55 +02:00
sched_stoptask.c sched: Implement hierarchical task accounting for SCHED_OTHER 2011-08-14 12:01:13 +02:00
seccomp.c
semaphore.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
signal.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
smp.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
softirq.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
spinlock.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
srcu.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
stacktrace.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
stop_machine.c Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux 2011-11-06 19:44:47 -08:00
sys.c Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux 2011-11-06 19:44:47 -08:00
sys_ni.c Cross Memory Attach 2011-10-31 17:30:44 -07:00
sysctl.c Merge branch 'akpm' (Andrew's incoming) 2011-10-31 17:46:07 -07:00
sysctl_binary.c ipv4: NET_IPV4_ROUTE_GC_INTERVAL removal 2011-10-03 14:13:01 -04:00
sysctl_check.c xfs: remove subdirectories 2011-08-12 16:21:35 -05:00
taskstats.c Make TASKSTATS require root access 2011-09-19 17:04:37 -07:00
test_kprobes.c kprobes: Fix selftest to clear flags field for reusing probes 2010-10-14 08:55:27 +02:00
time.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
timeconst.pl
timer.c sys_getppid: add missing rcu_dereference 2011-12-09 07:50:29 -08:00
tracepoint.c Tracepoint: Dissociate from module mutex 2011-08-10 20:38:14 -04:00
tsacct.c Make taskstats round statistics down to nearest 1k bytes/events 2011-09-19 17:10:57 -07:00
uid16.c userns: user namespaces: convert several capable() calls 2011-03-23 19:47:08 -07:00
up.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
user-return-notifier.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
user.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
user_namespace.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
utsname.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
utsname_sysctl.c Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux 2011-11-06 19:44:47 -08:00
wait.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
watchdog.c watchdog: move watchdog_*_all_cpus under CONFIG_SYSCTL 2011-10-31 17:30:53 -07:00
workqueue.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
workqueue_sched.h workqueue: implement concurrency managed dynamic worker pool 2010-06-29 10:07:14 +02:00