linux/block
Roman Pen a347c7ad8e blk-mq: reinit q->tag_set_list entry only after grace period
It is not allowed to reinit q->tag_set_list list entry while RCU grace
period has not completed yet, otherwise the following soft lockup in
blk_mq_sched_restart() happens:

[ 1064.252652] watchdog: BUG: soft lockup - CPU#12 stuck for 23s! [fio:9270]
[ 1064.254445] task: ffff99b912e8b900 task.stack: ffffa6d54c758000
[ 1064.254613] RIP: 0010:blk_mq_sched_restart+0x96/0x150
[ 1064.256510] Call Trace:
[ 1064.256664]  <IRQ>
[ 1064.256824]  blk_mq_free_request+0xea/0x100
[ 1064.256987]  msg_io_conf+0x59/0xd0 [ibnbd_client]
[ 1064.257175]  complete_rdma_req+0xf2/0x230 [ibtrs_client]
[ 1064.257340]  ? ibtrs_post_recv_empty+0x4d/0x70 [ibtrs_core]
[ 1064.257502]  ibtrs_clt_rdma_done+0xd1/0x1e0 [ibtrs_client]
[ 1064.257669]  ib_create_qp+0x321/0x380 [ib_core]
[ 1064.257841]  ib_process_cq_direct+0xbd/0x120 [ib_core]
[ 1064.258007]  irq_poll_softirq+0xb7/0xe0
[ 1064.258165]  __do_softirq+0x106/0x2a2
[ 1064.258328]  irq_exit+0x92/0xa0
[ 1064.258509]  do_IRQ+0x4a/0xd0
[ 1064.258660]  common_interrupt+0x7a/0x7a
[ 1064.258818]  </IRQ>

Meanwhile another context frees other queue but with the same set of
shared tags:

[ 1288.201183] INFO: task bash:5910 blocked for more than 180 seconds.
[ 1288.201833] bash            D    0  5910   5820 0x00000000
[ 1288.202016] Call Trace:
[ 1288.202315]  schedule+0x32/0x80
[ 1288.202462]  schedule_timeout+0x1e5/0x380
[ 1288.203838]  wait_for_completion+0xb0/0x120
[ 1288.204137]  __wait_rcu_gp+0x125/0x160
[ 1288.204287]  synchronize_sched+0x6e/0x80
[ 1288.204770]  blk_mq_free_queue+0x74/0xe0
[ 1288.204922]  blk_cleanup_queue+0xc7/0x110
[ 1288.205073]  ibnbd_clt_unmap_device+0x1bc/0x280 [ibnbd_client]
[ 1288.205389]  ibnbd_clt_unmap_dev_store+0x169/0x1f0 [ibnbd_client]
[ 1288.205548]  kernfs_fop_write+0x109/0x180
[ 1288.206328]  vfs_write+0xb3/0x1a0
[ 1288.206476]  SyS_write+0x52/0xc0
[ 1288.206624]  do_syscall_64+0x68/0x1d0
[ 1288.206774]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2

What happened is the following:

1. There are several MQ queues with shared tags.
2. One queue is about to be freed and now task is in
   blk_mq_del_queue_tag_set().
3. Other CPU is in blk_mq_sched_restart() and loops over all queues in
   tag list in order to find hctx to restart.

Because linked list entry was modified in blk_mq_del_queue_tag_set()
without proper waiting for a grace period, blk_mq_sched_restart()
never ends, spining in list_for_each_entry_rcu_rr(), thus soft lockup.

Fix is simple: reinit list entry after an RCU grace period elapsed.

Fixes: Fixes: 705cda97ee ("blk-mq: Make it safe to use RCU to iterate over blk_mq_tag_set.tag_list")
Cc: stable@vger.kernel.org
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: linux-block@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-06-11 08:13:41 -06:00
..
partitions block: add verifier for cmdline partition 2018-06-05 09:20:53 -06:00
badblocks.c badblocks: fix wrong return value in badblocks_set if badblocks are disabled 2017-11-03 11:29:50 -07:00
bfq-cgroup.c block: use ktime_get_ns() instead of sched_clock() for cfq and bfq 2018-05-09 08:33:06 -06:00
bfq-iosched.c block, bfq: prevent soft_rt_next_start from being stuck at infinity 2018-05-31 08:54:41 -06:00
bfq-iosched.h block, bfq: remove slow-system class 2018-05-31 08:54:38 -06:00
bfq-wf2q.c block, bfq: limit sectors served with interactive weight raising 2018-01-18 08:21:37 -07:00
bio-integrity.c block: Convert bio_set to mempool_init() 2018-05-14 13:16:03 -06:00
bio.c for-linus-20180608 2018-06-08 13:36:19 -07:00
blk-cgroup.c blkcg: init root blkcg_gq under lock 2018-04-19 08:51:59 -06:00
blk-core.c block: always set partition number to '0' in blk_partition_remap() 2018-06-07 06:56:01 -06:00
blk-exec.c blk-mq-sched: remove unused 'can_block' arg from blk_mq_sched_insert_request 2018-01-17 09:49:21 -07:00
blk-flush.c block: fix use-after-free in block flush handling 2018-06-09 06:37:14 -06:00
blk-integrity.c block drivers/block: Use octal not symbolic permissions 2018-05-24 13:38:59 -06:00
blk-ioc.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
blk-lib.c block: break discard submissions into the user defined size 2018-05-08 15:10:44 -06:00
blk-map.c Merge branch 'for-4.16/block' of git://git.kernel.dk/linux-block 2018-01-29 11:51:49 -08:00
blk-merge.c block: don't use blocking queue entered for recursive bio submits 2018-06-02 20:35:00 -06:00
blk-mq-cpumap.c blk-mq: don't keep offline CPUs mapped to hctx 0 2018-04-10 08:38:46 -06:00
blk-mq-debugfs.c blk-mq: Remove generation seqeunce 2018-05-29 08:59:21 -06:00
blk-mq-debugfs.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
blk-mq-pci.c blk-mq: Allow PCI vector offset for mapping queues 2018-03-27 21:25:36 -06:00
blk-mq-rdma.c block: Add rdma affinity based queue mapping helper 2017-08-08 14:58:03 -04:00
blk-mq-sched.c blk-mq: update nr_requests when switching to 'none' scheduler 2018-06-02 20:35:00 -06:00
blk-mq-sched.h block: move sysfs_lock into elevator_init 2018-06-01 07:38:19 -06:00
blk-mq-sysfs.c block drivers/block: Use octal not symbolic permissions 2018-05-24 13:38:59 -06:00
blk-mq-tag.c blk-mq: only iterate over inflight requests in blk_mq_tagset_busy_iter 2018-05-30 11:31:34 -06:00
blk-mq-tag.h Merge branch 'for-4.15/block' of git://git.kernel.dk/linux-block 2017-11-14 15:32:19 -08:00
blk-mq-virtio.c
blk-mq.c blk-mq: reinit q->tag_set_list entry only after grace period 2018-06-11 08:13:41 -06:00
blk-mq.h blk-mq: Remove generation seqeunce 2018-05-29 08:59:21 -06:00
blk-settings.c block: Introduce blk_queue_flag_{set,clear,test_and_{set,clear}}() 2018-03-08 14:13:48 -07:00
blk-softirq.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
blk-stat.c block: consolidate struct request timestamp fields 2018-05-09 08:33:09 -06:00
blk-stat.h block: consolidate struct request timestamp fields 2018-05-09 08:33:09 -06:00
blk-sysfs.c block: convert bounce, q->bio_split to bioset_init()/mempool_init() 2018-05-30 15:33:32 -06:00
blk-tag.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
blk-throttle.c blk-throttle: return proper bool type to caller instead of 0/1 2018-05-30 12:48:22 -06:00
blk-timeout.c block: remove BLK_EH_HANDLED 2018-05-29 08:59:21 -06:00
blk-wbt.c block: get rid of struct blk_issue_stat 2018-05-09 08:33:05 -06:00
blk-wbt.h block: get rid of struct blk_issue_stat 2018-05-09 08:33:05 -06:00
blk-zoned.c blkdev_report_zones_ioctl(): Use vmalloc() to allocate large buffers 2018-05-22 11:58:07 -06:00
blk.h block: split the blk-mq case from elevator_init 2018-06-01 07:38:21 -06:00
bounce.c block: fixup bioset_integrity_create() call 2018-05-30 18:51:21 -06:00
bsg-lib.c block: remove parent device reference from struct bsg_class_device 2018-05-29 13:00:25 -06:00
bsg.c block: remove parent device reference from struct bsg_class_device 2018-05-29 13:00:25 -06:00
cfq-iosched.c block drivers/block: Use octal not symbolic permissions 2018-05-24 13:38:59 -06:00
cmdline-parser.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
compat_ioctl.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
deadline-iosched.c block drivers/block: Use octal not symbolic permissions 2018-05-24 13:38:59 -06:00
elevator.c block: split the blk-mq case from elevator_init 2018-06-01 07:38:21 -06:00
genhd.c Merge branch 'hch.procfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-06-04 10:00:01 -07:00
ioctl.c block: pass inclusive 'lend' parameter to truncate_inode_pages_range 2018-02-23 15:20:19 -07:00
ioprio.c block: add ioprio_check_cap function 2018-05-31 10:50:54 -04:00
Kconfig License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
Kconfig.iosched License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
kyber-iosched.c block: kyber: make kyber more friendly with merging 2018-05-30 10:47:40 -06:00
Makefile License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mq-deadline.c block drivers/block: Use octal not symbolic permissions 2018-05-24 13:38:59 -06:00
noop-iosched.c
opal_proto.h block: sed-opal: Set MBRDone on S3 resume path if TPER is MBREnabled 2017-09-11 09:45:52 -06:00
partition-generic.c block: don't print a message when the device went away 2018-05-29 08:59:21 -06:00
scsi_ioctl.c block: consistently use GFP_NOIO instead of __GFP_NORECLAIM 2018-05-14 08:55:18 -06:00
sed-opal.c for-4.17/block-20180402 2018-04-05 14:27:02 -07:00
t10-pi.c t10-pi: Move opencoded contants to common header 2017-07-03 16:56:25 -06:00