linux/drivers/block
Alex Elder 0f2d5be792 rbd: use reference counts for image requests
Each image request contains a reference count, but to date it has
not actually been used.  (I think this was just an oversight.) A
recent report involving rbd failing an assertion shed light on why
and where we need to use these reference counts.

Every OSD request associated with an object request uses
rbd_osd_req_callback() as its callback function.  That function will
call a helper function (dependent on the type of OSD request) that
will set the object request's "done" flag if appropriate.  If
that "done" flag is set, the object request is
passed to rbd_obj_request_complete().
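A minimal user-space sketch of that dispatch follows. It is not rbd
code. The struct, the `enum req_type` values and every function in it
are hypothetical stand-ins: osd_req_callback() plays the role of
rbd_osd_req_callback() and obj_request_complete() the role of
rbd_obj_request_complete(). The two-reply REQ_TWO_STEP type is an
invented example of a helper deciding that a request is not done yet.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical request types; rbd's real set of OSD request types differs. */
enum req_type { REQ_READ, REQ_TWO_STEP };

struct obj_request {
	enum req_type type;
	int steps_left;   /* replies still expected (REQ_TWO_STEP only, illustrative) */
	bool done;        /* the "done" flag described above */
	int completions;  /* times obj_request_complete() was reached */
};

/* Per-type helpers: each decides whether the object request is now done. */
static void read_helper(struct obj_request *obj)
{
	obj->done = true;
}

static void two_step_helper(struct obj_request *obj)
{
	if (--obj->steps_left == 0)
		obj->done = true;
}

/* Stand-in for rbd_obj_request_complete(). */
static void obj_request_complete(struct obj_request *obj)
{
	obj->completions++;
}

/* Stand-in for rbd_osd_req_callback(): the one callback for every OSD request. */
static void osd_req_callback(struct obj_request *obj)
{
	switch (obj->type) {
	case REQ_READ:
		read_helper(obj);
		break;
	case REQ_TWO_STEP:
		two_step_helper(obj);
		break;
	}
	if (obj->done)
		obj_request_complete(obj);
}
```

For example, a REQ_TWO_STEP request with steps_left = 2 passes through
osd_req_callback() twice but reaches obj_request_complete() only on the
second call.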

In rbd_obj_request_complete(), requests are processed in sequential
order.  So if an object request completes before one of its
predecessors in the image request, the completion is deferred.
Otherwise, if it's a completing object's "turn" to be completed, it
is passed to rbd_img_obj_end_request(), which records the result of
the operation, accumulates transferred bytes, and so on.  Next, the
successor to this request is checked, and if it is marked "done",
(deferred) completion processing is performed on that request, and
so on.  If the last object request in an image request is completed,
rbd_img_request_complete() is called, which (typically) destroys
the image request.
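The in-order completion with deferral can be sketched as below. The
sketch assumes a single thread and a fixed-size array of per-object
"done" flags. All names are hypothetical: obj_request_done(),
img_obj_end_request() and img_request_complete() only mirror the roles
of the rbd functions named above. The sketch omits the locking the
driver needs to serialize concurrent completions.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_OBJS 8

/* Hypothetical, simplified image request; not rbd's struct rbd_img_request. */
struct img_request {
	int obj_count;
	bool done[MAX_OBJS];          /* per-object "done" flags */
	long xferred_each[MAX_OBJS];  /* bytes each object request transferred */
	int next_completion;          /* index of the object whose "turn" it is */
	long xferred;                 /* accumulated transferred bytes */
	bool complete;                /* set by img_request_complete() */
};

/* Stand-in for rbd_img_request_complete(); rbd would typically destroy it here. */
static void img_request_complete(struct img_request *img)
{
	img->complete = true;
}

/* Stand-in for rbd_img_obj_end_request(): record one object's result. */
static void img_obj_end_request(struct img_request *img, int which)
{
	img->xferred += img->xferred_each[which];
}

/* Called when object request `which` has been marked done. */
static void obj_request_done(struct img_request *img, int which, long bytes)
{
	assert(which >= 0 && which < img->obj_count);
	img->done[which] = true;
	img->xferred_each[which] = bytes;

	if (which != img->next_completion)
		return;  /* a predecessor is still outstanding: defer */

	/* Our turn: end this request, then every consecutive done successor. */
	while (img->next_completion < img->obj_count &&
	       img->done[img->next_completion]) {
		img_obj_end_request(img, img->next_completion);
		img->next_completion++;
	}
	if (img->next_completion == img->obj_count)
		img_request_complete(img);
}
```

If object 2 of a three-object image request finishes first, it is only
flagged. When object 0 finishes, it is ended alone; when object 1
finishes, objects 1 and 2 are both ended and the image request
completes.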

There is a race here, however.  The instant an object request is
marked "done" it can be provided (by a thread handling completion of
one of its predecessor operations) to rbd_img_obj_end_request(),
which (for the last request) can then lead to the image request
getting torn down.  And this can happen *before* that object has
itself entered rbd_img_obj_end_request().  As a result, once it
*does* enter that function, the image request (and even the object
request itself) may have been freed and become invalid.
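The interleaving can be replayed deterministically in a single thread.
In this hypothetical model a "destroyed" flag stands in for the image
request being freed, so the late access is counted rather than being
real undefined behavior. None of the names are rbd's.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical two-object image request; "destroyed" stands in for kfree(). */
struct img_request {
	bool done[2];
	int next_completion;
	bool destroyed;
	int use_after_free;  /* accesses to a destroyed request (counted, not real UB) */
};

/* Without reference counting, completing the image destroys it at once. */
static void img_request_complete(struct img_request *img)
{
	img->destroyed = true;
}

/* Step 1 of completing object `which`: publish its "done" flag. */
static void mark_done(struct img_request *img, int which)
{
	img->done[which] = true;
}

/* Step 2: the completing thread's own pass through the in-order callback. */
static void obj_callback(struct img_request *img, int which)
{
	if (img->destroyed) {
		img->use_after_free++;  /* in the kernel this would be a freed pointer */
		return;
	}
	if (which != img->next_completion)
		return;
	while (img->next_completion < 2 && img->done[img->next_completion])
		img->next_completion++;
	if (img->next_completion == 2)
		img_request_complete(img);
}

/* Replays the interleaving described above, one step at a time. */
static int replay_race(void)
{
	struct img_request img = { 0 };

	mark_done(&img, 1);     /* thread B: object 1 marked "done"...       */
	mark_done(&img, 0);     /* thread A: object 0 marked "done"           */
	obj_callback(&img, 0);  /* thread A: ends 0, then 1, destroys image   */
	obj_callback(&img, 1);  /* thread B: ...only now enters the callback  */
	return img.use_after_free;
}
```

replay_race() returns 1: thread B's pass through the callback touches an
image request that thread A has already torn down.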

All that's necessary to avoid this is to properly count references
to the image requests.  We tear down an image request's object
requests all at once, and only when the entire image request has
completed.  So there's no need for an image request to count
references for its object requests.  However, we don't want an
image request to go away until the last of its object requests
has passed through rbd_img_obj_callback().  In other words,
we don't want rbd_img_request_complete() to necessarily
result in the image request being destroyed, because it may
get called before we've finished processing all of its
object requests.

So the fix is to add a reference to an image request for
each of its object requests.  The reference can be viewed
as representing an object request that has not yet finished
its call to rbd_img_obj_callback().  That is emphasized by
taking the reference right after rbd_img_obj_callback() is
assigned as the image object request's callback function.  The
corresponding release of that
reference is done at the end of rbd_img_obj_callback(), which
every image object request passes through exactly once.
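A sketch of the fix under the same single-threaded, hypothetical model
follows. The image request starts with one reference and each attached
object request takes another. The in-order completion path drops the
initial reference, standing in for the release assumed to happen in
rbd_img_request_complete(), and every object request drops its own at
the end of obj_callback(). A plain int replaces the kernel's atomic
reference counting, and all names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical image request carrying the reference count described above. */
struct img_request {
	int refcount;
	bool done[2];
	int next_completion;
	bool destroyed;  /* stands in for the memory actually being freed */
};

static void img_request_get(struct img_request *img)
{
	assert(!img->destroyed);
	img->refcount++;
}

/* Dropping the last reference is the only thing that destroys the request. */
static void img_request_put(struct img_request *img)
{
	assert(img->refcount > 0);
	if (--img->refcount == 0)
		img->destroyed = true;
}

/* Creation holds one reference; the completion path drops it. */
static void img_request_init(struct img_request *img)
{
	*img = (struct img_request){ .refcount = 1 };
}

/* Assigning the callback and taking the per-object reference go together. */
static void attach_obj_request(struct img_request *img)
{
	/* obj_request->callback = obj_callback;  (hypothetical field) */
	img_request_get(img);
}

static void obj_callback(struct img_request *img, int which)
{
	assert(!img->destroyed);  /* safe: this object's reference is still held */
	if (which == img->next_completion) {
		while (img->next_completion < 2 && img->done[img->next_completion])
			img->next_completion++;
		if (img->next_completion == 2)
			img_request_put(img);  /* "complete": drop the creation reference */
	}
	img_request_put(img);  /* each object request passes through exactly once */
}

/* Same interleaving as the race, now with reference counting. */
static bool replay_with_refs(void)
{
	struct img_request img;

	img_request_init(&img);    /* refcount 1 */
	attach_obj_request(&img);  /* object 0: refcount 2 */
	attach_obj_request(&img);  /* object 1: refcount 3 */

	img.done[1] = true;        /* thread B: object 1 marked "done"... */
	img.done[0] = true;        /* thread A: object 0 marked "done" */
	obj_callback(&img, 0);     /* thread A: ends 0 and 1; refcount 3 -> 1 */
	bool alive_before_b = !img.destroyed;
	obj_callback(&img, 1);     /* thread B: still valid; refcount 1 -> 0 */
	return alive_before_b && img.destroyed;
}
```

After thread A finishes, two of the three references are gone but the
image request is still alive. It is destroyed only by thread B's final
put, once the last object request has left obj_callback().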

Cc: stable@vger.kernel.org
Signed-off-by: Alex Elder <elder@linaro.org>
Reviewed-by: Ilya Dryomov <ilya.dryomov@inktank.com>
2014-06-06 09:29:59 +08:00
aoe mm: close PageTail race 2014-03-04 07:55:47 -08:00
drbd Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-04-12 14:49:50 -07:00
mtip32xx Merge branch 'for-3.15/drivers' of git://git.kernel.dk/linux-block 2014-04-01 19:43:53 -07:00
paride drivers/block/paride/pg.c: underflow bug in pg_write() 2014-01-21 20:16:56 -08:00
rsxx block: Convert bio_for_each_segment() to bvec_iter 2013-11-23 22:33:49 -08:00
xen-blkback xen-blkback: init persistent_purge_work work_struct 2014-02-11 20:34:03 -07:00
zram zram: support REQ_DISCARD 2014-04-07 16:36:02 -07:00
DAC960.c DAC960: remove sleep_on usage 2014-03-13 14:56:38 -06:00
DAC960.h
Kconfig zram: promote zram from staging 2014-01-30 16:56:55 -08:00
Makefile zram: promote zram from staging 2014-01-30 16:56:55 -08:00
amiflop.c tree-wide: use reinit_completion instead of INIT_COMPLETION 2013-11-15 09:32:21 +09:00
ataflop.c ataflop: fix sleep_on races 2014-03-13 14:56:38 -06:00
brd.c block: Convert bio_for_each_segment() to bvec_iter 2013-11-23 22:33:49 -08:00
cciss.c cciss: Fallback to MSI rather than to INTx if MSI-X failed 2014-03-13 14:56:39 -06:00
cciss.h
cciss_cmd.h
cciss_scsi.c
cciss_scsi.h
cpqarray.c
cpqarray.h
cryptoloop.c
floppy.c floppy: don't write kernel-only members to FDRAWCMD ioctl output 2014-05-05 07:46:56 -07:00
hd.c
ida_cmd.h
ida_ioctl.h
loop.c drivers/block/loop.c: ratelimit error messages 2014-04-08 14:44:35 -06:00
loop.h
mg_disk.c mg_disk: Spelling s/finised/finished/ 2014-01-21 20:34:58 -08:00
nbd.c switch nbd to sockfd_lookup/sockfd_put 2014-04-01 23:19:10 -04:00
null_blk.c null_blk: use blk_complete_request and blk_mq_complete_request 2014-02-10 09:27:31 -07:00
nvme-core.c Merge git://git.infradead.org/users/willy/linux-nvme 2014-04-11 16:45:59 -07:00
nvme-scsi.c NVMe: Retry failed commands with non-fatal errors 2014-04-10 17:11:59 -04:00
osdblk.c
pktcdvd.c pktcdvd: fix error return code 2014-01-03 10:05:34 +01:00
ps3disk.c block: Kill bio_segments()/bi_vcnt usage 2013-11-23 22:33:51 -08:00
ps3vram.c block: Convert bio_for_each_segment() to bvec_iter 2013-11-23 22:33:49 -08:00
rbd.c rbd: use reference counts for image requests 2014-06-06 09:29:59 +08:00
rbd_types.h
skd_main.c skd: Use pci_enable_msix_range() instead of pci_enable_msix() 2014-02-21 15:45:26 -08:00
skd_s1120.h skd: fix formatting in skd_s1120.h 2013-11-08 09:10:30 -07:00
smart1,2.h
sunvdc.c
swim.c
swim3.c swim3: fix interruptible_sleep_on race 2014-03-13 14:56:38 -06:00
swim_asm.S
sx8.c drivers/block/sx8.c: remove unnecessary pci_set_drvdata() 2014-01-21 20:16:56 -08:00
umem.c block: Convert drivers to immutable biovecs 2013-11-23 22:33:51 -08:00
umem.h
virtio_blk.c Nothing exciting: virtio-blk users might see a bit of a boost from the 2014-04-02 14:43:17 -07:00
xen-blkfront.c Merge branch 'stable/for-jens-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip into for-linus 2014-02-10 12:52:34 -07:00
xsysace.c
z2ram.c block/z2ram: Remove duplicate external declarations 2013-11-26 11:09:10 +01:00