linux/net
Alex Elder 26be88087a libceph: change how "safe" callback is used
An osd request currently has two callbacks.  They inform the
initiator of the request when we've received confirmation for the
target osd that a request was received, and when the osd indicates
all changes described by the request are durable.

The only time the second callback is used is in the ceph file system
for a synchronous write.  There's a race that makes some handling of
this case unsafe.  This patch addresses this problem.  The error
handling for this callback is also kind of gross, and this patch
changes that as well.

In ceph_sync_write(), if a safe callback is requested we want to add
the request on the ceph inode's unsafe items list.  Because items on
this list must have their tid set (by ceph_osd_start_request()), the
request added *after* the call to that function returns.  The
problem with this is that there's a race between starting the
request and adding it to the unsafe items list; the request may
already be complete before ceph_sync_write() even begins to put it
on the list.

To address this, we change the way the "safe" callback is used.
Rather than just calling it when the request is "safe", we use it to
notify the initiator the bounds (start and end) of the period during
which the request is *unsafe*.  So the initiator gets notified just
before the request gets sent to the osd (when it is "unsafe"), and
again when it's known the results are durable (it's no longer
unsafe).  The first call will get made in __send_request(), just
before the request message gets sent to the messenger for the first
time.  That function is only called by __send_queued(), which is
always called with the osd client's request mutex held.

We then have this callback function insert the request on the ceph
inode's unsafe list when we're told the request is unsafe.  This
will avoid the race because this call will be made under protection
of the osd client's request mutex.  It also nicely groups the setup
and cleanup of the state associated with managing unsafe requests.

The name of the "safe" callback field is changed to "unsafe" to
better reflect its new purpose.  It has a Boolean "unsafe" parameter
to indicate whether the request is becoming unsafe or is now safe.
Because the "msg" parameter wasn't used, we drop that.

This resolves the original problem reportedin:
    http://tracker.ceph.com/issues/4706

Reported-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-05-01 21:18:52 -07:00
..
9p Revert parts of "hlist: drop the node parameter from iterators" 2013-03-08 15:05:34 -08:00
802 net/802/mrp: fix possible race condition when calling mrp_pdu_queue() 2013-04-12 15:10:48 -04:00
8021q 8021q: fix a potential use-after-free 2013-03-24 17:27:28 -04:00
appletalk hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
atm atm: update msg_namelen in vcc_recvmsg() 2013-04-07 16:28:00 -04:00
ax25 ax25: fix info leak via msg_name in ax25_recvmsg() 2013-04-07 16:28:00 -04:00
batman-adv batman-adv: make is_my_mac() check for the current mesh only 2013-04-17 22:31:22 +02:00
bluetooth Bluetooth: SCO - Fix missing msg_namelen update in sco_sock_recvmsg() 2013-04-07 16:28:01 -04:00
bridge bridge: make user modified path cost sticky 2013-04-15 14:03:44 -04:00
caif caif: Fix missing msg_namelen update in caif_seqpkt_recvmsg() 2013-04-07 16:28:01 -04:00
can can: gw: use kmem_cache_free() instead of kfree() 2013-04-09 09:58:44 +02:00
ceph libceph: change how "safe" callback is used 2013-05-01 21:18:52 -07:00
core net: rate-limit warn-bad-offload splats. 2013-04-19 17:57:49 -04:00
dcb dcbnl: fix various netlink info leaks 2013-03-10 05:19:26 -04:00
dccp Driver core patches for 3.9-rc1 2013-02-21 12:05:51 -08:00
decnet hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
dns_resolver Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2012-12-16 15:40:50 -08:00
dsa dsa: make dsa_switch_setup check for valid port names 2013-01-21 15:40:12 -05:00
ethernet net: split eth_mac_addr for better error handling 2013-01-21 14:07:44 -05:00
ieee802154 6lowpan: Fix endianness issue in is_addr_link_local(). 2013-03-10 16:49:35 -04:00
ipv4 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf 2013-04-19 14:24:47 -04:00
ipv6 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf 2013-04-19 14:24:47 -04:00
ipx hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
irda irda: small read past the end of array in debug code 2013-04-19 17:32:31 -04:00
iucv af_iucv: fix recvmsg by replacing skb_pull() function 2013-04-08 17:16:57 -04:00
key Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec 2013-03-27 14:07:04 -04:00
l2tp l2tp: fix info leak in l2tp_ip6_recvmsg() 2013-04-07 16:28:01 -04:00
lapb net/lapb: remove depends on CONFIG_EXPERIMENTAL 2013-01-11 11:40:01 -08:00
llc llc: Fix missing msg_namelen update in llc_ui_recvmsg() 2013-04-07 16:28:01 -04:00
mac80211 Merge branch 'for-john' of git://x-git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 2013-04-12 13:20:51 -04:00
mac802154 Driver core patches for 3.9-rc1 2013-02-21 12:05:51 -08:00
netfilter netfilter: ipset: bitmap:ip,mac: fix listing with timeout 2013-04-18 23:40:41 +02:00
netlabel netlabel: fix build problems when CONFIG_IPV6=n 2013-03-08 11:33:51 -05:00
netlink genetlink: trigger BUG_ON if a group name is too long 2013-03-20 12:05:51 -04:00
netrom netrom: fix invalid use of sizeof in nr_recvmsg() 2013-04-08 22:49:23 -04:00
nfc NFC: llcp: fix info leaks via msg_name in llcp_sock_recvmsg() 2013-04-07 16:28:02 -04:00
openvswitch openvswitch: correct an invalid BUG_ON 2013-03-27 09:07:41 -07:00
packet hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
phonet hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
rds net/rds: zero last byte for strncpy 2013-03-08 00:35:44 -05:00
rfkill rfkill: don't use [delayed_]work_pending() 2012-12-28 13:40:16 -08:00
rose rose: fix info leak via msg_name in rose_recvmsg() 2013-04-07 16:28:02 -04:00
rxrpc Driver core patches for 3.9-rc1 2013-02-21 12:05:51 -08:00
sched pkt_sched: fix error return code in fw_change_attrs() 2013-04-19 17:34:53 -04:00
sctp sctp: don't break the loop while meeting the active_path so as to find the matched transport 2013-03-13 10:09:55 -04:00
sunrpc NFS client bugfixes for Linux 3.9 2013-04-10 09:00:51 -07:00
tipc tipc: fix info leaks via msg_name in recv_msg/recv_stream 2013-04-07 16:28:02 -04:00
unix af_unix: If we don't care about credentials coallesce all messages 2013-04-05 00:49:13 -04:00
vmw_vsock VSOCK: Fix missing msg_namelen update in vsock_stream_recvmsg() 2013-04-07 16:28:02 -04:00
wimax net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
wireless Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 2013-04-03 14:19:48 -04:00
x25 hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
xfrm Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec 2013-03-27 14:07:04 -04:00
Kconfig Driver core patches for 3.9-rc1 2013-02-21 12:05:51 -08:00
Makefile VSOCK: Introduce VM Sockets 2013-02-10 19:41:08 -05:00
compat.c make get_file() return its argument 2012-09-26 21:10:25 -04:00
nonet.c
socket.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-02-26 20:16:07 -08:00
sysctl_net.c user_ns: get rid of duplicate code in net_ctl_permissions 2012-11-18 20:32:45 -05:00