linux

Author	SHA1	Message	Date
Sage Weil	559c1e0073	ceph: include auth method in error messages Signed-off-by: Sage Weil <sage@newdream.net>	2010-05-17 15:25:23 -07:00
Sage Weil	f26e681d52	ceph: osdtimeout=0 for now timeout Allow the osd reset timeout to be disabled. Signed-off-by: Sage Weil <sage@newdream.net>	2010-05-17 15:25:22 -07:00
Dan Carpenter	0d509c949a	ceph: d_obtain_alias() returns ERR_PTR() d_obtain_alias() doesn't return NULL, it returns an ERR_PTR(). Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>	2010-05-17 15:25:22 -07:00
Yehuda Sadeh	c473ad927e	ceph: wake up mount thread when getting osdmap Now that the mount thread waits for the osdmap, it needs to be awaken. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>	2010-05-17 15:25:21 -07:00
Huang Weiyi	1bb71637d0	ceph: remove unused #includes Remove unused #include's in fs/ceph/super.c Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>	2010-05-17 15:25:21 -07:00
Sage Weil	6822d00b54	ceph: wait for both monmap and osdmap when opening session Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>	2010-05-17 15:25:20 -07:00
Sage Weil	6f2bc3ff4c	ceph: clean up connection reset Reset out_keepalive_pending and peer_global_seq, and drop unused var. Signed-off-by: Sage Weil <sage@newdream.net>	2010-05-17 15:25:20 -07:00
Sage Weil	bb257664f7	ceph: simplify ceph_msg_new We only need to pass in front_len. Callers can attach any other payload pieces (middle, data) as they see fit. Signed-off-by: Sage Weil <sage@newdream.net>	2010-05-17 15:25:19 -07:00
Sage Weil	a79832f26b	ceph: make ceph_msg_new return NULL on failure; clean up, fix callers Returning ERR_PTR(-ENOMEM) is useless extra work. Return NULL on failure instead, and fix up the callers (about half of which were wrong anyway). Signed-off-by: Sage Weil <sage@newdream.net>	2010-05-17 15:25:18 -07:00
Sage Weil	d52f847a84	ceph: rewrite msgpool using mempool_t Since we don't need to maintain large pools of messages, we can just use the standard mempool_t. We maintain a msgpool 'wrapper' because we need the mempool_t* in the alloc function, and mempool gives us only pool_data. Signed-off-by: Sage Weil <sage@newdream.net>	2010-05-17 15:25:18 -07:00
Cheng Renquan	640ef79d27	ceph: use ceph_sb_to_client instead of ceph_client ceph_sb_to_client and ceph_client are really identical, we need to dump one; while function ceph_client is confusing with "struct ceph_client", ceph_sb_to_client's definition is more clear; so we'd better switch all call to ceph_sb_to_client. -static inline struct ceph_client ceph_client(struct super_block sb) -{ - return sb->s_fs_info; -} Signed-off-by: Cheng Renquan <crquan@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>	2010-05-17 15:25:17 -07:00
Cheng Renquan	2d06eeb877	ceph: handle kzalloc() failure Signed-off-by: Cheng Renquan <crquan@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>	2010-05-17 15:25:16 -07:00
Sage Weil	7c315c552c	ceph: drop unnecessary msgpool for mon_client subscribe_ack Preallocate a single message to reuse instead. Signed-off-by: Sage Weil <sage@newdream.net>	2010-05-17 15:25:16 -07:00
Sage Weil	6694d6b95c	ceph: drop unnecessary msgpool for mon_client auth_reply Preallocate a single reply message that we can reuse instead. Signed-off-by: Sage Weil <sage@newdream.net>	2010-05-17 15:25:15 -07:00
Sage Weil	3143edd3a1	ceph: clean up statfs Avoid unnecessary msgpool. Preallocate reply. Fix use-after-free race. Signed-off-by: Sage Weil <sage@newdream.net>	2010-05-17 15:25:15 -07:00
Sage Weil	6f46cb2935	ceph: fix theoretically possible double-put on connection This would only trigger if we bailed out before resetting r_con_filling_msg because the server reply was corrupt (oversized). Signed-off-by: Sage Weil <sage@newdream.net>	2010-05-17 15:25:14 -07:00
Dan Carpenter	c7708075f1	ceph: cleanup: remove dead code "xattr" is never NULL here. We took care of that in the previous if statement block. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>	2010-05-17 15:25:14 -07:00
Sage Weil	104648ad3f	ceph: reduce build_path debug output Signed-off-by: Sage Weil <sage@newdream.net>	2010-05-17 15:25:13 -07:00
Yehuda Sadeh	31459fe4b2	ceph: use __page_cache_alloc and add_to_page_cache_lru Following Nick Piggin patches in btrfs, pagecache pages should be allocated with __page_cache_alloc, so they obey pagecache memory policies. Also, using add_to_page_cache_lru instead of using a private pagevec where applicable. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2010-05-17 15:25:12 -07:00
Stephen Rothwell	f553069e5d	ceph: update for removal of kref_set Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Sage Weil <sage@newdream.net>	2010-05-17 15:25:12 -07:00
Sage Weil	21b667f69b	ceph: simplify page setup for incoming data Drop largely useless helper __prepare_pages(), and simplify sanity checks. Signed-off-by: Sage Weil <sage@newdream.net>	2010-05-17 15:25:11 -07:00
Sage Weil	81a6cf2d30	ceph: invalidate affected dentry leases on aborted requests If we abort a request, we return to caller, but the request may still complete. And if we hold the dir FILE_EXCL bit, we may not release a lease when sending a request. A simple un-tar, control-c, un-tar again will reproduce the bug (manifested as a 'Cannot open: File exists'). Ensure we invalidate affected dentry leases (as well dir I_COMPLETE) so we don't have valid (but incorrect) leases. Do the same, consistently, at other sites where I_COMPLETE is similarly cleared. Signed-off-by: Sage Weil <sage@newdream.net>	2010-05-17 10:25:45 -07:00
Sage Weil	b4556396fa	ceph: fix race between aborted requests and fill_trace When we abort requests we need to prevent fill_trace et al from doing anything that relies on locks held by the VFS caller. This fixes a race between the reply handler and the abort code, ensuring that continue holding the dir mutex until the reply handler completes. Signed-off-by: Sage Weil <sage@newdream.net>	2010-05-17 10:25:45 -07:00
Sage Weil	e1518c7c0a	ceph: clean up mds reply, error handling We would occasionally BUG out in the reply handler because r_reply was nonzero, due to a race with ceph_mdsc_do_request temporarily setting r_reply to an ERR_PTR value. This is unnecessary, messy, and also wrong in the EIO case. Clean up by consistently using r_err for errors and r_reply for messages. Also fix the abort logic to trigger consistently for all errors that return to the caller early (e.g., EIO from timeout case). If an abort races with a reply, use the result from the reply. Also fix locking for r_err, r_reply update in the reply handler. Signed-off-by: Sage Weil <sage@newdream.net>	2010-05-17 10:25:44 -07:00
Linus Torvalds	18e41da89d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: check for read permission on src file in the clone ioctl	2010-05-15 12:55:31 -07:00
Dan Rosenberg	5dc6416414	Btrfs: check for read permission on src file in the clone ioctl The existing code would have allowed you to clone a file that was only open for writing Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-05-15 12:05:50 -04:00
Linus Torvalds	3f8bf8f0fd	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: JFS: Free sbi memory in error path fs/sysv: dereferencing ERR_PTR() Fix double-free in logfs Fix the regression created by "set S_DEAD on unlink()..." commit	2010-05-15 09:03:15 -07:00
Jan Blunck	684bdc7ff9	JFS: Free sbi memory in error path I spotted the missing kfree() while removing the BKL. [akpm@linux-foundation.org: avoid multiple returns so it doesn't happen again] Signed-off-by: Jan Blunck <jblunck@suse.de> Cc: Dave Kleikamp <shaggy@austin.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-05-15 07:16:34 -04:00
Dan Carpenter	404e781249	fs/sysv: dereferencing ERR_PTR() I moved the dir_put_page() inside the if condition so we don't dereference "page", if it's an ERR_PTR(). Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-05-15 07:16:33 -04:00
Al Viro	265624495f	Fix double-free in logfs iput() is needed until we'd done successful d_alloc_root() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-05-15 07:16:33 -04:00
Al Viro	d83c49f3e3	Fix the regression created by "set S_DEAD on unlink()..." commit 1) i_flags simply doesn't work for mount/unlink race prevention; we may have many links to file and rm on one of those obviously shouldn't prevent bind on top of another later on. To fix it right way we need to mark _dentry_ as unsuitable for mounting upon; new flag (DCACHE_CANT_MOUNT) is protected by d_flags and i_mutex on the inode in question. Set it (with dont_mount(dentry)) in unlink/rmdir/etc., check (with cant_mount(dentry)) in places in namespace.c that used to check for S_DEAD. Setting S_DEAD is still needed in places where we used to set it (for directories getting killed), since we rely on it for readdir/rmdir race prevention. 2) rename()/mount() protection has another bogosity - we unhash the target before we'd checked that it's not a mountpoint. Fixed. 3) ancient bogosity in pivot_root() - we locked i_mutex on the right directory, but checked S_DEAD on the different (and wrong) one. Noticed and fixed. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-05-15 07:16:33 -04:00
Linus Torvalds	4fc4c3ce0d	Merge branch 'for-linus' of git://git.infradead.org/users/eparis/notify * 'for-linus' of git://git.infradead.org/users/eparis/notify: inotify: don't leak user struct on inotify release inotify: race use after free/double free in inotify inode marks inotify: clean up the inotify_add_watch out path Inotify: undefined reference to `anon_inode_getfd' Manual merge to remove duplicate "select ANON_INODES" from Kconfig file	2010-05-14 11:49:42 -07:00
Pavel Emelyanov	b3b38d842f	inotify: don't leak user struct on inotify release inotify_new_group() receives a get_uid-ed user_struct and saves the reference on group->inotify_data.user. The problem is that free_uid() is never called on it. Issue seem to be introduced by `63c882a0` (inotify: reimplement inotify using fsnotify) after 2.6.30. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Eric Paris <eparis@parisplace.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Eric Paris <eparis@redhat.com>	2010-05-14 11:53:36 -04:00
Eric Paris	e08733446e	inotify: race use after free/double free in inotify inode marks There is a race in the inotify add/rm watch code. A task can find and remove a mark which doesn't have all of it's references. This can result in a use after free/double free situation. Task A Task B ------------ ----------- inotify_new_watch() allocate a mark (refcnt == 1) add it to the idr inotify_rm_watch() inotify_remove_from_idr() fsnotify_put_mark() refcnt hits 0, free take reference because we are on idr [at this point it is a use after free] [time goes on] refcnt may hit 0 again, double free The fix is to take the reference BEFORE the object can be found in the idr. Signed-off-by: Eric Paris <eparis@redhat.com> Cc: <stable@kernel.org>	2010-05-14 11:52:57 -04:00
Eric Paris	3dbc6fb6a3	inotify: clean up the inotify_add_watch out path inotify_add_watch explictly frees the unused inode mark, but it can just use the generic code. Just do that. Signed-off-by: Eric Paris <eparis@redhat.com>	2010-05-14 11:51:07 -04:00
Linus Torvalds	4462dc0284	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: cifs: guard against hardlinking directories	2010-05-13 10:36:16 -07:00
Jan Kara	002baeecf5	vfs: Fix O_NOFOLLOW behavior for paths with trailing slashes According to specification mkdir d; ln -s d a; open("a/", O_NOFOLLOW \| O_RDONLY) should return success but currently it returns ELOOP. This is a regression caused by path lookup cleanup patch series. Fix the code to ignore O_NOFOLLOW in case the provided path has trailing slashes. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Reported-by: Marius Tolzmann <tolzmann@molgen.mpg.de> Acked-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-13 08:46:04 -07:00
Linus Torvalds	cdf5f61ed1	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: preserve seq # on requeued messages after transient transport errors ceph: fix cap removal races ceph: zero unused message header, footer fields ceph: fix locking for waking session requests after reconnect ceph: resubmit requests on pg mapping change (not just primary change) ceph: fix open file counting on snapped inodes when mds returns no caps ceph: unregister osd request on failure ceph: don't use writeback_control in writepages completion ceph: unregister bdi before kill_anon_super releases device name	2010-05-12 18:47:29 -07:00
David Howells	7ac512aa82	CacheFiles: Fix error handling in cachefiles_determine_cache_security() cachefiles_determine_cache_security() is expected to return with a security override in place. However, if set_create_files_as() fails, we fail to do this. In this case, we should just reinstate the security override that was set by the caller. Furthermore, if set_create_files_as() fails, we should dispose of the new credentials we were in the process of creating. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-12 18:23:58 -07:00
Russell King	e7b702b1a8	Inotify: undefined reference to `anon_inode_getfd' Fix: fs/built-in.o: In function `sys_inotify_init1': summary.c:(.text+0x347a4): undefined reference to `anon_inode_getfd' found by kautobuild with arms bcmring_defconfig, which ends up with INOTIFY_USER enabled (through the 'default y') but leaves ANON_INODES unset. However, inotify_user.c uses anon_inode_getfd(). Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Eric Paris <eparis@redhat.com>	2010-05-12 11:03:40 -04:00
Sage Weil	e84346b726	ceph: preserve seq # on requeued messages after transient transport errors If the tcp connection drops and we reconnect to reestablish a stateful session (with the mds), we need to resend previously sent (and possibly received) messages with the _same_ seq # so that they can be dropped on the other end if needed. Only assign a new seq once after the message is queued. Signed-off-by: Sage Weil <sage@newdream.net>	2010-05-11 21:20:38 -07:00
Sage Weil	f818a73674	ceph: fix cap removal races The iterate_session_caps helper traverses the session caps list and tries to grab an inode reference. However, the __ceph_remove_cap was clearing the inode backpointer _before_ removing itself from the session list, causing a null pointer dereference. Clear cap->ci under protection of s_cap_lock to avoid the race, and to tightly couple the list and backpointer state. Use a local flag to indicate whether we are releasing the cap, as cap->session may be modified by a racing thread in iterate_session_caps. Signed-off-by: Sage Weil <sage@newdream.net>	2010-05-11 20:56:31 -07:00
Robin Holt	34441427aa	revert "procfs: provide stack information for threads" and its fixup commits Originally, commit `d899bf7b` ("procfs: provide stack information for threads") attempted to introduce a new feature for showing where the threadstack was located and how many pages are being utilized by the stack. Commit `c44972f1` ("procfs: disable per-task stack usage on NOMMU") was applied to fix the NO_MMU case. Commit `89240ba0` ("x86, fs: Fix x86 procfs stack information for threads on 64-bit") was applied to fix a bug in ia32 executables being loaded. Commit `9ebd4eba7` ("procfs: fix /proc/<pid>/stat stack pointer for kernel threads") was applied to fix a bug which had kernel threads printing a userland stack address. Commit `1306d603f` ('proc: partially revert "procfs: provide stack information for threads"') was then applied to revert the stack pages being used to solve a significant performance regression. This patch nearly undoes the effect of all these patches. The reason for reverting these is it provides an unusable value in field 28. For x86_64, a fork will result in the task->stack_start value being updated to the current user top of stack and not the stack start address. This unpredictability of the stack_start value makes it worthless. That includes the intended use of showing how much stack space a thread has. Other architectures will get different values. As an example, ia64 gets 0. The do_fork() and copy_process() functions appear to treat the stack_start and stack_size parameters as architecture specific. I only partially reverted `c44972f1` ("procfs: disable per-task stack usage on NOMMU") . If I had completely reverted it, I would have had to change mm/Makefile only build pagewalk.o when CONFIG_PROC_PAGE_MONITOR is configured. Since I could not test the builds without significant effort, I decided to not change mm/Makefile. I only partially reverted `89240ba0` ("x86, fs: Fix x86 procfs stack information for threads on 64-bit") . I left the KSTK_ESP() change in place as that seemed worthwhile. Signed-off-by: Robin Holt <holt@sgi.com> Cc: Stefani Seibold <stefani@seibold.net> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Ingo Molnar <mingo@elte.hu> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-11 17:33:41 -07:00
Sage Weil	45c6ceb547	ceph: zero unused message header, footer fields We shouldn't leak any prior memory contents to other parties. And random data, particularly in the 'version' field, can cause problems down the line. Signed-off-by: Sage Weil <sage@newdream.net>	2010-05-11 15:17:40 -07:00
Jeff Layton	3d69438031	cifs: guard against hardlinking directories When we made serverino the default, we trusted that the field sent by the server in the "uniqueid" field was actually unique. It turns out that it isn't reliably so. Samba, in particular, will just put the st_ino in the uniqueid field when unix extensions are enabled. When a share spans multiple filesystems, it's quite possible that there will be collisions. This is a server bug, but when the inodes in question are a directory (as is often the case) and there is a collision with the root inode of the mount, the result is a kernel panic on umount. Fix this by checking explicitly for directory inodes with the same uniqueid. If that is the case, then we can assume that using server inode numbers will be a problem and that they should be disabled. Fixes Samba bugzilla 7407 Signed-off-by: Jeff Layton <jlayton@redhat.com> CC: Stable <stable@kernel.org> Reviewed-and-Tested-by: Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-05-11 20:57:50 +00:00
David Howells	c61ea31dac	CacheFiles: Fix occasional EIO on call to vfs_unlink() Fix an occasional EIO returned by a call to vfs_unlink(): [ 4868.465413] CacheFiles: I/O Error: Unlink failed [ 4868.465444] FS-Cache: Cache cachefiles stopped due to I/O error [ 4947.320011] CacheFiles: File cache on md3 unregistering [ 4947.320041] FS-Cache: Withdrawing cache "mycache" [ 5127.348683] FS-Cache: Cache "mycache" added (type cachefiles) [ 5127.348716] CacheFiles: File cache on md3 registered [ 7076.871081] CacheFiles: I/O Error: Unlink failed [ 7076.871130] FS-Cache: Cache cachefiles stopped due to I/O error [ 7116.780891] CacheFiles: File cache on md3 unregistering [ 7116.780937] FS-Cache: Withdrawing cache "mycache" [ 7296.813394] FS-Cache: Cache "mycache" added (type cachefiles) [ 7296.813432] CacheFiles: File cache on md3 registered What happens is this: (1) A cached NFS file is seen to have become out of date, so NFS retires the object and immediately acquires a new object with the same key. (2) Retirement of the old object is done asynchronously - so the lookup/create to generate the new object may be done first. This can be a problem as the old object and the new object must exist at the same point in the backing filesystem (i.e. they must have the same pathname). (3) The lookup for the new object sees that a backing file already exists, checks to see whether it is valid and sees that it isn't. It then deletes that file and creates a new one on disk. (4) The retirement phase for the old file is then performed. It tries to delete the dentry it has, but ext4_unlink() returns -EIO because the inode attached to that dentry no longer matches the inode number associated with the filename in the parent directory. The trace below shows this quite well. [md5sum] ==> __fscache_relinquish_cookie(ffff88002d12fb58{NFS.fh,ffff88002ce62100},1) [md5sum] ==> __fscache_acquire_cookie({NFS.server},{NFS.fh},ffff88002ce62100) NFS has retired the old cookie and asked for a new one. [kslowd] ==> fscache_object_state_machine({OBJ52,OBJECT_ACTIVE,24}) [kslowd] <== fscache_object_state_machine() [->OBJECT_DYING] [kslowd] ==> fscache_object_state_machine({OBJ53,OBJECT_INIT,0}) [kslowd] <== fscache_object_state_machine() [->OBJECT_LOOKING_UP] [kslowd] ==> fscache_object_state_machine({OBJ52,OBJECT_DYING,24}) [kslowd] <== fscache_object_state_machine() [->OBJECT_RECYCLING] The old object (OBJ52) is going through the terminal states to get rid of it, whilst the new object - (OBJ53) - is coming into being. [kslowd] ==> fscache_object_state_machine({OBJ53,OBJECT_LOOKING_UP,0}) [kslowd] ==> cachefiles_walk_to_object({ffff88003029d8b8},OBJ53,@68,) [kslowd] lookup '@68' [kslowd] next -> ffff88002ce41bd0 positive [kslowd] advance [kslowd] lookup 'Es0g00og0_Nd_XCYe3BOzvXrsBLMlN6aw16M1htaA' [kslowd] next -> ffff8800369faac8 positive The new object has looked up the subdir in which the file would be in (getting dentry ffff88002ce41bd0) and then looked up the file itself (getting dentry ffff8800369faac8). [kslowd] validate 'Es0g00og0_Nd_XCYe3BOzvXrsBLMlN6aw16M1htaA' [kslowd] ==> cachefiles_bury_object(,'@68','Es0g00og0_Nd_XCYe3BOzvXrsBLMlN6aw16M1htaA') [kslowd] remove ffff8800369faac8 from ffff88002ce41bd0 [kslowd] unlink stale object [kslowd] <== cachefiles_bury_object() = 0 It then checks the file's xattrs to see if it's valid. NFS says that the auxiliary data indicate the file is out of date (obvious to us - that's why NFS ditched the old version and got a new one). CacheFiles then deletes the old file (dentry ffff8800369faac8). [kslowd] redo lookup [kslowd] lookup 'Es0g00og0_Nd_XCYe3BOzvXrsBLMlN6aw16M1htaA' [kslowd] next -> ffff88002cd94288 negative [kslowd] create -> ffff88002cd94288{ffff88002cdaf238{ino=148247}} CacheFiles then redoes the lookup and gets a negative result in a new dentry (ffff88002cd94288) which it then creates a file for. [kslowd] ==> cachefiles_mark_object_active(,OBJ53) [kslowd] <== cachefiles_mark_object_active() = 0 [kslowd] === OBTAINED_OBJECT === [kslowd] <== cachefiles_walk_to_object() = 0 [148247] [kslowd] <== fscache_object_state_machine() [->OBJECT_AVAILABLE] The new object is then marked active and the state machine moves to the available state - at which point NFS can start filling the object. [kslowd] ==> fscache_object_state_machine({OBJ52,OBJECT_RECYCLING,20}) [kslowd] ==> fscache_release_object() [kslowd] ==> cachefiles_drop_object({OBJ52,2}) [kslowd] ==> cachefiles_delete_object(,OBJ52{ffff8800369faac8}) The old object, meanwhile, goes on with being retired. If allocation occurs first, cachefiles_delete_object() has to wait for dir->d_inode->i_mutex to become available before it can continue. [kslowd] ==> cachefiles_bury_object(,'@68','Es0g00og0_Nd_XCYe3BOzvXrsBLMlN6aw16M1htaA') [kslowd] remove ffff8800369faac8 from ffff88002ce41bd0 [kslowd] unlink stale object EXT4-fs warning (device sda6): ext4_unlink: Inode number mismatch in unlink (148247!=148193) CacheFiles: I/O Error: Unlink failed FS-Cache: Cache cachefiles stopped due to I/O error CacheFiles then tries to delete the file for the old object, but the dentry it has (ffff8800369faac8) no longer points to a valid inode for that directory entry, and so ext4_unlink() returns -EIO when de->inode does not match i_ino. [kslowd] <== cachefiles_bury_object() = -5 [kslowd] <== cachefiles_delete_object() = -5 [kslowd] <== fscache_object_state_machine() [->OBJECT_DEAD] [kslowd] ==> fscache_object_state_machine({OBJ53,OBJECT_AVAILABLE,0}) [kslowd] <== fscache_object_state_machine() [->OBJECT_ACTIVE] (Note that the above trace includes extra information beyond that produced by the upstream code). The fix is to note when an object that is being retired has had its object deleted preemptively by a replacement object that is being created, and to skip the second removal attempt in such a case. Reported-by: Greg M <gregm@servu.net.au> Reported-by: Mark Moseley <moseleymark@gmail.com> Reported-by: Romain DEGEZ <romain.degez@smartjog.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-11 10:07:53 -07:00
Sage Weil	9abf82b8bc	ceph: fix locking for waking session requests after reconnect The session->s_waiting list is protected by mdsc->mutex, not s_mutex. This was causing (rare) s_waiting list corruption. Fix errors paths too, while we're here. A more thorough cleanup of this function is coming soon. Signed-off-by: Sage Weil <sage@newdream.net>	2010-05-11 09:53:57 -07:00
Sage Weil	d85b705663	ceph: resubmit requests on pg mapping change (not just primary change) OSD requests need to be resubmitted on any pg mapping change, not just when the pg primary changes. Resending only when the primary changes results in occasional 'hung' requests during osd cluster recovery or rebalancing. Signed-off-by: Sage Weil <sage@newdream.net>	2010-05-11 09:53:56 -07:00
Sage Weil	04d000eb35	ceph: fix open file counting on snapped inodes when mds returns no caps It's possible the MDS will not issue caps on a snapped inode, in which case an open request may not __ceph_get_fmode(), botching the open file counting. (This is actually a server bug, but the client shouldn't BUG out in this case.) Signed-off-by: Sage Weil <sage@newdream.net>	2010-05-11 09:53:55 -07:00
Sage Weil	0ceed5db32	ceph: unregister osd request on failure The osd request wasn't being unregistered when the osd returned a failure code, even though the result was returned to the caller. This would cause it to eventually time out, and then crash the kernel when it tried to resend the request using a stale page vector. Signed-off-by: Sage Weil <sage@newdream.net>	2010-05-11 09:53:18 -07:00

1 2 3 4 5 ...

17659 Commits