linux

Commit Graph

Author	SHA1	Message	Date
Mi Jinlong	ae82a8d06f	nfsd41: check the size of request Check in SEQUENCE that the request doesn't exceed maxreq_sz for the given session. Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-07-15 19:00:00 -04:00
Mi Jinlong	1b74c25bc1	nfsd41: error out when client sets maxreq_sz or maxresp_sz too small According to RFC5661, 18.36.3, "if the client selects a value for ca_maxresponsesize such that a replier on a channel could never send a response,the server SHOULD return NFS4ERR_TOOSMALL in the CREATE_SESSION reply." So, error out when the client sets a maxreq_sz less than the minimum possible SEQUENCE request size, or sets a maxresp_sz less than the minimum possible SEQUENCE reply size. Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-07-15 18:58:51 -04:00
J. Bruce Fields	f197c27196	nfsd4: fix file leak on open_downgrade Stateid's hold a read reference for a read open, a write reference for a write open, and an additional one of each for each read+write open. The latter wasn't getting put on a downgrade, so something like: open RW open R downgrade to R was resulting in a file leak. Also fix an imbalance in an error path. Regression from `7d94784293` "nfsd4: fix downgrade/lock logic". Cc: stable@kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-07-15 18:58:49 -04:00
J. Bruce Fields	499f3edc23	nfsd4: remember to put RW access on stateid destruction Without this, for example, open read open read+write close will result in a struct file leak. Regression from `7d94784293` "nfsd4: fix downgrade/lock logic". Cc: stable@kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-07-15 18:58:49 -04:00
Bryan Schumaker	1745680454	NFSD: Added TEST_STATEID operation This operation is used by the client to check the validity of a list of stateids. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-07-15 18:58:48 -04:00
Bryan Schumaker	e1ca12dfb1	NFSD: added FREE_STATEID operation This operation is used by the client to tell the server to free a stateid. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-07-15 18:58:47 -04:00
NeilBrown	49b28684fd	nfsd: Remove deprecated nfsctl system call and related code. As promised in feature-removal-schedule.txt it is time to remove the nfsctl system call. Userspace has perferred to not use this call throughout 2.6 and it has been excluded in the default configuration since 2.6.36 (9 months ago). So this patch removes all the code that was being compiled out. There are still references to sys_nfsctl in various arch systemcall tables and related code. These should be cleaned out too, probably in the next merge window. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-07-15 18:58:42 -04:00
Benny Halevy	094b5d74f4	NFSD: allow OP_DESTROY_CLIENTID to be only op in COMPOUND DESTROY_CLIENTID MAY be preceded with a SEQUENCE operation as long as the client ID derived from the session ID of SEQUENCE is not the same as the client ID to be destroyed. If the client IDs are the same, then the server MUST return NFS4ERR_CLIENTID_BUSY. (that's not implemented yet) If DESTROY_CLIENTID is not prefixed by SEQUENCE, it MUST be the only operation in the COMPOUND request (otherwise, the server MUST return NFS4ERR_NOT_ONLY_OP). This fixes the error return; before, we returned NFS4ERR_OP_NOT_IN_SESSION; after this patch, we return NFS4ERR_NOTSUPP. Signed-off-by: Benny Halevy <benny@tonian.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-07-15 18:58:41 -04:00
David Teigland	23e8e1aaac	dlm: use workqueue for callbacks Instead of creating our own kthread (dlm_astd) to deliver callbacks for all lockspaces, use a per-lockspace workqueue to deliver the callbacks. This eliminates complications and slowdowns from many lockspaces sharing the same thread. Signed-off-by: David Teigland <teigland@redhat.com>	2011-07-15 12:30:43 -05:00
Linus Torvalds	da1b001a2a	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: fix loop checks in d_materialise_unique() Fix ->d_lock locking order in unlazy_walk()	2011-07-15 09:55:39 -07:00
Eric Sandeen	46fcb2ed29	GFS2: combine duplicated block freeing routines __gfs2_free_data and __gfs2_free_meta are almost identical, and can be trivially combined. [This is as per Eric's original patch minus gfs2_free_data() which had no callers left and plus the conversion of the bmap.c calls to these functions. All in all, a nice clean up] Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-07-15 09:32:52 +01:00
Steven Whitehouse	9964afbb79	GFS2: Add S_NOSEC support This adds S_NOSEC support to GFS2. We set/reset the flag either when a user calls setattr or when we have just regained the glock from another node. The flag is only set if there are no xattrs on the inode and there is no suid bit set. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Al Viro <viro@ZenIV.linux.org.uk>	2011-07-15 09:32:35 +01:00
Bob Peterson	7cf8dcd3b6	GFS2: Automatically adjust glock min hold time This patch is a performance improvement for GFS2 in a clustered environment. It makes the glock hold time self-adjusting. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-07-15 09:32:11 +01:00
Steven Whitehouse	17d539f049	GFS2: Cache dir hash table in a contiguous buffer This patch adds a cache for the hash table to the directory code in order to help simplify the way in which the hash table is accessed. This is intended to be a first step towards introducing some performance improvements in the directory code. There are two follow ups that I'm hoping to see fairly shortly. One is to simplify the hash table reading code now that we always read the complete hash table, whether we want one entry or all of them. The other is to introduce readahead on the heads of the hash chains which are referred to from the table. The hash table is a maximum of 128k in size, so it is not worth trying to read it in small chunks. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-07-15 09:31:48 +01:00
Al Viro	1836750115	fix loop checks in d_materialise_unique() Both __d_unalias() and __d_materialise_dentry() need loop prevention. Grab rename_lock in caller, check for loops there... As a side benefit, we have dentry_lock_for_move() called only under rename_lock, which seriously reduces deadlock potential of the execrable "locking order" used for ->d_lock. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-14 21:33:41 -04:00
David Teigland	883ba74f43	dlm: remove deadlock debug print gfs2 recently began using this feature heavily, creating more debug output than we want to see. Signed-off-by: David Teigland <teigland@redhat.com>	2011-07-14 12:31:49 -05:00
Linus Torvalds	5dcd07b9f3	Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes: GFS2: Resolve inode eviction and ail list interaction bug GFS2: Fix race during filesystem mount GFS2: force a log flush when invalidating the rindex glock	2011-07-14 10:20:42 -07:00
Steven Whitehouse	380f7c65a7	GFS2: Resolve inode eviction and ail list interaction bug This patch contains a few misc fixes which resolve a recently reported issue. This patch has been a real team effort and has received a lot of testing. The first issue is that the ail lock needs to be held over a few more operations. The lock thats added into gfs2_releasepage() may possibly be a candidate for replacing with RCU at some future point, but at this stage we've gone for the obvious fix. The second issue is that gfs2_write_inode() can end up calling a glock recursively when called from gfs2_evict_inode() via the syncing code, so it needs a guard added. The third issue is that we either need to not truncate the metadata pages of inodes which have zero link count, but which we cannot deallocate due to them still being in use by other nodes, or we need to ensure that those pages have all made it through the journal and ail lists first. This patch takes the former approach, but the latter has also been tested and there is nothing to choose between them performance-wise. So again, we could revise that decision in the future. Also, the inode eviction process is now better documented. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Tested-by: Bob Peterson <rpeterso@redhat.com> Tested-by: Abhijith Das <adas@redhat.com> Reported-by: Barry J. Marson <bmarson@redhat.com> Reported-by: David Teigland <teigland@redhat.com>	2011-07-14 08:59:44 +01:00
Linus Torvalds	201f92e2ca	Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: SUNRPC: Fix use of static variable in rpcb_getport_async NFSv4.1: update nfs4_fattr_bitmap_maxsz SUNRPC: Fix a race between work-queue and rpc_killall_tasks pnfs: write: Set mds_offset in the generic layer - it is needed by all LDs	2011-07-13 14:34:08 -07:00
Christoph Hellwig	d0f9e8fb4c	xfs: remove the dead XFS_DABUF_DEBUG code Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2011-07-13 13:43:50 +02:00
Christoph Hellwig	c84470dda7	xfs: remove leftovers of the old btree tracing code Remove various bits left over from the old kdb-only btree tracing code, but leave the actual trace point stubs in place to ease adding new event based btree tracing. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2011-07-13 13:43:50 +02:00
Christoph Hellwig	ea15ab3cdd	xfs: remove the dead QUOTADEBUG code Remove the dead hash table test rid which has been rotting away under QUOTADEBUG, including some code that was compiled for normal debug builds, but not actually called without QUOTADEBUG, and enable a few cheap debug checks that were hidden under QUOTADEBUG for normal debug builds. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2011-07-13 13:43:50 +02:00
Christoph Hellwig	54244fec67	xfs: remove the unused xfs_buf_delwri_sort function Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2011-07-13 13:43:49 +02:00
Christoph Hellwig	cb669ca570	xfs: remove wrappers around b_iodone Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2011-07-13 13:43:49 +02:00
Christoph Hellwig	adadbeefb3	xfs: remove wrappers around b_fspriv Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2011-07-13 13:43:49 +02:00
Christoph Hellwig	bf9d9013a2	xfs: add a proper transaction pointer to struct xfs_buf Replace the typeless b_fspriv2 and the ugly macros around it with a properly typed transaction pointer. As a fallout the log buffer state debug checks are also removed. We could have kept them using casts, but as they do not have a real purpose we can as well just remove them. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2011-07-13 13:43:49 +02:00
Christoph Hellwig	77936d0280	xfs: factor out xfs_da_grow_inode_int xfs_da_grow_inode and xfs_dir2_grow_inode are mostly duplicate code. Factor the meat of those two functions into a new common helper. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2011-07-13 13:43:49 +02:00
Christoph Hellwig	a230a1df40	xfs: factor out xfs_dir2_leaf_find_stale Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2011-07-13 13:43:48 +02:00
Christoph Hellwig	a00b7745c6	xfs: cleanup struct xfs_dir2_free Change the bests array to be a proper variable sized entry. This is done easily as no one relies on the size of the structure. Also change XFS_DIR2_MAX_FREE_BESTS to an inline function while we're at it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2011-07-13 13:43:48 +02:00
Christoph Hellwig	5792664070	xfs: reshuffle dir2 headers Replace the current mess of dir2 headers with just three that have a clear purpose: - xfs_dir2_format.h for all format definitions, including the inline helpers to access our variable size structures - xfs_dir2_priv.h for all prototypes that are internal to the dir2 code and not needed by anything outside of the directory code. For this purpose xfs_da_btree.c, and phase6.c in xfs_repair are considered part of the directory code. - xfs_dir2.h for the public interface to the directory code In addition to the reshuffle I have also update the comments to not only match the new file structure, but also to describe the directory format better. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2011-07-13 13:43:48 +02:00
Christoph Hellwig	2bcf6e970f	xfs: start periodic workers later Start the periodic sync workers only after we have finished xfs_mountfs and thus fully set up the filesystem structures. Without this we can call into xfs_qm_sync before the quotainfo strucute is set up if the mount takes unusually long, and probably hit other incomplete states as well. Also clean up the xfs_fs_fill_super error path by using consistent label names, and removing an impossible to reach case. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Arkadiusz Miskiewicz <arekm@maven.pl> Reviewed-by: Alex Elder <aelder@sgi.com>	2011-07-13 13:43:48 +02:00
Al Viro	94c0d4ecbe	Fix ->d_lock locking order in unlazy_walk() Make sure that child is still a child of parent before nested locking of child->d_lock in unlazy_walk(); otherwise we are risking a violation of locking order and deadlocks. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-12 21:40:23 -04:00
David Teigland	3881ac04eb	dlm: improve rsb searches By pre-allocating rsb structs before searching the hash table, they can be inserted immediately. This avoids always having to repeat the search when adding the struct to hash list. This also adds space to the rsb struct for a max resource name, so an rsb allocation can be used by any request. The constant size also allows us to finally use a slab for the rsb structs. Signed-off-by: David Teigland <teigland@redhat.com>	2011-07-12 16:02:09 -05:00
Steve French	c2ec9471b5	[CIFS] update cifs to version 1.74 Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-07-12 19:15:02 +00:00
Steve French	ea1be1a3c3	[CIFS] update limit for snprintf in cifs_construct_tcon In `34c87901e1` "Shrink stack space usage in cifs_construct_tcon" we change the size of the username name buffer from MAX_USERNAME_SIZE (256) to 28. This call to snprintf() needs to be updated as well. Reported by Dan Carpenter. Reviewed-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-07-12 19:14:24 +00:00
Shirish Pargaonkar	62411ab2fe	cifs: Fix signing failure when server mandates signing for NTLMSSP When using NTLMSSP authentication mechanism, if server mandates signing, keep the flags in type 3 messages of the NTLMSSP exchange same as in type 1 messages (i.e. keep the indicated capabilities same). Some of the servers such as Samba, expect the flags such as Negotiate_Key_Exchange in type 3 message of NTLMSSP exchange as well. Some servers like Windows do not. https://bugzilla.samba.org/show_bug.cgi?id=8212 Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail> Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-07-12 19:14:23 +00:00
Steven Whitehouse	3942ae5319	GFS2: Fix race during filesystem mount There is a potential race during filesystem mounting which has recently been reported. It occurs when the userland gfs_controld is able to process requests fast enough that it tries to use the sysfs interface before the lock module is properly initialised. This is a pretty unusual case as normally the lock module initialisation is very quick compared with gfs_controld. This patch adds an interruptible completion which is used to ensure that userland will wait for the initialisation of the lock module to complete. There are other potential solutions to this problem, but this is the quickest at this stage and has been tested both with and without mount.gfs2 present in the system. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Reported-by: David Booher <dbooher@adams.net>	2011-07-12 09:15:46 +01:00
Benjamin Marzinski	1ce533686c	GFS2: force a log flush when invalidating the rindex glock Right now, there is nothing that forces the log to get flushed when a node drops its rindex glock so that another node can grow the filesystem. If the log doesn't get flushed, GFS2 can corrupt the sd_log_le_rg list in the following way. A node puts an rgd on the list in rg_lo_add(), and then the rindex glock is dropped so the other node can grow the filesystem. When the node reacquires the rindex glock, that rgd gets deleted in clear_rgrpdi() before ever being removed from the list by gfs2_log_flush(). This code simply forces a log flush when the rindex glock is invalidated, solving the problem. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-07-12 09:15:24 +01:00
Justin TerAvest	4aede84b33	fixlet: Remove fs_excl from struct task. fs_excl is a poor man's priority inheritance for filesystems to hint to the block layer that an operation is important. It was never clearly specified, not widely adopted, and will not prevent starvation in many cases (like across cgroups). fs_excl was introduced with the time sliced CFQ IO scheduler, to indicate when a process held FS exclusive resources and thus needed a boost. It doesn't cover all file systems, and it was never fully complete. Lets kill it. Signed-off-by: Justin TerAvest <teravest@google.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-07-12 08:35:10 +02:00
Andy Adamson	e5012d1f38	NFSv4.1: update nfs4_fattr_bitmap_maxsz Attribute IDs assigned in RFC 5661 now require three bitmaps. Fixes hitting a BUG_ON in xdr_shrink_bufhead when getting ACLs. Signed-off-by: Andy Adamson <andros@netapp.com> Cc:stable@kernel.org [2.6.39] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-07-11 19:14:38 -04:00
Linus Torvalds	71a1b44b03	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: cifs: drop spinlock before calling cifs_put_tlink cifs: fix expand_dfs_referral cifs: move bdi_setup_and_register outside of CONFIG_CIFS_DFS_UPCALL cifs: factor smb_vol allocation out of cifs_setup_volume_info cifs: have cifs_cleanup_volume_info not take a double pointer cifs: fix build_unc_path_to_root to account for a prefixpath cifs: remove bogus call to cifs_cleanup_volume_info	2011-07-11 12:48:24 -07:00
Jeff Layton	f484b5d001	cifs: drop spinlock before calling cifs_put_tlink ...as that function can sleep. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-07-11 18:40:52 +00:00
Alex Elder	b2ce397400	Revert "xfs: fix filesystsem freeze race in xfs_trans_alloc" This reverts commit `7a249cf83d`. That commit created a situation that could lead to a filesystem hang. As Dave Chinner pointed out, xfs_trans_alloc() could hold a reference to m_active_trans (i.e., keep it non-zero) and then wait for SB_FREEZE_TRANS to complete. Meanwhile a filesystem freeze request could set SB_FREEZE_TRANS and then wait for m_active_trans to drop to zero. Nobody benefits from this sequence of events... Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-07-11 10:21:03 -05:00
David Teigland	3d6aa675ff	dlm: keep lkbs in idr This is simpler and quicker than the hash table, and avoids needing to search the hash list for every new lkid to check if it's used. Signed-off-by: David Teigland <teigland@redhat.com>	2011-07-11 08:43:45 -05:00
David Teigland	a22ca48068	dlm: fix kmalloc args The gfp and size args were switched. Signed-off-by: David Teigland <teigland@redhat.com>	2011-07-11 08:40:53 -05:00
Jesper Juhl	5d70828a77	dlm: don't do pointless NULL check, use kzalloc and fix order of arguments In fs/dlm/lock.c in the dlm_scan_waiters() function there are 3 small issues: 1) There's no need to test the return value of the allocation and do a memset if is succeedes. Just use kzalloc() to obtain zeroed memory. 2) Since kfree() handles NULL pointers gracefully, the test of 'warned' against NULL before the kfree() after the loop is completely pointless. Remove it. 3) The arguments to kmalloc() (now kzalloc()) were swapped. Thanks to Dr. David Alan Gilbert for pointing this out. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: David Teigland <teigland@redhat.com>	2011-07-11 08:39:42 -05:00
Jiri Kosina	b7e9c223be	Merge branch 'master' into for-next Sync with Linus' tree to be able to apply pending patches that are based on newer code already present upstream.	2011-07-11 14:15:55 +02:00
Wu Fengguang	1a12d8bd7b	writeback: scale IO chunk size up to half device bandwidth Originally, MAX_WRITEBACK_PAGES was hard-coded to 1024 because of a concern of not holding I_SYNC for too long. (At least, that was the comment previously.) This doesn't make sense now because the only time we wait for I_SYNC is if we are calling sync or fsync, and in that case we need to write out all of the data anyway. Previously there may have been other code paths that waited on I_SYNC, but not any more. -- Theodore Ts'o So remove the MAX_WRITEBACK_PAGES constraint. The writeback pages will adapt to as large as the storage device can write within 500ms. XFS is observed to do IO completions in a batch, and the batch size is equal to the write chunk size. To avoid dirty pages to suddenly drop out of balance_dirty_pages()'s dirty control scope and create large fluctuations, the chunk size is also limited to half the control scope. The balance_dirty_pages() control scrope is [(background_thresh + dirty_thresh) / 2, dirty_thresh] which is by default [15%, 20%] of global dirty pages, whose range size is dirty_thresh / DIRTY_FULL_SCOPE. The adpative write chunk size will be rounded to the nearest 4MB boundary. http://bugzilla.kernel.org/show_bug.cgi?id=13930 CC: Theodore Ts'o <tytso@mit.edu> CC: Dave Chinner <david@fromorbit.com> CC: Chris Mason <chris.mason@oracle.com> CC: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>	2011-07-09 22:09:03 -07:00
Wu Fengguang	c42843f2f0	writeback: introduce smoothed global dirty limit The start of a heavy weight application (ie. KVM) may instantly knock down determine_dirtyable_memory() if the swap is not enabled or full. global_dirty_limits() and bdi_dirty_limit() will in turn get global/bdi dirty thresholds that are _much_ lower than the global/bdi dirty pages. balance_dirty_pages() will then heavily throttle all dirtiers including the light ones, until the dirty pages drop below the new dirty thresholds. During this _deep_ dirty-exceeded state, the system may appear rather unresponsive to the users. About "deep" dirty-exceeded: task_dirty_limit() assigns 1/8 lower dirty threshold to heavy dirtiers than light ones, and the dirty pages will be throttled around the heavy dirtiers' dirty threshold and reasonably below the light dirtiers' dirty threshold. In this state, only the heavy dirtiers will be throttled and the dirty pages are carefully controlled to not exceed the light dirtiers' dirty threshold. However if the threshold itself suddenly drops below the number of dirty pages, the light dirtiers will get heavily throttled. So introduce global_dirty_limit for tracking the global dirty threshold with policies - follow downwards slowly - follow up in one shot global_dirty_limit can effectively mask out the impact of sudden drop of dirtyable memory. It will be used in the next patch for two new type of dirty limits. Note that the new dirty limits are not going to avoid throttling the light dirtiers, but could limit their sleep time to 200ms. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>	2011-07-09 22:09:02 -07:00
Wu Fengguang	e98be2d599	writeback: bdi write bandwidth estimation The estimation value will start from 100MB/s and adapt to the real bandwidth in seconds. It tries to update the bandwidth only when disk is fully utilized. Any inactive period of more than one second will be skipped. The estimated bandwidth will be reflecting how fast the device can writeout when _fully utilized_, and won't drop to 0 when it goes idle. The value will remain constant at disk idle time. At busy write time, if not considering fluctuations, it will also remain high unless be knocked down by possible concurrent reads that compete for the disk time and bandwidth with async writes. The estimation is not done purely in the flusher because there is no guarantee for write_cache_pages() to return timely to update bandwidth. The bdi->avg_write_bandwidth smoothing is very effective for filtering out sudden spikes, however may be a little biased in long term. The overheads are low because the bdi bandwidth update only occurs at 200ms intervals. The 200ms update interval is suitable, because it's not possible to get the real bandwidth for the instance at all, due to large fluctuations. The NFS commits can be as large as seconds worth of data. One XFS completion may be as large as half second worth of data if we are going to increase the write chunk to half second worth of data. In ext4, fluctuations with time period of around 5 seconds is observed. And there is another pattern of irregular periods of up to 20 seconds on SSD tests. That's why we are not only doing the estimation at 200ms intervals, but also averaging them over a period of 3 seconds and then go further to do another level of smoothing in avg_write_bandwidth. CC: Li Shaohua <shaohua.li@intel.com> CC: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>	2011-07-09 22:09:01 -07:00
Wu Fengguang	d46db3d582	writeback: make writeback_control.nr_to_write straight Pass struct wb_writeback_work all the way down to writeback_sb_inodes(), and initialize the struct writeback_control there. struct writeback_control is basically designed to control writeback of a single file, but we keep abuse it for writing multiple files in writeback_sb_inodes() and its callers. It immediately clean things up, e.g. suddenly wbc.nr_to_write vs work->nr_pages starts to make sense, and instead of saving and restoring pages_skipped in writeback_sb_inodes it can always start with a clean zero value. It also makes a neat IO pattern change: large dirty files are now written in the full 4MB writeback chunk size, rather than whatever remained quota in wbc->nr_to_write. Acked-by: Jan Kara <jack@suse.cz> Proposed-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>	2011-07-09 22:09:01 -07:00
Jeff Layton	b9bce2e9f9	cifs: fix expand_dfs_referral Regression introduced in commit `724d9f1cfb`. Prior to that, expand_dfs_referral would regenerate the mount data string and then call cifs_parse_mount_options to re-parse it (klunky, but it worked). The above commit moved cifs_parse_mount_options out of cifs_mount, so the re-parsing of the new mount options no longer occurred. Fix it by making expand_dfs_referral re-parse the mount options. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-07-09 21:25:57 +00:00
Jeff Layton	20547490c1	cifs: move bdi_setup_and_register outside of CONFIG_CIFS_DFS_UPCALL This needs to be done regardless of whether that KConfig option is set or not. Reported-by: Sven-Haegar Koch <haegar@sdinet.de> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-07-09 20:29:51 +00:00
Linus Torvalds	1acc9309eb	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: btrfs: fix oops when doing space balance Btrfs: don't panic if we get an error while balancing V2 btrfs: add missing options displayed in mount output	2011-07-08 23:25:45 -07:00
Chandra Seetharaman	81463b1ca8	xfs: remove variables that serve no purpose in xfs_alloc_ag_vextent_exact() Remove two variables that serve no purpose in xfs_alloc_ag_vextent_exact(). Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-07-08 11:32:51 -05:00
Eric Sandeen	c0e090ced2	xfs: consolidate & clarify mount sanity checks Pavol pointed out that there is one silent error case in the mount path, and that others are rather uninformative. I've taken Pavol's suggested patch and extended it a bit to also: * fix a message which says "turned off" but actually errors out * consolidate the vaguely differentiated "SB sanity check [12]" messages, and hexdump the superblock for analysis Original-patch-by: Pavol Gono <Pavol.Gono@siemens.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-07-08 11:32:51 -05:00
Linus Torvalds	54af2bd25c	Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs * 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: unpin stale inodes directly in IOP_COMMITTED	2011-07-08 09:00:51 -07:00
Christoph Hellwig	e163cbde98	xfs: avoid a few disk cache flushes There is no need for a pre-flush when doing writing the second part of a split log buffer, and if we are using an external log there is no need to do a full cache flush of the log device at all given that all writes to it use the FUA flag. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2011-07-08 14:36:36 +02:00
Christoph Hellwig	1d5ae5dfee	xfs: cleanup I/O-related buffer flags Remove the unused and misnamed _XBF_RUN_QUEUES flag, rename XBF_LOG_BUFFER to the more fitting XBF_SYNCIO, and split XBF_ORDERED into XBF_FUA and XBF_FLUSH to allow more fine grained control over the bio flags. Also cleanup processing of the flags in _xfs_buf_ioapply to make more sense, and renumber the sparse flag number space to group flags by purpose. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2011-07-08 14:36:32 +02:00
Christoph Hellwig	c8da0faf6b	xfs: return the buffer locked from xfs_buf_get_uncached All other xfs_buf_get/read-like helpers return the buffer locked, make sure xfs_buf_get_uncached isn't different for no reason. Half of the callers already lock it directly after, and the others probably should also keep it locked if only for consistency and beeing able to use xfs_buf_rele, but I'll leave that for later. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2011-07-08 14:36:25 +02:00
Christoph Hellwig	0c842ad46a	xfs: clean up buffer locking helpers Rename xfs_buf_cond_lock and reverse it's return value to fit most other trylock operations in the Kernel and XFS (with the exception of down_trylock, after which xfs_buf_cond_lock was modelled), and replace xfs_buf_lock_val with an xfs_buf_islocked for use in asserts, or and opencoded variant in tracing. remove the XFS_BUF_* wrappers for all the locking helpers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2011-07-08 14:36:19 +02:00
Christoph Hellwig	bbb4197c73	xfs: remove the unused xfs_bufhash structure Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2011-07-08 14:36:10 +02:00
Christoph Hellwig	69ef921b55	xfs: byteswap constants instead of variables Micro-optimize various comparisms by always byteswapping the constant instead of the variable, which allows to do the swap at compile instead of runtime. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2011-07-08 14:36:05 +02:00
Christoph Hellwig	218106a110	xfs: use generic get_unaligned_beXX helpers Switch the shortform directory code over to use the generic get_unaligned_beXX helpers instead of reinventing them. As a result kill off xfs_arch.h and move the setting of XFS_NATIVE_HOST into xfs_linux.h. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2011-07-08 14:35:58 +02:00
Christoph Hellwig	2282396d81	xfs: cleanup struct xfs_dir2_leaf Simplify the confusing xfs_dir2_leaf structure. It is supposed to describe an XFS dir2 leaf format btree block, but due to the variable sized nature of almost all elements in it it can't actuall do anything close to that job. Remove the members that are after the first variable sized array, given that they could only be used for sizeof expressions that can as well just use the underlying types directly, and make the ents array a real C99 variable sized array. Also factor out the xfs_dir2_leaf_size, to make the sizing of a leaf entry which already was convoluted somewhat readable after using the longer type names in the sizeof expressions. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2011-07-08 14:35:53 +02:00
Christoph Hellwig	3ed8638f88	xfs: cleanup the definition of struct xfs_dir2_data_entry Remove the tag member which is at a variable offset after the actual name, and make name a real variable sized C99 array instead of the incorrect one-sized array which confuses (not only) gcc. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2011-07-08 14:35:50 +02:00
Christoph Hellwig	0ba9cd84ef	xfs: kill struct xfs_dir2_data Remove the confusing xfs_dir2_data structure. It is supposed to describe an XFS dir2 data btree block, but due to the variable sized nature of almost all elements in it it can't actuall do anything close to that job. In addition to accessing the fixed offset header structure it was only used to get a pointer to the first dir or unused entry after it, which can be trivially replaced by pointer arithmetics on the header pointer. For most users that is actually more natural anyway, as they don't use a typed pointer but rather a character pointer for further arithmetics. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2011-07-08 14:35:42 +02:00
Christoph Hellwig	c2066e2662	xfs: avoid usage of struct xfs_dir2_data In most places we can simply pass around and use the struct xfs_dir2_data_hdr, which is the first and most important member of struct xfs_dir2_data instead of the full structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2011-07-08 14:35:38 +02:00
Christoph Hellwig	a64b041797	xfs: kill struct xfs_dir2_block Remove the confusing xfs_dir2_block structure. It is supposed to describe an XFS dir2 block format btree block, but due to the variable sized nature of almost all elements in it it can't actuall do anything close to that job. In addition to accessing the fixed offset header structure it was only used to get a pointer to the first dir or unused entry after it, which can be trivially replaced by pointer arithmetics on the header pointer. For most users that is actually more natural anyway, as they don't use a typed pointer but rather a character pointer for further arithmetics. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2011-07-08 14:35:32 +02:00
Christoph Hellwig	4f6ae1a49e	xfs: avoid usage of struct xfs_dir2_block In most places we can simply pass around and use the struct xfs_dir2_data_hdr, which is the first and most important member of struct xfs_dir2_block instead of the full structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2011-07-08 14:35:27 +02:00
Christoph Hellwig	78f70cd7b7	xfs: cleanup the definition of struct xfs_dir2_sf_entry Remove the inumber member which is at a variable offset after the actual name, and make name a real variable sized C99 array instead of the incorrect one-sized array which confuses (not only) gcc. Based on this clean up the helpers to calculate the entry size. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2011-07-08 14:35:19 +02:00
Christoph Hellwig	ac8ba50f6b	xfs: kill struct xfs_dir2_sf The list field of it is never cactually used, so all uses can simply be replaced with the xfs_dir2_sf_hdr_t type that it has as first member. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2011-07-08 14:35:13 +02:00
Christoph Hellwig	8bc3878758	xfs: cleanup shortform directory inode number handling Refactor the shortform directory helpers that deal with the 32-bit vs 64-bit wide inode numbers into more sensible helpers, and kill the xfs_intino_t typedef that is now superflous. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2011-07-08 14:35:03 +02:00
Christoph Hellwig	4fb44c8272	xfs: factor out xfs_dir2_leaf_find_entry Add a new xfs_dir2_leaf_find_entry helper to factor out some duplicate code from xfs_dir2_leaf_addname xfs_dir2_leafn_add. Found by Eric Sandeen using an automated code duplication checker. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2011-07-08 14:34:59 +02:00
Christoph Hellwig	29d104af0a	xfs: kill the unused struct xfs_sync_work Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2011-07-08 14:34:51 +02:00
Christoph Hellwig	f3ca87389d	xfs: remove i_transp Remove the transaction pointer in the inode. It's only used to avoid passing down an argument in the bmap code, and for a few asserts in the transaction code right now. Also use the local variable ip in a few more places in xfs_inode_item_unlock, so that it isn't only used for debug builds after the above change. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2011-07-08 14:34:47 +02:00
Christoph Hellwig	7a249cf83d	xfs: fix filesystsem freeze race in xfs_trans_alloc As pointed out by Jan xfs_trans_alloc can race with a concurrent filesystem freeze when it sleeps during the memory allocation. Fix this by moving the wait_for_freeze call after the memory allocation. This means moving the freeze into the low-level _xfs_trans_alloc helper, which thus grows a new argument. Also fix up some comments in that area while at it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <david@fromorbit.com>	2011-07-08 14:34:42 +02:00
Christoph Hellwig	33b8f7c247	xfs: improve sync behaviour in the face of aggressive dirtying The following script from Wu Fengguang shows very bad behaviour in XFS when aggressively dirtying data during a sync on XFS, with sync times up to almost 10 times as long as ext4. A large part of the issue is that XFS writes data out itself two times in the ->sync_fs method, overriding the livelock protection in the core writeback code, and another issue is the lock-less xfs_ioend_wait call, which doesn't prevent new ioend from being queue up while waiting for the count to reach zero. This patch removes the XFS-internal sync calls and relies on the VFS to do it's work just like all other filesystems do. Note that the i_iocount wait which is rather suboptimal is simply removed here. We already do it in ->write_inode, which keeps the current supoptimal behaviour. We'll eventually need to remove that as well, but that's material for a separate commit. ------------------------------ snip ------------------------------ #!/bin/sh umount /dev/sda7 mkfs.xfs -f /dev/sda7 # mkfs.ext4 /dev/sda7 # mkfs.btrfs /dev/sda7 mount /dev/sda7 /fs echo $((50<<20)) > /proc/sys/vm/dirty_bytes pid= for i in `seq 10` do dd if=/dev/zero of=/fs/zero-$i bs=1M count=1000 & pid="$pid $!" done sleep 1 tic=$(date +'%s') sync tac=$(date +'%s') echo echo sync time: $((tac-tic)) egrep '(Dirty\|Writeback\|NFS_Unstable)' /proc/meminfo pidof dd > /dev/null && { kill -9 $pid; echo sync NOT livelocked; } ------------------------------ snip ------------------------------ Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Wu Fengguang <fengguang.wu@intel.com> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2011-07-08 14:34:39 +02:00
Christoph Hellwig	8f04c47aa9	xfs: split xfs_itruncate_finish Split the guts of xfs_itruncate_finish that loop over the existing extents and calls xfs_bunmapi on them into a new helper, xfs_itruncate_externs. Make xfs_attr_inactive call it directly instead of xfs_itruncate_finish, which allows to simplify the latter a lot, by only letting it deal with the data fork. As a result xfs_itruncate_finish is renamed to xfs_itruncate_data to make its use case more obvious. Also remove the sync parameter from xfs_itruncate_data, which has been unessecary since the introduction of the busy extent list in 2002, and completely dead code since 2003 when the XFS_BMAPI_ASYNC parameter was made a no-op. I can't actually see why the xfs_attr_inactive needs to set the transaction sync, but let's keep this patch simple and without changes in behaviour. Also avoid passing a useless argument to xfs_isize_check, and make it private to xfs_inode.c. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2011-07-08 14:34:34 +02:00
Christoph Hellwig	857b9778d8	xfs: kill xfs_itruncate_start xfs_itruncate_start is a rather length wrapper that evaluates to a call to xfs_ioend_wait and xfs_tosspages, and only has two callers. Instead of using the complicated checks left over from IRIX where we can to truncate the pagecache just call xfs_tosspages (aka truncate_inode_pages) directly as we want to get rid of all data after i_size, and truncate_inode_pages handles incorrect alignments and too large offsets just fine. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2011-07-08 14:34:30 +02:00
Christoph Hellwig	681b120018	xfs: always log timestamp updates in xfs_setattr_size Get rid of the special case where we use unlogged timestamp updates for a truncate to the current inode size, and just call xfs_setattr_nonsize for it to treat it like a utimes calls. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2011-07-08 14:34:26 +02:00
Christoph Hellwig	c4ed4243c4	xfs: split xfs_setattr Split up xfs_setattr into two functions, one for the complex truncate handling, and one for the trivial attribute updates. Also move both new routines to xfs_iops.c as they are fairly Linux-specific. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2011-07-08 14:34:23 +02:00
Christoph Hellwig	dec58f1dfd	xfs: work around bogus gcc warning in xfs_allocbt_init_cursor GCC 4.6 complains about an array subscript is above array bounds when using the btree index to index into the agf_levels array. The only two indices passed in are 0 and 1, and we have an assert insuring that. Replace the trick of using the array index directly with using constants in the already existing branch for assigning the XFS_BTREE_LASTREC_UPDATE flag. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2011-07-08 14:34:18 +02:00
Christoph Hellwig	dbcdde3e76	xfs: re-enable non-blocking behaviour in xfs_map_blocks The non-blockig behaviour in xfs_vm_writepage currently is conditional on having both the WB_SYNC_NONE sync_mode and the nonblocking flag set. The latter used to be used by both pdflush, kswapd and a few other places in older kernels, but has been fading out starting with the introduction of the per-bdi flusher threads. Enable the non-blocking behaviour for all WB_SYNC_NONE calls to get back the behaviour we want. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2011-07-08 14:34:14 +02:00
Christoph Hellwig	680a647b49	xfs: PF_FSTRANS should never be set in ->writepage Now that we reject direct reclaim in addition to always using GFP_NOFS allocation there's no chance we'll ever end up in ->writepage with PF_FSTRANS set. Add a WARN_ON if we hit this case, and stop checking if we'd actually need to start a transaction. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2011-07-08 14:34:05 +02:00
Anatolij Gustschin	19495f70d1	UBIFS: fix master node recovery When the 1st LEB was unmapped and written but 2nd LEB not, the master node recovery doesn't succeed after power cut. We see following error when mounting UBIFS partition on NOR flash: UBIFS error (pid 1137): ubifs_recover_master_node: failed to recover master node Correct 2nd master node offset check is needed to fix the problem. If the 2nd master node is at the end in the 2nd LEB, first master node is used for recovery. When checking for this condition we should check whether the master node is exactly at the end of the LEB (without remaining empty space) or whether it is followed by an empty space less than the master node size. Artem: when the error happened, offs2 = 261120, sz = 512, c->leb_size = 262016. Signed-off-by: Anatolij Gustschin <agust@denx.de> Signed-off-by: Artem Bityutskiy <dedekind1@gmail.com>	2011-07-08 06:53:18 +03:00
Jeff Layton	04db79b015	cifs: factor smb_vol allocation out of cifs_setup_volume_info Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Pavel Shilovsky <piastryyy@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-07-08 03:51:23 +00:00
David Howells	c902ce1bfb	FS-Cache: Add a helper to bulk uncache pages on an inode Add an FS-Cache helper to bulk uncache pages on an inode. This will only work for the circumstance where the pages in the cache correspond 1:1 with the pages attached to an inode's page cache. This is required for CIFS and NFS: When disabling inode cookie, we were returning the cookie and setting cifsi->fscache to NULL but failed to invalidate any previously mapped pages. This resulted in "Bad page state" errors and manifested in other kind of errors when running fsstress. Fix it by uncaching mapped pages when we disable the inode cookie. This patch should fix the following oops and "Bad page state" errors seen during fsstress testing. ------------[ cut here ]------------ kernel BUG at fs/cachefiles/namei.c:201! invalid opcode: 0000 [#1] SMP Pid: 5, comm: kworker/u:0 Not tainted 2.6.38.7-30.fc15.x86_64 #1 Bochs Bochs RIP: 0010: cachefiles_walk_to_object+0x436/0x745 [cachefiles] RSP: 0018:ffff88002ce6dd00 EFLAGS: 00010282 RAX: ffff88002ef165f0 RBX: ffff88001811f500 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000100 RDI: 0000000000000282 RBP: ffff88002ce6dda0 R08: 0000000000000100 R09: ffffffff81b3a300 R10: 0000ffff00066c0a R11: 0000000000000003 R12: ffff88002ae54840 R13: ffff88002ae54840 R14: ffff880029c29c00 R15: ffff88001811f4b0 FS: 00007f394dd32720(0000) GS:ffff88002ef00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00007fffcb62ddf8 CR3: 000000001825f000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process kworker/u:0 (pid: 5, threadinfo ffff88002ce6c000, task ffff88002ce55cc0) Stack: 0000000000000246 ffff88002ce55cc0 ffff88002ce6dd58 ffff88001815dc00 ffff8800185246c0 ffff88001811f618 ffff880029c29d18 ffff88001811f380 ffff88002ce6dd50 ffffffff814757e4 ffff88002ce6dda0 ffffffff8106ac56 Call Trace: cachefiles_lookup_object+0x78/0xd4 [cachefiles] fscache_lookup_object+0x131/0x16d [fscache] fscache_object_work_func+0x1bc/0x669 [fscache] process_one_work+0x186/0x298 worker_thread+0xda/0x15d kthread+0x84/0x8c kernel_thread_helper+0x4/0x10 RIP cachefiles_walk_to_object+0x436/0x745 [cachefiles] ---[ end trace 1d481c9af1804caa ]--- I tested the uncaching by the following means: (1) Create a big file on my NFS server (104857600 bytes). (2) Read the file into the cache with md5sum on the NFS client. Look in /proc/fs/fscache/stats: Pages : mrk=25601 unc=0 (3) Open the file for read/write ("bash 5<>/warthog/bigfile"). Look in proc again: Pages : mrk=25601 unc=25601 Reported-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-and-Tested-by: Suresh Jayaraman <sjayaraman@suse.de> cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-07 13:21:56 -07:00
Alexey Khoroshilov	dd7f3d5458	hfsplus: Add error propagation for hfsplus_ext_write_extent_locked Implement error propagation through the callers of hfsplus_ext_write_extent_locked(). Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Signed-off-by: Christoph Hellwig <hch@lst.de>	2011-07-07 17:45:46 +02:00
Alexey Khoroshilov	5bd9d99d10	hfsplus: add error checking for hfs_find_init() hfs_find_init() may fail with ENOMEM, but there are places, where the returned value is not checked. The consequences can be very unpleasant, e.g. kfree uninitialized pointer and inappropriate mutex unlocking. The patch adds checks for errors in hfs_find_init(). Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Signed-off-by: Christoph Hellwig <hch@lst.de>	2011-07-07 17:45:46 +02:00
Miao Xie	149e2d76b4	btrfs: fix oops when doing space balance We need to make sure the data relocation inode doesn't go through the delayed metadata updates, otherwise we get an oops during balance: kernel BUG at fs/btrfs/relocation.c:4303! [SNIP] Call Trace: [<ffffffffa03143fd>] ? update_ref_for_cow+0x22d/0x330 [btrfs] [<ffffffffa0314951>] __btrfs_cow_block+0x451/0x5e0 [btrfs] [<ffffffffa031355d>] ? read_block_for_search+0x14d/0x4d0 [btrfs] [<ffffffffa0314beb>] btrfs_cow_block+0x10b/0x240 [btrfs] [<ffffffffa031acae>] btrfs_search_slot+0x49e/0x7a0 [btrfs] [<ffffffffa032d8af>] btrfs_lookup_inode+0x2f/0xa0 [btrfs] [<ffffffff8147bf0e>] ? mutex_lock+0x1e/0x50 [<ffffffffa0380cf1>] btrfs_update_delayed_inode+0x71/0x160 [btrfs] [<ffffffffa037ff27>] ? __btrfs_release_delayed_node+0x67/0x190 [btrfs] [<ffffffffa0381cf8>] btrfs_run_delayed_items+0xe8/0x120 [btrfs] [<ffffffffa03365e0>] btrfs_commit_transaction+0x250/0x850 [btrfs] [<ffffffff810f91d9>] ? find_get_pages+0x39/0x130 [<ffffffffa0336cd5>] ? join_transaction+0x25/0x250 [btrfs] [<ffffffff81081de0>] ? wake_up_bit+0x40/0x40 [<ffffffffa03785fa>] prepare_to_relocate+0xda/0xf0 [btrfs] [<ffffffffa037f2bb>] relocate_block_group+0x4b/0x620 [btrfs] [<ffffffffa0334cf5>] ? btrfs_clean_old_snapshots+0x35/0x150 [btrfs] [<ffffffffa037fa43>] btrfs_relocate_block_group+0x1b3/0x2e0 [btrfs] [<ffffffffa0368ec0>] ? btrfs_tree_unlock+0x50/0x50 [btrfs] [<ffffffffa035e39b>] btrfs_relocate_chunk+0x8b/0x670 [btrfs] [<ffffffffa031303d>] ? btrfs_set_path_blocking+0x3d/0x50 [btrfs] [<ffffffffa03577d8>] ? read_extent_buffer+0xd8/0x1d0 [btrfs] [<ffffffffa031bea1>] ? btrfs_previous_item+0xb1/0x150 [btrfs] [<ffffffffa03577d8>] ? read_extent_buffer+0xd8/0x1d0 [btrfs] [<ffffffffa035f5aa>] btrfs_balance+0x21a/0x2b0 [btrfs] [<ffffffffa0368898>] btrfs_ioctl+0x798/0xd20 [btrfs] [<ffffffff8111e358>] ? handle_mm_fault+0x148/0x270 [<ffffffff814809e8>] ? do_page_fault+0x1d8/0x4b0 [<ffffffff81160d6a>] do_vfs_ioctl+0x9a/0x540 [<ffffffff811612b1>] sys_ioctl+0xa1/0xb0 [<ffffffff81484ec2>] system_call_fastpath+0x16/0x1b [SNIP] RIP [<ffffffffa037c1cc>] btrfs_reloc_cow_block+0x22c/0x270 [btrfs] Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-07-06 18:51:53 -04:00
Josef Bacik	508794eb5e	Btrfs: don't panic if we get an error while balancing V2 A user reported an error where if we try to balance an fs after a device has been removed it will blow up. This is because we get an EIO back and this is where BUG_ON(ret) bites us in the ass. To fix we just exit. Thanks, Reported-by: Anand Jain <Anand.Jain@oracle.com> Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-07-06 18:46:43 -04:00
David Sterba	0942caa373	btrfs: add missing options displayed in mount output There are three missed mount options settable by user which are not currently displayed in mount output. Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-07-06 18:46:43 -04:00
Masatake YAMATO	bcaadf5c1a	dlm: dump address of unknown node When the dlm fails to make a network connection to another node, include the address of the node in the error message. Signed-off-by: Masatake YAMATO <yamato@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2011-07-06 16:37:23 -05:00
Dave Chinner	1316d4da3f	xfs: unpin stale inodes directly in IOP_COMMITTED When inodes are marked stale in a transaction, they are treated specially when the inode log item is being inserted into the AIL. It tries to avoid moving the log item forward in the AIL due to a race condition with the writing the underlying buffer back to disk. The was "fixed" in commit `de25c18` ("xfs: avoid moving stale inodes in the AIL"). To avoid moving the item forward, we return a LSN smaller than the commit_lsn of the completing transaction, thereby trying to trick the commit code into not moving the inode forward at all. I'm not sure this ever worked as intended - it assumes the inode is already in the AIL, but I don't think the returned LSN would have been small enough to prevent moving the inode. It appears that the reason it worked is that the lower LSN of the inodes meant they were inserted into the AIL and flushed before the inode buffer (which was moved to the commit_lsn of the transaction). The big problem is that with delayed logging, the returning of the different LSN means insertion takes the slow, non-bulk path. Worse yet is that insertion is to a position -before- the commit_lsn so it is doing a AIL traversal on every insertion, and has to walk over all the items that have already been inserted into the AIL. It's expensive. To compound the matter further, with delayed logging inodes are likely to go from clean to stale in a single checkpoint, which means they aren't even in the AIL at all when we come across them at AIL insertion time. Hence these were all getting inserted into the AIL when they simply do not need to be as inodes marked XFS_ISTALE are never written back. Transactional/recovery integrity is maintained in this case by the other items in the unlink transaction that were modified (e.g. the AGI btree blocks) and committed in the same checkpoint. So to fix this, simply unpin the stale inodes directly in xfs_inode_item_committed() and return -1 to indicate that the AIL insertion code does not need to do any further processing of these inodes. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-07-06 15:44:40 -05:00
Jeff Layton	f9e59bcba2	cifs: have cifs_cleanup_volume_info not take a double pointer ...as that makes for a cumbersome interface. Make it take a regular smb_vol pointer and rely on the caller to zero it out if needed. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Pavel Shilovsky <piastryyy@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-07-06 20:03:05 +00:00
Jeff Layton	b2a0fa1520	cifs: fix build_unc_path_to_root to account for a prefixpath Regression introduced by commit `f87d39d951`. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Pavel Shilovsky <piastryyy@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-07-06 20:03:05 +00:00
Jeff Layton	677d8537d8	cifs: remove bogus call to cifs_cleanup_volume_info This call to cifs_cleanup_volume_info is clearly wrong. As soon as it's called the following call to cifs_get_tcp_session will oops as the volume_info pointer will then be NULL. The caller of cifs_mount should clean up this data since it passed it in. There's no need for us to call this here. Regression introduced by commit `724d9f1cfb`. Reported-by: Adam Williamson <awilliam@redhat.com> Cc: Pavel Shilovsky <piastryyy@gmail.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-07-06 20:03:04 +00:00
Davidlohr Bueso	bcb65a797e	FDPIC: Fix memory leak The shdr4extnum variable isn't being freed in the cleanup process of elf_fdpic_core_dump(). Signed-off-by: Davidlohr Bueso <dave@gnu.org> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-06 12:15:16 -07:00
Miklos Szeredi	a51cb91d81	fs: fix lock initialization locks_alloc_lock() assumed that the allocated struct file_lock is already initialized to zero members. This is only true for the first allocation of the structure, after reuse some of the members will have random values. This will for example result in passing random fl_start values to userspace in fuse for FL_FLOCK locks, which is an information leak at best. Fix by reinitializing those members which may be non-zero after freeing. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-06 10:41:13 -07:00
Linus Torvalds	121782a248	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: fix sync and dio writes across stripe boundaries libceph: fix page calculation for non-page-aligned io ceph: fix page alignment corrections	2011-07-05 13:15:57 -07:00
Linus Torvalds	a8728d3554	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/hfsplus * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/hfsplus: hfsplus: Fix double iput of the same inode in hfsplus_fill_super() hfsplus: add missing call to bio_put()	2011-07-05 10:04:27 -07:00
Artem Bityutskiy	a7fa94a9fe	UBIFS: improve power cut emulation testing This patch cleans-up and improves the power cut testing: 1. Kill custom 'simple_random()' function and use 'random32()' instead. 2. Make timeout larger 3. When cutting the buffer - fill the end with random data sometimes, not only with 0xFFs. 4. Some times cut in the middle of the buffer, not always at the end. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-07-04 10:54:34 +03:00
Artem Bityutskiy	d27462a518	UBIFS: rename recovery testing variables Since the recovery testing is effectively about emulating power cuts by UBIFS, use "power cut" as the base term for all the related variables and name them correspondingly. This is just a minor clean-up for the sake of readability. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-07-04 10:54:34 +03:00
Artem Bityutskiy	f57cb188cc	UBIFS: remove custom list of superblocks This is a clean-up of the power-cut emulation code - remove the custom list of superblocks which we maintained to find the superblock by the UBI volume descriptor. We do not need that crud any longer, because now we can get the superblock as a function argument. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-07-04 10:54:33 +03:00
Artem Bityutskiy	0a541b14e8	UBIFS: stop re-defining UBI operations Now when we use UBIFS helpers for all the I/O, we can remove the horrible hack of re-defining UBI I/O functions. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-07-04 10:54:33 +03:00
Artem Bityutskiy	d3b2578f56	UBIFS: switch to I/O helpers Switch the rest of direct UBI calls to UBIFS helper functions. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-07-04 10:54:33 +03:00
Artem Bityutskiy	987226a5d3	UBIFS: switch to ubifs_leb_write Stop using 'ubi_leb_write()' directly and switch to the 'ubifs_leb_write()' helper. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-07-04 10:54:33 +03:00
Artem Bityutskiy	d304820a1f	UBIFS: switch to ubifs_leb_read Instead of using 'ubi_read()' function directly, used the 'ubifs_leb_read()' helper function instead. This allows to get rid of several redundant error messages and make sure that we always have a stack dump on read errors. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-07-04 10:54:33 +03:00
Artem Bityutskiy	83cef708c6	UBIFS: introduce more I/O helpers Introduce the following I/O helper functions: 'ubifs_leb_read()', 'ubifs_leb_write()', 'ubifs_leb_change()', 'ubifs_leb_unmap()', 'ubifs_leb_map()', 'ubifs_is_mapped(). The idea is to wrap all UBI I/O functions in order to encapsulate various assertions and error path handling (error message, stack dump, switching to R/O mode). And there are some other benefits of this which will be used in the following patches. This patch does not switch whole UBIFS to use these functions yet. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-07-04 10:54:33 +03:00
Artem Bityutskiy	d033c98b17	UBIFS: always print stacktrace when switching to R/O mode When switching to R/O mode due to an I/O error, always dump the stack, not only when debugging is enabled. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-07-04 10:54:33 +03:00
Artem Bityutskiy	891a54a153	UBIFS: remove unused and unneeded debugging function This patch contains several minor clean-up and preparational cahnges. 1. Remove 'dbg_read()', 'dbg_write()', 'dbg_change()', and 'dbg_leb_erase()' functions as they are not used. 2. Remove 'dbg_leb_read()' and 'dbg_is_mapped()' as they are not really needed, it is fine to let reads go through in failure mode. 3. Rename 'offset' argument to 'offs' to be consistent with the rest of UBIFS code. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-07-04 10:54:32 +03:00
Artem Bityutskiy	e7717060dd	UBIFS: add global debugfs knobs Now we have per-FS (superblock) debugfs knobs, but they have one drawback - you have to first mount the FS and only after this you can switch self-checks on/off. But often we want to have the checks enabled during the mount. Introduce global debugging knobs for this purpose. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-07-04 10:54:32 +03:00
Artem Bityutskiy	28488fc28a	UBIFS: introduce debugfs helpers Separate out pieces of code from the debugfs file read/write functions and create separate 'interpret_user_input()'/'provide_user_output()' helpers. These helpers will be needed in one of the following patches, so this is just a preparational change. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-07-04 10:54:32 +03:00
Artem Bityutskiy	7dae997de6	UBIFS: re-arrange debugging code a bit Move 'dbg_debugfs_init()' and 'dbg_debugfs_exit()' functions which initialize debugfs for whole UBIFS subsystem below the code which initializes debugfs for a particular UBIFS instance. And do the same for 'ubifs_debugging_init()' and 'ubifs_debugging_exit()' functions. This layout is a bit better for the next patches, so this is just a preparation. Also, rename 'open_debugfs_file()' into 'dfs_file_open()' for consistency. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-07-04 10:54:32 +03:00
Artem Bityutskiy	24a4f8009e	UBIFS: be more informative in failure mode When we are testing UBIFS recovery, it is better to print in which eraseblock we are going to fail. Currently UBIFS prints it only if recovery debugging messages are enabled, but this is not very practical. So change 'dbg_rcvry()' messages to 'ubifs_warn()' messages. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-07-04 10:54:30 +03:00
Artem Bityutskiy	81e79d38df	UBIFS: switch self-check knobs to debugfs UBIFS has many built-in self-check functions which can be enabled using the debug_chks module parameter or the corresponding sysfs file (/sys/module/ubifs/parameters/debug_chks). However, this is not flexible enough because it is not per-filesystem. This patch moves this to debugfs interfaces. We already have debugfs support, so this patch just adds more debugfs files. While looking at debugfs support I've noticed that it is racy WRT file-system unmount, and added a TODO entry for that. This problem has been there for long time and it is quite standard debugfs PITA. The plan is to fix this later. This patch is simple, but it is large because it changes many places where we check if a particular type of checks is enabled or disabled. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-07-04 10:54:28 +03:00
Artem Bityutskiy	8d7819b4af	UBIFS: lessen amount of debugging check types We have too many different debugging checks - lessen the amount by merging all index-related checks into one. At the same time, move the "force in-the-gap" test to the "index checks" class, because it is too heavy for the "general" class. This patch merges TNC, Old index, and Index size check and calles this just "index checks". Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-07-04 10:54:28 +03:00
Artem Bityutskiy	2b1844a8c9	UBIFS: introduce helper functions for debugging checks and tests This patch introduces helper functions for all debugging checks, so instead of doing if (!(ubifs_chk_flags & UBIFS_CHK_GEN)) we now do if (!dbg_is_chk_gen(c)) This is a preparation to further changes where the flags will go away, and we'll need to only change the helper functions, but the code which utilizes them won't be touched. At the same time this patch removes 'dbg_force_in_the_gaps()', 'dbg_force_in_the_gaps_enabled()', and dbg_failure_mode helpers for consistency. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-07-04 10:54:28 +03:00
Artem Bityutskiy	d808efb407	UBIFS: amend debugging inode size check function prototype Add 'const struct ubifs_info *c' parameter to 'dbg_check_synced_i_size()' function because we'll need it in the next patch when we switch to debugfs. So this patch is just a preparation. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-07-04 10:54:27 +03:00
Artem Bityutskiy	bb2615d4d1	UBIFS: amend debugging name check function prototype Add 'struct ubifs_info *c' parameter to the 'dbg_check_name()' debugging function - it will be needed in one of the following commits where we switch to debugfs. So this is just a preparation. Mark parameters as 'const' while on it. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-07-04 10:54:27 +03:00
Artem Bityutskiy	06b282a4cc	UBIFS: add few commentaries about TNC Add a couple of comments - while looking into TNC I could not easily figure out few facts, so it is a good idea to document them in the code. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-07-04 10:54:27 +03:00
Artem Bityutskiy	3766244769	UBIFS: use correct flags in lprops The UBIFS lpt tree is in many aspects similar to the TNC tree, and we have similar flags for these trees. And by mistake we use the COW_ZNODE flag for LPT in some places, instead of the right flag COW_CNODE. And this works only because these two constants have the same value. This patch makes all the LPT code to use COW_CNODE and also changes COW_CNODE constant value to make sure we do not misuse the flags any more. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-07-04 10:54:27 +03:00
Artem Bityutskiy	f42eed7cba	UBIFS: harmonize znode flag helpers We have 3 znode flags: cow, obsolete, dirty. For the last flag we have a 'ubifs_zn_dirty()' helper function, but for the other 2 flags we use 'test_bit()' directly. This patch makes the situation more consistent and introduces helpers for the other 2 flags: 'ubifs_zn_cow()' and 'ubifs_zn_obsolete()'. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-07-04 10:54:27 +03:00
Artem Bityutskiy	1f42596ec0	UBIFS: remove dead code Remove dead pieces of code under "if (c->min_io_size == 1)" statement - we never execute it because in UBIFS 'c->min_io_size' is always at least 8. This are leftovers from old pre-mainline prototype. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-07-04 10:54:27 +03:00
Artem Bityutskiy	12e776a088	UBIFS: remove unnecessary brackets Remove unnecessary brackets in "inode->i_flags \|= (S_NOCMTIME)" statement to make the code not look silly. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-07-04 10:54:27 +03:00
Artem Bityutskiy	a29fa9dfa4	UBIFS: minor cleanup: use S_ISREG helper Instead of using long "(inode->i_mode & S_IFMT) != S_IFREG" expression, use shorted "!S_ISREG(inode->i_mode)". Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-07-04 10:54:26 +03:00
Artem Bityutskiy	1b51e98365	UBIFS: rename dbg_check_dir_size function Since this function is not only about size checking, rename it to 'dbg_check_dir()'. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-07-04 10:54:26 +03:00
Artem Bityutskiy	4315fb4072	UBIFS: improve inode dumping function Teach 'dbg_dump_inode()' dump directory entries for directory inodes. This requires few additional changes: 1. The 'c' argument of 'dbg_dump_inode()' cannot be const any more. 2. Users of 'dbg_dump_inode()' should not have 'tnc_mutex' locked. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-07-04 10:54:26 +03:00
Artem Bityutskiy	bfcf677dec	UBIFS: dump stack when pnode or nnode reading fails When we fail to read a pnode or nnode - print stacktrace if debugging is enabled. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-07-04 10:54:26 +03:00
Artem Bityutskiy	ae380ce047	UBIFS: lessen the size of debugging info data structure This patch lessens the 'struct ubifs_debug_info' size by 90 bytes by allocating less bytes for the debugfs root directory name. It introduces macros for the name patter an length instead of hard-coding 100 bytes. It also makes UBIFS use 'snprintf()' and teaches it to gracefully catch situations when the name array is too short. Additionally, this patch makes 2 unrelated changes - I just thought they do not deserve separate commits: simplifies 'ubifs_assert()' for non-debugging case and makes 'dbg_debugfs_init()' properly verify debugfs return code which may be an error code or NULL, so we should you 'IS_ERR_OR_NULL()' instead of 'IS_ERR()'. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-07-04 10:54:26 +03:00
Artem Bityutskiy	549c999a76	UBIFS: return EROFS in case of broken commit If commit failed and it is in broken state, UBIFS switches to R/O mode. Most operations return -EROFS in this case, except of commit which returns -EINVAL. Make it return -EROFS too for consistency. This is also important for our power cut emulation testing. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-07-04 10:54:26 +03:00
Bryn M. Reeves	c282af4990	dlm: use vmalloc for hash tables Allocate dlm hash tables in the vmalloc area to allow a greater maximum size without restructuring of the hash table code. Signed-off-by: Bryn M. Reeves <bmr@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2011-07-01 15:49:23 -05:00
Johannes Stezenbach	390192b300	compat_ioctl: fix warning caused by qemu On Linux x86_64 host with 32bit userspace, running qemu or even just "qemu-img create -f qcow2 some.img 1G" causes a kernel warning: ioctl32(qemu-img:5296): Unknown cmd fd(3) cmd(00005326){t:'S';sz:0} arg(7fffffff) on some.img ioctl32(qemu-img:5296): Unknown cmd fd(3) cmd(801c0204){t:02;sz:28} arg(fff77350) on some.img ioctl 00005326 is CDROM_DRIVE_STATUS, ioctl 801c0204 is FDGETPRM. The warning appears because the Linux compat-ioctl handler for these ioctls only applies to block devices, while qemu also uses the ioctls on plain files. Signed-off-by: Johannes Stezenbach <js@sig21.net> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-07-01 22:32:26 +02:00
Denys Vlasenko	bb188d7e64	ptrace: make former thread ID available via PTRACE_GETEVENTMSG after PTRACE_EVENT_EXEC stop When multithreaded program execs under ptrace, all traced threads report WIFEXITED status, except for thread group leader and the thread which execs. Unless tracer tracks thread group relationship between tracees, which is a nontrivial task, it will not detect that execed thread no longer exists. This patch allows tracer to figure out which thread performed this exec, by requesting PTRACE_GETEVENTMSG in PTRACE_EVENT_EXEC stop. Another, samller problem which is solved by this patch is that tracer now can figure out which of the several concurrent execs in multithreaded program succeeded. Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com>	2011-07-01 18:51:49 +02:00
Jeff Layton	ee1b3ea9e6	cifs: set socket send and receive timeouts before attempting connect Benjamin S. reported that he was unable to suspend his machine while it had a cifs share mounted. The freezer caused this to spew when he tried it: -----------------------[snip]------------------ PM: Syncing filesystems ... done. Freezing user space processes ... (elapsed 0.01 seconds) done. Freezing remaining freezable tasks ... Freezing of tasks failed after 20.01 seconds (1 tasks refusing to freeze, wq_busy=0): cifsd S ffff880127f7b1b0 0 1821 2 0x00800000 ffff880127f7b1b0 0000000000000046 ffff88005fe008a8 ffff8800ffffffff ffff880127cee6b0 0000000000011100 ffff880127737fd8 0000000000004000 ffff880127737fd8 0000000000011100 ffff880127f7b1b0 ffff880127736010 Call Trace: [<ffffffff811e85dd>] ? sk_reset_timer+0xf/0x19 [<ffffffff8122cf3f>] ? tcp_connect+0x43c/0x445 [<ffffffff8123374e>] ? tcp_v4_connect+0x40d/0x47f [<ffffffff8126ce41>] ? schedule_timeout+0x21/0x1ad [<ffffffff8126e358>] ? _raw_spin_lock_bh+0x9/0x1f [<ffffffff811e81c7>] ? release_sock+0x19/0xef [<ffffffff8123e8be>] ? inet_stream_connect+0x14c/0x24a [<ffffffff8104485b>] ? autoremove_wake_function+0x0/0x2a [<ffffffffa02ccfe2>] ? ipv4_connect+0x39c/0x3b5 [cifs] [<ffffffffa02cd7b7>] ? cifs_reconnect+0x1fc/0x28a [cifs] [<ffffffffa02cdbdc>] ? cifs_demultiplex_thread+0x397/0xb9f [cifs] [<ffffffff81076afc>] ? perf_event_exit_task+0xb9/0x1bf [<ffffffffa02cd845>] ? cifs_demultiplex_thread+0x0/0xb9f [cifs] [<ffffffffa02cd845>] ? cifs_demultiplex_thread+0x0/0xb9f [cifs] [<ffffffff810444a1>] ? kthread+0x7a/0x82 [<ffffffff81002d14>] ? kernel_thread_helper+0x4/0x10 [<ffffffff81044427>] ? kthread+0x0/0x82 [<ffffffff81002d10>] ? kernel_thread_helper+0x0/0x10 Restarting tasks ... done. -----------------------[snip]------------------ We do attempt to perform a try_to_freeze in cifs_reconnect, but the connection attempt itself seems to be taking longer than 20s to time out. The connect timeout is governed by the socket send and receive timeouts, so we can shorten that period by setting those timeouts before attempting the connect instead of after. Adam Williamson tested the patch and said that it seems to have fixed suspending on his laptop when a cifs share is mounted. Reported-by: Benjamin S <da_joind@gmx.net> Tested-by: Adam Williamson <awilliam@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-07-01 16:15:30 +00:00
Tejun Heo	85ef06d1d2	block: flush MEDIA_CHANGE from drivers on close(2) Currently, only open(2) is defined as the 'clearing' point. It has two roles - first, it's an acknowledgement from userland indicating that the event has been received and kernel can clear pending states and proceed to generate more events. Secondly, it's passed on to device drivers as a hint indicating that a synchronization point has been reached and it might want to take a deeper look at the device. The latter currently is only used by sr which uses two different mechanisms - GET_EVENT_MEDIA_STATUS_NOTIFICATION and TEST_UNIT_READY to discover events, where the former is lighter weight and safe to be used repeatedly but may not provide full coverage. Among other things, GET_EVENT can't detect media removal while TUR can. This patch makes close(2) - blkdev_put() - indicate clearing hint for MEDIA_CHANGE to drivers. disk_check_events() is renamed to disk_flush_events() and updated to take @mask for events to flush which is or'd to ev->clearing and will be passed to the driver on the next ->check_events() invocation. This change makes sr generate MEDIA_CHANGE when media is ejected from userland - e.g. with eject(1). Note: Given the current usage, it seems @clearing hint is needlessly complex. disk_clear_events() can simply clear all events and the hint can be boolean @flush. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-07-01 16:17:47 +02:00
Jens Axboe	04bf7869ca	Merge branch 'for-linus' into for-3.1/core Conflicts: block/blk-throttle.c block/cfq-iosched.c Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-07-01 16:17:13 +02:00
Masatake YAMATO	55b3286d3d	dlm: show addresses in configfs Display all addresses the dlm is using for the local node from the configfs file config/dlm/<cluster>/comms/<comm>/addr_list Also make the addr file write only. Signed-off-by: Masatake YAMATO <yamato@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2011-06-30 14:45:28 -05:00
Christoph Hellwig	c6d5f5fa65	hfsplus: lift the 2TB size limit Replace the hardcoded 2TB limit with a dynamic limit based on the block size now that we have fixed the few overflows preventing operation with large volumes. Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2011-06-30 13:40:59 +02:00
Christoph Hellwig	4ba2d5fdcf	hfsplus: fix overflow in hfsplus_read_wrapper For partitions larger than 2TB or at such an offset the hfs wrapper code in hfsplus might overflow the range representable in a 32-bit data type. Make sure we use a sector_t for the arithmetics leading to it. I'm not sure this code can be readed at all as hfs itself never supported such large volumes. Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2011-06-30 13:40:59 +02:00
Christoph Hellwig	bf1a1b31fa	hfsplus: fix overflow in hfsplus_get_block For filesystems larger than 2TB the final sector number passed to map_bh might overflow the range representable in a 32-bit data type. Make sure we use a sector_t for it and the arithmetics calculating it. Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2011-06-30 13:40:58 +02:00
Anton Salikhmetov	2b4f9ca8a5	hfsplus: assignments inside `if' condition clean-up Make assignments outside `if' conditions for unicode.c module where the checkpatch.pl script reported this coding style error. Signed-off-by: Anton Salikhmetov <alexo@tuxera.com> Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2011-06-30 13:40:58 +02:00
Alexey Khoroshilov	032016a56a	hfsplus: Fix double iput of the same inode in hfsplus_fill_super() There is a misprint in resource deallocation code on error path in hfsplus_fill_super(): the sbi->alloc_file inode is iput twice, while the root inode in not iput at all. Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Signed-off-by: Christoph Hellwig <hch@lst.de>	2011-06-30 13:38:39 +02:00
Seth Forshee	50176ddefa	hfsplus: add missing call to bio_put() hfsplus leaks bio objects by failing to call bio_put() on the bios it allocates. Add the missing call to fix the leak. Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Cc: <stable@kernel.org> # .38.x, .39.x Signed-off-by: Christoph Hellwig <hch@lst.de>	2011-06-30 13:28:32 +02:00
Boaz Harrosh	2bea038c52	pnfs: write: Set mds_offset in the generic layer - it is needed by all LDs In current pnfs tree, all the layouts set mds_offset in their .write_pagelist member. mds_offset is only used by generic layer and should be handled by it. This patch is for upstream. It is needed in this -rc series to fix a bug in objects layout_commit. I'll send patches for objects and blocks to be squashed into current pnfs tree. TODO: It looks like the read path needs the same patch. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-06-28 14:12:11 -04:00
Vasiliy Kulikov	1d1221f375	proc: restrict access to /proc/PID/io /proc/PID/io may be used for gathering private information. E.g. for openssh and vsftpd daemons wchars/rchars may be used to learn the precise password length. Restrict it to processes being able to ptrace the target process. ptrace_may_access() is needed to prevent keeping open file descriptor of "io" file, executing setuid binary and gathering io information of the setuid'ed process. Signed-off-by: Vasiliy Kulikov <segoon@openwall.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-06-28 09:39:11 -07:00
Jan Kara	08142579b6	mm: fix assertion mapping->nrpages == 0 in end_writeback() Under heavy memory and filesystem load, users observe the assertion mapping->nrpages == 0 in end_writeback() trigger. This can be caused by page reclaim reclaiming the last page from a mapping in the following race: CPU0 CPU1 ... shrink_page_list() __remove_mapping() __delete_from_page_cache() radix_tree_delete() evict_inode() truncate_inode_pages() truncate_inode_pages_range() pagevec_lookup() - finds nothing end_writeback() mapping->nrpages != 0 -> BUG page->mapping = NULL mapping->nrpages-- Fix the problem by doing a reliable check of mapping->nrpages under mapping->tree_lock in end_writeback(). Analyzed by Jay <jinshan.xiong@whamcloud.com>, lost in LKML, and dug out by Miklos Szeredi <mszeredi@suse.de>. Cc: Jay <jinshan.xiong@whamcloud.com> Cc: Miklos Szeredi <mszeredi@suse.de> Signed-off-by: Jan Kara <jack@suse.cz> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-06-27 18:00:13 -07:00
Bob Liu	2b4b2482e7	romfs: fix romfs_get_unmapped_area() argument check romfs_get_unmapped_area() checks argument `len' without considering PAGE_ALIGN which will cause do_mmap_pgoff() return -EINVAL error after commit `f67d9b1576` ("nommu: add page_align to mmap"). Fix the check by changing it in same way ramfs_nommu_get_unmapped_area() was changed in ramfs/file-nommu.c. Signed-off-by: Bob Liu <lliubbo@gmail.com> Cc: David Howells <dhowells@redhat.com> Cc: Paul Mundt <lethal@linux-sh.org> Acked-by: Greg Ungerer <gerg@snapgear.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-06-27 18:00:12 -07:00
Tao Ma	a212d1a71d	jbd: Use WRITE_SYNC in journal checkpoint. In journal checkpoint, we write the buffer and wait for its finish. But in cfq, the async queue has a very low priority, and in our test, if there are too many sync queues and every queue is filled up with requests, and the process will hang waiting for the log space. So this patch tries to use WRITE_SYNC in __flush_batch so that the request will be moved into sync queue and handled by cfq timely. We also use the new plug, sot that all the WRITE_SYNC requests can be given as a whole when we unplug it. Reported-by: Robin Dong <sanbai@taobao.com> Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: Jan Kara <jack@suse.cz>	2011-06-28 00:06:41 +02:00
Linus Torvalds	af4087e0e6	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: btrfs: fix inconsonant inode information Btrfs: make sure to update total_bitmaps when freeing cache V3 Btrfs: fix type mismatch in find_free_extent() Btrfs: make sure to record the transid in new inodes	2011-06-27 13:32:14 -07:00
Oleg Nesterov	087806b128	redefine thread_group_leader() as exit_signal >= 0 Change de_thread() to set old_leader->exit_signal = -1. This is good for the consistency, it is no longer the leader and all sub-threads have exit_signal = -1 set by copy_process(CLONE_THREAD). And this allows us to micro-optimize thread_group_leader(), it can simply check exit_signal >= 0. This also makes sense because we should move ->group_leader from task_struct to signal_struct. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Tejun Heo <tj@kernel.org>	2011-06-27 20:30:10 +02:00
Linus Torvalds	4699d4423c	Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs * 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: prevent bogus assert when trying to remove non-existent attribute xfs: clear XFS_IDIRTY_RELEASE on truncate down xfs: reset inode per-lifetime state when recycling it	2011-06-27 09:01:29 -07:00
Miao Xie	2f7e33d432	btrfs: fix inconsonant inode information When iputting the inode, We may leave the delayed nodes if they have some delayed items that have not been dealt with. So when the inode is read again, we must look up the relative delayed node, and use the information in it to initialize the inode. Or we will get inconsonant inode information, it may cause that the same directory index number is allocated again, and hit the following oops: [ 5447.554187] err add delayed dir index item(name: pglog_0.965_0) into the insertion tree of the delayed node(root id: 262, inode id: 258, errno: -17) [ 5447.569766] ------------[ cut here ]------------ [ 5447.575361] kernel BUG at fs/btrfs/delayed-inode.c:1301! [SNIP] [ 5447.790721] Call Trace: [ 5447.793191] [<ffffffffa0641c4e>] btrfs_insert_dir_item+0x189/0x1bb [btrfs] [ 5447.800156] [<ffffffffa0651a45>] btrfs_add_link+0x12b/0x191 [btrfs] [ 5447.806517] [<ffffffffa0651adc>] btrfs_add_nondir+0x31/0x58 [btrfs] [ 5447.812876] [<ffffffffa0651d6a>] btrfs_create+0xf9/0x197 [btrfs] [ 5447.818961] [<ffffffff8111f840>] vfs_create+0x72/0x92 [ 5447.824090] [<ffffffff8111fa8c>] do_last+0x22c/0x40b [ 5447.829133] [<ffffffff8112076a>] path_openat+0xc0/0x2ef [ 5447.834438] [<ffffffff810c58e2>] ? __perf_event_task_sched_out+0x24/0x44 [ 5447.841216] [<ffffffff8103ecdd>] ? perf_event_task_sched_out+0x59/0x67 [ 5447.847846] [<ffffffff81121a79>] do_filp_open+0x3d/0x87 [ 5447.853156] [<ffffffff811e126c>] ? strncpy_from_user+0x43/0x4d [ 5447.859072] [<ffffffff8111f1f5>] ? getname_flags+0x2e/0x80 [ 5447.864636] [<ffffffff8111f179>] ? do_getname+0x14b/0x173 [ 5447.870112] [<ffffffff8111f1b7>] ? audit_getname+0x16/0x26 [ 5447.875682] [<ffffffff8112b1ab>] ? spin_lock+0xe/0x10 [ 5447.880882] [<ffffffff81112d39>] do_sys_open+0x69/0xae [ 5447.886153] [<ffffffff81112db1>] sys_open+0x20/0x22 [ 5447.891114] [<ffffffff813b9aab>] system_call_fastpath+0x16/0x1b Fix it by reusing the old delayed node. Reported-by: Jim Schutt <jaschut@sandia.gov> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Tested-by: Jim Schutt <jaschut@sandia.gov> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-06-27 11:34:27 -04:00
Jan Kara	bb189247f3	jbd: Fix oops in journal_remove_journal_head() journal_remove_journal_head() can oops when trying to access journal_head returned by bh2jh(). This is caused for example by the following race: TASK1 TASK2 journal_commit_transaction() ... processing t_forget list __journal_refile_buffer(jh); if (!jh->b_transaction) { jbd_unlock_bh_state(bh); journal_try_to_free_buffers() journal_grab_journal_head(bh) jbd_lock_bh_state(bh) __journal_try_to_free_buffer() journal_put_journal_head(jh) journal_remove_journal_head(bh); journal_put_journal_head() in TASK2 sees that b_jcount == 0 and buffer is not part of any transaction and thus frees journal_head before TASK1 gets to doing so. Note that even buffer_head can be released by try_to_free_buffers() after journal_put_journal_head() which adds even larger opportunity for oops (but I didn't see this happen in reality). Fix the problem by making transactions hold their own journal_head reference (in b_jcount). That way we don't have to remove journal_head explicitely via journal_remove_journal_head() and instead just remove journal_head when b_jcount drops to zero. The result of this is that [__]journal_refile_buffer(), [__]journal_unfile_buffer(), and __journal_remove_checkpoint() can free journal_head which needs modification of a few callers. Also we have to be careful because once journal_head is removed, buffer_head might be freed as well. So we have to get our own buffer_head reference where it matters. Signed-off-by: Jan Kara <jack@suse.cz>	2011-06-27 11:44:37 +02:00
Linus Torvalds	258e43fdb0	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: cifs: mark CONFIG_CIFS_NFSD_EXPORT as BROKEN cifs: free blkcipher in smbhash	2011-06-26 19:40:31 -07:00
Linus Torvalds	804a007f54	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: cifs: propagate errors from cifs_get_root() to mount(2) cifs: tidy cifs_do_mount() up a bit cifs: more breakage on mount failures cifs: close sget() races cifs: pull freeing mountdata/dropping nls/freeing cifs_sb into cifs_umount() cifs: move cifs_umount() call into ->kill_sb() cifs: pull cifs_mount() call up sanitize cifs_umount() prototype cifs: initialize ->tlink_tree in cifs_setup_cifs_sb() cifs: allocate mountdata earlier cifs: leak on mount if we share superblock cifs: don't pass superblock to cifs_mount() cifs: don't leak nls on mount failure cifs: double free on mount failure take bdi setup/destruction into cifs_mount/cifs_umount Acked-by: Steve French <smfrench@gmail.com>	2011-06-26 19:39:22 -07:00
Lukas Czerner	2c2ea9451f	ext3: Return -EINVAL when start is beyond the end of fs in ext3_trim_fs() We should return -EINVAL when the FITRIM parameters are not sane, but currently we are exiting silently if start is beyond the end of the file system. This commit fixes this so we return -EINVAL as other file systems do. Signed-off-by: Lukas Czerner <lczerner@redhat.com> CC: Jan Kara <jack@suse.cz> Signed-off-by: Jan Kara <jack@suse.cz>	2011-06-25 17:29:53 +02:00
H Hartley Sweeten	81fe8c62fe	ext3/ioctl.c: silence sparse warnings about different address spaces The 'from' argument for copy_from_user and the 'to' argument for copy_to_user should both be tagged as __user address space. Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Signed-off-by: Jan Kara <jack@suse.cz>	2011-06-25 17:29:53 +02:00
Jan Kara	ee3e77f180	ext3: Improve truncate error handling New truncate calling convention allows us to handle errors from ext3_block_truncate_page(). So reorganize the code so that ext3_block_truncate_page() is called before we change inode size. This also removes unnecessary block zeroing from error recovery after failed buffered writes (zeroing isn't needed because we could have never written non-zero data to disk). We have to be careful and keep zeroing in direct IO write error recovery because there we might have already overwritten end of the last file block. Signed-off-by: Jan Kara <jack@suse.cz>	2011-06-25 17:29:52 +02:00
Jan Kara	ad95c5e9bc	ext3: Fix oops in ext3_try_to_allocate_with_rsv() Block allocation is called from two places: ext3_get_blocks_handle() and ext3_xattr_block_set(). These two callers are not necessarily synchronized because xattr code holds only xattr_sem and i_mutex, and ext3_get_blocks_handle() may hold only truncate_mutex when called from writepage() path. Block reservation code does not expect two concurrent allocations to happen to the same inode and thus assertions can be triggered or reservation structure corruption can occur. Fix the problem by taking truncate_mutex in xattr code to serialize allocations. CC: Sage Weil <sage@newdream.net> CC: stable@kernel.org Reported-by: Fyodor Ustinov <ufm@ufm.su> Signed-off-by: Jan Kara <jack@suse.cz>	2011-06-25 17:29:52 +02:00
Ding Dinghua	bd5c9e1854	jbd: fix a bug of leaking jh->b_jcount journal_get_create_access should drop jh->b_jcount in error handling path Signed-off-by: Ding Dinghua <dingdinghua@nrchpc.ac.cn> Signed-off-by: Jan Kara <jack@suse.cz>	2011-06-25 17:29:51 +02:00
Jan Kara	05713082ab	jbd: remove dependency on __GFP_NOFAIL The callers of start_this_handle() (or better ext3_journal_start()) are not really prepared to handle allocation failures. Such failures can for example result in silent data loss when it happens in ext3_..._writepage(). OTOH __GFP_NOFAIL is going away so we just retry allocation in start_this_handle(). This loop is potentially dangerous because the oom killer cannot be invoked for GFP_NOFS allocation, so there is a potential for infinitely looping. But still this is better than silent data loss. Signed-off-by: Jan Kara <jack@suse.cz>	2011-06-25 17:29:51 +02:00
Jan Kara	40680f2fa4	ext3: Convert ext3 to new truncate calling convention Mostly trivial conversion. We fix a bug that IS_IMMUTABLE and IS_APPEND files could not be truncated during failed writes as we change the code. In fact the test is not needed at all because both IS_IMMUTABLE and IS_APPEND is tested in upper layers in do_sys_[f]truncate(), may_write(), etc. Signed-off-by: Jan Kara <jack@suse.cz>	2011-06-25 17:29:51 +02:00
Lukas Czerner	99cb1a318c	jbd: Add fixed tracepoints This commit adds fixed tracepoint for jbd. It has been based on fixed tracepoints for jbd2, however there are missing those for collecting statistics, since I think that it will require more intrusive patch so I should have its own commit, if someone decide that it is needed. Also there are new tracepoints in __journal_drop_transaction() and journal_update_superblock(). The list of jbd tracepoints: jbd_checkpoint jbd_start_commit jbd_commit_locking jbd_commit_flushing jbd_commit_logging jbd_drop_transaction jbd_end_commit jbd_do_submit_data jbd_cleanup_journal_tail jbd_update_superblock_end Signed-off-by: Lukas Czerner <lczerner@redhat.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Jan Kara <jack@suse.cz>	2011-06-25 17:29:51 +02:00
Lukas Czerner	785c4bcc0d	ext3: Add fixed tracepoints This commit adds fixed tracepoints to the ext3 code. It is based on ext4 tracepoints, however due to the differences of both file systems, there are some tracepoints missing (those for delaloc and for multi-block allocator) and there are some ext3 specific as well (for reservation windows). Here is a list: ext3_free_inode ext3_request_inode ext3_allocate_inode ext3_evict_inode ext3_drop_inode ext3_mark_inode_dirty ext3_write_begin ext3_ordered_write_end ext3_writeback_write_end ext3_journalled_write_end ext3_ordered_writepage ext3_writeback_writepage ext3_journalled_writepage ext3_readpage ext3_releasepage ext3_invalidatepage ext3_discard_blocks ext3_request_blocks ext3_allocate_blocks ext3_free_blocks ext3_sync_file_enter ext3_sync_file_exit ext3_sync_fs ext3_rsv_window_add ext3_discard_reservation ext3_alloc_new_reservation ext3_reserved ext3_forget ext3_read_block_bitmap ext3_direct_IO_enter ext3_direct_IO_exit ext3_unlink_enter ext3_unlink_exit ext3_truncate_enter ext3_truncate_exit ext3_get_blocks_enter ext3_get_blocks_exit ext3_load_inode Signed-off-by: Lukas Czerner <lczerner@redhat.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Jan Kara <jack@suse.cz>	2011-06-25 17:29:51 +02:00
Josef Bacik	9b90f51353	Btrfs: make sure to update total_bitmaps when freeing cache V3 A user reported this bug again where we have more bitmaps than we are supposed to. This is because we failed to load the free space cache, but don't update the ctl->total_bitmaps counter when we remove entries from the tree. This patch fixes this problem and we should be good to go again. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-06-25 09:31:06 -04:00
Ilya Dryomov	e0f5406727	Btrfs: fix type mismatch in find_free_extent() data parameter should be u64 because a full-sized chunk flags field is passed instead of 0/1 for distinguishing data from metadata. All underlying functions expect u64. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-06-25 09:31:06 -04:00
Al Viro	9403c9c598	cifs: propagate errors from cifs_get_root() to mount(2) ... instead of just failing with -EINVAL Acked-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-24 18:39:43 -04:00
Al Viro	5c4f1ad7c6	cifs: tidy cifs_do_mount() up a bit Acked-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-24 18:39:42 -04:00
Al Viro	fa18f1bdce	cifs: more breakage on mount failures if cifs_get_root() fails, we end up with ->mount() returning NULL, which is not what callers expect. Moreover, in case of superblock reuse we end up leaking a superblock reference... Acked-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-24 18:39:42 -04:00
Al Viro	ee01a14d9d	cifs: close sget() races have ->s_fs_info set by the set() callback passed to sget() Acked-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-24 18:39:42 -04:00
Al Viro	d757d71bfc	cifs: pull freeing mountdata/dropping nls/freeing cifs_sb into cifs_umount() all callers of cifs_umount() proceed to do the same thing; pull it into cifs_umount() itself. Acked-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-24 18:39:42 -04:00
Al Viro	98ab494dd1	cifs: move cifs_umount() call into ->kill_sb() instead of calling it manually in case if cifs_read_super() fails to set ->s_root, just call it from ->kill_sb(). cifs_put_super() is gone now and we have cifs_sb shutdown and destruction done after the superblock is gone from ->s_instances. Acked-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-24 18:39:42 -04:00
Al Viro	97d1152ace	cifs: pull cifs_mount() call up ... to the point prior to sget(). Now we have cifs_sb set up early enough. Acked-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-24 18:39:42 -04:00
Al Viro	2a9b99516c	sanitize cifs_umount() prototype a) superblock argument is unused b) it always returns 0 Acked-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-24 18:39:42 -04:00
Al Viro	2ced6f6935	cifs: initialize ->tlink_tree in cifs_setup_cifs_sb() no need to wait until cifs_read_super() and we need it done by the time cifs_mount() will be called. Acked-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-24 18:39:42 -04:00
Al Viro	5d3bc605ca	cifs: allocate mountdata earlier pull mountdata allocation up, so that it won't stand in the way when we lift cifs_mount() to location before sget(). Acked-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-24 18:39:41 -04:00
Al Viro	d687ca380f	cifs: leak on mount if we share superblock cifs_sb and nls end up leaked... Acked-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-24 18:39:41 -04:00
Al Viro	2c6292ae4b	cifs: don't pass superblock to cifs_mount() To close sget() races we'll need to be able to set cifs_sb up before we get the superblock, so we'll want to be able to do cifs_mount() earlier. Fortunately, it's easy to do - setting ->s_maxbytes can be done in cifs_read_super(), ditto for ->s_time_gran and as for putting MS_POSIXACL into ->s_flags, we can mirror it in ->mnt_cifs_flags until cifs_read_super() is called. Kill unused 'devname' argument, while we are at it... Acked-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-24 18:39:41 -04:00
Al Viro	ca171baaad	cifs: don't leak nls on mount failure if cifs_sb allocation fails, we still need to drop nls we'd stashed into volume_info - the one we would've copied to cifs_sb if we could allocate the latter. Acked-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-24 18:39:41 -04:00
Al Viro	6d6861757d	cifs: double free on mount failure if we get to out_super with ->s_root already set (e.g. with cifs_get_root() failure), we'll end up with cifs_put_super() called and ->mountdata freed twice. We'll also get cifs_sb freed twice and cifs_sb->local_nls dropped twice. The problem is, we can get to out_super both with and without ->s_root, which makes ->put_super() a bad place for such work. Switch to ->kill_sb(), have all that work done there after kill_anon_super(). Unlike ->put_super(), ->kill_sb() is called by deactivate_locked_super() whether we have ->s_root or not. Acked-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-24 18:39:41 -04:00
Al Viro	dd85446619	take bdi setup/destruction into cifs_mount/cifs_umount Acked-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-24 18:39:41 -04:00
Jeff Layton	9b8e072a31	cifs: mark CONFIG_CIFS_NFSD_EXPORT as BROKEN This does not work properly with CIFS as current servers do not enable support for the FILE_OPEN_BY_FILE_ID on SMB NTCreateX and not all NFS clients handle ESTALE. For now, it just plain doesn't work. Mark it BROKEN to discourage distros from enabling it. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-06-24 17:33:30 +00:00
Chris Mason	1973f0faeb	Btrfs: make sure to record the transid in new inodes When we create a new inode, we aren't filling in the field that records the transaction that last changed this inode. If we then go to fsync that inode, it will be skipped because the field isn't filled in. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-06-24 13:13:29 -04:00
Jeff Layton	e4fb0edb7c	cifs: free blkcipher in smbhash This is currently leaked in the rc == 0 case. Reported-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-06-24 17:03:55 +00:00
Linus Torvalds	5220cc9382	Merge branch 'for-linus' of git://git.kernel.dk/linux-block * 'for-linus' of git://git.kernel.dk/linux-block: block: add REQ_SECURE to REQ_COMMON_MASK block: use the passed in @bdev when claiming if partno is zero block: Add __attribute__((format(printf...) and fix fallout block: make disk_block_events() properly wait for work cancellation block: remove non-syncing __disk_block_events() and fold it into disk_block_events() block: don't use non-syncing event blocking in disk_check_events() cfq-iosched: fix locking around ioc->ioc_data assignment	2011-06-24 08:42:35 -07:00
Linus Torvalds	143e859d05	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: cifs: fix wsize negotiation to respect max buffer size and active signing (try #4) CIFS: Fix problem with 3.0-rc1 null user mount failure	2011-06-24 08:35:04 -07:00
Jesper Juhl	46e4edbf7e	Remove unneeded version.h includes from fs/ It was pointed out by 'make versioncheck' that some includes of linux/version.h were not needed in fs/ (fs/btrfs/ctree.h and fs/omfs/file.c). This patch removes them. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Acked-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-06-24 08:34:22 -07:00
Dave Chinner	4a33821236	xfs: prevent bogus assert when trying to remove non-existent attribute If the attribute fork on an inode is in btree format and has multiple levels (i.e node format rather than leaf format), then a lookup failure will trigger an assert failure in xfs_da_path_shift if the flag XFS_DA_OP_OKNOENT is not set. This flag is used to indicate to the directory btree code that not finding an entry is not a fatal error. In the case of doing a lookup for a directory name removal, this is valid as a user cannot insert an arbitrary name to remove from the directory btree. However, in the case of the attribute tree, a user has direct control over the attribute name and can ask for any random name to be removed without any validation. In this case, fsstress is asking for a non-existent user.selinux attribute to be removed, and that is causing xfs_da_path_shift() to fall off the bottom of the tree where it asserts that a lookup failure is allowed. Because the flag is not set, we die a horrible death on a debug enable kernel. Prevent this assert from firing on attribute removes by adding the op_flag XFS_DA_OP_OKNOENT to atribute removal operations. Discovered when testing on a SELinux enabled system by fsstress in test 070 by trying to remove a non-existent user.selinux attribute. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-06-23 22:13:51 -05:00
Dave Chinner	df4368a146	xfs: clear XFS_IDIRTY_RELEASE on truncate down When an inode is truncated down, speculative preallocation is removed from the inode. This should also reset the state bits for controlling whether preallocation is subsequently removed when the file is next closed. The flag is not being cleared, so repeated operations on a file that first involve a truncate (e.g. multiple repeated dd invocations on a file) give different file layouts for the second and subsequent invocations. Fix this by clearing the XFS_IDIRTY_RELEASE state bit when the XFS_ITRUNCATED bit is detected in xfs_release() and hence ensure that speculative delalloc is removed on files that have been truncated down. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-06-23 22:13:46 -05:00
Dave Chinner	778e24bb6d	xfs: reset inode per-lifetime state when recycling it XFS inodes has several per-lifetime state fields that determine the behaviour of the inode. These state fields are not all reset when an inode is reused from the reclaimable state. This can lead to unexpected behaviour of the new inode such as speculative preallocation not being truncated away in the expected manner for local files until the inode is subsequently truncated, freed or cycles out of the cache. It can also lead to an inode being considered to be a filestream inode or having been truncated when that is not the case. Rework the reinitialisation of the inode when it is recycled to ensure that it is pristine before it is reused. While there, also fix the resetting of state flags in the recycling error paths so the inode does not become unreclaimable. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-06-23 22:13:31 -05:00
Jeff Layton	1190f6a067	cifs: fix wsize negotiation to respect max buffer size and active signing (try #4 ) Hopefully last version. Base signing check on CAP_UNIX instead of tcon->unix_ext, also clean up the comments a bit more. According to Hongwei Sun's blog posting here: http://blogs.msdn.com/b/openspecification/archive/2009/04/10/smb-maximum-transmit-buffer-size-and-performance-tuning.aspx CAP_LARGE_WRITEX is ignored when signing is active. Also, the maximum size for a write without CAP_LARGE_WRITEX should be the maxBuf that the server sent in the NEGOTIATE request. Fix the wsize negotiation to take this into account. While we're at it, alter the other wsize definitions to use sizeof(WRITE_REQ) to allow for slightly larger amounts of data to potentially be written per request. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-06-23 17:54:39 +00:00
Linus Torvalds	bccaeafd7c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6: jfs: agstart field must be 64 bits JFS: Don't save agno in the inode jfs: Update agstart when resizing volume jfs: old_agsize should be 64 bits in jfs_extendfs	2011-06-22 21:49:07 -07:00
Pavel Shilovsky	446b23a758	CIFS: Fix problem with 3.0-rc1 null user mount failure Figured it out: it was broken by `b946845a9d` commit - "cifs: cifs_parse_mount_options: do not tokenize mount options in-place". So, as a quick fix I suggest to apply this patch. [PATCH] CIFS: Fix kfree() with constant string in a null user case Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-06-22 21:43:56 +00:00
Tejun Heo	06d984737b	ptrace: s/tracehook_tracer_task()/ptrace_parent()/ tracehook.h is on the way out. Rename tracehook_tracer_task() to ptrace_parent() and move it from tracehook.h to ptrace.h. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: John Johansen <john.johansen@canonical.com> Cc: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Oleg Nesterov <oleg@redhat.com>	2011-06-22 19:26:29 +02:00
Tejun Heo	4b9d33e6d8	ptrace: kill clone/exec tracehooks At this point, tracehooks aren't useful to mainline kernel and mostly just add an extra layer of obfuscation. Although they have comments, without actual in-kernel users, it is difficult to tell what are their assumptions and they're actually trying to achieve. To mainline kernel, they just aren't worth keeping around. This patch kills the following clone and exec related tracehooks. tracehook_prepare_clone() tracehook_finish_clone() tracehook_report_clone() tracehook_report_clone_complete() tracehook_unsafe_exec() The changes are mostly trivial - logic is moved to the caller and comments are merged and adjusted appropriately. The only exception is in check_unsafe_exec() where LSM_UNSAFE_PTRACE* are OR'd to bprm->unsafe instead of setting it, which produces the same result as the field is always zero on entry. It also tests p->ptrace instead of (p->ptrace & PT_PTRACED) for consistency, which also gives the same result. This doesn't introduce any behavior change. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Oleg Nesterov <oleg@redhat.com>	2011-06-22 19:26:29 +02:00
Tejun Heo	a288eecce5	ptrace: kill trivial tracehooks At this point, tracehooks aren't useful to mainline kernel and mostly just add an extra layer of obfuscation. Although they have comments, without actual in-kernel users, it is difficult to tell what are their assumptions and they're actually trying to achieve. To mainline kernel, they just aren't worth keeping around. This patch kills the following trivial tracehooks. * Ones testing whether task is ptraced. Replace with ->ptrace test. tracehook_expect_breakpoints() tracehook_consider_ignored_signal() tracehook_consider_fatal_signal() * ptrace_event() wrappers. Call directly. tracehook_report_exec() tracehook_report_exit() tracehook_report_vfork_done() * ptrace_release_task() wrapper. Call directly. tracehook_finish_release_task() * noop tracehook_prepare_release_task() tracehook_report_death() This doesn't introduce any behavior change. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com>	2011-06-22 19:26:28 +02:00
Linus Torvalds	2992c4bd57	Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: NFS: Fix decode_secinfo_maxsz NFSv4.1: Fix an off-by-one error in pnfs_generic_pg_test NFSv4.1: Fix some issues with pnfs_generic_pg_test NFSv4.1: file layout must consider pg_bsize for coalescing pnfs-obj: No longer needed to take an extra ref at add_device SUNRPC: Ensure the RPC client only quits on fatal signals NFSv4: Fix a readdir regression nfs4.1: mark layout as bad on error path in _pnfs_return_layout nfs4.1: prevent race that allowed use of freed layout in _pnfs_return_layout NFSv4.1: need to put_layout_hdr on _pnfs_return_layout error path NFS: (d)printks should use %zd for ssize_t arguments NFSv4.1: fix break condition in pnfs_find_lseg nfs4.1: fix several problems with _pnfs_return_layout NFSv4.1: allow zero fh array in filelayout decode layout NFSv4.1: allow nfs_fhget to succeed with mounted on fileid NFSv4.1: Fix a refcounting issue in the pNFS device id cache NFSv4.1: deprecate headerpadsz in CREATE_SESSION NFS41: do not update isize if inode needs layoutcommit NLM: Don't hang forever on NLM unlock requests NFS: fix umount of pnfs filesystems	2011-06-21 18:20:55 -07:00
Linus Torvalds	890879cfa0	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: jbd2: Fix oops in jbd2_journal_remove_journal_head() jbd2: Remove obsolete parameters in the comments for some jbd2 functions ext4: fixed tracepoints cleanup ext4: use FIEMAP_EXTENT_LAST flag for last extent in fiemap ext4: Fix max file size and logical block counting of extent format file ext4: correct comments for ext4_free_blocks()	2011-06-21 10:22:35 -07:00
Bryan Schumaker	1650add235	NFS: Fix decode_secinfo_maxsz I initially did the calculation in bytes, and not words Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-06-21 11:54:07 -04:00
Trond Myklebust	19982ba856	NFSv4.1: Fix an off-by-one error in pnfs_generic_pg_test And document what is going on there... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-06-21 11:54:06 -04:00
Trond Myklebust	8f7d5efbef	NFSv4.1: Fix some issues with pnfs_generic_pg_test 1. If the intention is to coalesce requests 'prev' and 'req' then we have to ensure at least that we have a layout starting at req_offset(prev). 2. If we're only requesting a minimal layout of length desc->pg_count, we need to test the length actually returned by the server before we allow the coalescing to occur. 3. We need to deal correctly with (pgio->lseg == NULL) 4. Fixup the test guarding the pnfs_update_layout. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-06-21 11:54:05 -04:00
Linus Torvalds	eda0841094	Merge branch 'for-2.6.40' of git://linux-nfs.org/~bfields/linux * 'for-2.6.40' of git://linux-nfs.org/~bfields/linux: nfsd4: fix break_lease flags on nfsd open nfsd: link returns nfserr_delay when breaking lease nfsd: v4 support requires CRYPTO nfsd: fix dependency of nfsd on auth_rpcgss	2011-06-20 20:10:52 -07:00
Linus Torvalds	3669820650	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: devcgroup_inode_permission: take "is it a device node" checks to inlined wrapper fix comment in generic_permission() kill obsolete comment for follow_down() proc_sys_permission() is OK in RCU mode reiserfs_permission() doesn't need to bail out in RCU mode proc_fd_permission() is doesn't need to bail out in RCU mode nilfs2_permission() doesn't need to bail out in RCU mode logfs doesn't need ->permission() at all coda_ioctl_permission() is safe in RCU mode cifs_permission() doesn't need to bail out in RCU mode bad_inode_permission() is safe from RCU mode ubifs: dereferencing an ERR_PTR in ubifs_mount()	2011-06-20 20:09:15 -07:00
Dave Kleikamp	ecc90462b4	jfs: agstart field must be 64 bits The previous patch added the agstart field to jfs_ip, but declared it a long. We need to make sure its 64 bits on every platform. Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>	2011-06-20 17:53:24 -05:00
Benny Halevy	19345cb299	NFSv4.1: file layout must consider pg_bsize for coalescing Otherwise we end up overflowing the rpc buffer size on the receive end. Signed-off-by: Benny Halevy <benny@tonian.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-06-20 16:12:26 -04:00
Linus Torvalds	90a800de0a	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: avoid delayed metadata items during commits btrfs: fix uninitialized return value btrfs: fix wrong reservation when doing delayed inode operations btrfs: Remove unused sysfs code btrfs: fix dereference of ERR_PTR value Btrfs: fix relocation races Btrfs: set no_trans_join after trying to expand the transaction Btrfs: protect the pending_snapshots list with trans_lock Btrfs: fix path leakage on subvol deletion Btrfs: drop the delalloc_bytes check in shrink_delalloc Btrfs: check the return value from set_anon_super	2011-06-20 08:58:53 -07:00
Dave Kleikamp	d31b53e3cd	JFS: Don't save agno in the inode Resizing the file system can result in an in-memory inode being remapped to a different aggregate group (AG). A cached AG number can cause problems when trying to free or allocate inodes. Instead, save the IAG's agstart address and calculate the agno when we need it. Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>	2011-06-20 10:53:46 -05:00
Dave Kleikamp	28e0fa894c	jfs: Update agstart when resizing volume A comment indicates that the IAG's agstart does not need to be updated since it will always point to a block in the same aggregate group, but jfs_fsck isn't so forgiving and reports it as an error. I'm fixing this in jfsutils as well, so either a new kernel or new utilities will be sufficient to fix the problem. Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>	2011-06-20 10:32:46 -05:00
Dave Kleikamp	206b6310fd	jfs: old_agsize should be 64 bits in jfs_extendfs Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>	2011-06-20 10:30:04 -05:00
Al Viro	8e833fd2e1	fix comment in generic_permission() CAP_DAC_OVERRIDE is enough for MAY_EXEC on directory, even if no exec bits are set. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-20 10:45:56 -04:00
Al Viro	6291176bcd	kill obsolete comment for follow_down() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-20 10:45:49 -04:00
Al Viro	1aec7036d0	proc_sys_permission() is OK in RCU mode nothing blocking there, since all instances of sysctl ->permissions() method are non-blocking - both of them, that is. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-20 10:45:25 -04:00
Al Viro	1d29b5a2ed	reiserfs_permission() doesn't need to bail out in RCU mode nothing blocking other than generic_permission() (and check_acl callback does bail out in RCU mode). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-20 10:45:21 -04:00
Al Viro	cf12791116	proc_fd_permission() is doesn't need to bail out in RCU mode nothing blocking except generic_permission() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-20 10:44:50 -04:00
Al Viro	730e908f35	nilfs2_permission() doesn't need to bail out in RCU mode Nothing blocking except for generic_permission(). Which will DTRT. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-20 10:44:33 -04:00
Al Viro	a63ab94d67	logfs doesn't need ->permission() at all ... and never did, what with its ->permission() being what we do by default when ->permission is NULL... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-20 10:44:26 -04:00
Al Viro	6b419951f1	coda_ioctl_permission() is safe in RCU mode return (mask & MAY_EXEC) ? -EACCES : 0; is non-blocking... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-20 10:44:19 -04:00
Al Viro	ec12781f19	cifs_permission() doesn't need to bail out in RCU mode nothing potentially blocking except generic_permission(), which will DTRT Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-20 10:44:07 -04:00
Al Viro	1712c20dae	bad_inode_permission() is safe from RCU mode return -EIO; is not a blocking operation, thank you very much. Nick, what the hell have you been smoking? Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-20 10:44:00 -04:00
Dan Carpenter	185bf87393	ubifs: dereferencing an ERR_PTR in ubifs_mount() `d251ed271d` "ubifs: fix sget races" left out the goto from this error path so the static checkers complain that we're dereferencing "sb" when it's an ERR_PTR. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-20 10:42:34 -04:00
J. Bruce Fields	105f462210	nfsd4: fix break_lease flags on nfsd open Thanks to Casey Bodley for pointing out that on a read open we pass 0, instead of O_RDONLY, to break_lease, with the result that a read open is treated like a write open for the purposes of lease breaking! Reported-by: Casey Bodley <cbodley@citi.umich.edu> Cc: stable@kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-06-20 10:38:01 -04:00
Vitaliy Ivanov	e44ba033c5	treewide: remove duplicate includes Many stupid corrections of duplicated includes based on the output of scripts/checkincludes.pl. Signed-off-by: Vitaliy Ivanov <vitalivanov@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2011-06-20 16:08:19 +02:00
Boaz Harrosh	df18d127f4	pnfs-obj: No longer needed to take an extra ref at add_device Andy's last device_cache patches, already take an extra reference on the newly inserted device_id. So we can remove it from obj-io. Without this patch the device_ids are leaked. Andy's patches are not in Linus tree yet. So I'm not sure if they are scheduled for this Kernel or the next. This patch should be added as part of these. CC: Andy Adamson <andros@netapp.com> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-06-19 14:49:51 -04:00
Linus Torvalds	8816ead9d8	Merge branches 'perf-urgent-for-linus', 'sched-urgent-for-linus', 'timers-urgent-for-linus' and 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: tools/perf: Fix static build of perf tool tracing: Fix regression in printk_formats file * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: generic-ipi: Fix kexec boot crash by initializing call_single_queue before enabling interrupts * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: clocksource: Make watchdog robust vs. interruption timerfd: Fix wakeup of processes when timer is cancelled on clock change * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, MAINTAINERS: Add x86 MCE people x86, efi: Do not reserve boot services regions within reserved areas	2011-06-19 09:00:18 -07:00
Linus Torvalds	c11760c6d8	isofs: fix bh leak in isofs_fill_super() error case In isofs_fill_super(), when an iso_primary_descriptor is found, it is kept in pri_bh. The error cases don't properly release it. Fix it. Reported-and-tested-by: 김원석 <stanley.will.kim@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-06-18 07:25:42 -07:00
Chris Mason	e999376f09	Btrfs: avoid delayed metadata items during commits Snapshot creation has two phases. One is the initial snapshot setup, and the second is done during commit, while nobody is allowed to modify the root we are snapshotting. The delayed metadata insertion code can break that rule, it does a delayed inode update on the inode of the parent of the snapshot, and delayed directory item insertion. This makes sure to run the pending delayed operations before we record the snapshot root, which avoids corruptions. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-06-17 16:38:47 -04:00
David Sterba	35a30d7ce5	btrfs: fix uninitialized return value When allocation fails in btrfs_read_fs_root_no_name, ret is not set although it is returned, holding a garbage value. Signed-off-by: David Sterba <dsterba@suse.cz> Reviewed-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-06-17 14:54:18 -04:00
Miao Xie	19fd294957	btrfs: fix wrong reservation when doing delayed inode operations We have migrated the space for the delayed inode items from trans_block_rsv to global_block_rsv, but we forgot to set trans->block_rsv to global_block_rsv when we doing delayed inode operations, and the following Oops happened: [ 9792.654889] ------------[ cut here ]------------ [ 9792.654898] WARNING: at fs/btrfs/extent-tree.c:5681 btrfs_alloc_free_block+0xca/0x27c [btrfs]() [ 9792.654899] Hardware name: To Be Filled By O.E.M. [ 9792.654900] Modules linked in: btrfs zlib_deflate libcrc32c ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables arc4 rt61pci rt2x00pci rt2x00lib snd_hda_codec_hdmi mac80211 snd_hda_codec_realtek cfg80211 snd_hda_intel edac_core snd_seq rfkill pcspkr serio_raw snd_hda_codec eeprom_93cx6 edac_mce_amd sp5100_tco i2c_piix4 k10temp snd_hwdep snd_seq_device snd_pcm floppy r8169 xhci_hcd mii snd_timer snd soundcore snd_page_alloc ipv6 firewire_ohci pata_acpi ata_generic firewire_core pata_via crc_itu_t radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan] [ 9792.654919] Pid: 2762, comm: rm Tainted: G W 2.6.39+ #1 [ 9792.654920] Call Trace: [ 9792.654922] [<ffffffff81053c4a>] warn_slowpath_common+0x83/0x9b [ 9792.654925] [<ffffffff81053c7c>] warn_slowpath_null+0x1a/0x1c [ 9792.654933] [<ffffffffa038e747>] btrfs_alloc_free_block+0xca/0x27c [btrfs] [ 9792.654945] [<ffffffffa03b8562>] ? map_extent_buffer+0x6e/0xa8 [btrfs] [ 9792.654953] [<ffffffffa038189b>] __btrfs_cow_block+0xfc/0x30c [btrfs] [ 9792.654963] [<ffffffffa0396aa6>] ? btrfs_buffer_uptodate+0x47/0x58 [btrfs] [ 9792.654970] [<ffffffffa0382e48>] ? read_block_for_search+0x94/0x368 [btrfs] [ 9792.654978] [<ffffffffa0381ba9>] btrfs_cow_block+0xfe/0x146 [btrfs] [ 9792.654986] [<ffffffffa03848b0>] btrfs_search_slot+0x14d/0x4b6 [btrfs] [ 9792.654997] [<ffffffffa03b8562>] ? map_extent_buffer+0x6e/0xa8 [btrfs] [ 9792.655022] [<ffffffffa03938e8>] btrfs_lookup_inode+0x2f/0x8f [btrfs] [ 9792.655025] [<ffffffff8147afac>] ? _cond_resched+0xe/0x22 [ 9792.655027] [<ffffffff8147b892>] ? mutex_lock+0x29/0x50 [ 9792.655039] [<ffffffffa03d41b1>] btrfs_update_delayed_inode+0x72/0x137 [btrfs] [ 9792.655051] [<ffffffffa03d4ea2>] btrfs_run_delayed_items+0x90/0xdb [btrfs] [ 9792.655062] [<ffffffffa039a69b>] btrfs_commit_transaction+0x228/0x654 [btrfs] [ 9792.655064] [<ffffffff8106e8da>] ? remove_wait_queue+0x3a/0x3a [ 9792.655075] [<ffffffffa03a2fa5>] btrfs_evict_inode+0x14d/0x202 [btrfs] [ 9792.655077] [<ffffffff81132bd6>] evict+0x71/0x111 [ 9792.655079] [<ffffffff81132de0>] iput+0x12a/0x132 [ 9792.655081] [<ffffffff8112aa3a>] do_unlinkat+0x106/0x155 [ 9792.655083] [<ffffffff81127b83>] ? path_put+0x1f/0x23 [ 9792.655085] [<ffffffff8109c53c>] ? audit_syscall_entry+0x145/0x171 [ 9792.655087] [<ffffffff81128410>] ? putname+0x34/0x36 [ 9792.655090] [<ffffffff8112b441>] sys_unlinkat+0x29/0x2b [ 9792.655092] [<ffffffff81482c42>] system_call_fastpath+0x16/0x1b [ 9792.655093] ---[ end trace 02b696eb02b3f768 ]--- This patch fix it by setting the reservation of the transaction handle to the correct one. Reported-by: Josef Bacik <josef@redhat.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-06-17 14:54:18 -04:00
Maarten Lankhorst	9fe6a50fb7	btrfs: Remove unused sysfs code Removes code no longer used. The sysfs file itself is kept, because the btrfs developers expressed interest in putting new entries to sysfs. Signed-off-by: Maarten Lankhorst <m.b.lankhorst@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-06-17 14:54:18 -04:00
David Sterba	3ed4498caf	btrfs: fix dereference of ERR_PTR value smatch reports: btrfs_recover_log_trees error: 'wc.replay_dest' dereferencing possible ERR_PTR() Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-06-17 14:54:17 -04:00
Chris Mason	e038dca803	Merge branch 'for-chris' of git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-work into for-linus Conflicts: fs/btrfs/transaction.c Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-06-17 14:16:13 -04:00
Linus Torvalds	01eff85b09	Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs * 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: make log devices with write back caches work xfs: fix ->mknod() return value on xfs_get_acl() failure	2011-06-17 10:37:41 -07:00
Chris Mason	7585717f30	Btrfs: fix relocation races The recent commit to get rid of our trans_mutex introduced some races with block group relocation. The problem is that relocation needs to do some record keeping about each root, and it was relying on the transaction mutex to coordinate things in subtle ways. This fix adds a mutex just for the relocation code and makes sure it doesn't have a big impact on normal operations. The race is really fixed in btrfs_record_root_in_trans, which is where we step back and wait for the relocation code to finish accounting setup. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-06-17 13:36:58 -04:00
David Howells	879669961b	KEYS/DNS: Fix ____call_usermodehelper() to not lose the session keyring ____call_usermodehelper() now erases any credentials set by the subprocess_inf::init() function. The problem is that commit `17f60a7da1` ("capabilites: allow the application of capability limits to usermode helpers") creates and commits new credentials with prepare_kernel_cred() after the call to the init() function. This wipes all keyrings after umh_keys_init() is called. The best way to deal with this is to put the init() call just prior to the commit_creds() call, and pass the cred pointer to init(). That means that umh_keys_init() and suchlike can modify the credentials _before_ they are published and potentially in use by the rest of the system. This prevents request_key() from working as it is prevented from passing the session keyring it set up with the authorisation token to /sbin/request-key, and so the latter can't assume the authority to instantiate the key. This causes the in-kernel DNS resolver to fail with ENOKEY unconditionally. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Eric Paris <eparis@redhat.com> Tested-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-06-17 09:40:48 -07:00
Linus Torvalds	8b97b21e0f	Merge git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/linux-2.6-nsfd * git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/linux-2.6-nsfd: proc: Fix Oops on stat of /proc/<zombie pid>/ns/net	2011-06-16 15:02:20 -07:00
Trond Myklebust	ee7b75fc4f	NFSv4: Fix a readdir regression Commit `7ebb9315` (NFS: use secinfo when crossing mountpoints) introduces a regression when decoding an NFSv4 readdir entry that sets the rdattr_error field. By treating the resulting value as if it is a decoding error, the current code may cause us to skip valid readdir entries. Reported-by: Andy Adamson <andros@netapp.com> Cc: stable@kernel.org [2.6.39] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-06-16 13:24:47 -04:00
Linus Torvalds	8dac6bee32	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: AFS: Use i_generation not i_version for the vnode uniquifier AFS: Set s_id in the superblock to the volume name vfs: Fix data corruption after failed write in __block_write_begin() afs: afs_fill_page reads too much, or wrong data VFS: Fix vfsmount overput on simultaneous automount fix wrong iput on d_inode introduced by `e6bc45d65d` Delay struct net freeing while there's a sysfs instance refering to it afs: fix sget() races, close leak on umount ubifs: fix sget races ubifs: split allocation of ubifs_info into a separate function fix leak in proc_set_super()	2011-06-16 10:21:59 -07:00
Christoph Hellwig	a27a263bae	xfs: make log devices with write back caches work There's no reason not to support cache flushing on external log devices. The only thing this really requires is flushing the data device first both in fsync and log commits. A side effect is that we also have to remove the barrier write test during mount, which has been superflous since the new FLUSH+FUA code anyway. Also use the chance to flush the RT subvolume write cache before the fsync commit, which is required for correct semantics. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-06-16 10:52:39 -05:00
David Howells	d6e43f751f	AFS: Use i_generation not i_version for the vnode uniquifier Store the AFS vnode uniquifier in the i_generation field, not the i_version field of the inode struct. i_version can then be given the AFS data version number. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-16 11:44:48 -04:00
David Howells	2e41ae225f	AFS: Set s_id in the superblock to the volume name Set s_id in the superblock to the name of the AFS volume that this superblock corresponds to. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-16 11:44:47 -04:00
Jan Kara	f9f07b6c13	vfs: Fix data corruption after failed write in __block_write_begin() I've got a report of a file corruption from fsxlinux on ext3. The important operations to the page were: mapwrite to a hole partial write to the page read - found the page zeroed from the end of the normal write The culprit seems to be that if get_block() fails in __block_write_begin() (e.g. transient ENOSPC in ext3), the function does ClearPageUptodate(page). Thus when we retry the write, the logic in __block_write_begin() thinks zeroing of the page is needed and overwrites old data. In fact, I don't see why we should ever need to zero the uptodate bit here - either the page was uptodate when we entered __block_write_begin() and it should stay so when we leave it, or it was not uptodate and noone had right to set it uptodate during __block_write_begin() so it remains !uptodate when we leave as well. So just remove clearing of the bit. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-16 11:44:46 -04:00
Anton Blanchard	5e7f23373b	afs: afs_fill_page reads too much, or wrong data afs_fill_page should read the page that is about to be written but the current implementation has a number of issues. If we aren't extending the file we always read PAGE_CACHE_SIZE at offset 0. If we are extending the file we try to read the entire file. Change afs_fill_page to read PAGE_CACHE_SIZE at the right offset, clamped to i_size. While here, avoid calling afs_fill_page when we are doing a PAGE_CACHE_SIZE write. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-16 11:44:46 -04:00
Al Viro	8aef188452	VFS: Fix vfsmount overput on simultaneous automount [Kudos to dhowells for tracking that crap down] If two processes attempt to cause automounting on the same mountpoint at the same time, the vfsmount holding the mountpoint will be left with one too few references on it, causing a BUG when the kernel tries to clean up. The problem is that lock_mount() drops the caller's reference to the mountpoint's vfsmount in the case where it finds something already mounted on the mountpoint as it transits to the mounted filesystem and replaces path->mnt with the new mountpoint vfsmount. During a pathwalk, however, we don't take a reference on the vfsmount if it is the same as the one in the nameidata struct, but do_add_mount() doesn't know this. The fix is to make sure we have a ref on the vfsmount of the mountpoint before calling do_add_mount(). However, if lock_mount() doesn't transit, we're then left with an extra ref on the mountpoint vfsmount which needs releasing. We can handle that in follow_managed() by not making assumptions about what we can and what we cannot get from lookup_mnt() as the current code does. The callers of follow_managed() expect that reference to path->mnt will be grabbed iff path->mnt has been changed. follow_managed() and follow_automount() keep track of whether such reference has been grabbed and assume that it'll happen in those and only those cases that'll have us return with changed path->mnt. That assumption is almost correct - it breaks in case of racing automounts and in even harder to hit race between following a mountpoint and a couple of mount --move. The thing is, we don't need to make that assumption at all - after the end of loop in follow_manage() we can check if path->mnt has ended up unchanged and do mntput() if needed. The BUG can be reproduced with the following test program: #include <stdio.h> #include <sys/types.h> #include <sys/stat.h> #include <unistd.h> #include <sys/wait.h> int main(int argc, char **argv) { int pid, ws; struct stat buf; pid = fork(); stat(argv[1], &buf); if (pid > 0) wait(&ws); return 0; } and the following procedure: (1) Mount an NFS volume that on the server has something else mounted on a subdirectory. For instance, I can mount / from my server: mount warthog:/ /mnt -t nfs4 -r On the server /data has another filesystem mounted on it, so NFS will see a change in FSID as it walks down the path, and will mark /mnt/data as being a mountpoint. This will cause the automount code to be triggered. !!! Do not look inside the mounted fs at this point !!! (2) Run the above program on a file within the submount to generate two simultaneous automount requests: /tmp/forkstat /mnt/data/testfile (3) Unmount the automounted submount: umount /mnt/data (4) Unmount the original mount: umount /mnt At this point the kernel should throw a BUG with something like the following: BUG: Dentry ffff880032e3c5c0{i=2,n=} still in use (1) [unmount of nfs4 0:12] Note that the bug appears on the root dentry of the original mount, not the mountpoint and not the submount because sys_umount() hasn't got to its final mntput_no_expire() yet, but this isn't so obvious from the call trace: [<ffffffff8117cd82>] shrink_dcache_for_umount+0x69/0x82 [<ffffffff8116160e>] generic_shutdown_super+0x37/0x15b [<ffffffffa00fae56>] ? nfs_super_return_all_delegations+0x2e/0x1b1 [nfs] [<ffffffff811617f3>] kill_anon_super+0x1d/0x7e [<ffffffffa00d0be1>] nfs4_kill_super+0x60/0xb6 [nfs] [<ffffffff81161c17>] deactivate_locked_super+0x34/0x83 [<ffffffff811629ff>] deactivate_super+0x6f/0x7b [<ffffffff81186261>] mntput_no_expire+0x18d/0x199 [<ffffffff811862a8>] mntput+0x3b/0x44 [<ffffffff81186d87>] release_mounts+0xa2/0xbf [<ffffffff811876af>] sys_umount+0x47a/0x4ba [<ffffffff8109e1ca>] ? trace_hardirqs_on_caller+0x1fd/0x22f [<ffffffff816ea86b>] system_call_fastpath+0x16/0x1b as do_umount() is inlined. However, you can see release_mounts() in there. Note also that it may be necessary to have multiple CPU cores to be able to trigger this bug. Tested-by: Jeff Layton <jlayton@redhat.com> Tested-by: Ian Kent <raven@themaw.net> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-16 11:28:16 -04:00
Török Edwin	50338b889d	fix wrong iput on d_inode introduced by `e6bc45d65d` Git bisection shows that commit `e6bc45d65d` causes BUG_ONs under high I/O load: kernel BUG at fs/inode.c:1368! [ 2862.501007] Call Trace: [ 2862.501007] [<ffffffff811691d8>] d_kill+0xf8/0x140 [ 2862.501007] [<ffffffff81169c19>] dput+0xc9/0x190 [ 2862.501007] [<ffffffff8115577f>] fput+0x15f/0x210 [ 2862.501007] [<ffffffff81152171>] filp_close+0x61/0x90 [ 2862.501007] [<ffffffff81152251>] sys_close+0xb1/0x110 [ 2862.501007] [<ffffffff814c14fb>] system_call_fastpath+0x16/0x1b A reliable way to reproduce this bug is: Login to KDE, run 'rsnapshot sync', and apt-get install openjdk-6-jdk, and apt-get remove openjdk-6-jdk. The buggy part of the patch is this: struct inode inode = NULL; ..... - if (nd.last.name[nd.last.len]) - goto slashes; inode = dentry->d_inode; - if (inode) - ihold(inode); + if (nd.last.name[nd.last.len] \|\| !inode) + goto slashes; + ihold(inode) ... if (inode) iput(inode); / truncate the inode here */ If nd.last.name[nd.last.len] is nonzero (and thus goto slashes branch is taken), and dentry->d_inode is non-NULL, then this code now does an additional iput on the inode, which is wrong. Fix this by only setting the inode variable if nd.last.name[nd.last.len] is 0. Reference: https://lkml.org/lkml/2011/6/15/50 Reported-by: Norbert Preining <preining@logic.at> Reported-by: Török Edwin <edwintorok@gmail.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Török Edwin <edwintorok@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-16 11:27:39 -04:00
Linus Torvalds	13fca640bb	Revert "fs/exec.c: use BUILD_BUG_ON for VM_STACK_FLAGS & VM_STACK_INCOMPLETE_SETUP" This reverts commit `7f81c8890c`. It turns out that it's not actually a build-time check on x86-64 UML, which does some seriously crazy stuff with VM_STACK_FLAGS. The VM_STACK_FLAGS define depends on the arch-supplied VM_STACK_DEFAULT_FLAGS value, and on x86-64 UML we have arch/um/sys-x86_64/shared/sysdep/vm-flags.h: #define VM_STACK_DEFAULT_FLAGS \ (test_thread_flag(TIF_IA32) ? vm_stack_flags32 : vm_stack_flags) #define VM_STACK_DEFAULT_FLAGS vm_stack_flags (yes, seriously: two different #define's for that thing, with the first one being inside an "#ifdef TIF_IA32") It's possible that it is UML that should just be fixed in this area, but for now let's just undo the (very small) optimization. Reported-by: Randy Dunlap <randy.dunlap@oracle.com> Acked-by: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Richard Weinberger <richard@nod.at> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-06-15 21:53:52 -07:00
Michal Hocko	7f81c8890c	fs/exec.c: use BUILD_BUG_ON for VM_STACK_FLAGS & VM_STACK_INCOMPLETE_SETUP Commit `a8bef8ff6e` ("mm: migration: avoid race between shift_arg_pages() and rmap_walk() during migration by not migrating temporary stacks") introduced a BUG_ON() to ensure that VM_STACK_FLAGS and VM_STACK_INCOMPLETE_SETUP do not overlap. The check is a compile time one, so BUILD_BUG_ON is more appropriate. Signed-off-by: Michal Hocko <mhocko@suse.cz> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-06-15 20:03:59 -07:00
Eric W. Biederman	793925334f	proc: Fix Oops on stat of /proc/<zombie pid>/ns/net Don't call iput with the inode half setup to be a namespace filedescriptor. Instead rearrange the code so that we don't initialize ei->ns_ops until after I ns_ops->get succeeds, preventing us from invoking ns_ops->put when ns_ops->get failed. Reported-by: Ingo Saitz <Ingo.Saitz@stud.uni-hannover.de> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2011-06-15 14:35:29 -07:00
Fred Isaman	9e2dfdb308	nfs4.1: mark layout as bad on error path in _pnfs_return_layout Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-06-15 14:36:33 -04:00

... 3 4 5 6 7 ...

23626 Commits