linux

Commit Graph

Author	SHA1	Message	Date
Ian Abbott	bb2b6d19ec	udf: fix udf_setsize() for file data in ICB If the new size is larger than the old size and the old file data was stored in the ICB (iinfo->i_alloc_type == ICBTAG_FLAG_AD_IN_ICB) and the new size still fits in the ICB, skip the call to udf_extend_file() as it does not handle this i_alloc_type value (it calls BUG()). Signed-off-by: Ian Abbott <abbotti@mev.co.uk> Signed-off-by: Jan Kara <jack@suse.cz>	2012-08-15 00:21:58 +02:00
Linus Torvalds	15fc5deb1f	Merge branch 'for-linus-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs merge fix from Chris Mason: "This fixes a merge error in rc1. The calls to mnt_want_write should have been removed." * 'for-linus-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: remove mnt_want_write call in btrfs_mksubvol	2012-08-12 21:28:41 +03:00
Alexander Block	e00da2067b	Btrfs: remove mnt_want_write call in btrfs_mksubvol We got a recursive lock in mksubvol because the caller already held a lock. I think we got into this due to a merge error. Commit `a874a63` removed the mnt_want_write call from btrfs_mksubvol and added a replacement call to mnt_want_write_file in btrfs_ioctl_snap_create_transid. Commit `e7848683` however tried to move all calls to mnt_want_write above i_mutex. So somewhere while merging this, it got mixed up. The solution is to remove the mnt_want_write call completely from mksubvol. Reported-by: David Sterba <dave@jikos.cz> Signed-off-by: Alexander Block <ablock84@googlemail.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2012-08-09 11:01:54 -04:00
Fengguang Wu	647d1e4c52	block: move down direct IO plugging Move unplugging for direct I/O from around ->direct_IO() down to do_blockdev_direct_IO(). This implicitly adds plugging for direct writes. CC: Li Shaohua <shli@fusionio.com> Acked-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-08-09 15:23:09 +02:00
Alexey Khoroshilov	389d7b26d9	bio: Fix potential memory leak in bio_find_or_create_slab() Do not leak memory by updating pointer with potentially NULL realloc return value. Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Acked-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-08-09 15:19:25 +02:00
Trond Myklebust	47fbf7976e	NFSv4.1: Remove a bogus BUG_ON() in nfs4_layoutreturn_done Ever since commit `0a57cdac3f` (NFSv4.1 send layoutreturn to fence disconnected data server) we've been sending layoutreturn calls while there is potentially still outstanding I/O to the data servers. The reason we do this is to avoid races between replayed writes to the MDS and the original writes to the DS. When this happens, the BUG_ON() in nfs4_layoutreturn_done can be triggered because it assumes that we would never call layoutreturn without knowing that all I/O to the DS is finished. The fix is to remove the BUG_ON() now that the assumptions behind the test are obsolete. Reported-by: Boaz Harrosh <bharrosh@panasas.com> Reported-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org [>=3.5]	2012-08-08 16:03:13 -04:00
Zach Brown	fb6ccff667	fuse: verify all ioctl retry iov elements Commit `7572777eef` attempted to verify that the total iovec from the client doesn't overflow iov_length() but it only checked the first element. The iovec could still overflow by starting with a small element. The obvious fix is to check all the elements. The overflow case doesn't look dangerous to the kernel as the copy is limited by the length after the overflow. This fix restores the intention of returning an error instead of successfully copying less than the iovec represented. I found this by code inspection. I built it but don't have a test case. I'm cc:ing stable because the initial commit did as well. Signed-off-by: Zach Brown <zab@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: <stable@vger.kernel.org> [2.6.37+]	2012-08-06 18:19:24 +02:00
Theodore Ts'o	7e731bc9a1	ext4: avoid kmemcheck complaint from reading uninitialized memory Commit `03179fe923` introduced a kmemcheck complaint in ext4_da_get_block_prep() because we save and restore ei->i_da_metadata_calc_last_lblock even though it is left uninitialized in the case where i_da_metadata_calc_len is zero. This doesn't hurt anything, but silencing the kmemcheck complaint makes it easier for people to find real bugs. Addresses https://bugzilla.kernel.org/show_bug.cgi?id=45631 (which is marked as a regression). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org	2012-08-05 23:28:16 -04:00
Theodore Ts'o	d796c52ef0	ext4: make sure the journal sb is written in ext4_clear_journal_err() After we transfer set the EXT4_ERROR_FS bit in the file system superblock, it's not enough to call jbd2_journal_clear_err() to clear the error indication from journal superblock --- we need to call jbd2_journal_update_sb_errno() as well. Otherwise, when the root file system is mounted read-only, the journal is replayed, and the error indicator is transferred to the superblock --- but the s_errno field in the jbd2 superblock is left set (since although we cleared it in memory, we never flushed it out to disk). This can end up confusing e2fsck. We should make e2fsck more robust in this case, but the kernel shouldn't be leaving things in this confused state, either. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org	2012-08-05 19:04:57 -04:00
Al Viro	fe7c80518e	missed mnt_drop_write() in do_dentry_open() This one ought to be __mnt_drop_write(), to match __mnt_want_write() in the beginning... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-08-04 12:15:41 +04:00
Artem Bityutskiy	5c57f20b82	UBIFS: nuke pdflush from comments The pdflush thread is long gone, so this patch removes references to pdflush from UBIFS comments. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-08-04 12:15:41 +04:00
Artem Bityutskiy	e76e0ec984	gfs2: nuke pdflush from comments The pdflush thread is long gone, so this patch removes references to pdflush from gfs comments. Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-08-04 12:15:40 +04:00
Artem Bityutskiy	166ac34b74	nilfs2: nuke write_super from comments The '->write_super' superblock method is gone, and this patch removes all the references to 'write_super' from ntfs. Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-08-04 12:15:38 +04:00
Artem Bityutskiy	50640bcc0a	hfs: nuke write_super from comments The '->write_super' superblock method is gone, and this patch removes all the references to 'write_super' from hfs. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-08-04 12:15:38 +04:00
Artem Bityutskiy	0d5c3eba2e	vfs: nuke pdflush from comments The pdflush thread is long gone, so this patch removes references to pdflush from vfs comments. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-08-04 12:15:37 +04:00
Artem Bityutskiy	12810ad708	jbd/jbd2: nuke write_super from comments The '->write_super' superblock method is gone, and this patch removes all the references to 'write_super' from various jbd and jbd2. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jan Kara <jack@suse.cz> Cc: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-08-04 12:15:36 +04:00
Artem Bityutskiy	b257031408	btrfs: nuke pdflush from comments The pdflush thread is long gone, so this patch removes references to pdflush from btrfs comments. Cc: Chris Mason <chris.mason@fusionio.com> Cc: linux-btrfs@vger.kernel.org Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-08-04 12:15:35 +04:00
Artem Bityutskiy	34eaadaf22	btrfs: nuke write_super from comments The '->write_super' superblock method is gone, and this patch removes all the references to 'write_super' from btrfs. Cc: Chris Mason <chris.mason@fusionio.com> Cc: linux-btrfs@vger.kernel.org Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-08-04 12:15:35 +04:00
Artem Bityutskiy	f6463b0da6	ext4: nuke pdflush from comments The pdflush thread is long gone, so this patch removes references to pdflush from ext4 comments. Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-08-04 12:15:34 +04:00
Artem Bityutskiy	7652bdfcb5	ext4: nuke write_super from comments The '->write_super' superblock method is gone, and this patch removes all the references to 'write_super' from ext3. Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-08-04 12:15:33 +04:00
Artem Bityutskiy	d3009c6cff	ext3: nuke write_super from comments The '->write_super' superblock method is gone, and this patch removes all the references to 'write_super' from ext3. Cc: Jan Kara <jack@suse.cz> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-08-04 12:15:32 +04:00
Artem Bityutskiy	f0cd2dbb6c	vfs: kill write_super and sync_supers Finally we can kill the 'sync_supers' kernel thread along with the '->write_super()' superblock operation because all the users are gone. Now every file-system is supposed to self-manage own superblock and its dirty state. The nice thing about killing this thread is that it improves power management. Indeed, 'sync_supers' is a source of monotonic system wake-ups - it woke up every 5 seconds no matter what - even if there were no dirty superblocks and even if there were no file-systems using this service (e.g., btrfs and journalled ext4 do not need it). So it was wasting power most of the time. And because the thread was in the core of the kernel, all systems had to have it. So I am quite happy to make it go away. Interestingly, this thread is a left-over from the pdflush kernel thread which was a self-forking kernel thread responsible for all the write-back in old Linux kernels. It was turned into per-block device BDI threads, and 'sync_supers' was a left-over. Thus, R.I.P, pdflush as well. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-08-04 01:24:44 +04:00
Linus Torvalds	d42d1dabf3	Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd Pull exofs update from Boaz Harrosh: "They are all mostly fixes, except the most important patch by Artem Bityutskiy which removes the use of s_dirt. After this patch s_dirt can be completely removed from the tree." * 'for-linus' of git://git.open-osd.org/linux-open-osd: ore: Fix out-of-bounds access in _ios_obj() exofs: Use proper max_IO calculations from ore exofs: Fix __r4w_get_page when offset is beyond i_size exofs: stop using s_dirt exofs: readpage_strip: Add a BUG_ON to check for PageLocked(page)	2012-08-03 13:24:07 -07:00
Boaz Harrosh	7de6e28417	pnfs-obj: Better IO pattern in case of unaligned offset Depending on layout and ARCH, ORE has some limits on max IO sizes which is communicated on (what else) ore_layout->max_io_length, which is always stripe aligned. This was considered as the pg_test boundary for splitting and starting a new IO. But in the case of a long IO where the start offset is not aligned what would happen is that both end of IO[N] and start of IO[N+1] would be unaligned, causing each IO boundary parity unit to be calculated and written twice. So what we do in this patch is split the very start of an unaligned IO, up to a stripe boundary, and then next IO's can continue fully aligned til the end. We might be sacrificing the case where the full unaligned IO would fit within a single max_io_length, but the sacrifice is well worth the elimination of double calculation and parity units IO. Actually the sacrificing is marginal and is almost unmeasurable. TODO: If we know the total expected linear segment that will be received, at pg_init, we could use that information in many places: 1. blocks-layout get_layout write segment size 2. Better mds-threshold 3. In above situation for a better clean split I will do this in future submission. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-08-02 17:42:51 -04:00
Peng Tao	f616638409	NFS41: add pg_layout_private to nfs_pageio_descriptor To allow layout driver to pass private information around pg_init/pg_doio. Signed-off-by: Peng Tao <tao.peng@emc.com> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-08-02 17:41:18 -04:00
Idan Kedar	21d1f58aed	pnfs: nfs4_proc_layoutget returns void since the only user of nfs4_proc_layoutget is send_layoutget, which ignores its return value, there is no reason to return any value. Signed-off-by: Idan Kedar <idank@tonian.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-08-02 17:39:06 -04:00
Idan Kedar	8554116e17	pnfs: defer release of pages in layoutget we have encountered a bug whereby reading a lot of files (copying fedora's /bin) from a pNFS mount and hitting Ctrl+C in the middle caused a general protection fault in xdr_shrink_bufhead. this function is called when decoding the response from LAYOUTGET. the decoding is done by a worker thread, and the caller of LAYOUTGET waits for the worker thread to complete. hitting Ctrl+C caused the synchronous wait to end and the next thing the caller does is to free the pages, so when the worker thread calls xdr_shrink_bufhead, the pages are gone. therefore, the cleanup of these pages has been moved to nfs4_layoutget_release. Signed-off-by: Idan Kedar <idank@tonian.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-08-02 17:38:54 -04:00
Jeff Layton	3dd4765fce	nfs: tear down caches in nfs_init_writepagecache when allocation fails ...and ensure that we tear down the nfs_commit_data cache too when unloading the module. Cc: Bryan Schumaker <bjschuma@netapp.com> Cc: stable@vger.kernel.org Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-08-02 17:36:07 -04:00
Linus Torvalds	c8924234bd	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull two ceph fixes from Sage Weil: "The first patch fixes up the old crufty open intent code to use the atomic_open stuff properly, and the second fixes a possible null deref and memory leak with the crypto keys." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: libceph: fix crypto key null deref, memory leak ceph: simplify+fix atomic_open	2012-08-02 10:57:31 -07:00
Linus Torvalds	410fc4ce8a	- Fixes a bug when the lower filesystem mount options include 'acl', but the eCryptfs mount options do not - Cleanups in the messaging code - Better handling of empty files in the lower filesystem to improve usability. Failed file creations are now cleaned up and empty lower files are converted into eCryptfs during open(). - The write-through cache changes are being reverted due to bugs that are not easy to fix. Stability outweighs the performance enhancements here. - Improvement to the mount code to catch unsupported ciphers specified in the mount options -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABCgAGBQJQGhVWAAoJENaSAD2qAscKjvcQAMgF66sl8KjwzFCgojh30a4u 1hCXltxUCLjXxjXxyygUHb/pH0xdvC+Ss3FTtsPXAjgYm2lXjjoOmVG8WAvwHHx1 2ZjDo+8fQ4XA8Rl9kYuvt/abF0IssNRK3csTWplR7lpoQ8AWbkpkag1I4WZhibey cgs/zECl8ACTJ5zQ+AyRGnrssq4jI7xZAKWLK0+KKk7S9yIRI7K/xdz1xK39jGK6 N09Dw3VWY/bMcMq77ZXBtyHdP7KR7wKUtCeQttmCvdf20Ocy6AXzr1FRKvUxionF Sf31tJim0u9OO8hmy8cjCyWEy9LHnXnSd/5vn+Qd9ok9GvuiYmKw07rbXi/gjhBX ai5PKtl05WiQgp80BybUYfIY1Hq71MsppNi6h9Zgiid5rEvWWvCBWBWP95G8DTmC 6TwLaCG0rh8uuZyeiVrs3xZQ2IG5Zmu0CX3XGyfsaLvqmQWhtT5ZQVMeMQEEBxyQ ur9SSU2O/nC8ceLB7fzGmZPTLZUWOuYQnd24NJNK+7j0P+Km7pqmDYpCdwmFpx3C CQ0gGaJGHeycvBF327bwxPmsPdO4fy+nmEL8vrEXPTyQ3ZVAPvxZK9t8Jpk4UFOl JSTWFiK0mvgE+5dX2kB0nzN7iD7hcMgmozht50qQP3OUkI1kkBrEHgHVO3KzgUIA +aRAljeLdelq44JBxjiB =4vDd -----END PGP SIGNATURE----- Merge tag 'ecryptfs-3.6-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs Pull ecryptfs fixes from Tyler Hicks: - Fixes a bug when the lower filesystem mount options include 'acl', but the eCryptfs mount options do not - Cleanups in the messaging code - Better handling of empty files in the lower filesystem to improve usability. Failed file creations are now cleaned up and empty lower files are converted into eCryptfs during open(). - The write-through cache changes are being reverted due to bugs that are not easy to fix. Stability outweighs the performance enhancements here. - Improvement to the mount code to catch unsupported ciphers specified in the mount options * tag 'ecryptfs-3.6-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs: eCryptfs: check for eCryptfs cipher support at mount eCryptfs: Revert to a writethrough cache model eCryptfs: Initialize empty lower files when opening them eCryptfs: Unlink lower inode when ecryptfs_create() fails eCryptfs: Make all miscdev functions use daemon ptr in file private_data eCryptfs: Remove unused messaging declarations and function eCryptfs: Copy up POSIX ACL and read-only flags from lower mount	2012-08-02 10:56:34 -07:00
Linus Torvalds	630103ea2c	Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6 Pull CIFS update from Steve French: "Adds SMB2 rmdir/mkdir capability to the SMB2/SMB2.1 support in cifs. I am holding up a few more days on merging the remainder of the SMB2/SMB2.1 enablement although it is nearing review completion, in order to address some review comments from Jeff Layton on a few of the subsequent SMB2 patches, and also to debug an unrelated cifs problem that Pavel discovered." * 'for-next' of git://git.samba.org/sfrench/cifs-2.6: CIFS: Add SMB2 support for rmdir CIFS: Move rmdir code to ops struct CIFS: Add SMB2 support for mkdir operation CIFS: Separate protocol specific part from mkdir CIFS: Simplify cifs_mkdir call	2012-08-02 10:54:11 -07:00
Sage Weil	5ef50c3bec	ceph: simplify+fix atomic_open The initial ->atomic_open op was carried over from the old intent code, which was incomplete and didn't really work. Replace it with a fresh method. In particular: * always attempt to do an atomic open+lookup, both for the create case and for lookups of existing files. * fix symlink handling by returning 1 to the VFS so that we can follow the link to its destination. This fixes a longstanding ceph bug (#2392). Signed-off-by: Sage Weil <sage@inktank.com>	2012-08-02 09:11:19 -07:00
Boaz Harrosh	9e62bb4458	ore: Fix out-of-bounds access in _ios_obj() _ios_obj() is accessed by group_index not device_table index. The oc->comps array is only a group_full of devices at a time it is not like ore_comp_dev() which is indexed by a global device_table index. This did not BUG until now because exofs only uses a single COMP for all devices. But with other FSs like PanFS this is not true. This bug was only in the write_path, all other users were using it correctly [This is a bug since 3.2 Kernel] CC: Stable Tree <stable@kernel.org> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2012-08-02 16:41:56 +03:00
Boaz Harrosh	be388f3d9a	exofs: Use proper max_IO calculations from ore exofs_max_io_pages should just use the ORE's calculated layout->max_io_length, And avoid unnecessary BUGs, calculations made here were also a layering violation. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2012-08-02 16:39:17 +03:00
Boaz Harrosh	4b74f6ea84	exofs: Fix __r4w_get_page when offset is beyond i_size It is very common for the end of the file to be unaligned on stripe size. But since we know it's beyond file's end then the XOR should be preformed with all zeros. Old code used to just read zeros out of the OSD devices, which is a great waist. But what scares me more about this situation is that, we now have pages attached to the file's mapping that are beyond i_size. I don't like the kind of bugs this calls for. Fix both birds, by returning a global ZERO_PAGE, if offset is beyond i_size. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2012-08-02 14:58:22 +03:00
Artem Bityutskiy	66153f6e0f	exofs: stop using s_dirt Exofs has the '->write_super()' handler and makes some use of the '->s_dirt' superblock flag, but it really needs neither of them because it never sets 's_dirt' to one which means the VFS never calls its '->write_super()' handler. Thus, remove both. Note, I am trying to remove both 's_dirt' and 'write_super()' from VFS altogether once all users are gone. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2012-08-02 14:52:13 +03:00
Kautuk Consul	0e8d96dd2c	exofs: readpage_strip: Add a BUG_ON to check for PageLocked(page) readpage_strip can be called from several code paths all of which require that the page be locked before any operations are carried out. Since we export the exofs_readpage callback to the VFS, add a BUG_ON to check for PageLocked(page) to make sure that this understanding is never compromised. Signed-off-by: Kautuk Consul <consul.kautuk@gmail.com> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2012-08-02 14:52:12 +03:00
Jianpeng Ma	53362a05ae	fs/block-dev.c:fix performance regression in O_DIRECT writes to md block devices For regular file, write operaion used blk_plug function.But for block file,write operation did not use blk_plug. This patch is also for write-cache mode for block-device. Signed-off-by: Jianpeng Ma <majianpeng@gmail.com> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-08-02 09:50:39 +02:00
Linus Torvalds	a0e881b7c1	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull second vfs pile from Al Viro: "The stuff in there: fsfreeze deadlock fixes by Jan (essentially, the deadlock reproduced by xfstests 068), symlink and hardlink restriction patches, plus assorted cleanups and fixes. Note that another fsfreeze deadlock (emergency thaw one) is not dealt with - the series by Fernando conflicts a lot with Jan's, breaks userland ABI (FIFREEZE semantics gets changed) and trades the deadlock for massive vfsmount leak; this is going to be handled next cycle. There probably will be another pull request, but that stuff won't be in it." Fix up trivial conflicts due to unrelated changes next to each other in drivers/{staging/gdm72xx/usb_boot.c, usb/gadget/storage_common.c} * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (54 commits) delousing target_core_file a bit Documentation: Correct s_umount state for freeze_fs/unfreeze_fs fs: Remove old freezing mechanism ext2: Implement freezing btrfs: Convert to new freezing mechanism nilfs2: Convert to new freezing mechanism ntfs: Convert to new freezing mechanism fuse: Convert to new freezing mechanism gfs2: Convert to new freezing mechanism ocfs2: Convert to new freezing mechanism xfs: Convert to new freezing code ext4: Convert to new freezing mechanism fs: Protect write paths by sb_start_write - sb_end_write fs: Skip atime update on frozen filesystem fs: Add freezing handling to mnt_want_write() / mnt_drop_write() fs: Improve filesystem freezing handling switch the protection of percpu_counter list to spinlock nfsd: Push mnt_want_write() outside of i_mutex btrfs: Push mnt_want_write() outside of i_mutex fat: Push mnt_want_write() outside of i_mutex ...	2012-08-01 10:26:23 -07:00
J. Bruce Fields	068535f1fe	locks: remove unused lm_release_private In commit `3b6e2723f3` ("locks: prevent side-effects of locks_release_private before file_lock is initialized") we removed the last user of lm_release_private without removing the field itself. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-08-01 09:01:46 -07:00
Linus Torvalds	ac694dbdbc	Merge branch 'akpm' (Andrew's patch-bomb) Merge Andrew's second set of patches: - MM - a few random fixes - a couple of RTC leftovers * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (120 commits) rtc/rtc-88pm80x: remove unneed devm_kfree rtc/rtc-88pm80x: assign ret only when rtc_register_driver fails mm: hugetlbfs: close race during teardown of hugetlbfs shared page tables tmpfs: distribute interleave better across nodes mm: remove redundant initialization mm: warn if pg_data_t isn't initialized with zero mips: zero out pg_data_t when it's allocated memcg: gix memory accounting scalability in shrink_page_list mm/sparse: remove index_init_lock mm/sparse: more checks on mem_section number mm/sparse: optimize sparse_index_alloc memcg: add mem_cgroup_from_css() helper memcg: further prevent OOM with too many dirty pages memcg: prevent OOM with too many dirty pages mm: mmu_notifier: fix freed page still mapped in secondary MMU mm: memcg: only check anon swapin page charges for swap cache mm: memcg: only check swap cache pages for repeated charging mm: memcg: split swapin charge function into private and public part mm: memcg: remove needless !mm fixup to init_mm when charging mm: memcg: remove unneeded shmem charge type ...	2012-07-31 19:25:39 -07:00
Linus Torvalds	6dbb35b0a7	NFS client updates for Linux 3.6 Features include: - Patches from Bryan to allow splitting of the NFSv2/v3/v4 code into separate modules. - Fix Oopses in the NFSv4 idmapper - Fix a deadlock whereby rpciod tries to allocate a new socket and ends up recursing into the NFS code due to memory reclaim. - Increase the number of permitted callback connections. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJQGF+vAAoJEGcL54qWCgDy6ncP/jKVl/Yjz6i1b/8QtC0EmYFb XNwbjZicNupVM98XPLm+sfdWxvoGiHSMbwG8t/hx5Z6CteM18adeGNO1KqP9xQyg scPvFmqj5VJNHsHztcHIFHFYdpsuJqRKcW3TA2l8AkNOm6uBcnXArRosYN6LrXik hI/jWcyv8wZuooSQrLf463JK37t/s6LuUQo6jKP4582sbANHvPdLkHFCYg3yt1Sk 2IIo2CLLs2cJUswGJnsHT76Jxfvh21NdGtvydjVjpr4H0/LY5GykBc23AAL9TWj6 KlegDO902hLzEE93xazF59hQZrPuwBi3quEMJ3X0mjdNQWb4G96s/t2hmwpTjWyc mjpRsuaGlCXDxcrI5dnU52hqKa0Ju1ipbLpghxSVfilRy998xvGp5Yc6ADU+pYnt uP09lamCyjFEa+gDSqGt7lv4dFmdPJjWZdkXLPv0ah5sXVMYAT8NRLUfiE1+hLLu zKaM84PHSEvykFeJ2WvjDTgF3L55y3x6L3LN4UkKmfO5nqQf7csPcquU1NfHh25S jjKGq2SKGoYG1l6FwK3hemup5cnFUEX78E2QeLrmaHg92fJDGrz587i6MPc5jrSe sOSX/CpuCJd4VhodIo8T00me0GZSHEU7RP2KEc518ZvrPmz13avXeLeG9nYhjuxM cV9Ex8zh2Y4aiUXCy3uA =KSix -----END PGP SIGNATURE----- Merge tag 'nfs-for-3.6-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull second wave of NFS client updates from Trond Myklebust: - Patches from Bryan to allow splitting of the NFSv2/v3/v4 code into separate modules. - Fix Oopses in the NFSv4 idmapper - Fix a deadlock whereby rpciod tries to allocate a new socket and ends up recursing into the NFS code due to memory reclaim. - Increase the number of permitted callback connections. * tag 'nfs-for-3.6-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: nfs: explicitly reject LOCK_MAND flock() requests nfs: increase number of permitted callback connections. SUNRPC: return negative value in case rpcbind client creation error NFS: Convert v4 into a module NFS: Convert v3 into a module NFS: Convert v2 into a module NFS: Keep module parameters in the generic NFS client NFS: Split out remaining NFS v4 inode functions NFS: Pass super operations and xattr handlers in the nfs_subversion NFS: Only initialize the ACL client in the v3 case NFS: Create a try_mount rpc op NFS: Remove the NFS v4 xdev mount function NFS: Add version registering framework NFS: Fix a number of bugs in the idmapper nfs: skip commit in releasepage if we're freeing memory for fs-related reasons sunrpc: clarify comments on rpc_make_runnable pnfsblock: bail out partial page IO	2012-07-31 18:45:44 -07:00
Mel Gorman	192e501b04	nfs: prevent page allocator recursions with swap over NFS. GFP_NOFS is _more_ permissive than GFP_NOIO in that it will initiate IO, just not of any filesystem data. The problem is that previously NOFS was correct because that avoids recursion into the NFS code. With swap-over-NFS, it is no longer correct as swap IO can lead to this recursion. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: David S. Miller <davem@davemloft.net> Cc: Eric B Munson <emunson@mgebm.net> Cc: Eric Paris <eparis@redhat.com> Cc: James Morris <jmorris@namei.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Christie <michaelc@cs.wisc.edu> Cc: Neil Brown <neilb@suse.de> Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Xiaotian Feng <dfeng@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-31 18:42:48 -07:00
Mel Gorman	a564b8f039	nfs: enable swap on NFS Implement the new swapfile a_ops for NFS and hook up ->direct_IO. This will set the NFS socket to SOCK_MEMALLOC and run socket reconnect under PF_MEMALLOC as well as reset SOCK_MEMALLOC before engaging the protocol ->connect() method. PF_MEMALLOC should allow the allocation of struct socket and related objects and the early (re)setting of SOCK_MEMALLOC should allow us to receive the packets required for the TCP connection buildup. [jlayton@redhat.com: Restore PF_MEMALLOC task flags in all cases] [dfeng@redhat.com: Fix handling of multiple swap files] [a.p.zijlstra@chello.nl: Original patch] Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: David S. Miller <davem@davemloft.net> Cc: Eric B Munson <emunson@mgebm.net> Cc: Eric Paris <eparis@redhat.com> Cc: James Morris <jmorris@namei.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Christie <michaelc@cs.wisc.edu> Cc: Neil Brown <neilb@suse.de> Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Xiaotian Feng <dfeng@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-31 18:42:48 -07:00
Mel Gorman	29418aa4bd	nfs: disable data cache revalidation for swapfiles The VM does not like PG_private set on PG_swapcache pages. As suggested by Trond in http://lkml.org/lkml/2006/8/25/348, this patch disables NFS data cache revalidation on swap files. as it does not make sense to have other clients change the file while it is being used as swap. This avoids setting PG_private on swap pages, since there ought to be no further races with invalidate_inode_pages2() to deal with. Since we cannot set PG_private we cannot use page->private which is already used by PG_swapcache pages to store the nfs_page. Thus augment the new nfs_page_find_request logic. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: David S. Miller <davem@davemloft.net> Cc: Eric B Munson <emunson@mgebm.net> Cc: Eric Paris <eparis@redhat.com> Cc: James Morris <jmorris@namei.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Christie <michaelc@cs.wisc.edu> Cc: Neil Brown <neilb@suse.de> Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Xiaotian Feng <dfeng@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-31 18:42:47 -07:00
Mel Gorman	d56b4ddf77	nfs: teach the NFS client how to treat PG_swapcache pages Replace all relevant occurences of page->index and page->mapping in the NFS client with the new page_file_index() and page_file_mapping() functions. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: David S. Miller <davem@davemloft.net> Cc: Eric B Munson <emunson@mgebm.net> Cc: Eric Paris <eparis@redhat.com> Cc: James Morris <jmorris@namei.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Christie <michaelc@cs.wisc.edu> Cc: Neil Brown <neilb@suse.de> Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Xiaotian Feng <dfeng@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-31 18:42:47 -07:00
Minchan Kim	8e125cd855	vmscan: remove obsolete shrink_control comment `09f363c7` ("vmscan: fix shrinker callback bug in fs/super.c") fixed a shrinker callback which was returning -1 when nr_to_scan is zero, which caused excessive slab scanning. But `635697c6` ("vmscan: fix initial shrinker size handling") fixed the problem, again so we can freely return -1 although nr_to_scan is zero. So let's revert `09f363c7` because the comment added in `09f363c7` made an unnecessary rule. Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Mikulas Patocka <mpatocka@redhat.com> Cc: Konstantin Khlebnikov <khlebnikov@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-31 18:42:43 -07:00
Aneesh Kumar K.V	24669e5847	hugetlb: use mmu_gather instead of a temporary linked list for accumulating pages Use a mmu_gather instead of a temporary linked list for accumulating pages when we unmap a hugepage range Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-31 18:42:40 -07:00
Wanpeng Li	3965c9ae47	mm: prepare for removal of obsolete /proc/sys/vm/nr_pdflush_threads Since per-BDI flusher threads were introduced in 2.6, the pdflush mechanism is not used any more. But the old interface exported through /proc/sys/vm/nr_pdflush_threads still exists and is obviously useless. For back-compatibility, printk warning information and return 2 to notify the users that the interface is removed. Signed-off-by: Wanpeng Li <liwp@linux.vnet.ibm.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-31 18:42:40 -07:00
Linus Torvalds	08843b79fb	Merge branch 'nfsd-next' of git://linux-nfs.org/~bfields/linux Pull nfsd changes from J. Bruce Fields: "This has been an unusually quiet cycle--mostly bugfixes and cleanup. The one large piece is Stanislav's work to containerize the server's grace period--but that in itself is just one more step in a not-yet-complete project to allow fully containerized nfs service. There are a number of outstanding delegation, container, v4 state, and gss patches that aren't quite ready yet; 3.7 may be wilder." * 'nfsd-next' of git://linux-nfs.org/~bfields/linux: (35 commits) NFSd: make boot_time variable per network namespace NFSd: make grace end flag per network namespace Lockd: move grace period management from lockd() to per-net functions LockD: pass actual network namespace to grace period management functions LockD: manage grace list per network namespace SUNRPC: service request network namespace helper introduced NFSd: make nfsd4_manager allocated per network namespace context. LockD: make lockd manager allocated per network namespace LockD: manage grace period per network namespace Lockd: add more debug to host shutdown functions Lockd: host complaining function introduced LockD: manage used host count per networks namespace LockD: manage garbage collection timeout per networks namespace LockD: make garbage collector network namespace aware. LockD: mark host per network namespace on garbage collect nfsd4: fix missing fault_inject.h include locks: move lease-specific code out of locks_delete_lock locks: prevent side-effects of locks_release_private before file_lock is initialized NFSd: set nfsd_serv to NULL after service destruction NFSd: introduce nfsd_destroy() helper ...	2012-07-31 14:42:28 -07:00
Linus Torvalds	cc8362b1f6	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph changes from Sage Weil: "Lots of stuff this time around: - lots of cleanup and refactoring in the libceph messenger code, and many hard to hit races and bugs closed as a result. - lots of cleanup and refactoring in the rbd code from Alex Elder, mostly in preparation for the layering functionality that will be coming in 3.7. - some misc rbd cleanups from Josh Durgin that are finally going upstream - support for CRUSH tunables (used by newer clusters to improve the data placement) - some cleanup in our use of d_parent that Al brought up a while back - a random collection of fixes across the tree There is another patch coming that fixes up our ->atomic_open() behavior, but I'm going to hammer on it a bit more before sending it." Fix up conflicts due to commits that were already committed earlier in drivers/block/rbd.c, net/ceph/{messenger.c, osd_client.c} * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (132 commits) rbd: create rbd_refresh_helper() rbd: return obj version in __rbd_refresh_header() rbd: fixes in rbd_header_from_disk() rbd: always pass ops array to rbd_req_sync_op() rbd: pass null version pointer in add_snap() rbd: make rbd_create_rw_ops() return a pointer rbd: have __rbd_add_snap_dev() return a pointer libceph: recheck con state after allocating incoming message libceph: change ceph_con_in_msg_alloc convention to be less weird libceph: avoid dropping con mutex before fault libceph: verify state after retaking con lock after dispatch libceph: revoke mon_client messages on session restart libceph: fix handling of immediate socket connect failure ceph: update MAINTAINERS file libceph: be less chatty about stray replies libceph: clear all flags on con_close libceph: clean up con flags libceph: replace connection state bits with states libceph: drop unnecessary CLOSED check in socket state change callback libceph: close socket directly from ceph_con_close() ...	2012-07-31 14:35:28 -07:00
Jeff Layton	ad0fcd4eb6	nfs: explicitly reject LOCK_MAND flock() requests We have no mechanism to emulate LOCK_MAND locks on NFSv4, so explicitly return -EINVAL if someone requests it. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-31 14:42:20 -04:00
NeilBrown	b042414feb	nfs: increase number of permitted callback connections. By default a sunrpc service is limited to (N+3)*20 connections where N is the number of threads. This is 80 when N==1. If this number is exceeded a warning is printed suggesting that the number of threads be increased. However with services which run a single thread, this is impossible. For such services there is a ->sv_maxconn setting that can be used to forcibly increase the limit, and silence the message. This is used by lockd. The nfs client uses a sunrpc service to handle callbacks and it too is single-threaded, so to avoid the useless messages, and to allow a reasonable number of concurrent connections, we need to set ->sv_maxconn. 1024 seems like a good number. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-31 12:33:29 -04:00
Jan Kara	d9c95bdd53	fs: Remove old freezing mechanism Now that all users are converted, we can remove functions, variables, and constants defined by the old freezing mechanism. BugLink: https://bugs.launchpad.net/bugs/897421 Tested-by: Kamal Mostafa <kamal@canonical.com> Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com> Tested-by: Dann Frazier <dann.frazier@canonical.com> Tested-by: Massimo Morana <massimo.morana@canonical.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-31 09:45:54 +04:00
Jan Kara	1e8b212fe5	ext2: Implement freezing The only missing piece to make freezing work reliably with ext2 is to stop iput() of unlinked inode from deleting the inode on frozen filesystem. So add a necessary protection to ext2_evict_inode(). We also provide appropriate ->freeze_fs and ->unfreeze_fs functions. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-31 09:45:53 +04:00
Jan Kara	b2b5ef5c8e	btrfs: Convert to new freezing mechanism We convert btrfs_file_aio_write() to use new freeze check. We also add proper freeze protection to btrfs_page_mkwrite(). We also add freeze protection to the transaction mechanism to avoid starting transactions on frozen filesystem. At minimum this is necessary to stop iput() of unlinked file to change frozen filesystem during truncation. Checks in cleaner_kthread() and transaction_kthread() can be safely removed since btrfs_freeze() will lock the mutexes and thus block the threads (and they shouldn't have anything to do anyway). CC: linux-btrfs@vger.kernel.org CC: Chris Mason <chris.mason@oracle.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-31 09:45:52 +04:00
Jan Kara	2c22b337b5	nilfs2: Convert to new freezing mechanism We change nilfs_page_mkwrite() to provide proper freeze protection for writeable page faults (we must wait for frozen filesystem even if the page is fully mapped). We remove all vfs_check_frozen() checks since they are now handled by the generic code. CC: linux-nilfs@vger.kernel.org CC: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-31 09:45:52 +04:00
Jan Kara	fbf8fb7650	ntfs: Convert to new freezing mechanism Move check in ntfs_file_aio_write_nolock() to ntfs_file_aio_write() and use new freeze protection. CC: linux-ntfs-dev@lists.sourceforge.net CC: Anton Altaparmakov <anton@tuxera.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-31 09:45:51 +04:00
Jan Kara	58ef6a75c3	fuse: Convert to new freezing mechanism Convert check in fuse_file_aio_write() to using new freeze protection. CC: fuse-devel@lists.sourceforge.net CC: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-31 09:45:50 +04:00
Jan Kara	39263d5e71	gfs2: Convert to new freezing mechanism We update gfs2_page_mkwrite() to use new freeze protection and the transaction code to use freeze protection while the transaction is running. That is needed to stop iput() of unlinked file from modifying the filesystem. The rest is handled by the generic code. CC: cluster-devel@redhat.com CC: Steven Whitehouse <swhiteho@redhat.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-31 09:45:50 +04:00
Jan Kara	fef6925cd4	ocfs2: Convert to new freezing mechanism Protect ocfs2_page_mkwrite() and ocfs2_file_aio_write() using the new freeze protection. We also protect several ioctl entry points which were missing the protection. Finally, we add freeze protection to the journaling mechanism so that iput() of unlinked inode cannot modify a frozen filesystem. CC: Mark Fasheh <mfasheh@suse.com> CC: Joel Becker <jlbec@evilplan.org> CC: ocfs2-devel@oss.oracle.com Acked-by: Joel Becker <jlbec@evilplan.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-31 09:45:49 +04:00
Jan Kara	d9457dc056	xfs: Convert to new freezing code Generic code now blocks all writers from standard write paths. So we add blocking of all writers coming from ioctl (we get a protection of ioctl against racing remount read-only as a bonus) and convert xfs_file_aio_write() to a non-racy freeze protection. We also keep freeze protection on transaction start to block internal filesystem writes such as removal of preallocated blocks. CC: Ben Myers <bpm@sgi.com> CC: Alex Elder <elder@kernel.org> CC: xfs@oss.sgi.com Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-31 09:45:48 +04:00
Jan Kara	8e8ad8a57c	ext4: Convert to new freezing mechanism We remove most of frozen checks since upper layer takes care of blocking all writes. We have to handle protection in ext4_page_mkwrite() in a special way because we cannot use generic block_page_mkwrite(). Also we add a freeze protection to ext4_evict_inode() so that iput() of unlinked inode cannot modify a frozen filesystem (we cannot easily instrument ext4_journal_start() / ext4_journal_stop() with freeze protection because we are missing the superblock pointer in ext4_journal_stop() in nojournal mode). CC: linux-ext4@vger.kernel.org CC: "Theodore Ts'o" <tytso@mit.edu> BugLink: https://bugs.launchpad.net/bugs/897421 Tested-by: Kamal Mostafa <kamal@canonical.com> Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com> Tested-by: Dann Frazier <dann.frazier@canonical.com> Tested-by: Massimo Morana <massimo.morana@canonical.com> Acked-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-31 09:45:48 +04:00
Jan Kara	14da920014	fs: Protect write paths by sb_start_write - sb_end_write There are several entry points which dirty pages in a filesystem. mmap (handled by block_page_mkwrite()), buffered write (handled by __generic_file_aio_write()), splice write (generic_file_splice_write), truncate, and fallocate (these can dirty last partial page - handled inside each filesystem separately). Protect these places with sb_start_write() and sb_end_write(). ->page_mkwrite() calls are particularly complex since they are called with mmap_sem held and thus we cannot use standard sb_start_write() due to lock ordering constraints. We solve the problem by using a special freeze protection sb_start_pagefault() which ranks below mmap_sem. BugLink: https://bugs.launchpad.net/bugs/897421 Tested-by: Kamal Mostafa <kamal@canonical.com> Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com> Tested-by: Dann Frazier <dann.frazier@canonical.com> Tested-by: Massimo Morana <massimo.morana@canonical.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-31 09:45:47 +04:00
Jan Kara	5d37e9e6de	fs: Skip atime update on frozen filesystem It is unexpected to block reading of frozen filesystem because of atime update. Also handling blocking on frozen filesystem because of atime update would make locking more complex than it already is. So just skip atime update when filesystem is frozen like we skip it when filesystem is remounted read-only. BugLink: https://bugs.launchpad.net/bugs/897421 Tested-by: Kamal Mostafa <kamal@canonical.com> Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com> Tested-by: Dann Frazier <dann.frazier@canonical.com> Tested-by: Massimo Morana <massimo.morana@canonical.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-31 09:45:38 +04:00
Jan Kara	eb04c28288	fs: Add freezing handling to mnt_want_write() / mnt_drop_write() Most of places where we want freeze protection coincides with the places where we also have remount-ro protection. So make mnt_want_write() and mnt_drop_write() (and their _file alternative) prevent freezing as well. For the few cases that are really interested only in remount-ro protection provide new function variants. BugLink: https://bugs.launchpad.net/bugs/897421 Tested-by: Kamal Mostafa <kamal@canonical.com> Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com> Tested-by: Dann Frazier <dann.frazier@canonical.com> Tested-by: Massimo Morana <massimo.morana@canonical.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-31 09:40:38 +04:00
Jan Kara	5accdf82ba	fs: Improve filesystem freezing handling vfs_check_frozen() tests are racy since the filesystem can be frozen just after the test is performed. Thus in write paths we can end up marking some pages or inodes dirty even though the file system is already frozen. This creates problems with flusher thread hanging on frozen filesystem. Another problem is that exclusion between ->page_mkwrite() and filesystem freezing has been handled by setting page dirty and then verifying s_frozen. This guaranteed that either the freezing code sees the faulted page, writes it, and writeprotects it again or we see s_frozen set and bail out of page fault. This works to protect from page being marked writeable while filesystem freezing is running but has an unpleasant artefact of leaving dirty (although unmodified and writeprotected) pages on frozen filesystem resulting in similar problems with flusher thread as the first problem. This patch aims at providing exclusion between write paths and filesystem freezing. We implement a writer-freeze read-write semaphore in the superblock. Actually, there are three such semaphores because of lock ranking reasons - one for page fault handlers (->page_mkwrite), one for all other writers, and one of internal filesystem purposes (used e.g. to track running transactions). Write paths which should block freezing (e.g. directory operations, ->aio_write(), ->page_mkwrite) hold reader side of the semaphore. Code freezing the filesystem takes the writer side. Only that we don't really want to bounce cachelines of the semaphores between CPUs for each write happening. So we implement the reader side of the semaphore as a per-cpu counter and the writer side is implemented using s_writers.frozen superblock field. [AV: microoptimize sb_start_write(); we want it fast in normal case] BugLink: https://bugs.launchpad.net/bugs/897421 Tested-by: Kamal Mostafa <kamal@canonical.com> Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com> Tested-by: Dann Frazier <dann.frazier@canonical.com> Tested-by: Massimo Morana <massimo.morana@canonical.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-31 09:30:13 +04:00
Linus Torvalds	2e3ee61348	Use time based periods to age the writeback proportions, which can adapt equally well to fast/slow devices. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJQF0/wAAoJECvKgwp+S8JaszsP/16EO5F5mUCOFgncVRp+8R9U BxuKJ61j2R9ckHA+ngMEg72W5vJQds64cjywZnz6HMr0/+3tXUf4QBbU4/4sCeai 0lpK8MCKgp5KHHCxgO8zyoSaboankUgoDcbSmGJREV1WXoR8VWXsO9gXqiiH9XOe e8ADjds/YdxkQbOYDRgZKvLzwWS61K9Kwq5/56GASh2uflw7rkJZ38xqvGbo3YiQ IJJwOUYfJjFadIewYARQmkZZyWeAmtY0ADh15Z8pJt+iY4PgcDDlaWagwUH2Oaoi vhTFO4KnCjhSpc872et21g/jN/VrcQqzuUF/LUE9rW5irXeDZVCDrCQOuHQ+3Uo5 YuV3rpNABW/LU8AvtIwt9hUunaKUnrXaSluoL9LzH2VpH++JljQeR4yZ62Q5rpRs z4Bow25p7tlbcIJWzueMPdOFUr5s3P6XfQHoLVRWqN94eJ+Z1DPOgBlOain6cCUN oivPh4FxZfUscNmX8/7cHaTNEpGzJ0FzLPMNFnGQ2zwG0gk41fnMdWb19aVIE5GL /Q96TB24k/v+o1lbxGmyPi0L+Aq+NknvNm+p/YJHAAQrIpu5t2hPvka8/m7Nj7tu c3rM75RKiZkEI+3U6Ws1DhhQPtVcfNIVlNYQMGzlIndtK6T0ByueQ4eN5Z7ltjSE pS89rv2hyBw7yCaTU6ui =gZrM -----END PGP SIGNATURE----- Merge tag 'writeback-proportions' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux Pull writeback updates from Wu Fengguang: "Use time based periods to age the writeback proportions, which can adapt equally well to fast/slow devices." Fix up trivial conflict in comment in fs/sync.c * tag 'writeback-proportions' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux: writeback: Fix some comment errors block: Convert BDI proportion calculations to flexible proportions lib: Fix possible deadlock in flexible proportion code lib: Proportions with flexible period	2012-07-30 22:14:04 -07:00
Linus Torvalds	1fad1e9a74	NFS client updates for Linux 3.6 Features include: - More preparatory patches for modularising NFSv2/v3/v4. Split out the various NFSv2/v3/v4-specific code into separate files - More preparation for the NFSv4 migration code - Ensure that OPEN(O_CREATE) observes the pNFS mds threshold parameters - pNFS fast failover when the data servers are down - Various cleanups and debugging patches -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJQFwYRAAoJEGcL54qWCgDy6YAQAIZuUPFCQbuzWev4L/XA0uhg 0xjr9YBmg57UOs8e7FhS0azJip+D/kmF2Rx5gCMX0QO/IwaEVoms4BS8zjFOr3yL cPIyY5ErF+WJ25f3Mz8k0wOHTzrOvMdq4/7TQf5hztsrGYFid78sSb3O9ZO9bgb4 QxqhgA0Z4S2vIqeObJAF9BT5UOT/cXfVKV0iTd52ZQVA+ZhqJ2SDlNFsoxnVOweV qzKOi0FZCPsjH+Z4a/LLdieHG5AYRPhOScBdG7gTFAdkysI8DDtSNCmJ68dMHYRw OHtyt/X4k9TImfwauV3Eww1sw3NkDswX+dv3GOSDTlMvVBrm0WXWj2zoHZbOB+0Q ETI9Zpolvd9pADtyfu9c+kCYIR+NdB4lGKd15iZfdEy/CZ0CtbLAbn5Szo2YcwNR iVjswR0jsbklD+mZjlMeZoG4JX2d9ZRqLs20POz7QQBmdIQ8jb9/SwVj9zGvX0w0 NyVUHTFlGaNwkpo7oeRoWi1xjZWE6OzfoncV/46DOJWb08OKZ4fR4ugMYJWGi7Xx I4nQrIiHtInqL+sF5Y3wlZ48dufLuIhAcHh7klxgud4GRj7t8j+a99pTrRmpHvvC 4K2Yu69VYcIA7N2RsSLohIllGPFmMwpdar7yERdD0So8/fz/zf7wn0dFFYY11+bL YNVVGhvNxccMU5EU9YR4 =Lc59 -----END PGP SIGNATURE----- Merge tag 'nfs-for-3.6-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client updates from Trond Myklebust: "Features include: - More preparatory patches for modularising NFSv2/v3/v4. Split out the various NFSv2/v3/v4-specific code into separate files - More preparation for the NFSv4 migration code - Ensure that OPEN(O_CREATE) observes the pNFS mds threshold parameters - pNFS fast failover when the data servers are down - Various cleanups and debugging patches" * tag 'nfs-for-3.6-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (67 commits) nfs: fix fl_type tests in NFSv4 code NFS: fix pnfs regression with directio writes NFS: fix pnfs regression with directio reads sunrpc: clnt: Add missing braces nfs: fix stub return type warnings NFS: exit_nfs_v4() shouldn't be an __exit function SUNRPC: Add a missing spin_unlock to gss_mech_list_pseudoflavors NFS: Split out NFS v4 client functions NFS: Split out the NFS v4 filesystem types NFS: Create a single nfs_clone_super() function NFS: Split out NFS v4 server creating code NFS: Initialize the NFS v4 client from init_nfs_v4() NFS: Move the v4 getroot code to nfs4getroot.c NFS: Split out NFS v4 file operations NFS: Initialize v4 sysctls from nfs_init_v4() NFS: Create an init_nfs_v4() function NFS: Split out NFS v4 inode operations NFS: Split out NFS v3 inode operations NFS: Split out NFS v2 inode operations NFS: Clean up nfs4_proc_setclientid() and friends ...	2012-07-30 19:16:57 -07:00
Alex Elder	aa711ee340	ceph: define snap counts as u32 everywhere There are two structures in which a count of snapshots are maintained: struct ceph_snap_context { ... u32 num_snaps; ... } and struct ceph_snap_realm { ... u32 num_prior_parent_snaps; /* had prior to parent_since */ ... u32 num_snaps; ... } These fields never take on negative values (e.g., to hold special meaning), and so are really inherently unsigned. Furthermore they take their value from over-the-wire or on-disk formatted 32-bit values. So change their definition to have type u32, and change some spots elsewhere in the code to account for this change. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 18:15:47 -07:00
Alan Cox	21ec6ffa46	ceph: fix potential double free We re-run the loop but we don't re-set the attrs pointer back to NULL. Signed-off-by: Alan Cox <alan@linux.intel.com> Reviewed-by: Alex Elder <elder@inktank.com>	2012-07-30 18:15:35 -07:00
Sage Weil	a53aab645c	ceph: close old con before reopening on mds reconnect When we detect a mds session reset, close the old ceph_connection before reopening it. This ensures we clean up the old socket properly and keep the ceph_connection state correct. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>	2012-07-30 18:15:32 -07:00
Linus Torvalds	27c1ee3f92	Merge branch 'akpm' (Andrew's patch-bomb) Merge Andrew's first set of patches: "Non-MM patches: - lots of misc bits - tree-wide have_clk() cleanups - quite a lot of printk tweaks. I draw your attention to "printk: convert the format for KERN_<LEVEL> to a 2 byte pattern" which looks a bit scary. But afaict it's solid. - backlight updates - lib/ feature work (notably the addition and use of memweight()) - checkpatch updates - rtc updates - nilfs updates - fatfs updates (partial, still waiting for acks) - kdump, proc, fork, IPC, sysctl, taskstats, pps, etc - new fault-injection feature work" * Merge emailed patches from Andrew Morton <akpm@linux-foundation.org>: (128 commits) drivers/misc/lkdtm.c: fix missing allocation failure check lib/scatterlist: do not re-write gfp_flags in __sg_alloc_table() fault-injection: add tool to run command with failslab or fail_page_alloc fault-injection: add selftests for cpu and memory hotplug powerpc: pSeries reconfig notifier error injection module memory: memory notifier error injection module PM: PM notifier error injection module cpu: rewrite cpu-notifier-error-inject module fault-injection: notifier error injection c/r: fcntl: add F_GETOWNER_UIDS option resource: make sure requested range is included in the root range include/linux/aio.h: cpp->C conversions fs: cachefiles: add support for large files in filesystem caching pps: return PTR_ERR on error in device_create taskstats: check nla_reserve() return sysctl: suppress kmemleak messages ipc: use Kconfig options for __ARCH_WANT_[COMPAT_]IPC_PARSE_VERSION ipc: compat: use signed size_t types for msgsnd and msgrcv ipc: allow compat IPC version field parsing if !ARCH_WANT_OLD_COMPAT_IPC ipc: add COMPAT_SHMLBA support ...	2012-07-30 17:25:34 -07:00
Cyrill Gorcunov	1d151c337d	c/r: fcntl: add F_GETOWNER_UIDS option When we restore file descriptors we would like them to look exactly as they were at dumping time. With help of fcntl it's almost possible, the missing snippet is file owners UIDs. To be able to read their values the F_GETOWNER_UIDS is introduced. This option is valid iif CONFIG_CHECKPOINT_RESTORE is turned on, otherwise returning -EINVAL. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: "Serge E. Hallyn" <serge@hallyn.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-30 17:25:21 -07:00
Justin Lecher	98c350cda2	fs: cachefiles: add support for large files in filesystem caching Support the caching of large files. Addresses https://bugzilla.kernel.org/show_bug.cgi?id=31182 Signed-off-by: Justin Lecher <jlec@gentoo.org> Signed-off-by: Suresh Jayaraman <sjayaraman@suse.com> Tested-by: Suresh Jayaraman <sjayaraman@suse.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-30 17:25:21 -07:00
Djalal Harouni	bc452b4b65	proc: do not allow negative offsets on /proc/<pid>/environ __mem_open() which is called by both /proc/<pid>/environ and /proc/<pid>/mem ->open() handlers will allow the use of negative offsets. /proc/<pid>/mem has negative offsets but not /proc/<pid>/environ. Clean this by moving the 'force FMODE_UNSIGNED_OFFSET flag' to mem_open() to allow negative offsets only on /proc/<pid>/mem. Signed-off-by: Djalal Harouni <tixxdz@opendz.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Brad Spengler <spender@grsecurity.net> Acked-by: Kees Cook <keescook@chromium.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-30 17:25:20 -07:00
Djalal Harouni	e8905ec27e	proc: environ_read() make sure offset points to environment address range Currently the following offset and environment address range check in environ_read() of /proc/<pid>/environ is buggy: int this_len = mm->env_end - (mm->env_start + src); if (this_len <= 0) break; Large or negative offsets on /proc/<pid>/environ converted to 'unsigned long' may pass this check since '(mm->env_start + src)' can overflow and 'this_len' will be positive. This can turn /proc/<pid>/environ to act like /proc/<pid>/mem since (mm->env_start + src) will point and read from another VMA. There are two fixes here plus some code cleaning: 1) Fix the overflow by checking if the offset that was converted to unsigned long will always point to the [mm->env_start, mm->env_end] address range. 2) Remove the truncation that was made to the result of the check, storing the result in 'int this_len' will alter its value and we can not depend on it. For kernels that have commit `b409e578d` ("proc: clean up /proc/<pid>/environ handling") which adds the appropriate ptrace check and saves the 'mm' at ->open() time, this is not a security issue. This patch is taken from the grsecurity patch since it was just made available. Signed-off-by: Djalal Harouni <tixxdz@opendz.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Brad Spengler <spender@grsecurity.net> Acked-by: Kees Cook <keescook@chromium.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-30 17:25:20 -07:00
Jovi Zhang	108ceeb020	coredump: fix wrong comments on core limits of pipe coredump case In commit `898b374af6` ("exec: replace call_usermodehelper_pipe with use of umh init function and resolve limit"), the core limits recursive check value was changed from 0 to 1, but the corresponding comments were not updated. Signed-off-by: Jovi Zhang <bookjovi@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-30 17:25:20 -07:00
Steven J. Magnani	deb8274a0c	fat: refactor shortname parsing Nearly identical shortname parsing is performed in fat_search_long() and __fat_readdir(). Extract this code into a function that may be called by both. Signed-off-by: Steven J. Magnani <steve@digidescorp.com> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-30 17:25:20 -07:00
Steven J. Magnani	a943ed71c9	fat: accessors for msdos_dir_entry 'start' fields Simplify code by providing accessor functions for the directory entry start cluster fields. Signed-off-by: Steven J. Magnani <steve@digidescorp.com> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-30 17:25:19 -07:00
Namjae Jeon	497d48bd27	hfsplus: use -ENOMEM when kzalloc() fails Use -ENOMEM return value instead of -EINVAL when kzalloc() fails. Signed-off-by: Namjae Jeon <linkinjeon@gmail.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-30 17:25:19 -07:00
Vyacheslav Dubeyko	f5974c8f8c	nilfs2: add omitted comments for different structures in driver implementation Add omitted comments for different structures in driver implementation. Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-30 17:25:19 -07:00
Ryusuke Konishi	572d8b3945	nilfs2: fix deadlock issue between chcp and thaw ioctls An fs-thaw ioctl causes deadlock with a chcp or mkcp -s command: chcp D ffff88013870f3d0 0 1325 1324 0x00000004 ... Call Trace: nilfs_transaction_begin+0x11c/0x1a0 [nilfs2] wake_up_bit+0x20/0x20 copy_from_user+0x18/0x30 [nilfs2] nilfs_ioctl_change_cpmode+0x7d/0xcf [nilfs2] nilfs_ioctl+0x252/0x61a [nilfs2] do_page_fault+0x311/0x34c get_unmapped_area+0x132/0x14e do_vfs_ioctl+0x44b/0x490 __set_task_blocked+0x5a/0x61 vm_mmap_pgoff+0x76/0x87 __set_current_blocked+0x30/0x4a sys_ioctl+0x4b/0x6f system_call_fastpath+0x16/0x1b thaw D ffff88013870d890 0 1352 1351 0x00000004 ... Call Trace: rwsem_down_failed_common+0xdb/0x10f call_rwsem_down_write_failed+0x13/0x20 down_write+0x25/0x27 thaw_super+0x13/0x9e do_vfs_ioctl+0x1f5/0x490 vm_mmap_pgoff+0x76/0x87 sys_ioctl+0x4b/0x6f filp_close+0x64/0x6c system_call_fastpath+0x16/0x1b where the thaw ioctl deadlocked at thaw_super() when called while chcp was waiting at nilfs_transaction_begin() called from nilfs_ioctl_change_cpmode(). This deadlock is 100% reproducible. This is because nilfs_ioctl_change_cpmode() first locks sb->s_umount in read mode and then waits for unfreezing in nilfs_transaction_begin(), whereas thaw_super() locks sb->s_umount in write mode. The locking of sb->s_umount here was intended to make snapshot mounts and the downgrade of snapshots to checkpoints exclusive. This fixes the deadlock issue by replacing the sb->s_umount usage in nilfs_ioctl_change_cpmode() with a dedicated mutex which protects snapshot mounts. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-30 17:25:19 -07:00
Ryusuke Konishi	fe0627e7b3	nilfs2: fix timing issue between rmcp and chcp ioctls The checkpoint deletion ioctl (rmcp ioctl) has potential for breaking snapshot because it is not fully exclusive with checkpoint mode change ioctl (chcp ioctl). The rmcp ioctl first tests if the specified checkpoint is a snapshot or not within nilfs_cpfile_delete_checkpoint function, and then calls nilfs_cpfile_delete_checkpoints function to actually invalidate the checkpoint only if it's not a snapshot. However, the checkpoint can be changed into a snapshot by the chcp ioctl between these two operations. In that case, calling nilfs_cpfile_delete_checkpoints() wrongly invalidates the snapshot, which leads to snapshot list corruption and snapshot count mismatch. This fixes the issue by changing nilfs_cpfile_delete_checkpoints() so that it reconfirms the target checkpoints are snapshot or not. This second check is exclusive with the chcp operation since it is protected by an existing semaphore. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-30 17:25:19 -07:00
Fernando Luis Vazquez Cao	278038ac53	nilfs2: remove references to long gone super operations ->delete_inode(), ->write_super_lockfs(), ->unlockfs() are gone so remove references to them in the NTFS code. Noticed while cleaning up the fsfreeze mess. Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-30 17:25:19 -07:00
Vyacheslav Dubeyko	6b0f3393e3	nilfs2: add omitted comment for ns_mount_state field of the_nilfs structure Add omitted comment for ns_mount_state field of the_nilfs structure. Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-30 17:25:19 -07:00
Vladimir Serbinenko	6ed6a722f9	minixfs: fix block limit check On minix2 and minix3 usually max_size is 7fffffff and the check in question prohibits creation of last block spanning right before 7fffffff, due to downward rounding during the division. Fix it by using multiplication instead. [akpm@linux-foundation.org: fix up code layout, use local `sb'] Signed-off-by: Vladimir Serbinenko <phcoder@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-30 17:25:19 -07:00
Akinobu Mita	6017b485ca	ext4: use memweight() Convert ext4_count_free() to use memweight() instead of table lookup based counting clear bits implementation. This change only affects the code segments enabled by EXT4FS_DEBUG. Note that this memweight() call can't be replaced with a single bitmap_weight() call, although the pointer to the memory area is aligned to long-word boundary. Because the size of the memory area may not be a multiple of BITS_PER_LONG, then it returns wrong value on big-endian architecture. This also includes the following change. - Remove unnecessary map == NULL check in ext4_count_free() which always takes non-null pointer as the memory area. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-30 17:25:16 -07:00
Akinobu Mita	10d470849a	ext3: use memweight() Convert ext3_count_free() to use memweight() instead of table lookup based counting clear bits implementation. This change only affects the code segments enabled by EXT3FS_DEBUG. Note that this memweight() call can't be replaced with a single bitmap_weight() call, although the pointer to the memory area is aligned to long-word boundary. Because the size of the memory area may not be a multiple of BITS_PER_LONG, then it returns wrong value on big-endian architecture. This also includes the following changes. - Remove unnecessary map == NULL check in ext3_count_free() which always takes non-null pointer as the memory area. - Fix printk format warning that only reveals with EXT3FS_DEBUG. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Acked-by: Jan Kara <jack@suse.cz> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-30 17:25:16 -07:00
Akinobu Mita	ecd0afa3ce	ext2: use memweight() Convert ext2_count_free() to use memweight() instead of table lookup based counting clear bits implementation. This change only affects the code segments enabled by EXT2FS_DEBUG. Note that this memweight() call can't be replaced with a single bitmap_weight() call, although the pointer to the memory area is aligned to long-word boundary. Because the size of the memory area may not be a multiple of BITS_PER_LONG, then it returns wrong value on big-endian architecture. This also includes the following changes. - Remove unnecessary map == NULL check in ext2_count_free() which always takes non-null pointer as the memory area. - Fix printk format warning that only reveals with EXT2FS_DEBUG. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-30 17:25:16 -07:00
Akinobu Mita	a75613ec73	ocfs2: use memweight() Use memweight to count the total number of bits set in memory area. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-30 17:25:16 -07:00
Akinobu Mita	0121ad62c2	affs: use memweight() Use memweight() to count the total number of bits set in memory area. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-30 17:25:16 -07:00
Akinobu Mita	9b58f6d4aa	qnx4fs: use memweight() Use memweight() to count the total number of bits clear in memory area. Note that this memweight() call can't be replaced with a single bitmap_weight() call, although the pointer to the memory area is aligned to long-word boundary. Because the size of the memory area may not be a multiple of BITS_PER_LONG, then it returns wrong value on big-endian architecture. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Acked-by: Anders Larsen <al@alarsen.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-30 17:25:16 -07:00
Joe Perches	533574c6bc	btrfs: use printk_get_level and printk_skip_level, add __printf, fix fallout Use the generic printk_get_level() to search a message for a kern_level. Add __printf to verify format and arguments. Fix a few messages that had mismatches in format and arguments. Add #ifdef CONFIG_PRINTK blocks to shrink the object size a bit when not using printk. [akpm@linux-foundation.org: whitespace tweak] Signed-off-by: Joe Perches <joe@perches.com> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Chris Mason <chris.mason@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-30 17:25:14 -07:00
Kees Cook	54b501992d	coredump: warn about unsafe suid_dumpable / core_pattern combo When suid_dumpable=2, detect unsafe core_pattern settings and warn when they are seen. Signed-off-by: Kees Cook <keescook@chromium.org> Suggested-by: Andrew Morton <akpm@linux-foundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alan Cox <alan@linux.intel.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Doug Ledford <dledford@redhat.com> Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: James Morris <james.l.morris@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-30 17:25:11 -07:00
Kees Cook	9520628e8c	fs: make dumpable=2 require fully qualified path When the suid_dumpable sysctl is set to "2", and there is no core dump pipe defined in the core_pattern sysctl, a local user can cause core files to be written to root-writable directories, potentially with user-controlled content. This means an admin can unknowningly reintroduce a variation of CVE-2006-2451, allowing local users to gain root privileges. $ cat /proc/sys/fs/suid_dumpable 2 $ cat /proc/sys/kernel/core_pattern core $ ulimit -c unlimited $ cd / $ ls -l core ls: cannot access core: No such file or directory $ touch core touch: cannot touch `core': Permission denied $ OHAI="evil-string-here" ping localhost >/dev/null 2>&1 & $ pid=$! $ sleep 1 $ kill -SEGV $pid $ ls -l core -rw------- 1 root kees 458752 Jun 21 11:35 core $ sudo strings core \| grep evil OHAI=evil-string-here While cron has been fixed to abort reading a file when there is any parse error, there are still other sensitive directories that will read any file present and skip unparsable lines. Instead of introducing a suid_dumpable=3 mode and breaking all users of mode 2, this only disables the unsafe portion of mode 2 (writing to disk via relative path). Most users of mode 2 (e.g. Chrome OS) already use a core dump pipe handler, so this change will not break them. For the situations where a pipe handler is not defined but mode 2 is still active, crash dumps will only be written to fully qualified paths. If a relative path is defined (e.g. the default "core" pattern), dump attempts will trigger a printk yelling about the lack of a fully qualified path. Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alan Cox <alan@linux.intel.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Doug Ledford <dledford@redhat.com> Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: James Morris <james.l.morris@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-30 17:25:11 -07:00
Sasha Levin	779302e678	fs/xattr.c:getxattr(): improve handling of allocation failures This allocation can be as large as 64k. - Add __GFP_NOWARN so the falied kmalloc() is silent - Fall back to vmalloc() if the kmalloc() failed Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-30 17:25:11 -07:00
Fernando Luis Vazquez Cao	32b4560b04	ntfs: remove references to long gone super operations and unimplemented methods ->delete_inode(), ->write_super_lockfs(), ->unlockfs() are gone so remove refereces to them in the NTFS code. Remove unnecessary comments about unimplemented methods while at it (suggested by Christoph Hellwig). Noticed while cleaning up the fsfreeze mess. Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Cc: Anton Altaparmakov <anton@tuxera.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-30 17:25:11 -07:00
Sage Weil	1fe60e51a3	libceph: move feature bits to separate header This is simply cleanup that will keep things more closely synced with the userland code. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>	2012-07-30 16:23:22 -07:00
Bryan Schumaker	89d77c8fa8	NFS: Convert v4 into a module This patch exports symbols needed by the v4 module. In addition, I also switch over to using IS_ENABLED() to check if CONFIG_NFS_V4 or CONFIG_NFS_V4_MODULE are set. The module (nfs4.ko) will be created in the same directory as nfs.ko and will be automatically loaded the first time you try to mount over NFS v4. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-30 19:06:52 -04:00
Bryan Schumaker	1c606fb74c	NFS: Convert v3 into a module This patch exports symbols and moves over the final structures needed by the v3 module. In addition, I also switch over to using IS_ENABLED() to check if CONFIG_NFS_V3 or CONFIG_NFS_V3_MODULE are set. The module (nfs3.ko) will be created in the same directory as nfs.ko and will be automatically loaded the first time you try to mount over NFS v3. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-30 19:06:46 -04:00
Bryan Schumaker	ddda8e0aa8	NFS: Convert v2 into a module The module (nfs2.ko) will be created in the same directory as nfs.ko and will be automatically loaded the first time you try to mount over NFS v2. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-30 19:06:41 -04:00
Bryan Schumaker	fac1e8e4ef	NFS: Keep module parameters in the generic NFS client Otherwise we break backwards compatibility when v4 becomes a modules. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-30 19:06:31 -04:00
Bryan Schumaker	19d87ca362	NFS: Split out remaining NFS v4 inode functions Somehow I missed this in my previous patch series, but these functions are only needed by the v4 code and should be moved to a v4-only file. I wasn't exactly sure where I should put these functions, so I moved them into nfs4super.c where I could make them static. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-30 19:06:20 -04:00
Bryan Schumaker	6a74490dca	NFS: Pass super operations and xattr handlers in the nfs_subversion I can set all variables in the nfs_fill_super() function, allowing me to remove the nfs4_fill_super() function. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-30 19:06:05 -04:00
Bryan Schumaker	1179acc6a3	NFS: Only initialize the ACL client in the v3 case v2 and v4 don't use it, so I create two new nfs_rpc_ops functions to initialize the ACL client only when we are using v3. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-30 19:05:54 -04:00
Bryan Schumaker	ff9099f266	NFS: Create a try_mount rpc op I'm already looking up the nfs subversion in nfs_fs_mount(), so I have easy access to rpc_ops that used to be difficult to reach. This allows me to set up a different mount path for NFS v2/3 and NFS v4. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-30 19:04:53 -04:00
Bryan Schumaker	e8f25e6d6d	NFS: Remove the NFS v4 xdev mount function I can now share this code with the v2 and v3 code by using the NFS subversion structure. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-30 19:04:45 -04:00
Bryan Schumaker	ab7017a3a0	NFS: Add version registering framework This patch adds in the code to track multiple versions of the NFS protocol. I created default structures for v2, v3 and v4 so that each version can continue to work while I convert them into kernel modules. I also removed the const parameter from the rpc_version array so that I can change it at runtime. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-30 19:04:17 -04:00
David Howells	a427b9ec4e	NFS: Fix a number of bugs in the idmapper Fix a number of bugs in the NFS idmapper code: (1) Only registered key types can be passed to the core keys code, so register the legacy idmapper key type. This is a requirement because the unregister function cleans up keys belonging to that key type so that there aren't dangling pointers to the module left behind - including the key->type pointer. (2) Rename the legacy key type. You can't have two key types with the same name, and (1) would otherwise require that. (3) complete_request_key() must be called in the error path of nfs_idmap_legacy_upcall(). (4) There is one idmap struct for each nfs_client struct. This means that idmap->idmap_key_cons is shared without the use of a lock. This is a problem because key_instantiate_and_link() - as called indirectly by idmap_pipe_downcall() - releases anyone waiting for the key to be instantiated. What happens is that idmap_pipe_downcall() running in the rpc.idmapd thread, releases the NFS filesystem in whatever thread that is running in to continue. This may then make another idmapper call, overwriting idmap_key_cons before idmap_pipe_downcall() gets the chance to call complete_request_key(). I think that reading idmap_key_cons only once, before key_instantiate_and_link() is called, and then caching the result in a variable is sufficient. Bug (4) is the cause of: BUG: unable to handle kernel NULL pointer dereference at (null) IP: [< (null)>] (null) PGD 0 Oops: 0010 [#1] SMP CPU 1 Modules linked in: ppdev parport_pc lp parport ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack nfs fscache xt_CHECKSUM auth_rpcgss iptable_mangle nfs_acl bridge stp llc lockd be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi snd_hda_codec_realtek snd_usb_audio snd_hda_intel snd_hda_codec snd_seq snd_pcm snd_hwdep snd_usbmidi_lib snd_rawmidi snd_timer uvcvideo videobuf2_core videodev media videobuf2_vmalloc snd_seq_device videobuf2_memops e1000e vhost_net iTCO_wdt joydev coretemp snd soundcore macvtap macvlan i2c_i801 snd_page_alloc tun iTCO_vendor_support microcode kvm_intel kvm sunrpc hid_logitech_dj usb_storage i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: scsi_wait_scan] Pid: 1229, comm: rpc.idmapd Not tainted 3.4.2-1.fc16.x86_64 #1 Gateway DX4710-UB801A/G33M05G1 RIP: 0010:[<0000000000000000>] [< (null)>] (null) RSP: 0018:ffff8801a3645d40 EFLAGS: 00010246 RAX: ffff880077707e30 RBX: ffff880077707f50 RCX: ffff8801a18ccd80 RDX: 0000000000000006 RSI: ffff8801a3645e75 RDI: ffff880077707f50 RBP: ffff8801a3645d88 R08: ffff8801a430f9c0 R09: ffff8801a3645db0 R10: 000000000000000a R11: 0000000000000246 R12: ffff8801a18ccd80 R13: ffff8801a3645e75 R14: ffff8801a430f9c0 R15: 0000000000000006 FS: 00007fb6fb51a700(0000) GS:ffff8801afc80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000001a49b0000 CR4: 00000000000027e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process rpc.idmapd (pid: 1229, threadinfo ffff8801a3644000, task ffff8801a3bf9710) Stack: ffffffff81260878 ffff8801a3645db0 ffff8801a3645db0 ffff880077707a90 ffff880077707f50 ffff8801a18ccd80 0000000000000006 ffff8801a3645e75 ffff8801a430f9c0 ffff8801a3645dd8 ffffffff81260983 ffff8801a3645de8 Call Trace: [<ffffffff81260878>] ? __key_instantiate_and_link+0x58/0x100 [<ffffffff81260983>] key_instantiate_and_link+0x63/0xa0 [<ffffffffa057062b>] idmap_pipe_downcall+0x1cb/0x1e0 [nfs] [<ffffffffa0107f57>] rpc_pipe_write+0x67/0x90 [sunrpc] [<ffffffff8117f833>] vfs_write+0xb3/0x180 [<ffffffff8117fb5a>] sys_write+0x4a/0x90 [<ffffffff81600329>] system_call_fastpath+0x16/0x1b Code: Bad RIP value. RIP [< (null)>] (null) RSP <ffff8801a3645d40> CR2: 0000000000000000 Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Steve Dickson <steved@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org [>= 3.4]	2012-07-30 18:57:39 -04:00
Jeff Layton	5cf02d09b5	nfs: skip commit in releasepage if we're freeing memory for fs-related reasons We've had some reports of a deadlock where rpciod ends up with a stack trace like this: PID: 2507 TASK: ffff88103691ab40 CPU: 14 COMMAND: "rpciod/14" #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9 #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs] #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8 #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs] #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs] #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670 #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271 #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638 #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942 #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9 #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808 #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6 #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7 #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc] #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc] #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0 #24 [ffff8810343bfee8] kthread at ffffffff8108dd96 #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca rpciod is trying to allocate memory for a new socket to talk to the server. The VM ends up calling ->releasepage to get more memory, and it tries to do a blocking commit. That commit can't succeed however without a connected socket, so we deadlock. Fix this by setting PF_FSTRANS on the workqueue task prior to doing the socket allocation, and having nfs_release_page check for that flag when deciding whether to do a commit call. Also, set PF_FSTRANS unconditionally in rpc_async_schedule since that function can also do allocations sometimes. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org	2012-07-30 18:55:59 -04:00
Peng Tao	159e0561e3	pnfsblock: bail out partial page IO Current block layout driver read/write code assumes page aligned IO in many places. Add a checker to validate the assumption. Otherwise there would be data corruption like when application does open(O_WRONLY) and page unaliged write. Signed-off-by: Peng Tao <tao.peng@emc.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-30 18:52:06 -04:00
Jeff Layton	f44106e217	nfs: fix fl_type tests in NFSv4 code fl_type is not a bitmap. Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-30 18:09:13 -04:00
Fred Isaman	c95908e4c5	NFS: fix pnfs regression with directio writes Commit `57208fa7e5` "NFS: Create an write_pageio_init() function" did not modify the calls in direct.c, preventing direct io from using pnfs. This reintroduces that capability. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-30 18:09:06 -04:00
Fred Isaman	59948db3be	NFS: fix pnfs regression with directio reads Commit `1abb50886a` "NFS: Create an read_pageio_init() function" did not modify the call in direct.c, preventing direct io from using pnfs. This reintroduces that capability. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-30 18:08:53 -04:00
Randy Dunlap	0add3e8567	nfs: fix stub return type warnings Fix numerous repeated warnings by making the stub function void instead of non-void: fs/nfs/nfs4_fs.h: In function 'nfs4_unregister_sysctl': fs/nfs/nfs4_fs.h:385:1: warning: no return statement in function returning non-void Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-30 17:30:24 -04:00
Jan Kara	4a55c1017b	nfsd: Push mnt_want_write() outside of i_mutex When mnt_want_write() starts to handle freezing it will get a full lock semantics requiring proper lock ordering. So push mnt_want_write() call consistently outside of i_mutex. CC: linux-nfs@vger.kernel.org CC: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-31 01:02:51 +04:00
Jan Kara	e7848683ae	btrfs: Push mnt_want_write() outside of i_mutex When mnt_want_write() starts to handle freezing it will get a full lock semantics requiring proper lock ordering. So push mnt_want_write() call consistently outside of i_mutex. CC: Chris Mason <chris.mason@oracle.com> CC: linux-btrfs@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-31 01:02:51 +04:00
Jan Kara	e24f17da35	fat: Push mnt_want_write() outside of i_mutex When mnt_want_write() starts to handle freezing it will get a full lock semantics requiring proper lock ordering. So push mnt_want_write() call outside of i_mutex as in other places. CC: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-31 01:02:50 +04:00
Jan Kara	c30dabfe5d	fs: Push mnt_want_write() outside of i_mutex Currently, mnt_want_write() is sometimes called with i_mutex held and sometimes without it. This isn't really a problem because mnt_want_write() is a non-blocking operation (essentially has a trylock semantics) but when the function starts to handle also frozen filesystems, it will get a full lock semantics and thus proper lock ordering has to be established. So move all mnt_want_write() calls outside of i_mutex. One non-trivial case needing conversion is kern_path_create() / user_path_create() which didn't include mnt_want_write() but now needs to because it acquires i_mutex. Because there are virtual file systems which don't bother with freeze / remount-ro protection we actually provide both versions of the function - one which calls mnt_want_write() and one which does not. [AV: scratch the previous, mnt_want_write() has been moved to kern_path_create() by now] Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-31 01:02:49 +04:00
Jan Kara	14ae417c6f	sysfs: Push file_update_time() into bin_page_mkwrite() CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-31 01:02:47 +04:00
Jan Kara	a63e9b2e76	gfs2: Push file_update_time() into gfs2_page_mkwrite() CC: Steven Whitehouse <swhiteho@redhat.com> CC: cluster-devel@redhat.com Acked-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-31 01:02:46 +04:00
Jan Kara	120c2bcad8	9p: Push file_update_time() into v9fs_vm_page_mkwrite() CC: Eric Van Hensbergen <ericvh@gmail.com> CC: Ron Minnich <rminnich@sandia.gov> CC: Latchesar Ionkov <lucho@ionkov.net> CC: v9fs-developer@lists.sourceforge.net Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-31 01:02:46 +04:00
Jan Kara	3ca9c3bd8a	ceph: Push file_update_time() into ceph_page_mkwrite() CC: Sage Weil <sage@newdream.net> CC: ceph-devel@vger.kernel.org Acked-by: Sage Weil <sage@newdream.net> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-31 01:02:45 +04:00
Jan Kara	5e8830dc85	fs: Push file_update_time() into __block_page_mkwrite() Tested-by: Kamal Mostafa <kamal@canonical.com> Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com> Tested-by: Dann Frazier <dann.frazier@canonical.com> Tested-by: Massimo Morana <massimo.morana@canonical.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-31 01:02:44 +04:00
Al Viro	64894cf843	simplify lookup_open()/atomic_open() - do the temporary mnt_want_write() early The write ref to vfsmount taken in lookup_open()/atomic_open() is going to be dropped; we take the one to stay in dentry_open(). Just grab the temporary in caller if it looks like we are going to need it (create/truncate/writable open) and pass (by value) "has it succeeded" flag. Instead of doing mnt_want_write() inside, check that flag and treat "false" as "mnt_want_write() has just failed". mnt_want_write() is cheap and the things get considerably simpler and more robust that way - we get it and drop it in the same function, to start with, rather than passing a "has something in the guts of really scary functions taken it" back to caller. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-31 00:53:35 +04:00
Linus Torvalds	37cd9600a9	xfs: update for 3.6-rc1 Numerous cleanups and several bug fixes. Here are some highlights: * Discontiguous directory buffer support * Inode allocator refactoring * Removal of the IO lock in inode reclaim * Implementation of .update_time * Fix for handling of EOF in xfs_vm_writepage * Fix for races in xfsaild, and idle mode is re-enabled * Fix for a crash in xfs_buf completion handlers on unmount. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) iQIcBAABAgAGBQJQFtCvAAoJENaLyazVq6ZOIuEQAINJXb4SK9oBrdwGmq+Vsqf2 Eh4OmzZmdnSPrxfFGmqvyL9DdUBvGBuidwOcVLMAXGtzbxE9USK9NuKC5zN/hJip 8tIyv/8bqZ0aD4RJlHGN5zKFoQh/9Tag+JsaaqWstO8Ir1tA/5p04hDAz492btfT 49SvnV64sJ1fi7pmaJblMWMMtlWJjD6iOldaHwnKBQ3LKmcgy9sD9DY5HiGOTr1j ecKtucX7B8Q9oFLKHaKEwTYZRRYDNuTbqZmI6hlEcA5hT280jotsGA4q/aXx/gHS lZuBaqVtNFT5WCKm+j/et76tmTfIh0CSbo64ZfgSOESy2BkEVXHg5XJ1gDvPdV+L 6eBlUx3jaiNyFVHxVzFhzwKC/XdaITCd/ixFEogRDmoppDXencTCibLJXHNXxupN BCAyTLCxEJIE9WCeOMmwHA0450bMY4or13NGep57pIvG8GomtdG1WncTRIo84KV5 0W5ocaUTGP7ROsr+KF8U9C7H866OHzVFijA+vvcTy8GtsT/xOCFxuJrqPVb+kgD7 mIKaoK7iH6Kufu433TzsLEcUkF36gq/7NytPKjQhURLpZhxkHG3rq6LC0HXp6uuZ QgX5Y5Gl7SwDovIrndXmQXRnGrzvqHLguZl65+rB1CKggjemkLSdSLhryoNVjLU2 iB7/hvzOUdYFMRRz2mLc =2wkC -----END PGP SIGNATURE----- Merge tag 'for-linus-v3.6-rc1' of git://oss.sgi.com/xfs/xfs Pull xfs update from Ben Myers: "Numerous cleanups and several bug fixes. Here are some highlights: - Discontiguous directory buffer support - Inode allocator refactoring - Removal of the IO lock in inode reclaim - Implementation of .update_time - Fix for handling of EOF in xfs_vm_writepage - Fix for races in xfsaild, and idle mode is re-enabled - Fix for a crash in xfs_buf completion handlers on unmount." Fix up trivial conflicts in fs/xfs/{xfs_buf.c,xfs_log.c,xfs_log_priv.h} due to duplicate patches that had already been merged for 3.5. * tag 'for-linus-v3.6-rc1' of git://oss.sgi.com/xfs/xfs: (44 commits) xfs: wait for the write the superblock on unmount xfs: re-enable xfsaild idle mode and fix associated races xfs: remove iolock lock classes xfs: avoid the iolock in xfs_free_eofblocks for evicted inodes xfs: do not take the iolock in xfs_inactive xfs: remove xfs_inactive_attrs xfs: clean up xfs_inactive xfs: do not read the AGI buffer in xfs_dialloc until nessecary xfs: refactor xfs_ialloc_ag_select xfs: add a short cut to xfs_dialloc for the non-NULL agbp case xfs: remove the alloc_done argument to xfs_dialloc xfs: split xfs_dialloc xfs: remove xfs_ialloc_find_free Prefix IO_XX flags with XFS_IO_XX to avoid namespace colision. xfs: remove xfs_inotobp xfs: merge xfs_itobp into xfs_imap_to_bp xfs: handle EOF correctly in xfs_vm_writepage xfs: implement ->update_time xfs: fix comment typo of struct xfs_da_blkinfo. xfs: do not call xfs_bdstrat_cb in xfs_buf_iodone_callbacks ...	2012-07-30 13:37:53 -07:00
Sage Weil	8842b3be96	ceph: clean up useless d_parent checks d_parent is never NULL, and IS_ROOT() is the proper way to check for a (non-self-referential) parent. Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Sage Weil <sage@inktank.com>	2012-07-30 09:29:54 -07:00
Al Viro	f8310c5920	fix O_EXCL handling for devices O_EXCL without O_CREAT has different semantics; it's "fail if already opened", not "fail if already exists". commit `71574865` broke that... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-30 11:50:30 +04:00
Mark Tinguely	9a57fa8ee7	xfs: wait for the write the superblock on unmount v2: Add the xfs_buf_lock to xfs_quiesce_attr(). Add explaination why xfs_buf_lock() is used to wait for write. xfs_wait_buftarg() does not wait for the completion of the write of the uncached superblock. This write can race with the shutdown of the log and causes a panic if the write does not win the race. During the log write, xfsaild_push() will lock the buffer and set the XBF_ASYNC flag. Because the XBF_FLAG is set, complete() is not performed on the buffer's iowait entry, we cannot call xfs_buf_iowait() to wait for the write to complete. The buffer's lock is held until the write is complete, so we can block on a xfs_buf_lock() request to be notified that the write is complete. Signed-off-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-07-29 16:34:19 -05:00
Brian Foster	8375f922aa	xfs: re-enable xfsaild idle mode and fix associated races xfsaild idle mode logic currently leads to a couple hangs: 1.) If xfsaild is rescheduled in during an incremental scan (i.e., tout != 0) and the target has been updated since the previous run, we can hit the new target and go into idle mode with a still populated ail. 2.) A wake up is only issued when the target is pushed forward. The wake up can race with xfsaild if it is currently in the process of entering idle mode, causing future wake up events to be lost. These hangs have been reproduced and verified as fixed by running xfstests 273 in a loop on a slightly modified upstream kernel. The kernel is modified to re-enable idle mode as previously implemented (when count == 0) and with a revert of commit `670ce93f`, which includes performance improvements that make this harder to reproduce. The solution, the algorithm for which has been outlined by Dave Chinner, is to modify xfsaild to enter idle mode only when the ail is empty and the push target has not been moved forward since the last push. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-07-29 16:27:57 -05:00
Christoph Hellwig	4f59af758f	xfs: remove iolock lock classes Content-Disposition: inline; filename=xfs-remove-iolock-classes Now that we never take the iolock during inode reclaim we don't need to play games with lock classes. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Rich Johnston <rjohnston@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-07-29 16:23:51 -05:00
Christoph Hellwig	5a15322da1	xfs: avoid the iolock in xfs_free_eofblocks for evicted inodes Same rational as the last patch - these inodes are not reachable, so don't bother with locking. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Rich Johnston <rjohnston@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-07-29 16:22:20 -05:00
Christoph Hellwig	0b56185b0d	xfs: do not take the iolock in xfs_inactive An inode that enters xfs_inactive has been removed from all global lists but the inode hash, and can't be recycled in xfs_iget before it has been marked reclaimable. Thus taking the iolock in here is not nessecary at all, and given the amount of lockdep false positives it has triggered already I'd rather remove the locking. The only change outside of xfs_inactive is relaxing an assert in xfs_itruncate_extents. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Rich Johnston <rjohnston@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-07-29 16:16:49 -05:00
Christoph Hellwig	fe67be036f	xfs: remove xfs_inactive_attrs Remove this helper as the code flow is a lot more obvious when it gets merged into its only caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Rich Johnston <rjohnston@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-07-29 16:15:33 -05:00
Christoph Hellwig	b373e98daa	xfs: clean up xfs_inactive The code to reserve log space and join the inode to the transaction is common for all cases, so don't duplicate it. Also remove the trivial xfs_inactive_symlink_local helper which can simply be opencode now. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Rich Johnston <rjohnston@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-07-29 16:13:09 -05:00
Christoph Hellwig	be60fe54b2	xfs: do not read the AGI buffer in xfs_dialloc until nessecary Refactor the AG selection loop in xfs_dialloc to operate on the in-memory perag data as much as possible. We only read the AGI buffer once we have selected an AG to allocate inodes now instead of for every AG considered. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-07-29 16:10:54 -05:00
Christoph Hellwig	55d6af64cb	xfs: refactor xfs_ialloc_ag_select Loop over the in-core perag structures and prefer using pagi_freecount over going out to the AGI buffer where possible. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-07-29 16:08:13 -05:00
Christoph Hellwig	4bb61069d2	xfs: add a short cut to xfs_dialloc for the non-NULL agbp case In this case we already have selected an AG and know it has free space beause the buffer lock never got released. Jump directly into xfs_dialloc_ag and short cut the AG selection loop. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-07-29 16:03:23 -05:00
Christoph Hellwig	08358906ed	xfs: remove the alloc_done argument to xfs_dialloc We can simplify check the IO_agbp pointer for being non-NULL instead of passing another argument through two layers of function calls. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-07-29 16:00:31 -05:00
Christoph Hellwig	f2ecc5e453	xfs: split xfs_dialloc Move the actual allocation once we have selected an allocation group into a separate helper, and make xfs_dialloc a wrapper around it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-07-29 15:56:49 -05:00
Al Viro	bf8848918d	lockd: handle lockowner allocation failure in nlmclnt_proc() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-29 23:17:39 +04:00
Al Viro	446945ab9a	lockd: shift grabbing a reference to nlm_host into nlm_alloc_call() It's used both for client and server hosts; we can't do nlmclnt_release_host() on failure exits, since the host might need nlmsvc_release_host(), with BUG_ON() for calling the wrong one. Makes life simpler for callers, actually... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-29 23:09:57 +04:00
Kees Cook	a51d9eaa41	fs: add link restriction audit reporting Adds audit messages for unexpected link restriction violations so that system owners will have some sort of potentially actionable information about misbehaving processes. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-29 21:43:08 +04:00
Kees Cook	800179c9b8	fs: add link restrictions This adds symlink and hardlink restrictions to the Linux VFS. Symlinks: A long-standing class of security issues is the symlink-based time-of-check-time-of-use race, most commonly seen in world-writable directories like /tmp. The common method of exploitation of this flaw is to cross privilege boundaries when following a given symlink (i.e. a root process follows a symlink belonging to another user). For a likely incomplete list of hundreds of examples across the years, please see: http://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=/tmp The solution is to permit symlinks to only be followed when outside a sticky world-writable directory, or when the uid of the symlink and follower match, or when the directory owner matches the symlink's owner. Some pointers to the history of earlier discussion that I could find: 1996 Aug, Zygo Blaxell http://marc.info/?l=bugtraq&m=87602167419830&w=2 1996 Oct, Andrew Tridgell http://lkml.indiana.edu/hypermail/linux/kernel/9610.2/0086.html 1997 Dec, Albert D Cahalan http://lkml.org/lkml/1997/12/16/4 2005 Feb, Lorenzo Hernández García-Hierro http://lkml.indiana.edu/hypermail/linux/kernel/0502.0/1896.html 2010 May, Kees Cook https://lkml.org/lkml/2010/5/30/144 Past objections and rebuttals could be summarized as: - Violates POSIX. - POSIX didn't consider this situation and it's not useful to follow a broken specification at the cost of security. - Might break unknown applications that use this feature. - Applications that break because of the change are easy to spot and fix. Applications that are vulnerable to symlink ToCToU by not having the change aren't. Additionally, no applications have yet been found that rely on this behavior. - Applications should just use mkstemp() or O_CREATE\|O_EXCL. - True, but applications are not perfect, and new software is written all the time that makes these mistakes; blocking this flaw at the kernel is a single solution to the entire class of vulnerability. - This should live in the core VFS. - This should live in an LSM. (https://lkml.org/lkml/2010/5/31/135) - This should live in an LSM. - This should live in the core VFS. (https://lkml.org/lkml/2010/8/2/188) Hardlinks: On systems that have user-writable directories on the same partition as system files, a long-standing class of security issues is the hardlink-based time-of-check-time-of-use race, most commonly seen in world-writable directories like /tmp. The common method of exploitation of this flaw is to cross privilege boundaries when following a given hardlink (i.e. a root process follows a hardlink created by another user). Additionally, an issue exists where users can "pin" a potentially vulnerable setuid/setgid file so that an administrator will not actually upgrade a system fully. The solution is to permit hardlinks to only be created when the user is already the existing file's owner, or if they already have read/write access to the existing file. Many Linux users are surprised when they learn they can link to files they have no access to, so this change appears to follow the doctrine of "least surprise". Additionally, this change does not violate POSIX, which states "the implementation may require that the calling process has permission to access the existing file"[1]. This change is known to break some implementations of the "at" daemon, though the version used by Fedora and Ubuntu has been fixed[2] for a while. Otherwise, the change has been undisruptive while in use in Ubuntu for the last 1.5 years. [1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/linkat.html [2] http://anonscm.debian.org/gitweb/?p=collab-maint/at.git;a=commitdiff;h=f4114656c3a6c6f6070e315ffdf940a49eda3279 This patch is based on the patches in Openwall and grsecurity, along with suggestions from Al Viro. I have added a sysctl to enable the protected behavior, and documentation. Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-29 21:37:58 +04:00
Jeff Layton	3134f37e93	vfs: don't let do_last pass negative dentry to audit_inode I can reliably reproduce the following panic by simply setting an audit rule on a recent 3.5.0+ kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000040 IP: [<ffffffff810d1250>] audit_copy_inode+0x10/0x90 PGD 7acd9067 PUD 7b8fb067 PMD 0 Oops: 0000 [#86] SMP Modules linked in: nfs nfs_acl auth_rpcgss fscache lockd sunrpc tpm_bios btrfs zlib_deflate libcrc32c kvm_amd kvm joydev virtio_net pcspkr i2c_piix4 floppy virtio_balloon microcode virtio_blk cirrus drm_kms_helper ttm drm i2c_core [last unloaded: scsi_wait_scan] CPU 0 Pid: 1286, comm: abrt-dump-oops Tainted: G D 3.5.0+ #1 Bochs Bochs RIP: 0010:[<ffffffff810d1250>] [<ffffffff810d1250>] audit_copy_inode+0x10/0x90 RSP: 0018:ffff88007aebfc38 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffff88003692d860 RCX: 00000000000038c4 RDX: 0000000000000000 RSI: ffff88006baf5d80 RDI: ffff88003692d860 RBP: ffff88007aebfc68 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 R13: ffff880036d30f00 R14: ffff88006baf5d80 R15: ffff88003692d800 FS: 00007f7562634740(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000040 CR3: 000000003643d000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process abrt-dump-oops (pid: 1286, threadinfo ffff88007aebe000, task ffff880079614530) Stack: ffff88007aebfdf8 ffff88007aebff28 ffff88007aebfc98 ffffffff81211358 ffff88003692d860 0000000000000000 ffff88007aebfcc8 ffffffff810d4968 ffff88007aebfcc8 ffff8800000038c4 0000000000000000 0000000000000000 Call Trace: [<ffffffff81211358>] ? ext4_lookup+0xe8/0x160 [<ffffffff810d4968>] __audit_inode+0x118/0x2d0 [<ffffffff811955a9>] do_last+0x999/0xe80 [<ffffffff81191fe8>] ? inode_permission+0x18/0x50 [<ffffffff81171efa>] ? kmem_cache_alloc_trace+0x11a/0x130 [<ffffffff81195b4a>] path_openat+0xba/0x420 [<ffffffff81196111>] do_filp_open+0x41/0xa0 [<ffffffff811a24bd>] ? alloc_fd+0x4d/0x120 [<ffffffff811855cd>] do_sys_open+0xed/0x1c0 [<ffffffff810d40cc>] ? __audit_syscall_entry+0xcc/0x300 [<ffffffff811856c1>] sys_open+0x21/0x30 [<ffffffff81611ca9>] system_call_fastpath+0x16/0x1b RSP <ffff88007aebfc38> CR2: 0000000000000040 The problem is that do_last is passing a negative dentry to audit_inode. The comments on lookup_open note that it can pass back a negative dentry if O_CREAT is not set. This patch fixes the oops, but I'm not clear on whether there's a better approach. Cc: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-29 21:27:03 +04:00
Al Viro	e4fad8e5d2	consolidate pipe file creation Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-29 21:24:19 +04:00
Al Viro	b5bcdda327	take grabbing f->f_path to do_dentry_open() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-29 21:24:18 +04:00
Al Viro	5c33b183a3	uninline file_free_rcu() What inline? Its only use is passing its address to call_rcu(), for fuck sake! Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-29 21:24:17 +04:00
Al Viro	0b1d90119a	ecryptfs_lookup_interpose(): allocate dentry_info first less work on failure that way Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-29 21:24:17 +04:00
Al Viro	bc65a1215e	sanitize ecryptfs_lookup() * ->lookup() never gets hit with . or .. * dentry it gets is unhashed, so unless we had gone and hashed it ourselves, there's no need to d_drop() the sucker. * wrong name printed in one of the printks (NULL, in fact) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-29 21:24:16 +04:00
Al Viro	a8104a9fcd	pull mnt_want_write()/mnt_drop_write() into kern_path_create()/done_path_create() resp. One side effect - attempt to create a cross-device link on a read-only fs fails with EROFS instead of EXDEV now. Makes more sense, POSIX allows, etc. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-29 21:24:15 +04:00
Al Viro	8e4bfca1d1	mknod: take sanity checks on mode into the very beginning Note that applying umask can't affect their results. While that affects errno in cases like mknod("/no_such_directory/a", 030000) yielding -EINVAL (due to impossible mode_t) instead of -ENOENT (due to inexistent directory), IMO that makes a lot more sense, POSIX allows to return either and any software that relies on getting -ENOENT instead of -EINVAL in that case deserves everything it gets. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-29 21:24:14 +04:00
Al Viro	921a1650de	new helper: done_path_create() releases what needs to be released after {kern,user}_path_create() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-29 21:24:13 +04:00
Linus Torvalds	173f865474	The usual collection of bug fixes and optimizations. Perhaps of greatest note is a speed up for parallel, non-allocating DIO writes, since we no longer take the i_mutex lock in that case. For bug fixes, we fix an incorrect overhead calculation which caused slightly incorrect results for df(1) and statfs(2). We also fixed bugs in the metadata checksum feature. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABCAAGBQJQE1SzAAoJENNvdpvBGATwzOMQAKxVkaTqkMc+cUahLLUkFdGQ xnmigHufVuOvv+B8l1p6vbYx+qOftztqXF1WC41j8mTfEDs19jKXv3LI57TJ+bIW a/Kp1CjMicRs/HGhFPNWp1D+N8J95MTFP6Y9biXSmBBvefg2NSwxpg48yZtjUy1/ Zl0414AqMvTJyqKKOUa++oyl3XlawnzDZ+6a7QPKsrAaDOU5977W4y2tZkNFk84d qRjTfaiX13aVe7XupgQHe/jtk40Sj38pyGVGiAGlHOZhZtYKE6MzB8OreGiMTy8z rmJz/CQUsQ8+sbKYhAcDru+bMrKghbO0u78XRz9CY+YpVFF/Xs2QiXoV0ZOGkIm6 eokq7jEV0K+LtDEmM3PkmUPgXfYw5damTv8qWuBUFd4wtVE5x/DmK8AJVMidCAUN GkVR+rEbbEi7RCwsuac/aKB8baVQCTiJ5tfNTgWh9zll+9GZSk+U71Pp0KdcJGiS nxitAZ+20hZN2CQctlmaGbCPTPYCWU4hQ3IuMdTlQTQAs8S0y1FtylTRsXcC1eVR i1hBS/dVw5PVCaqoX79zYrByUymgX0ZaYY6seRT6+U9xPGDCSNJ9mjXZL3fttnOC d5gsx/pbMIAv52G5Hj6DfsXR2JFmmxsaIzsLtRvKi9q89d84XaZfbUsHYjn4Neym 5lTKaSQHU71cKCxrStHC =VAVB -----END PGP SIGNATURE----- Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "The usual collection of bug fixes and optimizations. Perhaps of greatest note is a speed up for parallel, non-allocating DIO writes, since we no longer take the i_mutex lock in that case. For bug fixes, we fix an incorrect overhead calculation which caused slightly incorrect results for df(1) and statfs(2). We also fixed bugs in the metadata checksum feature." * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (23 commits) ext4: undo ext4_calc_metadata_amount if we fail to claim space ext4: don't let i_reserved_meta_blocks go negative ext4: fix hole punch failure when depth is greater than 0 ext4: remove unnecessary argument from __ext4_handle_dirty_metadata() ext4: weed out ext4_write_super ext4: remove unnecessary superblock dirtying ext4: convert last user of ext4_mark_super_dirty() to ext4_handle_dirty_super() ext4: remove useless marking of superblock dirty ext4: fix ext4 mismerge back in January ext4: remove dynamic array size in ext4_chksum() ext4: remove unused variable in ext4_update_super() ext4: make quota as first class supported feature ext4: don't take the i_mutex lock when doing DIO overwrites ext4: add a new nolock flag in ext4_map_blocks ext4: split ext4_file_write into buffered IO and direct IO ext4: remove an unused statement in ext4_mb_get_buddy_page_lock() ext4: fix out-of-date comments in extents.c ext4: use s_csum_seed instead of i_csum_seed for xattr block ext4: use proper csum calculation in ext4_rename ext4: fix overhead calculation used by ext4_statfs() ...	2012-07-27 20:52:25 -07:00
Stanislav Kinsbursky	2c142baa7b	NFSd: make boot_time variable per network namespace NFSd's boot_time represents grace period start point in time. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-07-27 16:49:22 -04:00
Stanislav Kinsbursky	a51c84ed50	NFSd: make grace end flag per network namespace Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-07-27 16:49:22 -04:00
Stanislav Kinsbursky	5630f7fa97	Lockd: move grace period management from lockd() to per-net functions Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-07-27 16:49:22 -04:00
Stanislav Kinsbursky	5ccb0066f2	LockD: pass actual network namespace to grace period management functions Passed network namespace replaced hard-coded init_net Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-07-27 16:49:22 -04:00
Stanislav Kinsbursky	db9c455341	LockD: manage grace list per network namespace Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-07-27 16:49:22 -04:00
Stanislav Kinsbursky	9695c7057f	SUNRPC: service request network namespace helper introduced This is a cleanup patch - makes code looks simplier. It replaces widely used rqstp->rq_xprt->xpt_net by introduced SVC_NET(rqstp). Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-07-27 16:49:21 -04:00
Stanislav Kinsbursky	5e1533c788	NFSd: make nfsd4_manager allocated per network namespace context. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-07-27 16:49:21 -04:00
Stanislav Kinsbursky	08d44a35a9	LockD: make lockd manager allocated per network namespace Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-07-27 16:48:44 -04:00
Stanislav Kinsbursky	66547b0251	LockD: manage grace period per network namespace Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-07-27 16:48:44 -04:00
Stanislav Kinsbursky	e2edaa98cb	Lockd: add more debug to host shutdown functions Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-07-27 16:48:44 -04:00
Stanislav Kinsbursky	d5850ff9ea	Lockd: host complaining function introduced Just a small cleanup. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-07-27 16:48:44 -04:00
Stanislav Kinsbursky	caa4e76b6f	LockD: manage used host count per networks namespace This patch introduces moves nrhosts in per-net data. It also adds kernel warning to nlm_shutdown_hosts_net() about remaining hosts in specified network namespace context. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-07-27 16:48:43 -04:00
Stanislav Kinsbursky	3cf7fb07e0	LockD: manage garbage collection timeout per networks namespace This patch moves next_gc to per-net data. Note: passed network can be NULL (when Lockd kthread is exiting of Lockd module is removing). Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-07-27 16:48:43 -04:00
Stanislav Kinsbursky	27adaddc8d	LockD: make garbage collector network namespace aware. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-07-27 16:48:43 -04:00
Stanislav Kinsbursky	b26411f85d	LockD: mark host per network namespace on garbage collect This is required for per-network NLM shutdown and cleanup. This patch passes init_net for a while. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-07-27 16:48:43 -04:00
J. Bruce Fields	99dbb8fe09	nfsd4: fix missing fault_inject.h include Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-07-27 16:30:12 -04:00
J. Bruce Fields	96d6d59cea	locks: move lease-specific code out of locks_delete_lock No point putting something only used by one caller into common code. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-07-27 16:18:00 -04:00
Pavel Shilovsky	1a500f010f	CIFS: Add SMB2 support for rmdir Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-27 15:17:50 -05:00
Pavel Shilovsky	f958ca5d88	CIFS: Move rmdir code to ops struct Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-27 15:17:47 -05:00
Pavel Shilovsky	a0e731839d	CIFS: Add SMB2 support for mkdir operation Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-27 15:17:43 -05:00
Pavel Shilovsky	f436720e94	CIFS: Separate protocol specific part from mkdir Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-27 15:17:40 -05:00
Pavel Shilovsky	ff691e9694	CIFS: Simplify cifs_mkdir call Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-27 15:17:16 -05:00
Linus Torvalds	84eda28060	Merge branch 'kmap_atomic' of git://github.com/congwang/linux Pull final kmap_atomic cleanups from Cong Wang: "This should be the final round of cleanup, as the definitions of enum km_type finally get removed from the whole tree. The patches have been in linux-next for a long time." * 'kmap_atomic' of git://github.com/congwang/linux: pipe: remove KM_USER0 from comments vmalloc: remove KM_USER0 from comments feature-removal-schedule.txt: remove kmap_atomic(page, km_type) tile: remove km_type definitions um: remove km_type definitions asm-generic: remove km_type definitions avr32: remove km_type definitions frv: remove km_type definitions powerpc: remove km_type definitions arm: remove km_type definitions highmem: remove the deprecated form of kmap_atomic tile: remove usage of enum km_type frv: remove the second parameter of kmap_atomic_primary() jbd2: remove the second argument of kmap_atomic	2012-07-27 11:26:48 -07:00
Filipe Brandenburger	3b6e2723f3	locks: prevent side-effects of locks_release_private before file_lock is initialized When calling fcntl(fd, F_SETLEASE, lck) [with lck=F_WRLCK or F_RDLCK], the custom signal or owner (if any were previously set using F_SETSIG or F_SETOWN fcntls) would be reset when F_SETLEASE was called for the second time on the same file descriptor. This bug is a regression of 2.6.37 and is described here: https://bugzilla.kernel.org/show_bug.cgi?id=43336 This patch reverts a commit from Oct 2004 (with subject "nfs4 lease: move the f_delown processing") which originally introduced the lm_release_private callback. Signed-off-by: Filipe Brandenburger <filbranden@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-07-27 09:39:55 -04:00
Stephen Rothwell	a1857ebe75	Btrfs: using vmalloc and friends needs vmalloc.h On powerpc, we don't get the implicit vmalloc.h include, and as a result the build fails noisily: fs/btrfs/send.c: In function 'fs_path_free': fs/btrfs/send.c:185:4: error: implicit declaration of function 'vfree' [-Werror=implicit-function-declaration] fs/btrfs/send.c: In function 'fs_path_ensure_buf': fs/btrfs/send.c:215:4: error: implicit declaration of function 'vmalloc' [-Werror=implicit-function-declaration] fs/btrfs/send.c:215:12: warning: assignment makes pointer from integer without a cast [enabled by default] fs/btrfs/send.c:225:12: warning: assignment makes pointer from integer without a cast [enabled by default] fs/btrfs/send.c:233:13: warning: assignment makes pointer from integer without a cast [enabled by default] fs/btrfs/send.c: In function 'iterate_dir_item': fs/btrfs/send.c:900:10: warning: assignment makes pointer from integer without a cast [enabled by default] fs/btrfs/send.c:909:11: warning: assignment makes pointer from integer without a cast [enabled by default] fs/btrfs/send.c: In function 'btrfs_ioctl_send': fs/btrfs/send.c:4463:17: warning: assignment makes pointer from integer without a cast [enabled by default] fs/btrfs/send.c:4469:17: warning: assignment makes pointer from integer without a cast [enabled by default] fs/btrfs/send.c:4475:2: error: implicit declaration of function 'vzalloc' [-Werror=implicit-function-declaration] fs/btrfs/send.c:4475:20: warning: assignment makes pointer from integer without a cast [enabled by default] fs/btrfs/send.c:4483:21: warning: assignment makes pointer from integer without a cast [enabled by default] Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-26 18:08:30 -07:00
Linus Torvalds	e2aed8dfa5	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull large btrfs update from Chris Mason: "This pull request is very large, and the two main features in here have been under testing/devel for quite a while. We have subvolume quotas from the strato developers. This enables full tracking of how many blocks are allocated to each subvolume (and all snapshots) and you can set limits on a per-subvolume basis. You can also create quota groups and toss multiple subvolumes into a big group. It's everything you need to be a web hosting company and give each user their own subvolume. The userland side of the quotas is being refreshed, they'll send out details on where to grab it soon. Next is the kernel side of btrfs send/receive from Alexander Block. This leverages the same infrastructure as the quota code to figure out relationships between blocks and their owners. It can then compute the difference between two snapshots and sends the diffs in a neutral format into userland. The basic model: create a snapshot send that snapshot as the initial backup make changes create a second snapshot send the incremental as a backup delete the first snapshot (use the second snapshot for the next incremental) The receive portion is all in userland, and in the 'next' branch of my btrfs-progs repo. There's still some work to do in terms of optimizing the send side from kernel to userland. The really important part is figuring out how two snapshots are different, and this is where we are concentrating right now. The initial send of a dataset is a little slower than tar, but the incremental sends are dramatically faster than what rsync can do. On top of all of that, we have a nice queue of fixes, cleanups and optimizations." Fix up trivial modify/del conflict in fs/btrfs/ioctl.c Also fix up semantic conflict in fs/btrfs/send.c: the interface to dentry_open() changed in commit `765927b2d5` ("switch dentry_open() to struct path, make it grab references itself"), and since it now grabs whatever references it needs, we should no longer do the mntget() on the mnt (and we need to dput() the dentry reference we took). * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (65 commits) Btrfs: uninit variable fixes in send/receive Btrfs: introduce BTRFS_IOC_SEND for btrfs send/receive Btrfs: add btrfs_compare_trees function Btrfs: introduce subvol uuids and times Btrfs: make iref_to_path non static Btrfs: add a barrier before a waitqueue_active check Btrfs: call the ordered free operation without any locks held Btrfs: Check INCOMPAT flags on remount and add helper function Btrfs: add helper for tree enumeration btrfs: allow cross-subvolume file clone Btrfs: improve multi-thread buffer read Btrfs: make btrfs's allocation smoothly with preallocation Btrfs: lock the transition from dirty to writeback for an eb Btrfs: fix potential race in extent buffer freeing Btrfs: don't return true in releasepage unless we actually freed the eb Btrfs: suppress printk() if all device I/O stats are zero Btrfs: remove unwanted printk() for btrfs device I/O stats Btrfs: rewrite BTRFS_SETGET_FUNCS Btrfs: zero unused bytes in inode item Btrfs: kill free_space pointer from inode structure ... Conflicts: fs/btrfs/ioctl.c	2012-07-26 14:48:55 -07:00
Linus Torvalds	548ed10228	dlm for 3.6 This set includes a major redesign of recording the master node for resources. The old dir hash table, which just held the master node for each resource, has been removed. The rsb hash table has always duplicated the master node value from the dir, and is now the single record of it. Having two full hash tables of all resources has always been a waste, especially since one just duplicated a single value from the other. Local requests will now often require one instead of two lengthy hash table searches. The other substantial change is made possible by the dirtbl removal, and fixes a long standing race between resource removal and lookup by reworking how removal is done. At the same time it improves the efficiency of removal by avoiding repeated searches through a hash bucket. The other commits include minor fixes and changes. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJQEFuFAAoJEDgbc8f8gGmqklAQAL2gMCp3asEbbUa2FRd5cZQT IfELLBS0kqr6ghEDpV1XTVig8vCwhGoa9UC/jYEtqnmo02ImjCOvxM4wcSAjbEhc L0J/JwfQNPtAbs8NN7vo7p2n5DPCWcKr3YiksvrPpQdXXAVTAfC4SkK028hF6n8B LtMrADqsz+dTqg6UQTCRsMyZlLrFNPZiR5IRy72BQLmg+lcNNtARb0T++W2EfAAV 03RnEUQPxbrKNZpVAAdncX5NUZz/D+DB5DMLF2q+hfquuGvheizmZHipnhOexeKV YZcQA/232BH0kqbKTGoIyqgMlCeKTgL56lBMyax3naEbBu4s80ZPOXjuA4pUpx5p 24mCtXcozoZ9WCJZwSNUqVGfG22HKSMGXTVoXxteamL/8CQ9XRPAvH/UpHi38LBm pzQ+aZGHDXpDlMlQYTnD4n7NEiq8IV5p9sOZnK2EjSshoiWefQomx73wFpwqdRgm TYtb583BjdU1QYLYGiJTy9yqg693Ket3pSipOVoqH/qMGM+d7Z3cwoMvcoHTVkOf Z2XiFvfTjWcA+4givvUOWTd9FuFJ+KQr+mPo867lIJyuT0KDttVDwZyQPqN7m7ma ymS9tVmrmfVmq97XkypVDkZELchqaW41Y8BKQ4c5JxRDLQX6Di7Pq36S6XL7mrOf medHFc9e6jhrS45lvpw+ =1bGM -----END PGP SIGNATURE----- Merge tag 'dlm-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm Pull dlm updatesfrom David Teigland: "This set includes a major redesign of recording the master node for resources. The old dir hash table, which just held the master node for each resource, has been removed. The rsb hash table has always duplicated the master node value from the dir, and is now the single record of it. Having two full hash tables of all resources has always been a waste, especially since one just duplicated a single value from the other. Local requests will now often require one instead of two lengthy hash table searches. The other substantial change is made possible by the dirtbl removal, and fixes a long standing race between resource removal and lookup by reworking how removal is done. At the same time it improves the efficiency of removal by avoiding repeated searches through a hash bucket. The other commits include minor fixes and changes." * tag 'dlm-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm: dlm: fix missing dir remove dlm: fix conversion deadlock from recovery dlm: use wait_event_timeout dlm: fix race between remove and lookup dlm: use idr instead of list for recovered rsbs dlm: use rsbtbl as resource directory	2012-07-26 14:03:42 -07:00
Linus Torvalds	98077a7205	Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6 Pull CIFS fixes from Steve French. * 'for-next' of git://git.samba.org/sfrench/cifs-2.6: (40 commits) cifs: ensure that we always do cifsFileInfo_get under the spinlock CIFS: Make CAP_* checks protocol independent CIFS: Allow SMB2 statistics to be tracked CIFS: Move clear/print_stats code to ops struct CIFS: Add echo request support for SMB2 CIFS: Move echo code to osp struct CIFS: Add SMB2 support for async requests CIFS: Setup async request in ops struct CIFS: Add SMB2 support for build_path_to_root CIFS: Move building path to root to ops struct CIFS: Query SMB2 inode info CIFS: Move query inode info code to ops struct CIFS: Add SMB2 support for is_path_accessible CIFS: Move is_path_accessible to ops struct CIFS: Move informational tcon calls to ops struct CIFS: Move getting dfs referalls to ops struct CIFS: Process reconnects for SMB2 shares CIFS: Add tree connect/disconnect capability for SMB2 CIFS: Add session setup/logoff capability for SMB2 CIFS: Add capability to send SMB2 negotiate message ...	2012-07-26 14:00:52 -07:00
Josh Boyer	8ded2bbc18	posix_types.h: Cleanup stale __NFDBITS and related definitions Recently, glibc made a change to suppress sign-conversion warnings in FD_SET (glibc commit ceb9e56b3d1). This uncovered an issue with the kernel's definition of __NFDBITS if applications #include <linux/types.h> after including <sys/select.h>. A build failure would be seen when passing the -Werror=sign-compare and -D_FORTIFY_SOURCE=2 flags to gcc. It was suggested that the kernel should either match the glibc definition of __NFDBITS or remove that entirely. The current in-kernel uses of __NFDBITS can be replaced with BITS_PER_LONG, and there are no uses of the related __FDELT and __FDMASK defines. Given that, we'll continue the cleanup that was started with commit `8b3d1cda4f` ("posix_types: Remove fd_set macros") and drop the remaining unused macros. Additionally, linux/time.h has similar macros defined that expand to nothing so we'll remove those at the same time. Reported-by: Jeff Law <law@redhat.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> CC: <stable@vger.kernel.org> Signed-off-by: Josh Boyer <jwboyer@redhat.com> [ .. and fix up whitespace as per akpm ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-26 13:36:43 -07:00
Linus Torvalds	fa93669a19	Driver core merge for 3.6-rc1 Here's the big driver core pull request for 3.6-rc1. Unlike 3.5, this kernel should be a lot tamer, with the printk changes now settled down. All we have here is some extcon driver updates, w1 driver updates, a few printk cleanups that weren't needed for 3.5, but are good to have now, and some other minor fixes/changes in the driver core. All of these have been in the linux-next releases for a while now. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (GNU/Linux) iEYEABECAAYFAlARgIUACgkQMUfUDdst+ynDHgCfRNwIB9L+zZvjcKE5e1BhDbUl wVUAn398DFgbJ1+PjGkd1EMR2uVTh7Ou =MIFu -----END PGP SIGNATURE----- Merge tag 'driver-core-3.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core changes from Greg Kroah-Hartman: "Here's the big driver core pull request for 3.6-rc1. Unlike 3.5, this kernel should be a lot tamer, with the printk changes now settled down. All we have here is some extcon driver updates, w1 driver updates, a few printk cleanups that weren't needed for 3.5, but are good to have now, and some other minor fixes/changes in the driver core. All of these have been in the linux-next releases for a while now. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>" * tag 'driver-core-3.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (38 commits) printk: Export struct log size and member offsets through vmcoreinfo Drivers: hv: Change the hex constant to a decimal constant driver core: don't trigger uevent after failure extcon: MAX77693: Add extcon-max77693 driver to support Maxim MAX77693 MUIC device sysfs: fail dentry revalidation after namespace change fix sysfs: fail dentry revalidation after namespace change extcon: spelling of detach in function doc extcon: arizona: Stop microphone detection if we give up on it extcon: arizona: Update cable reporting calls and split headset PM / Runtime: Do not increment device usage counts before probing kmsg - do not flush partial lines when the console is busy kmsg - export "continuation record" flag to /dev/kmsg kmsg - avoid warning for CONFIG_PRINTK=n compilations kmsg - properly print over-long continuation lines driver-core: Use kobj_to_dev instead of re-implementing it driver-core: Move kobj_to_dev from genhd.h to device.h driver core: Move deferred devices to the end of dpm_list before probing driver core: move uevent call to driver_register driver core: fix shutdown races with probe/remove(v3) Extcon: Arizona: Add driver for Wolfson Arizona class devices ...	2012-07-26 11:25:33 -07:00
Linus Torvalds	b13bc8dda8	Staging tree patches for 3.6-rc1 Here's the big staging tree merge for the 3.6-rc1 merge window. There are some patches in here outside of drivers/staging/, notibly the iio code (which is still stradeling the staging / not staging boundry), the pstore code, and the tracing code. All of these have gotten ackes from the various subsystem maintainers to be included in this tree. The pstore and tracing patches are related, and are coming here as they replace one of the android staging drivers. Otherwise, the normal staging mess. Lots of cleanups and a few new drivers (some iio drivers, and the large csr wireless driver abomination.) Note, you will get a merge issue with the following files: drivers/staging/comedi/drivers/s626.h drivers/staging/gdm72xx/netlink_k.c both of which should be trivial for you to handle. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (GNU/Linux) iEYEABECAAYFAlAQiD8ACgkQMUfUDdst+ykxhgCeMUjvc+1RTtSprzvkzpejgoUU 6A4AnAleWMnkaCD8vruGnRdGl/Qtz51+ =mN6M -----END PGP SIGNATURE----- Merge tag 'staging-3.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging Pull staging tree patches from Greg Kroah-Hartman: "Here's the big staging tree merge for the 3.6-rc1 merge window. There are some patches in here outside of drivers/staging/, notibly the iio code (which is still stradeling the staging / not staging boundry), the pstore code, and the tracing code. All of these have gotten acks from the various subsystem maintainers to be included in this tree. The pstore and tracing patches are related, and are coming here as they replace one of the android staging drivers. Otherwise, the normal staging mess. Lots of cleanups and a few new drivers (some iio drivers, and the large csr wireless driver abomination.) Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>" Fixed up trivial conflicts in drivers/staging/comedi/drivers/s626.h and drivers/staging/gdm72xx/netlink_k.c * tag 'staging-3.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (1108 commits) staging: csr: delete a bunch of unused library functions staging: csr: remove csr_utf16.c staging: csr: remove csr_pmem.h staging: csr: remove CsrPmemAlloc staging: csr: remove CsrPmemFree() staging: csr: remove CsrMemAllocDma() staging: csr: remove CsrMemCalloc() staging: csr: remove CsrMemAlloc() staging: csr: remove CsrMemFree() and CsrMemFreeDma() staging: csr: remove csr_util.h staging: csr: remove CsrOffSetOf() stating: csr: remove unneeded #includes in csr_util.c staging: csr: make CsrUInt16ToHex static staging: csr: remove CsrMemCpy() staging: csr: remove CsrStrLen() staging: csr: remove CsrVsnprintf() staging: csr: remove CsrStrDup staging: csr: remove CsrStrChr() staging: csr: remove CsrStrNCmp staging: csr: remove CsrStrCmp ...	2012-07-26 11:14:49 -07:00
Chris Mason	b24baf6917	Btrfs: uninit variable fixes in send/receive Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2012-07-25 19:21:10 -04:00
Chris Mason	113c1cb530	Merge branch 'send-v2' of git://github.com/ablock84/linux-btrfs into for-linus This is the kernel portion of btrfs send/receive Conflicts: fs/btrfs/Makefile fs/btrfs/backref.h fs/btrfs/ctree.c fs/btrfs/ioctl.c fs/btrfs/ioctl.h Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2012-07-25 19:19:10 -04:00
Alexander Block	31db9f7c23	Btrfs: introduce BTRFS_IOC_SEND for btrfs send/receive This patch introduces the BTRFS_IOC_SEND ioctl that is required for send. It allows btrfs-progs to implement full and incremental sends. Patches for btrfs-progs will follow. Signed-off-by: Alexander Block <ablock84@googlemail.com> Reviewed-by: David Sterba <dave@jikos.cz> Reviewed-by: Arne Jansen <sensille@gmx.net> Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Reviewed-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>	2012-07-25 23:30:19 +02:00
Alexander Block	7069830a9e	Btrfs: add btrfs_compare_trees function This function is used to find the differences between two trees. The tree compare skips whole subtrees if it detects shared tree blocks and thus is pretty fast. Signed-off-by: Alexander Block <ablock84@googlemail.com> Reviewed-by: David Sterba <dave@jikos.cz> Reviewed-by: Arne Jansen <sensille@gmx.net> Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Reviewed-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>	2012-07-25 23:30:14 +02:00
Alexander Block	8ea05e3a42	Btrfs: introduce subvol uuids and times This patch introduces uuids for subvolumes. Each subvolume has it's own uuid. In case it was snapshotted, it also contains parent_uuid. In case it was received, it also contains received_uuid. It also introduces subvolume ctime/otime/stime/rtime. The first two are comparable to the times found in inodes. otime is the origin/creation time and ctime is the change time. stime/rtime are only valid on received subvolumes. stime is the time of the subvolume when it was sent. rtime is the time of the subvolume when it was received. Additionally to the times, we have a transid for each time. They are updated at the same place as the times. btrfs receive uses stransid and rtransid to find out if a received subvolume changed in the meantime. If an older kernel mounts a filesystem with the extented fields, all fields become invalid. The next mount with a new kernel will detect this and reset the fields. Signed-off-by: Alexander Block <ablock84@googlemail.com> Reviewed-by: David Sterba <dave@jikos.cz> Reviewed-by: Arne Jansen <sensille@gmx.net> Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Reviewed-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>	2012-07-25 23:28:38 +02:00
Alexander Block	91cb916ca2	Btrfs: make iref_to_path non static Make iref_to_path non static (needed in send) and rename it to btrfs_iref_to_path Signed-off-by: Alexander Block <ablock84@googlemail.com>	2012-07-25 23:25:06 +02:00
Chris Mason	cd1cfc4915	Btrfs: add a barrier before a waitqueue_active check We were missing wakeups on the delayed ref waitqueue due to races on waitqueue_active. Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2012-07-25 16:15:08 -04:00
Chris Mason	e9fbcb4220	Btrfs: call the ordered free operation without any locks held Each ordered operation has a free callback, and this was called with the worker spinlock held. Josef made the free callback also call iput, which we can't do with the spinlock. This drops the spinlock for the free operation and grabs it again before moving through the rest of the list. We'll circle back around to this and find a cleaner way that doesn't bounce the lock around so much. Signed-off-by: Chris Mason <chris.mason@fusionio.com> cc: stable@kernel.org	2012-07-25 16:15:07 -04:00
Mitch Harder	2b0ce2c290	Btrfs: Check INCOMPAT flags on remount and add helper function In support of the recently added capability to remount with lzo compression, provide a helper function to check the compression INCOMPAT flags when remounting with lzo compression, and set the flags if necessary. Also, implement the new helper function when defragmenting with explicit lzo compression and when setting the default subvolume. Signed-off-by: Mitch Harder <mitch.harder@sabayonlinux.org> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2012-07-25 16:14:31 -04:00
Chris Mason	b478b2baa3	Merge branch 'qgroup' of git://git.jan-o-sch.net/btrfs-unstable into for-linus Conflicts: fs/btrfs/ioctl.c fs/btrfs/ioctl.h fs/btrfs/transaction.c fs/btrfs/transaction.h Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2012-07-25 16:11:38 -04:00
Jeff Layton	764a1b1ace	cifs: ensure that we always do cifsFileInfo_get under the spinlock The readpages bug is a regression that was introduced in `6993f74a5`. This also fixes a couple of similar bugs in the uncached read and write codepaths. Also, prevent this sort of thing in the future by having cifsFileInfo_get take the spinlock itself, and adding a _locked variant for use in places that are already holding the lock. The _put code has always done that so this makes for a less confusing interface. Cc: <stable@vger.kernel.org> # 3.5.x Reviewed-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-25 14:51:30 -05:00
Arne Jansen	e679376911	Btrfs: add helper for tree enumeration Often no exact match is wanted but just the next lower or higher item. There's a lot of duplicated code throughout btrfs to deal with the corner cases. This patch adds a helper function that can facilitate searching. Signed-off-by: Arne Jansen <sensille@gmx.net>	2012-07-25 17:33:18 +02:00
David Sterba	362a20c5e2	btrfs: allow cross-subvolume file clone Lift the EXDEV condition and allow different root trees for files being cloned, then pass source inode's root when searching for extents. Cloning is not allowed to cross vfsmounts, ie. when two subvolumes from one filesystem are mounted separately. Signed-off-by: David Sterba <dsterba@suse.cz>	2012-07-25 17:33:09 +02:00
Stanislav Kinsbursky	57c8b13e3c	NFSd: set nfsd_serv to NULL after service destruction In nfsd_destroy(): if (destroy) svc_shutdown_net(nfsd_serv, net); svc_destroy(nfsd_server); svc_shutdown_net(nfsd_serv, net) calls nfsd_last_thread(), which sets nfsd_serv to NULL, causing a NULL dereference on the following line. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-07-25 09:21:31 -04:00

... 2 3 4 5 6 ...

28343 Commits