linux

Author	SHA1	Message	Date
Bryan Donlan	de18f3b2d6	ext3: return -EIO not -ESTALE on directory traversal through deleted inode ext3_iget() returns -ESTALE if invoked on a deleted inode, in order to report errors to NFS properly. However, in ext[234]_lookup(), this -ESTALE can be propagated to userspace if the filesystem is corrupted such that a directory entry references a deleted inode. This leads to a misleading error message - "Stale NFS file handle" - and confusion on the part of the admin. The bug can be easily reproduced by creating a new filesystem, making a link to an unused inode using debugfs, then mounting and attempting to ls -l said link. This patch thus changes ext3_lookup to return -EIO if it receives -ESTALE from ext3_iget(), as ext3 does for other filesystem metadata corruption; and also invokes the appropriate ext*_error functions when this case is detected. Signed-off-by: Bryan Donlan <bdonlan@gmail.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:04:52 -07:00
Wei Yongjun	45f9021780	ext3: use unsigned instead of int for type of blocksize in fs/ext3/namei.c Use unsigned instead of int for the parameter which carries a blocksize. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:04:52 -07:00
Jan Kara	ecca9af0a9	jbd: fix oops in jbd_journal_init_inode() on corrupted fs On 32-bit system with CONFIG_LBD getblk can fail because provided block number is too big. Make JBD gracefully handle that. Signed-off-by: Jan Kara <jack@suse.cz> Cc: <dmaciejak@fortinet.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:04:52 -07:00
Cyrus Massoumi	039fd8ce62	ext3: remove the BKL in ext3/ioctl.c Reformat ext3/ioctl.c to make it look more like ext4/ioctl.c and remove the BKL around ext3_ioctl(). Signed-off-by: Cyrus Massoumi <cyrusm@gmx.net> Cc: <linux-ext4@vger.kernel.org> Acked-by: Jan Kara <jack@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:04:52 -07:00
Nikanth Karthikesan	97f76d3d19	vfs: check bh->b_blocknr only if BH_Mapped is set Check bh->b_blocknr only if BH_Mapped is set. akpm: I doubt if b_blocknr is ever uninitialised here, but it could conceivably cause a problem if we're doing a lookup for block zero. Signed-off-by: Nikanth Karthikesan <knikanth@suse.de> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:04:49 -07:00
Jeff Layton	d2caa3c549	writeback: guard against jiffies wraparound on inode->dirtied_when checks (try #3 ) The dirtied_when value on an inode is supposed to represent the first time that an inode has one of its pages dirtied. This value is in units of jiffies. It's used in several places in the writeback code to determine when to write out an inode. The problem is that these checks assume that dirtied_when is updated periodically. If an inode is continuously being used for I/O it can be persistently marked as dirty and will continue to age. Once the time compared to is greater than or equal to half the maximum of the jiffies type, the logic of the time_() macros inverts and the opposite of what is needed is returned. On 32-bit architectures that's just under 25 days (assuming HZ == 1000). As the least-recently dirtied inode, it'll end up being the first one that pdflush will try to write out. sync_sb_inodes does this check: / Was this inode dirtied after sync_sb_inodes was called? */ if (time_after(inode->dirtied_when, start)) break; ...but now dirtied_when appears to be in the future. sync_sb_inodes bails out without attempting to write any dirty inodes. When this occurs, pdflush will stop writing out inodes for this superblock. Nothing can unwedge it until jiffies moves out of the problematic window. This patch fixes this problem by changing the checks against dirtied_when to also check whether it appears to be in the future. If it does, then we consider the value to be far in the past. This should shrink the problematic window of time to such a small period (30s) as not to matter. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Acked-by: Ian Kent <raven@themaw.net> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:04:48 -07:00
Wu Fengguang	b6fac63cc1	vfs: skip I_CLEAR state inodes clear_inode() will switch inode state from I_FREEING to I_CLEAR, and do so _outside_ of inode_lock. So any I_FREEING testing is incomplete without a coupled testing of I_CLEAR. So add I_CLEAR tests to drop_pagecache_sb(), generic_sync_sb_inodes() and add_dquot_ref(). Masayoshi MIZUMA discovered the bug in drop_pagecache_sb() and Jan Kara reminds fixing the other two cases. Masayoshi MIZUMA has a nice panic flow: ===================================================================== [process A] \| [process B] \| \| \| prune_icache() \| drop_pagecache() \| spin_lock(&inode_lock) \| drop_pagecache_sb() \| inode->i_state \|= I_FREEING; \| \| \| spin_unlock(&inode_lock) \| V \| \| \| spin_lock(&inode_lock) \| V \| \| \| dispose_list() \| \| \| list_del() \| \| \| clear_inode() \| \| \| inode->i_state = I_CLEAR \| \| \| \| \| V \| \| \| if (inode->i_state & (I_FREEING\|I_WILL_FREE)) \| \| \| continue; <==== NOT MATCH \| \| \| \| \| \| (DANGER from here on! Accessing disposing inode!) \| \| \| \| \| \| __iget() \| \| \| list_move() <===== PANIC on poisoned list !! V V \| (time) ===================================================================== Reported-by: Masayoshi MIZUMA <m.mizuma@jp.fujitsu.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:04:48 -07:00
David Howells	33e5d76979	nommu: fix a number of issues with the per-MM VMA patch Fix a number of issues with the per-MM VMA patch: (1) Make mmap_pages_allocated an atomic_long_t, just in case this is used on a NOMMU system with more than 2G pages. Makes no difference on a 32-bit system. (2) Report vma->vm_pgoff * PAGE_SIZE as a 64-bit value, not a 32-bit value, lest it overflow. (3) Move the allocation of the vm_area_struct slab back for fork.c. (4) Use KMEM_CACHE() for both vm_area_struct and vm_region slabs. (5) Use BUG_ON() rather than if () BUG(). (6) Make the default validate_nommu_regions() a static inline rather than a #define. (7) Make free_page_series()'s objection to pages with a refcount != 1 more informative. (8) Adjust the __put_nommu_region() banner comment to indicate that the semaphore must be held for writing. (9) Limit the number of warnings about munmaps of non-mmapped regions. Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David Howells <dhowells@redhat.com> Cc: Greg Ungerer <gerg@snapgear.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:04:48 -07:00
Stoyan Gaydarov	c293498be6	Btrfs: BUG to BUG_ON changes Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-04-02 17:05:11 -04:00
Dan Carpenter	3e7ad38d20	Btrfs: remove dead code Remove an unneeded return statement and conditional Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-04-02 16:46:06 -04:00
Dan Carpenter	ff0a5836ac	Btrfs: remove dead code merge is always NULL at this point. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-04-02 16:46:06 -04:00
Wu Fengguang	d4a789474a	Btrfs: fix typos in comments Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-04-02 16:46:06 -04:00
Jim Owens	2e966ed22c	Btrfs: remove unused ftrace include Signed-off-by: jim owens <jowens@hp.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-04-02 17:02:55 -04:00
Heiko Carstens	93dbfad7ac	Btrfs: fix __ucmpdi2 compile bug on 32 bit builds We get this on 32 builds: fs/built-in.o: In function `extent_fiemap': (.text+0x1019f2): undefined reference to `__ucmpdi2' Happens because of a switch statement with a 64 bit argument. Convert this to an if statement to fix this. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-04-03 10:33:45 -04:00
Shen Feng	09771430f3	Btrfs: free inode struct when btrfs_new_inode fails btrfs_new_inode doesn't call iput to free the inode when it fails. Signed-off-by: Shen Feng <shen@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-04-02 16:46:06 -04:00
Amit Gud	b5555f7711	Btrfs: fix race in worker_loop Need to check kthread_should_stop after schedule_timeout() before calling schedule(). This causes threads to sleep with potentially no one to wake them up causing mount(2) to hang in btrfs_stop_workers waiting for threads to stop. Signed-off-by: Amit Gud <gud@ksu.edu> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-04-02 17:01:27 -04:00
Sage Weil	dccae99995	Btrfs: add flushoncommit mount option The 'flushoncommit' mount option forces any data dirtied by a write in a prior transaction to commit as part of the current commit. This makes the committed state a fully consistent view of the file system from the application's perspective (i.e., it includes all completed file system operations). This was previously the behavior only when a snapshot is created. This is used by Ceph to ensure that completed writes make it to the platter along with the metadata operations they are bound to (by BTRFS_IOC_TRANS_{START,END}). Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-04-02 16:59:01 -04:00
Sage Weil	3a5e14048a	Btrfs: notreelog mount option Add a 'notreelog' mount option to disable the tree log (used by fsync, O_SYNC writes). This is much slower, but the tree logging produces inconsistent views into the FS for ceph. Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-04-02 16:49:40 -04:00
Eric Paris	a9572a15a8	Btrfs: introduce btrfs_show_options btrfs options can change at times other than mount, yet /proc/mounts shows the options string used when the fs was mounted (an example would be when btrfs determines that barriers aren't useful and turns them off.) This patch instead outputs the actual options in use by btrfs. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-04-02 16:46:06 -04:00
Chris Mason	fa9c0d795f	Btrfs: rework allocation clustering Because btrfs is copy-on-write, we end up picking new locations for blocks very often. This makes it fairly difficult to maintain perfect read patterns over time, but we can at least do some optimizations for writes. This is done today by remembering the last place we allocated and trying to find a free space hole big enough to hold more than just one allocation. The end result is that we tend to write sequentially to the drive. This happens all the time for metadata and it happens for data when mounted -o ssd. But, the way we record it is fairly racey and it tends to fragment the free space over time because we are trying to allocate fairly large areas at once. This commit gets rid of the races by adding a free space cluster object with dedicated locking to make sure that only one process at a time is out replacing the cluster. The free space fragmentation is somewhat solved by allowing a cluster to be comprised of smaller free space extents. This part definitely adds some CPU time to the cluster allocations, but it allows the allocator to consume the small holes left behind by cow. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-04-03 09:47:43 -04:00
Chris Mason	8e73f27501	Btrfs: Optimize locking in btrfs_next_leaf() btrfs_next_leaf was using blocking locks when it could have been using faster spinning ones instead. This adds a few extra checks around the pieces that block and switches over to spinning locks. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-04-03 10:14:18 -04:00
Chris Mason	c8c42864f6	Btrfs: break up btrfs_search_slot into smaller pieces btrfs_search_slot was doing too many things at once. This breaks it up into more reasonable units. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-04-03 10:14:18 -04:00
Josef Bacik	04018de5d4	Btrfs: kill the pinned_mutex This patch removes the pinned_mutex. The extent io map has an internal tree lock that protects the tree itself, and since we only copy the extent io map when we are committing the transaction we don't need it there. We also don't need it when caching the block group since searching through the tree is also protected by the internal map spin lock. Signed-off-by: Josef Bacik <jbacik@redhat.com>	2009-04-03 10:14:18 -04:00
Josef Bacik	6226cb0a5e	Btrfs: kill the block group alloc mutex This patch removes the block group alloc mutex used to protect the free space tree for allocations and replaces it with a spin lock which is used only to protect the free space rb tree. This means we only take the lock when we are directly manipulating the tree, which makes us a touch faster with multi-threaded workloads. This patch also gets rid of btrfs_find_free_space and replaces it with btrfs_find_space_for_alloc, which takes the number of bytes you want to allocate, and empty_size, which is used to indicate how much free space should be at the end of the allocation. It will return an offset for the allocator to use. If we don't end up using it we _must_ call btrfs_add_free_space to put it back. This is the tradeoff to kill the alloc_mutex, since we need to make sure nobody else comes along and takes our space. Signed-off-by: Josef Bacik <jbacik@redhat.com>	2009-04-03 10:14:18 -04:00
Josef Bacik	2552d17e32	Btrfs: clean up find_free_extent I've replaced the strange looping constructs with a list_for_each_entry on space_info->block_groups. If we have a hint we just jump into the loop with the block group and start looking for space. If we don't find anything we start at the beginning and start looking. We never come out of the loop with a ref on the block_group _unless_ we found space to use, then we drop it after we set the trans block_group. Signed-off-by: Josef Bacik <jbacik@redhat.com>	2009-04-03 10:14:19 -04:00
Josef Bacik	70cb074345	Btrfs: free space cache cleanups This patch cleans up the free space cache code a bit. It better documents the idiosyncrasies of tree_search_offset and makes the code make a bit more sense. I took out the info allocation at the start of __btrfs_add_free_space and put it where it makes more sense. This was left over cruft from when alloc_mutex existed. Also all of the re-searches we do to make sure we inserted properly. Signed-off-by: Josef Bacik <jbacik@redhat.com>	2009-04-03 10:14:19 -04:00
Chris Mason	bedf762ba3	Btrfs: unplug in the async bio submission threads Btrfs pages being written get set to writeback, and then may go through a number of steps before they hit the block layer. This includes compression, checksumming and async bio submission. The end result is that someone who writes a page and then does wait_on_page_writeback is likely to unplug the queue before the bio they cared about got there. We could fix this by marking bios sync, or by doing more frequent unplugs, but this commit just changes the async bio submission code to unplug after it has processed all the bios for a device. The async bio submission does a fair job of collection bios, so this shouldn't be a huge problem for reducing merging at the elevator. For streaming O_DIRECT writes on a 5 drive array, it boosts performance from 386MB/s to 460MB/s. Thanks to Hisashi Hifumi for helping with this work. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-04-03 10:32:58 -04:00
Chris Mason	b765ead57d	Btrfs: keep processing bios for a given bdev if our proc is batching Btrfs uses async helper threads to submit write bios so the checksumming helper threads don't block on the disk. The submit bio threads may process bios for more than one block device, so when they find one device congested they try to move on to other devices instead of blocking in get_request_wait for one device. This does a pretty good job of keeping multiple devices busy, but the congested flag has a number of problems. A congested device may still give you a request, and other procs that aren't backing off the congested device may starve you out. This commit uses the io_context stored in current to decide if our process has been made a batching process by the block layer. If so, it keeps sending IO down for at least one batch. This helps make sure we do a good amount of work each time we visit a bdev, and avoids large IO stalls in multi-device workloads. It's also very ugly. A better solution is in the works with Jens Axboe. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-04-03 10:27:10 -04:00
Miklos Szeredi	fc280c9692	fuse: allow private mappings of "direct_io" files Allow MAP_PRIVATE mmaps of "direct_io" files. This is necessary for execute support. MAP_SHARED mappings require some sort of coherency between the underlying file and the mapping. With "direct_io" it is difficult to provide this, so for the moment just disallow shared (read-write and read-only) mappings altogether. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2009-04-02 14:25:35 +02:00
Miklos Szeredi	f4975c67dd	fuse: allow kernel to access "direct_io" files Allow the kernel read and write on "direct_io" files. This is necessary for nfs export and execute support. The implementation is simple: if an access from the kernel is detected, don't perform get_user_pages(), just use the kernel address provided by the requester to copy from/to the userspace filesystem. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2009-04-02 14:25:34 +02:00
Jan Kara	146bca72c7	udf: Don't write integrity descriptor too often We update information in logical volume integrity descriptor after each allocation (as LVID contains free space, number of directories and files on disk etc.). If the filesystem is on some phase change media, this leads to its quick degradation as such media is able to handle only 10000 overwrites or so. We solve the problem by writing new information into LVID only on umount, remount-ro and sync. This solves the problem at the price of longer media inconsistency (previously media became consistent after pdflush flushed dirty LVID buffer) but that should be acceptable. Report by and patch written in cooperation with Rich Coe <Richard.Coe@med.ge.com>. Signed-off-by: Jan Kara <jack@suse.cz>	2009-04-02 13:36:28 +02:00
Jan Kara	4034600516	udf: Try anchor in block 256 first Anchor block can be located at several places on the medium. Two of the locations are relative to media end which is problematic to detect. Also some drives report some block as last but are not able to read it or any block nearby before it. So let's first try block 256 and if it is all fine, don't look at other possible locations of anchor blocks to avoid IO errors. This change required a larger reorganization of code but the new code is hopefully more readable and definitely shorter. Signed-off-by: Jan Kara <jack@suse.cz>	2009-04-02 12:29:56 +02:00
Jan Kara	225feded89	udf: Some type fixes and cleanups Make udf_check_valid() return 1 if the validity check passed and 0 otherwise. So far it was the other way around which was a bit confusing. Also make udf_vrs() return loff_t which is really the type it should return (not int). Signed-off-by: Jan Kara <jack@suse.cz>	2009-04-02 12:29:55 +02:00
Clemens Ladisch	1197e4dfcf	udf: use hardware sector size This patch makes the UDF FS driver use the hardware sector size as the default logical block size, which is required by the UDF specifications. While the previous default of 2048 bytes was correct for optical disks, it was not for hard disks or USB storage devices, and made it impossible to use such a device with the default mount options. (The Linux mkudffs tool uses a default block size of 2048 bytes even on devices with smaller hardware sectors, so this bug is unlikely to be noticed unless UDF-formatted USB storage devices are exchanged with other OSs.) To avoid regressions for people who use loopback optical disk images or who used the (sometimes wrong) defaults of mkudffs, we also try with a block size of 2048 bytes if no anchor was found with the hardware sector size. Signed-off-by: Clemens Ladisch <clemens@ladisch.de> Signed-off-by: Jan Kara <jack@suse.cz>	2009-04-02 12:29:55 +02:00
Clemens Ladisch	4136801aec	udf: fix novrs mount option The novrs mount option was broken due to a missing break. Signed-off-by: Clemens Ladisch <clemens@ladisch.de> Signed-off-by: Jan Kara <jack@suse.cz>	2009-04-02 12:29:54 +02:00
Jan Kara	59285c28d1	udf: Fix oops when invalid character in filename occurs Functions udf_CS0toNLS() and udf_NLStoCS0() didn't count with the fact that NLS can return negative length when invalid character is given to it for conversion. Thus interesting things could happen (such as overwriting random memory with the rest of filename). Add appropriate checks. Signed-off-by: Jan Kara <jack@suse.cz>	2009-04-02 12:29:53 +02:00
Coly Li	557f5a1468	udf: return f_fsid for statfs(2) This patch makes udf return f_fsid info for statfs(2). Signed-off-by: Coly Li <coly.li@suse.de> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Jan Kara <jack@suse.cz>	2009-04-02 12:29:53 +02:00
Jan Kara	f90981fed9	udf: Add checks to not underflow sector_t Signed-off-by: Jan Kara <jack@suse.cz>	2009-04-02 12:29:52 +02:00
Marcin Slusarz	87bc730c07	udf: fix default mode and dmode options handling On x86 (and several other archs) mode_t is defined as "unsigned short" and comparing unsigned shorts to negative ints is broken (because short is promoted to int and then compared). Fix it. Reported-and-tested-by: Laurent Riffard <laurent.riffard@free.fr> Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2009-04-02 12:29:52 +02:00
Jan Kara	e650b94add	udf: fix sparse warnings: Fix sparse warnings: fs/udf/balloc.c:843:3: warning: returning void-valued expression fs/udf/balloc.c:847:3: warning: returning void-valued expression fs/udf/balloc.c:851:3: warning: returning void-valued expression fs/udf/balloc.c:855:3: warning: returning void-valued expression Reported-by: Hannes Eder <hannes@hanneseder.net> Signed-off-by: Jan Kara <jack@suse.cz>	2009-04-02 12:29:51 +02:00
roel kluin	30930554f2	udf: unsigned last[i] cannot be less than 0 unsigned last[i] cannot be less than 0 Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2009-04-02 12:29:50 +02:00
Marcin Slusarz	7ac9bcd5da	udf: implement mode and dmode mounting options "dmode" allows overriding permissions of directories and "mode" allows overriding permissions of files. Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Jan Kara <jack@suse.cz>	2009-04-02 12:29:50 +02:00
Marcin Slusarz	530f1a5e3e	udf: reduce stack usage of udf_get_filename Allocate strings with kmalloc. Checkstack output: Before: udf_get_filename: 600 After: udf_get_filename: 136 Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Jan Kara <jack@suse.cz>	2009-04-02 12:29:49 +02:00
Marcin Slusarz	ba9aadd80c	udf: reduce stack usage of udf_load_pvoldesc Allocate strings with kmalloc. Checkstack output: Before: udf_process_sequence: 712 After: udf_process_sequence: 200 Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2009-04-02 12:29:48 +02:00
Pekka Enberg	97e961fdbf	Fix the udf code not to pass structs on stack where possible. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Jan Kara <jack@suse.cz>	2009-04-02 12:29:47 +02:00
Pekka Enberg	5ca4e4be84	Remove struct typedefs from fs/udf/ecma_167.h et al. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Jan Kara <jack@suse.cz>	2009-04-02 12:29:47 +02:00
Ingo Molnar	8302294f43	Merge branch 'tracing/core-v2' into tracing-for-linus Conflicts: include/linux/slub_def.h lib/Kconfig.debug mm/slob.c mm/slub.c	2009-04-02 00:49:02 +02:00
Felix Blyakher	f36345ff9a	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 into for-linus	2009-04-01 16:58:39 -05:00
Linus Torvalds	4fe70410d9	Merge branch 'for-linus' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * 'for-linus' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (58 commits) SUNRPC: Ensure IPV6_V6ONLY is set on the socket before binding to a port NSM: Fix unaligned accesses in nsm_init_private() NFS: Simplify logic to compare socket addresses in client.c NFS: Start PF_INET6 callback listener only if IPv6 support is available lockd: Start PF_INET6 listener only if IPv6 support is available SUNRPC: Remove CONFIG_SUNRPC_REGISTER_V4 SUNRPC: rpcb_register() should handle errors silently SUNRPC: Simplify kernel RPC service registration SUNRPC: Simplify svc_unregister() SUNRPC: Allow callers to pass rpcb_v4_register a NULL address SUNRPC: rpcbind actually interprets r_owner string SUNRPC: Clean up address type casts in rpcb_v4_register() SUNRPC: Don't return EPROTONOSUPPORT in svc_register()'s helpers SUNRPC: Use IPv4 loopback for registering AF_INET6 kernel RPC services SUNRPC: Set IPV6ONLY flag on PF_INET6 RPC listener sockets NFS: Revert creation of IPv6 listeners for lockd and NFSv4 callbacks SUNRPC: Remove @family argument from svc_create() and svc_create_pooled() SUNRPC: Change svc_create_xprt() to take a @family argument SUNRPC: svc_setup_socket() gets protocol family from socket SUNRPC: Pass a family argument to svc_register() ...	2009-04-01 10:58:42 -07:00
Linus Torvalds	395d73413c	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (33 commits) ext4: Regularize mount options ext4: fix locking typo in mballoc which could cause soft lockup hangs ext4: fix typo which causes a memory leak on error path jbd2: Update locking coments ext4: Rename pa_linear to pa_type ext4: add checks of block references for non-extent inodes ext4: Check for an valid i_mode when reading the inode from disk ext4: Use WRITE_SYNC for commits which are caused by fsync() ext4: Add auto_da_alloc mount option ext4: Use struct flex_groups to calculate get_orlov_stats() ext4: Use atomic_t's in struct flex_groups ext4: remove /proc tuning knobs ext4: Add sysfs support ext4: Track lifetime disk writes ext4: Fix discard of inode prealloc space with delayed allocation. ext4: Automatically allocate delay allocated blocks on rename ext4: Automatically allocate delay allocated blocks on close ext4: add EXT4_IOC_ALLOC_DA_BLKS ioctl ext4: Simplify delalloc code by removing mpage_da_writepages() ext4: Save stack space by removing fake buffer heads ...	2009-04-01 10:57:49 -07:00
Trond Myklebust	cc85906110	Merge branch 'devel' into for-linus	2009-04-01 13:28:15 -04:00
Mans Rullgard	ad5b365c12	NSM: Fix unaligned accesses in nsm_init_private() This fixes unaligned accesses in nsm_init_private() when creating nlm_reboot keys. Signed-off-by: Mans Rullgard <mans@mansr.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-04-01 13:24:14 -04:00
Linus Torvalds	c226fd659f	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: try to free metadata pages when we free btree blocks Btrfs: add extra flushing for renames and truncates Btrfs: make sure btrfs_update_delayed_ref doesn't increase ref_mod Btrfs: optimize fsyncs on old files Btrfs: tree logging unlink/rename fixes Btrfs: Make sure i_nlink doesn't hit zero too soon during log replay Btrfs: limit balancing work while flushing delayed refs Btrfs: readahead checksums during btrfs_finish_ordered_io Btrfs: leave btree locks spinning more often Btrfs: Only let very young transactions grow during commit Btrfs: Check for a blocking lock before taking the spin Btrfs: reduce stack in cow_file_range Btrfs: reduce stalls during transaction commit Btrfs: process the delayed reference queue in clusters Btrfs: try to cleanup delayed refs while freeing extents Btrfs: reduce stack usage in some crucial tree balancing functions Btrfs: do extent allocation and reference count updates in the background Btrfs: don't preallocate metadata blocks during btrfs_search_slot	2009-04-01 10:20:44 -07:00
Ian Kent	8f63aaa8b9	autofs4: fix lookup deadlock A deadlock can occur when user space uses a signal (autofs version 4 uses SIGCHLD for this) to effect expire completion. The order of events is: Expire process completes, but before being able to send SIGCHLD to it's parent ... Another process walks onto a different mount point and drops the directory inode semaphore prior to sending the request to the daemon as it must ... A third process does an lstat on on the expired mount point causing it to wait on expire completion (unfortunately) holding the directory semaphore. The mount request then arrives at the daemon which does an lstat and, deadlock. For some time I was concerned about releasing the directory semaphore around the expire wait in autofs4_lookup as well as for the mount call back. I finally realized that the last round of changes in this function made the expiring dentry and the lookup dentry separate and distinct so the check and possible wait can be done anywhere prior to the mount call back. This patch moves the check to just before the mount call back and inside the directory inode mutex release. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-01 08:59:23 -07:00
Ian Kent	56fcef7511	autofs4: cleanup expire code duplication A significant portion of the autofs_dev_ioctl_expire() and autofs4_expire_multi() functions is duplicated code. This patch cleans that up. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-01 08:59:23 -07:00
Johannes Weiner	00fcf2cb6f	ecryptfs: use kzfree() Use kzfree() instead of memset() + kfree(). Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-01 08:59:23 -07:00
Wu Fengguang	c3b1b1cbf0	ramfs: add support for "mode=" mount option Addresses http://bugzilla.kernel.org/show_bug.cgi?id=12843 "I use ramfs instead of tmpfs for /tmp because I don't use swap on my laptop. Some apps need 1777 mode for /tmp directory, but ramfs does not support 'mode=' mount option." Reported-by: Avan Anishchuk <matimatik@gmail.com> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-01 08:59:22 -07:00
Davide Libenzi	395108880e	epoll keyed wakeups: make eventfd use keyed wakeups Introduce keyed event wakeups inside the eventfd code. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Miller <davem@davemloft.net> Cc: William Lee Irwin III <wli@movementarian.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-01 08:59:20 -07:00
Davide Libenzi	2dfa4eeab0	epoll keyed wakeups: teach epoll about hints coming with the wakeup key Use the events hint now sent by some devices, to avoid unnecessary wakeups for events that are of no interest for the caller. This code handles both devices that are sending keyed events, and the ones that are not (and event the ones that sometimes send events, and sometimes don't). [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Miller <davem@davemloft.net> Cc: William Lee Irwin III <wli@movementarian.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-01 08:59:20 -07:00
Davide Libenzi	bcd0b235bf	eventfd: improve support for semaphore-like behavior People started using eventfd in a semaphore-like way where before they were using pipes. That is, counter-based resource access. Where a "wait()" returns immediately by decrementing the counter by one, if counter is greater than zero. Otherwise will wait. And where a "post(count)" will add count to the counter releasing the appropriate amount of waiters. If eventfd the "post" (write) part is fine, while the "wait" (read) does not dequeue 1, but the whole counter value. The problem with eventfd is that a read() on the fd returns and wipes the whole counter, making the use of it as semaphore a little bit more cumbersome. You can do a read() followed by a write() of COUNTER-1, but IMO it's pretty easy and cheap to make this work w/out extra steps. This patch introduces a new eventfd flag that tells eventfd to only dequeue 1 from the counter, allowing simple read/write to make it behave like a semaphore. Simple test here: http://www.xmailserver.org/eventfd-sem.c To be back-compatible with earlier kernels, userspace applications should probe for the availability of this feature via #ifdef EFD_SEMAPHORE fd = eventfd2 (CNT, EFD_SEMAPHORE); if (fd == -1 && errno == EINVAL) <fallback> #else <fallback> #endif Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Cc: <linux-api@vger.kernel.org> Tested-by: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-01 08:59:20 -07:00
Tony Battersby	4f0989dbfa	epoll: use real type instead of void * eventpoll.c uses void * in one place for no obvious reason; change it to use the real type instead. Signed-off-by: Tony Battersby <tonyb@cybernetics.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-01 08:59:20 -07:00
Tony Battersby	e057e15ff6	epoll: clean up ep_modify ep_modify() doesn't need to set event.data from within the ep->lock spinlock as the comment suggests. The only place event.data is used is ep_send_events_proc(), and this is protected by ep->mtx instead of ep->lock. Also update the comment for mutex_lock() at the top of ep_scan_ready_list(), which mentions epoll_ctl(EPOLL_CTL_DEL) but not epoll_ctl(EPOLL_CTL_MOD). ep_modify() can also use spin_lock_irq() instead of spin_lock_irqsave(). Signed-off-by: Tony Battersby <tonyb@cybernetics.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-01 08:59:19 -07:00
Tony Battersby	d1bc90dd5d	epoll: remove unnecessary xchg xchg in ep_unregister_pollwait() is unnecessary because it is protected by either epmutex or ep->mtx (the same protection as ep_remove()). If xchg was necessary, it would be insufficient to protect against problems: if multiple concurrent calls to ep_unregister_pollwait() were possible then a second caller that returns without doing anything because nwait == 0 could return before the waitqueues are removed by the first caller, which looks like it could lead to problematic races with ep_poll_callback(). So remove xchg and add comments about the locking. Signed-off-by: Tony Battersby <tonyb@cybernetics.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-01 08:59:19 -07:00
Tony Battersby	d030588282	epoll: remember the event if epoll_wait returns -EFAULT If epoll_wait returns -EFAULT, the event that was being returned when the fault was encountered will be forgotten. This is not a big deal since EFAULT will happen only if a buggy userspace program passes in a bad address, in which case what happens later usually doesn't matter. However, it is easy to remember the event for later, and this patch makes a simple change to do that. Signed-off-by: Tony Battersby <tonyb@cybernetics.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-01 08:59:19 -07:00
Tony Battersby	abff55cee1	epoll: don't use current in irq context ep_call_nested() (formerly ep_poll_safewake()) uses "current" (without dereferencing it) to detect callback recursion, but it may be called from irq context where the use of current is generally discouraged. It would be better to use get_cpu() and put_cpu() to detect the callback recursion. Signed-off-by: Tony Battersby <tonyb@cybernetics.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-01 08:59:19 -07:00
Davide Libenzi	bb57c3edcd	epoll: remove debugging code Remove debugging code from epoll. There's no need for it to be included into mainline code. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-01 08:59:19 -07:00
Davide Libenzi	296e236e96	epoll: fix epoll's own poll (update) Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Cc: Pavel Pisa <pisa@cmp.felk.cvut.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-01 08:59:19 -07:00
Davide Libenzi	5071f97ec6	epoll: fix epoll's own poll Fix a bug inside the epoll's f_op->poll() code, that returns POLLIN even though there are no actual ready monitored fds. The bug shows up if you add an epoll fd inside another fd container (poll, select, epoll). The problem is that callback-based wake ups used by epoll does not carry (patches will follow, to fix this) any information about the events that actually happened. So the callback code, since it can't call the file* ->poll() inside the callback, chains the file* into a ready-list. So, suppose you added an fd with EPOLLOUT only, and some data shows up on the fd, the file* mapped by the fd will be added into the ready-list (via wakeup callback). During normal epoll_wait() use, this condition is sorted out at the time we're actually able to call the file's f_op->poll(). Inside the old epoll's f_op->poll() though, only a quick check !list_empty(ready-list) was performed, and this could have led to reporting POLLIN even though no ready fds would show up at a following epoll_wait(). In order to correctly report the ready status for an epoll fd, the ready-list must be checked to see if any really available fd+event would be ready in a following epoll_wait(). Operation (calling f_op->poll() from inside f_op->poll()) that, like wake ups, must be handled with care because of the fact that epoll fds can be added to other epoll fds. Test code: / * epoll_test by Davide Libenzi (Simple code to test epoll internals) * Copyright (C) 2008 Davide Libenzi * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA * * Davide Libenzi <davidel@xmailserver.org> * / #include <sys/types.h> #include <unistd.h> #include <stdio.h> #include <stdlib.h> #include <string.h> #include <errno.h> #include <signal.h> #include <limits.h> #include <poll.h> #include <sys/epoll.h> #include <sys/wait.h> #define EPWAIT_TIMEO (1 1000) #ifndef POLLRDHUP #define POLLRDHUP 0x2000 #endif #define EPOLL_MAX_CHAIN 100L #define EPOLL_TF_LOOP (1 << 0) struct epoll_test_cfg { long size; long flags; }; static int xepoll_create(int n) { int epfd; if ((epfd = epoll_create(n)) == -1) { perror("epoll_create"); exit(2); } return epfd; } static void xepoll_ctl(int epfd, int cmd, int fd, struct epoll_event evt) { if (epoll_ctl(epfd, cmd, fd, evt) < 0) { perror("epoll_ctl"); exit(3); } } static void xpipe(int fds) { if (pipe(fds)) { perror("pipe"); exit(4); } } static pid_t xfork(void) { pid_t pid; if ((pid = fork()) == (pid_t) -1) { perror("pipe"); exit(5); } return pid; } static int run_forked_proc(int (proc)(void ), void data) { int status; pid_t pid; if ((pid = xfork()) == 0) exit((proc)(data)); if (waitpid(pid, &status, 0) != pid) { perror("waitpid"); return -1; } return WIFEXITED(status) ? WEXITSTATUS(status): -2; } static int check_events(int fd, int timeo) { struct pollfd pfd; fprintf(stdout, "Checking events for fd %d\n", fd); memset(&pfd, 0, sizeof(pfd)); pfd.fd = fd; pfd.events = POLLIN \| POLLOUT; if (poll(&pfd, 1, timeo) < 0) { perror("poll()"); return 0; } if (pfd.revents & POLLIN) fprintf(stdout, "\tPOLLIN\n"); if (pfd.revents & POLLOUT) fprintf(stdout, "\tPOLLOUT\n"); if (pfd.revents & POLLERR) fprintf(stdout, "\tPOLLERR\n"); if (pfd.revents & POLLHUP) fprintf(stdout, "\tPOLLHUP\n"); if (pfd.revents & POLLRDHUP) fprintf(stdout, "\tPOLLRDHUP\n"); return pfd.revents; } static int epoll_test_tty(void data) { int epfd, ifd = fileno(stdin), res; struct epoll_event evt; if (check_events(ifd, 0) != POLLOUT) { fprintf(stderr, "Something is cooking on STDIN (%d)\n", ifd); return 1; } epfd = xepoll_create(1); fprintf(stdout, "Created epoll fd (%d)\n", epfd); memset(&evt, 0, sizeof(evt)); evt.events = EPOLLIN; xepoll_ctl(epfd, EPOLL_CTL_ADD, ifd, &evt); if (check_events(epfd, 0) & POLLIN) { res = epoll_wait(epfd, &evt, 1, 0); if (res == 0) { fprintf(stderr, "Epoll fd (%d) is ready when it shouldn't!\n", epfd); return 2; } } return 0; } static int epoll_wakeup_chain(void data) { struct epoll_test_cfg tcfg = data; int i, res, epfd, bfd, nfd, pfds[2]; pid_t pid; struct epoll_event evt; memset(&evt, 0, sizeof(evt)); evt.events = EPOLLIN; epfd = bfd = xepoll_create(1); for (i = 0; i < tcfg->size; i++) { nfd = xepoll_create(1); xepoll_ctl(bfd, EPOLL_CTL_ADD, nfd, &evt); bfd = nfd; } xpipe(pfds); if (tcfg->flags & EPOLL_TF_LOOP) { xepoll_ctl(bfd, EPOLL_CTL_ADD, epfd, &evt); / * If we're testing for loop, we want that the wakeup * triggered by the write to the pipe done in the child * process, triggers a fake event. So we add the pipe * read size with EPOLLOUT events. This will trigger * an addition to the ready-list, but no real events * will be there. The the epoll kernel code will proceed * in calling f_op->poll() of the epfd, triggering the * loop we want to test. / evt.events = EPOLLOUT; } xepoll_ctl(bfd, EPOLL_CTL_ADD, pfds[0], &evt); / * The pipe write must come after the poll(2) call inside * check_events(). This tests the nested wakeup code in * fs/eventpoll.c:ep_poll_safewake() * By having the check_events() (hence poll(2)) happens first, * we have poll wait queue filled up, and the write(2) in the * child will trigger the wakeup chain. / if ((pid = xfork()) == 0) { sleep(1); write(pfds[1], "w", 1); exit(0); } res = check_events(epfd, 2000) & POLLIN; if (waitpid(pid, NULL, 0) != pid) { perror("waitpid"); return -1; } return res; } static int epoll_poll_chain(void data) { struct epoll_test_cfg tcfg = data; int i, res, epfd, bfd, nfd, pfds[2]; pid_t pid; struct epoll_event evt; memset(&evt, 0, sizeof(evt)); evt.events = EPOLLIN; epfd = bfd = xepoll_create(1); for (i = 0; i < tcfg->size; i++) { nfd = xepoll_create(1); xepoll_ctl(bfd, EPOLL_CTL_ADD, nfd, &evt); bfd = nfd; } xpipe(pfds); if (tcfg->flags & EPOLL_TF_LOOP) { xepoll_ctl(bfd, EPOLL_CTL_ADD, epfd, &evt); / * If we're testing for loop, we want that the wakeup * triggered by the write to the pipe done in the child * process, triggers a fake event. So we add the pipe * read size with EPOLLOUT events. This will trigger * an addition to the ready-list, but no real events * will be there. The the epoll kernel code will proceed * in calling f_op->poll() of the epfd, triggering the * loop we want to test. / evt.events = EPOLLOUT; } xepoll_ctl(bfd, EPOLL_CTL_ADD, pfds[0], &evt); / * The pipe write mush come before the poll(2) call inside * check_events(). This tests the nested f_op->poll calls code in * fs/eventpoll.c:ep_eventpoll_poll() * By having the pipe write(2) happen first, we make the kernel * epoll code to load the ready lists, and the following poll(2) * done inside check_events() will test nested poll code in * ep_eventpoll_poll(). / if ((pid = xfork()) == 0) { write(pfds[1], "w", 1); exit(0); } sleep(1); res = check_events(epfd, 1000) & POLLIN; if (waitpid(pid, NULL, 0) != pid) { perror("waitpid"); return -1; } return res; } int main(int ac, char av) { int error; struct epoll_test_cfg tcfg; fprintf(stdout, "\n******* Testing TTY events\n"); error = run_forked_proc(epoll_test_tty, NULL); fprintf(stdout, error == 0 ? "****** OK\n": "****** FAIL (%d)\n", error); tcfg.size = 3; tcfg.flags = 0; fprintf(stdout, "\n****** Testing short wakeup chain\n"); error = run_forked_proc(epoll_wakeup_chain, &tcfg); fprintf(stdout, error == POLLIN ? "****** OK\n": "****** FAIL (%d)\n", error); tcfg.size = EPOLL_MAX_CHAIN; tcfg.flags = 0; fprintf(stdout, "\n****** Testing long wakeup chain (HOLD ON)\n"); error = run_forked_proc(epoll_wakeup_chain, &tcfg); fprintf(stdout, error == 0 ? "****** OK\n": "****** FAIL (%d)\n", error); tcfg.size = 3; tcfg.flags = 0; fprintf(stdout, "\n****** Testing short poll chain\n"); error = run_forked_proc(epoll_poll_chain, &tcfg); fprintf(stdout, error == POLLIN ? "****** OK\n": "****** FAIL (%d)\n", error); tcfg.size = EPOLL_MAX_CHAIN; tcfg.flags = 0; fprintf(stdout, "\n****** Testing long poll chain (HOLD ON)\n"); error = run_forked_proc(epoll_poll_chain, &tcfg); fprintf(stdout, error == 0 ? "****** OK\n": "****** FAIL (%d)\n", error); tcfg.size = 3; tcfg.flags = EPOLL_TF_LOOP; fprintf(stdout, "\n****** Testing loopy wakeup chain (HOLD ON)\n"); error = run_forked_proc(epoll_wakeup_chain, &tcfg); fprintf(stdout, error == 0 ? "****** OK\n": "****** FAIL (%d)\n", error); tcfg.size = 3; tcfg.flags = EPOLL_TF_LOOP; fprintf(stdout, "\n****** Testing loopy poll chain (HOLD ON)\n"); error = run_forked_proc(epoll_poll_chain, &tcfg); fprintf(stdout, error == 0 ? "****** OK\n": "******** FAIL (%d)\n", error); return 0; } Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Cc: Pavel Pisa <pisa@cmp.felk.cvut.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-01 08:59:18 -07:00
Harvey Harrison	63cd885426	ntfs: remove private wrapper of endian helpers The base versions handle constant folding now and are shorter than these private wrappers, use them directly. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Cc: Anton Altaparmakov <aia21@cantab.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-01 08:59:18 -07:00
Eric Sandeen	c2d7543851	filesystem freeze: allow SysRq emergency thaw to thaw frozen filesystems Now that the filesystem freeze operation has been elevated to the VFS, and is just an ioctl away, some sort of safety net for unintentionally frozen root filesystems may be in order. The timeout thaw originally proposed did not get merged, but perhaps something like this would be useful in emergencies. For example, freeze /path/to/mountpoint may freeze your root filesystem if you forgot that you had that unmounted. I chose 'j' as the last remaining character other than 'h' which is sort of reserved for help (because help is generated on any unknown character). I've tested this on a non-root fs with multiple (nested) freezers, as well as on a system rendered unresponsive due to a frozen root fs. [randy.dunlap@oracle.com: emergency thaw only if CONFIG_BLOCK enabled] Signed-off-by: Eric Sandeen <sandeen@redhat.com> Cc: Takashi Sato <t-sato@yk.jp.nec.com> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-01 08:59:17 -07:00
KAMEZAWA Hiroyuki	327c0e9686	vmscan: fix it to take care of nodemask try_to_free_pages() is used for the direct reclaim of up to SWAP_CLUSTER_MAX pages when watermarks are low. The caller to alloc_pages_nodemask() can specify a nodemask of nodes that are allowed to be used but this is not passed to try_to_free_pages(). This can lead to unnecessary reclaim of pages that are unusable by the caller and int the worst case lead to allocation failure as progress was not been make where it is needed. This patch passes the nodemask used for alloc_pages_nodemask() to try_to_free_pages(). Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-01 08:59:15 -07:00
Johannes Weiner	2678958e12	ramfs-nommu: use generic lru cache Instead of open-coding the lru-list-add pagevec batching when expanding a file mapping from zero, defer to the appropriate page cache function that also takes care of adding the page to the lru list. This is cleaner, saves code and reduces the stack footprint by 16 words worth of pagevec. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: David Howells <dhowells@redhat.com> Cc: Nick Piggin <npiggin@suse.de> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Cc: Peter Zijlstra <peterz@infradead.com> Cc: MinChan Kim <minchan.kim@gmail.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Cc: Greg Ungerer <gerg@snapgear.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-01 08:59:15 -07:00
Hugh Dickins	851a039cc5	mm: page_mkwrite change prototype to match fault: fix sysfs Fix warnings and return values in sysfs bin_page_mkwrite(), fixing fs/sysfs/bin.c: In function `bin_page_mkwrite': fs/sysfs/bin.c:250: warning: passing argument 2 of `bb->vm_ops->page_mkwrite' from incompatible pointer type fs/sysfs/bin.c: At top level: fs/sysfs/bin.c:280: warning: initialization from incompatible pointer type Expects to have my [PATCH next] sysfs: fix some bin_vm_ops errors Signed-off-by: Hugh Dickins <hugh@veritas.com> Cc: Nick Piggin <npiggin@suse.de> Cc: "Eric W. Biederman" <ebiederm@aristanetworks.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-01 08:59:14 -07:00
Nick Piggin	56a76f8275	fs: fix page_mkwrite error cases in core code and btrfs page_mkwrite is called with neither the page lock nor the ptl held. This means a page can be concurrently truncated or invalidated out from underneath it. Callers are supposed to prevent truncate races themselves, however previously the only thing they can do in case they hit one is to raise a SIGBUS. A sigbus is wrong for the case that the page has been invalidated or truncated within i_size (eg. hole punched). Callers may also have to perform memory allocations in this path, where again, SIGBUS would be wrong. The previous patch ("mm: page_mkwrite change prototype to match fault") made it possible to properly specify errors. Convert the generic buffer.c code and btrfs to return sane error values (in the case of page removed from pagecache, VM_FAULT_NOPAGE will cause the fault handler to exit without doing anything, and the fault will be retried properly). This fixes core code, and converts btrfs as a template/example. All other filesystems defining their own page_mkwrite should be fixed in a similar manner. Acked-by: Chris Mason <chris.mason@oracle.com> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-01 08:59:14 -07:00
Nick Piggin	c2ec175c39	mm: page_mkwrite change prototype to match fault Change the page_mkwrite prototype to take a struct vm_fault, and return VM_FAULT_xxx flags. There should be no functional change. This makes it possible to return much more detailed error information to the VM (and also can provide more information eg. virtual_address to the driver, which might be important in some special cases). This is required for a subsequent fix. And will also make it easier to merge page_mkwrite() with fault() in future. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Chris Mason <chris.mason@oracle.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <joel.becker@oracle.com> Cc: Artem Bityutskiy <dedekind@infradead.org> Cc: Felix Blyakher <felixb@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-01 08:59:14 -07:00
Ravikiran G Thirumalai	2584e51732	mm: reintroduce and deprecate rlimit based access for SHM_HUGETLB Allow non root users with sufficient mlock rlimits to be able to allocate hugetlb backed shm for now. Deprecate this though. This is being deprecated because the mlock based rlimit checks for SHM_HUGETLB is not consistent with mmap based huge page allocations. Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org> Reviewed-by: Mel Gorman <mel@csn.ul.ie> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: Adam Litke <agl@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-01 08:59:12 -07:00
Ravikiran G Thirumalai	8a0bdec194	mm: fix SHM_HUGETLB to work with users in hugetlb_shm_group Fix hugetlb subsystem so that non root users belonging to hugetlb_shm_group can actually allocate hugetlb backed shm. Currently non root users cannot even map one large page using SHM_HUGETLB when they belong to the gid in /proc/sys/vm/hugetlb_shm_group. This is because allocation size is verified against RLIMIT_MEMLOCK resource limit even if the user belongs to hugetlb_shm_group. This patch 1. Fixes hugetlb subsystem so that users with CAP_IPC_LOCK and users belonging to hugetlb_shm_group don't need to be restricted with RLIMIT_MEMLOCK resource limits 2. This patch also disables mlock based rlimit checking (which will be reinstated and marked deprecated in a subsequent patch). Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org> Reviewed-by: Mel Gorman <mel@csn.ul.ie> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: Adam Litke <agl@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-01 08:59:12 -07:00
Edward Shishkin	e3a7cca1ef	vfs: add/use account_page_dirtied() Add a helper function account_page_dirtied(). Use that from two callsites. reiser4 adds a function which adds a third callsite. Signed-off-by: Edward Shishkin<edward.shishkin@gmail.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-01 08:59:12 -07:00
Alexey Dobriyan	0f043a81eb	proc tty: remove struct tty_operations::read_proc struct tty_operations::proc_fops took it's place and there is one less create_proc_read_entry() user now! Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-01 08:59:10 -07:00
Alexey Dobriyan	ae149b6bec	proc tty: add struct tty_operations::proc_fops Used for gradual switch of TTY drivers from using ->read_proc which helps with gradual switch from ->read_proc for the whole tree. As side effect, fix possible race condition when ->data initialized after PDE is hooked into proc tree. ->proc_fops takes precedence over ->read_proc. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-01 08:59:08 -07:00
Dmitri Vorobiev	ced117c73e	Remove two unneeded exports and make two symbols static in fs/mpage.c Commit `29a814d2ee` (vfs: add hooks for ext4's delayed allocation support) exported the following functions mpage_bio_submit() __mpage_writepage() for the benefit of ext4's delayed allocation support. Since commit `a1d6cc563b` (ext4: Rework the ext4_da_writepages() function), these functions are not used by the ext4 driver anymore. However, the now unnecessary exports still remain, and this patch removes those. Moreover, these two functions can become static again. The issue was spotted by namespacecheck. Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@movial.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-04-01 07:38:54 -04:00
Al Viro	47e4491b40	Cleanup after commit `585d3bc06f` fsync_bdev() export and a bunch of stubs for !CONFIG_BLOCK case had been left behind Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-04-01 07:07:16 -04:00
Al Viro	e5824c97a9	Trim includes of fdtable.h Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-03-31 23:00:28 -04:00
Al Viro	d9e66c7296	Don't crap into descriptor table in binfmt_som Same story as in binfmt_elf, except that in binfmt_som we actually forget to close the sucker. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-03-31 23:00:28 -04:00
Al Viro	d0f35dde6e	Trim includes in binfmt_elf Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-03-31 23:00:27 -04:00
Al Viro	e7b9b550f5	Don't mess with descriptor table in load_elf_binary() ... since we don't tell anyone which descriptor does the file get. We used to, but only in case of ELF binary with a.out loader and that stuff has been gone for a while. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-03-31 23:00:27 -04:00
Al Viro	5ad4e53bd5	Get rid of indirect include of fs_struct.h Don't pull it in sched.h; very few files actually need it and those can include directly. sched.h itself only needs forward declaration of struct fs_struct; Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-03-31 23:00:27 -04:00
Al Viro	ce3b0f8d5c	New helper - current_umask() current->fs->umask is what most of fs_struct users are doing. Put that into a helper function. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-03-31 23:00:26 -04:00
Al Viro	f1191b50ec	check_unsafe_exec() doesn't care about signal handlers sharing ... since we'll unshare sighand anyway Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-03-31 23:00:26 -04:00
Al Viro	498052bba5	New locking/refcounting for fs_struct * all changes of current->fs are done under task_lock and write_lock of old fs->lock * refcount is not atomic anymore (same protection) * its decrements are done when removing reference from current; at the same time we decide whether to free it. * put_fs_struct() is gone * new field - ->in_exec. Set by check_unsafe_exec() if we are trying to do execve() and only subthreads share fs_struct. Cleared when finishing exec (success and failure alike). Makes CLONE_FS fail with -EAGAIN if set. * check_unsafe_exec() may fail with -EAGAIN if another execve() from subthread is in progress. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-03-31 23:00:26 -04:00
Al Viro	3e93cd6718	Take fs_struct handling to new file (fs/fs_struct.c) Pure code move; two new helper functions for nfsd and daemonize (unshare_fs_struct() and daemonize_fs_struct() resp.; for now - the same code as used to be in callers). unshare_fs_struct() exported (for nfsd, as copy_fs_struct()/exit_fs() used to be), copy_fs_struct() and exit_fs() don't need exports anymore. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-03-31 23:00:26 -04:00
Al Viro	f8ef3ed2be	Get rid of bumping fs_struct refcount in pivot_root(2) Not because execve races with _that_ are serious - we really need a situation when final drop of fs_struct refcount is done by something that used to have it as current->fs. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-03-31 23:00:25 -04:00
Chris Mason	d57e62b897	Btrfs: try to free metadata pages when we free btree blocks COW means we cycle though blocks fairly quickly, and once we free an extent on disk, it doesn't make much sense to keep the pages around. This commit tries to immediately free the page when we free the extent, which lowers our memory footprint significantly. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-03-31 14:27:58 -04:00
Chris Mason	5a3f23d515	Btrfs: add extra flushing for renames and truncates Renames and truncates are both common ways to replace old data with new data. The filesystem can make an effort to make sure the new data is on disk before actually replacing the old data. This is especially important for rename, which many application use as though it were atomic for both the data and the metadata involved. The current btrfs code will happily replace a file that is fully on disk with one that was just created and still has pending IO. If we crash after transaction commit but before the IO is done, we'll end up replacing a good file with a zero length file. The solution used here is to create a list of inodes that need special ordering and force them to disk before the commit is done. This is similar to the ext3 style data=ordering, except it is only done on selected files. Btrfs is able to get away with this because it does not wait on commits very often, even for fsync (which use a sub-commit). For renames, we order the file when it wasn't already on disk and when it is replacing an existing file. Larger files are sent to filemap_flush right away (before the transaction handle is opened). For truncates, we order if the file goes from non-zero size down to zero size. This is a little different, because at the time of the truncate the file has no dirty bytes to order. But, we flag the inode so that it is added to the ordered list on close (via release method). We also immediately add it to the ordered list of the current transaction so that we can try to flush down any writes the application sneaks in before commit. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-03-31 14:27:58 -04:00
Boaz Harrosh	0d8fe329a8	fs: Add exofs to Kernel build - Add exofs to fs/Kconfig under "menu 'Miscellaneous filesystems'" - Add exofs to fs/Makefile Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2009-03-31 19:44:40 +03:00
Boaz Harrosh	214c8adb87	exofs: Documentation Added some documentation in exofs.txt, as well as a BUGS file. For further reading, operation instructions, example scripts and up to date infomation and code please see: http://open-osd.org Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2009-03-31 19:44:38 +03:00
Boaz Harrosh	8cf74b3936	exofs: export_operations implement export_operations and set in superblock. It is now posible to export exofs via nfs Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2009-03-31 19:44:36 +03:00
Boaz Harrosh	ba9e5e98ca	exofs: super_operations and file_system_type This patch ties all operation vectors into a file system superblock and registers the exofs file_system_type at module's load time. * The file system control block (AKA on-disk superblock) resides in an object with a special ID (defined in common.h). Information included in the file system control block is used to fill the in-memory superblock structure at mount time. This object is created before the file system is used by mkexofs.c It contains information such as: - The file system's magic number - The next inode number to be allocated Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2009-03-31 19:44:34 +03:00
Boaz Harrosh	e6af00f1d1	exofs: dir_inode and directory operations implementation of directory and inode operations. * A directory is treated as a file, and essentially contains a list of <file name, inode #> pairs for files that are found in that directory. The object IDs correspond to the files' inode numbers and are allocated using a 64bit incrementing global counter. * Each file's control block (AKA on-disk inode) is stored in its object's attributes. This applies to both regular files and other types (directories, device files, symlinks, etc.). Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2009-03-31 19:44:31 +03:00
Boaz Harrosh	beaec07ba6	exofs: address_space_operations OK Now we start to read and write from osd-objects. We try to collect at most contiguous pages as possible in a single write/read. The first page index is the object's offset. TODO: In 64-bit a single bio can carry at most 128 pages. Add support of chaining multiple bios Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2009-03-31 19:44:29 +03:00
Boaz Harrosh	982980d753	exofs: symlink_inode and fast_symlink_inode operations Generic implementation of symlink ops. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2009-03-31 19:44:27 +03:00
Boaz Harrosh	e806271916	exofs: file and file_inode operations implementation of the file_operations and inode_operations for regular data files. Most file_operations are generic vfs implementations except: - exofs_truncate will truncate the OSD object as well - Generic file_fsync is not good for none_bd devices so open code it - The default for .flush in Linux is todo nothing so call exofs_fsync on the file. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2009-03-31 19:44:24 +03:00
Boaz Harrosh	b14f8ab284	exofs: Kbuild, Headers and osd utils This patch includes osd infrastructure that will be used later by the file system. Also the declarations of constants, on disk structures, and prototypes. And the Kbuild+Kconfig files needed to build the exofs module. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2009-03-31 19:44:20 +03:00
Adrian Hunter	de0975781a	UBIFS: fix recovery bug UBIFS did not recovery in a situation in which it could have. The relevant function assumed there could not be more nodes in an eraseblock after a corrupted node, but in fact the last (NAND) page written might contain anything. The correct approach is to check for empty space (0xFF bytes) from then on. Signed-off-by: Adrian Hunter <adrian.hunter@nokia.com>	2009-03-31 14:58:40 +03:00
Felix Blyakher	1aacc064e0	Revert "xfs: increase the maximum number of supported ACL entries" This reverts commit `8b11217173`. Premature unintended commit. Signed-off-by: Felix Blyakher <felixb@sgi.com>	2009-03-31 00:23:37 -05:00
NeilBrown	bff61975b3	md: move lots of #include lines out of .h files and into .c This makes the includes more explicit, and is preparation for moving md_k.h to drivers/md/md.h Remove include/raid/md.h as its only remaining use was to #include other files. Signed-off-by: NeilBrown <neilb@suse.de>	2009-03-31 14:33:13 +11:00
Felix Blyakher	5123bc35d2	Merge branch 'master' of git://git.kernel.org/pub/scm/fs/xfs/xfs	2009-03-30 22:17:44 -05:00
Felix Blyakher	930861c4e6	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6	2009-03-30 22:08:33 -05:00
Linus Torvalds	d17abcd541	Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-cpumask * git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-cpumask: oprofile: Thou shalt not call __exit functions from __init functions cpumask: remove the now-obsoleted pcibus_to_cpumask(): generic cpumask: remove cpumask_t from core cpumask: convert rcutorture.c cpumask: use new cpumask_ functions in core code. cpumask: remove references to struct irqaction's mask field. cpumask: use mm_cpumask() wrapper: kernel/fork.c cpumask: use set_cpu_active in init/main.c cpumask: remove node_to_first_cpu cpumask: fix seq_bitmap_*() functions. cpumask: remove dangerous CPU_MASK_ALL_PTR, &CPU_MASK_ALL	2009-03-30 18:00:26 -07:00
Linus Torvalds	cf2f7d7c90	Merge branch 'proc-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/adobriyan/proc * 'proc-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/adobriyan/proc: Revert "proc: revert /proc/uptime to ->read_proc hook" proc 2/2: remove struct proc_dir_entry::owner proc 1/2: do PDE usecounting even for ->read_proc, ->write_proc proc: fix sparse warnings in pagemap_read() proc: move fs/proc/inode-alloc.txt comment into a source file	2009-03-30 16:06:04 -07:00
Jeff Mahoney	3a355cc61d	reiserfs: xattr_create is unused with xattrs disabled This patch ifdefs xattr_create when xattrs aren't enabled. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-30 14:28:58 -07:00
Alexey Dobriyan	a9caa3de24	Revert "proc: revert /proc/uptime to ->read_proc hook" This reverts commit `6c87df37dc`. proc files implemented through seq_file do pread(2) now. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>	2009-03-31 01:14:58 +04:00
Alexey Dobriyan	99b7623380	proc 2/2: remove struct proc_dir_entry::owner Setting ->owner as done currently (pde->owner = THIS_MODULE) is racy as correctly noted at bug #12454. Someone can lookup entry with NULL ->owner, thus not pinning enything, and release it later resulting in module refcount underflow. We can keep ->owner and supply it at registration time like ->proc_fops and ->data. But this leaves ->owner as easy-manipulative field (just one C assignment) and somebody will forget to unpin previous/pin current module when switching ->owner. ->proc_fops is declared as "const" which should give some thoughts. ->read_proc/->write_proc were just fixed to not require ->owner for protection. rmmod'ed directories will be empty and return "." and ".." -- no harm. And directories with tricky enough readdir and lookup shouldn't be modular. We definitely don't want such modular code. Removing ->owner will also make PDE smaller. So, let's nuke it. Kudos to Jeff Layton for reminding about this, let's say, oversight. http://bugzilla.kernel.org/show_bug.cgi?id=12454 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>	2009-03-31 01:14:44 +04:00
Alexey Dobriyan	3dec7f59c3	proc 1/2: do PDE usecounting even for ->read_proc, ->write_proc struct proc_dir_entry::owner is going to be removed. Now it's only necessary to protect PDEs which are using ->read_proc, ->write_proc hooks. However, ->owner assignments are racy and make it very easy for someone to switch ->owner on live PDE (as some subsystems do) without fixing refcounts and so on. http://bugzilla.kernel.org/show_bug.cgi?id=12454 So, ->owner is on death row. Proxy file operations exist already (proc_file_operations), just bump usecount when necessary. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>	2009-03-31 01:14:27 +04:00
Milind Arun Choudhary	09729a9919	proc: fix sparse warnings in pagemap_read() fs/proc/task_mmu.c:696:12: warning: cast removes address space of expression fs/proc/task_mmu.c:696:9: warning: incorrect type in assignment (different address spaces) fs/proc/task_mmu.c:696:9: expected unsigned long long [noderef] [usertype] <asn:1>out fs/proc/task_mmu.c:696:9: got unsigned long long [usertype] <noident> fs/proc/task_mmu.c:697:12: warning: cast removes address space of expression fs/proc/task_mmu.c:697:9: warning: incorrect type in assignment (different address spaces) fs/proc/task_mmu.c:697:9: expected unsigned long long [noderef] [usertype] <asn:1>end fs/proc/task_mmu.c:697:9: got unsigned long long [usertype] <noident> fs/proc/task_mmu.c:723:12: warning: cast removes address space of expression fs/proc/task_mmu.c:723:26: error: subtraction of different types can't work (different address spaces) fs/proc/task_mmu.c:725:24: error: subtraction of different types can't work (different address spaces) Signed-off-by: Milind Arun Choudhary <milindchoudhary@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>	2009-03-31 01:14:22 +04:00
Randy Dunlap	1681bc30f2	proc: move fs/proc/inode-alloc.txt comment into a source file so that people will realize that it exists and can update it as needed. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>	2009-03-31 01:13:12 +04:00
Benny Halevy	2076601632	nfsd: remove nfsd4_ops array size There's no need for it. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-03-30 17:03:11 -04:00
Linus Torvalds	e1c5024828	Merge branch 'reiserfs-updates' from Jeff Mahoney * reiserfs-updates: (35 commits) reiserfs: rename [cn]_* variables reiserfs: rename p_._ variables reiserfs: rename p_s_tb to tb reiserfs: rename p_s_inode to inode reiserfs: rename p_s_bh to bh reiserfs: rename p_s_sb to sb reiserfs: strip trailing whitespace reiserfs: cleanup path functions reiserfs: factor out buffer_info initialization reiserfs: add atomic addition of selinux attributes during inode creation reiserfs: use generic readdir for operations across all xattrs reiserfs: journaled xattrs reiserfs: use generic xattr handlers reiserfs: remove i_has_xattr_dir reiserfs: make per-inode xattr locking more fine grained reiserfs: eliminate per-super xattr lock reiserfs: simplify xattr internal file lookups/opens reiserfs: Clean up xattrs when REISERFS_FS_XATTR is unset reiserfs: remove IS_PRIVATE helpers reiserfs: remove link detection code ... Fixed up conflicts manually due to: - quota name cleanups vs variable naming changes: fs/reiserfs/inode.c fs/reiserfs/namei.c fs/reiserfs/stree.c fs/reiserfs/xattr.c - exported include header cleanups include/linux/reiserfs_fs.h	2009-03-30 12:33:01 -07:00
Jeff Mahoney	ee93961be1	reiserfs: rename [cn]_* variables This patch renames n_, c_, etc variables to something more sane. This is the sixth in a series of patches to rip out some of the awful variable naming in reiserfs. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-30 12:16:40 -07:00
Jeff Mahoney	d68caa9530	reiserfs: rename p_._ variables This patch is a simple s/p_._//g to the reiserfs code. This is the fifth in a series of patches to rip out some of the awful variable naming in reiserfs. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-30 12:16:40 -07:00
Jeff Mahoney	a063ae1792	reiserfs: rename p_s_tb to tb This patch is a simple s/p_s_tb/tb/g to the reiserfs code. This is the fourth in a series of patches to rip out some of the awful variable naming in reiserfs. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-30 12:16:40 -07:00
Jeff Mahoney	995c762ea4	reiserfs: rename p_s_inode to inode This patch is a simple s/p_s_inode/inode/g to the reiserfs code. This is the third in a series of patches to rip out some of the awful variable naming in reiserfs. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-30 12:16:39 -07:00
Jeff Mahoney	ad31a4fc03	reiserfs: rename p_s_bh to bh This patch is a simple s/p_s_bh/bh/g to the reiserfs code. This is the second in a series of patches to rip out some of the awful variable naming in reiserfs. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-30 12:16:39 -07:00
Jeff Mahoney	a9dd364358	reiserfs: rename p_s_sb to sb This patch is a simple s/p_s_sb/sb/g to the reiserfs code. This is the first in a series of patches to rip out some of the awful variable naming in reiserfs. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-30 12:16:39 -07:00
Jeff Mahoney	0222e6571c	reiserfs: strip trailing whitespace This patch strips trailing whitespace from the reiserfs code. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-30 12:16:39 -07:00
Jeff Mahoney	3cd6dbe6fe	reiserfs: cleanup path functions This patch cleans up some redundancies in the reiserfs tree path code. decrement_bcount() is essentially the same function as brelse(), so we use that instead. decrement_counters_in_path() is exactly the same function as pathrelse(), so we kill that and use pathrelse() instead. There's also a bit of cleanup that makes the code a bit more readable. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-30 12:16:39 -07:00
Jeff Mahoney	fba4ebb5f0	reiserfs: factor out buffer_info initialization This is the first in a series of patches to make balance_leaf() not quite so insane. This patch factors out the open coded initializations of buffer_info structures and defines a few initializers for the 4 cases they're used. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-30 12:16:39 -07:00
Jeff Mahoney	57fe60df62	reiserfs: add atomic addition of selinux attributes during inode creation Some time ago, some changes were made to make security inode attributes be atomically written during inode creation. ReiserFS fell behind in this area, but with the reworking of the xattr code, it's now fairly easy to add. The following patch adds the ability for security attributes to be added automatically during inode creation. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-30 12:16:39 -07:00
Jeff Mahoney	a41f1a4715	reiserfs: use generic readdir for operations across all xattrs The current reiserfs xattr implementation open codes reiserfs_readdir and frees the path before calling the filldir function. Typically, the filldir function is something that modifies the file system, such as a chown or an inode deletion that also require reading of an inode associated with each direntry. Since the file system is modified, the path retained becomes invalid for the next run. In addition, it runs backwards in attempt to minimize activity. This is clearly suboptimal from a code cleanliness perspective as well as performance-wise. This patch implements a generic reiserfs_for_each_xattr that uses the generic readdir and a specific filldir routine that simply populates an array of dentries and then performs a specific operation on them. When all files have been operated on, it then calls the operation on the directory itself. The result is a noticable code reduction and better performance. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-30 12:16:38 -07:00
Jeff Mahoney	0ab2621ebd	reiserfs: journaled xattrs Deadlocks are possible in the xattr code between the journal lock and the xattr sems. This patch implements journalling for xattr operations. The benefit is twofold: * It gets rid of the deadlock possibility by always ensuring that xattr write operations are initiated inside a transaction. * It corrects the problem where xattr backing files aren't considered any differently than normal files, despite the fact they are metadata. I discussed the added journal load with Chris Mason, and we decided that since xattrs (versus other journal activity) is fairly rare, the introduction of larger transactions to support journaled xattrs wouldn't be too big a deal. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-30 12:16:38 -07:00
Jeff Mahoney	48b32a3553	reiserfs: use generic xattr handlers Christoph Hellwig had asked me quite some time ago to port the reiserfs xattrs to the generic xattr interface. This patch replaces the reiserfs-specific xattr handling code with the generic struct xattr_handler. However, since reiserfs doesn't split the prefix and name when accessing xattrs, it can't leverage generic_{set,get,list,remove}xattr without needlessly reconstructing the name on the back end. Update 7/26/07: Added missing dput() to deletion path. Update 8/30/07: Added missing mark_inode_dirty when i_mode is used to represent an ACL and no previous ACL existed. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-30 12:16:38 -07:00
Jeff Mahoney	8ecbe550a1	reiserfs: remove i_has_xattr_dir With the changes to xattr root locking, the i_has_xattr_dir flag is no longer needed. This patch removes it. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-30 12:16:38 -07:00
Jeff Mahoney	8b6dd72a44	reiserfs: make per-inode xattr locking more fine grained The per-inode locking can be made more fine-grained to surround just the interaction with the filesystem itself. This really only applies to protecting reads during a write, since concurrent writes are barred with inode->i_mutex at the vfs level. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-30 12:16:38 -07:00
Jeff Mahoney	d984561b32	reiserfs: eliminate per-super xattr lock With the switch to using inode->i_mutex locking during lookups/creation in the xattr root, the per-super xattr lock is no longer needed. This patch removes it. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-30 12:16:38 -07:00
Jeff Mahoney	6c17675e1e	reiserfs: simplify xattr internal file lookups/opens The xattr file open/lookup code is needlessly complex. We can use vfs-level operations to perform the same work, and also simplify the locking constraints. The locking advantages will be exploited in future patches. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-30 12:16:37 -07:00
Jeff Mahoney	a72bdb1cd2	reiserfs: Clean up xattrs when REISERFS_FS_XATTR is unset The current reiserfs xattr implementation will not clean up old xattr files if files are deleted when REISERFS_FS_XATTR is unset. This results in inaccessible lost files, wasting space. This patch compiles in basic xattr knowledge, such as how to delete them and change ownership for quota tracking. If the file system has never used xattrs, then the operation is quite fast: it returns immediately when it sees there is no .reiserfs_priv directory. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-30 12:16:37 -07:00
Jeff Mahoney	6dfede6963	reiserfs: remove IS_PRIVATE helpers There are a number of helper functions for marking a reiserfs inode private that were leftover from reiserfs did its own thing wrt to private inodes. S_PRIVATE has been in the kernel for some time, so this patch removes the helpers and uses IS_PRIVATE instead. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-30 12:16:37 -07:00
Jeff Mahoney	010f5a21a3	reiserfs: remove link detection code Early in the reiserfs xattr development, there was a plan to use hardlinks to save disk space for identical xattrs. That code never materialized and isn't going to, so this patch removes the detection code. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-30 12:16:37 -07:00
Jeff Mahoney	ec6ea56b2f	reiserfs: xattr reiserfs_get_page takes offset instead of index This patch changes reiserfs_get_page to take an offset rather than an index since no callers calculate the index differently. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-30 12:16:37 -07:00
Jeff Mahoney	f437c529e3	reiserfs: small variable cleanup This patch removes the xinode and mapping variables from reiserfs_xattr_{get,set}. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-30 12:16:37 -07:00
Jeff Mahoney	0030b64570	reiserfs: use reiserfs_error() This patch makes many paths that are currently using warnings to handle the error. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-30 12:16:37 -07:00
Jeff Mahoney	1e5e59d431	reiserfs: introduce reiserfs_error() Although reiserfs can currently handle severe errors such as journal failure, it cannot handle less severe errors like metadata i/o failure. The following patch adds a reiserfs_error() function akin to the one in ext3. Subsequent patches will use this new error handler to handle errors more gracefully in general. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-30 12:16:36 -07:00
Jeff Mahoney	32e8b10629	reiserfs: rearrange journal abort This patch kills off reiserfs_journal_abort as it is never called, and combines __reiserfs_journal_abort_{soft,hard} into one function called reiserfs_abort_journal, which performs the same work. It is silent as opposed to the old version, since the message was always issued after a regular 'abort' message. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-30 12:16:36 -07:00
Jeff Mahoney	c3a9c2109f	reiserfs: rework reiserfs_panic ReiserFS panics can be somewhat inconsistent. In some cases: * a unique identifier may be associated with it * the function name may be included * the device may be printed separately This patch aims to make warnings more consistent. reiserfs_warning() prints the device name, so printing it a second time is not required. The function name for a warning is always helpful in debugging, so it is now automatically inserted into the output. Hans has stated that every warning should have a unique identifier. Some cases lack them, others really shouldn't have them. reiserfs_warning() now expects an id associated with each message. In the rare case where one isn't needed, "" will suffice. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-30 12:16:36 -07:00
Jeff Mahoney	78b6513d28	reiserfs: add locking around error buffer The formatting of the error buffer is race prone. It uses static buffers for both formatting and output. While overwriting the error buffer can product garbled output, overwriting the format buffer with incompatible % directives can cause crashes. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-30 12:16:36 -07:00
Jeff Mahoney	cacbe3d7ad	reiserfs: prepare_error_buf wrongly consumes va_arg vsprintf will consume varargs on its own. Skipping them manually results in garbage in the error buffer, or Oopses in the case of pointers. This patch removes the advancement and fixes a number of bugs where crashes were observed as side effects of a regular error report. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-30 12:16:36 -07:00
Jeff Mahoney	45b03d5e8e	reiserfs: rework reiserfs_warning ReiserFS warnings can be somewhat inconsistent. In some cases: * a unique identifier may be associated with it * the function name may be included * the device may be printed separately This patch aims to make warnings more consistent. reiserfs_warning() prints the device name, so printing it a second time is not required. The function name for a warning is always helpful in debugging, so it is now automatically inserted into the output. Hans has stated that every warning should have a unique identifier. Some cases lack them, others really shouldn't have them. reiserfs_warning() now expects an id associated with each message. In the rare case where one isn't needed, "" will suffice. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-30 12:16:36 -07:00
Jeff Mahoney	1d889d9958	reiserfs: make some warnings informational In several places, reiserfs_warning is used when there is no warning, just a notice. This patch changes some of them to indicate that the message is merely informational. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-30 12:16:35 -07:00
Jeff Mahoney	a5437152ee	reiserfs: use more consistent printk formatting The output format between a warning/error/panic/info/etc changes with which one is used. The following patch makes the messages more internally consistent, but also more consistent with other Linux filesystems. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-30 12:16:35 -07:00
Jeff Mahoney	eba0030559	reiserfs: use buffer_info for leaf_paste_entries This patch makes leaf_paste_entries more consistent with respect to the other leaf operations. Using buffer_info instead of buffer_head directly allows us to get a superblock pointer for use in error handling. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-30 12:16:35 -07:00
Jeff Mahoney	600ed41675	reiserfs: audit transaction ids to always be unsigned ints This patch fixes up the reiserfs code such that transaction ids are always unsigned ints. In places they can currently be signed ints or unsigned longs. The former just causes an annoying clm-2200 warning and may join a transaction when it should wait. The latter is just for correctness since the disk format uses a 32-bit transaction id. There aren't any runtime problems that result from it not wrapping at the correct location since the value is truncated correctly even on big endian systems. The 0 value might make it to disk, but the mount-time checks will bump it to 10 itself. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-30 12:16:35 -07:00
Jeff Mahoney	702d21c6f6	reiserfs: add support for mount count incrementing The following patch adds the fields for tracking mount counts and last fsck timestamps to the superblock. It also increments the mount count on every read-write mount. Reiserfsprogs 3.6.21 added support for these fields. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-30 12:16:35 -07:00
Linus Torvalds	2d25ee36c8	Merge branch 'bkl-removal' of git://git.lwn.net/linux-2.6 * 'bkl-removal' of git://git.lwn.net/linux-2.6: Fix a lockdep warning in fasync_helper() Add a missing unlock_kernel() in raw_open()	2009-03-30 11:31:47 -07:00
Linus Torvalds	b80e0d2716	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: fix fuse_file_lseek returning with lock held	2009-03-30 10:08:10 -07:00
Linus Torvalds	ffd1428514	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6: jfs: needs crc32_le jfs: Fix error handling in metapage_writepage() jfs: return f_fsid for statfs(2) jfs: remove xtLookupList() jfs: clean up a dangling comment	2009-03-30 10:02:36 -07:00
Dan Carpenter	5291658d87	fuse: fix fuse_file_lseek returning with lock held This bug was found with smatch (http://repo.or.cz/w/smatch.git/). If we return directly the inode->i_mutex lock doesn't get released. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: stable@kernel.org	2009-03-30 17:26:24 +02:00
Jonathan Corbet	4a6a449969	Fix a lockdep warning in fasync_helper() Lockdep gripes if file->f_lock is taken in a no-IRQ situation, since that is not always the case. We don't really want to disable IRQs for every acquisition of f_lock; instead, just move it outside of fasync_lock. Reported-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Reported-by: Larry Finger <Larry.Finger@lwfinger.net> Reported-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>	2009-03-30 08:00:24 -06:00
Tero Roponen	a4e49cb69e	trivial: remove unused variable 'path' in alloc_file() 'struct path' is not used in alloc_file(). Signed-off-by: Tero Roponen <tero.roponen@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2009-03-30 15:22:03 +02:00
Masatake YAMATO	3e3cb64f6c	trivial: fix a pdlfush -> pdflush typo in comment Signed-off-by: Masatake YAMATO <yamato@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2009-03-30 15:22:03 +02:00
Alberto Bertogli	c7eee1b836	trivial: Fix typo in bio_split()'s documentation Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2009-03-30 15:22:02 +02:00
Matt LaPlante	692105b8ac	trivial: fix typos/grammar errors in Kconfig texts Signed-off-by: Matt LaPlante <kernel1@cyberdogtech.com> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2009-03-30 15:22:01 +02:00
Uwe Kleine-Koenig	973c32bebf	trivial: fix typo "kernal" -> "kernel" Signed-off-by: Uwe Kleine-Koenig <Uwe.Kleine-Koenig@digi.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2009-03-30 15:21:57 +02:00
Rusty Russell	af76aba00f	cpumask: fix seq_bitmap_*() functions. 1) seq_bitmap_list() should take a const. 2) All the seq_bitmap should use cpumask_bits(). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2009-03-30 22:05:11 +10:30
Christoph Hellwig	27174203f5	xfs: cleanup uuid handling The uuid table handling should not be part of a semi-generic uuid library but in the XFS code using it, so move those bits to xfs_mount.c and refactor the whole glob to make it a proper abstraction. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Felix Blyakher <felixb@sgi.com>	2009-03-30 10:21:31 +02:00
Andy Adamson	e354d571bb	nfsd: embed nfsd4_current_state in nfsd4_compoundres Remove the allocation of struct nfsd4_compound_state. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-03-29 16:20:12 -04:00
Christoph Hellwig	1a5902c5d2	xfs: remove m_attroffset With the upcoming v3 inodes the default attroffset needs to be calculated for each specific inode, so we can't cache it in the superblock anymore. Also replace the assert for wrong inode sizes with a proper error check also included in non-debug builds. Note that the ENOSYS return for that might seem odd, but that error is returned by xfs_mount_validate_sb for all theoretically valid but not supported filesystem geometries. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net>	2009-03-29 19:26:46 +02:00
Malcolm Parsons	9da096fd13	xfs: fix various typos Signed-off-by: Malcolm Parsons <malcolm.parsons@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2009-03-29 09:55:42 +02:00
Hisashi Hifumi	bddaafa11a	xfs: pagecache usage optimization Hi. I introduced "is_partially_uptodate" aops for XFS. A page can have multiple buffers and even if a page is not uptodate, some buffers can be uptodate on pagesize != blocksize environment. This aops checks that all buffers which correspond to a part of a file that we want to read are uptodate. If so, we do not have to issue actual read IO to HDD even if a page is not uptodate because the portion we want to read are uptodate. "block_is_partially_uptodate" function is already used by ext2/3/4. With the following patch random read/write mixed workloads or random read after random write workloads can be optimized and we can get performance improvement. I did a performance test using the sysbench. #sysbench --num-threads=4 --max-requests=100000 --test=fileio --file-num=1 \ --file-block-size=8K --file-total-size=1G --file-test-mode=rndrw \ --file-fsync-freq=0 --file-rw-ratio=0.5 run -2.6.29-rc6 Test execution summary: total time: 123.8645s total number of events: 100000 total time taken by event execution: 442.4994 per-request statistics: min: 0.0000s avg: 0.0044s max: 0.3387s approx. 95 percentile: 0.0118s -2.6.29-rc6-patched Test execution summary: total time: 108.0757s total number of events: 100000 total time taken by event execution: 417.7505 per-request statistics: min: 0.0000s avg: 0.0042s max: 0.3217s approx. 95 percentile: 0.0118s arch: ia64 pagesize: 16k blocksize: 4k Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Felix Blyakher <felixb@sgi.com>	2009-03-29 09:53:38 +02:00
Christoph Hellwig	6447c36209	xfs: remove m_litino With the upcoming v3 inodes the inode data/attr area size needs to be calculated for each specific inode, so we can't cache it in the superblock anymore. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Felix Blyakher <felixb@sgi.com>	2009-03-29 09:51:14 +02:00
Christoph Hellwig	a19d9f887d	xfs: kill ino64 mount option The ino64 mount option adds a fixed offset to 32bit inode numbers to bring them into the 64bit range. There's no need for this kind of debug tool given that it's easy to produce real 64bit inode numbers for testing. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Felix Blyakher <felixb@sgi.com>	2009-03-29 09:51:08 +02:00
Christoph Hellwig	a0b0b8a5b3	xfs: kill mutex_t typedef People continue to complain about this for weird reasons, but there's really no point in keeping this typedef for a couple of users anyway. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Felix Blyakher <felixb@sgi.com>	2009-03-29 09:51:00 +02:00
Hugh Dickins	7c2c7d9930	fix setuid sometimes wouldn't check_unsafe_exec() also notes whether the fs_struct is being shared by more threads than will get killed by the exec, and if so sets LSM_UNSAFE_SHARE to make bprm_set_creds() careful about euid. But /proc/<pid>/cwd and /proc/<pid>/root lookups make transient use of get_fs_struct(), which also raises that sharing count. This might occasionally cause a setuid program not to change euid, in the same way as happened with files->count (check_unsafe_exec also looks at sighand->count, but /proc doesn't raise that one). We'd prefer exec not to unshare fs_struct: so fix this in procfs, replacing get_fs_struct() by get_fs_path(), which does path_get while still holding task_lock, instead of raising fs->count. Signed-off-by: Hugh Dickins <hugh@veritas.com> Cc: stable@kernel.org ___ fs/proc/base.c \| 50 +++++++++++++++-------------------------------- 1 file changed, 16 insertions(+), 34 deletions(-) Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-28 17:30:00 -07:00
Hugh Dickins	e426b64c41	fix setuid sometimes doesn't Joe Malicki reports that setuid sometimes doesn't: very rarely, a setuid root program does not get root euid; and, by the way, they have a health check running lsof every few minutes. Right, check_unsafe_exec() notes whether the files_struct is being shared by more threads than will get killed by the exec, and if so sets LSM_UNSAFE_SHARE to make bprm_set_creds() careful about euid. But /proc/<pid>/fd and /proc/<pid>/fdinfo lookups make transient use of get_files_struct(), which also raises that sharing count. There's a rather simple fix for this: exec's check on files->count has been redundant ever since 2.6.1 made it unshare_files() (except while compat_do_execve() omitted to do so) - just remove that check. [Note to -stable: this patch will not apply before 2.6.29: earlier releases should just remove the files->count line from unsafe_exec().] Reported-by: Joe Malicki <jmalicki@metacarta.com> Narrowed-down-by: Michael Itz <mitz@metacarta.com> Tested-by: Joe Malicki <jmalicki@metacarta.com> Signed-off-by: Hugh Dickins <hugh@veritas.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-28 17:30:00 -07:00
Hugh Dickins	53e9309e01	compat_do_execve should unshare_files 2.6.26's commit `fd8328be87` "sanitize handling of shared descriptor tables in failing execve()" moved the unshare_files() from flush_old_exec() and several binfmts to the head of do_execve(); but forgot to make the same change to compat_do_execve(), leaving a CLONE_FILES files_struct shared across exec from a 32-bit process on a 64-bit kernel. It's arguable whether the files_struct really ought to be unshared across exec; but 2.6.1 made that so to stop the loading binary's fd leaking into other threads, and a 32-bit process on a 64-bit kernel ought to behave in the same way as 32 on 32 and 64 on 64. Signed-off-by: Hugh Dickins <hugh@veritas.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-28 17:30:00 -07:00
Chuck Lever	3c8c45dfab	NFS: Simplify logic to compare socket addresses in client.c Callback requests from IPv4 servers are now always guaranteed to be AF_INET, and never mapped IPv4 AF_INET6 addresses. Both nfs_match_client() and nfs_find_client() can now share the same address comparison logic, so fold them together. We can also dispense with of most of the conditional compilation in here. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-03-28 16:51:04 -04:00
Trond Myklebust	d188262d60	Merge commit '9f4c899c0d90e1b51b6864834f3877b47c161a0e' into devel	2009-03-28 16:50:58 -04:00
Chuck Lever	f738f51703	NFS: Start PF_INET6 callback listener only if IPv6 support is available Apparently a lot of people need to disable IPv6 completely on their distributor-built systems, which have CONFIG_IPV6_MODULE enabled at build time. They do this by blacklisting the ipv6.ko module. This causes the creation of the NFSv4 callback service listener to fail if CONFIG_IPV6_MODULE is set, but the module cannot be loaded. Now that the kernel's PF_INET6 RPC listeners are completely separate from PF_INET listeners, we can always start PF_INET. Then the NFS client can try to start a PF_INET6 listener, but it isn't required to be available. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-03-28 16:02:43 -04:00
Chuck Lever	eb16e90778	lockd: Start PF_INET6 listener only if IPv6 support is available Apparently a lot of people need to disable IPv6 completely on their distributor-built systems, which have CONFIG_IPV6_MODULE enabled at build time. They do this by blacklisting the ipv6.ko module. This causes the creation of the lockd service listener to fail if CONFIG_IPV6_MODULE is set, but the module cannot be loaded. Now that the kernel's PF_INET6 RPC listeners are completely separate from PF_INET listeners, we can always start PF_INET. Then lockd can try to start PF_INET6, but it isn't required to be available. Note this has the added benefit that NLM callbacks from AF_INET6 servers will never come from AF_INET remotes. We no longer have to worry about matching mapped IPv4 addresses to AF_INET when comparing addresses. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-03-28 16:01:16 -04:00
Chuck Lever	26298caaca	NFS: Revert creation of IPv6 listeners for lockd and NFSv4 callbacks We're about to convert over to using separate PF_INET and PF_INET6 listeners, instead of a single PF_INET6 listener that also receives AF_INET requests and maps them to AF_INET6. Clear the way by removing the logic in lockd and the NFSv4 callback server that creates an AF_INET6 service listener. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-03-28 15:55:06 -04:00
Chuck Lever	49a9072f29	SUNRPC: Remove @family argument from svc_create() and svc_create_pooled() Since an RPC service listener's protocol family is specified now via svc_create_xprt(), it no longer needs to be passed to svc_create() or svc_create_pooled(). Remove that argument from the synopsis of those functions, and remove the sv_family field from the svc_serv struct. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-03-28 15:54:48 -04:00
Chuck Lever	9652ada3fb	SUNRPC: Change svc_create_xprt() to take a @family argument The sv_family field is going away. Pass a protocol family argument to svc_create_xprt() instead of extracting the family from the passed-in svc_serv struct. Again, as this is a listener socket and not an address, we make this new argument an "int" protocol family, instead of an "sa_family_t." Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-03-28 15:54:36 -04:00
Chuck Lever	adbbe92956	NFSD: If port value written to /proc/fs/nfsd/portlist is invalid, return EINVAL Make sure port value read from user space by write_ports is valid before passing it to svc_find_xprt(). If it wasn't, the writer would get ENOENT instead of EINVAL. Noticed-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-03-28 15:53:42 -04:00
Theodore Ts'o	06705bff91	ext4: Regularize mount options Add support for using the mount options "barrier" and "nobarrier", and "auto_da_alloc" and "noauto_da_alloc", which is more consistent than "barrier=<0\|1>" or "auto_da_alloc=<0\|1>". Most other ext3/ext4 mount options use the foo/nofoo naming convention. We allow the old forms of these mount options for backwards compatibility. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-03-28 10:59:57 -04:00
Theodore Ts'o	512a004382	ext3: Use WRITE_SYNC for commits which are caused by fsync() If a commit is triggered by fsync(), set a flag indicating the journal blocks associated with the transaction should be flushed out using WRITE_SYNC. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Acked-by: Jan Kara <jack@suse.cz>	2009-03-27 22:14:27 -04:00
Theodore Ts'o	a64c8610bd	block_write_full_page: Use synchronous writes for WBC_SYNC_ALL writebacks When doing synchronous writes because wbc->sync_mode is set to WBC_SYNC_ALL, send the write request using WRITE_SYNC, so that we don't unduly block system calls such as fsync(). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Acked-by: Jan Kara <jack@suse.cz>	2009-03-27 22:14:10 -04:00
Theodore Ts'o	e7c9e3e99a	ext4: fix locking typo in mballoc which could cause soft lockup hangs Smatch (http://repo.or.cz/w/smatch.git/) complains about the locking in ext4_mb_add_n_trim() from fs/ext4/mballoc.c 4438 list_for_each_entry_rcu(tmp_pa, &lg->lg_prealloc_list[order], 4439 pa_inode_list) { 4440 spin_lock(&tmp_pa->pa_lock); 4441 if (tmp_pa->pa_deleted) { 4442 spin_unlock(&pa->pa_lock); 4443 continue; 4444 } Brown paper bag time... Reported-by: Dan Carpenter <error27@gmail.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org	2009-03-27 19:43:21 -04:00
Dan Carpenter	a7b19448dd	ext4: fix typo which causes a memory leak on error path This was found by smatch (http://repo.or.cz/w/smatch.git/) Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org	2009-03-27 19:42:54 -04:00
Linus Torvalds	3ae5080f4c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (37 commits) fs: avoid I_NEW inodes Merge code for single and multiple-instance mounts Remove get_init_pts_sb() Move common mknod_ptmx() calls into caller Parse mount options just once and copy them to super block Unroll essentials of do_remount_sb() into devpts vfs: simple_set_mnt() should return void fs: move bdev code out of buffer.c constify dentry_operations: rest constify dentry_operations: configfs constify dentry_operations: sysfs constify dentry_operations: JFS constify dentry_operations: OCFS2 constify dentry_operations: GFS2 constify dentry_operations: FAT constify dentry_operations: FUSE constify dentry_operations: procfs constify dentry_operations: ecryptfs constify dentry_operations: CIFS constify dentry_operations: AFS ...	2009-03-27 16:23:12 -07:00
Felix Blyakher	8b11217173	xfs: increase the maximum number of supported ACL entries With big installation current 25 maximum number of supported ACL entries is not enough any more. Increase the limit to 100. Signed-off-by: Felix Blyakher <felixb@sgi.com>	2009-03-27 17:28:43 -05:00
Linus Torvalds	2c9e15a011	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-quota-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-quota-2.6: (27 commits) ext2: Zero our b_size in ext2_quota_read() trivial: fix typos/grammar errors in fs/Kconfig quota: Coding style fixes quota: Remove superfluous inlines quota: Remove uppercase aliases for quota functions. nfsd: Use lowercase names of quota functions jfs: Use lowercase names of quota functions udf: Use lowercase names of quota functions ufs: Use lowercase names of quota functions reiserfs: Use lowercase names of quota functions ext4: Use lowercase names of quota functions ext3: Use lowercase names of quota functions ext2: Use lowercase names of quota functions ramfs: Remove quota call vfs: Use lowercase names of quota functions quota: Remove dqbuf_t and other cleanups quota: Remove NODQUOT macro quota: Make global quota locks cacheline aligned quota: Move quota files into separate directory ext4: quota reservation for delayed allocation ...	2009-03-27 14:48:34 -07:00
Linus Torvalds	805de022b1	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm: dlm: fix length calculation in compat code dlm: ignore cancel on granted lock dlm: clear defunct cancel state dlm: replace idr with hash table for connections dlm: comment typo fixes dlm: use ipv6_addr_copy dlm: Change rwlock which is only used in write mode to a spinlock	2009-03-27 14:48:07 -07:00
Jan Kara	86db97c87f	jbd2: Update locking coments Update information about locking in JBD2 revoke code. Inconsistency in comments found by Lin Tan <tammy000@gmail.com>. CC: Lin Tan <tammy000@gmail.com>. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-03-27 17:20:40 -04:00
Aneesh Kumar K.V	cc0fb9ad7d	ext4: Rename pa_linear to pa_type Impact: code cleanup This patch rename pa_linear to pa_type and add MB_INODE_PA and MB_GROUP_PA to indicate inode and group prealloc space. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-03-27 17:16:58 -04:00
Thiemo Nagel	fe2c8191fa	ext4: add checks of block references for non-extent inodes Check block references in the inode and indorect blocks for non-extent inodes to make sure they are valid, and flag an error if they are invalid. Signed-off-by: Thiemo Nagel <thiemo.nagel@ph.tum.de> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-03-31 08:36:10 -04:00
Nick Piggin	aabb8fdb41	fs: avoid I_NEW inodes To be on the safe side, it should be less fragile to exclude I_NEW inodes from inode list scans by default (unless there is an important reason to have them). Normally they will get excluded (eg. by zero refcount or writecount etc), however it is a bit fragile for list walkers to know exactly what parts of the inode state is set up and valid to test when in I_NEW. So along these lines, move I_NEW checks upward as well (sometimes taking I_FREEING etc checks with them too -- this shouldn't be a problem should it?) Signed-off-by: Nick Piggin <npiggin@suse.de> Acked-by: Jan Kara <jack@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-03-27 14:44:05 -04:00
Sukadev Bhattiprolu	1bd7903560	Merge code for single and multiple-instance mounts new_pts_mount() (including the get_sb_nodev()), shares a lot of code with init_pts_mount(). The only difference between them is the 'test-super' function passed into sget(). Move all common code into devpts_get_sb() and remove the new_pts_mount() and init_pts_mount() functions, Changelog[v3]: [Serge Hallyn]: Remove unnecessary printk()s Changelog[v2]: (Christoph Hellwig): Merge code in 'do_pts_mount()' into devpts_get_sb() Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Tested-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-03-27 14:44:04 -04:00
Sukadev Bhattiprolu	289f00e225	Remove get_init_pts_sb() With mknod_ptmx() moved to devpts_get_sb(), init_pts_mount() becomes a wrapper around get_init_pts_sb(). Remove get_init_pts_sb() and fold code into init_pts_mount(). Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-03-27 14:44:04 -04:00
Sukadev Bhattiprolu	945cf2c79f	Move common mknod_ptmx() calls into caller We create 'ptmx' node in both single-instance and multiple-instance mounts. So devpts_get_sb() can call mknod_ptmx() once rather than have both modes calling mknod_ptmx() separately. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-03-27 14:44:04 -04:00
Sukadev Bhattiprolu	482984f06d	Parse mount options just once and copy them to super block Since all the mount option parsing is done in devpts, we could do it just once and pass it around in devpts functions and eventually store it in the super block. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-03-27 14:44:04 -04:00
Sukadev Bhattiprolu	fdbf534866	Unroll essentials of do_remount_sb() into devpts On remount, devpts fs only needs to parse the mount options. Users cannot directly create/dirty files in /dev/pts so the MS_RDONLY flag and shrinking the dcache does not really apply to devpts. So effectively on remount, devpts only parses the mount options and updates these options in its super block. As such, we could replace do_remount_sb() call with a direct parse_mount_options(). Doing so enables subsequent patches to avoid parsing the mount options twice and simplify the code. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-03-27 14:44:03 -04:00
Sukadev Bhattiprolu	a3ec947c85	vfs: simple_set_mnt() should return void simple_set_mnt() is defined as returning 'int' but always returns 0. Callers assume simple_set_mnt() never fails and don't properly cleanup if it were to _ever_ fail. For instance, get_sb_single() and get_sb_nodev() should: up_write(sb->s_unmount); deactivate_super(sb); if simple_set_mnt() fails. Since simple_set_mnt() never fails, would be cleaner if it did not return anything. [akpm@linux-foundation.org: fix build] Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-03-27 14:44:03 -04:00
Nick Piggin	585d3bc06f	fs: move bdev code out of buffer.c Move some block device related code out from buffer.c and put it in block_dev.c. I'm trying to move non-buffer_head code out of buffer.c Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-03-27 14:44:03 -04:00
Al Viro	3ba13d179e	constify dentry_operations: rest Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-03-27 14:44:03 -04:00
Al Viro	296c2d8663	constify dentry_operations: configfs Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-03-27 14:44:03 -04:00
Al Viro	ee1ec32903	constify dentry_operations: sysfs Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-03-27 14:44:02 -04:00
Al Viro	ad28b4ef19	constify dentry_operations: JFS Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-03-27 14:44:02 -04:00
Al Viro	d8fba0ffe5	constify dentry_operations: OCFS2 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-03-27 14:44:02 -04:00
Al Viro	92cecbbfa3	constify dentry_operations: GFS2 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-03-27 14:44:02 -04:00
Al Viro	ce6cdc474a	constify dentry_operations: FAT Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-03-27 14:44:01 -04:00
Al Viro	4269590a72	constify dentry_operations: FUSE Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-03-27 14:44:01 -04:00
Al Viro	d72f71eb0e	constify dentry_operations: procfs Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-03-27 14:44:01 -04:00
Al Viro	5a3fd05a9b	constify dentry_operations: ecryptfs Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-03-27 14:44:01 -04:00
Al Viro	4fd03e84d8	constify dentry_operations: CIFS Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-03-27 14:44:01 -04:00
Al Viro	79be57cc7f	constify dentry_operations: AFS Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-03-27 14:44:00 -04:00
Al Viro	08f11513fa	constify dentry_operations: autofs, autofs4 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-03-27 14:44:00 -04:00
Al Viro	a488257ce5	constify dentry_operations: 9p Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-03-27 14:44:00 -04:00
Al Viro	e16404ed0f	constify dentry_operations: misc filesystems Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-03-27 14:44:00 -04:00
Al Viro	f786aa90e0	constify dentry_operations: NFS Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-03-27 14:43:59 -04:00
Sukadev Bhattiprolu	a9f184f02a	devpts: Must release s_umount on error We should drop the ->s_umount mutex if an error occurs after the sget()/grab_super() call. This was introduced when adding support for multiple instances of devpts and noticed during a code review/reorg. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-03-27 14:43:59 -04:00
Cheng Renquan	10f303ae1e	do_pipe cleanup: drop its last user in arch/alpha/ The last user of do_pipe is in arch/alpha/, after replacing it with do_pipe_flags, the do_pipe can be totally dropped. Signed-off-by: Cheng Renquan <crquan@gmail.com> Acked-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-03-27 14:43:58 -04:00
Duane Griffin	723be1f300	ufs: copy symlink data into the correct union member Copy symlink data into the union member it is accessed through. Although this shouldn't make a difference to behaviour it makes the code easier to follow and grep through. It may also prevent problems if the struct/union definitions change in the future. Signed-off-by: Duane Griffin <duaneg@dghda.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-03-27 14:43:58 -04:00
Duane Griffin	b12903f138	ufs: ensure fast symlinks are NUL-terminated Ensure fast symlink targets are NUL-terminated, even if corrupted on-disk. Signed-off-by: Duane Griffin <duaneg@dghda.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-03-27 14:43:58 -04:00
Duane Griffin	f33219b7a9	ufs: don't truncate longer ufs2 fast symlinks ufs2 fast symlinks can be twice as long as ufs ones, however the code was using the ufs size in various places. Fix that so ufs2 symlinks over 60 characters aren't truncated. Note that we copy the entire area instead of using the maxsymlinklen field from the superblock. This way we will be more robust against corruption (of the superblock). While we are at it, use memcpy instead of open-coding it with for loops. Signed-off-by: Duane Griffin <duaneg@dghda.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-03-27 14:43:58 -04:00
Duane Griffin	9e6766cc8c	ufs: validate maximum fast symlink size from superblock The maximum fast symlink size is set in the superblock of certain types of UFS filesystem. Before using it we need to check that it isn't longer than the available space we have in the inode. Signed-off-by: Duane Griffin <duaneg@dghda.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-03-27 14:43:57 -04:00
Christoph Hellwig	c8fe8f30c7	cleanup may_open Add a switch for the various i_mode fmt cases, and remove the comment about writeability of devices nodes - that part is handled in inode_permission and comment on (briefly) there. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-03-27 14:43:57 -04:00
Christoph Hellwig	b6520c8193	cleanup d_add_ci Make sure that comments describe what's going on and not how, and always use __d_instantiate instead of two separate branches, one with d_instantiate and one with __d_instantiate. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-03-27 14:43:57 -04:00
Christoph Hellwig	2b1c6bd77d	generic compat_sys_ustat Due to a different size of ino_t ustat needs a compat handler, but currently only x86 and mips provide one. Add a generic compat_sys_ustat and switch all architectures over to it. Instead of doing various user copy hacks compat_sys_ustat just reimplements sys_ustat as it's trivial. This was suggested by Arnd Bergmann. Found by Eric Sandeen when running xfstests/017 on ppc64, which causes stack smashing warnings on RHEL/Fedora due to the too large amount of data writen by the syscall. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-03-27 14:43:57 -04:00
Christoph Hellwig	ec1ab0abde	affs: fix missing unlocks in affs_remove_link In two error cases affs_remove_link doesn't call affs_unlock_dir to release the i_hash_lock semaphore. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-03-27 14:43:56 -04:00
Linus Torvalds	8e9d208972	Merge branch 'bkl-removal' of git://git.lwn.net/linux-2.6 * 'bkl-removal' of git://git.lwn.net/linux-2.6: Rationalize fasync return values Move FASYNC bit handling to f_op->fasync() Use f_lock to protect f_flags Rename struct file->f_ep_lock	2009-03-26 16:14:02 -07:00
Linus Torvalds	21cdbc1378	Merge branch 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6 * 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6: (81 commits) [S390] remove duplicated #includes [S390] cpumask: use mm_cpumask() wrapper [S390] cpumask: Use accessors code. [S390] cpumask: prepare for iterators to only go to nr_cpu_ids/nr_cpumask_bits. [S390] cpumask: remove cpu_coregroup_map [S390] fix clock comparator save area usage [S390] Add hwcap flag for the etf3 enhancement facility [S390] Ensure that ipl panic notifier is called late. [S390] fix dfp elf hwcap/facility bit detection [S390] smp: perform initial cpu reset before starting a cpu [S390] smp: fix memory leak on __cpu_up [S390] ipl: Improve checking logic and remove switch defaults. [S390] s390dbf: Remove needless check for NULL pointer. [S390] s390dbf: Remove redundant initilizations. [S390] use kzfree() [S390] BUG to BUG_ON changes [S390] zfcpdump: Prevent zcore from beeing built as a kernel module. [S390] Use csum_partial in checksum.h [S390] cleanup lowcore.h [S390] eliminate ipl_device from lowcore ...	2009-03-26 16:04:22 -07:00
Linus Torvalds	86d9c07017	Merge branch 'for-2.6.30' of git://git.kernel.dk/linux-2.6-block * 'for-2.6.30' of git://git.kernel.dk/linux-2.6-block: Get rid of pdflush_operation() in emergency sync and remount btrfs: get rid of current_is_pdflush() in btrfs_btree_balance_dirty Move the default_backing_dev_info out of readahead.c and into backing-dev.c block: Repeated lines in switching-sched.txt bsg: Remove bogus check against request_queue->max_sectors block: WARN in __blk_put_request() for potential bio leak loop: fix circular locking in loop_clr_fd() loop: support barrier writes bsg: add support for tail queuing cpqarray: enable bus mastering block: genhd.h cleanup patch block: add private bio_set for bio integrity allocations block: genhd.h comment needs updating block: get rid of unused blkdev_free_rq() define block: remove various blk_queue_*() setting functions in blk_init_queue_node() cciss: add BUILD_BUG_ON() for catching bad CommandList_struct alignment block: don't create bio_vec slabs of less than the inline number block: cleanup bio_alloc_bioset()	2009-03-26 16:03:04 -07:00
Linus Torvalds	13220a94d3	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1750 commits) ixgbe: Allow Priority Flow Control settings to survive a device reset net: core: remove unneeded include in net/core/utils.c. e1000e: update version number e1000e: fix close interrupt race e1000e: fix loss of multicast packets e1000e: commonize tx cleanup routine to match e1000 & igb netfilter: fix nf_logger name in ebt_ulog. netfilter: fix warning in ebt_ulog init function. netfilter: fix warning about invalid const usage e1000: fix close race with interrupt e1000: cleanup clean_tx_irq routine so that it completely cleans ring e1000: fix tx hang detect logic and address dma mapping issues bridge: bad error handling when adding invalid ether address bonding: select current active slave when enslaving device for mode tlb and alb gianfar: reallocate skb when headroom is not enough for fcb Bump release date to 25Mar2009 and version to 0.22 r6040: Fix second PHY address qeth: fix wait_event_timeout handling qeth: check for completion of a running recovery qeth: unregister MAC addresses during recovery. ... Manually fixed up conflicts in: drivers/infiniband/hw/cxgb3/cxio_hal.h drivers/infiniband/hw/nes/nes_nic.c	2009-03-26 15:54:36 -07:00
Linus Torvalds	39f15003c7	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: [CIFS] Fix memory overwrite when saving nativeFileSystem field during mount [CIFS] Rename compose_mount_options to cifs_compose_mount_options. [CIFS] work around bug in Samba server handling for posix open [CIFS] Use posix open on file open when server supports it cifs: fix buffer format byte on NT Rename/hardlink [CIFS] Add definitions for remoteably fsctl calls [CIFS] add extra null attr check [CIFS] fix build error [CIFS] reopen file via newer posix open protocol operation if available [CIFS] Add new nostrictsync cifs mount option to avoid slow SMB flush [CIFS] DFS no longer experimental [CIFS] Send SMB flush in cifs_fsync	2009-03-26 15:46:37 -07:00
Jan Kara	9e80d40773	ext3: Avoid starting a transaction in writepage when not necessary We don't have to start a transaction in writepage() when all the blocks are a properly allocated. Even in ordered mode either the data has been written via write() and they are thus already added to transaction's list or the data was written via mmap and then it's random in which transaction they get written anyway. This should help VM to pageout dirty memory without blocking on transaction commits. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-26 15:44:59 -07:00
David S. Miller	08abe18af1	Merge branch 'master' of /home/davem/src/GIT/linux-2.6/ Conflicts: drivers/net/wimax/i2400m/usb-notif.c	2009-03-26 15:23:24 -07:00
Linus Torvalds	0c93ea4064	Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6: (61 commits) Dynamic debug: fix pr_fmt() build error Dynamic debug: allow simple quoting of words dynamic debug: update docs dynamic debug: combine dprintk and dynamic printk sysfs: fix some bin_vm_ops errors kobject: don't block for each kobject_uevent sysfs: only allow one scheduled removal callback per kobj Driver core: Fix device_move() vs. dpm list ordering, v2 Driver core: some cleanup on drivers/base/sys.c Driver core: implement uevent suppress in kobject vcs: hook sysfs devices into object lifetime instead of "binding" driver core: fix passing platform_data driver core: move platform_data into platform_device sysfs: don't block indefinitely for unmapped files. driver core: move knode_bus into private structure driver core: move knode_driver into private structure driver core: move klist_children into private structure driver core: create a private portion of struct device driver core: remove polling for driver_probe_done(v5) sysfs: reference sysfs_dirent from sysfs inodes ... Fixed conflicts in drivers/sh/maple/maple.c manually	2009-03-26 11:17:04 -07:00
Linus Torvalds	8ff64b539b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: GFS2: Fix freeze issue Fix a minor bug in the previous patch GFS2: Clean up of glops.c GFS2: Fix locking bug in failed shared to exclusive conversion GFS2: Pagecache usage optimization on GFS2 GFS2: fix sparse warning: Should it be static? GFS2: fix sparse warnings: constant is so big it is ... GFS2: Support quota/noquota mount arguments GFS2: Fix alignment issue and tidy gfs2_bitfit GFS2: Add a "demote a glock" interface to sysfs GFS2: Expose UUID via sysfs/uevent GFS2: Support generation of discard requests GFS2: Fix deadlock on journal flush GFS2: Fix error path ref counting for root inode GFS2: Remove unused field from glock GFS2: Merge lock_dlm module into GFS2 GFS2: Remove "double" locking in quota GFS2: change gfs2_quota_scan into a shrinker GFS2: Bring back lvb-related stuff to lock_nolock to support quotas GFS2: Fix remount argument parsing	2009-03-26 11:08:47 -07:00
Linus Torvalds	8d80ce80e1	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: (71 commits) SELinux: inode_doinit_with_dentry drop no dentry printk SELinux: new permission between tty audit and audit socket SELinux: open perm for sock files smack: fixes for unlabeled host support keys: make procfiles per-user-namespace keys: skip keys from another user namespace keys: consider user namespace in key_permission keys: distinguish per-uid keys in different namespaces integrity: ima iint radix_tree_lookup locking fix TOMOYO: Do not call tomoyo_realpath_init unless registered. integrity: ima scatterlist bug fix smack: fix lots of kernel-doc notation TOMOYO: Don't create securityfs entries unless registered. TOMOYO: Fix exception policy read failure. SELinux: convert the avc cache hash list to an hlist SELinux: code readability with avc_cache SELinux: remove unused av.decided field SELinux: more careful use of avd in avc_has_perm_noaudit SELinux: remove the unused ae.used SELinux: check seqno when updating an avc_node ...	2009-03-26 11:03:39 -07:00
Matthew Garrett	0a1c01c947	Make relatime default Change the default behaviour of the kernel to use relatime for all filesystems. This can be overridden with the "strictatime" mount option. Signed-off-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-26 11:01:10 -07:00
Matthew Garrett	d0adde574b	Add a strictatime mount option Add support for explicitly requesting full atime updates. This makes it possible for kernels to default to relatime but still allow userspace to override it. Signed-off-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-26 10:56:35 -07:00
Matthew Garrett	11ff6f05f1	Allow relatime to update atime once a day Allow atime to be updated once per day even with relatime. This lets utilities like tmpreaper (which delete files based on last access time) continue working, making relatime a plausible default for distributions. Signed-off-by: Matthew Garrett <mjg@redhat.com> Reviewed-by: Matthew Wilcox <willy@linux.intel.com> Acked-by: Valerie Aurora Henson <vaurora@redhat.com> Acked-by: Alan Cox <alan@redhat.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-26 10:48:13 -07:00
Artem Bityutskiy	963f0cf6d1	UBIFS: add R/O compatibility Now UBIFS is supported by u-boot. If we ever decide to change the media format, then people will have to upgrade their u-boots to mount new format images. However, very often it is possible to preserve R/O forward-compatibility, even though the write forward-compatibility is not preserved. This patch introduces a new super-block field which stores the R/O compatibility version. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Acked-by: Adrian Hunter <Adrian.Hunter@nokia.com>	2009-03-26 16:36:20 +02:00
Stefan Weinhuber	b44b0ab3ba	[S390] dasd: add large volume support The dasd device driver will now support ECKD devices with more then 65520 cylinders. In the traditional ECKD adressing scheme each track is addressed by a 16-bit cylinder and 16-bit head number. The new addressing scheme makes use of the fact that the actual number of heads is never larger then 15, so 12 bits of the head number can be redefined to be part of the cylinder address. Signed-off-by: Stefan Weinhuber <wein@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2009-03-26 15:24:05 +01:00
Jens Axboe	a2a9537ac0	Get rid of pdflush_operation() in emergency sync and remount Opencode a cheasy approach with kevent. The idea here is that we'll add some generic delayed work infrastructure, which probably wont be based on pdflush (or maybe it will, in which case we can just add it back). This is in preparation for getting rid of pdflush completely. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-03-26 11:01:36 +01:00
Jens Axboe	6933c02e9c	btrfs: get rid of current_is_pdflush() in btrfs_btree_balance_dirty Chris says it's safe to kill. Acked-by: Chris Mason <chris.mason@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-03-26 11:01:35 +01:00
Theodore Ts'o	563bdd61fe	ext4: Check for an valid i_mode when reading the inode from disk Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-03-26 00:06:19 -04:00
Theodore Ts'o	7058548cd5	ext4: Use WRITE_SYNC for commits which are caused by fsync() If a commit is triggered by fsync(), set a flag indicating the journal blocks associated with the transaction should be flushed out using WRITE_SYNC. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-03-25 23:35:46 -04:00
Manish Katiyar	c16831b4cc	ext2: Zero our b_size in ext2_quota_read() ext2_quota_read() doesn't initialize tmp_bh.b_size before calling ext2_get_block() where we access it. Since it is a local variable it might contain some garbage. Make sure it is filled with reasonable value before passing. Signed-off-by: Manish Katiyar <mkatiyar@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2009-03-26 02:18:38 +01:00
Matt LaPlante	620372a9ff	trivial: fix typos/grammar errors in fs/Kconfig Signed-off-by: Matt LaPlante <kernel1@cyberdogtech.com> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Jan Kara <jack@suse.cz>	2009-03-26 02:18:38 +01:00
Jan Kara	268157ba67	quota: Coding style fixes Wrap long lines, remove assignments from conditions, rewrite two overcomplicated for loops. Signed-off-by: Jan Kara <jack@suse.cz>	2009-03-26 02:18:38 +01:00

... 3 4 5 6 7 ...

13586 Commits