linux

Commit Graph

Author	SHA1	Message	Date
Peter Oberparleiter	0f58b44582	sysfs: fix hardlink count on device_move Update directory hardlink count when moving kobjects to a new parent. Fixes the following problem which occurs when several devices are moved to the same parent and then unregistered: > ls -laF /sys/devices/css0/defunct/ > total 0 > drwxr-xr-x 4294967295 root root 0 2009-07-14 17:02 ./ > drwxr-xr-x 114 root root 0 2009-07-14 17:02 ../ > drwxr-xr-x 2 root root 0 2009-07-14 17:01 power/ > -rw-r--r-- 1 root root 4096 2009-07-14 17:01 uevent Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Cc: stable <stable@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-07-28 13:45:21 -07:00
Yan Zheng	f25784b35f	Btrfs: Fix async caching interaction with unmount - don't stop the caching thread until btrfs_commit_super return. - if caching is interrupted by umount, set last to (u64)-1. otherwise the un-scanned range of block group will be considered as free extent. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-28 08:41:57 -04:00
Jeff Layton	7b91e2661a	cifs: fix error handling in mount-time DFS referral chasing code If the referral is malformed or the hostname can't be resolved, then the current code generates an oops. Fix it to handle these errors gracefully. Reported-by: Sandro Mathys <sm@sandro-mathys.ch> Acked-by: Igor Mammedov <niallain@gmail.com> CC: Stable <stable@kernel.org> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-07-28 00:51:59 +00:00
Linus Torvalds	fc013a5885	Merge branch 'for-linus' of git://git.infradead.org/users/eparis/notify * 'for-linus' of git://git.infradead.org/users/eparis/notify: inotify: use GFP_NOFS under potential memory pressure fsnotify: fix inotify tail drop check with path entries inotify: check filename before dropping repeat events fsnotify: use def_bool in kconfig instead of letting the user choose inotify: fix error paths in inotify_update_watch inotify: do not leak inode marks in inotify_add_watch inotify: drop user watch count when a watch is removed	2009-07-27 15:54:10 -07:00
Linus Torvalds	a9355cf8e6	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6: jfs: Fix early release of acl in jfs_get_acl	2009-07-27 12:15:56 -07:00
Linus Torvalds	2bc20d09b0	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: jbd: fix race between write_metadata_buffer and get_write_access ext3: Get rid of extenddisksize parameter of ext3_get_blocks_handle() jbd: Fix a race between checkpointing code and journal_get_write_access() ext3: Fix truncation of symlinks after failed write jbd: Fail to load a journal if it is too short	2009-07-27 12:12:10 -07:00
Linus Torvalds	c7425eb481	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: [CIFS] fix sparse warning cifs: fix sb->s_maxbytes so that it casts properly to a signed value cifs: disable serverino if server doesn't support it	2009-07-27 12:11:43 -07:00
Josef Bacik	68b38550dd	Btrfs: change how we unpin extents We are racy with async block caching and unpinning extents. This patch makes things much less complicated by only unpinning the extent if the block group is cached. We check the block_group->cached var under the block_group->lock spin lock. If it is set to BTRFS_CACHE_FINISHED then we update the pinned counters, and unpin the extent and add the free space back. If it is not set to this, we start the caching of the block group so the next time we unpin extents we can unpin the extent. This keeps us from racing with the async caching threads, lets us kill the fs wide async thread counter, and keeps us from having to set DELALLOC bits for every extent we hit if there are caching kthreads going. One thing that needed to be changed was btrfs_free_super_mirror_extents. Now instead of just looking for LOCKED extents, we also look for DIRTY extents, since we could have left some extents pinned in the previous transaction that will never get freed now that we are unmounting, which would cause us to leak memory. So btrfs_free_super_mirror_extents has been changed to btrfs_free_pinned_extents, and it will clear the extents locked for the super mirror, and any remaining pinned extents that may be present. Thank you, Signed-off-by: Josef Bacik <jbacik@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-27 13:57:01 -04:00
Julia Lawall	631c07c8d1	Btrfs: Correct redundant test in add_inode_ref dir has already been tested. It seems that this test should be on the recently returned value inode. A simplified version of the semantic match that finds this problem is as follows: (http://www.emn.fr/x-info/coccinelle/) Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-27 13:57:00 -04:00
Chris Mason	9779b72f05	Btrfs: find smallest available device extent during chunk allocation Allocating new block group is easy when the disk has plenty of space. But things get difficult as the disk fills up, especially if the FS has been run through btrfs-vol -b. The balance operation is likely to make the total bytes available on the device greater than the largest extent we'll actually be able to allocate. But the device extent allocation code incorrectly assumes that a device with 5G free will be able to allocate a 5G extent. It isn't normally a problem because device extents don't get freed unless btrfs-vol -b is run. This fixes the device extent allocator to remember the largest free extent it can find, and then uses that value as a fallback. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-24 16:41:41 -04:00
Chris Mason	283bb1979f	Btrfs: clear all space_info->full after removing a block group Btrfs allocates individual extents from block groups, and each block group has a specific type. It may hold metadata, data mirrored or striped etc. When we balance space (btrfs-vol -b) or remove a drive (btrfs-vol -r) we free block groups. Once a block group is freed, the space it was using on the device may be available for use by new block groups. btrfs_remove_block_group was clearing the flag that said 'our devices are full, don't even try to allocate new block groups', but it was only clearing that flag for a specific type of block group. This commit clears the full flag for all of the types of block groups, making it much more likely that we'll be able to balance space when the drive is close to full. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-24 16:30:55 -04:00
Sage Weil	ebecd3d9d2	Btrfs: make flushoncommit mount option correctly wait on ordered_extents The commit_transaction call to wait_ordered_extents when snap_pending passes nocow_only=1 to process only NOCOW or PREALLOC extents. This isn't correct for the 'flushoncommit' mode, as it skips extents we just started IO on in start_delalloc_inodes. So, in the flushoncommit case, wait on all ordered extents. Otherwise, only pass the nocow_only flag to wait_ordered_extents if snap_pending. Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-24 13:17:44 -04:00
Yan Zheng	d717aa1d31	Btrfs: Avoid delayed reference update looping btrfs_split_leaf and btrfs_del_items can end up in a loop where one is constantly spliting a given leaf and the other is constantly merging it back with the adjacent nodes. There is a better fix for this, but in the interest of something small, this patch just changes btrfs_del_items back to balancing less often. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-24 12:42:46 -04:00
Yan Zheng	0a4eefbb74	Btrfs: Fix ordering of key field checks in btrfs_previous_item Check objectid of item before checking the item type, otherwise we may return zero for a key that is actually too low. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-24 11:22:47 -04:00
Yan Zheng	1fcbac581b	Btrfs: find_free_dev_extent doesn't handle holes at the start of the device find_free_dev_extent does not properly handle the case where the device is not complete free, and there is a free extent at the beginning of the device. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-24 11:22:47 -04:00
Diego Calleja	20736abaa3	Btrfs: Remove code duplication in comp_keys comp_keys is duplicating what is done in btrfs_comp_cpu_keys, so just call it. Signed-off-by: Diego Calleja <diegocg@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-24 11:22:46 -04:00
Josef Bacik	817d52f8db	Btrfs: async block group caching This patch moves the caching of the block group off to a kthread in order to allow people to allocate sooner. Instead of blocking up behind the caching mutex, we instead kick of the caching kthread, and then attempt to make an allocation. If we cannot, we wait on the block groups caching waitqueue, which the caching kthread will wake the waiting threads up everytime it finds 2 meg worth of space, and then again when its finished caching. This is how I tested the speedup from this mkfs the disk mount the disk fill the disk up with fs_mark unmount the disk mount the disk time touch /mnt/foo Without my changes this took 11 seconds on my box, with these changes it now takes 1 second. Another change thats been put in place is we lock the super mirror's in the pinned extent map in order to keep us from adding that stuff as free space when caching the block group. This doesn't really change anything else as far as the pinned extent map is concerned, since for actual pinned extents we use EXTENT_DIRTY, but it does mean that when we unmount we have to go in and unlock those extents to keep from leaking memory. I've also added a check where when we are reading block groups from disk, if the amount of space used == the size of the block group, we go ahead and mark the block group as cached. This drastically reduces the amount of time it takes to cache the block groups. Using the same test as above, except doing a dd to a file and then unmounting, it used to take 33 seconds to umount, now it takes 3 seconds. This version uses the commit_root in the caching kthread, and then keeps track of how many async caching threads are running at any given time so if one of the async threads is still running as we cross transactions we can wait until its finished before handling the pinned extents. Thank you, Signed-off-by: Josef Bacik <jbacik@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-24 09:23:39 -04:00
Josef Bacik	9630308170	Btrfs: use hybrid extents+bitmap rb tree for free space Currently btrfs has a problem where it can use a ridiculous amount of RAM simply tracking free space. As free space gets fragmented, we end up with thousands of entries on an rb-tree per block group, which usually spans 1 gig of area. Since we currently don't ever flush free space cache back to disk this gets to be a bit unweildly on large fs's with lots of fragmentation. This patch solves this problem by using PAGE_SIZE bitmaps for parts of the free space cache. Initially we calculate a threshold of extent entries we can handle, which is however many extent entries we can cram into 16k of ram. The maximum amount of RAM that should ever be used to track 1 gigabyte of diskspace will be 32k of RAM, which scales much better than we did before. Once we pass the extent threshold, we start adding bitmaps and using those instead for tracking the free space. This patch also makes it so that any free space thats less than 4 * sectorsize we go ahead and put into a bitmap. This is nice since we try and allocate out of the front of a block group, so if the front of a block group is heavily fragmented and then has a huge chunk of free space at the end, we go ahead and add the fragmented areas to bitmaps and use a normal extent entry to track the big chunk at the back of the block group. I've also taken the opportunity to revamp how we search for free space. Previously we indexed free space via an offset indexed rb tree and a bytes indexed rb tree. I've dropped the bytes indexed rb tree and use only the offset indexed rb tree. This cuts the number of tree operations we were doing previously down by half, and gives us a little bit of a better allocation pattern since we will always start from a specific offset and search forward from there, instead of searching for the size we need and try and get it as close as possible to the offset we want. I've given this a healthy amount of testing pre-new format stuff, as well as post-new format stuff. I've booted up my fedora box which is installed on btrfs with this patch and ran with it for a few days without issues. I've not seen any performance regressions in any of my tests. Since the last patch Yan Zheng fixed a problem where we could have overlapping entries, so updating their offset inline would cause problems. Thanks, Signed-off-by: Josef Bacik <jbacik@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-24 09:23:30 -04:00
Stefan Bader	4a19fb11a9	jfs: Fix early release of acl in jfs_get_acl BugLink: http://bugs.launchpad.net/ubuntu/+bug/396780 Commit `073aaa1b14` "helpers for acl caching + switch to those" introduced new helper functions for acl handling but seems to have introduced a regression for jfs as the acl is released before returning it to the caller, instead of leaving this for the caller to do. This causes the acl object to be used after freeing it, leading to kernel panics in completely different places. Thanks to Christophe Dumez for reporting and bisecting into this. Reported-by: Christophe Dumez <dchris@gmail.com> Tested-by: Christophe Dumez <dchris@gmail.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>	2009-07-23 11:08:36 -05:00
Linus Torvalds	81cbf6d055	Merge branch 'lockdep-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-lockdep * 'lockdep-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-lockdep: lockdep: Fix lockdep annotation for pipe_double_lock()	2009-07-22 16:44:18 -07:00
Steve French	f1230c9797	[CIFS] fix sparse warning Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-07-22 23:13:01 +00:00
Jeff Layton	03aa3a49ad	cifs: fix sb->s_maxbytes so that it casts properly to a signed value This off-by-one bug causes sendfile() to not work properly. When a task calls sendfile() on a file on a CIFS filesystem, the syscall returns -1 and sets errno to EOVERFLOW. do_sendfile uses s_maxbytes to verify the returned offset of the file. The problem there is that this value is cast to a signed value (loff_t). When this is done on the s_maxbytes value that cifs uses, it becomes negative and the comparisons against it fail. Even though s_maxbytes is an unsigned value, it seems that it's not OK to set it in such a way that it'll end up negative when it's cast to a signed value. These casts happen in other codepaths besides sendfile too, but the VFS is a little hard to follow in this area and I can't be sure if there are other bugs that this will fix. It's not clear to me why s_maxbytes isn't just declared as loff_t in the first place, but either way we still need to fix these values to make sendfile work properly. This is also an opportunity to replace the magic bit-shift values here with the standard #defines for this. This fixes the reproducer program I have that does a sendfile and will probably also fix the situation where apache is serving from a CIFS share. Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-07-22 21:08:00 +00:00
Jeff Layton	ce6e7fcd43	cifs: disable serverino if server doesn't support it A recent regression when dealing with older servers. This bug was introduced when we made serverino the default... When the server can't provide inode numbers, disable it for the mount. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-07-22 21:07:51 +00:00
David Woodhouse	83121942b2	Btrfs: Fix crash on read failures at mount If the tree roots hit read errors during mount, btrfs is not properly erroring out. We need to check the uptodate bits after reading in the tree root node. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-22 16:52:13 -04:00
Daniel Cadete	c271b49241	Btrfs: remove of redundant btrfs_header_level This removes the continues call's of btrfs_header_level. One call of btrfs_header_level(c) its enough. Signed-off-by Daniel Cadete <danielncadete10@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-22 16:52:13 -04:00
Julia Lawall	33c17ad571	Btrfs: adjust NULL test Move the call to BUG_ON to before the dereference of the tested value. Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-22 16:49:01 -04:00
David Woodhouse	3acada49c2	Btrfs: Remove broken sanity check from btrfs_rmap_block() It was never actually doing anything anyway (see the loop condition), and it would be difficult to make it work for RAID[56]. Even if it was actually working, it's checking for the wrong thing anyway. Instead of checking whether we list a block which _doesn't_ land at the relevant physical location, it should be checking that we _have_ listed all the logical blocks which refer to the required physical location on all devices. This function is only called from remove_sb_from_cache() to ensure that we reserve the logical blocks which would reside at the same physical location as the superblock copies. So listing more blocks than we need is actually OK. With RAID[56] we're going to throw away an entire stripe for each block we have to ignore, so we _are_ going to list blocks other than the ones which actually contain the superblock. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-22 16:49:01 -04:00
Julia Lawall	29c5e8ce01	Btrfs: convert nested spin_lock_irqsave to spin_lock If spin_lock_irqsave is called twice in a row with the same second argument, the interrupt state at the point of the second call overwrites the value saved by the first call. Indeed, the second call does not need to save the interrupt state, so it is changed to a simple spin_lock. Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-22 16:49:00 -04:00
Peter Zijlstra	023d43c7b5	lockdep: Fix lockdep annotation for pipe_double_lock() The presumed use of the pipe_double_lock() routine is to lock 2 locks in a deadlock free way by ordering the locks by their address. However it fails to keep the specified lock classes in order and explicitly annotates a deadlock. Rectify this. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Miklos Szeredi <mszeredi@suse.cz> LKML-Reference: <1248163763.15751.11098.camel@twins>	2009-07-22 21:14:14 +02:00
Linus Torvalds	1f9758d4e7	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: fs/Kconfig: move nilfs2 out	2009-07-22 10:05:00 -07:00
Yan Zheng	4a8c9a62d7	Btrfs: make sure all dirty blocks are written at commit time Write dirty block groups may allocate new block, and so may add new delayed back ref. btrfs_run_delayed_refs may make some block groups dirty. commit_cowonly_roots does not handle the recursion properly, and some dirty blocks can be left unwritten at commit time. This patch moves btrfs_run_delayed_refs into the loop that writes dirty block groups, and makes the code not break out of the loop until there are no dirty block groups or delayed back refs. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-22 10:07:05 -04:00
Yan Zheng	33c66f430b	Btrfs: fix locking issue in btrfs_find_next_key When walking up the tree, btrfs_find_next_key assumes the upper level tree block is properly locked. This isn't always true even path->keep_locks is 1. This is because btrfs_find_next_key may advance path->slots[] several times instead of only once. When 'path->slots[level] >= btrfs_header_nritems(path->nodes[level])' is found, we can't guarantee the original value of 'path->slots[level]' is 'btrfs_header_nritems(path->nodes[level]) - 1'. If it's not, the tree block at 'level + 1' isn't locked. This patch fixes the issue by explicitly checking the locking state, re-searching the tree if it's not locked. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-22 09:59:00 -04:00
Yan Zheng	e457afec60	Btrfs: fix double increment of path->slots[0] in btrfs_next_leaf if 1 is returned by btrfs_search_slot, the path already points to the first item with 'key > searching key'. So increasing path->slots[0] by one is superfluous in that case. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-22 09:59:00 -04:00
Yan Zheng	bf1fb512a5	Btrfs: properly update space information after shrinking device. Change 'goto done' to 'break' for the case of all device extents have been freed, so that the code updates space information will be execute. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-22 09:59:00 -04:00
Yan Zheng	1bec1aed1e	Btrfs: fix definition of struct btrfs_extent_inline_ref use __le64 instead of u64 in on-disk structure definition. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-22 09:59:00 -04:00
Trond Myklebust	d953126a28	NFSv4: Fix a problem whereby a buggy server can oops the kernel We just had a case in which a buggy server occasionally returns the wrong attributes during an OPEN call. While the client does catch this sort of condition in nfs4_open_done(), and causes the nfs4_atomic_open() to return -EISDIR, the logic in nfs_atomic_lookup() is broken, since it causes a fallback to an ordinary lookup instead of just returning the error. When the buggy server then returns a regular file for the fallback lookup, the VFS allows the open, and bad things start to happen, since the open file doesn't have any associated NFSv4 state. The fix is firstly to return the EISDIR/ENOTDIR errors immediately, and secondly to ensure that we are always careful when dereferencing the nfs_open_context state pointer. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-07-21 19:22:38 -04:00
Trond Myklebust	fccba80455	NFSv4: Fix an NFSv4 mount regression Commit `008f55d0e0` (nfs41: recover lease in _nfs4_lookup_root) forces the state manager to always run on mount. This is a bug in the case of NFSv4.0, which doesn't require us to send a setclientid until we want to grab file state. In any case, this is completely the wrong place to be doing state management. Moving that code into nfs4_init_session... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-07-21 16:48:07 -04:00
Trond Myklebust	b64aec8d1e	NFSv4: Fix an Oops in nfs4_free_lock_state The oops http://www.kerneloops.org/raw.php?rawid=537858&msgid= appears to be due to the nfs4_lock_state->ls_state field being uninitialised. This happens if the call to nfs4_free_lock_state() is triggered at the end of nfs4_get_lock_state(). The fix is to move the initialisation of ls_state into the allocator. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-07-21 16:47:46 -04:00
Eric Paris	f44aebcc56	inotify: use GFP_NOFS under potential memory pressure inotify can have a watchs removed under filesystem reclaim. ================================= [ INFO: inconsistent lock state ] 2.6.31-rc2 #16 --------------------------------- inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} usage. khubd/217 [HC0[0]:SC0[0]:HE1:SE1] takes: (iprune_mutex){+.+.?.}, at: [<c10ba899>] invalidate_inodes+0x20/0xe3 {IN-RECLAIM_FS-W} state was registered at: [<c10536ab>] __lock_acquire+0x2c9/0xac4 [<c1053f45>] lock_acquire+0x9f/0xc2 [<c1308872>] __mutex_lock_common+0x2d/0x323 [<c1308c00>] mutex_lock_nested+0x2e/0x36 [<c10ba6ff>] shrink_icache_memory+0x38/0x1b2 [<c108bfb6>] shrink_slab+0xe2/0x13c [<c108c3e1>] kswapd+0x3d1/0x55d [<c10449b5>] kthread+0x66/0x6b [<c1003fdf>] kernel_thread_helper+0x7/0x10 [<ffffffff>] 0xffffffff Two things are needed to fix this. First we need a method to tell fsnotify_create_event() to use GFP_NOFS and second we need to stop using one global IN_IGNORED event and allocate them one at a time. This solves current issues with multiple IN_IGNORED on a queue having tail drop problems and simplifies the allocations since we don't have to worry about two tasks opperating on the IGNORED event concurrently. Signed-off-by: Eric Paris <eparis@redhat.com>	2009-07-21 15:26:27 -04:00
Eric Paris	c05594b621	fsnotify: fix inotify tail drop check with path entries fsnotify drops new events when they are the same as the tail event on the queue to be sent to userspace. The problem is that if the event comes with a path we forget to break out of the switch statement and fall into the code path which matches on events that do not have any type of file backed information (things like IN_UNMOUNT and IN_Q_OVERFLOW). The problem is that this code thinks all such events should be dropped. Fix is to add a break. Signed-off-by: Eric Paris <eparis@redhat.com>	2009-07-21 15:26:26 -04:00
Eric Paris	4a148ba988	inotify: check filename before dropping repeat events inotify drops events if the last event on the queue is the same as the current event. But it does 2 things wrong. First it is comparing old->inode with new->inode. But after an event if put on the queue the ->inode is no longer allowed to be used. It's possible between the last event and this new event the inode could be reused and we would falsely match the inode's memory address between two differing events. The second problem is that when a file is removed fsnotify is passed the negative dentry for the removed object rather than the postive dentry from immediately before the removal. This mean the (broken) inotify tail drop code was matching the NULL ->inode of differing events. The fix is to check the file name which is stored with events when doing the tail drop instead of wrongly checking the address of the stored ->inode. Reported-by: Scott James Remnant <scott@ubuntu.com> Signed-off-by: Eric Paris <eparis@redhat.com>	2009-07-21 15:26:26 -04:00
Eric Paris	520dc2a526	fsnotify: use def_bool in kconfig instead of letting the user choose fsnotify doens't give the user anything. If someone chooses inotify or dnotify it should build fsnotify, if they don't select one it shouldn't be built. This patch changes fsnotify to be a def_bool=n and makes everything else select it. Also fixes the issue people complained about on lwn where gdm hung because they didn't have inotify and they didn't get the inotify build option..... Signed-off-by: Eric Paris <eparis@redhat.com>	2009-07-21 15:26:26 -04:00
Eric Paris	7e790dd5fc	inotify: fix error paths in inotify_update_watch inotify_update_watch could leave things in a horrid state on a number of error paths. We could try to remove idr entries that didn't exist, we could send an IN_IGNORED to userspace for watches that don't exist, and a bit of other stupidity. Clean these up by doing the idr addition before we put the mark on the inode since we can clean that up on error and getting off the inode's mark list is hard. Signed-off-by: Eric Paris <eparis@redhat.com>	2009-07-21 15:26:26 -04:00
Eric Paris	75fe2b2639	inotify: do not leak inode marks in inotify_add_watch inotify_add_watch had a couple of problems. The biggest being that if inotify_add_watch was called on the same inode twice (to update or change the event mask) a refence was taken on the original inode mark by fsnotify_find_mark_entry but was not being dropped at the end of the inotify_add_watch call. Thus if inotify_rm_watch was called although the mark was removed from the inode, the refcnt wouldn't hit zero and we would leak memory. Reported-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Eric Paris <eparis@redhat.com>	2009-07-21 15:26:26 -04:00
Eric Paris	5549f7cdf8	inotify: drop user watch count when a watch is removed The inotify rewrite forgot to drop the inotify watch use cound when a watch was removed. This means that a single inotify fd can only ever register a maximum of /proc/sys/fs/max_user_watches even if some of those had been freed. Signed-off-by: Eric Paris <eparis@redhat.com>	2009-07-21 15:26:26 -04:00
dingdinghua	f1015c4477	jbd: fix race between write_metadata_buffer and get_write_access The function journal_write_metadata_buffer() calls jbd_unlock_bh_state(bh_in) too early; this could potentially allow another thread to call get_write_access on the buffer head, modify the data, and dirty it, and allowing the wrong data to be written into the journal. Fortunately, if we lose this race, the only time this will actually cause filesystem corruption is if there is a system crash or other unclean shutdown of the system before the next commit can take place. Signed-off-by: dingdinghua <dingdinghua85@gmail.com> Acked-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz>	2009-07-21 11:54:42 +02:00
Linus Torvalds	457f82bac6	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs: 9p: Fix incorrect parameters to v9fs_file_readn. 9p: Possible regression in p9_client_stat 9p: default 9p transport module fix	2009-07-20 16:48:31 -07:00
Jeff Layton	90a98b2f3f	cifs: free nativeFileSystem field before allocating a new one ...otherwise, we'll leak this memory if we have to reconnect (e.g. after network failure). Signed-off-by: Jeff Layton <jlayton@redhat.com> CC: Stable <stable@kernel.org> Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-07-20 18:24:37 +00:00
Steve French	f6c4338543	Merge branch 'master' of /pub/scm/linux/kernel/git/torvalds/linux-2.6	2009-07-16 04:21:39 +00:00
Jan Kara	43237b5490	ext3: Get rid of extenddisksize parameter of ext3_get_blocks_handle() Get rid of extenddisksize parameter of ext3_get_blocks_handle(). This seems to be a relict from some old days and setting disksize in this function does not make much sence. Currently it was set only by ext3_getblk(). Since the parameter has some effect only if create == 1, it is easy to check that the three callers which end up calling ext3_getblk() with create == 1 (ext3_append, ext3_quota_write, ext3_mkdir) do the right thing and set disksize themselves. Signed-off-by: Jan Kara <jack@suse.cz>	2009-07-15 21:30:46 +02:00
Jan Kara	1e9fd53b78	jbd: Fix a race between checkpointing code and journal_get_write_access() The following race can happen: CPU1 CPU2 checkpointing code checks the buffer, adds it to an array for writeback do_get_write_access() ... lock_buffer() unlock_buffer() flush_batch() submits the buffer for IO __jbd_journal_file_buffer() So a buffer under writeout is returned from do_get_write_access(). Since the filesystem code relies on the fact that journaled buffers cannot be written out, it does not take the buffer lock and so it can modify buffer while it is under writeout. That can lead to a filesystem corruption if we crash at the right moment. The similar problem can happen with the journal_get_create_access() path. We fix the problem by clearing the buffer dirty bit under buffer_lock even if the buffer is on BJ_None list. Actually, we clear the dirty bit regardless the list the buffer is in and warn about the fact if the buffer is already journalled. Thanks for spotting the problem goes to dingdinghua <dingdinghua85@gmail.com>. Reported-by: dingdinghua <dingdinghua85@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2009-07-15 21:30:07 +02:00
Jan Kara	9eaaa2d575	ext3: Fix truncation of symlinks after failed write Contents of long symlinks is written via standard write methods. So when the write fails, we add inode to orphan list. But symlinks don't have .truncate method defined so nobody properly removes them from the orphan list (both on disk and in memory). Fix this by calling ext3_truncate() directly instead of calling vmtruncate() (which is saner anyway since we don't need anything vmtruncate() does except from calling .truncate in these paths). We also add inode to orphan list only if ext3_can_truncate() is true (currently, it can be false for symlinks when there are no blocks allocated) - otherwise orphan list processing will complain and ext3_truncate() will not remove inode from on-disk orphan list. Signed-off-by: Jan Kara <jack@suse.cz>	2009-07-15 21:28:07 +02:00
Jan Kara	7447a668a3	jbd: Fail to load a journal if it is too short Due to on disk corruption, it can happen that journal is too short. Fail to load it in such case so that we don't oops somewhere later. Reported-by: Nageswara R Sastry <rnsastry@linux.vnet.ibm.com> Signed-off-by: Jan Kara <jack@suse.cz>	2009-07-15 21:26:23 +02:00
Linus Torvalds	8aa651e23e	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm: dlm: free socket in error exit path dlm: fix plock use-after-free dlm: Fix uninitialised variable warning in lock.c	2009-07-14 18:37:24 -07:00
Linus Torvalds	c0c50b541a	Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: tracing/function-profiler: do not free per cpu variable stat tracing/events: Move TRACE_SYSTEM outside of include guard	2009-07-14 18:34:32 -07:00
Abhishek Kulkarni	9c9ad6162e	9p: Fix incorrect parameters to v9fs_file_readn. Fix v9fs_vfs_readpage. The offset and size parameters to v9fs_file_readn were interchanged and hence passed incorrectly. Signed-off-by: Abhishek Kulkarni <adkulkar@umail.iu.edu> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2009-07-14 15:54:42 -05:00
Casey Dahlin	a89d63a159	dlm: free socket in error exit path In the tcp_connect_to_sock() error exit path, the socket allocated at the top of the function was not being freed. Signed-off-by: Casey Dahlin <cdahlin@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2009-07-14 12:28:43 -05:00
Ryusuke Konishi	4fed598a49	fs/Kconfig: move nilfs2 out fs/Kconfig file was split into individual fs/*/Kconfig files before nilfs was merged. I've found the current config entry of nilfs is tainting the work. Sorry, I didn't notice. This fixes the violation. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Alexey Dobriyan <adobriyan@gmail.com>	2009-07-14 12:34:17 +09:00
Linus Torvalds	1cf29683f4	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: jbd2: fix race between write_metadata_buffer and get_write_access ext4: Fix ext4_mb_initialize_context() to initialize all fields ext4: fix null handler of ioctls in no journal mode ext4: Fix buffer head reference leak in no-journal mode ext4: Move __ext4_journalled_writepage() to avoid forward declaration ext4: Fix mmap/truncate race when blocksize < pagesize && !nodellaoc ext4: Fix mmap/truncate race when blocksize < pagesize && delayed allocation ext4: Don't look at buffer_heads outside i_size. ext4: Fix goal inum check in the inode allocator ext4: fix no journal corruption with locale-gen ext4: Calculate required journal credits for inserting an extent properly ext4: Fix truncation of symlinks after failed write jbd2: Fix a race between checkpointing code and journal_get_write_access() ext4: Use rcu_barrier() on module unload. ext4: naturally align struct ext4_allocation_request ext4: mark several more functions in mballoc.c as noinline ext4: Fix potential reclaim deadlock when truncating partial block jbd2: Remove GFP_ATOMIC kmalloc from inside spinlock critical region ext4: Fix type warning on 64-bit platforms in tracing events header	2009-07-13 16:39:25 -07:00
dingdinghua	96577c4382	jbd2: fix race between write_metadata_buffer and get_write_access The function jbd2_journal_write_metadata_buffer() calls jbd_unlock_bh_state(bh_in) too early; this could potentially allow another thread to call get_write_access on the buffer head, modify the data, and dirty it, and allowing the wrong data to be written into the journal. Fortunately, if we lose this race, the only time this will actually cause filesystem corruption is if there is a system crash or other unclean shutdown of the system before the next commit can take place. Signed-off-by: dingdinghua <dingdinghua85@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-07-13 17:55:35 -04:00
Linus Torvalds	a4dc32374e	Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6: wm97xx_batery: replace driver_data with dev_get_drvdata() omap: video: remove direct access of driver_data Sound: remove direct access of driver_data driver model: fix show/store prototypes in doc. Firmware: firmware_class, fix lock imbalance Driver Core: remove BUS_ID_SIZE sparc: remove driver-core BUS_ID_SIZE partitions: fix broken uevent_suppress conversion devres: WARN() and return, don't crash on device_del() of uninitialized device	2009-07-13 10:24:08 -07:00
Theodore Ts'o	833576b362	ext4: Fix ext4_mb_initialize_context() to initialize all fields Pavel Roskin pointed out that kmemcheck indicated that ext4_mb_store_history() was accessing uninitialized values of ac->ac_tail and ac->ac_buddy leading to garbage in the mballoc history. Fix this by initializing the entire structure to all zeros first. Also, two fields were getting doubly initialized by the caller of ext4_mb_initialize_context, so remove them for efficiency's sake. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-07-13 09:45:52 -04:00
Peng Tao	ac046f1d61	ext4: fix null handler of ioctls in no journal mode The EXT4_IOC_GROUP_ADD and EXT4_IOC_GROUP_EXTEND ioctls should not flush the journal in no_journal mode. Otherwise, running resize2fs on a mounted no_journal partition triggers the following error messages: BUG: unable to handle kernel NULL pointer dereference at 00000014 IP: [<c039d282>] _spin_lock+0x8/0x19 *pde = 00000000 Oops: 0002 [#1] SMP Signed-off-by: Peng Tao <bergwolf@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-07-13 09:30:17 -04:00
Curt Wohlgemuth	e6b5d30104	ext4: Fix buffer head reference leak in no-journal mode We found a problem with buffer head reference leaks when using an ext4 partition without a journal. In particular, calls to ext4_forget() would not to a brelse() on the input buffer head, which will cause pages they belong to to not be reclaimable. Further investigation showed that all places where ext4_journal_forget() and ext4_journal_revoke() are called are subject to the same problem. The patch below changes __ext4_journal_forget/__ext4_journal_revoke to do an explicit release of the buffer head when the journal handle isn't valid. Signed-off-by: Curt Wohlgemuth <curtw@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-07-13 09:07:20 -04:00
Li Zefan	d0b6e04a4c	tracing/events: Move TRACE_SYSTEM outside of include guard If TRACE_INCLDUE_FILE is defined, <trace/events/TRACE_INCLUDE_FILE.h> will be included and compiled, otherwise it will be <trace/events/TRACE_SYSTEM.h> So TRACE_SYSTEM should be defined outside of #if proctection, just like TRACE_INCLUDE_FILE. Imaging this scenario: #include <trace/events/foo.h> -> TRACE_SYSTEM == foo ... #include <trace/events/bar.h> -> TRACE_SYSTEM == bar ... #define CREATE_TRACE_POINTS #include <trace/events/foo.h> -> TRACE_SYSTEM == bar !!! and then bar.h will be included and compiled. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <4A5A9CF1.2010007@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-13 10:59:55 +02:00
Johannes Berg	134e63756d	genetlink: make netns aware This makes generic netlink network namespace aware. No generic netlink families except for the controller family are made namespace aware, they need to be checked one by one and then set the family->netnsok member to true. A new function genlmsg_multicast_netns() is introduced to allow sending a multicast message in a given namespace, for example when it applies to an object that lives in that namespace, a new function genlmsg_multicast_allns() to send a message to all network namespaces (for objects that do not have an associated netns). The function genlmsg_multicast() is changed to multicast the message in just init_net, which is currently correct for all generic netlink families since they only work in init_net right now. Some will later want to work in all net namespaces because they do not care about the netns at all -- those will have to be converted to use one of the new functions genlmsg_multicast_allns() or genlmsg_multicast_netns() whenever they are made netns aware in some way. After this patch families can easily decide whether or not they should be available in all net namespaces. Many genl families us it for objects not related to networking and should therefore be available in all namespaces, but that will have to be done on a per family basis. Note that this doesn't touch on the checkpoint/restart problem where network namespaces could be used, genl families and multicast groups are numbered globally and I see no easy way of changing that, especially since it must be possible to multicast to all network namespaces for those families that do not care about netns. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-07-12 14:03:27 -07:00
Heiko Carstens	f8c73c790c	partitions: fix broken uevent_suppress conversion git commit `f67f129e` "Driver core: implement uevent suppress in kobject" contains this chunk for fs/partitions/check.c: /* suppress uevent if the disk supresses it */ - if (!ddev->uevent_suppress) + if (!dev_get_uevent_suppress(pdev)) kobject_uevent(&pdev->kobj, KOBJ_ADD); However that should have been - if (!ddev->uevent_suppress) + if (!dev_get_uevent_suppress(ddev)) Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Ming Lei <tom.leiming@gmail.com> Cc: stable <stable@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-07-12 13:02:09 -07:00
Artem Bityutskiy	dd0d9a46f5	AFS: Fix compilation warning Fix the following warning: fs/afs/dir.c: In function 'afs_d_revalidate': fs/afs/dir.c:567: warning: 'fid.vnode' may be used uninitialized in this function fs/afs/dir.c:567: warning: 'fid.unique' may be used uninitialized in this function by marking the 'fid' variable as an uninitialized_var. The problem is that gcc doesn't always manage to work out that fid is always set on the path through the function that uses it. Cc: linux-afs@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-12 12:24:07 -07:00
Alexey Dobriyan	405f55712d	headers: smp_lock.h redux * Remove smp_lock.h from files which don't need it (including some headers!) * Add smp_lock.h to files which do need it * Make smp_lock.h include conditional in hardirq.h It's needed only for one kernel_locked() usage which is under CONFIG_PREEMPT This will make hardirq.h inclusion cheaper for every PREEMPT=n config (which includes allmodconfig/allyesconfig, BTW) Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-12 12:22:34 -07:00
Linus Torvalds	81e4e1ba7e	Revert "fuse: Fix build error" as unnecessary This reverts commit `097041e576`. Trond had a better fix, which is the parent of this one ("Fix compile error due to congestion_wait() changes") Requested-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Larry Finger <Larry.Finger@lwfinger.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-11 11:22:34 -07:00
Bartlomiej Zolnierkiewicz	8711c67bee	isofs: fix Joliet regression commit `5404ac8e44` ("isofs: cleanup mount option processing") missed conversion of joliet option flag resulting in non-working Joliet support. CC: walt <w41ter@gmail.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-10 19:18:59 -07:00
Linus Torvalds	44c695b13b	Merge branch 'linux-next' of git://git.infradead.org/ubifs-2.6 * 'linux-next' of git://git.infradead.org/ubifs-2.6: UBIFS: fix corruption dump UBIFS: clean up free space checking UBIFS: small amendments in the LEB scanning code UBIFS: dump a little more in case of corruptions MAINTAINERS: update ahunter's e-mail address UBIFS: allow more than one volume to be mounted UBIFS: fix assertion warning UBIFS: minor spelling and grammar fixes UBIFS: fix 64-bit divisions in debug print UBIFS: few spelling fixes UBIFS: set write-buffer timout to 3-5 seconds UBIFS: slightly optimize write-buffer timer usage UBIFS: improve debugging messaged UBIFS: fix integer overflow warning	2009-07-10 19:14:48 -07:00
Linus Torvalds	04eef90c2e	Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd * 'for-linus' of git://git.open-osd.org/linux-open-osd: osdblk: Adjust queue limits to lower device's limits osdblk: a Linux block device for OSD objects MAINTAINERS: Add osd maintained files (F:) exofs: Avoid using file_fsync() exofs: Remove IBM copyrights exofs: Fix bio leak in error handling path (sync read)	2009-07-10 19:12:24 -07:00
Larry Finger	097041e576	fuse: Fix build error When building v2.6.31-rc2-344-g69ca06c, the following build errors are found due to missing includes: CC [M] fs/fuse/dev.o fs/fuse/dev.c: In function ‘request_end’: fs/fuse/dev.c:289: error: ‘BLK_RW_SYNC’ undeclared (first use in this function) ... fs/nfs/write.c: In function ‘nfs_set_page_writeback’: fs/nfs/write.c:207: error: ‘BLK_RW_ASYNC’ undeclared (first use in this function) Signed-off-by: Larry Finger@lwfinger.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-10 19:09:46 -07:00
Linus Torvalds	69ca06c945	Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block * 'for-linus' of git://git.kernel.dk/linux-2.6-block: cfq-iosched: reset oom_cfqq in cfq_set_request() block: fix sg SG_DXFER_TO_FROM_DEV regression block: call blk_scsi_ioctl_init() Fix congestion_wait() sync/async vs read/write confusion	2009-07-10 14:29:58 -07:00
Linus Torvalds	9f2d8be426	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: nilfs2: fix disorder in cp count on error during deleting checkpoints nilfs2: fix lockdep warning between regular file and inode file nilfs2: fix incorrect KERN_CRIT messages in case of write failures nilfs2: fix hang problem of log writer which occurs after write failures nilfs2: remove unlikely directive causing mis-conversion of error code	2009-07-10 14:27:21 -07:00
FUJITA Tomonori	ecb554a846	block: fix sg SG_DXFER_TO_FROM_DEV regression I overlooked SG_DXFER_TO_FROM_DEV support when I converted sg to use the block layer mapping API (2.6.28). Douglas Gilbert explained SG_DXFER_TO_FROM_DEV: http://www.spinics.net/lists/linux-scsi/msg37135.html = The semantics of SG_DXFER_TO_FROM_DEV were: - copy user space buffer to kernel (LLD) buffer - do SCSI command which is assumed to be of the DATA_IN (data from device) variety. This would overwrite some or all of the kernel buffer - copy kernel (LLD) buffer back to the user space. The idea was to detect short reads by filling the original user space buffer with some marker bytes ("0xec" it would seem in this report). The "resid" value is a better way of detecting short reads but that was only added this century and requires co-operation from the LLD. = This patch changes the block layer mapping API to support this semantics. This simply adds another field to struct rq_map_data and enables __bio_copy_iov() to copy data from user space even with READ requests. It's better to add the flags field and kills null_mapped and the new from_user fields in struct rq_map_data but that approach makes it difficult to send this patch to stable trees because st and osst drivers use struct rq_map_data (they were converted to use the block layer in 2.6.29 and 2.6.30). Well, I should clean up the block layer mapping API. zhou sf reported this regiression and tested this patch: http://www.spinics.net/lists/linux-scsi/msg37128.html http://www.spinics.net/lists/linux-scsi/msg37168.html Reported-by: zhou sf <sxzzsf@gmail.com> Tested-by: zhou sf <sxzzsf@gmail.com> Cc: stable@kernel.org Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-07-10 20:31:53 +02:00
Jens Axboe	8aa7e847d8	Fix congestion_wait() sync/async vs read/write confusion Commit `1faa16d228` accidentally broke the bdi congestion wait queue logic, causing us to wait on congestion for WRITE (== 1) when we really wanted BLK_RW_ASYNC (== 0) instead. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-07-10 20:31:53 +02:00
Steve French	65bc98b005	[CIFS] Distinguish posix opens and mkdirs from legacy mkdirs in stats Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-07-10 15:27:25 +00:00
Linus Torvalds	c2cc49a2f8	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: cifs: when ATTR_READONLY is set, only clear write bits on non-directories cifs: remove cifsInodeInfo->inUse counter cifs: convert cifs_get_inode_info and non-posix readdir to use cifs_iget [CIFS] update cifs version number cifs: add and use CIFSSMBUnixSetFileInfo for setattr calls cifs: make a separate function for filling out FILE_UNIX_BASIC_INFO cifs: rename CIFSSMBUnixSetInfo to CIFSSMBUnixSetPathInfo cifs: add pid of initiating process to spnego upcall info cifs: fix regression with O_EXCL creates and optimize away lookup cifs: add new cifs_iget function and convert unix codepath to use it	2009-07-09 20:40:58 -07:00
Jeff Layton	d0c280d26d	cifs: when ATTR_READONLY is set, only clear write bits on non-directories cifs: when ATTR_READONLY is set, only clear write bits on non-directories On windows servers, ATTR_READONLY apparently either has no meaning or serves as some sort of queue to certain applications for unrelated behavior. This MS kbase article has details: http://support.microsoft.com/kb/326549/ Don't clear the write bits directory mode when ATTR_READONLY is set. Reported-by: pouchat@peewiki.net Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-07-09 23:06:04 +00:00
Jeff Layton	aeaaf253c4	cifs: remove cifsInodeInfo->inUse counter cifs: remove cifsInodeInfo->inUse counter It was purported to be a refcounter of some sort, but was never used that way. It never served any purpose that wasn't served equally well by the I_NEW flag. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-07-09 23:06:00 +00:00
Jeff Layton	0b8f18e358	cifs: convert cifs_get_inode_info and non-posix readdir to use cifs_iget cifs: convert cifs_get_inode_info and non-posix readdir to use cifs_iget Rather than allocating an inode and filling it out, have cifs_get_inode_info fill out a cifs_fattr and call cifs_iget. This means a pretty hefty reorganization of cifs_get_inode_info. For the readdir codepath, add a couple of new functions for filling out cifs_fattr's from different FindFile response infolevels. Finally, remove cifs_new_inode since there are no more callers. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-07-09 23:05:48 +00:00
Steve French	b77863bfa1	[CIFS] update cifs version number Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-07-09 22:51:38 +00:00
Jeff Layton	3bbeeb3c93	cifs: add and use CIFSSMBUnixSetFileInfo for setattr calls cifs: add and use CIFSSMBUnixSetFileInfo for setattr calls When there's an open filehandle, SET_FILE_INFO is apparently preferred over SET_PATH_INFO. Add a new variant that sets a FILE_UNIX_INFO_BASIC infolevel via SET_FILE_INFO and switch cifs_setattr_unix to use the new call when there's an open filehandle available. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-07-09 21:15:10 +00:00
Jeff Layton	654cf14ac0	cifs: make a separate function for filling out FILE_UNIX_BASIC_INFO cifs: make a separate function for filling out FILE_UNIX_BASIC_INFO The SET_FILE_INFO variant will need to do the same thing here. Break this code out into a separate function that both variants can call. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-07-09 21:15:06 +00:00
Jeff Layton	01ea95e3b6	cifs: rename CIFSSMBUnixSetInfo to CIFSSMBUnixSetPathInfo cifs: rename CIFSSMBUnixSetInfo to CIFSSMBUnixSetPathInfo ...in preparation of adding a SET_FILE_INFO variant. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-07-09 21:15:02 +00:00
Jeff Layton	c4c1bff64d	cifs: add pid of initiating process to spnego upcall info cifs: add pid of initiating process to spnego upcall info This will allow the upcall to poke in /proc/<pid>/environ and get the value of the $KRB5CCNAME env var for the process. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-07-09 21:14:58 +00:00
Artem Bityutskiy	0611254760	UBIFS: fix corruption dump In the 'ubifs_recover_leb()' function, when we find corrupted empty space, we dump 8K starting from the offset where the last node ends. This is OK if the corrupted empty space is somewhere near that offset. But if the corruption is far at the end of the LEB, we will dump all 0xFF bytes and complitely ignore the interesting data. This is observed on a PPC ("kilauea") with NOR flash. This patch changes the behavior and teaches UBIFS to print only interesting data. I.e., now we find where corruption starts and start dumping from that offset. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Reviewed-by: Adrian Hunter <Adrian.Hunter@nokia.com>	2009-07-09 09:19:39 +03:00
Artem Bityutskiy	431102fed3	UBIFS: clean up free space checking recovery.c has 'is_empty()' helper and it is better to use this helper instead of re-implementing it in several places. This patch does this and removes some amount of unneeded code. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Reviewed-by: Adrian Hunter <Adrian.Hunter@nokia.com>	2009-07-09 09:19:38 +03:00
Artem Bityutskiy	ed43f2f06c	UBIFS: small amendments in the LEB scanning code This patch fixes few minor things I've spotted while going through code: 1. Better document return codes 2. If 'ubifs_scan_a_node()' returns some thing we do not expect, treat this as an error. 3. Try to do recovery only when 'ubifs_scan()' returns %-EUCLEAN, not on any error. 4. If empty space starts at a non-aligned address, print a message. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Reviewed-by: Adrian Hunter <Adrian.Hunter@nokia.com>	2009-07-09 09:19:38 +03:00
Artem Bityutskiy	086b3640c1	UBIFS: dump a little more in case of corruptions In case of corruptions, dump 8192 bytes instead of 4096. The largest node is 4096+ bytes, so it is better to see a node boundary, which is not always possible when only 4096 bytes are printed. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Reviewed-by: Adrian Hunter <Adrian.Hunter@nokia.com>	2009-07-09 09:19:38 +03:00
Jeff Layton	5ddf1e0ff0	cifs: fix regression with O_EXCL creates and optimize away lookup cifs: fix regression with O_EXCL creates and optimize away lookup Signed-off-by: Jeff Layton <jlayton@redhat.com> Tested-by: Shirish Pargaonkar <shirishp@gmail.com> CC: Stable Kernel <stable@kernel.org> Signed-off-by: Steve French <sfrench@us.ibm.com>	2009-07-08 21:55:45 +00:00
Joe Perches	ad361c9884	Remove multiple KERN_ prefixes from printk formats Commit `5fd29d6ccb` ("printk: clean up handling of log-levels and newlines") changed printk semantics. printk lines with multiple KERN_<level> prefixes are no longer emitted as before the patch. <level> is now included in the output on each additional use. Remove all uses of multiple KERN_<level>s in formats. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-08 10:30:03 -07:00
Linus Torvalds	728b690fd5	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-quota-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-quota-2.6: quota: Fix possible deadlock during parallel quotaon and quotaoff	2009-07-08 09:35:50 -07:00
Catalin Marinas	d5ce5b40bc	Free the memory allocated by memdup_user() in fs/sysfs/bin.c Commit `1c8542c7bb` replaced kmalloc() with memdup_user() in the write() function but also dropped the kfree(temp). The memdup_user() function allocates memory which is never freed. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Parag Warudkar <parag.warudkar@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-08 09:34:07 -07:00
Alexey Dobriyan	b43f3cbd21	headers: mnt_namespace.h redux Fix various silly problems wrt mnt_namespace.h: - exit_mnt_ns() isn't used, remove it - done that, sched.h and nsproxy.h inclusions aren't needed - mount.h inclusion was need for vfsmount_lock, but no longer - remove mnt_namespace.h inclusion from files which don't use anything from mnt_namespace.h Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-08 09:31:56 -07:00
Jiaying Zhang	d01730d74d	quota: Fix possible deadlock during parallel quotaon and quotaoff The following test script triggers a deadlock on ext2 filesystem: while true; do quotaon /dev/hda >&/dev/null; usleep $RANDOM; done & while true; do quotaoff /dev/hda >&/dev/null; usleep $RANDOM; done & I found there is a potential deadlock between quotaon and quotaoff (or quotasync). Basically, all of quotactl operations need to be protected by dqonoff_mutex. vfs_quota_off and vfs_quota_sync also call sb->s_op->quota_write that needs to grab the i_mutex of the quota file. But in vfs_quota_on_inode (called from quotaon operation), the current code tries to grab the i_mutex of the quota file first before getting quonoff_mutex. Reverse the order in which we take locks in vfs_quota_on_inode(). Jan Kara: Changed changelog to be more readable, made lockdep happy with I_MUTEX_QUOTA. Signed-off-by: Jiaying Zhang <jiayingz@google.com> Signed-off-by: Jan Kara <jack@suse.cz>	2009-07-07 18:15:21 +02:00
Oleg Nesterov	793285fcaf	cred_guard_mutex: do not return -EINTR to user-space do_execve() and ptrace_attach() return -EINTR if mutex_lock_interruptible(->cred_guard_mutex) fails. This is not right, change the code to return ERESTARTNOINTR. Perhaps we should also change proc_pid_attr_write(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: David Howells <dhowells@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-06 13:57:04 -07:00
Zhang, Yanmin	3beab0b424	sys_sync(): fix 16% performance regression in ffsb create_4k test I run many ffsb test cases on JBODs (typically 13/12 disks). Comparing with kernel 2.6.30, 2.6.31-rc1 has about 16% regression with ffsb_create_4k. The sub test case creates files continuously for 10 minitues and every file is 1MB. Bisect located below patch. `5cee5815d1` is first bad commit commit `5cee5815d1` Author: Jan Kara <jack@suse.cz> Date: Mon Apr 27 16:43:51 2009 +0200 vfs: Make sys_sync() use fsync_super() (version 4) It is unnecessarily fragile to have two places (fsync_super() and do_sync()) doing data integrity sync of the filesystem. Alter __fsync_super() to accommodate needs of both callers and use it. So after this patch __fsync_super() is the only place where we gather all the calls needed to properly send all data on a filesystem to disk. As a matter of fact, ffsb calls sys_sync in the end to make sure all data is flushed to disks and the flushing is counted into the result. vmstat shows ffsb is blocked when syncing for a long time. With 2.6.30, ffsb is blocked for a short time. I checked the patch and did experiments to recover the original methods. Eventually, the root cause is the patch deletes the calling to wakeup_pdflush when syncing, so only ffsb is blocked on disk I/O. wakeup_pdflush could ask pdflush to write back pages with ffsb at the same time. [akpm@linux-foundation.org: restore comment too] Signed-off-by: Zhang Yanmin <yanmin_zhang@linux.intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-06 13:57:03 -07:00

1 2 3 4 5 ...

14815 Commits