Commit Graph

15575 Commits

Author SHA1 Message Date
Linus Torvalds 9abf47f11b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2:
  nilfs2: fix missing initialization of i_dir_start_lookup member
  nilfs2: fix missing zero-fill initialization of btree node cache
2009-09-30 09:42:24 -07:00
Linus Torvalds 9f44fdc518 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: Fix time encoding with extra epoch bits
  ext4: Add a stub for mpage_da_data in the trace header
  jbd2: Use tracepoints for history file
  ext4: Use tracepoints for mb_history trace file
  ext4, jbd2: Drop unneeded printks at mount and unmount time
  ext4: Handle nested ext4_journal_start/stop calls without a journal
  ext4: Make sure ext4_dirty_inode() updates the inode in no journal mode
  ext4: Avoid updating the inode table bh twice in no journal mode
  ext4: EXT4_IOC_MOVE_EXT: Check for different original and donor inodes first
  ext4: async direct IO for holes and fallocate support
  ext4: Use end_io callback to avoid direct I/O fallback to buffered I/O
  ext4: Split uninitialized extents for direct I/O
  ext4: release reserved quota when block reservation for delalloc retry
  ext4: Adjust ext4_da_writepages() to write out larger contiguous chunks
  ext4: Fix hueristic which avoids group preallocation for closed files
  ext4: Use ext4_msg() for ext4_da_writepage() errors
  ext4: Update documentation about quota mount options
2009-09-30 09:32:30 -07:00
Linus Torvalds 4c8f1cb266 Merge git://git.kernel.org/pub/scm/linux/kernel/git/hirofumi/fatfs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/hirofumi/fatfs-2.6:
  fat: Check s_dirt in fat_sync_fs()
  vfat: change the default from shortname=lower to shortname=mixed
  fat/nls: Fix handling of utf8 invalid char
2009-09-30 09:31:14 -07:00
Theodore Ts'o c1fccc0696 ext4: Fix time encoding with extra epoch bits
"Looking at ext4.h, I think the setting of extra time fields forgets to
mask the epoch bits so the epoch part overwrites nsec part. The second
change is only for coherency (2 -> EXT4_EPOCH_BITS)."

Thanks to Damien Guibouret for pointing out this problem.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-30 01:13:55 -04:00
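The masking problem described in c1fccc0696 above can be illustrated with a small model. This is only a sketch in Python, not the kernel code: it assumes the 32-bit "extra" field keeps the epoch in its low EXT4_EPOCH_BITS bits and the nanoseconds in the remaining upper bits, and the function names are hypothetical.

```python
EXT4_EPOCH_BITS = 2
EXT4_EPOCH_MASK = (1 << EXT4_EPOCH_BITS) - 1                    # 0b11
EXT4_NSEC_MASK = (0xFFFFFFFF << EXT4_EPOCH_BITS) & 0xFFFFFFFF   # upper 30 bits


def encode_extra_unmasked(tv_sec, tv_nsec):
    """Shape of the bug: the high seconds bits are OR-ed in without a mask."""
    nsec = (tv_nsec << EXT4_EPOCH_BITS) & EXT4_NSEC_MASK
    return ((tv_sec >> 32) | nsec) & 0xFFFFFFFF


def encode_extra(tv_sec, tv_nsec):
    """Shape of the fix: the epoch part is masked to EXT4_EPOCH_BITS bits."""
    epoch = (tv_sec >> 32) & EXT4_EPOCH_MASK
    nsec = (tv_nsec << EXT4_EPOCH_BITS) & EXT4_NSEC_MASK
    return epoch | nsec


def decode_extra(extra):
    """Split the field back into (epoch bits, nanoseconds)."""
    return extra & EXT4_EPOCH_MASK, extra >> EXT4_EPOCH_BITS
```

With a seconds value whose upper half is wider than two bits, the unmasked variant spills into the nanoseconds part, while the masked variant keeps the two parts separate.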
Theodore Ts'o bf6993276f jbd2: Use tracepoints for history file
The /proc/fs/jbd2/<dev>/history was maintained manually; by using
tracepoints, we can get all of the existing functionality of the /proc
file plus extra capabilities thanks to the ftrace infrastructure.  We
save memory as a bonus.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-30 00:32:06 -04:00
Theodore Ts'o 296c355cd6 ext4: Use tracepoints for mb_history trace file
The /proc/fs/ext4/<dev>/mb_history was maintained manually, and had a
number of problems: it required a largish amount of memory to be
allocated for each ext4 filesystem, and the s_mb_history_lock
introduced a CPU contention problem.  

By ripping out the mb_history code and replacing it with ftrace
tracepoints, we get more functionality: timestamps, event
filtering, the ability to correlate mballoc history with other ext4
tracepoints, etc.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-30 00:32:42 -04:00
Theodore Ts'o 90576c0b9a ext4, jbd2: Drop unneeded printks at mount and unmount time
There are a number of kernel printk's which are printed when an ext4
filesystem is mounted and unmounted.  Disable them to economize space
in the system logs.  In addition, disabling the mballoc stats by
default saves a number of unneeded atomic operations for every block
allocation or deallocation.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-29 15:51:30 -04:00
Curt Wohlgemuth d3d1faf6a7 ext4: Handle nested ext4_journal_start/stop calls without a journal
This patch fixes a problem with handling nested calls to
ext4_journal_start/ext4_journal_stop, when there is no journal present.

Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-29 11:01:03 -04:00
Curt Wohlgemuth f3dc272fd5 ext4: Make sure ext4_dirty_inode() updates the inode in no journal mode
This patch fixes a problem where ext4_dirty_inode() was not calling
ext4_mark_inode_dirty() if the current_handle is not valid, which
is the case in no journal mode.

It also removes a test for non-matching transaction which can never
happen.

Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-29 16:06:01 -04:00
Frank Mayhar 830156c79b ext4: Avoid updating the inode table bh twice in no journal mode
This is a cleanup of commit 91ac6f4.  Since ext4_mark_inode_dirty()
has already called ext4_mark_iloc_dirty(), which in turn calls
ext4_do_update_inode(), it's not necessary to have ext4_write_inode()
call ext4_do_update_inode() in no journal mode.  Indeed, it would be
duplicated work.

Reviewed-by: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Frank Mayhar <fmayhar@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-29 10:07:47 -04:00
Ryusuke Konishi 3cc811bffd nilfs2: fix missing initialization of i_dir_start_lookup member
The i_dir_start_lookup field in nilfs_inode_info objects should be
cleared when the objects are allocated, but the initialization was
missing in case of reading from disk.  This adds the initialization.

Since the variable just gives a start page on directory lookups, the
bug was nonfatal until now.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-29 20:32:13 +09:00
Ryusuke Konishi 1f28fcd925 nilfs2: fix missing zero-fill initialization of btree node cache
This will fix file system corruption which infrequently happens after
mount.  The problem was reported from users with the title "[NILFS
users] Fail to mount NILFS." (Message-ID:
<200908211918.34720.yuri@itinteg.net>), and so forth.  I've also
experienced the corruption multiple times on kernel 2.6.30 and 2.6.31.

The problem turned out to be caused by a mismatch between
mapping->nrpages of a btree node cache and the actual number of pages
hung on the cache; if mapping->nrpages becomes zero even though the
cache still has pages, truncate_inode_pages() returns without doing
anything.  Usually this is harmless except that it may leak pages, but
garbage collection fairly infrequently sees a stale page remaining in
the btree node cache of DAT (i.e. disk address translation file of
nilfs), which induces the corruption.

I identified a missing initialization in btree node caches as the
root cause.  This corrects the bug.

I've tested this for kernel 2.6.30 and 2.6.31.

Reported-by: Yuri Chislov <yuri@itinteg.net>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: stable <stable@kernel.org>
2009-09-29 20:12:56 +09:00
Theodore Ts'o f3ce8064b3 ext4: EXT4_IOC_MOVE_EXT: Check for different original and donor inodes first
Move the check to make sure the original and donor inodes are
different earlier, to avoid a potential deadlock by trying to lock the
same inode twice.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-28 15:58:29 -04:00
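The reordering described in f3ce8064b3 above can be sketched as follows. This is an illustrative Python model, not the ext4 code: the Inode class, the move_extents name, the EINVAL return value, and the lock ordering by inode number are all assumptions made for the example.

```python
import errno
import threading


class Inode:
    """Hypothetical stand-in for an inode with a non-reentrant lock."""

    def __init__(self, ino):
        self.ino = ino
        self.lock = threading.Lock()


def move_extents(orig, donor):
    # Check identity BEFORE taking any lock: locking the same
    # non-reentrant lock twice would block forever.
    if orig.ino == donor.ino:
        return -errno.EINVAL

    # Take both locks in a fixed order (by inode number here).
    first, second = sorted((orig, donor), key=lambda inode: inode.ino)
    with first.lock:
        with second.lock:
            return 0  # the extent exchange itself would happen here
```

If the identity check ran only after the locking step, passing the same inode as both arguments would try to acquire the same lock twice, which is the potential deadlock the commit message refers to.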
Mingming Cao 8d5d02e6b1 ext4: async direct IO for holes and fallocate support
For async direct IO that covers holes or fallocated space, the end_io
callback function now queues the conversion work on a workqueue but
doesn't flush the work right away, as that might take too long.

But when fsync is called after all the data I/O has completed, the user
expects the metadata to be updated as well before fsync returns.

Thus we need to flush the conversion work when fsync() is called.
This patch keeps track of a list of completed async direct I/Os that
have work queued on the workqueue.  When fsync() is called, it will go
through the list and do the conversion.

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
2009-09-28 15:48:29 -04:00
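The bookkeeping described in 8d5d02e6b1 above (defer the conversion at I/O completion, then flush it at fsync time) can be modelled roughly as below. This is a sketch with hypothetical names; it runs the deferred work synchronously and does not model the kernel workqueue.

```python
from collections import deque


class InodeState:
    """Hypothetical per-inode state for deferred extent conversion."""

    def __init__(self):
        self.completed_io = deque()  # completed async DIO awaiting conversion
        self.converted = []          # extents already marked initialized

    def end_io(self, extent):
        # I/O completion: only queue the conversion, do not perform it here.
        self.completed_io.append(extent)

    def flush_completed_io(self):
        # Walk the list and do the pending conversions in completion order.
        while self.completed_io:
            self.converted.append(self.completed_io.popleft())

    def fsync(self):
        # Metadata must be up to date before fsync returns.
        self.flush_completed_io()
```

After two completions nothing has been converted yet; a call to fsync() drains the list so that both extents are converted before it returns.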
Mingming Cao 4c0425ff68 ext4: Use end_io callback to avoid direct I/O fallback to buffered I/O
Currently the DIO VFS code passes create = 0 when writing to the
middle of file.  It does this to avoid block allocation for holes, so
as not to expose stale data when there is a parallel buffered read
(which does not hold the i_mutex lock).  Direct I/O writes into holes
fall back to buffered IO for this reason.

Since preallocated extents are treated as holes when doing a
get_block() look up (buffer is not mapped), direct IO over fallocate
also falls back to buffered IO.  Thus ext4 actually silently falls
back to buffered IO in above two cases, which is undesirable.

To fix this, this patch creates uninitialized extents when a direct I/O
write goes into holes in sparse files, and registers an end_io callback
which converts the uninitialized extent to an initialized extent after
the I/O is completed.

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-28 15:48:41 -04:00
Mingming Cao 0031462b5b ext4: Split uninitialized extents for direct I/O
When writing into an uninitialized extent via direct I/O, and the direct
I/O doesn't exactly cover the uninitialized extent, split the extent
into uninitialized and initialized extents before submitting the I/O.
This avoids needing to deal with an ENOSPC error in the end_io
callback that gets used for direct I/O.

When the IO is complete, the written extent will be marked as initialized.

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-28 15:49:08 -04:00
Mingming Cao 9f0ccfd8e0 ext4: release reserved quota when block reservation for delalloc retry
ext4_da_reserve_space() can reserve quota blocks multiple times if
ext4_claim_free_blocks() fails and we retry the allocation. We should
release the quota reservation before restarting.

Bug found by Jan Kara.

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-28 15:49:52 -04:00
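The retry problem described in 9f0ccfd8e0 above comes down to undoing the quota reservation before looping. A minimal Python sketch, assuming a hypothetical Quota counter, a caller-supplied claim_free_blocks callback, and a fixed retry limit (the kernel's retry condition is not modelled):

```python
class Quota:
    """Hypothetical quota counter tracking reserved blocks."""

    def __init__(self):
        self.reserved = 0

    def reserve(self, nblocks):
        self.reserved += nblocks

    def release(self, nblocks):
        self.reserved -= nblocks


def reserve_space(quota, nblocks, claim_free_blocks, max_retries=3):
    for _ in range(max_retries):
        quota.reserve(nblocks)
        if claim_free_blocks(nblocks):
            return True
        # Release before restarting; otherwise every retry would
        # reserve the same blocks against the quota again.
        quota.release(nblocks)
    return False
```

Without the release() call, a claim that fails twice and then succeeds would leave three reservations charged to the quota instead of one.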
Theodore Ts'o 55138e0bc2 ext4: Adjust ext4_da_writepages() to write out larger contiguous chunks
Work around problems in the writeback code to force out writebacks in
larger chunks than just 4MB, which is just too small.  This also works
around limitations in the ext4 block allocator, which can't allocate
more than 2048 blocks at a time.  So we need to defeat the round-robin
characteristics of the writeback code and try to write out as many
blocks in one inode as possible before allowing the writeback code to
move on to another inode.  We add a new per-filesystem tunable,
max_writeback_mb_bump, which caps this to a default of 128MB per
inode.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-29 13:31:31 -04:00
Theodore Ts'o 7178057730 ext4: Fix hueristic which avoids group preallocation for closed files
The heuristic was designed to avoid using locality group preallocation
when writing the last segment of a closed file.  Fix it by deferring
the setting of size to the maximum of size and isize until after we
check whether size == isize.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-28 00:06:20 -04:00
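The ordering issue described in 7178057730 above can be shown with a simplified model. This is a sketch, not the mballoc code: size stands for the end of the current request and isize for the file size (both in blocks), closed stands for "no writers have the file open", and the function names are hypothetical. Other conditions the real heuristic may check are left out.

```python
def skips_group_prealloc_early_max(size, isize, closed):
    """Ordering before the fix: size is clamped first, then compared."""
    size = max(size, isize)
    return size == isize and closed


def skips_group_prealloc(size, isize, closed):
    """Ordering after the fix: compare first, clamp afterwards."""
    if size == isize and closed:
        return True          # writing the last segment of a closed file
    size = max(size, isize)  # would feed later size-based decisions
    return False
```

With the early max(), any request ending at or before the end of a closed file compares equal to isize, so the test matches far more often than just for the last segment.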
Alexey Dobriyan f0f37e2f77 const: mark struct vm_struct_operations
* mark struct vm_area_struct::vm_ops as const
* mark vm_ops in AGP code

But leave TTM code alone, something is fishy there with global vm_ops
being used.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-27 11:39:25 -07:00
Theodore Ts'o 1693918e0b ext4: Use ext4_msg() for ext4_da_writepage() errors
This allows the user to see what filesystem was involved with a
particular ext4_da_writepage() error.  Also, use KERN_CRIT which is
more appropriate than KERN_EMERG.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-26 17:43:59 -04:00
Linus Torvalds bfebb14063 Merge branch 'writeback' of git://git.kernel.dk/linux-2.6-block
* 'writeback' of git://git.kernel.dk/linux-2.6-block:
  writeback: pass in super_block to bdi_start_writeback()
2009-09-26 10:11:13 -07:00
Linus Torvalds 07e2e6ba27 Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
  cifs: fix locking and list handling code in cifs_open and its helper
  [CIFS] Remove build warning
  cifs: fix problems with last two commits
  [CIFS] Fix build break when keys support turned off
  cifs: eliminate cifs_init_private
  cifs: convert oplock breaks to use slow_work facility (try #4)
  cifs: have cifsFileInfo hold an extra inode reference
  cifs: take read lock on GlobalSMBSes_lock in is_valid_oplock_break
  cifs: remove cifsInodeInfo.oplockPending flag
  cifs: fix oplock request handling in posix codepath
  [CIFS] Re-enable Lanman security
2009-09-26 10:10:35 -07:00
Jens Axboe a72bfd4dea writeback: pass in super_block to bdi_start_writeback()
Sometimes we only want to write pages from a specific super_block,
so allow that to be passed in.

This fixes a problem with commit 56a131dcf7
causing writeback on all super_blocks on a bdi, where we only really
want to sync a specific sb from writeback_inodes_sb().

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-26 00:10:40 +02:00
Jeff Layton 3321b791b2 cifs: fix locking and list handling code in cifs_open and its helper
The patch to remove cifs_init_private introduced a locking imbalance. It
didn't remove the leftover list addition code and the unlocking in that
function. cifs_new_fileinfo does the list addition now, so there should
be no need to do it outside of that function.

pCifsInode will never be NULL, so we don't need to check for that. This
patch also gets rid of the ugly locking and unlocking across function
calls.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Steve French <sfrench@us.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2009-09-25 17:59:31 +00:00
Linus Torvalds 6d7f18f6ea Merge branch 'writeback' of git://git.kernel.dk/linux-2.6-block
* 'writeback' of git://git.kernel.dk/linux-2.6-block:
  writeback: writeback_inodes_sb() should use bdi_start_writeback()
  writeback: don't delay inodes redirtied by a fast dirtier
  writeback: make the super_block pinning more efficient
  writeback: don't resort for a single super_block in move_expired_inodes()
  writeback: move inodes from one super_block together
  writeback: get rid to incorrect references to pdflush in comments
  writeback: improve readability of the wb_writeback() continue/break logic
  writeback: cleanup writeback_single_inode()
  writeback: kupdate writeback shall not stop when more io is possible
  writeback: stop background writeback when below background threshold
  writeback: balance_dirty_pages() shall write more than dirtied pages
  fs: Fix busyloop in wb_writeback()
2009-09-25 09:27:30 -07:00
Jens Axboe 56a131dcf7 writeback: writeback_inodes_sb() should use bdi_start_writeback()
Pointless to iterate other devices looking for a super, when
we have a bdi mapping.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-25 18:08:26 +02:00
Wu Fengguang b3af9468ae writeback: don't delay inodes redirtied by a fast dirtier
Debug traces show that in per-bdi writeback, the inode under writeback
almost always gets redirtied by a busy dirtier.  We used to call
redirty_tail() in this case, which could delay the inode for up to 30s.

This is unacceptable because it now happens so frequently for plain cp/dd,
that the accumulated delays could make writeback of big files very slow.

So let's distinguish between data redirty and metadata only redirty.
The first one is caused by a busy dirtier, while the latter one could
happen in XFS, NFS, etc. when they are doing delalloc or updating isize.

The inode being busy dirtied will now be requeued for next io, while
the inode being redirtied by fs will continue to be delayed to avoid
repeated IO.

CC: Jan Kara <jack@suse.cz>
CC: Theodore Ts'o <tytso@mit.edu>
CC: Dave Chinner <david@fromorbit.com>
CC: Chris Mason <chris.mason@oracle.com>
CC: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-25 18:08:26 +02:00
Jens Axboe 9ecc2738ac writeback: make the super_block pinning more efficient
Currently we pin the inode->i_sb for every single inode. This
increases cache traffic on sb->s_umount sem. Let's instead
cache the inode sb pin state and keep the super_block pinned
for as long as we keep writing out inodes from the same
super_block.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-25 18:08:26 +02:00
Jens Axboe cf137307cd writeback: don't resort for a single super_block in move_expired_inodes()
If we only moved inodes from a single super_block to the temporary
list, there's no point in doing a resort for multiple super_blocks.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-25 18:08:26 +02:00
Shaohua Li 5c03449d34 writeback: move inodes from one super_block together
__mark_inode_dirty adds inodes to the wb dirty list in random order. If a disk
has several partitions, writeback might keep the spindle moving between
partitions. To reduce the movement, it is better to write a big chunk of one
partition and then move to another. Inodes from one fs are usually in one
partition, so ideally moving inodes from one fs together should reduce spindle
movement. This patch tries to address this. Before per-bdi writeback was added,
the behavior was to write inodes from one fs first and then another, so the
patch restores the previous behavior.
The loop in the patch is a bit ugly; should we add a dirty list for each
superblock in bdi_writeback?

Test in a two partition disk with attached fio script shows about 3% ~ 6%
improvement.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-25 18:08:25 +02:00
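The grouping idea in 5c03449d34 above (keep inodes of one super_block together when moving them for writeback) can be sketched as below. This is a Python illustration using a dictionary, not the list-walking loop from the patch; the function name and the (sb, inode) tuples are hypothetical.

```python
def group_by_super_block(expired):
    """Reorder (sb, inode) pairs so each super_block's inodes are adjacent.

    Super_blocks keep their first-seen order, and inodes keep their
    relative order within each super_block.
    """
    groups = {}
    for sb, inode in expired:
        groups.setdefault(sb, []).append(inode)
    return [(sb, inode) for sb, inodes in groups.items() for inode in inodes]
```

For a dirty list that alternates between two partitions, the result writes all inodes of the first partition before touching the second, which is the access pattern the commit message argues reduces spindle movement.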
Jens Axboe 5b0830cb90 writeback: get rid to incorrect references to pdflush in comments
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-25 18:08:25 +02:00
Jens Axboe 71fd05a887 writeback: improve readability of the wb_writeback() continue/break logic
And throw some comments in there, too.

Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-25 18:08:25 +02:00
Wu Fengguang ae1b7f7d4b writeback: cleanup writeback_single_inode()
Make the if-else straight in writeback_single_inode().
No behavior change.

Cc: Jan Kara <jack@suse.cz>
Cc: Michael Rubin <mrubin@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-25 18:08:25 +02:00
Wu Fengguang 7fbdea3232 writeback: kupdate writeback shall not stop when more io is possible
Fix the kupdate case, which disregards wbc.more_io and stops writeback
prematurely even when there are more inodes to be synced.

wbc.more_io should always be respected.

Also remove the pages_skipped check. It will be set when some page(s) of some
inode(s) cannot be written for now. Such inodes will be delayed for a while.
This variable has nothing to do with whether there are other writeable inodes.

CC: Jan Kara <jack@suse.cz>
CC: Dave Chinner <david@fromorbit.com>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-25 18:08:25 +02:00
Wu Fengguang d3ddec7635 writeback: stop background writeback when below background threshold
Treat bdi_start_writeback(0) as a special request to do background write,
and stop such work when we are below the background dirty threshold.

Also simplify the (nr_pages <= 0) checks. Since we already pass in
nr_pages=LONG_MAX for WB_SYNC_ALL and background writes, we don't
need to worry about it being decreased to zero.

Reported-by: Richard Kennedy <richard@rsk.demon.co.uk>
CC: Jan Kara <jack@suse.cz>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-25 18:08:24 +02:00
Jan Kara a5989bdc98 fs: Fix busyloop in wb_writeback()
If all inodes are under writeback (e.g. when there's only one inode
with dirty pages), wb_writeback() with WB_SYNC_NONE work basically degrades
to busylooping until the I_SYNC flag of the inode is cleared. Fix the problem
by waiting on the I_SYNC flag of an inode on the b_more_io list in case we
failed to write anything.

Tested-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-25 18:08:24 +02:00
Steve French 15dd478107 [CIFS] Remove build warning
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2009-09-25 02:24:45 +00:00
Jeff Layton 5d2c0e2259 cifs: fix problems with last two commits
Fix problems with commits:

086f68bd97
3bc303c254

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2009-09-25 02:12:33 +00:00
Steve French 0f59e61c1f [CIFS] Fix build break when keys support turned off
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2009-09-25 00:33:37 +00:00
Andrew Morton c44972f178 procfs: disable per-task stack usage on NOMMU
It needs walk_page_range().

Reported-by: Michal Simek <monstr@monstr.eu>
Tested-by: Michal Simek <monstr@monstr.eu>
Cc: Stefani Seibold <stefani@seibold.net>
Cc: David Howells <dhowells@redhat.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greg Ungerer <gerg@snapgear.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24 17:11:24 -07:00
Linus Torvalds b9b9df62e7 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6:
  eCryptfs: Prevent lower dentry from going negative during unlink
  eCryptfs: Propagate vfs_read and vfs_write return codes
  eCryptfs: Validate global auth tok keys
  eCryptfs: Filename encryption only supports password auth tokens
  eCryptfs: Check for O_RDONLY lower inodes when opening lower files
  eCryptfs: Handle unrecognized tag 3 cipher codes
  ecryptfs: improved dependency checking and reporting
  eCryptfs: Fix lockdep-reported AB-BA mutex issue
  ecryptfs: Remove unneeded locking that triggers lockdep false positives
2009-09-24 17:10:17 -07:00
Jeff Layton 086f68bd97 cifs: eliminate cifs_init_private
...it does the same thing as cifs_fill_fileinfo, but doesn't handle the
flist ordering correctly. Also rename cifs_fill_fileinfo to a more
descriptive name and have it take an open flags arg instead of just a
write_only flag. That makes the logic in the callers a little simpler.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2009-09-24 19:35:18 +00:00
Al Viro 36dd2fdb37 nfs[23] tcp breakage in mount with binary options
We forget to set nfs_server.protocol in the tcp case when old-style binary
options are passed to mount.  The field remains zero and is never validated
afterwards.  As a result, we hit BUG in fs/nfs/client.c:588.

Breakage has been introduced in NFS: Add nfs_alloc_parsed_mount_data
merged yesterday...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-09-24 14:58:42 -04:00
Jeff Layton 3bc303c254 cifs: convert oplock breaks to use slow_work facility (try #4)
This is the fourth respin of the patch to convert oplock breaks to
use the slow_work facility.

A customer of ours was testing a backport of one of the earlier
patchsets, and hit a "Busy inodes after umount..." problem. An oplock
break job had raced with a umount, and the superblock got torn down and
its memory reused. When the oplock break job tried to dereference the
inode->i_sb, the kernel oopsed.

This patchset has the oplock break job hold an inode and vfsmount
reference until the oplock break completes.  With this, there should be
no need to take a tcon reference (the vfsmount implicitly holds one
already).

Currently, when an oplock break comes in there's a chance that the
oplock break job won't occur if the allocation of the oplock_q_entry
fails. There are also some rather nasty races in the allocation and
handling of these structs.

Rather than allocating oplock queue entries when an oplock break comes
in, add a few extra fields to the cifsFileInfo struct. Get rid of the
dedicated cifs_oplock_thread as well and queue the oplock break job to
the slow_work thread pool.

This approach also has the advantage that the oplock break jobs can
potentially run in parallel rather than be serialized like they are
today.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2009-09-24 18:33:18 +00:00
Linus Torvalds 7ca263cdf8 Merge branch 'cputime' of git://git390.marist.edu/pub/scm/linux-2.6
* 'cputime' of git://git390.marist.edu/pub/scm/linux-2.6:
  [PATCH] Fix idle time field in /proc/uptime
2009-09-24 09:04:24 -07:00
Linus Torvalds dc2af6a6bc Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (42 commits)
  Btrfs: hash the btree inode during  fill_super
  Btrfs: relocate file extents in clusters
  Btrfs: don't rename file into dummy directory
  Btrfs: check size of inode backref before adding hardlink
  Btrfs: fix releasepage to avoid unlocking extents we haven't locked
  Btrfs: Fix test_range_bit for whole file extents
  Btrfs: fix errors handling cached state in set/clear_extent_bit
  Btrfs: fix early enospc during balancing
  Btrfs: deal with NULL space info
  Btrfs: account for space used by the super mirrors
  Btrfs: fix extent entry threshold calculation
  Btrfs: remove dead code
  Btrfs: fix bitmap size tracking
  Btrfs: don't keep retrying a block group if we fail to allocate a cluster
  Btrfs: make balance code choose more wisely when relocating
  Btrfs: fix arithmetic error in clone ioctl
  Btrfs: add snapshot/subvolume destroy ioctl
  Btrfs: change how subvolumes are organized
  Btrfs: do not reuse objectid of deleted snapshot/subvol
  Btrfs: speed up snapshot dropping
  ...
2009-09-24 08:57:29 -07:00
Linus Torvalds 6c5daf012c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  truncate: use new helpers
  truncate: new helpers
  fs: fix overflow in sys_mount() for in-kernel calls
  fs: Make unload_nls() NULL pointer safe
  freeze_bdev: grab active reference to frozen superblocks
  freeze_bdev: kill bd_mount_sem
  exofs: remove BKL from super operations
  fs/romfs: correct error-handling code
  vfs: seq_file: add helpers for data filling
  vfs: remove redundant position check in do_sendfile
  vfs: change sb->s_maxbytes to a loff_t
  vfs: explicitly cast s_maxbytes in fiemap_check_ranges
  libfs: return error code on failed attr set
  seq_file: return a negative error code when seq_path_root() fails.
  vfs: optimize touch_time() too
  vfs: optimization for touch_atime()
  vfs: split generic_forget_inode() so that hugetlbfs does not have to copy it
  fs/inode.c: add dev-id and inode number for debugging in init_special_inode()
  libfs: make simple_read_from_buffer conventional
2009-09-24 08:32:11 -07:00
Linus Torvalds db16826367 Merge branch 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6
* 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6: (21 commits)
  HWPOISON: Enable error_remove_page on btrfs
  HWPOISON: Add simple debugfs interface to inject hwpoison on arbitary PFNs
  HWPOISON: Add madvise() based injector for hardware poisoned pages v4
  HWPOISON: Enable error_remove_page for NFS
  HWPOISON: Enable .remove_error_page for migration aware file systems
  HWPOISON: The high level memory error handler in the VM v7
  HWPOISON: Add PR_MCE_KILL prctl to control early kill behaviour per process
  HWPOISON: shmem: call set_page_dirty() with locked page
  HWPOISON: Define a new error_remove_page address space op for async truncation
  HWPOISON: Add invalidate_inode_page
  HWPOISON: Refactor truncate to allow direct truncating of page v2
  HWPOISON: check and isolate corrupted free pages v2
  HWPOISON: Handle hardware poisoned pages in try_to_unmap
  HWPOISON: Use bitmask/action code for try_to_unmap behaviour
  HWPOISON: x86: Add VM_FAULT_HWPOISON handling to x86 page fault handler v2
  HWPOISON: Add poison check to page fault handling
  HWPOISON: Add basic support for poisoned pages in fault handler v3
  HWPOISON: Add new SIGBUS error codes for hardware poison signals
  HWPOISON: Add support for poison swap entries v2
  HWPOISON: Export some rmap vma locking to outside world
  ...
2009-09-24 07:53:22 -07:00
Hiroshi Shimamoto 801460d0cf task_struct cleanup: move binfmt field to mm_struct
Because the binfmt is not different between threads in the same process,
it can be moved from task_struct to mm_struct.  The binfmt module is
then handled per mm_struct instead of per task_struct.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24 07:21:05 -07:00