Commit Graph

18447 Commits

Author SHA1 Message Date
Sage Weil 2b2300d62e ceph: try to send partial cap release on cap message on missing inode
If we have enough memory to allocate a new cap release message, do so, so
that we can send a partial release message immediately.  This keeps us from
making the MDS wait when the cap release it needs is in a partially full
release message.

If we fail because of ENOMEM, oh well, they'll just have to wait a bit
longer.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-06-10 13:30:25 -07:00
Sage Weil 3d7ded4d81 ceph: release cap on import if we don't have the inode
If we get an IMPORT that give us a cap, but we don't have the inode, queue
a release (and try to send it immediately) so that the MDS doesn't get
stuck waiting for us.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-06-10 13:30:07 -07:00
Sage Weil 9dbd412f56 ceph: fix misleading/incorrect debug message
Nothing is released here: the caps message is simply ignored in this case.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-06-10 13:29:59 -07:00
Jeff Mahoney 00d5643e7c ceph: fix atomic64_t initialization on ia64
bdi_seq is an atomic_long_t but we're using ATOMIC_INIT, which causes
 build failures on ia64. This patch fixes it to use ATOMIC_LONG_INIT.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-06-10 13:29:50 -07:00
Miklos Szeredi 6db40cf047 pipe: fix check in "set size" fcntl
As it stands this check compares the number of pages to the page size.
This makes no sense and makes the fcntl fail in almost any sane case.

Fix it by checking if nr_pages is not zero (it can become zero only if
arg is too big and round_pipe_size() overflows).

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-06-10 19:08:34 +02:00
Miklos Szeredi 1d862f4122 pipe: fix pipe buffer resizing
pipe_set_size() needs to copy pipe bufs from the old circular buffer
to the new.

The current code gets this wrong in multiple ways, resulting in oops.

Test program is available here:
  http://www.kernel.org/pub/linux/kernel/people/mszeredi/piperesize/

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-06-10 19:08:34 +02:00
Jens Axboe 3e6c05052c block: remove duplicate BUG_ON() in bd_finish_claiming()
We do the same BUG_ON() just a line later when calling into
__bd_abort_claiming().

Reported-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-06-10 19:08:34 +02:00
Nick Piggin b0018361c3 block: bd_start_claiming cleanup
I don't like the subtle multi-context code in bd_claim (ie.  detects where it
has been called based on bd_claiming). It seems clearer to just require a new
function to finish a 2-part claim.

Also improve commentary in bd_start_claiming as to how it should
be used.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-06-10 19:08:34 +02:00
Nick Piggin cf3425707e block: bd_start_claiming fix module refcount
bd_start_claiming has an unbalanced module_put introduced in 6b4517a79.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-06-10 19:08:34 +02:00
Linus Torvalds b95a568093 Merge branch 'for-2.6.35' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.35' of git://linux-nfs.org/~bfields/linux:
  nfsd4: shut down callback queue outside state lock
  nfsd: nfsd_setattr needs to call commit_metadata
2010-06-09 12:43:04 -07:00
Dave Chinner 254c8c2dbf xfs: remove nr_to_write writeback windup.
Now that the background flush code has been fixed, we shouldn't need to
silently multiply the wbc->nr_to_write to get good writeback. Remove
that code.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-06-08 18:12:44 -07:00
J. Bruce Fields 44b56603c4 Merge branch 'for-2.6.34-incoming' into for-2.6.35-incoming 2010-06-08 20:05:18 -04:00
J. Bruce Fields c3935e3049 nfsd4: shut down callback queue outside state lock
This reportedly causes a lockdep warning on nfsd shutdown.  That looks
like a false positive to me, but there's no reason why this needs the
state lock anyway.

Reported-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-06-08 19:33:52 -04:00
Linus Torvalds 3975d16760 Merge git://git.infradead.org/~dwmw2/mtd-2.6.35
* git://git.infradead.org/~dwmw2/mtd-2.6.35:
  jffs2: update ctime when changing the file's permission by setfacl
  jffs2: Fix NFS race by using insert_inode_locked()
  jffs2: Fix in-core inode leaks on error paths
  mtd: Fix NAND submenu
  mtd/r852: update card detect early.
  mtd/r852: Fixes in case of DMA timeout
  mtd/r852: register IRQ as last step
  drivers/mtd: Use memdup_user
  docbook: make mtd nand module init static
2010-06-07 17:10:06 -07:00
Jan Kara 1c24d06f8e jffs2: update ctime when changing the file's permission by setfacl
jffs2 didn't update the ctime of the file when its permission was changed.

Steps to reproduce:
 # touch aaa
 # stat -c %Z aaa
 1275289822
 # setfacl -m  'u::x,g::x,o::x' aaa
 # stat -c %Z aaa
 1275289822                         <- unchanged

But, according to the spec of the ctime, jffs2 must update it.

Port of ext3 patch by Miao Xie <miaox@cn.fujitsu.com>.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2010-06-06 11:27:10 +01:00
Linus Torvalds 78b36558b7 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: Fix remaining racy updates of EXT4_I(inode)->i_flags
  ext4: Make sure the MOVE_EXT ioctl can't overwrite append-only files
2010-06-05 10:07:25 -07:00
Dmitry Monakhov 84a8dce271 ext4: Fix remaining racy updates of EXT4_I(inode)->i_flags
A few functions were still modifying i_flags in a racy manner.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-06-05 11:51:27 -04:00
Linus Torvalds 6c5de280b6 Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
  xfs: improve xfs_isilocked
  xfs: skip writeback from reclaim context
  xfs: remove done roadmap item from xfs-delayed-logging-design.txt
  xfs: fix race in inode cluster freeing failing to stale inodes
  xfs: fix access to upper inodes without inode64
  xfs: fix might_sleep() warning when initialising per-ag tree
  fs/xfs/quota: Add missing mutex_unlock
  xfs: remove duplicated #include
  xfs: convert more trace events to DEFINE_EVENT
  xfs: xfs_trace.c: remove duplicated #include
  xfs: Check new inode size is OK before preallocating
  xfs: clean up xlog_align
  xfs: cleanup log reservation calculactions
  xfs: be more explicit if RT mount fails due to config
  xfs: replace E2BIG with EFBIG where appropriate
2010-06-05 07:33:05 -07:00
Linus Torvalds 7926e0bfbb Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2:
  nilfs2: remove obsolete declarations of cache constructor and destructor
  nilfs2: fix style issue in nilfs_destroy_cachep
2010-06-05 07:31:13 -07:00
Linus Torvalds 7f0d384caf Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  Minix: Clean up left over label
  fix truncate inode time modification breakage
  fix setattr error handling in sysfs, configfs
  fcntl: return -EFAULT if copy_to_user fails
  wrong type for 'magic' argument in simple_fill_super()
  fix the deadlock in qib_fs
  mqueue doesn't need make_bad_inode()
2010-06-04 21:12:39 -07:00
Linus Torvalds d2dd328b7f Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block: (27 commits)
  block: make blk_init_free_list and elevator_init idempotent
  block: avoid unconditionally freeing previously allocated request_queue
  pipe: change /proc/sys/fs/pipe-max-pages to byte sized interface
  pipe: change the privilege required for growing a pipe beyond system max
  pipe: adjust minimum pipe size to 1 page
  block: disable preemption before using sched_clock()
  cciss: call BUG() earlier
  Preparing 8.3.8rc2
  drbd: Reduce verbosity
  drbd: use drbd specific ratelimit instead of global printk_ratelimit
  drbd: fix hang on local read errors while disconnected
  drbd: Removed the now empty w_io_error() function
  drbd: removed duplicated #includes
  drbd: improve usage of MSG_MORE
  drbd: need to set socket bufsize early to take effect
  drbd: improve network latency, TCP_QUICKACK
  drbd: Revert "drbd: Create new current UUID as late as possible"
  brd: support discard
  Revert "writeback: fix WB_SYNC_NONE writeback from umount"
  Revert "writeback: ensure that WB_SYNC_NONE writeback with sb pinned is sync"
  ...
2010-06-04 15:37:44 -07:00
Linus Torvalds f9196e7c03 Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6:
  fix setattr error handling in sysfs, configfs
  kobject: free memory if netlink_kernel_create() fails
  lib/kobject_uevent.c: fix CONIG_NET=n warning
2010-06-04 15:27:27 -07:00
Mike Frysinger 1da083c9b2 flat: fix unmap len in load error path
The data chunk is mmaped with 'len' which remains unchanged, so use that
when unmapping in the error path rather than trying to recalculate (and
incorrectly so) the value used originally.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Acked-by: David McCullough <davidm@snapgear.com>
Acked-by: Greg Ungerer <gerg@uclinux.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-06-04 15:21:45 -07:00
Mike Frysinger 2e94de8acb fs/binfmt_flat.c: split the stack & data alignments
The stack and data have different alignment requirements, so don't force
them to wear the same shoe.  Increase the data alignment to match that
which the elf2flt linker script has always been using: 0x20 bytes.  Not
only does this bring the kernel loader in line with the toolchain, but it
also fixes a swath of gcc tests which try to force larger alignment values
but randomly fail when the FLAT loader fails to deliver.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: David Woodhouse <David.Woodhouse@intel.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: David McCullough <davidm@snapgear.com>
Acked-by: Greg Ungerer <gerg@uclinux.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Tested-by: Michal Simek <monstr@monstr.eu>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Jie Zhang <jie@codesourcery.com>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-06-04 15:21:44 -07:00
Heiko Carstens 7cbe17701a fs/compat_rw_copy_check_uvector: add missing compat_ptr call
A call to access_ok is missing a compat_ptr conversion.  Introduced with
b83733639a "compat: factor out
compat_rw_copy_check_uvector from compat_do_readv_writev"

fs/compat.c: In function 'compat_rw_copy_check_uvector':
fs/compat.c:629: warning: passing argument 1 of '__access_ok' makes pointer from integer without a cast

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-06-04 15:21:44 -07:00
Andrew Hendry 01afaf6198 Minix: Clean up left over label
Remove a left over fail label.

Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-06-04 17:16:30 -04:00
Nick Piggin af5a30d8cf fix truncate inode time modification breakage
mtime and ctime should be changed only if the file size has actually
changed. Patches changing ext2 and tmpfs from vmtruncate to new truncate
sequence has caused regressions where they always update timestamps.

There is some strange cases in POSIX where truncate(2) must not update
times unless the size has acutally changed, see 6e656be89.

This area is all still rather buggy in different ways in a lot of
filesystems and needs a cleanup and audit (ideally the vfs will provide
a simple attribute or call to direct all filesystems exactly which
attributes to change). But coming up with the best solution will take a
while and is not appropriate for rc anyway.

So fix recent regression for now.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-06-04 17:16:30 -04:00
Nick Piggin 8718d36cf9 fix setattr error handling in sysfs, configfs
sysfs and configfs setattr functions have error cases after the generic inode's
attributes have been changed. Fix consistency by changing the generic inode
attributes only when it is guaranteed to succeed.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-06-04 17:16:29 -04:00
Dan Carpenter 5b54470dad fcntl: return -EFAULT if copy_to_user fails
copy_to_user() returns the number of bytes remaining, but we want to
return -EFAULT.
	ret = fcntl(fd, F_SETOWN_EX, NULL);
With the original code ret would be 8 here.

V2: Takuya Yoshikawa pointed out a similar issue in f_getown_ex()

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-06-04 17:16:28 -04:00
Roberto Sassu 7d683a0999 wrong type for 'magic' argument in simple_fill_super()
It's used to superblock ->s_magic, which is unsigned long.

Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Reviewed-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
CC: stable@kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-06-04 17:16:28 -04:00
Nick Piggin 75de46b98d fix setattr error handling in sysfs, configfs
sysfs and configfs setattr functions have error cases after the generic inode's
attributes have been changed. Fix consistency by changing the generic inode
attributes only when it is guaranteed to succeed.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-06-04 13:27:53 -07:00
Alex Elder 1bf7dbfde8 Merge branch 'master' into for-linus 2010-06-04 13:22:30 -05:00
Sage Weil 1e5ea23df1 ceph: fix lease revocation when seq doesn't match
If the client revokes a lease with a higher seq than what we have, keep
the mds's seq, so that it honors our release.  Otherwise, we can hang
indefinitely.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-06-04 10:05:40 -07:00
Linus Torvalds 03cd373981 Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
  cifs: fix page refcount leak
2010-06-03 07:20:28 -07:00
Jens Axboe ff9da691c0 pipe: change /proc/sys/fs/pipe-max-pages to byte sized interface
This changes the interface to be based on bytes instead. The API
matches that of F_SETPIPE_SZ in that it rounds up the passed in
size so that the resulting page array is a power-of-2 in size.

The proc file is renamed to /proc/sys/fs/pipe-max-size to
reflect this change.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-06-03 14:54:39 +02:00
Jens Axboe 419f8367ea pipe: change the privilege required for growing a pipe beyond system max
Change it to CAP_SYS_RESOURCE, as that more accurately models what
we want to control.

Suggested-by: Michael Kerrisk <mtk.manpages@googlemail.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-06-03 12:45:28 +02:00
Jens Axboe 6a6ca57de9 pipe: adjust minimum pipe size to 1 page
We don't need to pages to guarantee the POSIX requirement
that upto a page size write must be atomic to an empty
pipe.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-06-03 12:44:30 +02:00
David Woodhouse e72e6497e7 jffs2: Fix NFS race by using insert_inode_locked()
New inodes need to be locked as we're creating them, so they don't get used
by other things (like NFSd) before they're ready.

Pointed out by Al Viro.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2010-06-03 08:22:04 +01:00
David Woodhouse f324e4cb2c jffs2: Fix in-core inode leaks on error paths
Pointed out by Al Viro.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2010-06-03 08:03:39 +01:00
Christoph Hellwig f936972949 xfs: improve xfs_isilocked
Use rwsem_is_locked to make the assertations for shared locks work.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2010-06-03 16:22:29 +10:00
Christoph Hellwig 070ecdca54 xfs: skip writeback from reclaim context
Allowing writeback from reclaim context causes massive problems with stack
overflows as we can call into the writeback code which tends to be a heavy
stack user both in the generic code and XFS from random contexts that
perform memory allocations.

Follow the example of btrfs (and in slightly different form ext4) and refuse
to write out data from reclaim context.  This issue should really be handled
by the VM so that we can tune better for this case, but until we get it
sorted out there we have to hack around this in each filesystem with a
complex writeback path.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2010-06-03 16:22:29 +10:00
Dave Chinner 5b257b4a1f xfs: fix race in inode cluster freeing failing to stale inodes
When an inode cluster is freed, it needs to mark all inodes in memory as
XFS_ISTALE before marking the buffer as stale. This is eeded because the inodes
have a different life cycle to the buffer, and once the buffer is torn down
during transaction completion, we must ensure none of the inodes get written
back (which is what XFS_ISTALE does).

Unfortunately, xfs_ifree_cluster() has some bugs that lead to inodes not being
marked with XFS_ISTALE. This shows up when xfs_iflush() is called on these
inodes either during inode reclaim or tail pushing on the AIL.  The buffer is
read back, but no longer contains inodes and so triggers assert failures and
shutdowns. This was reproducable with at run.dbench10 invocation from xfstests.

There are two main causes of xfs_ifree_cluster() failing. The first is simple -
it checks in-memory inodes it finds in the per-ag icache to see if they are
clean without holding the flush lock. if they are clean it skips them
completely. However, If an inode is flushed delwri, it will
appear clean, but is not guaranteed to be written back until the flush lock has
been dropped. Hence we may have raced on the clean check and the inode may
actually be dirty. Hence always mark inodes found in memory stale before we
check properly if they are clean.

The second is more complex, and makes the first problem easier to hit.
Basically the in-memory inode scan is done with full knowledge it can be racing
with inode flushing and AIl tail pushing, which means that inodes that it can't
get the flush lock on might not be attached to the buffer after then in-memory
inode scan due to IO completion occurring. This is actually documented in the
code as "needs better interlocking". i.e. this is a zero-day bug.

Effectively, the in-memory scan must be done while the inode buffer is locked
and Io cannot be issued on it while we do the in-memory inode scan. This
ensures that inodes we couldn't get the flush lock on are guaranteed to be
attached to the cluster buffer, so we can then catch all in-memory inodes and
mark them stale.

Now that the inode cluster buffer is locked before the in-memory scan is done,
there is no need for the two-phase update of the in-memory inodes, so simplify
the code into two loops and remove the allocation of the temporary buffer used
to hold locked inodes across the phases.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2010-06-03 16:22:29 +10:00
Theodore Ts'o 1f5a81e41f ext4: Make sure the MOVE_EXT ioctl can't overwrite append-only files
Dan Roseberg has reported a problem with the MOVE_EXT ioctl.  If the
donor file is an append-only file, we should not allow the operation
to proceed, lest we end up overwriting the contents of an append-only
file.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Dan Rosenberg <dan.j.rosenberg@gmail.com>
2010-06-02 22:04:39 -04:00
Sage Weil 558d3499bd ceph: fix f_namelen reported by statfs
We were setting f_namelen in kstatfs to PATH_MAX instead of NAME_MAX.
That disagrees with ceph_lookup behavior (which checks against NAME_MAX),
and also makes the pjd posix test suite spit out ugly errors because with
can't clean up its temporary files.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-06-01 16:56:03 -07:00
Yehuda Sadeh 205475679a ceph: fix memory leak in statfs
Freeing the statfs request structure when required.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-06-01 16:56:02 -07:00
Henry C Chang 13a4214cd9 ceph: fix d_subdirs ordering problem
We misused list_move_tail() to order the dentry in d_subdirs.
This will screw up the d_subdirs order.

This bug can be reliably reproduced by:
1. mount ceph fs.
2. on ceph fs, git clone git://ceph.newdream.net/git/ceph.git
3. Run autogen.sh in ceph directory.
(Note: Errors only occur at the first time you run autogen.sh.)

Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-06-01 16:55:55 -07:00
Christoph Hellwig b160fdabe9 nfsd: nfsd_setattr needs to call commit_metadata
The conversion of write_inode_now calls to commit_metadata in commit
f501912a35 missed out the call in nfsd_setattr.

But without this conversion we can't guarantee that a SETATTR request
has actually been commited to disk with XFS, which causes a regression
from 2.6.32 (only for NFSv2, but anyway).

Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-06-01 19:17:50 -04:00
Dan Carpenter 08a66859e6 FS-Cache: Remove unneeded null checks
fscache_write_op() makes unnecessary checks of the page variable to see if it
is NULL.  It can't be NULL at those points as the kernel would already have
crashed a little higher up where we examined page->index.

Furthermore, unless radix_tree_gang_lookup_tag() can return 1 but no page, a
NULL pointer crash should not be encountered there as we can only get there if
r_t_g_l_t() returned 1.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-06-01 13:32:11 -07:00
Jeff Layton 06b43672a9 cifs: fix page refcount leak
Commit 315e995c63 is causing OOM kills
when stress-testing a CIFS filesystem. The VFS readpages operation takes
a page reference. The older code just handed this reference off to the
page cache, but the new code takes an extra one. The simplest fix is to
put the new reference after add_to_page_cache_lru.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-06-01 17:15:52 +00:00
Denis Kirjanov 037776fcbe AFS: Fix possible null pointer dereference in afs_alloc_server()
Fix a possible null pointer dereference in afs_alloc_server(): the server
pointer is NULL if there was an allocation failure, and under such a
condition, we can't dereference it in the _leave() statement.

Signed-off-by: Denis Kirjanov <dkirjanov@kernel.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-06-01 09:26:36 -07:00
Takuya Yoshikawa e30c7c3b30 binfmt_elf_fdpic: Fix clear_user() error handling
clear_user() returns the number of bytes that could not be copied rather than
an error code.  So we should return -EFAULT rather than directly returning the
results.

Without this patch, positive values may be returned to elf_fdpic_map_file()
and the following error handlings do not function as expected.

1.
	ret = elf_fdpic_map_file_constdisp_on_uclinux(params, file, mm);
	if (ret < 0)
		return ret;
2.
	ret = elf_fdpic_map_file_by_direct_mmap(params, file, mm);
	if (ret < 0)
		return ret;

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
CC: Alexander Viro <viro@zeniv.linux.org.uk>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Daisuke HATAYAMA <d.hatayama@jp.fujitsu.com>
CC: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-06-01 08:11:06 -07:00
Jens Axboe b4ca761577 Merge branch 'master' into for-linus
Conflicts:
	fs/pipe.c

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-06-01 12:42:12 +02:00
Jens Axboe 0e3c9a2284 Revert "writeback: fix WB_SYNC_NONE writeback from umount"
This reverts commit e913fc825d.

We are investigating a hang associated with the WB_SYNC_NONE changes,
so revert them for now.

Conflicts:

	fs/fs-writeback.c
	mm/page-writeback.c

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-06-01 11:08:43 +02:00
Jens Axboe f17625b318 Revert "writeback: ensure that WB_SYNC_NONE writeback with sb pinned is sync"
This reverts commit 7c8a3554c6.

We are investigating a hang associated with the WB_SYNC_NONE changes,
so revert them for now.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-06-01 11:05:22 +02:00
Ryusuke Konishi c29684d683 nilfs2: remove obsolete declarations of cache constructor and destructor
The commit 41c88bd7 ("nilfs2: cleanup multi
kmem_cache_{create,destroy} code") consolidated slab constructors and
destructors used in nilfs, but it left some declarations in header
files.

This gets rid of the obsolete declarations.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-05-31 20:50:29 +09:00
Ryusuke Konishi 84cb099985 nilfs2: fix style issue in nilfs_destroy_cachep
This gets rid of unwanted space chars in front of conditional
sentences of nilfs_destroy_cachep().

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-05-31 20:50:29 +09:00
Linus Torvalds 003386fff3 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  mm: export generic_pipe_buf_*() to modules
  fuse: support splice() reading from fuse device
  fuse: allow splice to move pages
  mm: export remove_from_page_cache() to modules
  mm: export lru_cache_add_*() to modules
  fuse: support splice() writing to fuse device
  fuse: get page reference for readpages
  fuse: use get_user_pages_fast()
  fuse: remove unneeded variable
2010-05-30 09:16:14 -07:00
Linus Torvalds d28619f156 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
  quota: Convert quota statistics to generic percpu_counter
  ext3 uses rb_node = NULL; to zero rb_root.
  quota: Fixup dquot_transfer
  reiserfs: Fix resuming of quotas on remount read-write
  pohmelfs: Remove dead quota code
  ufs: Remove dead quota code
  udf: Remove dead quota code
  quota: rename default quotactl methods to dquot_
  quota: explicitly set ->dq_op and ->s_qcop
  quota: drop remount argument to ->quota_on and ->quota_off
  quota: move unmount handling into the filesystem
  quota: kill the vfs_dq_off and vfs_dq_quota_on_remount wrappers
  quota: move remount handling into the filesystem
  ocfs2: Fix use after free on remount read-only

Fix up conflicts in fs/ext4/super.c and fs/ufs/file.c
2010-05-30 09:11:11 -07:00
Linus Torvalds b612a05537 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  ceph: clean up on forwarded aborted mds request
  ceph: fix leak of osd authorizer
  ceph: close out mds, osd connections before stopping auth
  ceph: make lease code DN specific
  fs/ceph: Use ERR_CAST
  ceph: renew auth tickets before they expire
  ceph: do not resend mon requests on auth ticket renewal
  ceph: removed duplicated #includes
  ceph: avoid possible null dereference
  ceph: make mds requests killable, not interruptible
  sched: add wait_for_completion_killable_timeout
2010-05-30 08:56:39 -07:00
Sage Weil 2a8e5e3637 ceph: clean up on forwarded aborted mds request
If an mds request is aborted (timeout, SIGKILL), it is left registered to
keep our state in sync with the mds.  If we get a forward notification,
though, we know the request didn't succeed and we can unregister it
safely.  We were trying to resend it, but then bailing out (and not
unregistering) in __do_request.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-29 09:42:05 -07:00
Sage Weil 79494d1b9b ceph: fix leak of osd authorizer
Release the ceph_authorizer when releasing osd state.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-29 09:42:04 -07:00
Sage Weil a922d38fd1 ceph: close out mds, osd connections before stopping auth
The auth module (part of the mon_client) is needed to free any
ceph_authorizer(s) used by the mds and osd connections.  Flush the msgr
workqueue before stopping monc to ensure that the destroy_authorizer
auth op is available when those connections are closed out.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-29 09:42:03 -07:00
Sage Weil dd1c905736 ceph: make lease code DN specific
The lease code includes a mask in the CEPH_LOCK_* namespace, but that
namespace is changing, and only one mask (formerly _DN == 1) is used, so
hard code for that value for now.

If we ever extend this code to handle leases over different data types we
can extend it accordingly.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-29 09:12:42 -07:00
Julia Lawall 7e34bc524e fs/ceph: Use ERR_CAST
Use ERR_CAST(x) rather than ERR_PTR(PTR_ERR(x)).  The former makes more
clear what is the purpose of the operation, which otherwise looks like a
no-op.

In the case of fs/ceph/inode.c, ERR_CAST is not needed, because the type of
the returned value is the same as the type of the enclosing function.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
type T;
T x;
identifier f;
@@

T f (...) { <+...
- ERR_PTR(PTR_ERR(x))
+ x
 ...+> }

@@
expression x;
@@

- ERR_PTR(PTR_ERR(x))
+ ERR_CAST(x)
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-29 09:12:41 -07:00
Sage Weil a41359fa35 ceph: renew auth tickets before they expire
We were only requesting renewal after our tickets expire; do so before
that.  Most of the low-level logic for this was already there; just use
it.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-29 09:12:39 -07:00
Sage Weil 09c4d6a7d4 ceph: do not resend mon requests on auth ticket renewal
We only want to send pending mon requests when we successfully
authenticate.  If we are already authenticated, like when we renew our
ticket, there is no need to resend pending requests.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-29 09:12:38 -07:00
Andrea Gelmini 984c76908e ceph: removed duplicated #includes
fs/ceph/auth.c: linux/slab.h is included more than once.
fs/ceph/super.h: linux/slab.h is included more than once.

Acked-by: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-29 09:12:37 -07:00
Sage Weil e95e9a7ae4 ceph: avoid possible null dereference
ac->ops may be null; use protocol id in error message instead.

Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-29 09:12:36 -07:00
Sage Weil aa91647c89 ceph: make mds requests killable, not interruptible
The underlying problem is that many mds requests can't be restarted.  For
example, a restarted create() would return -EEXIST if the original request
succeeds.  However, we do not want a hung MDS to hang the client too.  So,
use the _killable wait_for_completion variants to abort on SIGKILL but
nothing else.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-29 09:12:35 -07:00
Linus Torvalds 9a90e09854 Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (27 commits)
  ACPI: Don't let acpi_pad needlessly mark TSC unstable
  drivers/acpi/sleep.h: Checkpatch cleanup
  ACPI: Minor cleanup eliminating redundant PMTIMER_TICKS to NS conversion
  ACPI: delete unused c-state promotion/demotion data strucutures
  ACPI: video: fix acpi_backlight=video
  ACPI: EC: Use kmemdup
  drivers/acpi: use kasprintf
  ACPI, APEI, EINJ injection parameters support
  Add x64 support to debugfs
  ACPI, APEI, Use ERST for persistent storage of MCE
  ACPI, APEI, Error Record Serialization Table (ERST) support
  ACPI, APEI, Generic Hardware Error Source memory error support
  ACPI, APEI, UEFI Common Platform Error Record (CPER) header
  Unified UUID/GUID definition
  ACPI Hardware Error Device (PNP0C33) support
  ACPI, APEI, PCIE AER, use general HEST table parsing in AER firmware_first setup
  ACPI, APEI, Document for APEI
  ACPI, APEI, EINJ support
  ACPI, APEI, HEST table parsing
  ACPI, APEI, APEI supporting infrastructure
  ...
2010-05-28 14:42:18 -07:00
Christoph Hellwig fb3b504ade xfs: fix access to upper inodes without inode64
If a filesystem is mounted without the inode64 mount option we
should still be able to access inodes not fitting into 32 bits, just
not created new ones.  For this to work we need to make sure the
inode cache radix tree is initialized for all allocation groups, not
just those we plan to allocate inodes from.  This patch makes sure
we initialize the inode cache radix tree for all allocation groups,
and also cleans xfs_initialize_perag up a bit to separate the
inode32 logical from the general perag structure setup.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
2010-05-28 15:19:56 -05:00
Dave Chinner 9b98b6f3e1 xfs: fix might_sleep() warning when initialising per-ag tree
The use of radix_tree_preload() only works if the radix tree was
initialised without the __GFP_WAIT flag. The per-ag tree uses
GFP_NOFS, so does not trigger allocation of new tree nodes from the
preloaded array. Hence it enters the allocator with a spinlock held
and triggers the might_sleep() warnings.

Reported-by; Chris Mason <chris.mason@oracle.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
2010-05-28 15:19:50 -05:00
Julia Lawall 38e712ab3d fs/xfs/quota: Add missing mutex_unlock
Add a mutex_unlock missing on the error path.  The use of this lock
is balanced elsewhere in the file.

The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression E1;
@@

* mutex_lock(E1,...);
  <+... when != E1
  if (...) {
    ... when != E1
*   return ...;
  }
  ...+>
* mutex_unlock(E1,...);
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Alex Elder <aelder@sgi.com>
2010-05-28 15:19:41 -05:00
Huang Weiyi 3bd0946eb1 xfs: remove duplicated #include
Remove duplicated #include('s) in
  fs/xfs/linux-2.6/xfs_quotaops.c

Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
2010-05-28 15:19:36 -05:00
Li Zefan f8adb4d574 xfs: convert more trace events to DEFINE_EVENT
Use DECLARE_EVENT_CLASS, and save ~15K:

   text    data     bss     dec     hex filename
 171949   43028      48  215025   347f1 fs/xfs/linux-2.6/xfs_trace.o.orig
 156521   43028      36  199585   30ba1 fs/xfs/linux-2.6/xfs_trace.o

No change in functionality.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Alex Elder <aelder@sgi.com>
2010-05-28 15:19:31 -05:00
Huang Weiyi 292ec4cf35 xfs: xfs_trace.c: remove duplicated #include
Remove duplicated #include('s) in
  fs/xfs/linux-2.6/xfs_trace.c

Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
2010-05-28 15:19:24 -05:00
Dave Chinner 07f1a4f5e8 xfs: Check new inode size is OK before preallocating
The new xfsqa test 228 tries to preallocate more space than the
filesystem contains. it should fail, but instead triggers an assert
about lock flags.  The failure is due to the size extension failing
in vmtruncate() due to rlimit being set. Check this before we start
the preallocation to avoid allocating space that will never be used.

Also the path through xfs_vn_allocate already holds the IO lock, so
it should not be present in the lock flags when the setattr fails.
Hence the assert needs to take this into account. This will prevent
other such callers from hitting this incorrect ASSERT.

(Fixed a reference to "newsize" to read "new_size". -Alex)

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
2010-05-28 15:19:12 -05:00
Christoph Hellwig fdc07f44c8 xfs: clean up xlog_align
Add suggested cleanups to commit 29db3370a1369541d58d692fbfb168b8a0bd7f41
from review that didn't end up being commited.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
2010-05-28 14:58:36 -05:00
Christoph Hellwig 025101dca4 xfs: cleanup log reservation calculactions
Instead of having small helper functions calling big macros do the
calculations for the log reservations directly in the functions.
These are mostly 1:1 from the macros execept that the macros kept
the quota calculations in their callers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
2010-05-28 14:58:30 -05:00
Eric Sandeen 32891b292d xfs: be more explicit if RT mount fails due to config
Recent testers were slightly confused that a realtime mount failed
due to missing CONFIG_XFS_RT; we can make that a little more
obvious.

V2: drop the else as suggested by Christoph

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
2010-05-28 14:58:24 -05:00
Eric Sandeen 657a4cffde xfs: replace E2BIG with EFBIG where appropriate
Many places in the xfs code return E2BIG when they really mean
EFBIG; trying to grow past 16T on a 32 bit machine, for example,
says "Argument list too long" rather than "File too large" which is
not particularly helpful.

Some of these don't make perfect sense as EFBIG either, but still
better than E2BIG IMHO.

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
2010-05-28 14:58:16 -05:00
Al Viro 49837a80b3 remove detritus left by "mm: make read_cache_page synchronous"
gets minix get_dir_page() in sync with its analogs; back in 2007
Nick has switched read_cache_page() and friends to sync behaviour
(i.e.  they wait for the page to get unlocked, check if it's uptodate
and if it isn't return ERR_PTR(-EIO) instead) and removed the
duplicate logics from the callers.  In case of fs/minix/dir.c he'd
removed only half of that...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-28 11:37:41 -04:00
Al Viro 4c9002de32 fix fs/sysv s_dirt handling
got broken on ->sync_fs() conversion a year ago, nobody noticed...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-27 22:16:05 -04:00
npiggin@suse.de 459f6ed3b8 fat: convert to use the new truncate convention.
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-27 22:16:02 -04:00
npiggin@suse.de 737f2e93b9 ext2: convert to use the new truncate convention.
I also have commented a possible bug in existing ext2 code, marked with XXX.

Cc: linux-ext4@vger.kernel.org
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-27 22:15:57 -04:00
Nick Piggin 3322e79a38 fs: convert simple fs to new truncate
Convert simple filesystems: ramfs, configfs, sysfs, block_dev to new truncate
sequence.

Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-27 22:15:47 -04:00
npiggin@suse.de 15c6fd9786 kill spurious reference to vmtruncate
Lots of filesystems calls vmtruncate despite not implementing the old
->truncate method.  Switch them to use simple_setsize and add some
comments about the truncate code where it seems fitting.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-27 22:15:42 -04:00
npiggin@suse.de 7bb46a6734 fs: introduce new truncate sequence
Introduce a new truncate calling sequence into fs/mm subsystems. Rather than
setattr > vmtruncate > truncate, have filesystems call their truncate sequence
from ->setattr if filesystem specific operations are required. vmtruncate is
deprecated, and truncate_pagecache and inode_newsize_ok helpers introduced
previously should be used.

simple_setattr is introduced for simple in-ram filesystems to implement
the new truncate sequence. Eventually all filesystems should be converted
to implement a setattr, and the default code in notify_change should go
away.

simple_setsize is also introduced to perform just the ATTR_SIZE portion
of simple_setattr (ie. changing i_size and trimming pagecache).

To implement the new truncate sequence:
- filesystem specific manipulations (eg freeing blocks) must be done in
  the setattr method rather than ->truncate.
- vmtruncate can not be used by core code to trim blocks past i_size in
  the event of write failure after allocation, so this must be performed
  in the fs code.
- convert usage of helpers block_write_begin, nobh_write_begin,
  cont_write_begin, and *blockdev_direct_IO* to use _newtrunc postfixed
  variants. These avoid calling vmtruncate to trim blocks (see previous).
- inode_setattr should not be used. generic_setattr is a new function
  to be used to copy simple attributes into the generic inode.
- make use of the better opportunity to handle errors with the new sequence.

Big problem with the previous calling sequence: the filesystem is not called
until i_size has already changed.  This means it is not allowed to fail the
call, and also it does not know what the previous i_size was. Also, generic
code calling vmtruncate to truncate allocated blocks in case of error had
no good way to return a meaningful error (or, for example, atomically handle
block deallocation).

Cc: Christoph Hellwig <hch@lst.de>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-27 22:15:33 -04:00
Randy Dunlap 7000d3c424 fs/super: fix kernel-doc warning
Fix fs/super.c kernel-doc warning and function notation:
Warning(fs/super.c:957): No description found for parameter 'sb'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-27 22:06:23 -04:00
Erik van der Kouwe 0ab7620a0c fs/minix: bugfix, number of indirect block ptrs per block depends on block size
The MINIX filesystem driver used a constant number of indirect block
pointers in an indirect block. This worked only for filesystems with 1kb
block, while the MINIX default block size is now 4kb. As a consequence,
large files were read incorrectly on such filesystems and writing a
large file would cause the filesystem to become corrupted. This patch
computes the number of indirect block pointers based on the block size,
making the driver work for each block size.

I would like to thank Feiran Zheng ('Fam') for pointing out the cause
of the corruption.

Signed-off-by: Erik van der Kouwe <vdkouwe@cs.vu.nl>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-27 22:06:22 -04:00
Christoph Hellwig 1b061d9247 rename the generic fsync implementations
We don't name our generic fsync implementations very well currently.
The no-op implementation for in-memory filesystems currently is called
simple_sync_file which doesn't make too much sense to start with,
the the generic one for simple filesystems is called simple_fsync
which can lead to some confusion.

This patch renames the generic file fsync method to generic_file_fsync
to match the other generic_file_* routines it is supposed to be used
with, and the no-op implementation to noop_fsync to make it obvious
what to expect.  In addition add some documentation for both methods.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-27 22:06:06 -04:00
Christoph Hellwig 7ea8085910 drop unused dentry argument to ->fsync
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-27 22:05:02 -04:00
Julia Lawall cc967be547 fs: Add missing mutex_unlock
Add a mutex_unlock missing on the error path.  At other exists from the
function that return an error flag, the mutex is unlocked, so do the same
here.

The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression E1;
@@

* mutex_lock(E1,...);
  <+... when != E1
  if (...) {
    ... when != E1
*   return ...;
  }
  ...+>
* mutex_unlock(E1,...);
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-27 22:03:09 -04:00
Al Viro d7065da038 get rid of the magic around f_count in aio
__aio_put_req() plays sick games with file refcount.  What
it wants is fput() from atomic context; it's almost always
done with f_count > 1, so they only have to deal with delayed
work in rare cases when their reference happens to be the
last one.  Current code decrements f_count and if it hasn't
hit 0, everything is fine.  Otherwise it keeps a pointer
to struct file (with zero f_count!) around and has delayed
work do __fput() on it.

Better way to do it: use atomic_long_add_unless( , -1, 1)
instead of !atomic_long_dec_and_test().  IOW, decrement it
only if it's not the last reference, leave refcount alone
if it was.  And use normal fput() in delayed work.

I've made that atomic_long_add_unless call a new helper -
fput_atomic().  Drops a reference to file if it's safe to
do in atomic (i.e. if that's not the last one), tells if
it had been able to do that.  aio.c converted to it, __fput()
use is gone.  req->ki_file *always* contributes to refcount
now.  And __fput() became static.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-27 22:03:07 -04:00
Neil Brown 176306f59a VFS: fix recent breakage of FS_REVAL_DOT
Commit 1f36f774b2 broke FS_REVAL_DOT semantics.

In particular, before this patch, the command
   ls -l
in an NFS mounted directory would always check if the directory on the server
had changed and if so would flush and refill the pagecache for the dir.
After this patch, the same "ls -l" will repeatedly return stale date until
the cached attributes for the directory time out.

The following patch fixes this by ensuring the d_revalidate is called by
do_last when "." is being looked-up.
link_path_walk has already called d_revalidate, but in that case LOOKUP_OPEN
is not set so nfs_lookup_verify_inode chooses not to do any validation.

The following patch restores the original behaviour.

Cc: stable@kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-27 22:03:06 -04:00
Al Viro 1eb2cbb6d5 Revert "anon_inode: set S_IFREG on the anon_inode"
This reverts commit a7cf4145bb.
2010-05-27 22:03:05 -04:00
Linus Torvalds 105a048a4f Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (27 commits)
  Btrfs: add more error checking to btrfs_dirty_inode
  Btrfs: allow unaligned DIO
  Btrfs: drop verbose enospc printk
  Btrfs: Fix block generation verification race
  Btrfs: fix preallocation and nodatacow checks in O_DIRECT
  Btrfs: avoid ENOSPC errors in btrfs_dirty_inode
  Btrfs: move O_DIRECT space reservation to btrfs_direct_IO
  Btrfs: rework O_DIRECT enospc handling
  Btrfs: use async helpers for DIO write checksumming
  Btrfs: don't walk around with task->state != TASK_RUNNING
  Btrfs: do aio_write instead of write
  Btrfs: add basic DIO read/write support
  direct-io: do not merge logically non-contiguous requests
  direct-io: add a hook for the fs to provide its own submit_bio function
  fs: allow short direct-io reads to be completed via buffered IO
  Btrfs: Metadata ENOSPC handling for balance
  Btrfs: Pre-allocate space for data relocation
  Btrfs: Metadata ENOSPC handling for tree log
  Btrfs: Metadata reservation for orphan inodes
  Btrfs: Introduce global metadata reservation
  ...
2010-05-27 10:43:44 -07:00
Linus Torvalds e4ce30f377 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (40 commits)
  ext4: Make fsync sync new parent directories in no-journal mode
  ext4: Drop whitespace at end of lines
  ext4: Fix compat EXT4_IOC_ADD_GROUP
  ext4: Conditionally define compat ioctl numbers
  tracing: Convert more ext4 events to DEFINE_EVENT
  ext4: Add new tracepoints to track mballoc's buddy bitmap loads
  ext4: Add a missing trace hook
  ext4: restart ext4_ext_remove_space() after transaction restart
  ext4: Clear the EXT4_EOFBLOCKS_FL flag only when warranted
  ext4: Avoid crashing on NULL ptr dereference on a filesystem error
  ext4: Use bitops to read/modify i_flags in struct ext4_inode_info
  ext4: Convert calls of ext4_error() to EXT4_ERROR_INODE()
  ext4: Convert callers of ext4_get_blocks() to use ext4_map_blocks()
  ext4: Add new abstraction ext4_map_blocks() underneath ext4_get_blocks()
  ext4: Use our own write_cache_pages()
  ext4: Show journal_checksum option
  ext4: Fix for ext4_mb_collect_stats()
  ext4: check for a good block group before loading buddy pages
  ext4: Prevent creation of files larger than RLIMIT_FSIZE using fallocate
  ext4: Remove extraneous newlines in ext4_msg() calls
  ...

Fixed up trivial conflict in fs/ext4/fsync.c
2010-05-27 10:26:37 -07:00
Linus Torvalds ade61088bc Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  NFS: Fix another nfs_wb_page() deadlock
  NFS: Ensure that we mark the inode as dirty if we exit early from commit
  NFS: Fix a lock imbalance typo in nfs_access_cache_shrinker
  sunrpc: fix leak on error on socket xprt setup
2010-05-27 10:18:44 -07:00
Dmitry Monakhov f32764bd2b quota: Convert quota statistics to generic percpu_counter
Generic per-cpu counter has some memory overhead but it is negligible for
modern systems and embedded systems compile without quota support.  And code
reuse is a good thing. This patch should fix complain from preemptive kernels
which was introduced by dde9588853.

[Jan Kara: Fixed patch to work on 32-bit archs as well]

Reported-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jan Kara <jack@suse.cz>
2010-05-27 18:56:27 +02:00