Commit Graph

27460 Commits

Author SHA1 Message Date
Ryusuke Konishi fbb24a3a91 nilfs2: ensure proper cache clearing for gc-inodes
A gc-inode is a pseudo inode used to buffer the blocks to be moved by
garbage collection.

Block caches of gc-inodes must be cleared every time a garbage collection
function (nilfs_clean_segments) completes.  Otherwise, stale blocks
buffered in the caches may be wrongly reused in successive calls of the GC
function.

For user files, this is not a problem because their gc-inodes are
distinguished by a checkpoint number as well as an inode number.  They
never buffer different blocks if either an inode number, a checkpoint
number, or a block offset differs.

However, gc-inodes of sufile, cpfile and DAT file can store different data
for the same block offset.  Thus, the nilfs_clean_segments function can
move incorrect block for these meta-data files if an old block is cached.
I found this is really causing meta-data corruption in nilfs.

This fixes the issue by ensuring cache clear of gc-inodes and resolves
reported GC problems including checkpoint file corruption, b-tree
corruption, and the following warning during GC.

  nilfs_palloc_freev: entry number 307234 already freed.
  ...

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: <stable@vger.kernel.org>	[2.6.37+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-06-20 14:39:35 -07:00
Linus Torvalds d865983292 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs compile warning fixes from Chris Mason.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: cast devid to unsigned long long for printk %llu
  Btrfs: init old_generation in get_old_root
2012-06-16 17:01:41 -07:00
Linus Torvalds 873b779d99 NFS client bugfixes for Linux 3.5
Highlights include:
 
  - Fix a couple of mount regressions due to the recent cleanups.
  - Fix an Oops in the open recovery code
  - Fix an rpc_pipefs upcall hang that results from some of the
    net namespace work from 3.4.x (stable kernel candidate).
  - Fix a couple of write and o_direct regressions that were found
    at last weeks Bakeathon testing event in Ann Arbor.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJP2gmaAAoJEGcL54qWCgDyrBMP/RY/T++He8y5k3M9aEqiIv0q
 D8ZVMwzID6f4Zgw4xRg96aYr02sBTw0q+0mP5x1EZmg8mK29rnBiVeKHE1iwSfXq
 10/SYISlpIjhJC4I4kHXGd2KClgj7qRRCbDKFRWwoIIwYU+kJn8MRnPa9XqdL8kP
 q68lrtayW8THSJDR8bk1GQn+ARxGeoY++qzHxm3vpQCbZVVb19VqKMWAWSN4VKqb
 epWehOSAzB3iA7HrLRbf8Y8/sDdXewxCQpr9CC/wxuu++l5ifPphR0ToX+k9VZXI
 BKFLUojCUZHTMAgCxuxjrFYehMeyClbzL2lLkz5Pgj0gQhOX6Myj+WMXoEg/uWfo
 XNf51FH3yBbnfayTaOUs6Y50iuU+dQO7TUTAoWTPpW9V/iT5z/fWAKUVJhDtrPk5
 DVDkR6SEgb4P1RqkehZKLq5k5GSAcTR+MZr452eDrFYXJrY8ORDE6o6kP4Rr3Nnd
 n8gap0gHxzIYlhBghem6+nLN+HhpZQopWeD8mNub20VuXsChRDr9/+XWuMCSJaZF
 2kleVdt2+rTDzi9bJTRYlsX397oaThL0NbRvshHAwnXIDtIQrzxx6+dUyOsEWMEu
 go/EdSUUESXGNlsWTqewCBsOjPeE4L5ijI/QglfDkF+CzD5dDjrxl+5i57iMKVfc
 Ydste3pQJkS7PiZu1sWA
 =unbu
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.5-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
 "Highlights include:

   - Fix a couple of mount regressions due to the recent cleanups.
   - Fix an Oops in the open recovery code
   - Fix an rpc_pipefs upcall hang that results from some of the net
     namespace work from 3.4.x (stable kernel candidate).
   - Fix a couple of write and o_direct regressions that were found at
     last weeks Bakeathon testing event in Ann Arbor."

* tag 'nfs-for-3.5-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFS: add an endian notation for sparse
  NFSv4.1: integer overflow in decode_cb_sequence_args()
  rpc_pipefs: allow rpc_purge_list to take a NULL waitq pointer
  NFSv4 do not send an empty SETATTR compound
  NFSv2: EOF incorrectly set on short read
  NFS: Use the NFS_DEFAULT_VERSION for v2 and v3 mounts
  NFS: fix directio refcount bug on commit
  NFSv4: Fix unnecessary delegation returns in nfs4_do_open
  NFSv4.1: Convert another trivial printk into a dprintk
  NFS4: Fix open bug when pnfs module blacklisted
  NFS: Remove incorrect BUG_ON in nfs_found_client
  NFS: Map minor mismatch error to protocol not support error.
  NFS: Fix a commit bug
  NFS4: Set parsed mount data version to 4
  NFSv4.1: Ensure we clear session state flags after a session creation
  NFSv4.1: Convert a trivial printk into a dprintk
  NFSv4: Fix up decode_attr_mdsthreshold
  NFSv4: Fix an Oops in the open recovery code
  NFSv4.1: Fix a request leak on the back channel
2012-06-15 17:37:23 -07:00
Linus Torvalds 93dd048dbd Merge branch 'for-3.5' of git://linux-nfs.org/~bfields/linux
Pull two nfsd bugfixes from J. Bruce Fields.

* 'for-3.5' of git://linux-nfs.org/~bfields/linux:
  nfsd4: BUG_ON(!is_spin_locked()) no good on UP kernels
  NFS: hard-code init_net for NFS callback transports
2012-06-15 17:27:31 -07:00
Chris Mason a8c4a33b98 Btrfs: cast devid to unsigned long long for printk %llu
Avoid warning in 32 bit machines

Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-06-15 20:07:17 -04:00
Chris Mason 4325edd078 Btrfs: init old_generation in get_old_root
gcc was giving an uninit variable warning here.  Strictly
speaking we don't need to init it, but this will make things
much less error prone.

Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-06-15 20:06:54 -04:00
Linus Torvalds 718f58ad61 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs update from Chris Mason:
 "The dates look like I had to rebase this morning because there was a
  compiler warning for a printk arg that I had missed earlier.

  These are all fixes, including one to prevent using stale pointers for
  device names, and lots of fixes around transaction abort cleanups
  (Josef, Liu Bo).

  Jan Schmidt also sent in a number of fixes for the new reference
  number tracking code.

  Liu Bo beat me to updating the MAINTAINERS file.  Since he thought to
  also fix the git url, I kept his commit."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (24 commits)
  Btrfs: update MAINTAINERS info for BTRFS FILE SYSTEM
  Btrfs: destroy the items of the delayed inodes in error handling routine
  Btrfs: make sure that we've made everything in pinned tree clean
  Btrfs: avoid memory leak of extent state in error handling routine
  Btrfs: do not resize a seeding device
  Btrfs: fix missing inherited flag in rename
  Btrfs: fix incompat flags setting
  Btrfs: fix defrag regression
  Btrfs: call filemap_fdatawrite twice for compression
  Btrfs: keep inode pinned when compressing writes
  Btrfs: implement ->show_devname
  Btrfs: use rcu to protect device->name
  Btrfs: unlock everything properly in the error case for nocow
  Btrfs: fix btrfs_destroy_marked_extents
  Btrfs: abort the transaction if the commit fails
  Btrfs: wake up transaction waiters when aborting a transaction
  Btrfs: fix locking in btrfs_destroy_delayed_refs
  Btrfs: pass locked_page into extent_clear_unlock_delalloc if theres an error
  Btrfs: fix race in tree mod log addition
  Btrfs: add btrfs_next_old_leaf
  ...
2012-06-15 16:04:37 -07:00
Miao Xie 67cde3448d Btrfs: destroy the items of the delayed inodes in error handling routine
the items of the delayed inodes were forgotten to be freed, this patch
fixes it.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-06-15 11:42:28 -04:00
Liu Bo ed0eaa1498 Btrfs: make sure that we've made everything in pinned tree clean
Since we have two trees for recording pinned extents, we need to go through
both of them to make sure that we've done everything clean.

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-06-15 11:42:27 -04:00
Liu Bo 6e841e32b1 Btrfs: avoid memory leak of extent state in error handling routine
We've forgotten to clear extent states in pinned tree, which will results in
space counter mismatch and memory leak:

WARNING: at fs/btrfs/extent-tree.c:7537 btrfs_free_block_groups+0x1f3/0x2e0 [btrfs]()
...
space_info 2 has 8380416 free, is not full
space_info total=12582912, used=4096, pinned=4096, reserved=0, may_use=0, readonly=4194304
btrfs state leak: start 29364224 end 29376511 state 1 in tree ffff880075f20090 refs 1
...

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-06-15 11:42:27 -04:00
Liu Bo 4e42ae1bdc Btrfs: do not resize a seeding device
Seeding devices are not supposed to change any more.

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-06-15 11:42:26 -04:00
Liu Bo bc1782374b Btrfs: fix missing inherited flag in rename
When we move a file into a directory with compression flag, we need to
inherite BTRFS_INODE_COMPRESS and clear BTRFS_INODE_NOCOMPRESS as well.
But if we move a file into a directory without compression flag, we need
to clear both of them.

It is the way how our setflags deals with compression flag, so keep
the same behaviour here.

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-06-15 11:33:30 -04:00
Chris Mason acbcabd2de Merge branch 'for-chris' of git://git.jan-o-sch.net/btrfs-unstable into for-linus 2012-06-15 11:33:16 -04:00
Li Zefan 69e380d176 Btrfs: fix incompat flags setting
It's a bug, but it happens to work, as BTRFS_COMPRESS_LZO == 2, which
has only one bit set.

Signed-off-by: Li Zefan <lizefan@huawei.com>
2012-06-14 21:30:57 -04:00
Li Zefan 6c282eb40e Btrfs: fix defrag regression
If a file has 3 small extents:

| ext1 | ext2 | ext3 |

Running "btrfs fi defrag" will only defrag the last two extents, if those
extent mappings hasn't been read into memory from disk.

This bug was introduced by commit 17ce6ef8d7
("Btrfs: add a check to decide if we should defrag the range")

The cause is, that commit looked into previous and next extents using
lookup_extent_mapping() only.

While at it, remove the code that checks the previous extent, since
it's sufficient to check the next extent.

Signed-off-by: Li Zefan <lizefan@huawei.com>
2012-06-14 21:30:55 -04:00
Josef Bacik 7ddf5a42d3 Btrfs: call filemap_fdatawrite twice for compression
I removed this in an earlier commit and I was wrong.  Because compression
can return from filemap_fdatawrite() without having actually set any of it's
pages as writeback() it can make filemap_fdatawait() do essentially nothing,
and then we won't find any ordered extents because they may not have been
created yet.  So not only does this make fsync() completely useless, but it
will also screw up if you truncate on a non-page aligned offset since we
zero out the end and then wait on ordered extents and then call drop caches.
We can drop the cache before the io completes and then we try to unpin the
extent we just wrote we won't find it and everything goes sideways.  So fix
this by putting it back and put a giant comment there to keep me from trying
to remove it in the future.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-06-14 21:30:54 -04:00
Josef Bacik 8180ef8894 Btrfs: keep inode pinned when compressing writes
A user reported lots of problems using compression on the new code and it
turns out part of the problem was that igrab() was failing when we added a
new ordered extent.  This is because when writing out an inode under
compression we immediately return without actually doing anything to the
pages, and then in another thread at some point down the line actually do
the ordered dance.  The problem is between the point that we start writeback
and we actually add the ordered extent we could be trying to reclaim the
inode, which makes igrab() return NULL.  So we need to do an igrab() when we
create the async extent and then drop it when we are done with it.  This
makes sure we stay pinned in memory until the ordered extent can get a
reference on it and we are good to go.  With this patch we no longer panic
in btrfs_finish_ordered_io().  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-06-14 21:30:53 -04:00
Josef Bacik 9c5085c147 Btrfs: implement ->show_devname
Because btrfs can remove the device that was mounted we need to have a
->show_devname so that in this case we can print out some other device in
the file system to /proc/mount.  So if there are multiple devices in a btrfs
file system we will just print the device with the lowest devid that we can
find.  This will make everything consistent and deal with device removal
properly.  The drawback is if you mount with a device that is higher than
the lowest devicd it won't show up as the mounted device in /proc/mounts,
but this is a small price to pay. This was inspired by Miao Xie's patch.
Thanks,

Reviewed-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <josef@redhat.com>
2012-06-14 21:30:37 -04:00
Josef Bacik 606686eeac Btrfs: use rcu to protect device->name
Al pointed out that we can just toss out the old name on a device and add a
new one arbitrarily, so anybody who uses device->name in printk could
possibly use free'd memory.  Instead of adding locking around all of this he
suggested doing it with RCU, so I've introduced a struct rcu_string that
does just that and have gone through and protected all accesses to
device->name that aren't under the uuid_mutex with rcu_read_lock().  This
protects us and I will use it for dealing with removing the device that we
used to mount the file system in a later patch.  Thanks,

Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <josef@redhat.com>
2012-06-14 21:29:16 -04:00
Josef Bacik 17ca04aff7 Btrfs: unlock everything properly in the error case for nocow
I was getting hung on umount when a transaction was aborted because a range
of one of the free space inodes was still locked.  This is because the nocow
stuff doesn't unlock anything on error.  This fixed the problem and I
verified that is what was happening.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-06-14 21:29:15 -04:00
Josef Bacik ee670f0af3 Btrfs: fix btrfs_destroy_marked_extents
So we're forcing the eb's to have their ref count set to 1 so invalidatepage
works but this breaks lots of things, for example root nodes, and is just
plain wrong, we don't need to just evict all of this stuff.  Also drop the
invalidatepage altogether and add a page_cache_release().  With this patch
we no longer hang when trying to access the root nodes after an aborted
transaction and we no longer leak memory.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-06-14 21:29:14 -04:00
Josef Bacik 7b8b92af58 Btrfs: abort the transaction if the commit fails
If a transaction commit fails we don't abort it so we don't set an error on
the file system.  This patch fixes that by actually calling the abort stuff
and then adding a check for a fs error in the transaction start stuff to
make sure it is caught properly.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-06-14 21:29:13 -04:00
Josef Bacik d7096fc3ef Btrfs: wake up transaction waiters when aborting a transaction
I was getting lots of hung tasks and a NULL pointer dereference because we
are not cleaning up the transaction properly when it aborts.  First we need
to reset the running_transaction to NULL so we don't get a bad dereference
for any start_transaction callers after this.  Also we cannot rely on
waitqueue_active() since it's just a list_empty(), so just call wake_up()
directly since that will do the barrier for us and such.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-06-14 21:29:12 -04:00
Josef Bacik b939d1ab76 Btrfs: fix locking in btrfs_destroy_delayed_refs
The transaction abort stuff was throwing warnings from the list debugging
code because we do a list_del_init outside of the delayed_refs spin lock.
The delayed refs locking makes baby Jesus cry so it's not hard to get wrong,
but we need to take the ref head mutex to make sure it's not being processed
currently, and so if it is we need to drop the spin lock and then take and
drop the mutex and do the search again.  If we can take the mutex then we
can safely remove the head from the list and carry on.  Now when the
transaction aborts I don't get the list debugging warnings.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-06-14 21:29:11 -04:00
Josef Bacik beb42dd793 Btrfs: pass locked_page into extent_clear_unlock_delalloc if theres an error
While doing my enospc work I got a transaction abortion that resulted in a
panic when we tried to unlock_page() an already unlocked page.  This is
because we aren't calling extent_clear_unlock_delalloc with the locked page
so it was unlocking all the pages in the range.  This is wrong since
__extent_writepage expects to have the page locked still unless we return
*page_started as 1.  This should keep us from panicing.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-06-14 21:29:09 -04:00
J. Bruce Fields bc2df47a40 nfsd4: BUG_ON(!is_spin_locked()) no good on UP kernels
Most frequent symptom was a BUG triggering in expire_client, with the
server locking up shortly thereafter.

Introduced by 508dc6e110 "nfsd41:
free_session/free_client must be called under the client_lock".

Cc: stable@kernel.org
Cc: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-06-14 13:54:08 -04:00
Stanislav Kinsbursky 12918b10d5 NFS: hard-code init_net for NFS callback transports
In case of destroying mount namespace on child reaper exit, nsproxy is zeroed
to the point already. So, dereferencing of it is invalid.
This patch hard-code "init_net" for all network namespace references for NFS
callback services. This will be fixed with proper NFS callback
containerization.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-06-14 13:53:43 -04:00
Jan Schmidt 3310c36eef Btrfs: fix race in tree mod log addition
When adding to the tree modification log, we grab two locks at different
stages. We must not drop the outer lock until we're done with section
protected by the inner lock. This moves the unlock call for the outer lock
to the appropriate position.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-06-14 18:52:39 +02:00
Jan Schmidt 3d7806eca4 Btrfs: add btrfs_next_old_leaf
To make sense of the tree mod log, the backref walker not only needs
btrfs_search_old_slot, but it also called btrfs_next_leaf, which in turn was
calling btrfs_search_slot. This obviously didn't give the correct result.

This commit adds btrfs_next_old_leaf, a drop-in replacement for
btrfs_next_leaf with a time_seq parameter. If it is zero, it behaves exactly
like btrfs_next_leaf. If it is non-zero, it will use btrfs_search_old_slot
with this time_seq parameter.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-06-14 18:52:09 +02:00
Jan Schmidt a95236d99f Btrfs: fix return value for __tree_mod_log_oldest_root
In __tree_mod_log_oldest_root() we must return the found operation even if
it's not a ROOT_REPLACE operation. Otherwise, the caller assumes that there
are no operations to be rewinded and returns immediately.

The code in the caller is modified to improve readability.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-06-14 18:44:22 +02:00
Jan Schmidt 8ba97a15e7 Btrfs: use btrfs_read_lock_root_node in get_old_root
get_old_root could race with root node updates because we weren't locking
the node early enough. Use btrfs_read_lock_root_node to grab the root locked
in the very beginning and release the lock as soon as possible (just like
btrfs_search_slot does).

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-06-14 18:44:21 +02:00
Jan Schmidt f617e2fd52 Btrfs: remove obsolete btrfs_next_leaf call from __resolve_indirect_ref
When resolving indirect refs, we used to call btrfs_next_leaf in case we
didn't find an exact match. While we should find exact matches most of the
time, in case we don't, we must continue searching. Treating those matches
differently depending on the level we're searching doesn't make sense.

Even worse, we might end up searching for a key larger than the largest, in
which case there is no next_leaf and subsequent jobs would fail. This commit
drops the bogous lines.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-06-14 18:44:20 +02:00
Linus Torvalds 266ae4e615 fix unbalanced wb->list_lock in 3.5-rc1
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJP0tJwAAoJECvKgwp+S8JaSVAP/1FKw7u5E9QzOW8kPCj58nig
 FVF3BgaYgVrJglRjNOGTSfhJl3TVHGTzEMGICtsKUvXmgs7VmMudyWFuyokxfrZi
 z70DIbSuPDOMEt31MXiACq9z4E2IrZZ2kTYpRVd+zCV/2s4p789ejDLxOkk2ikpt
 M4TcIMPCAU2e4/ljRjNEQkg8d23ehdJsH9Hg8k56Jtb57e0LNpwnaZ0FzFKCzYh0
 Orm2dm9375vyzXZ2WNvgzl8ClJdEjkCyTq79i/39aQs4WeXsEP3nk+PFZYd+jrYA
 XwvV1sHJ43BOCCWwYS5dP0i9uGnRVlh/F2tyWWSSU2OizHzUXuOUWfXgY1JWyzaT
 uwEoCa8atYUOD7gp6ndnNR3uqBXtw0SihESst5fC7rKWuKOl62XToD6Pg6rPMex4
 ZuRaUw4Ts4efbX0dw3iDHMrS6Cf7a0JH52QOEYovrge4mDjiJzClHR69hsmVyoVb
 7s3j0q+mhur+NcnMkvC0C+pHn/m18q1i/yIBrD2K/UoLVGA17QMIZGoKfGnkeoaF
 a4fcoJ7fQE09tJ9KV1NLEdeD9PXQtETcNIAIZCfBCEJP9CPhHaUYkky+sOMcvKTr
 xEAe4f4WiHY7mTy8CRcDQjiOBigJe7Ksm+0qElauGcYw5Kqqrpdap1yPKbG4QWPL
 ABVYJyOvKSC3HLA6i7Gi
 =RBRw
 -----END PGP SIGNATURE-----

Merge tag 'writeback-lock-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux

Pull writeback locking fix from Wu Fengguang:
 "fix unbalanced wb->list_lock in 3.5-rc1"

* tag 'writeback-lock-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
  writeback: Fix lock imbalance in writeback_sb_inodes()
2012-06-12 18:28:58 +03:00
Dan Carpenter e216c8c771 NFS: add an endian notation for sparse
This is supposed to be a __be32 value.  Sparse complains a lot:

fs/nfs/callback_xdr.c:699:30: warning: incorrect type in initializer (different base types)
fs/nfs/callback_xdr.c:699:30:    expected unsigned int [unsigned] status
fs/nfs/callback_xdr.c:699:30:    got restricted __be32 const [usertype] csr_status
fs/nfs/callback_xdr.c:715:9: warning: cast to restricted __be32
fs/nfs/callback_xdr.c:716:16: warning: incorrect type in return expression (different base types)
fs/nfs/callback_xdr.c:716:16:    expected restricted __be32
fs/nfs/callback_xdr.c:716:16:    got unsigned int [unsigned] status

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-12 09:54:40 -04:00
Dan Carpenter 0439f31c35 NFSv4.1: integer overflow in decode_cb_sequence_args()
This seems like it could overflow on 32 bits.  Use kmalloc_array() which
has overflow protection built in.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-12 09:54:36 -04:00
Randy Dunlap b84297197c exofs: fix sparse non-ANSI function warning
Fix sparse non-ANSI function warning:

  fs/exofs/sys.c:112:28: warning: non-ANSI function declaration of function 'exofs_sysfs_dbg_print'

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-06-12 06:33:22 +03:00
Andy Adamson 2669940db8 NFSv4 do not send an empty SETATTR compound
Commit 536e43d12b ATTR_OPEN check can result in
an ia_valid with only ATTR_FILE set, and no NFS_VALID_ATTRS attributes to
request from the server.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-11 17:25:53 -04:00
Sachin Prabhu 64f9a83665 NFSv2: EOF incorrectly set on short read
In cases where the server returns fewer bytes then those requested, we
can incorrectly set the eof flag for the file. Fixing this allows the
request to be retried with updated offset and count arguments.

Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-11 17:25:00 -04:00
Bryan Schumaker c5afc8da5b NFS: Use the NFS_DEFAULT_VERSION for v2 and v3 mounts
Older versions of nfs utils don't always pass a "vers=" mount option for
NFS.  This chould lead to attempts at using NFS v0 due to a zeroed out
nfs_parsed_mount_data struct.  I solve this by setting the default NFS
version to NFS_DEFAULT_VERSION in the v2 and v3 cases (v4 has already been
taken care of by a similar patch).

Reported-by: Joerg Roedel <joro@&bytes.org>
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-09 14:38:59 -04:00
Fred Isaman 906369e43c NFS: fix directio refcount bug on commit
This reverts a hunk from commit 0427708657
"NFS: Clean up - Simplify reference counting in fs/nfs/direct.c"

The cleanups in that patch affect the write path, but by the time
processing hits commit the removed reference has been added back by
nfs_scan_commit_list().  Without this reversion, any page that is
sent to commit holds on to an unbalanced reference that is never
freed.  The immediate effect is an imbalance over the wire between
OPENs and CLOSEs.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-09 14:32:45 -04:00
Jan Kara ead188f9f9 writeback: Fix lock imbalance in writeback_sb_inodes()
Fix bug introduced by 169ebd90.  We have to have wb_list_lock locked when
restarting writeback loop after having waited for inode writeback.

Bug description by Ted Tso:

  I can reproduce this fairly easily by using ext4 w/o a journal, running
  under KVM with 1024megs memory, with fsstress (xfstests #13):

  [   45.153294] =====================================
  [   45.154784] [ BUG: bad unlock balance detected! ]
  [   45.155591] 3.5.0-rc1-00002-gb22b1f1 #124 Not tainted
  [   45.155591] -------------------------------------
  [   45.155591] flush-254:16/2499 is trying to release lock (&(&wb->list_lock)->rlock) at:
  [   45.155591] [<c022c3da>] writeback_sb_inodes+0x160/0x327
  [   45.155591] but there are no more locks to release!

Reported-by: Theodore Ts'o <tytso@mit.edu>
Tested-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
2012-06-09 08:32:15 +09:00
Linus Torvalds 77249539cd ext4 bug fixes for 3.5-rc2
This update contains two bug fixes, both destined for the stable tree.
 Perhaps the most important is one which fixes ext4 when used with file
 systems originally formatted for use with ext3, but then later
 converted to take advantage of ext4.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iQIcBAABCAAGBQJP0jyAAAoJENNvdpvBGATwGSIP/R0+4j/SqSncjLIEx6EvrCeY
 eyR9JYb0F44P1bKlZ6WErrupysmhPqHubnbxMg9bVMTxefN/m6tEyG+EG4MwMxFP
 mRtc5EVT0i/9k7U6s4Ey0kSnBnZqx6IBdFUR5lHrvAf7Ud920eY12Cx4oscI7Fj6
 E81lWASOkDTjzsJpzFIqcW7b257Ne1oxPAu1p290W5riSkDzle7qqT5xT+lBAbr0
 YM8ZFhznUdyv2aeuchf/dfys5YdVgSporplIjvP69bRyNr4x5B6QOUGxuX3RNhV3
 Hqf83GW8HycyAEdl4AtakNtxiMPhrIy0S/RamWqBTE2+nws3Wnkd1wN4c5HjZZad
 w5u2E/uSxedZfgGDZZriRwHCgR+2I2CEtMy45gL96o+2Cel6unN+vjVPTVbMpMUy
 3PsPoPjPVmNPLyn4vIfHlEQ5ZLhBzVpXFEsca9s+6JeHQ4dKP77fa8iq5AjzCjzA
 FYvmf1fNbDW1da81fRUOSnwuaC0JuUvOu5eQLLQ3jzz5gj+DvZvQqvleMHTXkQ3X
 hLpSXNTtXUBYeXIw5aJzh5VPJqqGzPkpJhAEbMRgCobRA1HtuAEJzZf4J7+jiWjV
 IrQpw0HIeZiK1oZW8iC8YYbIz8AakU+MnIVhvcPc3rIr5eeEtWPM/9ETx/mrO9s2
 /y0bwu84eQLIctFjzR2s
 =9oyU
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 bug fixes from Theodore Ts'o:
 "This update contains two bug fixes, both destined for the stable tree.
  Perhaps the most important is one which fixes ext4 when used with file
  systems originally formatted for use with ext3, but then later
  converted to take advantage of ext4."

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: don't set i_flags in EXT4_IOC_SETFLAGS
  ext4: fix the free blocks calculation for ext3 file systems w/ uninit_bg
2012-06-08 11:15:31 -07:00
Linus Torvalds e72643088f Fix UBI and UBIFS - they refuse to work without debugfs. This was
broken by the 3.5-rc1 UBI/UBIFS changes when we removed the debugging
 Kconfig switches.
 
 Also, correct locking in 'ubi_wl_flush()' - it was extended to support
 flushing a specific LEB in 3.5-rc1, and the locking was sub-optimal.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJP0jAHAAoJECmIfjd9wqK0qN8P/1JRuky3xsX8o5S5tqVLKkzO
 4naM8bH1AuUemg8f+K+2IUyHp1Mp0lsBj7zA46+3xWXvRcZBJaWgqZFkLOuWSZn0
 ChRGKKhdwRkXokq7JHgUJCrdTLyXsBV+uuBbJ+oTPQa2+7f1mxgZuLc+MJzpoKAs
 sNdE3A2pH3hhXDJKuT6hrM2wqD/2rW5bGurW4sYBxVUJEZ793HeXu30lTffZuWHo
 ClzDtC0iWIS5NmdN0sbQL6Os3p+uiR5mjJek0eABTHAmgXvfBUWpf9ewTXYS2ulK
 zY8brtIy2N8qmXFa8ZyguZKpCdulDqRLPz/2bmX5rOJSsyogoNxHI+UlZFCfq0u/
 txA+Wm4dQZtvSM3Stws6gxuIDCX83wOgHq/gUitMsVkxo+McwevjXpZb4Ha80OVl
 Iw7ERGrbxZn+B+lTh+9jgD6kstqpefO7fBP5gkLRvHToxDKY+6sdH6b3UuE6pDJI
 SA2D1uDyk33A/dyvYZThcvsq7KMdF0mhfLuxkScSfP17hc0hnh/FSzkj3fHrADfy
 oA4Oo/jKFbOFnmByNHjzvOaqeYHVsKwzq+feDcZ7sYA5GNrVMVViLZQljQBdJWo5
 E9howusrb81AGvjCIFobntBLE8DZF5OGfgt/n+60+jo3/qern/4jeuWfGL/Q9Wcw
 x27oVmOBLjuHuG3hXlGt
 =X7QS
 -----END PGP SIGNATURE-----

Merge tag 'upstream-3.5-rc2' of git://git.infradead.org/linux-ubifs

Pull UBI/UBIFS fixes from Artem Bityutskiy:
 "Fix UBI and UBIFS - they refuse to work without debugfs.  This was
  broken by the 3.5-rc1 UBI/UBIFS changes when we removed the debugging
  Kconfig switches.

  Also, correct locking in 'ubi_wl_flush()' - it was extended to support
  flushing a specific LEB in 3.5-rc1, and the locking was sub-optimal."

* tag 'upstream-3.5-rc2' of git://git.infradead.org/linux-ubifs:
  UBI: correct ubi_wl_flush locking
  UBIFS: fix debugfs-less systems support
  UBI: fix debugfs-less systems support
2012-06-08 11:04:06 -07:00
Linus Torvalds 32ba9c3fca Revert "vfs: stop d_splice_alias creating directory aliases"
This reverts commit 7732a557b1 (and commit
3f50fff4da, which was a follow-up
cleanup).

We're chasing an elusive bug that Dave Jones can apparently reproduce
using his system call fuzzer tool, and that looks like some kind of
locking ordering problem on the directory i_mutex chain.  Our i_mutex
locking is rather complex, and depends on the topological ordering of
the directories, which is why we have been very wary of splicing
directory entries around.

Of course, we really don't want to ever see aliased unconnected
directories anyway, so none of this should ever happen, but this revert
aims to basically get us back to a known older state.

Bruce points to some of the previous discussion at

       http://marc.info/?i=<20110310105821.GE22723@ZenIV.linux.org.uk>

and in particular a long post from Neil:

       http://marc.info/?i=<20110311150749.2fa2be66@notabene.brown>

It should be noted that it's possible that Dave's problems come from
other changes altohgether, including possibly just the fact that Dave
constantly is teachning his fuzzer new tricks.  So what appears to be a
new bug could in fact be an old one that just gets newly triggered, but
reverting these patches as "still under heavy discussion" is the right
thing regardless.

Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-06-08 10:34:03 -07:00
Trond Myklebust 2d0dbc6ae8 NFSv4: Fix unnecessary delegation returns in nfs4_do_open
While nfs4_do_open() expects the fmode argument to be restricted to
combinations of FMODE_READ and FMODE_WRITE, both nfs4_atomic_open()
and nfs4_proc_create will pass the nfs_open_context->mode,
which contains the full fmode_t.

This patch ensures that nfs4_do_open strips the other fmode_t bits,
fixing a problem in which the nfs4_do_open call would result in an
unnecessary delegation return.

Reported-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2012-06-08 11:08:42 -04:00
Linus Torvalds 48d212a2ee Revert "mm: correctly synchronize rss-counters at exit/exec"
This reverts commit 40af1bbdca.

It's horribly and utterly broken for at least the following reasons:

 - calling sync_mm_rss() from mmput() is fundamentally wrong, because
   there's absolutely no reason to believe that the task that does the
   mmput() always does it on its own VM.  Example: fork, ptrace, /proc -
   you name it.

 - calling it *after* having done mmdrop() on it is doubly insane, since
   the mm struct may well be gone now.

 - testing mm against NULL before you call it is insane too, since a
NULL mm there would have caused oopses long before.

.. and those are just the three bugs I found before I decided to give up
looking for me and revert it asap.  I should have caught it before I
even took it, but I trusted Andrew too much.

Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-06-07 17:54:07 -07:00
Tao Ma b22b1f178f ext4: don't set i_flags in EXT4_IOC_SETFLAGS
Commit 7990696 uses the ext4_{set,clear}_inode_flags() functions to
change the i_flags automatically but fails to remove the error setting
of i_flags.  So we still have the problem of trashing state flags.
Fix this by removing the assignment.

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
2012-06-07 19:04:19 -04:00
Theodore Ts'o b0dd6b70f0 ext4: fix the free blocks calculation for ext3 file systems w/ uninit_bg
Ext3 filesystems that are converted to use as many ext4 file system
features as possible will enable uninit_bg to speed up e2fsck times.
These file systems will have a native ext3 layout of inode tables and
block allocation bitmaps (as opposed to ext4's flex_bg layout).
Unfortunately, in these cases, when first allocating a block in an
uninitialized block group, ext4 would incorrectly calculate the number
of free blocks in that block group, and then errorneously report that
the file system was corrupt:

EXT4-fs error (device vdd): ext4_mb_generate_buddy:741: group 30, 32254 clusters in bitmap, 32258 in gd

This problem can be reproduced via:

    mke2fs -q -t ext4 -O ^flex_bg /dev/vdd 5g
    mount -t ext4 /dev/vdd /mnt
    fallocate -l 4600m /mnt/test

The problem was caused by a bone headed mistake in the check to see if a
particular metadata block was part of the block group.

Many thanks to Kees Cook for finding and bisecting the buggy commit
which introduced this bug (commit fd034a84e1, present since v3.2).

Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Reported-by: Kees Cook <keescook@chromium.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Tested-by: Kees Cook <keescook@chromium.org>
Cc: stable@kernel.org
2012-06-07 18:56:06 -04:00
Konstantin Khlebnikov 40af1bbdca mm: correctly synchronize rss-counters at exit/exec
mm->rss_stat counters have per-task delta: task->rss_stat.  Before
changing task->mm pointer the kernel must flush this delta with
sync_mm_rss().

do_exit() already calls sync_mm_rss() to flush the rss-counters before
committing the rss statistics into task->signal->maxrss, taskstats,
audit and other stuff.  Unfortunately the kernel does this before
calling mm_release(), which can call put_user() for processing
task->clear_child_tid.  So at this point we can trigger page-faults and
task->rss_stat becomes non-zero again.  As a result mm->rss_stat becomes
inconsistent and check_mm() will print something like this:

| BUG: Bad rss-counter state mm:ffff88020813c380 idx:1 val:-1
| BUG: Bad rss-counter state mm:ffff88020813c380 idx:2 val:1

This patch moves sync_mm_rss() into mm_release(), and moves mm_release()
out of do_exit() and calls it earlier.  After mm_release() there should
be no pagefaults.

[akpm@linux-foundation.org: tweak comment]
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: <stable@vger.kernel.org>		[3.4.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-06-07 14:43:55 -07:00
Trond Myklebust 02c67525cf NFSv4.1: Convert another trivial printk into a dprintk
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-07 13:45:53 -04:00