Commit Graph

3878 Commits

Author SHA1 Message Date
Liu Bo a848c23fb1 Btrfs: fix crash on endio of reading corrupted block
commit 38c1c2e44b upstream.

The crash is

------------[ cut here ]------------
kernel BUG at fs/btrfs/extent_io.c:2124!
[...]
Workqueue: btrfs-endio normal_work_helper [btrfs]
RIP: 0010:[<ffffffffa02d6055>]  [<ffffffffa02d6055>] end_bio_extent_readpage+0xb45/0xcd0 [btrfs]

This is in fact a regression.

It is because we forgot to increase @offset properly in reading corrupted block,
so that the @offset remains, and this leads to checksum errors while reading
left blocks queued up in the same bio, and then ends up with hiting the above
BUG_ON.

Reported-by: Chris Murphy <lists@colorremedies.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-09-05 16:34:16 -07:00
Liu Bo bd7c57d3ca Btrfs: fix compressed write corruption on enospc
commit ce62003f69 upstream.

When failing to allocate space for the whole compressed extent, we'll
fallback to uncompressed IO, but we've forgotten to redirty the pages
which belong to this compressed extent, and these 'clean' pages will
simply skip 'submit' part and go to endio directly, at last we got data
corruption as we write nothing.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Tested-By: Martin Steigerwald <martin@lichtvoll.de>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-09-05 16:34:16 -07:00
Filipe Manana 8a6de3b6c9 Btrfs: read lock extent buffer while walking backrefs
commit 6f7ff6d783 upstream.

Before processing the extent buffer, acquire a read lock on it, so
that we're safe against concurrent updates on the extent buffer.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-09-05 16:34:16 -07:00
Filipe Manana dcc48de73f Btrfs: fix csum tree corruption, duplicate and outdated checksums
commit 27b9a8122f upstream.

Under rare circumstances we can end up leaving 2 versions of a checksum
for the same file extent range.

The reason for this is that after calling btrfs_next_leaf we process
slot 0 of the leaf it returns, instead of processing the slot set in
path->slots[0]. Most of the time (by far) path->slots[0] is 0, but after
btrfs_next_leaf() releases the path and before it searches for the next
leaf, another task might cause a split of the next leaf, which migrates
some of its keys to the leaf we were processing before calling
btrfs_next_leaf(). In this case btrfs_next_leaf() returns again the
same leaf but with path->slots[0] having a slot number corresponding
to the first new key it got, that is, a slot number that didn't exist
before calling btrfs_next_leaf(), as the leaf now has more keys than
it had before. So we must really process the returned leaf starting at
path->slots[0] always, as it isn't always 0, and the key at slot 0 can
have an offset much lower than our search offset/bytenr.

For example, consider the following scenario, where we have:

sums->bytenr: 40157184, sums->len: 16384, sums end: 40173568
four 4kb file data blocks with offsets 40157184, 40161280, 40165376, 40169472

  Leaf N:

    slot = 0                           slot = btrfs_header_nritems() - 1
  |-------------------------------------------------------------------|
  | [(CSUM CSUM 39239680), size 8] ... [(CSUM CSUM 40116224), size 4] |
  |-------------------------------------------------------------------|

  Leaf N + 1:

      slot = 0                          slot = btrfs_header_nritems() - 1
  |--------------------------------------------------------------------|
  | [(CSUM CSUM 40161280), size 32] ... [((CSUM CSUM 40615936), size 8 |
  |--------------------------------------------------------------------|

Because we are at the last slot of leaf N, we call btrfs_next_leaf() to
find the next highest key, which releases the current path and then searches
for that next key. However after releasing the path and before finding that
next key, the item at slot 0 of leaf N + 1 gets moved to leaf N, due to a call
to ctree.c:push_leaf_left() (via ctree.c:split_leaf()), and therefore
btrfs_next_leaf() will returns us a path again with leaf N but with the slot
pointing to its new last key (CSUM CSUM 40161280). This new version of leaf N
is then:

    slot = 0                        slot = btrfs_header_nritems() - 2  slot = btrfs_header_nritems() - 1
  |----------------------------------------------------------------------------------------------------|
  | [(CSUM CSUM 39239680), size 8] ... [(CSUM CSUM 40116224), size 4]  [(CSUM CSUM 40161280), size 32] |
  |----------------------------------------------------------------------------------------------------|

And incorrecly using slot 0, makes us set next_offset to 39239680 and we jump
into the "insert:" label, which will set tmp to:

    tmp = min((sums->len - total_bytes) >> blocksize_bits,
        (next_offset - file_key.offset) >> blocksize_bits) =
    min((16384 - 0) >> 12, (39239680 - 40157184) >> 12) =
    min(4, (u64)-917504 = 18446744073708634112 >> 12) = 4

and

   ins_size = csum_size * tmp = 4 * 4 = 16 bytes.

In other words, we insert a new csum item in the tree with key
(CSUM_OBJECTID CSUM_KEY 40157184 = sums->bytenr) that contains the checksums
for all the data (4 blocks of 4096 bytes each = sums->len). Which is wrong,
because the item with key (CSUM CSUM 40161280) (the one that was moved from
leaf N + 1 to the end of leaf N) contains the old checksums of the last 12288
bytes of our data and won't get those old checksums removed.

So this leaves us 2 different checksums for 3 4kb blocks of data in the tree,
and breaks the logical rule:

   Key_N+1.offset >= Key_N.offset + length_of_data_its_checksums_cover

An obvious bad effect of this is that a subsequent csum tree lookup to get
the checksum of any of the blocks with logical offset of 40161280, 40165376
or 40169472 (the last 3 4kb blocks of file data), will get the old checksums.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-09-05 16:34:16 -07:00
Takashi Iwai 1b37263cbe Btrfs: Fix memory corruption by ulist_add_merge() on 32bit arch
commit 4eb1f66dce upstream.

We've got bug reports that btrfs crashes when quota is enabled on
32bit kernel, typically with the Oops like below:
 BUG: unable to handle kernel NULL pointer dereference at 00000004
 IP: [<f9234590>] find_parent_nodes+0x360/0x1380 [btrfs]
 *pde = 00000000
 Oops: 0000 [#1] SMP
 CPU: 0 PID: 151 Comm: kworker/u8:2 Tainted: G S      W 3.15.2-1.gd43d97e-default #1
 Workqueue: btrfs-qgroup-rescan normal_work_helper [btrfs]
 task: f1478130 ti: f147c000 task.ti: f147c000
 EIP: 0060:[<f9234590>] EFLAGS: 00010213 CPU: 0
 EIP is at find_parent_nodes+0x360/0x1380 [btrfs]
 EAX: f147dda8 EBX: f147ddb0 ECX: 00000011 EDX: 00000000
 ESI: 00000000 EDI: f147dda4 EBP: f147ddf8 ESP: f147dd38
  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
 CR0: 8005003b CR2: 00000004 CR3: 00bf3000 CR4: 00000690
 Stack:
  00000000 00000000 f147dda4 00000050 00000001 00000000 00000001 00000050
  00000001 00000000 d3059000 00000001 00000022 000000a8 00000000 00000000
  00000000 000000a1 00000000 00000000 00000001 00000000 00000000 11800000
 Call Trace:
  [<f923564d>] __btrfs_find_all_roots+0x9d/0xf0 [btrfs]
  [<f9237bb1>] btrfs_qgroup_rescan_worker+0x401/0x760 [btrfs]
  [<f9206148>] normal_work_helper+0xc8/0x270 [btrfs]
  [<c025e38b>] process_one_work+0x11b/0x390
  [<c025eea1>] worker_thread+0x101/0x340
  [<c026432b>] kthread+0x9b/0xb0
  [<c0712a71>] ret_from_kernel_thread+0x21/0x30
  [<c0264290>] kthread_create_on_node+0x110/0x110

This indicates a NULL corruption in prefs_delayed list.  The further
investigation and bisection pointed that the call of ulist_add_merge()
results in the corruption.

ulist_add_merge() takes u64 as aux and writes a 64bit value into
old_aux.  The callers of this function in backref.c, however, pass a
pointer of a pointer to old_aux.  That is, the function overwrites
64bit value on 32bit pointer.  This caused a NULL in the adjacent
variable, in this case, prefs_delayed.

Here is a quick attempt to band-aid over this: a new function,
ulist_add_merge_ptr() is introduced to pass/store properly a pointer
value instead of u64.  There are still ugly void ** cast remaining
in the callers because void ** cannot be taken implicitly.  But, it's
safer than explicit cast to u64, anyway.

Bugzilla: https://bugzilla.novell.com/show_bug.cgi?id=887046
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-09-05 16:34:16 -07:00
Jeff Mahoney 9e22eb8c72 btrfs: allocate raid type kobjects dynamically
commit c1895442be upstream.

We are currently allocating space_info objects in an array when we
allocate space_info. When a user does something like:

# btrfs balance start -mconvert=raid1 -dconvert=raid1 /mnt
# btrfs balance start -mconvert=single -dconvert=single /mnt -f
# btrfs balance start -mconvert=raid1 -dconvert=raid1 /

We can end up with memory corruption since the kobject hasn't
been reinitialized properly and the name pointer was left set.

The rationale behind allocating them statically was to avoid
creating a separate kobject container that just contained the
raid type. It used the index in the array to determine the index.

Ultimately, though, this wastes more memory than it saves in all
but the most complex scenarios and introduces kobject lifetime
questions.

This patch allocates the kobjects dynamically instead. Note that
we also remove the kobject_get/put of the parent kobject since
kobject_add and kobject_del do that internally.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reported-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30 20:12:02 -07:00
Jeff Mahoney f59f14e214 btrfs: fix lockdep warning with reclaim lock inversion
commit ed55b6ac07 upstream.

When encountering memory pressure, testers have run into the following
lockdep warning. It was caused by __link_block_group calling kobject_add
with the groups_sem held. kobject_add calls kvasprintf with GFP_KERNEL,
which gets us into reclaim context. The kobject doesn't actually need
to be added under the lock -- it just needs to ensure that it's only
added for the first block group to be linked.

=========================================================
[ INFO: possible irq lock inversion dependency detected ]
3.14.0-rc8-default #1 Not tainted
---------------------------------------------------------
kswapd0/169 just changed the state of lock:
 (&delayed_node->mutex){+.+.-.}, at: [<ffffffffa018baea>] __btrfs_release_delayed_node+0x3a/0x200 [btrfs]
but this lock took another, RECLAIM_FS-unsafe lock in the past:
 (&found->groups_sem){+++++.}

and interrupts could create inverse lock ordering between them.

other info that might help us debug this:
 Possible interrupt unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(&found->groups_sem);
                               local_irq_disable();
                               lock(&delayed_node->mutex);
                               lock(&found->groups_sem);
  <Interrupt>
    lock(&delayed_node->mutex);

 *** DEADLOCK ***
2 locks held by kswapd0/169:
 #0:  (shrinker_rwsem){++++..}, at: [<ffffffff81159e8a>] shrink_slab+0x3a/0x160
 #1:  (&type->s_umount_key#27){++++..}, at: [<ffffffff811bac6f>] grab_super_passive+0x3f/0x90

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30 20:12:02 -07:00
Eric Sandeen fe79ac25b6 btrfs: fix use of uninit "ret" in end_extent_writepage()
commit 3e2426bd0e upstream.

If this condition in end_extent_writepage() is false:

	if (tree->ops && tree->ops->writepage_end_io_hook)

we will then test an uninitialized "ret" at:

	ret = ret < 0 ? ret : -EIO;

The test for ret is for the case where ->writepage_end_io_hook
failed, and we'd choose that ret as the error; but if
there is no ->writepage_end_io_hook, nothing sets ret.

Initializing ret to 0 should be sufficient; if
writepage_end_io_hook wasn't set, (!uptodate) means
non-zero err was passed in, so we choose -EIO in that case.

Signed-of-by: Eric Sandeen <sandeen@redhat.com>

Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30 20:12:02 -07:00
Liu Bo 75f9347450 Btrfs: fix scrub_print_warning to handle skinny metadata extents
commit 6eda71d0c0 upstream.

The skinny extents are intepreted incorrectly in scrub_print_warning(),
and end up hitting the BUG() in btrfs_extent_inline_ref_size.

Reported-by: Konstantinos Skarlatos <k.skarlatos@gmail.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30 20:12:02 -07:00
Liu Bo 4f850fdfbb Btrfs: use right type to get real comparison
commit cd857dd6bc upstream.

We want to make sure the point is still within the extent item, not to verify
the memory it's pointing to.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30 20:12:02 -07:00
Josef Bacik 6858c106c6 Btrfs: don't check nodes for extent items
commit 8a56457f5f upstream.

The backref code was looking at nodes as well as leaves when we tried to
populate extent item entries.  This is not good, and although we go away with it
for the most part because we'd skip where disk_bytenr != random_memory,
sometimes random_memory would match and suddenly boom.  This fixes that problem.
Thanks,

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30 20:12:02 -07:00
Rickard Strandqvist 3379d704fb fs: btrfs: volumes.c: Fix for possible null pointer dereference
commit 8321cf2596 upstream.

There is otherwise a risk of a possible null pointer dereference.

Was largely found by using a static code analysis program called cppcheck.

Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30 20:12:01 -07:00
Filipe Manana a6729cd101 Btrfs: send, don't error in the presence of subvols/snapshots
commit 1af56070e3 upstream.

If we are doing an incremental send and the base snapshot has a
directory with name X that doesn't exist anymore in the second
snapshot and a new subvolume/snapshot exists in the second snapshot
that has the same name as the directory (name X), the incremental
send would fail with -ENOENT error. This is because it attempts
to lookup for an inode with a number matching the objectid of a
root, which doesn't exist.

Steps to reproduce:

    mkfs.btrfs -f /dev/sdd
    mount /dev/sdd /mnt

    mkdir /mnt/testdir
    btrfs subvolume snapshot -r /mnt /mnt/mysnap1

    rmdir /mnt/testdir
    btrfs subvolume create /mnt/testdir
    btrfs subvolume snapshot -r /mnt /mnt/mysnap2

    btrfs send -p /mnt/mysnap1 /mnt/mysnap2 -f /tmp/send.data

A test case for xfstests follows.

Reported-by: Robert White <rwhite@pobox.com>
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30 20:12:01 -07:00
Wang Shilong 0da608d5f9 Btrfs: set right total device count for seeding support
commit 298658414a upstream.

Seeding device support allows us to create a new filesystem
based on existed filesystem.

However newly created filesystem's @total_devices should include seed
devices. This patch fix the following problem:

 # mkfs.btrfs -f /dev/sdb
 # btrfstune -S 1 /dev/sdb
 # mount /dev/sdb /mnt
 # btrfs device add -f /dev/sdc /mnt --->fs_devices->total_devices = 1
 # umount /mnt
 # mount /dev/sdc /mnt               --->fs_devices->total_devices = 2

This is because we record right @total_devices in superblock, but
@fs_devices->total_devices is reset to be 0 in btrfs_prepare_sprout().

Fix this problem by not resetting @fs_devices->total_devices.

Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30 20:12:01 -07:00
Liu Bo 9ec61f8c4a Btrfs: mark mapping with error flag to report errors to userspace
commit 5dca6eea91 upstream.

According to commit 865ffef379
(fs: fix fsync() error reporting),
it's not stable to just check error pages because pages can be
truncated or invalidated, we should also mark mapping with error
flag so that a later fsync can catch the error.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30 20:12:01 -07:00
Liu Bo ba9ea14660 Btrfs: fix NULL pointer crash of deleting a seed device
commit 29cc83f69c upstream.

Same as normal devices, seed devices should be initialized with
fs_info->dev_root as well, otherwise we'll get a NULL pointer crash.

Cc: Chris Murphy <lists@colorremedies.com>
Reported-by: Chris Murphy <lists@colorremedies.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30 20:12:01 -07:00
Wang Shilong a9142eca7b Btrfs: make sure there are not any read requests before stopping workers
commit de348ee022 upstream.

In close_ctree(), after we have stopped all workers,there maybe still
some read requests(for example readahead) to submit and this *maybe* trigger
an oops that user reported before:

kernel BUG at fs/btrfs/async-thread.c:619!

By hacking codes, i can reproduce this problem with one cpu available.
We fix this potential problem by invalidating all btree inode pages before
stopping all workers.

Thanks to Miao for pointing out this problem.

Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30 20:12:01 -07:00
Miao Xie 0ebcffd714 Btrfs: output warning instead of error when loading free space cache failed
commit 32d6b47fe6 upstream.

If we fail to load a free space cache, we can rebuild it from the extent tree,
so it is not a serious error, we should not output a error message that
would make the users uncomfortable. This patch uses warning message instead
of it.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30 20:12:01 -07:00
Qu Wenruo b36fe043d1 btrfs: Add ctime/mtime update for btrfs device add/remove.
commit 5a1972bd9f upstream.

Btrfs will send uevent to udev inform the device change,
but ctime/mtime for the block device inode is not udpated, which cause
libblkid used by btrfs-progs unable to detect device change and use old
cache, causing 'btrfs dev scan; btrfs dev rmove; btrfs dev scan' give an
error message.

Reported-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Cc: Karel Zak <kzak@redhat.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30 20:12:01 -07:00
Chris Mason 9ed3912c68 Btrfs: fix double free in find_lock_delalloc_range
commit 7d78874273 upstream.

We need to NULL the cached_state after freeing it, otherwise
we might free it again if find_delalloc_range doesn't find anything.

Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30 20:12:01 -07:00
Josef Bacik 2891d0f102 Btrfs: check for an extent_op on the locked ref
commit 573a075567 upstream.

We could have possibly added an extent_op to the locked_ref while we dropped
locked_ref->lock, so check for this case as well and loop around.  Otherwise we
could lose flag updates which would lead to extent tree corruption.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-04-26 17:19:05 -07:00
Josef Bacik 50b51ee311 Btrfs: fix deadlock with nested trans handles
commit 3bbb24b20a upstream.

Zach found this deadlock that would happen like this

btrfs_end_transaction <- reduce trans->use_count to 0
  btrfs_run_delayed_refs
    btrfs_cow_block
      find_free_extent
	btrfs_start_transaction <- increase trans->use_count to 1
          allocate chunk
	btrfs_end_transaction <- decrease trans->use_count to 0
	  btrfs_run_delayed_refs
	    lock tree block we are cowing above ^^

We need to only decrease trans->use_count if it is above 1, otherwise leave it
alone.  This will make nested trans be the only ones who decrease their added
ref, and will let us get rid of the trans->use_count++ hack if we have to commit
the transaction.  Thanks,

Reported-by: Zach Brown <zab@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Tested-by: Zach Brown <zab@redhat.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-04-26 17:19:05 -07:00
Hidetoshi Seto 53dcb28868 Btrfs: skip submitting barrier for missing device
commit f88ba6a2a4 upstream.

I got an error on v3.13:
 BTRFS error (device sdf1) in write_all_supers:3378: errno=-5 IO failure (errors while submitting device barriers.)

how to reproduce:
  > mkfs.btrfs -f -d raid1 /dev/sdf1 /dev/sdf2
  > wipefs -a /dev/sdf2
  > mount -o degraded /dev/sdf1 /mnt
  > btrfs balance start -f -sconvert=single -mconvert=single -dconvert=single /mnt

The reason of the error is that barrier_all_devices() failed to submit
barrier to the missing device.  However it is clear that we cannot do
anything on missing device, and also it is not necessary to care chunks
on the missing device.

This patch stops sending/waiting barrier if device is missing.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-04-26 17:19:05 -07:00
Linus Torvalds 3962dfbe22 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "We have a small collection of fixes in my for-linus branch.

  The big thing that stands out is a revert of a new ioctl.  Users
  haven't shipped yet in btrfs-progs, and Dave Sterba found a better way
  to export the information"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: use right clone root offset for compressed extents
  btrfs: fix null pointer deference at btrfs_sysfs_add_one+0x105
  Btrfs: unset DCACHE_DISCONNECTED when mounting default subvol
  Btrfs: fix max_inline mount option
  Btrfs: fix a lockdep warning when cleaning up aborted transaction
  Revert "btrfs: add ioctl to export size of global metadata reservation"
2014-02-16 11:05:27 -08:00
Filipe David Borba Manana 93de4ba864 Btrfs: use right clone root offset for compressed extents
For non compressed extents, iterate_extent_inodes() gives us offsets
that take into account the data offset from the file extent items, while
for compressed extents it doesn't. Therefore we have to adjust them before
placing them in a send clone instruction. Not doing this adjustment leads to
the receiving end requesting for a wrong a file range to the clone ioctl,
which results in different file content from the one in the original send
root.

Issue reproducible with the following excerpt from the test I made for
xfstests:

  _scratch_mkfs
  _scratch_mount "-o compress-force=lzo"

  $XFS_IO_PROG -f -c "truncate 118811" $SCRATCH_MNT/foo
  $XFS_IO_PROG -c "pwrite -S 0x0d -b 39987 92267 39987" $SCRATCH_MNT/foo

  $BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap1

  $XFS_IO_PROG -c "pwrite -S 0x3e -b 80000 200000 80000" $SCRATCH_MNT/foo
  $BTRFS_UTIL_PROG filesystem sync $SCRATCH_MNT
  $XFS_IO_PROG -c "pwrite -S 0xdc -b 10000 250000 10000" $SCRATCH_MNT/foo
  $XFS_IO_PROG -c "pwrite -S 0xff -b 10000 300000 10000" $SCRATCH_MNT/foo

  # will be used for incremental send to be able to issue clone operations
  $BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/clones_snap

  $BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap2

  $FSSUM_PROG -A -f -w $tmp/1.fssum $SCRATCH_MNT/mysnap1
  $FSSUM_PROG -A -f -w $tmp/2.fssum -x $SCRATCH_MNT/mysnap2/mysnap1 \
      -x $SCRATCH_MNT/mysnap2/clones_snap $SCRATCH_MNT/mysnap2
  $FSSUM_PROG -A -f -w $tmp/clones.fssum $SCRATCH_MNT/clones_snap \
      -x $SCRATCH_MNT/clones_snap/mysnap1 -x $SCRATCH_MNT/clones_snap/mysnap2

  $BTRFS_UTIL_PROG send $SCRATCH_MNT/mysnap1 -f $tmp/1.snap
  $BTRFS_UTIL_PROG send $SCRATCH_MNT/clones_snap -f $tmp/clones.snap
  $BTRFS_UTIL_PROG send -p $SCRATCH_MNT/mysnap1 \
      -c $SCRATCH_MNT/clones_snap $SCRATCH_MNT/mysnap2 -f $tmp/2.snap

  _scratch_unmount
  _scratch_mkfs
  _scratch_mount

  $BTRFS_UTIL_PROG receive $SCRATCH_MNT -f $tmp/1.snap
  $FSSUM_PROG -r $tmp/1.fssum $SCRATCH_MNT/mysnap1 2>> $seqres.full

  $BTRFS_UTIL_PROG receive $SCRATCH_MNT -f $tmp/clones.snap
  $FSSUM_PROG -r $tmp/clones.fssum $SCRATCH_MNT/clones_snap 2>> $seqres.full

  $BTRFS_UTIL_PROG receive $SCRATCH_MNT -f $tmp/2.snap
  $FSSUM_PROG -r $tmp/2.fssum $SCRATCH_MNT/mysnap2 2>> $seqres.full

Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-02-15 08:04:27 -08:00
Anand Jain f085381e6d btrfs: fix null pointer deference at btrfs_sysfs_add_one+0x105
bdev is null when disk has disappeared and mounted with
the degrade option

stack trace
---------
btrfs_sysfs_add_one+0x105/0x1c0 [btrfs]
open_ctree+0x15f3/0x1fe0 [btrfs]
btrfs_mount+0x5db/0x790 [btrfs]
? alloc_pages_current+0xa4/0x160
mount_fs+0x34/0x1b0
vfs_kern_mount+0x62/0xf0
do_mount+0x22e/0xa80
? __get_free_pages+0x9/0x40
? copy_mount_options+0x31/0x170
SyS_mount+0x7e/0xc0
system_call_fastpath+0x16/0x1b
---------

reproducer:
-------
mkfs.btrfs -draid1 -mraid1 /dev/sdc /dev/sdd
(detach a disk)
devmgt detach /dev/sdc [1]
mount -o degrade /dev/sdd /btrfs
-------

[1] github.com/anajain/devmgt.git

Signed-off-by: Anand Jain <Anand.Jain@oracle.com>
Tested-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-02-15 08:03:09 -08:00
Josef Bacik 3a0dfa6a12 Btrfs: unset DCACHE_DISCONNECTED when mounting default subvol
A user was running into errors from an NFS export of a subvolume that had a
default subvol set.  When we mount a default subvol we will use d_obtain_alias()
to find an existing dentry for the subvolume in the case that the root subvol
has already been mounted, or a dummy one is allocated in the case that the root
subvol has not already been mounted.  This allows us to connect the dentry later
on if we wander into the path.  However if we don't ever wander into the path we
will keep DCACHE_DISCONNECTED set for a long time, which angers NFS.  It doesn't
appear to cause any problems but it is annoying nonetheless, so simply unset
DCACHE_DISCONNECTED in the get_default_root case and switch btrfs_lookup() to
use d_materialise_unique() instead which will make everything play nicely
together and reconnect stuff if we wander into the defaul subvol path from a
different way.  With this patch I'm no longer getting the NFS errors when
exporting a volume that has been mounted with a default subvol set.  Thanks,

cc: bfields@fieldses.org
cc: ebiederm@xmission.com
Signed-off-by: Josef Bacik <jbacik@fb.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-02-14 13:44:32 -08:00
Mitch Harder feb5f96589 Btrfs: fix max_inline mount option
Currently, the only mount option for max_inline that has any effect is
max_inline=0.  Any other value that is supplied to max_inline will be
adjusted to a minimum of 4k.  Since max_inline has an effective maximum
of ~3900 bytes due to page size limitations, the current behaviour
only has meaning for max_inline=0.

This patch will allow the the max_inline mount option to accept non-zero
values as indicated in the documentation.

Signed-off-by: Mitch Harder <mitch.harder@sabayonlinux.org>
Signed-off-by: Chris Mason <clm@fb.com>
2014-02-14 13:44:32 -08:00
Liu Bo a9d2d4adb6 Btrfs: fix a lockdep warning when cleaning up aborted transaction
Given now we have 2 spinlock for management of delayed refs,
CONFIG_DEBUG_SPINLOCK=y helped me find this,

[ 4723.413809] BUG: spinlock wrong CPU on CPU#1, btrfs-transacti/2258
[ 4723.414882]  lock: 0xffff880048377670, .magic: dead4ead, .owner: btrfs-transacti/2258, .owner_cpu: 2
[ 4723.417146] CPU: 1 PID: 2258 Comm: btrfs-transacti Tainted: G        W  O 3.12.0+ #4
[ 4723.421321] Call Trace:
[ 4723.421872]  [<ffffffff81680fe7>] dump_stack+0x54/0x74
[ 4723.422753]  [<ffffffff81681093>] spin_dump+0x8c/0x91
[ 4723.424979]  [<ffffffff816810b9>] spin_bug+0x21/0x26
[ 4723.425846]  [<ffffffff81323956>] do_raw_spin_unlock+0x66/0x90
[ 4723.434424]  [<ffffffff81689bf7>] _raw_spin_unlock+0x27/0x40
[ 4723.438747]  [<ffffffffa015da9e>] btrfs_cleanup_one_transaction+0x35e/0x710 [btrfs]
[ 4723.443321]  [<ffffffffa015df54>] btrfs_cleanup_transaction+0x104/0x570 [btrfs]
[ 4723.444692]  [<ffffffff810c1b5d>] ? trace_hardirqs_on_caller+0xfd/0x1c0
[ 4723.450336]  [<ffffffff810c1c2d>] ? trace_hardirqs_on+0xd/0x10
[ 4723.451332]  [<ffffffffa015e5ee>] transaction_kthread+0x22e/0x270 [btrfs]
[ 4723.452543]  [<ffffffffa015e3c0>] ? btrfs_cleanup_transaction+0x570/0x570 [btrfs]
[ 4723.457833]  [<ffffffff81079efa>] kthread+0xea/0xf0
[ 4723.458990]  [<ffffffff81079e10>] ? kthread_create_on_node+0x140/0x140
[ 4723.460133]  [<ffffffff81692aac>] ret_from_fork+0x7c/0xb0
[ 4723.460865]  [<ffffffff81079e10>] ? kthread_create_on_node+0x140/0x140
[ 4723.496521] ------------[ cut here ]------------

----------------------------------------------------------------------

The reason is that we get to call cond_resched_lock(&head_ref->lock) while
still holding @delayed_refs->lock.

So it's different with __btrfs_run_delayed_refs(), where we do drop-acquire
dance before and after actually processing delayed refs.

Here we don't drop the lock, others are not able to add new delayed refs to
head_ref, so cond_resched_lock(&head_ref->lock) is not necessary here.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-02-14 13:44:32 -08:00
Chris Mason 11bcac89c0 Revert "btrfs: add ioctl to export size of global metadata reservation"
This reverts commit 01e219e806.

David Sterba found a different way to provide these features without adding a new
ioctl.  We haven't released any progs with this ioctl yet, so I'm taking this out
for now until we finalize things.

Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
CC: Jeff Mahoney <jeffm@suse.com>
2014-02-14 13:42:13 -08:00
Linus Torvalds 9c1db77981 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "This is a small collection of fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: fix data corruption when reading/updating compressed extents
  Btrfs: don't loop forever if we can't run because of the tree mod log
  btrfs: reserve no transaction units in btrfs_ioctl_set_features
  btrfs: commit transaction after setting label and features
  Btrfs: fix assert screwup for the pending move stuff
2014-02-09 11:12:26 -08:00
Filipe David Borba Manana a2aa75e18a Btrfs: fix data corruption when reading/updating compressed extents
When using a mix of compressed file extents and prealloc extents, it
is possible to fill a page of a file with random, garbage data from
some unrelated previous use of the page, instead of a sequence of zeroes.

A simple sequence of steps to get into such case, taken from the test
case I made for xfstests, is:

   _scratch_mkfs
   _scratch_mount "-o compress-force=lzo"
   $XFS_IO_PROG -f -c "pwrite -S 0x06 -b 18670 266978 18670" $SCRATCH_MNT/foobar
   $XFS_IO_PROG -c "falloc 26450 665194" $SCRATCH_MNT/foobar
   $XFS_IO_PROG -c "truncate 542872" $SCRATCH_MNT/foobar
   $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foobar

This results in the following file items in the fs tree:

   item 4 key (257 INODE_ITEM 0) itemoff 15879 itemsize 160
       inode generation 6 transid 6 size 542872 block group 0 mode 100600
   item 5 key (257 INODE_REF 256) itemoff 15863 itemsize 16
       inode ref index 2 namelen 6 name: foobar
   item 6 key (257 EXTENT_DATA 0) itemoff 15810 itemsize 53
       extent data disk byte 0 nr 0 gen 6
       extent data offset 0 nr 24576 ram 266240
       extent compression 0
   item 7 key (257 EXTENT_DATA 24576) itemoff 15757 itemsize 53
       prealloc data disk byte 12849152 nr 241664 gen 6
       prealloc data offset 0 nr 241664
   item 8 key (257 EXTENT_DATA 266240) itemoff 15704 itemsize 53
       extent data disk byte 12845056 nr 4096 gen 6
       extent data offset 0 nr 20480 ram 20480
       extent compression 2
   item 9 key (257 EXTENT_DATA 286720) itemoff 15651 itemsize 53
       prealloc data disk byte 13090816 nr 405504 gen 6
       prealloc data offset 0 nr 258048

The on disk extent at offset 266240 (which corresponds to 1 single disk block),
contains 5 compressed chunks of file data. Each of the first 4 compress 4096
bytes of file data, while the last one only compresses 3024 bytes of file data.
Therefore a read into the file region [285648 ; 286720[ (length = 4096 - 3024 =
1072 bytes) should always return zeroes (our next extent is a prealloc one).

The solution here is the compression code path to zero the remaining (untouched)
bytes of the last page it uncompressed data into, as the information about how
much space the file data consumes in the last page is not known in the upper layer
fs/btrfs/extent_io.c:__do_readpage(). In __do_readpage we were correctly zeroing
the remainder of the page but only if it corresponds to the last page of the inode
and if the inode's size is not a multiple of the page size.

This would cause not only returning random data on reads, but also permanently
storing random data when updating parts of the region that should be zeroed.
For the example above, it means updating a single byte in the region [285648 ; 286720[
would store that byte correctly but also store random data on disk.

A test case for xfstests follows soon.

Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-02-08 17:57:15 -08:00
Josef Bacik 27a377db74 Btrfs: don't loop forever if we can't run because of the tree mod log
A user reported a 100% cpu hang with my new delayed ref code.  Turns out I
forgot to increase the count check when we can't run a delayed ref because of
the tree mod log.  If we can't run any delayed refs during this there is no
point in continuing to look, and we need to break out.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-02-08 17:57:15 -08:00
David Sterba 8051aa1a3d btrfs: reserve no transaction units in btrfs_ioctl_set_features
Added in patch "btrfs: add ioctls to query/change feature bits online"
modifications to superblock don't need to reserve metadata blocks when
starting a transaction.

Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
2014-02-08 17:57:15 -08:00
Jeff Mahoney d0270aca88 btrfs: commit transaction after setting label and features
The set_fslabel ioctl uses btrfs_end_transaction, which means it's
possible that the change will be lost if the system crashes, same for
the newly set features. Let's use btrfs_commit_transaction instead.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
2014-02-08 17:57:15 -08:00
Josef Bacik 6cc98d90f8 Btrfs: fix assert screwup for the pending move stuff
Wang noticed that he was failing btrfs/030 even though me and Filipe couldn't
reproduce.  Turns out this is because Wang didn't have CONFIG_BTRFS_ASSERT set,
which meant that a key part of Filipe's original patch was not being built in.
This appears to be a mess up with merging Filipe's patch as it does not exist in
his original patch.  Fix this by changing how we make sure del_waiting_dir_move
asserts that it did not error and take the function out of the ifdef check.
This makes btrfs/030 pass with the assert on or off.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Filipe Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-02-08 17:57:15 -08:00
Linus Torvalds 878a876b2e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "Filipe is fixing compile and boot problems with our crc32c rework, and
  Josef has disabled snapshot aware defrag for now.

  As the number of snapshots increases, we're hitting OOM.  For the
  short term we're disabling things until a bigger fix is ready"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: use late_initcall instead of module_init
  Btrfs: use btrfs_crc32c everywhere instead of libcrc32c
  Btrfs: disable snapshot aware defrag for now
2014-02-04 12:26:56 -08:00
Filipe David Borba Manana 60efa5eb2e Btrfs: use late_initcall instead of module_init
It seems that when init_btrfs_fs() is called, crc32c/crc32c-intel might
not always be already initialized, which results in the call to crypto_alloc_shash()
returning -ENOENT, as experienced by Ahmet who reported this.

Therefore make sure init_btrfs_fs() is called after crc32c is initialized (which
is at initialization level 6, module_init), by using late_initcall (which is at
initialization level 7) instead of module_init for btrfs.

Reported-and-Tested-by: Ahmet Inan <ainan@mathematik.uni-freiburg.de>
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-02-03 09:01:28 -08:00
Filipe David Borba Manana 0b947aff15 Btrfs: use btrfs_crc32c everywhere instead of libcrc32c
After the commit titled "Btrfs: fix btrfs boot when compiled as built-in",
LIBCRC32C requirement was removed from btrfs' Kconfig. This made it not
possible to build a kernel with btrfs enabled (either as module or built-in)
if libcrc32c is not enabled as well. So just replace all uses of libcrc32c
with the equivalent function in btrfs hash.h - btrfs_crc32c.

Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-02-03 09:01:27 -08:00
Josef Bacik 8101c8dbf6 Btrfs: disable snapshot aware defrag for now
It's just broken and it's taking a lot of effort to fix it, so for now just
disable it so people can defrag in peace.  Thanks,

Cc: stable@vger.kernel.org
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-02-03 09:01:27 -08:00
Linus Torvalds e7651b819e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs updates from Chris Mason:
 "This is a pretty big pull, and most of these changes have been
  floating in btrfs-next for a long time.  Filipe's properties work is a
  cool building block for inheriting attributes like compression down on
  a per inode basis.

  Jeff Mahoney kicked in code to export filesystem info into sysfs.

  Otherwise, lots of performance improvements, cleanups and bug fixes.

  Looks like there are still a few other small pending incrementals, but
  I wanted to get the bulk of this in first"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (149 commits)
  Btrfs: fix spin_unlock in check_ref_cleanup
  Btrfs: setup inode location during btrfs_init_inode_locked
  Btrfs: don't use ram_bytes for uncompressed inline items
  Btrfs: fix btrfs_search_slot_for_read backwards iteration
  Btrfs: do not export ulist functions
  Btrfs: rework ulist with list+rb_tree
  Btrfs: fix memory leaks on walking backrefs failure
  Btrfs: fix send file hole detection leading to data corruption
  Btrfs: add a reschedule point in btrfs_find_all_roots()
  Btrfs: make send's file extent item search more efficient
  Btrfs: fix to catch all errors when resolving indirect ref
  Btrfs: fix protection between walking backrefs and root deletion
  btrfs: fix warning while merging two adjacent extents
  Btrfs: fix infinite path build loops in incremental send
  btrfs: undo sysfs when open_ctree() fails
  Btrfs: fix snprintf usage by send's gen_unique_name
  btrfs: fix defrag 32-bit integer overflow
  btrfs: sysfs: list the NO_HOLES feature
  btrfs: sysfs: don't show reserved incompat feature
  btrfs: call permission checks earlier in ioctls and return EPERM
  ...
2014-01-30 20:08:20 -08:00
Linus Torvalds f568849eda Merge branch 'for-3.14/core' of git://git.kernel.dk/linux-block
Pull core block IO changes from Jens Axboe:
 "The major piece in here is the immutable bio_ve series from Kent, the
  rest is fairly minor.  It was supposed to go in last round, but
  various issues pushed it to this release instead.  The pull request
  contains:

   - Various smaller blk-mq fixes from different folks.  Nothing major
     here, just minor fixes and cleanups.

   - Fix for a memory leak in the error path in the block ioctl code
     from Christian Engelmayer.

   - Header export fix from CaiZhiyong.

   - Finally the immutable biovec changes from Kent Overstreet.  This
     enables some nice future work on making arbitrarily sized bios
     possible, and splitting more efficient.  Related fixes to immutable
     bio_vecs:

        - dm-cache immutable fixup from Mike Snitzer.
        - btrfs immutable fixup from Muthu Kumar.

  - bio-integrity fix from Nic Bellinger, which is also going to stable"

* 'for-3.14/core' of git://git.kernel.dk/linux-block: (44 commits)
  xtensa: fixup simdisk driver to work with immutable bio_vecs
  block/blk-mq-cpu.c: use hotcpu_notifier()
  blk-mq: for_each_* macro correctness
  block: Fix memory leak in rw_copy_check_uvector() handling
  bio-integrity: Fix bio_integrity_verify segment start bug
  block: remove unrelated header files and export symbol
  blk-mq: uses page->list incorrectly
  blk-mq: use __smp_call_function_single directly
  btrfs: fix missing increment of bi_remaining
  Revert "block: Warn and free bio if bi_end_io is not set"
  block: Warn and free bio if bi_end_io is not set
  blk-mq: fix initializing request's start time
  block: blk-mq: don't export blk_mq_free_queue()
  block: blk-mq: make blk_sync_queue support mq
  block: blk-mq: support draining mq queue
  dm cache: increment bi_remaining when bi_end_io is restored
  block: fixup for generic bio chaining
  block: Really silence spurious compiler warnings
  block: Silence spurious compiler warnings
  block: Kill bio_pair_split()
  ...
2014-01-30 11:19:05 -08:00
Chris Mason cf93da7bcf Btrfs: fix spin_unlock in check_ref_cleanup
Our goto out should have gone a little farther.

Signed-off-by: Chris Mason <clm@fb.com>
2014-01-29 07:06:31 -08:00
Chris Mason 90d3e592e9 Btrfs: setup inode location during btrfs_init_inode_locked
We have a race during inode init because the BTRFS_I(inode)->location is setup
after the inode hash table lock is dropped.  btrfs_find_actor uses the location
field, so our search might not find an existing inode in the hash table if we
race with the inode init code.

This commit changes things to setup the location field sooner.  Also the find actor now
uses only the location objectid to match inodes.  For inode hashing, we just
need a unique and stable test, it doesn't have to reflect the inode numbers we
show to userland.

Signed-off-by: Chris Mason <clm@fb.com>
CC: stable@vger.kernel.org
2014-01-29 07:06:30 -08:00
Chris Mason 514ac8ad87 Btrfs: don't use ram_bytes for uncompressed inline items
If we truncate an uncompressed inline item, ram_bytes isn't updated to reflect
the new size.  The fixe uses the size directly from the item header when
reading uncompressed inlines, and also fixes truncate to update the
size as it goes.

Reported-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
CC: stable@vger.kernel.org
2014-01-29 07:06:29 -08:00
Filipe David Borba Manana 23c6bf6a91 Btrfs: fix btrfs_search_slot_for_read backwards iteration
If the current path's leaf slot is 0, we do search for the previous
leaf (via btrfs_prev_leaf) and set the new path's leaf slot to a
value corresponding to the number of items - 1 of the former leaf.
Fix this by using the slot set by btrfs_prev_leaf, decrementing it
by 1 if it's equal to the leaf's number of items.

Use of btrfs_search_slot_for_read() for backward iteration is used in
particular by the send feature, which could miss items when the input
leaf has less items than its previous leaf.

This could be reproduced by running btrfs/007 from xfstests in a loop.

Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-01-29 07:06:28 -08:00
Wang Shilong 49fc647a2c Btrfs: do not export ulist functions
There are not any users that use ulist except Btrfs,don't
export them.

Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-01-29 07:06:27 -08:00
Wang Shilong 4c7a6f74ce Btrfs: rework ulist with list+rb_tree
We are really suffering from now ulist's implementation, some developers
gave their try, and i just gave some of my ideas for things:

 1. use list+rb_tree instead of arrary+rb_tree

 2. add cur_list to iterator rather than ulist structure.

 3. add seqnum into every node when they are added, this is
 used to do selfcheck when iterating node.

I noticed Zach Brown's comments before, long term is to kick off
ulist implementation, however, for now, we need at least avoid
arrary from ulist.

Cc: Liu Bo <bo.li.liu@oracle.com>
Cc: Josef Bacik <jbacik@fb.com>
Cc: Zach Brown <zab@redhat.com>
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-01-29 07:06:27 -08:00
Wang Shilong f05c474688 Btrfs: fix memory leaks on walking backrefs failure
When walking backrefs, we may iterate every inode's extent
and add/merge them into ulist, and the caller will free memory
from ulist.

However, if we fail to allocate inode's extents element
memory or ulist_add() fail to allocate memory, we won't
add allocated memory into ulist, and the caller won't
free some allocated memory thus memory leaks happen.

Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-01-29 07:06:26 -08:00
Filipe David Borba Manana bf54f412f0 Btrfs: fix send file hole detection leading to data corruption
There was a case where file hole detection was incorrect and it would
cause an incremental send to override a section of a file with zeroes.

This happened in the case where between the last leaf we processed which
contained a file extent item for our current inode and the leaf we're
currently are at (and has a file extent item for our current inode) there
are only leafs containing exclusively file extent items for our current
inode, and none of them was updated since the previous send operation.
The file hole detection code would incorrectly consider the file range
covered by these leafs as a hole.

A test case for xfstests follows soon.

Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-01-29 07:06:25 -08:00