Pull btrfs fixes from Chris Mason:
"These are scattered fixes and one performance improvement. The
biggest functional change is in how we throttle metadata changes. The
new code bumps our average file creation rate up by ~13% in fs_mark,
and lowers CPU usage.
Stefan bisected out a regression in our allocation code that made
balance loop on extents larger than 256MB."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: improve the delayed inode throttling
Btrfs: fix a mismerge in btrfs_balance()
Btrfs: enforce min_bytes parameter during extent allocation
Btrfs: allow running defrag in parallel to administrative tasks
Btrfs: avoid deadlock on transaction waiting list
Btrfs: do not BUG_ON on aborted situation
Btrfs: do not BUG_ON in prepare_to_reloc
Btrfs: free all recorded tree blocks on error
Btrfs: build up error handling for merge_reloc_roots
Btrfs: check for NULL pointer in updating reloc roots
Btrfs: fix unclosed transaction handler when the async transaction commitment fails
Btrfs: fix wrong handle at error path of create_snapshot() when the commit fails
Btrfs: use set_nlink if our i_nlink is 0
Commit 5ac00add added a testnset mutex and code that disallows
running administrative tasks in parallel. It is prevented that
the device add/delete/balance/replace/resize operations are
started in parallel. By mistake, the defragmentation operation
was included in the check for mutually exclusiveness as well.
This is fixed with this commit.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
If the async transaction commitment failed, we need close the
current transaction handler, or the current transaction will be
blocked to commit because of this orphan handler.
We fix the problem by doing sync transaction commitment, that is
to invoke btrfs_commit_transaction().
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
There are several bugs at error path of create_snapshot() when the
transaction commitment failed.
- access the freed transaction handler. At the end of the
transaction commitment, the transaction handler was freed, so we
should not access it after the transaction commitment.
- we were not aware of the error which happened during the snapshot
creation if we submitted a async transaction commitment.
- pending snapshot access vs pending snapshot free. when something
wrong happened after we submitted a async transaction commitment,
the transaction committer would cleanup the pending snapshots and
free them. But the snapshot creators were not aware of it, they
would access the freed pending snapshots.
This patch fixes the above problems by:
- remove the dangerous code that accessed the freed handler
- assign ->error if the error happens during the snapshot creation
- the transaction committer doesn't free the pending snapshots,
just assigns the error number and evicts them before we unblock
the transaction.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Pull btrfs update from Chris Mason:
"The biggest feature in the pull is the new (and still experimental)
raid56 code that David Woodhouse started long ago. I'm still working
on the parity logging setup that will avoid inconsistent parity after
a crash, so this is only for testing right now. But, I'd really like
to get it out to a broader audience to hammer out any performance
issues or other problems.
scrub does not yet correct errors on raid5/6 either.
Josef has another pass at fsync performance. The big change here is
to combine waiting for metadata with waiting for data, which is a big
latency win. It is also step one toward using atomics from the
hardware during a commit.
Mark Fasheh has a new way to use btrfs send/receive to send only the
metadata changes. SUSE is using this to make snapper more efficient
at finding changes between snapshosts.
Snapshot-aware defrag is also included.
Otherwise we have a large number of fixes and cleanups. Eric Sandeen
wins the award for removing the most lines, and I'm hoping we steal
this idea from XFS over and over again."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (118 commits)
btrfs: fixup/remove module.h usage as required
Btrfs: delete inline extents when we find them during logging
btrfs: try harder to allocate raid56 stripe cache
Btrfs: cleanup to make the function btrfs_delalloc_reserve_metadata more logic
Btrfs: don't call btrfs_qgroup_free if just btrfs_qgroup_reserve fails
Btrfs: remove reduplicate check about root in the function btrfs_clean_quota_tree
Btrfs: return ENOMEM rather than use BUG_ON when btrfs_alloc_path fails
Btrfs: fix missing deleted items in btrfs_clean_quota_tree
btrfs: use only inline_pages from extent buffer
Btrfs: fix wrong reserved space when deleting a snapshot/subvolume
Btrfs: fix wrong reserved space in qgroup during snap/subv creation
Btrfs: remove unnecessary dget_parent/dput when creating the pending snapshot
btrfs: remove a printk from scan_one_device
Btrfs: fix NULL pointer after aborting a transaction
Btrfs: fix memory leak of log roots
Btrfs: copy everything if we've created an inline extent
btrfs: cleanup for open-coded alignment
Btrfs: do not change inode flags in rename
Btrfs: use reserved space for creating a snapshot
clear chunk_alloc flag on retryable failure
...
When deleting a snapshot/subvolume, we need remove root ref/backref,
dir entries and update the dir inode, so we must reserve free space
for those operations.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
There are two problems in the space reservation of the snapshot/
subvolume creation.
- don't reserve the space for the root item insertion
- the space which is reserved in the qgroup is different with
the free space reservation. we need reserve free space for
7 items, but in qgroup reservation, we need reserve space only
for 3 items.
So we implement new metadata reservation functions for the
snapshot/subvolume creation.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Since we have grabbed the parent inode at the beginning of the
snapshot creation, and both sync and async snapshot creation
release it after the pending snapshots are actually created,
it is safe to access the parent inode directly during the snapshot
creation, we needn't use dget_parent/dput to fix the parent dentry
and get the dir inode.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Pull vfs pile (part one) from Al Viro:
"Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent
locking violations, etc.
The most visible changes here are death of FS_REVAL_DOT (replaced with
"has ->d_weak_revalidate()") and a new helper getting from struct file
to inode. Some bits of preparation to xattr method interface changes.
Misc patches by various people sent this cycle *and* ocfs2 fixes from
several cycles ago that should've been upstream right then.
PS: the next vfs pile will be xattr stuff."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
saner proc_get_inode() calling conventions
proc: avoid extra pde_put() in proc_fill_super()
fs: change return values from -EACCES to -EPERM
fs/exec.c: make bprm_mm_init() static
ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
ocfs2: fix possible use-after-free with AIO
ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
target: writev() on single-element vector is pointless
export kernel_write(), convert open-coded instances
fs: encode_fh: return FILEID_INVALID if invalid fid_type
kill f_vfsmnt
vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
nfsd: handle vfs_getattr errors in acl protocol
switch vfs_getattr() to struct path
default SET_PERSONALITY() in linux/elf.h
ceph: prepopulate inodes only when request is aborted
d_hash_and_lookup(): export, switch open-coded instances
9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
9p: split dropping the acls from v9fs_set_create_acl()
...
Right now inode cache inode is treated as the same as space cache
inode, ie. keep inode in memory till putting super.
But this leads to an awkward situation.
If we're going to delete a snapshot/subvolume, btrfs will not
actually delete it and return free space, but will add it to dead
roots list until the last inode on this snap/subvol being destroyed.
Then we'll fetch deleted roots and cleanup them via cleaner thread.
So here is the problem, if we enable inode cache option, each
snap/subvol has a cached inode which is used to store inode allcation
information. And this cache inode will be kept in memory, as the above
said. So with inode cache, snap/subvol can only be added into
dead roots list during freeing roots stage in umount, so that we can
ONLY get space back after another remount(we cleanup dead roots on mount).
But the real thing is we'll no more use the snap/subvol if we mark it
deleted, so we can safely iput its cache inode when we delete snap/subvol.
Another thing is that we need to change the rules of droping inode, we
don't keep snap/subvol's cache inode in memory till end so that we can
add snap/subvol into dead roots list in time.
Reported-by: Mitch Harder <mitch.harder@sabayonlinux.org>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
In some cases, we need commit the current transaction, but don't want
to start a new one if there is no running transaction, so we introduce
the function - btrfs_attach_transaction(), which can catch the current
transaction, and return -ENOENT if there is no running transaction.
But no running transaction doesn't mean the current transction completely,
because we removed the running transaction before it completes. In some
cases, it doesn't matter. But in some special cases, such as freeze fs, we
hope the transaction is fully on disk, it will introduce some bugs, for
example, we may feeze the fs and dump the data in the disk, if the transction
doesn't complete, we would dump inconsistent data. So we need fix the above
problem for those cases.
We fixes this problem by introducing a function:
btrfs_attach_transaction_barrier()
if we hope all the transaction is fully on the disk, even they are not
running, we can use this function.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
With this new ioctl(2) BTRFS_IOC_SET_FSLABEL, we can set/change the label of a mounted file system.
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Miao Xie <miaox@cn.fujitsu.com>
Reviewed-by: Goffredo Baroncelli <kreijack@inwind.it>
Reviewed-by: David Sterba <dsterba@suse.cz>
Reviewed-by: Goffredo Baroncelli <kreijack@inwind.it>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Add a new ioctl(2) BTRFS_IOC_GET_FSLABLE, so that we can get the label upon a mounted filesystem.
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Cc: Miao Xie <miaox@cn.fujitsu.com>
Cc: Goffredo Baroncelli <kreijack@inwind.it>
Cc: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
The defrag operation can take very long, we want to have a way how to
cancel it. The code checks for a pending signal at safe points in the
defrag loops and returns EAGAIN. This means a user can press ^C after
running 'btrfs fi defrag', woks for both defrag modes, files and root.
Returning from the command was instant in my light tests, but may take
longer depending on the aging factor of the filesystem.
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
The argument "inherit" of btrfs_ioctl_snap_create_transid() was assigned
to NULL during we created the snapshots, so we didn't free it though we
called kfree() in the caller.
But since we are sure the snapshot creation is done after the function -
btrfs_ioctl_snap_create_transid() - completes, it is safe that we don't
assign the pointer "inherit" to NULL, and just free it in the caller of
btrfs_ioctl_snap_create_transid(). In this way, the code can become more
readable.
Reported-by: Alex Lyakas <alex.btrfs@zadarastorage.com>
Cc: Arne Jansen <sensille@gmx.net>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
The entry point at the defrag ioctl always sets "cache only" to 0;
the codepaths haven't run for a long time as far as I can
tell. Chris says they're dead code, so remove them.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
The header file will then be installed under /usr/include/linux so that
userspace applications can refer to Btrfs ioctls by name and use the same
structs used internally in the kernel.
Signed-off-by: Filipe Brandenburger <filbranden@google.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
CAP_DAC_READ_SEARCH overrides read and search permission check on
file and directory. It seems fit for BTRFS_IOC_INO_PATHS.
Signed-off-by: Kusanagi Kouichi <slash@ac.auone-net.jp>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
This patch fixes the following problem:
- improper return value
- unnecessary read-only check
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Pull btrfs fixes from Chris Mason:
"We've got corner cases for updating i_size that ceph was hitting,
error handling for quotas when we run out of space, a very subtle
snapshot deletion race, a crash while removing devices, and one
deadlock between subvolume creation and the sb_internal code (thanks
lockdep)."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: move d_instantiate outside the transaction during mksubvol
Btrfs: fix EDQUOT handling in btrfs_delalloc_reserve_metadata
Btrfs: fix possible stale data exposure
Btrfs: fix missing i_size update
Btrfs: fix race between snapshot deletion and getting inode
Btrfs: fix missing release of the space/qgroup reservation in start_transaction()
Btrfs: fix wrong sync_writers decrement in btrfs_file_aio_write()
Btrfs: do not merge logged extents if we've removed them from the tree
btrfs: don't try to notify udev about missing devices
Dave Sterba triggered a lockdep complaint about lock ordering
between the sb_internal lock and the cleaner semaphore.
btrfs_lookup_dentry() checks for orphans if we're looking up
the inode for a subvolume, and subvolume creation is triggering
the lookup with a transaction running.
This commit moves the d_instantiate after the transaction closes.
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Pull btrfs fixes from Chris Mason:
"It turns out that we had two crc bugs when running fsx-linux in a
loop. Many thanks to Josef, Miao Xie, and Dave Sterba for nailing it
all down. Miao also has a new OOM fix in this v2 pull as well.
Ilya fixed a regression Liu Bo found in the balance ioctls for pausing
and resuming a running balance across drives.
Josef's orphan truncate patch fixes an obscure corruption we'd see
during xfstests.
Arne's patches address problems with subvolume quotas. If the user
destroys quota groups incorrectly the FS will refuse to mount.
The rest are smaller fixes and plugs for memory leaks."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (30 commits)
Btrfs: fix repeated delalloc work allocation
Btrfs: fix wrong max device number for single profile
Btrfs: fix missed transaction->aborted check
Btrfs: Add ACCESS_ONCE() to transaction->abort accesses
Btrfs: put csums on the right ordered extent
Btrfs: use right range to find checksum for compressed extents
Btrfs: fix panic when recovering tree log
Btrfs: do not allow logged extents to be merged or removed
Btrfs: fix a regression in balance usage filter
Btrfs: prevent qgroup destroy when there are still relations
Btrfs: ignore orphan qgroup relations
Btrfs: reorder locks and sanity checks in btrfs_ioctl_defrag
Btrfs: fix unlock order in btrfs_ioctl_rm_dev
Btrfs: fix unlock order in btrfs_ioctl_resize
Btrfs: fix "mutually exclusive op is running" error code
Btrfs: bring back balance pause/resume logic
btrfs: update timestamps on truncate()
btrfs: fix btrfs_cont_expand() freeing IS_ERR em
Btrfs: fix a bug when llseek for delalloc bytes behind prealloc extents
Btrfs: fix off-by-one in lseek
...
Operation-specific check (whether subvol is readonly or not) should go
after the mutual exclusiveness check.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
The error code that is returned in response to starting a mutually
exclusive operation when there is one already running got silently
changed from EINVAL to EINPROGRESS by 5ac00add. Returning EINPROGRESS
to, say, add_dev, when rm_dev is running is misleading. Furthermore,
the operation itself may want to use EINPROGRESS for other purposes.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Balance pause/resume logic got broken by 5ac00add (went in into 3.8-rc1
as part of dev-replace merge). Offending commit took a stab at making
mutually exclusive volume operations (add_dev, rm_dev, resize, balance,
replace_dev) not block behind volume_mutex if another such operation is
in progress and instead return an error right away. Balancing front-end
relied on the blocking behaviour, so the fix is ugly, but short of a
complete rework, it's the best we can do.
Reported-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
We forget to give up the write access after we find some device operation
is going on. Fix it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Step to reproduce:
# mkfs.btrfs <disk>
# mount <disk> <mnt>
# btrfs sub create <mnt>/subv0
# btrfs sub snap <mnt> <mnt>/subv0/snap0
# change <mnt>/subv0 from R/W to R/O
# btrfs sub del <mnt>/subv0/snap0
We deleted the snapshot successfully. I think we should not be able to delete
the snapshot since the parent subvolume is R/O.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Pull btrfs update from Chris Mason:
"A big set of fixes and features.
In terms of line count, most of the code comes from Stefan, who added
the ability to replace a single drive in place. This is different
from how btrfs normally replaces drives, and is much much much faster.
Josef is plowing through our synchronous write performance. This pull
request does not include the DIO_OWN_WAITING patch that was discussed
on the list, but it has a number of other improvements to cut down our
latencies and CPU time during fsync/O_DIRECT writes.
Miao Xie has a big series of fixes and is spreading out ordered
operations over more CPUs. This improves performance and reduces
contention.
I've put in fixes for error handling around hash collisions. These
are going back to individual stable kernels as I test against them.
Otherwise we have a lot of fixes and cleanups, thanks everyone!
raid5/6 is being rebased against the device replacement code. I'll
have it posted this Friday along with a nice series of benchmarks."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (115 commits)
Btrfs: fix a bug of per-file nocow
Btrfs: fix hash overflow handling
Btrfs: don't take inode delalloc mutex if we're a free space inode
Btrfs: fix autodefrag and umount lockup
Btrfs: fix permissions of empty files not affected by umask
Btrfs: put raid properties into global table
Btrfs: fix BUG() in scrub when first superblock reading gives EIO
Btrfs: do not call file_update_time in aio_write
Btrfs: only unlock and relock if we have to
Btrfs: use tokens where we can in the tree log
Btrfs: optimize leaf_space_used
Btrfs: don't memset new tokens
Btrfs: only clear dirty on the buffer if it is marked as dirty
Btrfs: move checks in set_page_dirty under DEBUG
Btrfs: log changed inodes based on the extent map tree
Btrfs: add path->really_keep_locks
Btrfs: do not mark ems as prealloc if we are writing to them
Btrfs: keep track of the extents original block length
Btrfs: inline csums if we're fsyncing
Btrfs: don't bother copying if we're only logging the inode
...
Users report a bug, the reproducer is:
$ mkfs.btrfs /dev/loop0
$ mount /dev/loop0 /mnt/btrfs/
$ mkdir /mnt/btrfs/dir
$ chattr +C /mnt/btrfs/dir/
$ dd if=/dev/zero of=/mnt/btrfs/dir/foo bs=4K count=10;
$ lsattr /mnt/btrfs/dir/foo
---------------C- /mnt/btrfs/dir/foo
$ filefrag /mnt/btrfs/dir/foo
/mnt/btrfs/dir/foo: 1 extent found ---> an extent
$ dd if=/dev/zero of=/mnt/btrfs/dir/foo bs=4K count=1 seek=5 conv=notrunc,nocreat; sync
$ filefrag /mnt/btrfs/dir/foo
/mnt/btrfs/dir/foo: 3 extents found ---> with nocow, btrfs breaks the extent into three parts
The new created file should not only inherit the NODATACOW flag, but also
honor NODATASUM flag, because we must do COW on a file extent with checksum.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
The handling for directory crc hash overflows was fairly obscure,
split_leaf returns EOVERFLOW when we try to extend the item and that is
supposed to bubble up to userland. For a while it did so, but along the
way we added better handling of errors and forced the FS readonly if we
hit IO errors during the directory insertion.
Along the way, we started testing only for EEXIST and the EOVERFLOW case
was dropped. The end result is that we may force the FS readonly if we
catch a directory hash bucket overflow.
This fixes a few problem spots. First I add tests for EOVERFLOW in the
places where we can safely just return the error up the chain.
btrfs_rename is harder though, because it tries to insert the new
directory item only after it has already unlinked anything the rename
was going to overwrite. Rather than adding very complex logic, I added
a helper to test for the hash overflow case early while it is still safe
to bail out.
Snapshot and subvolume creation had a similar problem, so they are using
the new helper now too.
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Reported-by: Pascal Junod <pascal@junod.info>
We need get write access for qgroup operations, or we will modify the R/O fs.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
We need get write access for scrub, or we will modify the R/O fs.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Steps to reproduce:
# mkfs.btrfs -d single -m single <disk0> <disk1>
# mount -o ro <disk0> <mnt0>
# mount -o ro <disk0> <mnt1>
# mount -o remount,rw <mnt0>
# umount <mnt0>
# btrfs device delete <disk1> <mnt1>
We can remove a device from a R/O filesystem. The reason is that we just check
the R/O flag of the super block object. It is not enough, because the kernel
may set the R/O flag only for the mount point. We need invoke
mnt_want_write_file()
to do a full check.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Steps to reproduce:
# mkfs.btrfs <partition>
# mount -o ro <partition> <mnt0>
# mount -o ro <partition> <mnt1>
# mount -o remount,rw <mnt0>
# umount <mnt0>
# btrfs fi resize 10g <mnt1>
We re-sized a R/O filesystem. The reason is that we just check the R/O flag
of the super block object. It is not enough, because the kernel may set the
R/O flag only for the mount point. We need invoke mnt_want_write_file() to
do a full check.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
When wen want to set the default subvolume, we must get write access, or
we will change the R/O file system.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
If there is no running transaction in the fs, we needn't start a new one when
we want to start sync.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Since we have gotten the root in the caller, just pass it into
btrfs_ioctl_{start, wait}_sync() directly.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
This is the commit that allows to start the device replace
procedure.
An ioctl() interface is added that supports starting and
canceling the device replace procedure, and to retrieve
the status and progress.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
This patch adds some code to disallow operations on the device that
is used as the target for the device replace operation.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Btrfs admin operations that are manually started from user mode
and that cannot be executed at the same time return -EINPROGRESS.
A common way to enter and leave this locked section is introduced
since it used to be specific to the balance operation.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
A small number of functions that are used in a device replace
procedure when the operation is resumed at mount time are unable
to pass the same root pointer that would be used in the regular
(ioctl) context. And since the root pointer is not required, only
the fs_info is, the root pointer argument is replaced with the
fs_info pointer argument.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
When creating a snapshot, failing to commit a transaction can end up
with aborting the transaction, following by doing a cleanup for it, where
we'll free all snapshots pending to disk.
So we check it and avoid double free on pending snapshots.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Remove an invalid size check up from btrfs_shrink_dev().
The new size should not larger than the device->total_bytes as it was
already verified before coming to here(i.e. new_size < old_size).
Remove invalid check up for btrfs_shrink_dev().
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
There is no reason to pass the nr_pages_dirtied argument, because
nr_pages_dirtied value from the caller is unused in
balance_dirty_pages_ratelimited_nr().
Signed-off-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: Vivek Trivedi <vtrivedi018@gmail.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>