Commit Graph

25544 Commits

Author SHA1 Message Date
Linus Torvalds
22b4eb5e31 Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
  xfs: cleanup xfs_file_aio_write
  xfs: always return with the iolock held from xfs_file_aio_write_checks
  xfs: remove the i_new_size field in struct xfs_inode
  xfs: remove the i_size field in struct xfs_inode
  xfs: replace i_pin_wait with a bit waitqueue
  xfs: replace i_flock with a sleeping bitlock
  xfs: make i_flags an unsigned long
  xfs: remove the if_ext_max field in struct xfs_ifork
  xfs: remove the unused dm_attrs structure
  xfs: cleanup xfs_iomap_eof_align_last_fsb
  xfs: remove xfs_itruncate_data
2012-01-17 15:54:56 -08:00
Linus Torvalds
d65773b22b Merge branch 'btrfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
* 'btrfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  btrfs: take allocation of ->tree_root into open_ctree()
  btrfs: let ->s_fs_info point to fs_info, not root...
  btrfs: consolidate failure exits in btrfs_mount() a bit
  btrfs: make free_fs_info() call ->kill_sb() unconditional
  btrfs: merge free_fs_info() calls on fill_super failures
  btrfs: kill pointless reassignment of ->s_fs_info in btrfs_fill_super()
  btrfs: make open_ctree() return int
  btrfs: sanitizing ->fs_info, part 5
  btrfs: sanitizing ->fs_info, part 4
  btrfs: sanitizing ->fs_info, part 3
  btrfs: sanitizing ->fs_info, part 2
  btrfs: sanitizing ->fs_info, part 1
  btrfs: fix a deadlock in btrfs_scan_one_device()
  btrfs: fix mount/umount race
  btrfs: get ->kill_sb() of its own
  btrfs: preparation to fixing mount/umount race
2012-01-17 15:52:51 -08:00
Linus Torvalds
f9156c7288 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (62 commits)
  Btrfs: use larger system chunks
  Btrfs: add a delalloc mutex to inodes for delalloc reservations
  Btrfs: space leak tracepoints
  Btrfs: protect orphan block rsv with spin_lock
  Btrfs: add allocator tracepoints
  Btrfs: don't call btrfs_throttle in file write
  Btrfs: release space on error in page_mkwrite
  Btrfs: fix btrfsck error 400 when truncating a compressed
  Btrfs: do not use btrfs_end_transaction_throttle everywhere
  Btrfs: add balance progress reporting
  Btrfs: allow for resuming restriper after it was paused
  Btrfs: allow for canceling restriper
  Btrfs: allow for pausing restriper
  Btrfs: add skip_balance mount option
  Btrfs: recover balance on mount
  Btrfs: save balance parameters to disk
  Btrfs: soft profile changing mode (aka soft convert)
  Btrfs: implement online profile changing
  Btrfs: do not reduce profile in do_chunk_alloc()
  Btrfs: virtual address space subset filter
  ...

Fix up trivial conflict in fs/btrfs/ioctl.c due to the use of the new
mnt_drop_write_file() helper.
2012-01-17 15:49:54 -08:00
Linus Torvalds
e268337dfe proc: clean up and fix /proc/<pid>/mem handling
Jüri Aedla reported that the /proc/<pid>/mem handling really isn't very
robust, and it also doesn't match the permission checking of any of the
other related files.

This changes it to do the permission checks at open time, and instead of
tracking the process, it tracks the VM at the time of the open.  That
simplifies the code a lot, but does mean that if you hold the file
descriptor open over an execve(), you'll continue to read from the _old_
VM.

That is different from our previous behavior, but much simpler.  If
somebody actually finds a load where this matters, we'll need to revert
this commit.

I suspect that nobody will ever notice - because the process mapping
addresses will also have changed as part of the execve.  So you cannot
actually usefully access the fd across a VM change simply because all
the offsets for IO would have changed too.

Reported-by: Jüri Aedla <asd@ut.ee>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-17 15:21:19 -08:00
Christoph Hellwig
d060646436 xfs: cleanup xfs_file_aio_write
With all the size field updates out of the way xfs_file_aio_write can
be further simplified by pushing all iolock handling into
xfs_file_dio_aio_write and xfs_file_buffered_aio_write and using
the generic generic_write_sync helper for synchronous writes.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-01-17 15:12:33 -06:00
Christoph Hellwig
5bf1f26227 xfs: always return with the iolock held from xfs_file_aio_write_checks
While xfs_iunlock is fine with 0 lockflags the calling conventions are much
cleaner if xfs_file_aio_write_checks never returns without the iolock held.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-01-17 15:11:07 -06:00
Christoph Hellwig
2813d682e8 xfs: remove the i_new_size field in struct xfs_inode
Now that we use the VFS i_size field throughout XFS there is no need for the
i_new_size field any more given that the VFS i_size field gets updated
in ->write_end before unlocking the page, and thus is always uptodate when
writeback could see a page.  Removing i_new_size also has the advantage that
we will never have to trim back di_size during a failed buffered write,
given that it never gets updated past i_size.

Note that currently the generic direct I/O code only updates i_size after
calling our end_io handler, which requires a small workaround to make
sure di_size actually makes it to disk.  I hope to fix this properly in
the generic code.

A downside is that we lose the support for parallel non-overlapping O_DIRECT
appending writes that recently was added.  I don't think keeping the complex
and fragile i_new_size infrastructure for this is a good tradeoff - if we
really care about parallel appending writers we should investigate turning
the iolock into a range lock, which would also allow for parallel
non-overlapping buffered writers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-01-17 15:10:19 -06:00
Christoph Hellwig
ce7ae151dd xfs: remove the i_size field in struct xfs_inode
There is no fundamental need to keep an in-memory inode size copy in the XFS
inode.  We already have the on-disk value in the dinode, and the separate
in-memory copy that we need for regular files only in the XFS inode.

Remove the xfs_inode i_size field and change the XFS_ISIZE macro to use the
VFS inode i_size field for regular files.  Switch code that was directly
accessing the i_size field in the xfs_inode to XFS_ISIZE, or in cases where
we are limited to regular files direct access of the VFS inode i_size field.

This also allows dropping some fairly complicated code in the write path
which dealt with keeping the xfs_inode i_size uptodate with the VFS i_size
that is getting updated inside ->write_end.

Note that we do not bother resetting the VFS i_size when truncating a file
that gets freed to zero as there is no point in doing so because the VFS inode
is no longer in use at this point.  Just relax the assert in xfs_ifree to
only check the on-disk size instead.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-01-17 15:08:53 -06:00
Christoph Hellwig
f392e6319a xfs: replace i_pin_wait with a bit waitqueue
Replace i_pin_wait, which is only used during synchronous inode flushing
with a bit waitqueue.  This trades off a much smaller inode against
slightly slower wakeup performance, and saves 12 (32-bit) or 20 (64-bit)
bytes in the XFS inode.

Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-01-17 15:07:54 -06:00
Christoph Hellwig
474fce0675 xfs: replace i_flock with a sleeping bitlock
We almost never block on i_flock, the exception is synchronous inode
flushing.  Instead of bloating the inode with a 16/24-byte completion
that we abuse as a semaphore just implement it as a bitlock that uses
a bit waitqueue for the rare sleeping path.  This primarily is a
tradeoff between a much smaller inode and a faster non-blocking
path vs faster wakeups, and we are much better off with the former.

A small downside is that we will lose lockdep checking for i_flock, but
given that it's always taken inside the ilock that should be acceptable.

Note that for example the inode writeback locking is implemented in a
very similar way.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-01-17 15:06:45 -06:00
Christoph Hellwig
49e4c70e52 xfs: make i_flags an unsigned long
To be used for bit wakeup i_flags needs to be an unsigned long or we'll
run into trouble on big endian systems.  Because of the 1-byte i_update
field right after it this actually causes a fairly large size increase
on its own (4 or 8 bytes), but that increase will be more than offset
by the next two patches.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-01-17 15:03:50 -06:00
Christoph Hellwig
8096b1ebb5 xfs: remove the if_ext_max field in struct xfs_ifork
We spent a lot of effort to maintain this field, but it always equals to the
fork size divided by the constant size of an extent.  The prime use of it is
to assert that the two stay in sync.  Just divide the fork size by the extent
size in the few places that we actually use it and remove the overhead
of maintaining it.  Also introduce a few helpers to consolidate the places
where we actually care about the value.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-01-17 15:02:28 -06:00
Linus Torvalds
a12587b003 NFS client bugfixes and cleanups for Linux 3.3 (pull 2)
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJPFJ45AAoJEGcL54qWCgDyy9QP/2LmBpYyu3L6vJKRDPxHO9/A
 geLuJsc+m+NU7V7k2H2PWj5TfLPyY33+FEdR0+iqmYVdCK+cb/fvXjeRmhwaFE78
 uoOs9x4LYtMUN+OCOdPJaNc8vosBxHfD9GE3Qb6KFKT4/2pjRu7kjbZvYEyufDhO
 FrPkSiJv5VUgV+xWsgp2C0FSbBFUQ4iGZ3nPeqJCLAT+1QEaJd3PIaJdUepGCi/X
 wakUGAWGKA/sWDcJgBbovVddB8j6FINmTrFme3ZSYqkOOF9mdmzxa/tvhFMjr/17
 sE2L3BHqQioRNXfjY4Ex3kGdVbuIVIIOSNAYeUysuunTSGsub0Ou+UJpj/CtNhur
 auZhjeZQrQUEd1j4Zb4knE3PyjF/GT1RV1CstmdDqty9+BPrzZdx8Rfj61IlLWEz
 b4DYT2HYxXifH5ZbDwA/uyApNKg95kjkp/OLGxwF640gIKXoeKo3xfJO+syQ5wKH
 AdBuTV0vPvkMwdY0H64IHghs47y7A3M/ZbK1jyAkaTzkq15GPqp49ojwFLc3UESb
 JiV/66hmmIJCAqEMu8OncHNtzsrZ2ZO5fqPSjhcrc4CEVCeqNy8WwP8she6BeR2U
 EwaWIqBs1zt5A6dZE8yxdu2Mvai3oRo/ioa6bPwxCAuy7o6/p2x/39TLe27fL18z
 1Qss51gQfFWe8lM8Rhj+
 =beF6
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.3-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

NFS client bugfixes and cleanups for Linux 3.3 (pull 2)

* tag 'nfs-for-3.3-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  pnfsblock: alloc short extent before submit bio
  pnfsblock: remove rpc_call_ops from struct parallel_io
  pnfsblock: move find lock page logic out of bl_write_pagelist
  pnfsblock: cleanup bl_mark_sectors_init
  pnfsblock: limit bio page count
  pnfsblock: don't spinlock when freeing block_dev
  pnfsblock: clean up _add_entry
  pnfsblock: set read/write tk_status to pnfs_error
  pnfsblock: acquire im_lock in _preload_range
  NFS4: fix compile warnings in nfs4proc.c
  nfs: check for integer overflow in decode_devicenotify_args()
  NFS: cleanup endian type in decode_ds_addr()
  NFS: add an endian notation
2012-01-16 15:08:13 -08:00
Chris Mason
96bdc7dc61 Btrfs: use larger system chunks
system chunks by default are very small.  This makes them slightly
larger and also fixes the conditional checks to make sure we don't
allocate a billion of them at once.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-01-16 15:38:24 -05:00
Josef Bacik
f248679e86 Btrfs: add a delalloc mutex to inodes for delalloc reservations
I was using i_mutex for this, but we're getting bogus lockdep warnings by doing
that and theres no real way to get rid of those, so just stop using i_mutex to
protect delalloc metadata reservations and use a delalloc mutex instead.  This
shouldn't be contended often at all, only if you are writing and mmap writing to
the file at the same time.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-01-16 15:29:43 -05:00
Josef Bacik
8c2a3ca20f Btrfs: space leak tracepoints
This in addition to a script in my btrfs-tracing tree will help track down space
leaks when we're getting space left over in block groups on umount.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-01-16 15:29:43 -05:00
Josef Bacik
90290e1982 Btrfs: protect orphan block rsv with spin_lock
We've been seeing warnings coming out of the orphan commit stuff forever from
ceph.  Turns out it's because we're racing with checking if the orphan block
reserve is set, because we clear it outside of the spin_lock.  So leave the
normal fastpath checks where they are, but take the spin_lock and _recheck_ to
make sure we haven't had an orphan block rsv added in the meantime.  Then clear
the root's orphan block rsv and release the lock.  With this patch a user said
the warnings went away and they usually showed up pretty soon after he started
ceph.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-01-16 15:29:42 -05:00
Josef Bacik
3f7de037fb Btrfs: add allocator tracepoints
I used these tracepoints when figuring out what the cluster stuff was doing, so
add them to mainline in case we need to profile this stuff again.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-01-16 15:29:42 -05:00
Josef Bacik
45a8090e62 Btrfs: don't call btrfs_throttle in file write
Btrfs_throttle will make us wait if there is a currently committing transaction
until we can open new transactions, which is ridiculous since we don't actually
start any transactions within the file write path anyway, so all this does is
introduce big latencies if we have a sync/fsync heavy workload going on while
somebody else is trying to do work.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-01-16 15:28:55 -05:00
Josef Bacik
ec39e180fd Btrfs: release space on error in page_mkwrite
If updating the inode gave us an ENOSPC we were just returning in page_mkwrite,
which is a problem since we make our reservation right before trying to update
the inode, so fix the out label so that we actually free our reservation.
Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-01-16 15:28:54 -05:00
Miao Xie
f70a9a6b94 Btrfs: fix btrfsck error 400 when truncating a compressed
Reproduce steps:
 # mkfs.btrfs /dev/sdb5
 # mount /dev/sdb5 -o compress=lzo /mnt
 # dd if=/dev/zero of=/mnt/tmpfile bs=128K count=1
 # sync
 # truncate -s 64K /mnt/tmpfile
 root 5 inode 257 errors 400

This is because of the wrong if condition, which is used to check if we should
subtract the bytes of the dropped range from i_blocks/i_bytes of i-node or not.
When we truncate a compressed extent, btrfs substracts the bytes of the whole
extent, it's wrong. We should substract the real size that we truncate, no
matter it is a compressed extent or not. Fix it.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-01-16 15:28:54 -05:00
Josef Bacik
7ad85bb76a Btrfs: do not use btrfs_end_transaction_throttle everywhere
A user reported a problem where things like open with O_CREAT would take up to
30 seconds when he had nfs activity on the same mount.  This is because all of
our quick metadata operations, like create, symlink etc all do
btrfs_end_transaction_throttle, which if the transaction is blocked will wait
for the commit to complete before it returns.  This adds a ridiculous amount of
latency and isn't really needed.  The normal btrfs_end_transaction will mark the
transaction as blocked and wake the transaction kthread up if it thinks the
transaction needs to end (this being in the running out of global reserve space
scenario), and this is all that is really needed since we've already done
everything we're going to do, we just need to return.  This should help people
with the latency they were seeing when using synchronous heavy workloads.
Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-01-16 15:28:54 -05:00
Chris Mason
c126dea771 Merge branch 'integrity-check-patch-v2' of git://btrfs.giantdisaster.de/git/btrfs into integration
Conflicts:
	fs/btrfs/ctree.h
	fs/btrfs/super.c

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-01-16 15:27:58 -05:00
Chris Mason
9785dbdf26 Merge branch 'for-chris' of git://git.jan-o-sch.net/btrfs-unstable into integration 2012-01-16 15:26:31 -05:00
Chris Mason
d756bd2d93 Merge branch 'for-chris' of git://repo.or.cz/linux-btrfs-devel into integration
Conflicts:
	fs/btrfs/volumes.c

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-01-16 15:26:17 -05:00
Chris Mason
27263e2832 Merge branch 'restriper' of git://github.com/idryomov/btrfs-unstable into integration 2012-01-16 15:26:02 -05:00
Chris Mason
64e05503ab Merge branch 'allocation-fixes' into integration 2012-01-16 15:25:42 -05:00
Ilya Dryomov
19a39dce3b Btrfs: add balance progress reporting
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2012-01-16 22:04:49 +02:00
Ilya Dryomov
de322263d3 Btrfs: allow for resuming restriper after it was paused
Recognize BTRFS_BALANCE_RESUME flag passed from userspace.  We use the
same heuristics used when recovering balance after a crash to try to
start where we left off last time.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2012-01-16 22:04:49 +02:00
Ilya Dryomov
a7e99c691a Btrfs: allow for canceling restriper
Implement an ioctl for canceling restriper.  Currently we wait until
relocation of the current block group is finished, in future this can be
done by triggering a commit.  Balance item is deleted and no memory
about the interrupted balance is kept.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2012-01-16 22:04:49 +02:00
Ilya Dryomov
837d5b6e46 Btrfs: allow for pausing restriper
Implement an ioctl for pausing restriper.  This pauses the relocation,
but balance is still considered to be "in progress": balance item is
not deleted, other volume operations cannot be started, etc.  If paused
in the middle of profile changing operation we will continue making
allocations with the target profile.

Add a hook to close_ctree() to pause restriper and free its data
structures on unmount.  (It's safe to unmount when restriper is in
"paused" state, we will resume with the same parameters on the next
mount)

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2012-01-16 22:04:49 +02:00
Ilya Dryomov
9555c6c180 Btrfs: add skip_balance mount option
Since restriper kthread starts involuntarily on mount and can suck cpu
and memory bandwidth add a mount option to forcefully skip it.  The
restriper in that case hangs around in paused state and can be resumed
from userspace when it's convenient.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2012-01-16 22:04:48 +02:00
Ilya Dryomov
596410151e Btrfs: recover balance on mount
On mount, if balance item is found, resume balance in a separate
kernel thread.

Try to be smart to continue roughly where previous balance (or convert)
was interrupted.  For chunk types that were being converted to some
profile we turn on soft convert, in case of a simple balance we turn on
usage filter and relocate only less-than-90%-full chunks of that type.
These are just heuristics but they help quite a bit, and can be improved
in future.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2012-01-16 22:04:48 +02:00
Ilya Dryomov
0940ebf6b9 Btrfs: save balance parameters to disk
Introduce a new btree objectid for storing balance item.  The reason is
to be able to resume restriper after a crash with the same parameters.
Balance item has a very high objectid and goes into tree of tree roots.

The key for the new item is as follows:

	[ BTRFS_BALANCE_OBJECTID ; BTRFS_BALANCE_ITEM_KEY ; 0 ]

Older kernels simply ignore it so it's safe to mount with an older
kernel and then go back to the newer one.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2012-01-16 22:04:48 +02:00
Ilya Dryomov
cfa4c961cc Btrfs: soft profile changing mode (aka soft convert)
When doing convert from one profile to another if soft mode is on
restriper won't touch chunks that already have the profile we are
converting to.  This is useful if e.g. half of the FS was converted
earlier.

The soft mode switch is (like every other filter) per-type.  This means
that we can convert for example meta chunks the "hard" way while
converting data chunks selectively with soft switch.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2012-01-16 22:04:48 +02:00
Ilya Dryomov
e4d8ec0f65 Btrfs: implement online profile changing
Profile changing is done by launching a balance with
BTRFS_BALANCE_CONVERT bits set and target fields of respective
btrfs_balance_args structs initialized.  Profile reducing code in this
case will pick restriper's target profile if it's available instead of
doing a blind reduce.  If target profile is not yet available it goes
back to a plain reduce.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2012-01-16 22:04:48 +02:00
Ilya Dryomov
70922617b0 Btrfs: do not reduce profile in do_chunk_alloc()
Every caller of do_chunk_alloc() feeds it the reduced allocation
profile, so stop trying to reduce it one more time.  Instead check the
validity of the passed profile.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2012-01-16 22:04:48 +02:00
Ilya Dryomov
ea67176ae8 Btrfs: virtual address space subset filter
Select chunks which have at least one byte located inside a given
[vstart, vend) virtual address space range.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2012-01-16 22:04:48 +02:00
Ilya Dryomov
94e60d5a5c Btrfs: devid subset filter
Select chunks which have at least one byte of at least one stripe
located on a device with devid X in a given [pstart,pend) physical
address range.

This filter only works when devid filter is turned on.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2012-01-16 22:04:48 +02:00
Ilya Dryomov
409d404b46 Btrfs: devid filter
Relocate chunks which have at least one stripe located on a device with
devid X.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2012-01-16 22:04:47 +02:00
Ilya Dryomov
5ce5b3c091 Btrfs: usage filter
Select chunks that are less than X percent full.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2012-01-16 22:04:47 +02:00
Ilya Dryomov
ed25e9b26f Btrfs: profiles filter
Select chunks based on a given profile mask.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2012-01-16 22:04:47 +02:00
Ilya Dryomov
f43ffb60fd Btrfs: add basic infrastructure for selective balancing
This allows to have a separate set of filters for each chunk type
(data,meta,sys).  The code however is generic and switch on chunk type
is only done once.

This commit also adds a type filter: it allows to balance for example
meta and system chunks w/o touching data ones.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2012-01-16 22:04:47 +02:00
Ilya Dryomov
c9e9f97bdf Btrfs: add basic restriper infrastructure
Add basic restriper infrastructure: extended balancing ioctl and all
related ioctl data structures, add data structure for tracking
restriper's state to fs_info, etc.  The semantics of the old balancing
ioctl are fully preserved.

Explicitly disallow any volume operations when balance is in progress.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2012-01-16 22:04:47 +02:00
Ilya Dryomov
10ea00f55a Btrfs: make avail_*_alloc_bits fields dynamic
Currently when new chunks are created respective avail_alloc_bits field
is updated to reflect profiles of all chunks present in the system.
However when chunks are removed profile bits are never cleared.

This patch clears profile bit of respective avail_alloc_bits field when
the last chunk with that profile is removed.  Restriper needs this to
properly operate when "downgrading".

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2012-01-16 22:04:47 +02:00
Ilya Dryomov
a46d11a8b0 Btrfs: add BTRFS_AVAIL_ALLOC_BIT_SINGLE bit
Right now on-disk BTRFS_BLOCK_GROUP_* profile bits are used for
avail_{data,metadata,system}_alloc_bits fields, which gather info about
available allocation profiles in the FS.  When chunk is created or read
from disk, its profile is OR'ed with the corresponding avail_alloc_bits
field.  Since SINGLE is denoted by 0 in the on-disk format, currently
there is no way to tell when such chunks become avaialble.  Restriper
needs that information, so add a separate bit for SINGLE profile.

This bit is going to be in-memory only, it should never be written out
to disk, so it's not a disk format change.  However to avoid remappings
in future, reserve corresponding on-disk bit.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2012-01-16 22:04:47 +02:00
Ilya Dryomov
52ba692972 Btrfs: introduce masks for chunk type and profile
Chunk's type and profile are encoded in u64 flags field.  Introduce
masks to easily access them.  Also fix the type of BTRFS_BLOCK_GROUP_*
constants, it should be ULL.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2012-01-16 22:04:47 +02:00
Ilya Dryomov
6fef8df1dc Btrfs: get rid of *_alloc_profile fields
{data,metadata,system}_alloc_profile fields have been unused for a long
time now.  Get rid of them.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2012-01-16 22:04:47 +02:00
Linus Torvalds
122804ecb5 Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (655 commits)
  [media] revert patch: HDIC HD29L2 DMB-TH USB2.0 reference design driver
  mb86a20s: Add a few more register settings at the init seq
  mb86a20s: Group registers into the same line
  [media] [PATCH] don't reset the delivery system on DTV_CLEAR
  [media] [BUG] it913x-fe fix typo error making SNR levels unstable
  [media] cx23885: Query the CX25840 during enum_input for status
  [media] cx25840: Add support for g_input_status
  [media] rc-videomate-m1f.c Rename to match remote controler name
  [media] drivers: media: au0828: Fix dependency for VIDEO_AU0828
  [media] convert drivers/media/* to use module_platform_driver()
  [media] drivers: video: cx231xx: Fix dependency for VIDEO_CX231XX_DVB
  [media] Exynos4 JPEG codec v4l2 driver
  [media] doc: v4l: selection: choose pixels as units for selection rectangles
  [media] v4l: s5p-tv: mixer: fix setup of VP scaling
  [media] v4l: s5p-tv: mixer: add support for selection API
  [media] v4l: emulate old crop API using extended crop/compose API
  [media] doc: v4l: add documentation for selection API
  [media] doc: v4l: add binary images for selection API
  [media] v4l: add support for selection api
  [media] hd29l2: fix review findings
  ...
2012-01-15 12:49:56 -08:00
Linus Torvalds
b3c9dd182e Merge branch 'for-3.3/core' of git://git.kernel.dk/linux-block
* 'for-3.3/core' of git://git.kernel.dk/linux-block: (37 commits)
  Revert "block: recursive merge requests"
  block: Stop using macro stubs for the bio data integrity calls
  blockdev: convert some macros to static inlines
  fs: remove unneeded plug in mpage_readpages()
  block: Add BLKROTATIONAL ioctl
  block: Introduce blk_set_stacking_limits function
  block: remove WARN_ON_ONCE() in exit_io_context()
  block: an exiting task should be allowed to create io_context
  block: ioc_cgroup_changed() needs to be exported
  block: recursive merge requests
  block, cfq: fix empty queue crash caused by request merge
  block, cfq: move icq creation and rq->elv.icq association to block core
  block, cfq: restructure io_cq creation path for io_context interface cleanup
  block, cfq: move io_cq exit/release to blk-ioc.c
  block, cfq: move icq cache management to block core
  block, cfq: move io_cq lookup to blk-ioc.c
  block, cfq: move cfqd->icq_list to request_queue and add request->elv.icq
  block, cfq: reorganize cfq_io_context into generic and cfq specific parts
  block: remove elevator_queue->ops
  block: reorder elevator switch sequence
  ...

Fix up conflicts in:
 - block/blk-cgroup.c
	Switch from can_attach_task to can_attach
 - block/cfq-iosched.c
	conflict with now removed cic index changes (we now use q->id instead)
2012-01-15 12:24:45 -08:00