linux/fs/btrfs
Filipe Manana e2127cf008 Btrfs: make defrag not fragment files when using prealloc extents
When using prealloc extents, a file defragment operation may actually
fragment the file and increase the amount of data space used by the file.
This change fixes that behaviour.

Example:

$ mkfs.btrfs -f /dev/sdb3
$ mount /dev/sdb3 /mnt
$ cd /mnt
$ xfs_io -f -c 'falloc 0 1048576' foobar && sync
$ xfs_io -c 'pwrite -S 0xff -b 100000 5000 100000' foobar
$ xfs_io -c 'pwrite -S 0xac -b 100000 200000 100000' foobar
$ xfs_io -c 'pwrite -S 0xe1 -b 100000 900000 100000' foobar && sync

Before defragmenting the file:

$ btrfs filesystem df /mnt
Data, single: total=8.00MiB, used=1.25MiB
System, DUP: total=8.00MiB, used=16.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, DUP: total=1.00GiB, used=112.00KiB
Metadata, single: total=8.00MiB, used=0.00

$ btrfs-debug-tree /dev/sdb3
(...)
	item 6 key (257 EXTENT_DATA 0) itemoff 15810 itemsize 53
		prealloc data disk byte 12845056 nr 1048576
		prealloc data offset 0 nr 4096
	item 7 key (257 EXTENT_DATA 4096) itemoff 15757 itemsize 53
		extent data disk byte 12845056 nr 1048576
		extent data offset 4096 nr 102400 ram 1048576
		extent compression 0
	item 8 key (257 EXTENT_DATA 106496) itemoff 15704 itemsize 53
		prealloc data disk byte 12845056 nr 1048576
		prealloc data offset 106496 nr 90112
	item 9 key (257 EXTENT_DATA 196608) itemoff 15651 itemsize 53
		extent data disk byte 12845056 nr 1048576
		extent data offset 196608 nr 106496 ram 1048576
		extent compression 0
	item 10 key (257 EXTENT_DATA 303104) itemoff 15598 itemsize 53
		prealloc data disk byte 12845056 nr 1048576
		prealloc data offset 303104 nr 593920
	item 11 key (257 EXTENT_DATA 897024) itemoff 15545 itemsize 53
		extent data disk byte 12845056 nr 1048576
		extent data offset 897024 nr 106496 ram 1048576
		extent compression 0
	item 12 key (257 EXTENT_DATA 1003520) itemoff 15492 itemsize 53
		prealloc data disk byte 12845056 nr 1048576
		prealloc data offset 1003520 nr 45056
(...)

Now defragmenting the file results in more data space used than before:

$ btrfs filesystem defragment -f foobar && sync
$ btrfs filesystem df /mnt
Data, single: total=8.00MiB, used=1.55MiB
System, DUP: total=8.00MiB, used=16.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, DUP: total=1.00GiB, used=112.00KiB
Metadata, single: total=8.00MiB, used=0.00

And the corresponding file extent items are now no longer perfectly sequential
as before, and we're now needlessly using more space from data block groups:

$ btrfs-debug-tree /dev/sdb3
(...)
	item 6 key (257 EXTENT_DATA 0) itemoff 15810 itemsize 53
		extent data disk byte 12845056 nr 1048576
		extent data offset 0 nr 4096 ram 1048576
		extent compression 0
	item 7 key (257 EXTENT_DATA 4096) itemoff 15757 itemsize 53
		extent data disk byte 13893632 nr 102400
		extent data offset 0 nr 102400 ram 102400
		extent compression 0
	item 8 key (257 EXTENT_DATA 106496) itemoff 15704 itemsize 53
		extent data disk byte 12845056 nr 1048576
		extent data offset 106496 nr 90112 ram 1048576
		extent compression 0
	item 9 key (257 EXTENT_DATA 196608) itemoff 15651 itemsize 53
		extent data disk byte 13996032 nr 106496
		extent data offset 0 nr 106496 ram 106496
		extent compression 0
	item 10 key (257 EXTENT_DATA 303104) itemoff 15598 itemsize 53
		prealloc data disk byte 12845056 nr 1048576
		prealloc data offset 303104 nr 593920
	item 11 key (257 EXTENT_DATA 897024) itemoff 15545 itemsize 53
		extent data disk byte 14102528 nr 106496
		extent data offset 0 nr 106496 ram 106496
		extent compression 0
	item 12 key (257 EXTENT_DATA 1003520) itemoff 15492 itemsize 53
		extent data disk byte 12845056 nr 1048576
		extent data offset 1003520 nr 45056 ram 1048576
		extent compression 0
(...)

With this change, the above example will no longer cause allocation of new data
space nor change the sequentiality of the file extents, that is, defragment will
be effectless, leaving all extent items pointing to the extent starting at disk
byte 12845056.

In a 20Gb filesystem I had, mounted with the autodefrag option and 20 files of
400Mb each, initially consisting of a single prealloc extent of 400Mb, having
random writes happening at a low rate, lead to a total of over ~17Gb of data
space used, not far from eventually reaching an ENOSPC state.

Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
2014-03-10 15:17:17 -04:00
..
tests Btrfs: convert printk to btrfs_ and fix BTRFS prefix 2014-01-28 13:20:05 -08:00
acl.c btrfs: remove dead code 2014-01-28 13:19:50 -08:00
async-thread.c btrfs: Cleanup the "_struct" suffix in btrfs_workequeue 2014-03-10 15:17:16 -04:00
async-thread.h btrfs: Cleanup the "_struct" suffix in btrfs_workequeue 2014-03-10 15:17:16 -04:00
backref.c Btrfs: skip locking when searching commit root 2014-03-10 15:16:55 -04:00
backref.h Btrfs: allocate prelim_ref with a slab allocater 2013-09-01 08:16:27 -04:00
btrfs_inode.h Btrfs: use signed integer instead of unsigned long integer for log transid 2014-03-10 15:16:42 -04:00
check-integrity.c Btrfs: use btrfs_crc32c everywhere instead of libcrc32c 2014-02-03 09:01:27 -08:00
check-integrity.h block: submit_bio_wait() conversions 2013-11-24 16:33:41 -07:00
compression.c Btrfs: fix data corruption when reading/updating compressed extents 2014-02-08 17:57:15 -08:00
compression.h btrfs: make static code static & remove dead code 2013-05-06 15:55:23 -04:00
ctree.c Btrfs: correctly determine if blocks are shared in btrfs_compare_trees 2014-03-10 15:16:50 -04:00
ctree.h btrfs: Cleanup the "_struct" suffix in btrfs_workequeue 2014-03-10 15:17:16 -04:00
delayed-inode.c btrfs: Cleanup the "_struct" suffix in btrfs_workequeue 2014-03-10 15:17:16 -04:00
delayed-inode.h Btrfs: introduce the delayed inode ref deletion for the single link inode 2014-01-28 13:20:09 -08:00
delayed-ref.c Btrfs: cleanup delayed-ref.c:find_ref_head() 2014-03-10 15:16:46 -04:00
delayed-ref.h Btrfs: attach delayed ref updates to delayed ref heads 2014-01-28 13:20:25 -08:00
dev-replace.c Btrfs: fix use-after-free in the finishing procedure of the device replace 2014-03-10 15:15:39 -04:00
dev-replace.h
dir-item.c Btrfs: convert printk to btrfs_ and fix BTRFS prefix 2014-01-28 13:20:05 -08:00
disk-io.c btrfs: Cleanup the "_struct" suffix in btrfs_workequeue 2014-03-10 15:17:16 -04:00
disk-io.h Btrfs: add a sanity test for btrfs_split_item 2013-11-11 21:51:02 -05:00
export.c btrfs: remove fs/btrfs/compat.h 2013-11-11 22:03:19 -05:00
export.h
extent_io.c Btrfs: more efficient split extent state insertion 2014-03-10 15:16:57 -04:00
extent_io.h Btrfs: move the extent buffer radix tree into the fs_info 2014-01-28 13:19:55 -08:00
extent_map.c Btrfs: more efficient btrfs_drop_extent_cache 2014-03-10 15:16:57 -04:00
extent_map.h Btrfs: more efficient btrfs_drop_extent_cache 2014-03-10 15:16:57 -04:00
extent-tree.c btrfs: Cleanup the "_struct" suffix in btrfs_workequeue 2014-03-10 15:17:16 -04:00
file-item.c Btrfs: convert printk to btrfs_ and fix BTRFS prefix 2014-01-28 13:20:05 -08:00
file.c Btrfs: fix preallocate vs double nocow write 2014-03-10 15:17:00 -04:00
free-space-cache.c Btrfs: convert printk to btrfs_ and fix BTRFS prefix 2014-01-28 13:20:05 -08:00
free-space-cache.h Btrfs: remove path arg from btrfs_truncate_free_space_cache 2013-11-11 21:51:33 -05:00
hash.c Btrfs: fix btrfs boot when compiled as built-in 2014-01-28 13:20:31 -08:00
hash.h Btrfs: fix btrfs boot when compiled as built-in 2014-01-28 13:20:31 -08:00
inode-item.c btrfs: cleanup: removed unused 'btrfs_get_inode_ref_index' 2014-01-28 13:19:39 -08:00
inode-map.c btrfs: Use WARN_ON()'s return value in place of WARN_ON(1) 2013-11-11 22:11:53 -05:00
inode-map.h
inode.c btrfs: Cleanup the "_struct" suffix in btrfs_workequeue 2014-03-10 15:17:16 -04:00
ioctl.c Btrfs: make defrag not fragment files when using prealloc extents 2014-03-10 15:17:17 -04:00
Kconfig Btrfs: fix btrfs boot when compiled as built-in 2014-01-28 13:20:31 -08:00
locking.c btrfs: make static code static & remove dead code 2013-05-06 15:55:23 -04:00
locking.h
lzo.c Btrfs: convert printk to btrfs_ and fix BTRFS prefix 2014-01-28 13:20:05 -08:00
Makefile Btrfs: fix btrfs boot when compiled as built-in 2014-01-28 13:20:31 -08:00
math.h
ordered-data.c btrfs: Cleanup the "_struct" suffix in btrfs_workequeue 2014-03-10 15:17:16 -04:00
ordered-data.h btrfs: Cleanup the "_struct" suffix in btrfs_workequeue 2014-03-10 15:17:16 -04:00
orphan.c btrfs: expand btrfs_find_item() to include find_orphan_item functionality 2014-01-28 13:19:37 -08:00
print-tree.c Btrfs: don't use ram_bytes for uncompressed inline items 2014-01-29 07:06:29 -08:00
print-tree.h btrfs: make static code static & remove dead code 2013-05-06 15:55:23 -04:00
props.c Btrfs: add support for inode properties 2014-01-28 13:20:24 -08:00
props.h Btrfs: add support for inode properties 2014-01-28 13:20:24 -08:00
qgroup.c btrfs: Cleanup the "_struct" suffix in btrfs_workequeue 2014-03-10 15:17:16 -04:00
raid56.c btrfs: Cleanup the "_struct" suffix in btrfs_workequeue 2014-03-10 15:17:16 -04:00
raid56.h
rcu-string.h
reada.c btrfs: Cleanup the "_struct" suffix in btrfs_workequeue 2014-03-10 15:17:16 -04:00
relocation.c Btrfs: fix an oops when we fail to relocate tree blocks 2014-01-28 13:20:14 -08:00
root-tree.c btrfs: Use PTR_ERR_OR_ZERO 2014-03-10 15:16:51 -04:00
scrub.c btrfs: Cleanup the "_struct" suffix in btrfs_workequeue 2014-03-10 15:17:16 -04:00
send.c Btrfs: skip search tree for REG files 2014-03-10 15:17:01 -04:00
send.h btrfs: make static code static & remove dead code 2013-05-06 15:55:23 -04:00
struct-funcs.c
super.c btrfs: Cleanup the old btrfs_worker. 2014-03-10 15:17:15 -04:00
sysfs.c btrfs: add simple debugfs interface 2014-03-10 15:15:51 -04:00
sysfs.h btrfs: add simple debugfs interface 2014-03-10 15:15:51 -04:00
transaction.c Btrfs: cancel scrub on transaction abortion 2014-03-10 15:16:54 -04:00
transaction.h Btrfs: make fsync latency less sucky 2014-01-28 13:20:25 -08:00
tree-defrag.c Btrfs: cleanup dead code of defragment 2013-11-11 21:59:45 -05:00
tree-log.c Btrfs: stop joining the log transaction if sync log fails 2014-03-10 15:16:44 -04:00
tree-log.h Btrfs: just wait or commit our own log sub-transaction 2014-03-10 15:16:43 -04:00
ulist.c Btrfs: do not export ulist functions 2014-01-29 07:06:27 -08:00
ulist.h Btrfs: do not export ulist functions 2014-01-29 07:06:27 -08:00
uuid-tree.c Btrfs: convert printk to btrfs_ and fix BTRFS prefix 2014-01-28 13:20:05 -08:00
volumes.c btrfs: Cleanup the "_struct" suffix in btrfs_workequeue 2014-03-10 15:17:16 -04:00
volumes.h btrfs: Cleanup the "_struct" suffix in btrfs_workequeue 2014-03-10 15:17:16 -04:00
xattr.c Btrfs: add support for inode properties 2014-01-28 13:20:24 -08:00
xattr.h
zlib.c Btrfs: convert printk to btrfs_ and fix BTRFS prefix 2014-01-28 13:20:05 -08:00