linux/fs
Filipe Manana 44f714dae5 Btrfs: improve performance on fsync against new inode after rename/unlink
With commit 56f23fdbb6 ("Btrfs: fix file/data loss caused by fsync after
rename and new inode") we got simple fix for a functional issue when the
following sequence of actions is done:

  at transaction N
  create file A at directory D
  at transaction N + M (where M >= 1)
  move/rename existing file A from directory D to directory E
  create a new file named A at directory D
  fsync the new file
  power fail

The solution was to simply detect such scenario and fallback to a full
transaction commit when we detect it. However this turned out to had a
significant impact on throughput (and a bit on latency too) for benchmarks
using the dbench tool, which simulates real workloads from smbd (Samba)
servers. For example on a test vm (with a debug kernel):

Unpatched:
Throughput 19.1572 MB/sec  32 clients  32 procs  max_latency=1005.229 ms

Patched:
Throughput 23.7015 MB/sec  32 clients  32 procs  max_latency=809.206 ms

The patched results (this patch is applied) are similar to the results of
a kernel with the commit 56f23fdbb6 ("Btrfs: fix file/data loss caused
by fsync after rename and new inode") reverted.

This change avoids the fallback to a transaction commit and instead makes
sure all the names of the conflicting inode (the one that had a name in a
past transaction that matches the name of the new file in the same parent
directory) are logged so that at log replay time we don't lose neither the
new file nor the old file, and the old file gets the name it was renamed
to.

This also ends up avoiding a full transaction commit for a similar case
that involves an unlink instead of a rename of the old file:

  at transaction N
  create file A at directory D
  at transaction N + M (where M >= 1)
  remove file A
  create a new file named A at directory D
  fsync the new file
  power fail

Signed-off-by: Filipe Manana <fdmanana@suse.com>
2016-08-01 07:32:14 +01:00
..
9p 9p: use file_dentry() 2016-06-30 23:28:09 -04:00
adfs
affs affs: fix remount failure when there are no options changed 2016-05-28 16:50:24 -07:00
afs remove lots of IS_ERR_VALUE abuses 2016-05-27 15:26:11 -07:00
autofs4 autofs: don't get stuck in a loop if vfs_write() returns an error 2016-06-24 17:23:52 -07:00
befs fs/befs/io.c:befs_bread(): remove unneeded initialization to NULL 2016-05-23 17:04:14 -07:00
bfs more trivial ->iterate_shared conversions 2016-05-09 11:41:14 -04:00
btrfs Btrfs: improve performance on fsync against new inode after rename/unlink 2016-08-01 07:32:14 +01:00
cachefiles FS-Cache: make check_consistency callback return int 2016-06-01 10:29:39 +02:00
ceph Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-07-01 15:20:11 -07:00
cifs File names with trailing period or space need special case conversion 2016-06-24 12:05:52 -05:00
coda
configfs configfs_readdir(): make safe under shared lock 2016-05-09 11:41:13 -04:00
cramfs more trivial ->iterate_shared conversions 2016-05-09 11:41:14 -04:00
crypto
debugfs debugfs: open_proxy_open(): avoid double fops release 2016-06-15 04:56:35 -07:00
devpts devpts: Make each mount of devpts an independent filesystem. 2016-06-05 10:36:01 -07:00
dlm
ecryptfs Merge branch 'stacking-fixes' (vfs stacking fixes from Jann) 2016-06-10 12:10:02 -07:00
efivarfs fs/efivarfs/inode.c: use generic UUID library 2016-05-20 17:58:30 -07:00
efs fs/efs/super.c: fix return value 2016-05-20 17:58:30 -07:00
exofs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2016-05-17 17:05:30 -07:00
exportfs
ext2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-05-27 17:14:05 -07:00
ext4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-05-27 17:14:05 -07:00
f2fs switch xattr_handler->set() to passing dentry and inode separately 2016-05-27 15:39:43 -04:00
fat Merge branch 'work.preadv2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-05-17 15:05:23 -07:00
freevxfs more trivial ->iterate_shared conversions 2016-05-09 11:41:14 -04:00
fscache FS-Cache: wake write waiter after invalidating writes 2016-06-01 10:29:09 +02:00
fuse fuse: serialize dirops by default 2016-06-30 13:10:49 +02:00
gfs2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-05-27 17:14:05 -07:00
hfs switch ->setxattr() to passing dentry and inode separately 2016-05-27 20:09:16 -04:00
hfsplus switch xattr_handler->set() to passing dentry and inode separately 2016-05-27 15:39:43 -04:00
hostfs hostfs: switch to ->iterate_shared() 2016-05-12 19:49:30 -04:00
hpfs hpfs: implement the show_options method 2016-05-28 16:50:24 -07:00
hugetlbfs
isofs Merge branch 'ovl-fixes' into for-linus 2016-05-11 00:00:29 -04:00
jbd2 jbd2: get rid of superfluous __GFP_REPEAT 2016-06-24 17:23:52 -07:00
jffs2 switch xattr_handler->set() to passing dentry and inode separately 2016-05-27 15:39:43 -04:00
jfs switch xattr_handler->set() to passing dentry and inode separately 2016-05-27 15:39:43 -04:00
kernfs switch ->setxattr() to passing dentry and inode separately 2016-05-27 20:09:16 -04:00
lockd lockd: unregister notifier blocks if the service fails to come up completely 2016-06-30 16:35:07 -04:00
logfs logfs: no need to lock directory in lseek 2016-05-09 11:42:19 -04:00
minix
ncpfs
nfs NFS: Fix another OPEN_DOWNGRADE bug 2016-06-28 16:55:34 -04:00
nfs_common
nfsd nfsd: check permissions when setting ACLs 2016-06-24 12:11:52 -04:00
nilfs2 fs/nilfs2: fix potential underflow in call to crc32_le 2016-06-24 17:23:52 -07:00
nls
notify fsnotify: avoid spurious EMFILE errors from inotify_init() 2016-05-19 19:12:14 -07:00
ntfs
ocfs2 ocfs2: disable BUG assertions in reading blocks 2016-06-24 17:23:52 -07:00
omfs more trivial ->iterate_shared conversions 2016-05-09 11:41:14 -04:00
openpromfs more trivial ->iterate_shared conversions 2016-05-09 11:41:14 -04:00
orangefs switch xattr_handler->set() to passing dentry and inode separately 2016-05-27 15:39:43 -04:00
overlayfs ovl: warn instead of error if d_type is not supported 2016-07-03 09:39:31 +02:00
proc Merge branch 'stacking-fixes' (vfs stacking fixes from Jann) 2016-06-10 12:10:02 -07:00
pstore
qnx4 more trivial ->iterate_shared conversions 2016-05-09 11:41:14 -04:00
qnx6 more trivial ->iterate_shared conversions 2016-05-09 11:41:14 -04:00
quota
ramfs tmpfs/ramfs: fix VM_MAYSHARE mappings for NOMMU 2016-05-20 17:58:30 -07:00
reiserfs Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 2016-06-19 07:05:14 -10:00
romfs romfs, squashfs: switch to ->iterate_shared() 2016-05-09 11:41:15 -04:00
squashfs romfs, squashfs: switch to ->iterate_shared() 2016-05-09 11:41:15 -04:00
sysfs
sysv
tracefs
ubifs UBIFS: Implement ->migratepage() 2016-06-23 00:29:53 +02:00
udf udf: Use correct partition reference number for metadata 2016-05-19 13:00:35 +02:00
ufs
xfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-05-27 17:14:05 -07:00
aio.c aio: make aio_setup_ring killable 2016-05-23 17:04:14 -07:00
anon_inodes.c
attr.c
bad_inode.c switch ->setxattr() to passing dentry and inode separately 2016-05-27 20:09:16 -04:00
binfmt_aout.c fs: fix binfmt_aout.c build error 2016-05-28 16:34:59 -07:00
binfmt_elf_fdpic.c coredump: fix dumping through pipes 2016-06-07 22:07:09 -04:00
binfmt_elf.c coredump: fix dumping through pipes 2016-06-07 22:07:09 -04:00
binfmt_em86.c
binfmt_flat.c remove lots of IS_ERR_VALUE abuses 2016-05-27 15:26:11 -07:00
binfmt_misc.c
binfmt_script.c
block_dev.c DAX error handling for 4.7 2016-05-26 19:34:26 -07:00
buffer.c mm, page_alloc: avoid looking up the first zone in a zonelist twice 2016-05-19 19:12:14 -07:00
char_dev.c
compat_binfmt_elf.c
compat_ioctl.c
compat.c Fix a number of bugs, most notably a potential stale data exposure 2016-05-24 12:55:26 -07:00
coredump.c coredump: fix dumping through pipes 2016-06-07 22:07:09 -04:00
dax.c dax: fix offset overflow in dax_io 2016-06-27 12:18:44 -07:00
dcache.c fix idiotic braino in d_alloc_parallel() 2016-06-20 10:07:42 -04:00
dcookies.c
direct-io.c direct-io: fix direct write stale data exposure from concurrent buffered read 2016-05-27 14:49:37 -07:00
drop_caches.c
eventfd.c
eventpoll.c fs: poll/select/recvmmsg: use timespec64 for timeout events 2016-05-19 19:12:14 -07:00
exec.c exec: make exec path waiting for mmap_sem killable 2016-05-23 17:04:14 -07:00
fcntl.c
fhandle.c
file_table.c
file.c
filesystems.c
fs_pin.c
fs_struct.c
fs-writeback.c mm,writeback: don't use memory reserves for wb_start_writeback 2016-05-20 17:58:30 -07:00
inode.c
internal.h much milder d_walk() race 2016-06-10 11:32:47 -04:00
ioctl.c
Kconfig dax: Make huge page handling depend of CONFIG_BROKEN 2016-05-19 15:13:17 -06:00
Kconfig.binfmt ELF/MIPS build fix 2016-05-23 17:04:14 -07:00
libfs.c lockless next_positive() 2016-06-20 17:11:29 -04:00
locks.c locks: use file_inode() 2016-07-01 10:24:18 -04:00
Makefile
mbcache.c
mount.h
mpage.c
namei.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-06-07 20:41:36 -07:00
namespace.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-07-01 15:20:11 -07:00
no-block.c
nsfs.c
open.c Merge branch 'work.const-path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-05-17 14:41:03 -07:00
pipe.c
pnode.c
pnode.h
posix_acl.c posix_acl: Add set_posix_acl 2016-06-24 12:11:34 -04:00
proc_namespace.c
read_write.c Merge branch 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-05-18 11:46:23 -07:00
readdir.c restore killability of old mutex_lock_killable(&inode->i_mutex) users 2016-05-26 00:13:25 -04:00
select.c fs: poll/select/recvmmsg: use timespec64 for timeout events 2016-05-19 19:12:14 -07:00
seq_file.c
signalfd.c
splice.c Merge branch 'ovl-fixes' into for-linus 2016-05-11 00:00:29 -04:00
stack.c
stat.c
statfs.c
super.c
sync.c
timerfd.c
userfaultfd.c userfaultfd: don't pin the user memory in userfaultfd_file_create() 2016-05-20 17:58:30 -07:00
utimes.c
xattr.c switch ->setxattr() to passing dentry and inode separately 2016-05-27 20:09:16 -04:00