linux/fs
Dave Chinner 23647e9878 xfs: log vector rounding leaks log space
commit 110dc24ad2 upstream.

The addition of direct formatting of log items into the CIL
linear buffer added alignment restrictions that the start of each
vector needed to be 64 bit aligned. Hence padding was added in
xlog_finish_iovec() to round up the vector length to ensure the next
vector started with the correct alignment.

This adds a small number of bytes to the size of
the linear buffer that is otherwise unused. The issue is that we
then use the linear buffer size to determine the log space used by
the log item, and this includes the unused space. Hence when we
account for space used by the log item, it's more than is actually
written into the iclogs, and hence we slowly leak this space.

This results on log hangs when reserving space, with threads getting
stuck with these stack traces:

Call Trace:
[<ffffffff81d15989>] schedule+0x29/0x70
[<ffffffff8150d3a2>] xlog_grant_head_wait+0xa2/0x1a0
[<ffffffff8150d55d>] xlog_grant_head_check+0xbd/0x140
[<ffffffff8150ee33>] xfs_log_reserve+0x103/0x220
[<ffffffff814b7f05>] xfs_trans_reserve+0x2f5/0x310
.....

The 4 bytes is significant. Brain Foster did all the hard work in
tracking down a reproducable leak to inode chunk allocation (it went
away with the ikeep mount option). His rough numbers were that
creating 50,000 inodes leaked 11 log blocks. This turns out to be
roughly 800 inode chunks or 1600 inode cluster buffers. That
works out at roughly 4 bytes per cluster buffer logged, and at that
I started looking for a 4 byte leak in the buffer logging code.

What I found was that a struct xfs_buf_log_format structure for an
inode cluster buffer is 28 bytes in length. This gets rounded up to
32 bytes, but the vector length remains 28 bytes. Hence the CIL
ticket reservation is decremented by 32 bytes (via lv->lv_buf_len)
for that vector rather than 28 bytes which are written into the log.

The fix for this problem is to separately track the bytes used by
the log vectors in the item and use that instead of the buffer
length when accounting for the log space that will be used by the
formatted log item.

Again, thanks to Brian Foster for doing all the hard work and long
hours to isolate this leak and make finding the bug relatively
simple.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Cc: Bill <billstuff2001@sbcglobal.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-08-14 09:38:26 +08:00
..
9p Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-01-28 08:38:04 -08:00
adfs adfs: delayed freeing of sbi 2013-10-24 23:43:27 -04:00
affs fs/affs/super.c: bugfix / double free 2014-06-07 10:28:16 -07:00
afs afs: proc cells and rootcell are writeable 2014-02-01 10:59:39 -08:00
autofs4 autofs: fix lockref lookup 2014-06-07 10:28:20 -07:00
befs befs: iget_locked() doesn't return an ERR_PTR 2014-01-25 03:14:38 -05:00
bfs
btrfs btrfs: allocate raid type kobjects dynamically 2014-06-30 20:12:02 -07:00
cachefiles Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-11-13 15:34:18 +09:00
ceph ceph: clear directory's completeness when creating file 2014-06-07 10:28:20 -07:00
cifs CIFS: fix mount failure with broken pathnames when smb3 mount with mapchars option 2014-07-09 11:18:27 -07:00
coda coda_revalidate_inode(): switch to passing inode... 2013-11-09 00:16:21 -05:00
configfs configfs: fix race between dentry put and lookup 2013-11-21 16:42:27 -08:00
cramfs cramfs: take headers to fs/cramfs 2014-01-25 03:13:02 -05:00
debugfs debugfs: use list_next_entry() in debugfs_remove_recursive() 2013-11-13 12:09:24 +09:00
devpts devpts: plug the memory leak in kill_sb 2013-11-13 12:09:36 +09:00
dlm Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-01-25 11:17:34 -08:00
ecryptfs ecryptfs: fix failure handling in ->readlink() 2014-01-25 03:13:00 -05:00
efivarfs consolidate simple ->d_delete() instances 2013-11-15 22:04:17 -05:00
efs efs: get rid of ->put_super() 2014-01-25 03:13:02 -05:00
exofs exofs: Print less in r4w 2014-01-23 18:54:14 +02:00
exportfs exportfs: fix quadratic behavior in filehandle lookup 2013-11-09 00:16:38 -05:00
ext2 ext2/3/4: use generic posix ACL infrastructure 2014-01-25 23:58:19 -05:00
ext3 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-01-28 08:38:04 -08:00
ext4 ext4: fix a potential deadlock in __ext4_es_shrink() 2014-07-17 16:21:05 -07:00
f2fs Merge branch 'for-3.14/core' of git://git.kernel.dk/linux-block 2014-01-30 11:19:05 -08:00
fat fat: rcu-delay unloading nls and freeing sbi 2013-10-24 23:43:28 -04:00
freevxfs
fscache FS-Cache: Handle removal of unadded object to the fscache_object_list rb tree 2014-02-17 13:47:35 -08:00
fuse fuse: ignore entry-timeout on LOOKUP_REVAL 2014-07-28 08:05:57 -07:00
gfs2 Merge branch 'for-3.14/core' of git://git.kernel.dk/linux-block 2014-01-30 11:19:05 -08:00
hfs fs/hfs/btree.h: remove duplicate defines 2013-11-13 12:09:32 +09:00
hfsplus hfsplus: add HFSX subfolder count support 2014-03-10 17:26:21 -07:00
hostfs um: hostfs: make functions static 2014-01-26 11:51:09 +01:00
hpfs hpfs: optimize quad buffer loading 2014-02-02 16:24:07 -08:00
hppfs
hugetlbfs
isofs isofs: don't pass dentry to isofs_hash{i,}_common() 2013-10-24 23:34:59 -04:00
jbd jbd: Revise KERN_EMERG error messages 2013-12-04 12:27:46 +01:00
jbd2 ext4: disable synchronous transaction batching if max_batch_time==0 2014-07-17 16:21:05 -07:00
jffs2 jffs2: remove from wait queue after schedule() 2014-04-26 17:19:06 -07:00
jfs jfs: set i_ctime when setting ACL 2014-02-13 15:56:05 -06:00
kernfs kernfs: add back missing error check in kernfs_fop_mmap() 2014-06-07 10:28:08 -07:00
lockd lockd: ensure we tear down any live sockets when socket creation fails during lockd_up 2014-05-13 13:32:56 +02:00
logfs Merge branch 'for-3.14/core' of git://git.kernel.dk/linux-block 2014-01-30 11:19:05 -08:00
minix
ncpfs ncpfs: rcu-delay unload_nls() and freeing ncp_server 2013-10-24 23:43:28 -04:00
nfs nfs: only show Posix ACLs in listxattr if actually present 2014-07-31 12:52:54 -07:00
nfs_common
nfsd nfsd: fix rare symlink decoding bug 2014-07-09 11:18:27 -07:00
nilfs2 Merge branch 'for-3.14/core' of git://git.kernel.dk/linux-block 2014-01-30 11:19:05 -08:00
nls nls: have register_nls() set ->owner 2014-01-25 03:14:05 -05:00
notify fsnotify: Allocate overflow events with proper type 2014-02-25 11:18:06 +01:00
ntfs fix O_SYNC|O_APPEND syncing the wrong range on write() 2014-02-09 15:18:09 -05:00
ocfs2 ocfs2: fix panic on kfree(xattr->name) 2014-05-06 07:59:36 -07:00
omfs
openpromfs
proc mm: add !pte_present() check on existing hugetlb_entry callbacks 2014-06-11 11:54:13 -07:00
pstore pstore: Don't allow high traffic options on fragile devices 2013-12-20 13:12:01 -08:00
qnx4 qnx4: clean qnx4_fill_super() up 2014-01-25 03:13:03 -05:00
qnx6
quota quota: missing lock in dqcache_shrink_scan() 2014-07-28 08:05:58 -07:00
ramfs fs/ramfs: move ramfs_aops to inode.c 2014-01-23 16:36:58 -08:00
reiserfs reiserfs: call truncate_setsize under tailpack mutex 2014-07-06 18:57:29 -07:00
romfs romfs: fix returm err while getting inode in fill_super 2014-01-23 16:37:04 -08:00
squashfs Squashfs: fix failure to unlock pages on decompress error 2013-11-24 01:02:50 +00:00
sysfs sysfs: make sure read buffer is zeroed 2014-06-07 10:28:24 -07:00
sysv
ubifs UBIFS: Remove incorrect assertion in shrink_tnc() 2014-07-06 18:57:26 -07:00
udf udf: Fix data corruption on file type conversion 2014-02-20 21:56:00 +01:00
ufs
xfs xfs: log vector rounding leaks log space 2014-08-14 09:38:26 +08:00
Kconfig fs: remove generic_acl 2014-01-26 08:26:40 -05:00
Kconfig.binfmt
Makefile Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-01-28 08:38:04 -08:00
aio.c aio: protect reqs_available updates from changes in interrupt handlers 2014-07-28 08:06:03 -07:00
anon_inodes.c vfs: Allocate anon_inode_inode in anon_inode_init() 2014-03-27 09:52:54 -07:00
attr.c fs,userns: Change inode_capable to capable_wrt_inode_uidgid 2014-06-16 13:40:32 -07:00
bad_inode.c
binfmt_aout.c dump_skip(): dump_seek() replacement taking coredump_params 2013-11-09 00:16:26 -05:00
binfmt_elf.c fs: binfmt_elf: remove unused defines INTERPRETER_NONE and INTERPRETER_ELF 2014-01-23 16:36:58 -08:00
binfmt_elf_fdpic.c elf{,_fdpic} coredump: get rid of pointless if (siginfo->si_signo) 2013-11-09 00:16:30 -05:00
binfmt_em86.c file->f_op is never NULL... 2013-10-24 23:34:54 -04:00
binfmt_flat.c
binfmt_misc.c
binfmt_script.c
binfmt_som.c
bio-integrity.c bio-integrity: Drop bio_integrity_verify BUG_ON in post bip->bip_iter world 2014-02-21 15:56:36 -08:00
bio.c block: Fix cloning of discard/write same bios 2014-02-11 08:40:45 -07:00
block_dev.c
buffer.c mm: __set_page_dirty uses spin_lock_irqsave instead of spin_lock_irq 2014-02-06 13:48:51 -08:00
char_dev.c Merge branch 'for-3.13/core' of git://git.kernel.dk/linux-block 2013-11-14 12:08:14 +09:00
compat.c
compat_binfmt_elf.c
compat_ioctl.c fs/compat_ioctl.c: fix an underflow issue (harmless) 2014-01-21 16:19:42 -08:00
coredump.c coredump: fix the setting of PF_DUMPCORE 2014-07-31 12:52:56 -07:00
dcache.c fix races between __d_instantiate() and checks of dentry flags 2014-06-07 10:28:20 -07:00
dcookies.c fs/compat: fix lookup_dcookie() parameter handling 2014-01-29 16:22:40 -08:00
direct-io.c block: Abstract out bvec iterator 2013-11-23 22:33:47 -08:00
drop_caches.c
eventfd.c eventfd_ctx_fdget(): use fdget() instead of fget() 2014-01-25 03:13:04 -05:00
eventpoll.c epoll: fix use-after-free in eventpoll_release_file 2014-06-30 20:12:02 -07:00
exec.c metag: Reduce maximum stack size to 256MB 2014-06-07 10:28:23 -07:00
fcntl.c file->f_op is never NULL... 2013-10-24 23:34:54 -04:00
fhandle.c
file.c vfs: Don't let __fdget_pos() get FMODE_PATH files 2014-03-23 00:03:12 -04:00
file_table.c don't bother with {get,put}_write_access() on non-regular files 2014-05-31 13:20:29 -07:00
filesystems.c
fs-writeback.c bdi: avoid oops on device removal 2014-04-26 17:19:05 -07:00
fs_struct.c seqcount: Add lockdep functionality to seqcount/seqlock structures 2013-11-06 12:40:26 +01:00
inode.c fs,userns: Change inode_capable to capable_wrt_inode_uidgid 2014-06-16 13:40:32 -07:00
internal.h get rid of s_files and files_lock 2013-11-09 00:16:20 -05:00
ioctl.c file->f_op is never NULL... 2013-10-24 23:34:54 -04:00
ioprio.c
libfs.c consolidate simple ->d_delete() instances 2013-11-15 22:04:17 -05:00
locks.c locks: allow __break_lease to sleep even when break_time is 0 2014-05-13 13:32:53 +02:00
mbcache.c
mount.h switch mnt_hash to hlist 2014-03-30 19:18:51 -04:00
mpage.c block: Abstract out bvec iterator 2013-11-23 22:33:47 -08:00
namei.c fs: umount on symlink leaks mnt count 2014-07-31 12:52:56 -07:00
namespace.c smarter propagate_mnt() 2014-05-06 07:59:36 -07:00
no-block.c
open.c don't bother with {get,put}_write_access() on non-regular files 2014-05-31 13:20:29 -07:00
pipe.c fs/pipe.c: skip file_update_time on frozen fs 2014-01-23 16:37:00 -08:00
pnode.c smarter propagate_mnt() 2014-05-06 07:59:36 -07:00
pnode.h smarter propagate_mnt() 2014-05-06 07:59:36 -07:00
posix_acl.c posix_acl: handle NULL ACL in posix_acl_equiv_mode 2014-06-07 10:28:16 -07:00
proc_namespace.c fs/proc_namespace.c: simplify testing nsp and nsp->mnt_ns 2014-01-23 16:37:02 -08:00
read_write.c vfs: atomic f_pos access in llseek() 2014-03-23 00:03:12 -04:00
readdir.c file->f_op is never NULL... 2013-10-24 23:34:54 -04:00
select.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-11-13 15:34:18 +09:00
seq_file.c seq_file: always clear m->count when we free m->buf 2013-11-18 19:07:53 -08:00
signalfd.c
splice.c fuse: fix pipe_buf_operations 2014-01-22 19:36:57 +01:00
stack.c
stat.c vfs: split out vfs_getattr_nosec 2013-11-09 00:16:31 -05:00
statfs.c
super.c fs: Don't return 0 from get_anon_bdev 2014-05-31 13:20:31 -07:00
sync.c Revert "writeback: do not sync data dirtied after sync start" 2014-02-22 02:02:28 +01:00
timerfd.c
utimes.c locks: break delegations on any attribute modification 2013-11-09 00:16:44 -05:00
xattr.c