linux

Dave Chinner 80168676eb xfs: force background CIL push under sustained load

I have been seeing occasional pauses in transaction throughput up to 30s long under heavy parallel workloads. The only notable thing was that the xfsaild was trying to be active during the pauses, but making no progress. It was running exactly 20 times a second (on the 50ms no-progress backoff), and the number of pushbuf events was constant across this time as well. IOWs, the xfsaild appeared to be stuck on buffers that it could not push out.

Further investigation indicated that it was trying to push out inode buffers that were pinned and/or locked. The xfsbufd was also getting woken at the same frequency (by the xfsaild, no doubt) to push out delayed write buffers. The xfsbufd was not making any progress because all the buffers in the delwri queue were pinned. This scan-and-make-no-progress dance went on in the trace for some seconds, before the xfssyncd came along and issued a log force, and then things started going again.

However, I noticed something strange about the log force - there were way too many IOs issued. 516 log buffers were written, to be exact. That added up to 129MB of log IO, which got me very interested because it's almost exactly 25% of the size of the log. The delayed logging code is supposed to aggregate the minimum of 25% of the log or 8MB worth of changes before flushing. That's what really puzzled me - why did a log force write 129MB instead of only 8MB?

Essentially what has happened is that no CIL pushes had occurred since the previous tail push which cleared out 25% of the log space. That caused all the new transactions to block because there wasn't log space for them, but they kick the xfsaild to push the tail. However, the xfsaild was not making progress because there were buffers it could not lock and flush, and the xfsbufd could not flush them because they were pinned. As a result, both the xfsaild and the xfsbufd could not move the tail of the log forward without the CIL first committing.

The cause of the problem was that the background CIL push, which should happen when 8MB of aggregated changes have been committed, is being held off by the concurrent transaction commit load. The background push does a down_write_trylock() which will fail if there is a concurrent transaction commit holding the push lock in read mode. With 8 CPUs all doing transactions as fast as they can, there were enough concurrent transaction commits to hold off the background push until tail-pushing could no longer free log space, and the halt would occur.

It should be noted that there is no reason why it would halt at 25% of log space used by a single CIL checkpoint. This bug could definitely violate the "no transaction should be larger than half the log" requirement and hence result in corruption if the system crashed under heavy load. This sort of bug is exactly the reason why delayed logging was tagged as experimental....

The fix is to start blocking background pushes once the threshold has been exceeded. Rework the threshold calculations to keep the amount of log space a CIL checkpoint can use to below that of the AIL push threshold to avoid the problem completely.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2010-09-29 07:51:03 -05:00
..
linux-2.6	xfs: log IO completion workqueue is a high priority queue	2010-09-10 10:16:54 -05:00
quota	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6	2010-08-07 12:57:07 -07:00
support	xfs: drop dmapi hooks	2010-07-26 13:16:33 -05:00
Kconfig	…
Makefile	xfs: simplify log item descriptor tracking	2010-07-26 13:16:34 -05:00
xfs.h	…
xfs_acl.h	xfs: constify xattr_handler	2010-05-21 18:31:19 -04:00
xfs_ag.h	xfs: fix access to upper inodes without inode64	2010-05-28 15:19:56 -05:00
xfs_alloc.c	xfs: fix gcc 4.6 set but not read and unused statement warnings	2010-07-26 13:16:51 -05:00
xfs_alloc.h	xfs: do not use emums for flags used in tracing	2010-07-26 13:16:43 -05:00
xfs_alloc_btree.c	xfs: remove unneeded #include statements	2010-07-26 13:16:33 -05:00
xfs_alloc_btree.h	…
xfs_arch.h	…
xfs_attr.c	xfs: remove unused delta tracking code in xfs_bmapi	2010-07-26 13:16:39 -05:00
xfs_attr.h	xfs: convert attr to use unsigned names	2010-01-20 10:47:48 +11:00
xfs_attr_leaf.c	xfs: remove unused delta tracking code in xfs_bmapi	2010-07-26 13:16:39 -05:00
xfs_attr_leaf.h	…
xfs_attr_sf.h	xfs: convert attr to use unsigned names	2010-01-20 10:47:48 +11:00
xfs_bit.c	…
xfs_bit.h	…
xfs_bmap.c	xfs: Make fiemap work with sparse files	2010-09-03 09:02:11 -05:00
xfs_bmap.h	xfs: remove unused delta tracking code in xfs_bmapi	2010-07-26 13:16:39 -05:00
xfs_bmap_btree.c	xfs: remove unneeded #include statements	2010-07-26 13:16:33 -05:00
xfs_bmap_btree.h	xfs: make several more functions static	2010-01-15 15:31:38 -06:00
xfs_btree.c	xfs: remove unneeded #include statements	2010-07-26 13:16:33 -05:00
xfs_btree.h	…
xfs_btree_trace.c	…
xfs_btree_trace.h	…
xfs_buf_item.c	xfs: kill the b_strat callback in xfs_buf	2010-07-26 13:16:52 -05:00
xfs_buf_item.h	xfs: give li_cb callbacks the correct prototype	2010-07-26 13:16:35 -05:00
xfs_da_btree.c	xfs: fix gcc 4.6 set but not read and unused statement warnings	2010-07-26 13:16:51 -05:00
xfs_da_btree.h	xfs: convert dirnameops to unsigned char names	2010-01-20 10:47:17 +11:00
xfs_dfrag.c	xfs: simplify inode to transaction joining	2010-07-26 13:16:36 -05:00
xfs_dfrag.h	xfs: clean up inconsistent variable naming in xfs_swap_extent	2010-01-15 15:31:23 -06:00
xfs_dinode.h	…
xfs_dir2.c	xfs: split xfs_itrace_entry	2010-07-26 13:16:44 -05:00
xfs_dir2.h	xfs: make xfs_dir_cilookup_result use unsigned char	2010-01-20 10:47:25 +11:00
xfs_dir2_block.c	xfs: fix gcc 4.6 set but not read and unused statement warnings	2010-07-26 13:16:51 -05:00
xfs_dir2_block.h	…
xfs_dir2_data.c	xfs: remove unneeded #include statements	2010-07-26 13:16:33 -05:00
xfs_dir2_data.h	…
xfs_dir2_leaf.c	xfs: remove unused delta tracking code in xfs_bmapi	2010-07-26 13:16:39 -05:00
xfs_dir2_leaf.h	…
xfs_dir2_node.c	xfs: remove unneeded #include statements	2010-07-26 13:16:33 -05:00
xfs_dir2_node.h	xfs: make several more functions static	2010-01-15 15:31:38 -06:00
xfs_dir2_sf.c	xfs: remove unneeded #include statements	2010-07-26 13:16:33 -05:00
xfs_dir2_sf.h	…
xfs_error.c	xfs: remove unneeded #include statements	2010-07-26 13:16:33 -05:00
xfs_error.h	xfs: add const qualifiers to xfs error function args	2010-05-19 09:58:11 -05:00
xfs_extfree_item.c	xfs: fix the xfs_log_iovec i_addr type	2010-07-26 13:16:36 -05:00
xfs_extfree_item.h	…
xfs_filestream.c	xfs: clean up filestreams helpers	2010-07-26 13:16:51 -05:00
xfs_filestream.h	xfs: clean up filestreams helpers	2010-07-26 13:16:51 -05:00
xfs_fs.h	xfs: Make fiemap work with sparse files	2010-09-03 09:02:11 -05:00
xfs_fsops.c	xfs: dummy transactions should not dirty VFS state	2010-08-24 11:46:31 +10:00
xfs_fsops.h	xfs: dummy transactions should not dirty VFS state	2010-08-24 11:46:31 +10:00
xfs_ialloc.c	xfs: fix untrusted inode number lookup	2010-08-24 11:42:30 +10:00
xfs_ialloc.h	…
xfs_ialloc_btree.c	xfs: remove unneeded #include statements	2010-07-26 13:16:33 -05:00
xfs_ialloc_btree.h	…
xfs_iget.c	xfs: fix gcc 4.6 set but not read and unused statement warnings	2010-07-26 13:16:51 -05:00
xfs_inode.c	xfs: ensure we mark all inodes in a freed cluster XFS_ISTALE	2010-08-24 11:42:41 +10:00
xfs_inode.h	xfs: simplify and remove xfs_ireclaim	2010-07-26 13:16:48 -05:00
xfs_inode_item.c	xfs: fix big endian build	2010-07-26 16:07:38 -05:00
xfs_inode_item.h	xfs: simplify inode to transaction joining	2010-07-26 13:16:36 -05:00
xfs_inum.h	…
xfs_iomap.c	xfs: small cleanups for xfs_iomap / __xfs_get_blocks	2010-07-26 13:16:42 -05:00
xfs_iomap.h	xfs: do not use emums for flags used in tracing	2010-07-26 13:16:43 -05:00
xfs_itable.c	xfs: remove xfs_iput	2010-07-26 13:16:44 -05:00
xfs_itable.h	xfs: remove block number from inode lookup code	2010-06-24 11:35:17 +10:00
xfs_log.c	xfs: Reduce log force overhead for delayed logging	2010-08-24 11:40:03 +10:00
xfs_log.h	xfs: remove the unused XFS_LOG_SLEEP and XFS_LOG_NOSLEEP flags	2010-07-26 13:16:38 -05:00
xfs_log_cil.c	xfs: force background CIL push under sustained load	2010-09-29 07:51:03 -05:00
xfs_log_priv.h	xfs: force background CIL push under sustained load	2010-09-29 07:51:03 -05:00
xfs_log_recover.c	xfs: fix the xfs_log_iovec i_addr type	2010-07-26 13:16:36 -05:00
xfs_log_recover.h	xfs: Clean up XFS_BLI_* flag namespace	2010-05-24 10:33:39 -05:00
xfs_mount.c	xfs: remove unneeded #include statements	2010-07-26 13:16:33 -05:00
xfs_mount.h	xfs: remove obsolete osyncisosync mount option	2010-07-26 13:16:51 -05:00
xfs_mru_cache.c	xfs: Kill filestreams cache flush	2010-01-15 15:34:22 -06:00
xfs_mru_cache.h	xfs: Kill filestreams cache flush	2010-01-15 15:34:22 -06:00
xfs_quota.h	xfs: removed unused XFS_QMOPT_ flags	2010-05-19 09:58:15 -05:00
xfs_refcache.h	…
xfs_rename.c	xfs: split xfs_itrace_entry	2010-07-26 13:16:44 -05:00
xfs_rtalloc.c	xfs: remove unused delta tracking code in xfs_bmapi	2010-07-26 13:16:39 -05:00
xfs_rtalloc.h	xfs: be more explicit if RT mount fails due to config	2010-05-28 14:58:24 -05:00
xfs_rw.c	xfs: remove unneeded #include statements	2010-07-26 13:16:33 -05:00
xfs_rw.h	xfs: only clear the suid bit once in xfs_write	2010-02-12 13:43:57 -06:00
xfs_sb.h	…
xfs_trans.c	xfs: unlock items before allowing the CIL to commit	2010-08-24 11:42:52 +10:00
xfs_trans.h	xfs: remove the unused XFS_TRANS_NOSLEEP/XFS_TRANS_WAIT flags	2010-07-26 13:16:38 -05:00
xfs_trans_ail.c	xfs: drop dmapi hooks	2010-07-26 13:16:33 -05:00
xfs_trans_buf.c	xfs: give li_cb callbacks the correct prototype	2010-07-26 13:16:35 -05:00
xfs_trans_extfree.c	xfs: simplify log item descriptor tracking	2010-07-26 13:16:34 -05:00
xfs_trans_inode.c	xfs: simplify inode to transaction joining	2010-07-26 13:16:36 -05:00
xfs_trans_priv.h	xfs: unlock items before allowing the CIL to commit	2010-08-24 11:42:52 +10:00
xfs_trans_space.h	…
xfs_types.h	xfs: make the log ticket ID available outside the log infrastructure	2010-05-24 10:33:52 -05:00
xfs_utils.c	xfs: simplify xfs_truncate_file	2010-07-26 13:16:52 -05:00
xfs_utils.h	xfs: simplify xfs_truncate_file	2010-07-26 13:16:52 -05:00
xfs_vnodeops.c	xfs: prevent 32bit overflow in space reservation	2010-09-03 12:19:33 +10:00
xfs_vnodeops.h	xfs: kill xfs_lrw.h	2010-03-01 16:35:44 -06:00