linux/mm
Darrick J. Wong 1d1d1a7672 mm: only enforce stable page writes if the backing device requires it
Create a helper function to check if a backing device requires stable
page writes and, if so, performs the necessary wait.  Then, make it so
that all points in the memory manager that handle making pages writable
use the helper function.  This should provide stable page write support
to most filesystems, while eliminating unnecessary waiting for devices
that don't require the feature.

Before this patchset, all filesystems would block, regardless of whether
or not it was necessary.  ext3 would wait, but still generate occasional
checksum errors.  The network filesystems were left to do their own
thing, so they'd wait too.

After this patchset, all the disk filesystems except ext3 and btrfs will
wait only if the hardware requires it.  ext3 (if necessary) snapshots
pages instead of blocking, and btrfs provides its own bdi so the mm will
never wait.  Network filesystems haven't been touched, so either they
provide their own stable page guarantees or they don't block at all.
The blocking behavior is back to what it was before 3.0 if you don't
have a disk requiring stable page writes.

Here's the result of using dbench to test latency on ext2:

3.8.0-rc3:
 Operation      Count    AvgLat    MaxLat
 ----------------------------------------
 WriteX        109347     0.028    59.817
 ReadX         347180     0.004     3.391
 Flush          15514    29.828   287.283

Throughput 57.429 MB/sec  4 clients  4 procs  max_latency=287.290 ms

3.8.0-rc3 + patches:
 WriteX        105556     0.029     4.273
 ReadX         335004     0.005     4.112
 Flush          14982    30.540   298.634

Throughput 55.4496 MB/sec  4 clients  4 procs  max_latency=298.650 ms

As you can see, the maximum write latency drops considerably with this
patch enabled.  The other filesystems (ext3/ext4/xfs/btrfs) behave
similarly, but see the cover letter for those results.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Artem Bityutskiy <dedekind1@gmail.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-21 17:22:19 -08:00
..
Kconfig memory-hotplug: document and enable CONFIG_MOVABLE_NODE 2012-12-18 15:02:12 -08:00
Kconfig.debug mm: more intensive memory corruption debugging 2012-01-10 16:30:42 -08:00
Makefile mm: introduce a common interface for balloon pages mobility 2012-12-11 17:22:26 -08:00
backing-dev.c bdi: allow block devices to say that they require stable page writes 2013-02-21 17:22:19 -08:00
balloon_compaction.c mm: introduce a common interface for balloon pages mobility 2012-12-11 17:22:26 -08:00
bootmem.c mm: bootmem: fix free_all_bootmem_core() with odd bitmap alignment 2013-01-11 14:54:55 -08:00
bounce.c bounce: allow use of bounce pool via config option 2012-07-18 16:40:35 -04:00
cleancache.c ->encode_fh() API change 2012-05-29 23:28:33 -04:00
compaction.c mm: compaction: partially revert capture of suitable high-order page 2013-01-11 14:54:56 -08:00
debug-pagealloc.c mm, x86: Remove debug_pagealloc_enabled 2011-12-06 09:24:07 +01:00
dmapool.c dmapool: make DMAPOOL_DEBUG detect corruption of free marker 2012-12-11 17:22:24 -08:00
fadvise.c switch simple cases of fget_light to fdget 2012-09-26 22:20:08 -04:00
failslab.c switch debugfs to umode_t 2012-01-03 22:54:56 -05:00
filemap.c mm: only enforce stable page writes if the backing device requires it 2013-02-21 17:22:19 -08:00
filemap_xip.c mm: move all mmu notifier invocations to be done outside the PT lock 2012-10-09 16:22:58 +09:00
fremap.c remap_file_pages: correctly handle the case of a NULL vm_ops pointer 2012-10-19 13:37:57 -07:00
frontswap.c frontswap: support exclusive gets if tmem backend is capable 2012-09-21 10:38:12 -04:00
highmem.c Some nice cleanups, and even a patch my wife did as a "live" demo for 2012-12-20 08:37:05 -08:00
huge_memory.c thp: avoid dumping huge zero page 2013-02-05 20:38:46 +11:00
hugetlb.c mm/hugetlb: set PTE as huge in hugetlb_change_protection and remove_migration_pte 2013-02-05 20:38:47 +11:00
hugetlb_cgroup.c mm/hugetlb: create hugetlb cgroup file in hugetlb_init 2012-12-18 15:02:15 -08:00
hwpoison-inject.c memcg: rename config variables 2012-07-31 18:42:43 -07:00
init-mm.c atomic: use <linux/atomic.h> 2011-07-26 16:49:47 -07:00
internal.h mm: compaction: partially revert capture of suitable high-order page 2013-01-11 14:54:56 -08:00
interval_tree.c mm: add CONFIG_DEBUG_VM_RB build option 2012-10-09 16:22:42 +09:00
kmemcheck.c kmemcheck: Fix build errors due to missing slab.h 2010-03-30 22:02:32 +09:00
kmemleak-test.c kmemleak: remove memset by using kzalloc 2011-01-27 18:31:51 +00:00
kmemleak.c mm/kmemleak.c: remove obsolete simple_strtoul 2012-12-18 15:02:15 -08:00
ksm.c ksm: make rmap walks more scalable 2012-12-20 07:06:56 -08:00
maccess.c mm: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
madvise.c mm: prepare VM_DONTDUMP for using in drivers 2012-10-09 16:22:18 +09:00
memblock.c mm: memblock: fix wrong memmove size in memblock_merge_regions() 2013-01-11 14:54:54 -08:00
memcontrol.c memcg: fix kmemcg registration for late caches 2013-02-12 14:34:00 -08:00
memory-failure.c Automatic NUMA Balancing V11 2012-12-16 15:18:08 -08:00
memory.c mm: reinstante dropped pmd_trans_splitting() check 2013-01-09 08:36:54 -08:00
memory_hotplug.c mm/memory_hotplug.c: improve comments 2012-12-18 15:02:15 -08:00
mempolicy.c mm: mempolicy: Convert shared_policy mutex to spinlock 2013-01-02 17:32:13 -08:00
mempool.c mempool: add @gfp_mask to mempool_create_node() 2012-06-25 11:53:47 +02:00
migrate.c mm/hugetlb: set PTE as huge in hugetlb_change_protection and remove_migration_pte 2013-02-05 20:38:47 +11:00
mincore.c mm: thp: fix pmd_bad() triggering in code paths holding mmap_sem read mode 2012-03-21 17:54:54 -07:00
mlock.c mm: don't overwrite mm->def_flags in do_mlockall() 2013-02-12 14:34:00 -08:00
mm_init.c mm: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
mmap.c Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-02-19 18:19:48 -08:00
mmu_context.c mm, counters: remove task argument to sync_mm_rss() and __sync_task_rss_stat() 2012-03-21 17:54:59 -07:00
mmu_notifier.c mm/mmu_notifier: allocate mmu_notifier in advance 2012-10-25 14:37:53 -07:00
mmzone.c memcg: fix hotplugged memory zone oops 2012-11-16 14:33:04 -08:00
mprotect.c mm/mprotect.c: coding-style cleanups 2012-12-18 15:02:15 -08:00
mremap.c sched: Move sched.h sysctl bits into separate header 2013-02-07 20:50:54 +01:00
msync.c sanitize vfs_fsync calling conventions 2010-05-21 18:31:21 -04:00
nobootmem.c mm: introduce new field "managed_pages" to struct zone 2012-12-12 17:38:34 -08:00
nommu.c sched: Move sched.h sysctl bits into separate header 2013-02-07 20:50:54 +01:00
oom_kill.c mm, oom: remove redundant sleep in pagefault oom handler 2012-12-12 17:38:34 -08:00
page-writeback.c mm: only enforce stable page writes if the backing device requires it 2013-02-21 17:22:19 -08:00
page_alloc.c Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-02-19 18:19:48 -08:00
page_cgroup.c memcontrol: use N_MEMORY instead N_HIGH_MEMORY 2012-12-12 17:38:32 -08:00
page_io.c mm: add support for direct_IO to highmem pages 2012-07-31 18:42:47 -07:00
page_isolation.c mm: fix zone_watermark_ok_safe() accounting of isolated pages 2013-01-04 16:11:46 -08:00
pagewalk.c thp: change split_huge_page_pmd() interface 2012-12-12 17:38:31 -08:00
percpu-km.c percpu: clear memory allocated with the km allocator 2010-10-02 10:28:42 +03:00
percpu-vm.c mm: fix kernel-doc warnings 2012-06-20 14:39:36 -07:00
percpu.c mm, percpu: Make sure percpu_alloc early parameter has an argument 2012-12-02 06:23:04 -08:00
pgtable-generic.c mm: Only flush the TLB when clearing an accessible pte 2012-12-11 14:28:34 +00:00
process_vm_access.c aio/vfs: cleanup of rw_copy_check_uvector() and compat_rw_copy_check_uvector() 2012-05-31 17:49:32 -07:00
quicklist.c mm: delete various needless include <linux/module.h> 2011-10-31 09:20:11 -04:00
readahead.c switch simple cases of fget_light to fdget 2012-09-26 22:20:08 -04:00
rmap.c Automatic NUMA Balancing V11 2012-12-16 15:18:08 -08:00
shmem.c mempolicy: remove arg from mpol_parse_str, mpol_to_str 2013-01-02 09:27:10 -08:00
slab.c memcg: add comments clarifying aspects of cache attribute propagation 2012-12-18 15:02:15 -08:00
slab.h slab: propagate tunable values 2012-12-18 15:02:14 -08:00
slab_common.c slab: propagate tunable values 2012-12-18 15:02:14 -08:00
slob.c sl[au]b: always get the cache from its page in kmem_cache_free() 2012-12-18 15:02:14 -08:00
slub.c slub: drop mutex before deleting sysfs entry 2012-12-18 15:02:15 -08:00
sparse-vmemmap.c mm: delete various needless include <linux/module.h> 2011-10-31 09:20:11 -04:00
sparse.c memory-hotplug, mm/sparse.c: clear the memory to store struct page 2012-12-11 17:22:23 -08:00
swap.c mm: remove vma arg from page_evictable 2012-10-09 16:22:55 +09:00
swap_state.c mm: add support for a filesystem to activate swap files and use direct_IO for writing swap pages 2012-07-31 18:42:47 -07:00
swapfile.c mm, oom: fix race when specifying a thread as the oom origin 2012-12-11 17:22:27 -08:00
truncate.c mm: drop vmtruncate 2012-12-20 18:46:29 -05:00
util.c Merge branch 'master' into for-next 2012-10-28 19:29:19 +01:00
vmalloc.c mm: use IS_ENABLED(CONFIG_NUMA) instead of NUMA_BUILD 2012-12-11 17:22:22 -08:00
vmscan.c MM: vmscan: remove __devinit attribute. 2013-01-03 15:57:13 -08:00
vmstat.c Automatic NUMA Balancing V11 2012-12-16 15:18:08 -08:00