linux/Documentation/filesystems
Minchan Kim bda807d444 mm: migrate: support non-lru movable page migration
We have allowed migration for only LRU pages until now and it was enough
to make high-order pages.  But recently, embedded system(e.g., webOS,
android) uses lots of non-movable pages(e.g., zram, GPU memory) so we
have seen several reports about troubles of small high-order allocation.
For fixing the problem, there were several efforts (e,g,.  enhance
compaction algorithm, SLUB fallback to 0-order page, reserved memory,
vmalloc and so on) but if there are lots of non-movable pages in system,
their solutions are void in the long run.

So, this patch is to support facility to change non-movable pages with
movable.  For the feature, this patch introduces functions related to
migration to address_space_operations as well as some page flags.

If a driver want to make own pages movable, it should define three
functions which are function pointers of struct
address_space_operations.

1. bool (*isolate_page) (struct page *page, isolate_mode_t mode);

What VM expects on isolate_page function of driver is to return *true*
if driver isolates page successfully.  On returing true, VM marks the
page as PG_isolated so concurrent isolation in several CPUs skip the
page for isolation.  If a driver cannot isolate the page, it should
return *false*.

Once page is successfully isolated, VM uses page.lru fields so driver
shouldn't expect to preserve values in that fields.

2. int (*migratepage) (struct address_space *mapping,
		struct page *newpage, struct page *oldpage, enum migrate_mode);

After isolation, VM calls migratepage of driver with isolated page.  The
function of migratepage is to move content of the old page to new page
and set up fields of struct page newpage.  Keep in mind that you should
indicate to the VM the oldpage is no longer movable via
__ClearPageMovable() under page_lock if you migrated the oldpage
successfully and returns 0.  If driver cannot migrate the page at the
moment, driver can return -EAGAIN.  On -EAGAIN, VM will retry page
migration in a short time because VM interprets -EAGAIN as "temporal
migration failure".  On returning any error except -EAGAIN, VM will give
up the page migration without retrying in this time.

Driver shouldn't touch page.lru field VM using in the functions.

3. void (*putback_page)(struct page *);

If migration fails on isolated page, VM should return the isolated page
to the driver so VM calls driver's putback_page with migration failed
page.  In this function, driver should put the isolated page back to the
own data structure.

4. non-lru movable page flags

There are two page flags for supporting non-lru movable page.

* PG_movable

Driver should use the below function to make page movable under
page_lock.

	void __SetPageMovable(struct page *page, struct address_space *mapping)

It needs argument of address_space for registering migration family
functions which will be called by VM.  Exactly speaking, PG_movable is
not a real flag of struct page.  Rather than, VM reuses page->mapping's
lower bits to represent it.

	#define PAGE_MAPPING_MOVABLE 0x2
	page->mapping = page->mapping | PAGE_MAPPING_MOVABLE;

so driver shouldn't access page->mapping directly.  Instead, driver
should use page_mapping which mask off the low two bits of page->mapping
so it can get right struct address_space.

For testing of non-lru movable page, VM supports __PageMovable function.
However, it doesn't guarantee to identify non-lru movable page because
page->mapping field is unified with other variables in struct page.  As
well, if driver releases the page after isolation by VM, page->mapping
doesn't have stable value although it has PAGE_MAPPING_MOVABLE (Look at
__ClearPageMovable).  But __PageMovable is cheap to catch whether page
is LRU or non-lru movable once the page has been isolated.  Because LRU
pages never can have PAGE_MAPPING_MOVABLE in page->mapping.  It is also
good for just peeking to test non-lru movable pages before more
expensive checking with lock_page in pfn scanning to select victim.

For guaranteeing non-lru movable page, VM provides PageMovable function.
Unlike __PageMovable, PageMovable functions validates page->mapping and
mapping->a_ops->isolate_page under lock_page.  The lock_page prevents
sudden destroying of page->mapping.

Driver using __SetPageMovable should clear the flag via
__ClearMovablePage under page_lock before the releasing the page.

* PG_isolated

To prevent concurrent isolation among several CPUs, VM marks isolated
page as PG_isolated under lock_page.  So if a CPU encounters PG_isolated
non-lru movable page, it can skip it.  Driver doesn't need to manipulate
the flag because VM will set/clear it automatically.  Keep in mind that
if driver sees PG_isolated page, it means the page have been isolated by
VM so it shouldn't touch page.lru field.  PG_isolated is alias with
PG_reclaim flag so driver shouldn't use the flag for own purpose.

[opensource.ganesh@gmail.com: mm/compaction: remove local variable is_lru]
  Link: http://lkml.kernel.org/r/20160618014841.GA7422@leo-test
Link: http://lkml.kernel.org/r/1464736881-24886-3-git-send-email-minchan@kernel.org
Signed-off-by: Gioh Kim <gi-oh.kim@profitbricks.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Ganesh Mahendran <opensource.ganesh@gmail.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: John Einar Reitan <john.reitan@foss.arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-26 16:19:19 -07:00
..
caching FS-Cache: Count the number of initialised operations 2015-04-02 14:28:53 +01:00
cifs Documentation: fix common spelling mistakes 2016-04-28 07:51:59 -06:00
configfs configfs: switch ->default groups to a linked list 2016-03-06 16:11:24 +01:00
nfs Various bugfixes, a RDMA update from Chuck Lever, and support for a new 2016-03-24 19:50:32 -07:00
pohmelfs Documentation: fix common spelling mistakes 2016-04-28 07:51:59 -06:00
.gitignore Documentation: update .gitignore files 2014-09-26 11:02:59 +02:00
00-INDEX dax: replace XIP documentation with DAX documentation 2015-02-16 17:56:03 -08:00
9p.txt 9p: update documentation 2014-01-24 10:55:21 -06:00
Locking mm: migrate: support non-lru movable page migration 2016-07-26 16:19:19 -07:00
Makefile configfs: remove old API 2015-10-13 22:17:57 -07:00
adfs.txt
affs.txt affs: add mount option to avoid filename truncates 2014-04-07 16:36:08 -07:00
afs.txt
autofs4-mount-control.txt doc: fix double words 2014-03-21 13:16:58 +01:00
autofs4.txt autofs: the documentation I wanted to read 2014-10-14 02:18:17 +02:00
automount-support.txt Documentation: remove outdated information from automount-support.txt 2015-05-15 01:10:38 -04:00
befs.txt
bfs.txt
btrfs.txt Documentation: btrfs: remove usage specific information 2016-03-11 17:02:09 +01:00
ceph.txt
coda.txt
cramfs.txt mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usage 2016-04-04 10:41:08 -07:00
dax.txt dax: some small updates to dax.txt documentation 2016-07-26 16:19:19 -07:00
debugfs.txt debugfs: Pass bool pointer to debugfs_create_bool() 2015-10-04 11:36:07 +01:00
devpts.txt devpts: Make each mount of devpts an independent filesystem. 2016-06-05 10:36:01 -07:00
directory-locking update D/f/directory-locking 2016-05-26 00:04:18 -04:00
dlmfs.txt ocfs2: update web page + git tree in documentation 2015-02-28 09:57:50 -08:00
dnotify.txt
dnotify_test.c
ecryptfs.txt
efivarfs.txt efi: Make efivarfs entries immutable by default 2016-02-10 16:25:52 +00:00
exofs.txt
ext2.txt fs: Remove ext3 filesystem driver 2015-07-23 20:59:40 +02:00
ext3.txt fs: Remove ext3 filesystem driver 2015-07-23 20:59:40 +02:00
ext4.txt ext4: add DAX functionality 2015-02-16 17:56:04 -08:00
f2fs.txt f2fs: introduce new option for controlling data flush 2015-12-16 09:25:48 -08:00
fiemap.txt fsioctl.c: make generic_block_fiemap() signal-tolerant 2015-02-10 14:30:30 -08:00
files.txt
fuse.txt
gfs2-glocks.txt gfs2: Remove gl_spin define 2015-10-29 12:57:48 -05:00
gfs2-uevents.txt
gfs2.txt
hfs.txt
hfsplus.txt Documentation: update URL to hfsplus Technote 1150 2014-02-20 14:48:51 +01:00
hpfs.txt
inotify.txt inotify: update documentation to reflect code changes 2015-02-10 14:30:28 -08:00
isofs.txt
jfs.txt
locks.txt
logfs.txt
mandatory-locking.txt
ncpfs.txt
nilfs2.txt nilfs2: clarify permission to replicate the design 2016-05-23 17:04:14 -07:00
ntfs.txt NTFS: Remove changelog from Documentation/filesystems/ntfs.txt. 2014-10-16 12:43:57 +01:00
ocfs2-online-filecheck.txt ocfs2: add feature document for online file check 2016-03-22 15:36:02 -07:00
ocfs2.txt ocfs2: update web page + git tree in documentation 2015-02-28 09:57:50 -08:00
omfs.txt
orangefs.txt Orangefs: update orangefs.txt 2016-02-26 14:39:08 -05:00
overlayfs.txt ovl: update documentation 2016-05-27 08:55:26 +02:00
path-lookup.md Documentation: add new description of path-name lookup. 2015-11-02 18:18:25 -07:00
path-lookup.txt Documentation: add new description of path-name lookup. 2015-11-02 18:18:25 -07:00
porting switch ->setxattr() to passing dentry and inode separately 2016-05-27 20:09:16 -04:00
proc.txt irq/Documentation: Correct result of echnoing 5 to smp_affinity 2016-07-10 16:55:04 +02:00
qnx6.txt Documentation: fix common spelling mistakes 2016-04-28 07:51:59 -06:00
quota.txt quota: Update documentation 2015-05-18 11:23:07 +02:00
ramfs-rootfs-initramfs.txt initmpfs: use initramfs if rootfstype= or root= specified 2013-09-11 15:59:38 -07:00
relay.txt
romfs.txt
seq_file.txt Documentation: update seq_file 2014-12-29 15:40:18 -07:00
sharedsubtree.txt doc: fix grammar 2016-03-09 15:33:06 -07:00
spufs.txt
squashfs.txt Squashfs: Add LZ4 compression configuration option 2014-11-27 18:48:44 +00:00
sysfs-pci.txt
sysfs-tagging.txt sysfs-tagging.txt: fix pre-kernfs references 2015-09-13 14:38:51 -06:00
sysfs.txt sysfs.txt: mention that store method buffers are null-terminated 2015-09-13 14:38:51 -06:00
sysv-fs.txt
tmpfs.txt mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usage 2016-04-04 10:41:08 -07:00
ubifs.txt
udf.txt
ufs.txt
vfat.txt fat: add config option to set UTF-8 mount option by default 2016-03-22 15:36:02 -07:00
vfs.txt mm: migrate: support non-lru movable page migration 2016-07-26 16:19:19 -07:00
xfs-delayed-logging-design.txt
xfs-self-describing-metadata.txt
xfs.txt xfs: fix kernel version in docs 2015-06-01 07:15:38 +10:00