linux

Commit Graph

Author	SHA1	Message	Date
Jaegeuk Kim	de93653fe3	f2fs: reserve the xattr space dynamically This patch enables the number of direct pointers inside on-disk inode block to be changed dynamically according to the size of inline xattr space. The number of direct pointers, ADDRS_PER_INODE, can be changed only if the file has inline xattr flag. The number of direct pointers that will be used by inline xattrs is defined as F2FS_INLINE_XATTR_ADDRS. Current patch assigns F2FS_INLINE_XATTR_ADDRS to 0 temporarily. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-08-26 20:15:01 +09:00
Jaegeuk Kim	444c580f7e	f2fs: add flags for inline xattrs This patch adds basic inode flags for inline xattrs, F2FS_INLINE_XATTR, and add a mount option, inline_xattr, which is enabled when xattr is set. If the mount option is enabled, all the files are marked with the inline_xattrs flag. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-08-26 20:02:12 +09:00
Wei Yongjun	6e6b978c32	f2fs: fix error return code in init_f2fs_fs() Fix to return -ENOMEM in the kset create and add error handling case instead of 0, as done elsewhere in this function. Introduced by commit `b59d0bae6c`. (f2fs: add sysfs support for controlling the gc_thread) Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Acked-by: Namjae Jeon <namjae.jeon@samsung.com> [Jaegeuk Kim: merge the patch with previous modification] Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-08-26 19:36:46 +09:00
Jaegeuk Kim	d59ff4df7b	f2fs: fix wrong BUG_ON condition This patch removes a false-alaramed BUG_ON. The previous BUG_ON condition didn't cover the following true scenario. In f2fs_add_link, 1) get_new_data_page gives an uptodate page successfully, and then, 2) init_inode_metadata returns -ENOSPC. At this moment, a new clean data page is remained in the page cache, but its block address still indicates NEW_ADDR. After then, even if sync is called, this clean data page cannot be written to the disk due to the clean state. So this means that get_lock_data_page should make a new empty page when its block address is NEW_ADDR and its page is not uptodated. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-08-20 19:32:48 +09:00
Zhao Hongjiang	9890ff3f23	f2fs: fix memory leak when init f2fs filesystem fail When any of the caches create fails in init_f2fs_fs(), the other caches which are create successful should be free. Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-08-20 18:58:44 +09:00
Gu Zheng	7b40527508	f2fs: fix a compound statement label error An error "label at end of compound statement" will occur if CONFIG_F2FS_STAT_FS disabled. fs/f2fs/segment.c:556:1: error: label at end of compound statement So clean up the 'out' label to fix it. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-08-19 11:51:08 +09:00
Jin Xu	92c4342fb7	f2fs: avoid writing inode redundantly when creating a file In f2fs_write_inode, updating inode after f2fs_balance_fs is not a optimized way in the case that f2fs_gc is performed ahead. The inode page will be unnecessarily written out twice, one of which is in f2fs_gc->...->sync_node_pages and the other is in update_inode_page. Let's update the inode page in prior to f2fs_balance_fs to avoid this. To reproduce it, $ touch file (before this step, should make the device need f2fs_gc) $ sync (or wait the bdi to write dirty inode) Signed-off-by: Jin Xu <jinuxstyle@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-08-19 09:43:25 +09:00
Dan Carpenter	e27dae4d66	f2fs: alloc_page() doesn't return an ERR_PTR alloc_page() returns a NULL on failure, it never returns an ERR_PTR. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-08-19 09:42:29 +09:00
Jaegeuk Kim	479bd73ac4	f2fs: should cover i_xattr_nid with its xattr node page lock Previously, f2fs_setxattr assigns i_xattr_nid in the inode page inconsistently. The scenario is: = Thread 1 = = Thread 2 = = fi->i_xattr_nid = = on-disk nid = f2fs_setxattr 0 0 new_node_page X 0 sync_inode_page X X checkpoint X X -. grab_cache_page X X \| --> allocate a new xattr node block or -ENOSPC <----------------' At this moment, the checkpoint stores inconsistent data where the inode has i_xattr_nid but actual xattr node block is not allocated yet. So, we should assign the real i_xattr_nid only after its xattr node block is allocated. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-08-12 16:04:53 +09:00
Jaegeuk Kim	9c02740c01	f2fs: check the free space first in new_node_page Let's check the free space in prior to the main process of allocating a new node page. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-08-12 16:00:46 +09:00
Gu Zheng	41dfde135f	f2fs: clean up the needless end 'return' of void function Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-08-12 11:49:22 +09:00
Jaegeuk Kim	d71b5564c0	f2fs: introduce cur_cp_version function to reduce code size This patch introduces a new inline function, cur_cp_version, to reduce redundant codes. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-08-09 15:25:37 +09:00
Jaegeuk Kim	e518ff81c3	f2fs: fix inconsistency between xattr node blocks and its inode Previously xattr node blocks are stored to the COLD_NODE log, which means that our roll-forward mechanism doesn't recover the xattr node blocks at all. Only the direct node blocks in the WARM_NODE log can be recovered. So, let's resolve the issue simply by conducting checkpoint during fsync when a file has a modified xattr node block. This approach is able to degrade the performance, but normally the checkpoint overhead is shown at the initial fsync call after the xattr entry changes. Once the checkpoint is done, no additional overhead would be occurred. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-08-09 15:25:24 +09:00
Jaegeuk Kim	dbe6a5ff4f	f2fs: fix the use of XATTR_NODE_OFFSET This patch fixes the use of XATTR_NODE_OFFSET. o The offset should not use several MSB bits which are used by marking node blocks. o IS_DNODE should handle XATTR_NODE_OFFSET to avoid potential abnormality during the fsync call. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-08-09 14:57:56 +09:00
Jaegeuk Kim	c2d715d144	f2fs: fix a build failure due to missing the kobject header This patch should resolve the following error reported by kbuild test robot. All error/warnings: In file included from fs/f2fs/dir.c:13:0: >> fs/f2fs/f2fs.h:435:17: error: field 's_kobj' has incomplete type struct kobject s_kobj; The failure was caused by missing the kobject header file in dir.c. So, this patch move the header file to the right location, f2fs.h. CC: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-08-08 15:56:49 +09:00
Jin Xu	a569469e96	f2fs: fix a deadlock in fsync This patch fixes a deadlock bug that occurs quite often when there are concurrent write and fsync on a same file. Following is the simplified call trace when tasks get hung. fsync thread: - f2fs_sync_file ... - f2fs_write_data_pages ... - update_extent_cache ... - update_inode - wait_on_page_writeback bdi writeback thread - __writeback_single_inode - f2fs_write_data_pages - mutex_lock(sbi->writepages) The deadlock happens when the fsync thread waits on a inode page that has been added to the f2fs' cached bio sbi->bio[NODE], and unfortunately, no one else could be able to submit the cached bio to block layer for writeback. This is because the fsync thread already hold a sbi->fs_lock and the sbi->writepages lock, causing the bdi thread being blocked when attempt to write data pages for the same inode. At the same time, f2fs_gc thread does not notice the situation and could not help. Even the sync syscall gets blocked. To fix it, we could submit the cached bio first before waiting on a inode page that is being written back. Signed-off-by: Jin Xu <jinuxstyle@gmail.com> [Jaegeuk Kim: add more cases to use f2fs_wait_on_page_writeback] Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-08-06 22:00:36 +09:00
Namjae Jeon	df273efc36	f2fs: remove redundant code from f2fs_write_begin This code is being used for nobh_write_end() function. But since now f2fs_write_end function is added so there is no need for this code. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-08-06 22:00:35 +09:00
Namjae Jeon	d2dc095f42	f2fs: add sysfs entries to select the gc policy Add sysfs entry gc_idle to control the gc policy. Where gc_idle = 1 corresponds to selecting a cost benefit approach, while gc_idle = 2 corresponds to selecting a greedy approach to garbage collection. The selection is mutually exclusive one approach will work at any point. If gc_idle = 0, then this option is disabled. Cc: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com> Reviewed-by: Gu Zheng <guz.fnst@cn.fujitsu.com> [Jaegeuk Kim: change the select_gc_type() flow slightly] Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-08-06 22:00:18 +09:00
Namjae Jeon	b59d0bae6c	f2fs: add sysfs support for controlling the gc_thread Add sysfs entries to control the timing parameters for f2fs gc thread. Various Sysfs options introduced are: gc_min_sleep_time: Min Sleep time for GC in ms gc_max_sleep_time: Max Sleep time for GC in ms gc_no_gc_sleep_time: Default Sleep time for GC in ms Cc: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com> Reviewed-by: Gu Zheng <guz.fnst@cn.fujitsu.com> [Jaegeuk Kim: fix an umount bug and some minor changes] Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-08-06 21:53:34 +09:00
Dan Carpenter	f0c5e565bb	f2fs: remove an unneeded kfree(NULL) This kfree() is no longer needed after a79dc083d7 "f2fs: move bio_private allocation out of f2fs_bio_alloc()". The "bio->bi_private" is NULL here so it's a no-op. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-07-31 19:07:01 +09:00
Jaegeuk Kim	cbd56e7d20	f2fs: fix handling orphan inodes This patch fixes mishandling of the sbi->n_orphans variable. If users request lots of f2fs_unlink(), check_orphan_space() could be contended. In such the case, sbi->n_orphans can be read incorrectly so that f2fs_unlink() would fall into the wrong state which results in the failure of add_orphan_inode(). So, let's increment sbi->n_orphans virtually prior to the actual orphan inode stuffs. After that, let's release sbi->n_orphans by calling release_orphan_inode or remove_orphan_inode. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-07-30 15:17:03 +09:00
Gu Zheng	d8207f6958	f2fs: move bio_private allocation out of f2fs_bio_alloc() bio->bi_private is not always needed. As in the reading data path, end_read_io does not need bio_private for further using, so moving bio_private allocation out of f2fs_bio_alloc(). Alloc it in the submit_write_page(), and ignore it in the f2fs_readpage(). Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-07-30 15:17:03 +09:00
Gu Zheng	60ed9a0f53	f2fs: use list_for_each rather than list_for_each_safe, in remove_orphan_inode() As we remove the target single node, so list_for_each is enought, in order to clean up, we use list_for_each_entry instead. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-07-30 15:17:03 +09:00
Gu Zheng	2d219c5188	f2fs: use seq_puts()/seq_putc() rather than seq_printf() where possible For string without format specifiers, using seq_puts()/seq_putc() instead of seq_printf(). Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-07-30 15:17:03 +09:00
Jaegeuk Kim	f0947e5cce	f2fs: fix i_name during f2fs_sync_file As similar as the i_pino fix, i_name also should be fixed when i_nlink is 1. The errorneous scenario is like this. 1. touch test1 2. link test1 test2 3. unlink test2 4. fsync test1 After this, i_name should be test1. CC: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-07-30 15:17:03 +09:00
Jaegeuk Kim	1cd14cafc6	f2fs: update file name in the inode block during f2fs_rename The error is reproducible by: 0. mkfs.f2fs /dev/sdb1 & mount 1. touch test1 2. touch test2 3. mv test1 test2 4. umount 5. dumpt.f2fs -i 4 /dev/sdb1 After this, when we retrieve the inode->i_name of test2 by dump.f2fs, we get test1 instead of test2. This is because f2fs didn't update the file name during the f2fs_rename. So, this patch fixes that. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-07-30 15:17:03 +09:00
Gu Zheng	4559071063	f2fs: introduce help function F2FS_NODE() Introduce help function F2FS_NODE() to simplify the conversion of node_page to f2fs_node. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-07-30 15:17:02 +09:00
Gu Zheng	963d4f7d7b	f2fs: add a help func F2FS_STAT() to get the f2fs_stat_info Add a help func F2FS_STAT() to get the f2fs_stat_info. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-07-30 15:17:02 +09:00
Jaegeuk Kim	5e176d54a6	f2fs: add proc entry to monitor current usage of segments You can monitor valid block counts of whole segments in: /proc/fs/f2fs/sdb1/segment_info. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-07-30 15:17:02 +09:00
Jaegeuk Kim	e5d2385ed6	f2fs: recover date requested by fdatasync In order to support SQLite that uses fdatasync instead of fsync, we should guarantee the data requested by fdatasync can be recovered after sudden-power- off. So, let's remove the fdatasync condition in f2fs_sync_file. Otherwise, we can restore the data after sudden-power-off due to nonexistence of any fsync mark'ed node blocks. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-07-30 15:17:02 +09:00
Jaegeuk Kim	99b072bb38	f2fs: fix readdir incorrectness In the previous Al Viro's readdir patch set, there occurs a bug when running xfstest: 006 as follows. [Error output] alpha size = 4, name length = 6, total files = 4096, nproc=1 1023 files created rm: cannot remove `/mnt/f2fs/permname.15150/a': Directory not empty [Correct output] alpha size = 4, name length = 6, total files = 4096, nproc=1 4097 files created This bug is due to the misupdate of directory position in ctx. So, this patch fixes this. [AV: fixed a braino] CC: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-07-08 13:35:48 +04:00
Linus Torvalds	3f490f7f99	This patch-set includes the following major enhancement patches. o remount_fs callback function o restore parent inode number to enhance the fsync performance o xattr security labels o reduce the number of redundant lock/unlock data pages o avoid frequent write_inode calls The other minor bug fixes are as follows. o endian conversion bugs o various bugs in the roll-forward recovery routine -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJR0h8YAAoJEEAUqH6CSFDSJNMP/1V+e/PaLa8YsHuw5eWPT3Fe RYFXJv7SXadHqgQInjwD7JOF8BC9DlUyknYUiXFUZgvKsgMHz+OTCNl+1hLTbUXt e6Rhn7vU/fMu3TEtj+v6g8JbpXdXH8TSbtvh9LpoczRS98GYpZuckP0BtQsVkTL3 jIq6WD8JRkb2DvpDl7viTT0Eq0T61CSmtOwOIvfhhiVxggvDWcnR1mTM9Tymdi4J NKtFkjCsKP0Z/7IZZUJAczGEHjsT+O6JDwp8+KVWuZT4BSuchoX4MYAY5wX527Ne rZvkolbbfnAFCC3BtETr0DPOkpxnHmJ6dEveIxjZ9cux12CAFA/Ww1QAL4ygiDkd Avn8BBEJnfnuzeOUkE0by+9hdF9LOU3CVSxiDhWJegYB16z+c9pSBD9xvlKhKk5B QNsjptB6m0CftAq7vIDsryL60uJ2cSHFxPqfwAHEpNngiU/NohTFSZE0sUMbLUNh FI6NrHoT7yld1HcB6cvL1lnUqIENFbNhDSSDcTdlN49IJJap4oqtgCnmMMIwbUCO vR2/26k5W7NwG+K6XN2IX13AsayzQahxTv8in/+LV0bfjHo6/1VzzGcqAmXJbDQw dLrNAeWaaIJi8J/zJiENMbFKXTj8Wax9jxKsW+4towQuyEt4ADvyt1c5gX3VR42T x8+YEargsdBf6FAhtN+H =qFcy -----END PGP SIGNATURE----- Merge tag 'for-f2fs-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "This patch-set includes the following major enhancement patches: - remount_fs callback function - restore parent inode number to enhance the fsync performance - xattr security labels - reduce the number of redundant lock/unlock data pages - avoid frequent write_inode calls The other minor bug fixes are as follows. - endian conversion bugs - various bugs in the roll-forward recovery routine" * tag 'for-f2fs-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (56 commits) f2fs: fix to recover i_size from roll-forward f2fs: remove the unused argument "sbi" of func destroy_fsync_dnodes() f2fs: remove reusing any prefree segments f2fs: code cleanup and simplify in func {find/add}_gc_inode f2fs: optimize the init_dirty_segmap function f2fs: fix an endian conversion bug detected by sparse f2fs: fix crc endian conversion f2fs: add remount_fs callback support f2fs: recover wrong pino after checkpoint during fsync f2fs: optimize do_write_data_page() f2fs: make locate_dirty_segment() as static f2fs: remove unnecessary parameter "offset" from __add_sum_entry() f2fs: avoid freqeunt write_inode calls f2fs: optimise the truncate_data_blocks_range() range f2fs: use the F2FS specific flags in f2fs_ioctl() f2fs: sync dir->i_size with its block allocation f2fs: fix i_blocks translation on various types of files f2fs: set sb->s_fs_info before calling parse_options() f2fs: support xattr security labels f2fs: fix iget/iput of dir during recovery ...	2013-07-02 09:42:38 -07:00
Linus Torvalds	9e239bb939	Lots of bug fixes, cleanups and optimizations. In the bug fixes category, of note is a fix for on-line resizing file systems where the block size is smaller than the page size (i.e., file systems 1k blocks on x86, or more interestingly file systems with 4k blocks on Power or ia64 systems.) In the cleanup category, the ext4's punch hole implementation was significantly improved by Lukas Czerner, and now supports bigalloc file systems. In addition, Jan Kara significantly cleaned up the write submission code path. We also improved error checking and added a few sanity checks. In the optimizations category, two major optimizations deserve mention. The first is that ext4_writepages() is now used for nodelalloc and ext3 compatibility mode. This allows writes to be submitted much more efficiently as a single bio request, instead of being sent as individual 4k writes into the block layer (which then relied on the elevator code to coalesce the requests in the block queue). Secondly, the extent cache shrink mechanism, which was introduce in 3.9, no longer has a scalability bottleneck caused by the i_es_lru spinlock. Other optimizations include some changes to reduce CPU usage and to avoid issuing empty commits unnecessarily. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABCAAGBQJR0XhgAAoJENNvdpvBGATwMXkQAJwTPk5XYLqtAwLziFLvM6wG 0tWa1QAzTNo80tLyM9iGqI6x74X5nddLw5NMICUmPooOa9agMuA4tlYVSss5jWzV yyB7vLzsc/2eZJusuVqfTKrdGybE+M766OI6VO9WodOoIF1l51JXKjktKeaWegfv NkcLKlakD4V+ZASEDB/cOcR/lTwAs9dQ89AZzgPiW+G8Do922QbqkENJB8mhalbg rFGX+lu9W0f3fqdmT3Xi8KGn3EglETdVd6jU7kOZN4vb5LcF5BKHQnnUmMlpeWMT ksOVasb3RZgcsyf5ZOV5feXV601EsNtPBrHAmH22pWQy3rdTIvMv/il63XlVUXZ2 AXT3cHEvNQP0/yVaOTCZ9xQVxT8sL4mI6kENP9PtNuntx7E90JBshiP5m24kzTZ/ zkIeDa+FPhsDx1D5EKErinFLqPV8cPWONbIt/qAgo6663zeeIyMVhzxO4resTS9k U2QEztQH+hDDbjgABtz9M/GjSrohkTYNSkKXzhTjqr/m5huBrVMngjy/F4/7G7RD vSEx5aXqyagnrUcjsupx+biJ1QvbvZWOVxAE/6hNQNRGDt9gQtHAmKw1eG2mugHX +TFDxodNE4iWEURenkUxXW3mDx7hFbGZR0poHG3M/LVhKMAAAw0zoKrrUG5c70G7 XrddRLGlk4Hf+2o7/D7B =SwaI -----END PGP SIGNATURE----- Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 update from Ted Ts'o: "Lots of bug fixes, cleanups and optimizations. In the bug fixes category, of note is a fix for on-line resizing file systems where the block size is smaller than the page size (i.e., file systems 1k blocks on x86, or more interestingly file systems with 4k blocks on Power or ia64 systems.) In the cleanup category, the ext4's punch hole implementation was significantly improved by Lukas Czerner, and now supports bigalloc file systems. In addition, Jan Kara significantly cleaned up the write submission code path. We also improved error checking and added a few sanity checks. In the optimizations category, two major optimizations deserve mention. The first is that ext4_writepages() is now used for nodelalloc and ext3 compatibility mode. This allows writes to be submitted much more efficiently as a single bio request, instead of being sent as individual 4k writes into the block layer (which then relied on the elevator code to coalesce the requests in the block queue). Secondly, the extent cache shrink mechanism, which was introduce in 3.9, no longer has a scalability bottleneck caused by the i_es_lru spinlock. Other optimizations include some changes to reduce CPU usage and to avoid issuing empty commits unnecessarily." * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (86 commits) ext4: optimize starting extent in ext4_ext_rm_leaf() jbd2: invalidate handle if jbd2_journal_restart() fails ext4: translate flag bits to strings in tracepoints ext4: fix up error handling for mpage_map_and_submit_extent() jbd2: fix theoretical race in jbd2__journal_restart ext4: only zero partial blocks in ext4_zero_partial_blocks() ext4: check error return from ext4_write_inline_data_end() ext4: delete unnecessary C statements ext3,ext4: don't mess with dir_file->f_pos in htree_dirblock_to_tree() jbd2: move superblock checksum calculation to jbd2_write_superblock() ext4: pass inode pointer instead of file pointer to punch hole ext4: improve free space calculation for inline_data ext4: reduce object size when !CONFIG_PRINTK ext4: improve extent cache shrink mechanism to avoid to burn CPU time ext4: implement error handling of ext4_mb_new_preallocation() ext4: fix corruption when online resizing a fs with 1K block size ext4: delete unused variables ext4: return FIEMAP_EXTENT_UNKNOWN for delalloc extents jbd2: remove debug dependency on debug_fs and update Kconfig help text jbd2: use a single printk for jbd_debug() ...	2013-07-02 09:39:34 -07:00
Jaegeuk Kim	a1dd3c13ce	f2fs: fix to recover i_size from roll-forward If user requests many data writes and fsync together, the last updated i_size should be stored to the inode block consistently. But, previous write_end just marks the inode as dirty and doesn't update its metadata into its inode block. After that, fsync just writes the inode block with newly updated data index excluding inode metadata updates. So, this patch introduces write_end in which updates inode block too when the i_size is changed. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-07-02 08:48:16 +09:00
Gu Zheng	5ebefc5b40	f2fs: remove the unused argument "sbi" of func destroy_fsync_dnodes() As destroy_fsync_dnodes() is a simple list-cleanup func, so delete the unused and unrelated f2fs_sb_info argument of it. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-07-02 08:48:15 +09:00
Jaegeuk Kim	763bfe1bc5	f2fs: remove reusing any prefree segments This patch removes check_prefree_segments initially designed to enhance the performance by narrowing the range of LBA usage across the whole block device. When allocating a new segment, previous f2fs tries to find proper prefree segments, and then, if finds a segment, it reuses the segment for further data or node block allocation. However, I found that this was totally wrong approach since the prefree segments have several data or node blocks that will be used by the roll-forward mechanism operated after sudden-power-off. Let's assume the following scenario. /* write 8MB with fsync / for (i = 0; i < 2048; i++) { offset = i 4096; write(fd, offset, 4KB); fsync(fd); } In this case, naive segment allocation sequence will be like: data segment: x, x+1, x+2, x+3 node segment: y, y+1, y+2, y+3. But, if we can reuse prefree segments, the sequence can be like: data segment: x, x+1, y, y+1 node segment: y, y+1, y+2, y+3. Because, y, y+1, and y+2 became prefree segments one by one, and those are reused by data allocation. After conducting this workload, we should consider how to recover the latest inode with its data. If we reuse the prefree segments such as y or y+1, we lost the old node blocks so that f2fs even cannot start roll-forward recovery. Therefore, I suggest that we should remove reusing prefree segments. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-07-02 08:48:15 +09:00
Gu Zheng	6cc4af5606	f2fs: code cleanup and simplify in func {find/add}_gc_inode This patch simplifies list operations in find_gc_inode and add_gc_inode. Just simple code cleanup. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> [Jaegeuk Kim: add description] Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-07-02 08:48:14 +09:00
Namjae Jeon	8736fbf003	f2fs: optimize the init_dirty_segmap function Optimize the while loop condition Since this condition will always be true and while loop will be terminated by the following condition in code: if (segno >= TOTAL_SEGS(sbi)) break; Hence we can replace the while loop condition with while(1) instead of always checking for segno to be less than Total segs. Also we do not need to use TOTAL_SEGS() everytime. We can store this value in a local variable since this value is constant. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-07-02 08:48:14 +09:00
Jaegeuk Kim	060dd67b3c	f2fs: fix an endian conversion bug detected by sparse This patch should fix the following bug reported by kbuild test robot. fs/f2fs/recovery.c:233:33: sparse: incorrect type in assignment (different base types) parse warnings: (new ones prefixed by >>) >> recovery.c:233: sparse: incorrect type in assignment (different base types) recovery.c:233: expected unsigned int [unsigned] [assigned] ofs_in_node recovery.c:233: got restricted __le16 [assigned] [usertype] ofs_in_node >> recovery.c:238: sparse: incorrect type in assignment (different base types) recovery.c:238: expected unsigned int [unsigned] ofs_in_node recovery.c:238: got restricted __le16 [assigned] [usertype] ofs_in_node Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-07-02 08:48:13 +09:00
Jaegeuk Kim	7e586fa024	f2fs: fix crc endian conversion While calculating CRC for the checkpoint block, we use __u32, but when storing the crc value to the disk, we use __le32. Let's fix the inconsistency. Reported-and-Tested-by: Oded Gabbay <ogabbay@advaoptical.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-07-02 08:47:35 +09:00
Al Viro	6f7f231e7b	[readdir] convert f2fs Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-06-29 12:56:46 +04:00
Namjae Jeon	696c018c77	f2fs: add remount_fs callback support Add the f2fs_remount function call which will be used during the filesystem remounting. This function will help us to change the mount options specific to f2fs. Also modify the f2fs background_gc mount option, which will allow the user to dynamically trun on/off the garbage collection in f2fs based on the background_gc value. If background_gc=on, Garbage collection will be turned off & if background_gc=off, Garbage collection will be truned on. By default the garbage collection is on in f2fs. Change Log: v2: Incorporated the review comments by Gu Zheng. Removing the restore part for VFS flags Updating comments with proper flag conditions Display GC background option as ON/OFF Revised conditions to stop GC in case of remount v1: Initial changes for adding remount_fs callback support. Cc: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com> Reviewed-by: Gu Zheng <guz.fnst@cn.fujitsu.com> [Jaegeuk Kim: change /** with /* for the coding style] Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-06-17 21:17:55 +09:00
Jaegeuk Kim	354a3399dc	f2fs: recover wrong pino after checkpoint during fsync If a file is linked, f2fs loose its parent inode number so that fsync calls for the linked file should do checkpoint all the time. But, if we can recover its parent inode number after the checkpoint, we can adjust roll-forward mechanism for the further fsync calls, which is able to improve the fsync performance significatly. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-06-14 09:04:45 +09:00
Haicheng Li	b25958b6ec	f2fs: optimize do_write_data_page() Since "need_inplace_update() == true" is a very rare case, using unlikely() to give compiler a chance to optimize the code. Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-06-14 09:04:44 +09:00
Haicheng Li	8d8451af68	f2fs: make locate_dirty_segment() as static It's used only locally and could be static. Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-06-14 09:04:44 +09:00
Haicheng Li	e79efe3b69	f2fs: remove unnecessary parameter "offset" from __add_sum_entry() We can get the value directly from pointer "curseg". Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-06-14 09:04:43 +09:00
Jaegeuk Kim	b3783873cc	f2fs: avoid freqeunt write_inode calls If update_inode is called, we don't need to do write_inode. So, let's use a dirty flag for each inode. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-06-14 09:04:43 +09:00
Namjae Jeon	d7cc950b4c	f2fs: optimise the truncate_data_blocks_range() range The function truncate_data_blocks_range() decrements the valid block count of inode via dec_valid_block_count(). Since this function updates the i_blocks field of inode, we can update this field once we have calculated total the number of blocks to be freed. Therefore we can decrement valid blocks outside of the for loop. if (nr_free) { + dec_valid_block_count(sbi, dn->inode, nr_free); set_page_dirty(dn->node_page); sync_inode_page(dn); } 'nr_free' tells the total number of blocks freed. So, we can just directly pass this value to dec_valid_block_count() and update the i_blocks. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-06-14 09:04:42 +09:00
Namjae Jeon	6a3e8ef0de	f2fs: use the F2FS specific flags in f2fs_ioctl() In f2fs_ioctl() function, it is using generic flags. Since F2FS specific flags are defined. So lets use those flags. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-06-14 09:04:26 +09:00
Jaegeuk Kim	699489bbbe	f2fs: sync dir->i_size with its block allocation If new dentry block is allocated and its i_size is updated, we should update its inode block together in order to sync i_size and its block allocation. Otherwise, we can loose additional dentry block due to the unconsistent i_size. Errorneous Scenario ------------------- In the recovery routine, - recovery_dentry \| - __f2fs_add_link \| \| - get_new_data_page \| \| \| - i_size_write(new_i_size) \| \| \| - mark_inode_dirty_sync(dir) \| \| - update_parent_metadata \| \| \| - mark_inode_dirty(dir) \| - write_checkpoint - sync_dirty_dir_inodes - filemap_flush(dentry_blocks) - f2fs_write_data_page - skip to write the last dentry block due to index < i_size In the above flow, new_i_size is not updated to its inode block so that the last dentry block will be lost accordingly. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-06-12 08:18:35 +09:00
Jaegeuk Kim	2d4d9fb591	f2fs: fix i_blocks translation on various types of files Basically an inode manages the number of allocated blocks with inode->i_blocks which is represented in a unit of sectors, not file system blocks. But, f2fs has used i_blocks in a unit of file system blocks, and f2fs_getattr translates it to the number of sectors when fstat is called. However, previously f2fs_file_inode_operations only has this, so this patch adds it to all the types of inode_operations. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-06-11 16:01:09 +09:00
Gu Zheng	5fb08372a6	f2fs: set sb->s_fs_info before calling parse_options() In f2fs_fill_super(), set sb->s_fs_info before calling parse_options(), then we can get f2fs_sb_info via F2FS_SB(sb) in parse_options(). So that the second argument "sbi" of func parse_options() is no longer needed. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-06-11 16:01:08 +09:00
Jaegeuk Kim	8ae8f1627f	f2fs: support xattr security labels This patch adds the support of security labels for f2fs, which will be used by Linus Security Models (LSMs). Quote from http://en.wikipedia.org/wiki/Linux_Security_Modules: "Linux Security Modules (LSM) is a framework that allows the Linux kernel to support a variety of computer security models while avoiding favoritism toward any single security implementation. The framework is licensed under the terms of the GNU General Public License and is standard part of the Linux kernel since Linux 2.6. AppArmor, SELinux, Smack and TOMOYO Linux are the currently accepted modules in the official kernel.". Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-06-11 16:01:03 +09:00
Jaegeuk Kim	5deb82671a	f2fs: fix iget/iput of dir during recovery It is possible that iput is skipped after iget during the recovery. In recover_dentry(), dir = f2fs_iget(); ... if (de && inode->i_ino == le32_to_cpu(de->ino)) goto out; In this case, this dir is not able to be added in dirty_dir_inode_list. The actual linking is done only when set_page_dirty() is called. So let's add this newly got inode into the list explicitly, and put it at the end of the recovery routine. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-06-07 13:21:37 +09:00
Namjae Jeon	b2b3460a94	f2fs: reorganise the function get_victim_by_default Fix the function get_victim_by_default, where it checks for the condition that p.min_segno != NULL_SEGNO as shown: if (p.min_segno != NULL_SEGNO) goto got_it; and if above condition is true then got_it: if (p.min_segno != NULL_SEGNO) { So this condition is being checked twice. Hence move the goto statement after the if condition so that duplication of condition check is avoided. Also this function makes a call to get_max_cost() to compute the max cost based on the f2fs_sbi_info and victim policy. Since get_max_cost depends on on three parameters of victim_sel_policy => alloc_mode, gc_mode & ofs_unit, once this victim policy is initialised, these value will not change till the execution time of get_victim_by_default() & also f2fs_sbi_info structure parameters will not change. Hence making calls to get_max_cost() in while loop does not seems to be a good point. Instead we can call it once in begining and store the results in local variable, which later can serve our purpose for comparing the cost with max cost inside the while loop. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-06-06 14:20:46 +09:00
Jason Hrycay	1e03e38b35	f2fs: handle errors from get_node_page calls Add check for error pointers returned from get_node_page in order to avoid dereferencing a bad address on the next use. Signed-off-by: Jason Hrycay <jason.hrycay@motorola.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-06-03 19:49:09 +09:00
Jaegeuk Kim	83d5d6f66b	f2fs: cover cp_file information with ilock If a file is linked with other files, it should be checkpointed at every fsync calls. For this, we use set_cp_file() with FADVISE_CP_BIT, but previously we didn't cover the flag by the global lock. This patch fixes that the inode page stores this correctly. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-05-28 15:03:06 +09:00
Jaegeuk Kim	afc3eda2a8	f2fs: fix incorrect iputs during the dentry recovery - iget/iput flow in the dentry recovery process 1. dir = f2fs_iget 2. set FI_DELAY_IPUT to dir 3. add dir to the dirty_dir_list - __f2fs_add_link - recover_dentry) 4. iput dir by remove_dirty_dir_inode - sync_dirty_dir_inodes - write_chekcpoint If dir's i_count is not 1 (i.e., root dir), remove_dirty_dir_inode is called later and then iput is triggered again due to the FI_DELAY_IPUT flag. So, let's unset the flag properly once iput is triggered. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-05-28 15:03:06 +09:00
Jaegeuk Kim	6b8213d9a4	f2fs: fix dentry recovery routine The error scenario is: 1. create /a (1.a link /a /b) 2. sync 3. unlinke /a 4. create /a 5. fsync /a 6. Sudden power-off When the f2fs recovers the fsynced dentry, /a, we discover an exsiting dentry at f2fs_find_entry() in recover_dentry(). In such the case, we should unlink the existing dentry and its inode and then recover newly created dentry. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-05-28 15:03:05 +09:00
Jaegeuk Kim	3b10b1fd2b	f2fs: iput only if whole data blocks are flushed If there remains some unwritten blocks from the recovery, we should not call iput on that directory inode. Otherwise, we can loose some dentry blocks after the recovery. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-05-28 15:03:05 +09:00
Namjae Jeon	7a267f8d74	f2fs: return proper error from start_gc_thread when there is an error from kthread_run, then return proper error rather than returning -ENOMEM. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-05-28 15:03:05 +09:00
Namjae Jeon	a06a241603	f2fs: optimize several routines in node.h There are various functions with common code which could be separated out to make common routines. So, made new routines and in order to retain the same call path and no major changes, written some macros to access those routines. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-05-28 15:03:05 +09:00
Namjae Jeon	4777f86b7c	f2fs: remove unneeded initializations in f2fs_parent_dir There is no need to initialize few pointers in f2fs_parent_dir as the values are not checked and instead directly initialized values are used. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-05-28 15:03:05 +09:00
Namjae Jeon	35b09d82c3	f2fs: push some variables to debug part Some, counters are needed only for the statistical information while debugging. So, those can be controlled using CONFIG_F2FS_STAT_FS, pushing the usage for few variables under this flag. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-05-28 15:03:05 +09:00
Jaegeuk Kim	a9841c4dbb	f2fs: align data types between on-disk and in-memory block addresses The on-disk block address is defined as __le32, but in-memory block address, block_t, does as u64. Let's synchronize them to 32 bits. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-05-28 15:03:04 +09:00
Dan Carpenter	f28c06fa6f	f2fs: dereferencing an ERR_PTR There is an error path where "dir" is an ERR_PTR. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-05-28 15:03:04 +09:00
Jaegeuk Kim	6f6fd833e1	f2fs: use ihold Use the following helper function committed by Al. commit `7de9c6ee3e` Author: Al Viro <viro@zeniv.linux.org.uk> Date: Sat Oct 23 11:11:40 2010 -0400 new helper: ihold() ... Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-05-28 15:03:04 +09:00
Jaegeuk Kim	93ff10d690	f2fs: should not make_bad_inode on f2fs_link failure If -ENOSPC is met during f2fs_link, we should not make the inode as bad. The inode is still alive. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-05-28 15:03:04 +09:00
Jaegeuk Kim	39cf72cf09	f2fs: fix to handle do_recover_data errors This patch adds error handling codes of check_index_in_prev_nodes and its caller, do_recover_data. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-05-28 15:03:04 +09:00
Jaegeuk Kim	b292dcab06	f2fs: reuse the locked dnode page and its inode This patch fixes the following deadlock bug during the recovery. INFO: task mount:1322 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. mount D ffffffff81125870 0 1322 1266 0x00000000 ffff8801207e39d8 0000000000000046 ffff88012ab1dee0 0000000000000046 ffff8801207e3a08 ffff880115903f40 ffff8801207e3fd8 ffff8801207e3fd8 ffff8801207e3fd8 ffff880115903f40 ffff8801207e39d8 ffff88012fc94520 Call Trace: [<ffffffff81125870>] ? __lock_page+0x70/0x70 [<ffffffff816a92d9>] schedule+0x29/0x70 [<ffffffff816a93af>] io_schedule+0x8f/0xd0 [<ffffffff8112587e>] sleep_on_page+0xe/0x20 [<ffffffff816a649a>] __wait_on_bit_lock+0x5a/0xc0 [<ffffffff81125867>] __lock_page+0x67/0x70 [<ffffffff8106c7b0>] ? autoremove_wake_function+0x40/0x40 [<ffffffff81126857>] find_lock_page+0x67/0x80 [<ffffffff8112698f>] find_or_create_page+0x3f/0xb0 [<ffffffffa03901a8>] ? sync_inode_page+0xa8/0xd0 [f2fs] [<ffffffffa038fdf7>] get_node_page+0x67/0x180 [f2fs] [<ffffffffa039818b>] recover_fsync_data+0xacb/0xff0 [f2fs] [<ffffffff816aaa1e>] ? _raw_spin_unlock+0x3e/0x40 [<ffffffffa0389634>] f2fs_fill_super+0x7d4/0x850 [f2fs] [<ffffffff81184cf9>] mount_bdev+0x1c9/0x210 [<ffffffffa0388e60>] ? validate_superblock+0x180/0x180 [f2fs] [<ffffffffa0387635>] f2fs_mount+0x15/0x20 [f2fs] [<ffffffff81185a13>] mount_fs+0x43/0x1b0 [<ffffffff81145ba0>] ? __alloc_percpu+0x10/0x20 [<ffffffff811a0796>] vfs_kern_mount+0x76/0x120 [<ffffffff811a2cb7>] do_mount+0x237/0xa10 [<ffffffff81140b9b>] ? strndup_user+0x5b/0x80 [<ffffffff811a3520>] SyS_mount+0x90/0xe0 [<ffffffff816b3502>] system_call_fastpath+0x16/0x1b The bug is triggered when check_index_in_prev_nodes tries to get the direct node page by calling get_node_page. At this point, if the direct node page is already locked by get_dnode_of_data, its caller, we got a deadlock condition. This patch adds additional condition check for the reuse of locked direct node pages prior to the get_node_page call. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-05-28 15:03:04 +09:00
Jaegeuk Kim	b638f0c4b8	f2fs: fix wrong condition check While an orphan inode has zero link_count, f2fs_gc is able to select the inode for foreground gc. - f2fs_gc - do_garbage_collect - gc_data_segment : f2fs_iget is failed : get_valid_blocks() != 0, so that retry --> here we got the infinite loop. This patch resolved this issue. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-05-28 15:03:03 +09:00
Jaegeuk Kim	77888c1e42	f2fs: add f2fs_readonly() Introduce a simple macro function for readability. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-05-28 15:03:03 +09:00
Jaegeuk Kim	6f85b35203	f2fs: avoid RECLAIM_FS-ON-W: deadlock This patch tries to avoid the following deadlock condition of which the reclaim path can trigger f2fs_balance_fs again. ================================= [ INFO: inconsistent lock state ] --------------------------------- inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage. kswapd0/41 [HC0[0]:SC0[0]:HE1:SE1] takes: (&sbi->gc_mutex){+.+.?.}, at: f2fs_balance_fs+0xe6/0x100 [f2fs] {RECLAIM_FS-ON-W} state was registered at: [<ffffffff810aa5a9>] mark_held_locks+0xb9/0x140 [<ffffffff810aae85>] lockdep_trace_alloc+0x85/0xf0 [<ffffffff8113ab2c>] __alloc_pages_nodemask+0x7c/0x9b0 [<ffffffff81175aa8>] alloc_pages_current+0xb8/0x180 [<ffffffff811319cf>] __page_cache_alloc+0xaf/0xd0 [<ffffffff8113225c>] find_or_create_page+0x4c/0xb0 [<ffffffffa021359e>] find_data_page+0x14e/0x210 [f2fs] [<ffffffffa021161b>] f2fs_gc+0x9eb/0xd90 [f2fs] [<ffffffffa0218fae>] f2fs_balance_fs+0xee/0x100 [f2fs] [<ffffffffa020848c>] f2fs_setattr+0x6c/0x200 [f2fs] [<ffffffff811ae51b>] notify_change+0x1db/0x3a0 [<ffffffff8118fbd0>] do_truncate+0x60/0xa0 [<ffffffff8118fd95>] vfs_truncate+0x185/0x1b0 [<ffffffff8118fe1c>] do_sys_truncate+0x5c/0xa0 [<ffffffff8118ffee>] SyS_truncate+0xe/0x10 [<ffffffff816e2b42>] system_call_fastpath+0x16/0x1b Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-05-28 15:03:03 +09:00
Jaegeuk Kim	2c2c149f7d	f2fs: don't do checkpoint if error is occurred If we met an error during the dentry recovery, we should not conduct checkpoint. Otherwise, some errorneous dentry blocks overwrites the existing blocks that contain the remaining recovery information. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-05-28 15:03:03 +09:00
Jaegeuk Kim	45856aff0d	f2fs: fix to unlock page before exit If we got an error after lock_page, we should unlock it before exit. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-05-28 15:03:03 +09:00
Jaegeuk Kim	9a55ed656c	f2fs: remove unnecessary kmap/kunmap operations The allocated page used by the recovery is not on HIGHMEM, so that we don't need to use kmap/kunmap. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-05-28 15:03:03 +09:00
Namjae Jeon	9851e6e189	f2fs: reorganize f2fs_vm_page_mkwrite Few things can be changed in the default mkwrite function 1) Make file_update_time at the start before acquiring any lock 2) the condition page_offset(page) >= i_size_read(inode) should be changed to page_offset(page) > i_size_read 3) Move wait_on_page_writeback. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-05-28 15:03:02 +09:00
majianpeng	145b04e5ed	f2fs: use list_for_each_entry rather than list_for_each_entry_safe We can do this, since now we use a global mutex, f2fs_stat_mutex to protect its list operations. Signed-off-by: Jianpeng Ma <majianpeng@gmail.com> [Jaegeuk Kim: add description] Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-05-28 15:03:02 +09:00
Haicheng Li	81fb5e8746	f2fs: remove unecessary variable and code Code cleanup without behavior changed. Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-05-28 15:03:02 +09:00
Peter Zijlstra	bfe35965ec	f2fs, lockdep: annotate mutex_lock_all() Majianpeng reported a lockdep splat for f2fs. It turns out mutex_lock_all() acquires an array of locks (in global/local lock style). Any such operation is always serialized using cp_mutex, therefore there is no fs_lock[] lock-order issue; tell lockdep about this using the mutex_lock_nest_lock() primitive. Reported-by: majianpeng <majianpeng@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-05-28 15:03:02 +09:00
Jaegeuk Kim	f356fe0cba	f2fs: add debug msgs in the recovery routine This patch adds some trivial debugging messages in the recovery process. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-05-28 15:03:02 +09:00
Jaegeuk Kim	44a83ff6a8	f2fs: update inode page after creation I found a bug when testing power-off-recovery as follows. [Bug Scenario] 1. create a file 2. fsync the file 3. reboot w/o any sync 4. try to recover the file - found its fsync mark - found its dentry mark : try to recover its dentry - get its file name - get its parent inode number : here we got zero value The reason why we get the wrong parent inode number is that we didn't synchronize the inode page with its newly created inode information perfectly. Especially, previous f2fs stores fi->i_pino and writes it to the cached node page in a wrong order, which incurs the zero-valued i_pino during the recovery. So, this patch modifies the creation flow to fix the synchronization order of inode page with its inode. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-05-28 15:03:02 +09:00
Jaegeuk Kim	64aa7ed98d	f2fs: change get_new_data_page to pass a locked node page This patch is for passing a locked node page to get_dnode_of_data. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-05-28 15:03:01 +09:00
Jaegeuk Kim	1646cfac95	f2fs: skip get_node_page if locked node page is passed If get_dnode_of_data gets a locked node page, let's skip redundant get_node_page calls. This is for the futher enhancement. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-05-28 15:03:01 +09:00
Jaegeuk Kim	0a364af18f	f2fs: remove unnecessary por_doing check This por_doing check is totally not related to the recovery process. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-05-28 15:03:01 +09:00
Jaegeuk Kim	74d0b917ef	f2fs: fix BUG_ON during f2fs_evict_inode(dir) During the dentry recovery routine, recover_inode() triggers __f2fs_add_link with its directory inode. In the following scenario, a bug is captured. 1. dir = f2fs_iget(pino) 2. __f2fs_add_link(dir, name) 3. iput(dir) -> f2fs_evict_inode() faces with BUG_ON(atomic_read(fi->dirty_dents)) Kernel BUG at ffffffffa01c0676 [verbose debug info unavailable] [<ffffffffa01c0676>] f2fs_evict_inode+0x276/0x300 [f2fs] Call Trace: [<ffffffff8118ea00>] evict+0xb0/0x1b0 [<ffffffff8118f1c5>] iput+0x105/0x190 [<ffffffffa01d2dac>] recover_fsync_data+0x3bc/0x1070 [f2fs] [<ffffffff81692e8a>] ? io_schedule+0xaa/0xd0 [<ffffffff81690acb>] ? __wait_on_bit_lock+0x7b/0xc0 [<ffffffff8111a0e7>] ? __lock_page+0x67/0x70 [<ffffffff81165e21>] ? kmem_cache_alloc+0x31/0x140 [<ffffffff8118a502>] ? __d_instantiate+0x92/0xf0 [<ffffffff812a949b>] ? security_d_instantiate+0x1b/0x30 [<ffffffff8118a5b4>] ? d_instantiate+0x54/0x70 This means that we should flush all the dentry pages between iget and iput(). But, during the recovery routine, it is unallowed due to consistency, so we have to wait the whole recovery process. And then, write_checkpoint flushes all the dirty dentry blocks, and nicely we can put the stale dir inodes from the dirty_dir_inode_list. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-05-28 15:03:01 +09:00
Jaegeuk Kim	8c26d7d571	f2fs: fix por_doing variable coverage The reason of using sbi->por_doing is to alleviate data writes during the recovery. The find_fsync_dnodes() produces some dirty dentry pages, so we should cover it too with sbi->por_doing. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-05-28 15:03:01 +09:00
Jaegeuk Kim	addbe45b00	f2fs: remove redundant assignment We don't need to assign a value redundantly. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-05-28 15:03:01 +09:00
Jaegeuk Kim	650495dedc	f2fs: fix the inconsistent state of data pages In get_lock_data_page, if there is a data race between get_dnode_of_data for node and grab_cache_page for data, f2fs is able to face with the following BUG_ON(dn.data_blkaddr == NEW_ADDR). kernel BUG at /home/zeus/f2fs_test/src/fs/f2fs/data.c:251! [<ffffffffa044966c>] get_lock_data_page+0x1ec/0x210 [f2fs] Call Trace: [<ffffffffa043b089>] f2fs_readdir+0x89/0x210 [f2fs] [<ffffffff811a0920>] ? fillonedir+0x100/0x100 [<ffffffff811a0920>] ? fillonedir+0x100/0x100 [<ffffffff811a07f8>] vfs_readdir+0xb8/0xe0 [<ffffffff811a0b4f>] sys_getdents+0x8f/0x110 [<ffffffff816d7999>] system_call_fastpath+0x16/0x1b This bug is able to be occurred when the block address of the data block is changed after f2fs_put_dnode(). In order to avoid that, this patch fixes the lock order of node and data blocks in which the node block lock is covered by the data block lock. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-05-28 15:03:00 +09:00
Jaegeuk Kim	65e5cd0a15	f2fs: fix inconsistency of block count during recovery Currently f2fs recovers the dentry of fsynced files. When power-off-recovery is conducted, this newly recovered inode should increase node block count as well as inode block count. This patch resolves this inconsistency that results in: 1. create a file 2. write data 3. fsync 4. reboot without sync 5. mount and recover the file 6. node block count is 1 and inode block count is 2 : fall into the inconsistent state 7. unlink the file : trigger the following BUG_ON ------------[ cut here ]------------ kernel BUG at /home/zeus/f2fs_test/src/fs/f2fs/f2fs.h:716! Call Trace: [<ffffffffa0344100>] ? get_node_page+0x50/0x1a0 [f2fs] [<ffffffffa0344bfc>] remove_inode_page+0x8c/0x100 [f2fs] [<ffffffffa03380f0>] ? f2fs_evict_inode+0x180/0x2d0 [f2fs] [<ffffffffa033812e>] f2fs_evict_inode+0x1be/0x2d0 [f2fs] [<ffffffff811c7a67>] evict+0xa7/0x1a0 [<ffffffff811c82b5>] iput+0x105/0x190 [<ffffffff811c2b30>] d_kill+0xe0/0x120 [<ffffffff811c2c57>] dput+0xe7/0x1e0 [<ffffffff811acc3d>] __fput+0x19d/0x2d0 [<ffffffff811acd7e>] ____fput+0xe/0x10 [<ffffffff81070645>] task_work_run+0xb5/0xe0 [<ffffffff81002941>] do_notify_resume+0x71/0xb0 [<ffffffff8175f14a>] int_signal+0x12/0x17 Reported-and-Tested-by: Chris Fries <C.Fries@motorola.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-05-28 15:03:00 +09:00
Lukas Czerner	d47992f86b	mm: change invalidatepage prototype to accept length Currently there is no way to truncate partial page where the end truncate point is not at the end of the page. This is because it was not needed and the functionality was enough for file system truncate operation to work properly. However more file systems now support punch hole feature and it can benefit from mm supporting truncating page just up to the certain point. Specifically, with this functionality truncate_inode_pages_range() can be changed so it supports truncating partial page at the end of the range (currently it will BUG_ON() if 'end' is not at the end of the page). This commit changes the invalidatepage() address space operation prototype to accept range to be invalidated and update all the instances for it. We also change the block_invalidatepage() in the same way and actually make a use of the new length argument implementing range invalidation. Actual file system implementations will follow except the file systems where the changes are really simple and should not change the behaviour in any way .Implementation for truncate_page_range() which will be able to accept page unaligned ranges will follow as well. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Hugh Dickins <hughd@google.com>	2013-05-21 23:17:23 -04:00
Linus Torvalds	942d33da99	f2fs updates for v3.10 This patch-set includes the following major enhancement patches. o introduce a new gloabl lock scheme o add tracepoints on several major functions o fix the overall cleaning process focused on victim selection o apply the block plugging to merge IOs as much as possible o enhance management of free nids and its list o enhance the readahead mode for node pages o address several cretical deadlock conditions o reduce lock_page calls The other minor bug fixes and enhancements are as follows. o calculation mistakes: overflow o bio types: READ, READA, and READ_SYNC o fix the recovery flow, data races, and null pointer errors -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJRijCLAAoJEEAUqH6CSFDSg9kQAIqxmQzCUvCN3HcyVe8bGhKz 8xhKrAY6ySRCKMuBbFRQsNrXUhckE3A44DgzYm5/gQikr/c8zhbqPVrtZ968eCKb wm3J+Re/uwZr5eOXlJEaHIiSkMDtERN7Cu2oYJWZi2B9wCSZcgvoWQ3c3LUVk6yF GFdi1Y00ll5tFKbEGbXSsfdul9P8jp0MmuMnWBBQZF3TrjETXMdThA5FXN0yTf9s XkcGE9vTCCPk8p7P3YmGGw6CwlaL8oallm0//iL4nMNpJzveq2C09IlY2BNrxU3L iTNXeIBdbhwXpnh2zq26Cy+cIEDIp0oXYui5BYdr/LWyWU3T/INa+hjUUszsESxF 51LIUA1rA9nX/BSmj2QomswZ3lt4u5jl6rSBFKv3NG1KsFrAdb8S4tHukRSTSxAJ gzpY6kLT1+bgciA16F5W4yhzMYPN5hPa8s6hx4LHlpoqQICQsurjtS9KW7vncLFt ttmCMn8ehHcTzKRNNqYaBerCtSB3Z3G/uAy1y+DB7Zx2h2mqhCBXRalyRvs7RKvK d5OyYCpHntxuzDwVuivnr9Ddp30LUP1WqexxK+ykn99Ji3leMmffHP8Oari8w96b RxSbjoo8hOgoS5xZ4v3AaqtLDlBpxC6oWJzDaq/fJeKxOx22Z5BDFUM9mBGxrouJ AATl8b+cW/aTZ4l7WOPU =Hqii -----END PGP SIGNATURE----- Merge tag 'f2fs-for-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "This patch-set includes the following major enhancement patches. - introduce a new gloabl lock scheme - add tracepoints on several major functions - fix the overall cleaning process focused on victim selection - apply the block plugging to merge IOs as much as possible - enhance management of free nids and its list - enhance the readahead mode for node pages - address several cretical deadlock conditions - reduce lock_page calls The other minor bug fixes and enhancements are as follows. - calculation mistakes: overflow - bio types: READ, READA, and READ_SYNC - fix the recovery flow, data races, and null pointer errors" * tag 'f2fs-for-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (68 commits) f2fs: cover free_nid management with spin_lock f2fs: optimize scan_nat_page() f2fs: code cleanup for scan_nat_page() and build_free_nids() f2fs: bugfix for alloc_nid_failed() f2fs: recover when journal contains deleted files f2fs: continue to mount after failing recovery f2fs: avoid deadlock during evict after f2fs_gc f2fs: modify the number of issued pages to merge IOs f2fs: remove useless #include <linux/proc_fs.h> as we're now using sysfs as debug entry. f2fs: fix inconsistent using of NM_WOUT_THRESHOLD f2fs: check truncation of mapping after lock_page f2fs: enhance alloc_nid and build_free_nids flows f2fs: add a tracepoint on f2fs_new_inode f2fs: check nid == 0 in add_free_nid f2fs: add REQ_META about metadata requests for submit f2fs: give a chance to merge IOs by IO scheduler f2fs: avoid frequent background GC f2fs: add tracepoints to debug checkpoint request f2fs: add tracepoints for write page operations f2fs: add tracepoints to debug the block allocation ...	2013-05-08 15:11:48 -07:00
Jaegeuk Kim	59bbd474ab	f2fs: cover free_nid management with spin_lock After build_free_nids() searches free nid candidates from nat pages and current journal blocks, it checks all the candidates if they are allocated so that the nat cache has its nid with an allocated block address. In this procedure, previously we used list_for_each_entry_safe(fnid, next_fnid, &nm_i->free_nid_list, list). But, this is not covered by free_nid_list_lock, resulting in null pointer bug. This patch moves this checking routine inside add_free_nid() in order not to use the spin_lock. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-05-08 19:54:22 +09:00
Haicheng Li	23d3884427	f2fs: optimize scan_nat_page() When nm_i->fcnt > 2 * MAX_FREE_NIDS, stop scanning other NAT entries. Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> [Jaegeuk Kim: fix handling the return value of add_free_nid()] Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-05-08 19:54:22 +09:00
Haicheng Li	8760952d92	f2fs: code cleanup for scan_nat_page() and build_free_nids() This patch does two cleanups: 1. remove unused variable "fcnt" in build_free_nids(). 2. make scan_nat_page() as void type and remove useless variable "fcnt". Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-05-08 19:54:21 +09:00
Haicheng Li	95630cbadc	f2fs: bugfix for alloc_nid_failed() Directly drop the free_nid cache when nm_i->fcnt > 2 * MAX_FREE_NIDS Since there is NOT nmi->free_nid_list_lock spinlock protection between a sequential calling of alloc_nid() and alloc_nid_failed(), some other threads may already add new free_nid to the free_nid_list during this period. We need to make sure nmi->fcnt is never > 2 * MAX_FREE_NIDS. Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> [Jaegeuk Kim: fit the coding style] Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-05-08 19:54:20 +09:00
Chris Fries	047184b42b	f2fs: recover when journal contains deleted files When recovering a journal file with fsync data for files that have been deleted, don't bail out on recovery. Signed-off-by: Chris Fries <C.Fries@motorola.com> Reviewed-by: Russell Knize <rknize2@motorola.com> Reviewed-by: Jason Hrycay <jason.hrycay@motorola.com> [Jaegeuk Kim: fit the coding style] Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-05-08 19:54:20 +09:00
Chris Fries	bde582b225	f2fs: continue to mount after failing recovery When unable to roll forward the journal, we shouldn't bail out and not mount, we should continue to attempt the mount. Bad recovery data is likely unrecoverable at this point, and requiring the user to try to mount again doesn't solve any issues. Signed-off-by: Chris Fries <C.Fries@motorola.com> Reviewed-by: Russell Knize <rknize2@motorola.com> Reviewed-by: Jason Hrycay <jason.hrycay@motorola.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-05-08 19:54:19 +09:00
Jaegeuk Kim	531ad7d58c	f2fs: avoid deadlock during evict after f2fs_gc o Deadlock case #1 Thread 1: - writeback_sb_inodes - do_writepages - f2fs_write_data_pages - write_cache_pages - f2fs_write_data_page - f2fs_balance_fs - wait mutex_lock(gc_mutex) Thread 2: - f2fs_balance_fs - mutex_lock(gc_mutex) - f2fs_gc - f2fs_iget - wait iget_locked(inode->i_lock) Thread 3: - do_unlinkat - iput - lock(inode->i_lock) - evict - inode_wait_for_writeback o Deadlock case #2 Thread 1: - __writeback_single_inode : set I_SYNC - do_writepages - f2fs_write_data_page - f2fs_balance_fs - f2fs_gc - iput - evict - inode_wait_for_writeback(I_SYNC) In order to avoid this, even though iput is called with the zero-reference count, we need to stop the eviction procedure if the inode is on writeback. So this patch links f2fs_drop_inode which checks the I_SYNC flag. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-05-08 19:54:08 +09:00
Kent Overstreet	a27bb332c0	aio: don't include aio.h in sched.h Faster kernel compiles by way of fewer unnecessary includes. [akpm@linux-foundation.org: fix fallout] [akpm@linux-foundation.org: fix build] Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Reviewed-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-05-07 20:16:25 -07:00

1 2 3 4 5 ...

321 Commits