linux

Commit Graph

Author	SHA1	Message	Date
Jaegeuk Kim	4e6a8d9b22	f2fs: relax async discard commands more This patch relaxes async discard commands to avoid waiting its end_io during checkpoint. Instead of waiting them during checkpoint, it will be done when actually reusing them. Test on initial partition of nvme drive. # time fstrim /mnt/test Before : 6.158s After : 4.822s Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-01-29 12:46:01 +09:00
Jaegeuk Kim	26a28a0c1e	f2fs: show the max number of atomic operations This patch adds to show the max number of atomic operations which are conducting concurrently. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-01-29 12:46:01 +09:00
Jaegeuk Kim	0a595ebaaa	f2fs: support IO alignment for DATA and NODE writes This patch implements IO alignment by filling dummy blocks in DATA and NODE write bios. If we can guarantee, for example, 32KB or 64KB for such the IOs, we can eliminate underlying dummy page problem which FTL conducts in order to close MLC or TLC partial written pages. Note that, - it requires "-o mode=lfs". - IO size should be power of 2, not exceed BIO_MAX_PAGES, 256. - read IO is still 4KB. - do checkpoint at fsync, if dummy NODE page was written. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-01-29 12:46:01 +09:00
Jaegeuk Kim	9d52a504db	f2fs: reassign new segment for mode=lfs Otherwise we can remain wrong curseg->next_blkoff, resulting in fsck failure. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-01-29 12:46:01 +09:00
Yunlei He	650d3c4e56	f2fs: fix a missing discard prefree segments If userspace issue a fstrim with a range not involve prefree segments, it will reuse these segments without discard. This patch fix it. Signed-off-by: Yunlei He <heyunlei@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-01-29 12:46:01 +09:00
Damien Le Moal	f99e86485c	block: Rename blk_queue_zone_size and bdev_zone_size All block device data fields and functions returning a number of 512B sectors are by convention named xxx_sectors while names in the form xxx_size are generally used for a number of bytes. The blk_queue_zone_size and bdev_zone_size functions were not following this convention so rename them. No functional change is introduced by this patch. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Collapsed the two patches, they were nonsensically split and broke bisection. Signed-off-by: Jens Axboe <axboe@fb.com>	2017-01-12 07:58:32 -07:00
Linus Torvalds	09cb6464fe	Merge tag 'for-f2fs-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "This patch series contains several performance tuning patches regarding to the IO submission flow, in addition to supporting new features such as a ZBC-base drive and multiple devices. It also includes some major bug fixes such as: - checkpoint version control - fdatasync-related roll-forward recovery routine - memory boundary or null-pointer access in corner cases - missing error cases It has various minor clean-up patches as well" * tag 'for-f2fs-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (66 commits) f2fs: fix a missing size change in f2fs_setattr f2fs: fix to access nullified flush_cmd_control pointer f2fs: free meta pages if sanity check for ckpt is failed f2fs: detect wrong layout f2fs: call sync_fs when f2fs is idle Revert "f2fs: use percpu_counter for # of dirty pages in inode" f2fs: return AOP_WRITEPAGE_ACTIVATE for writepage f2fs: do not activate auto_recovery for fallocated i_size f2fs: fix to determine start_cp_addr by sbi->cur_cp_pack f2fs: fix 32-bit build f2fs: set ->owner for debugfs status file's file_operations f2fs: fix incorrect free inode count in ->statfs f2fs: drop duplicate header timer.h f2fs: fix wrong AUTO_RECOVER condition f2fs: do not recover i_size if it's valid f2fs: fix fdatasync f2fs: fix to account total free nid correctly f2fs: fix an infinite loop when flush nodes in cp f2fs: don't wait writeback for datas during checkpoint f2fs: fix wrong written_valid_blocks counting ...	2016-12-14 09:07:36 -08:00
Jaegeuk Kim	5eba8c5d1f	f2fs: fix to access nullified flush_cmd_control pointer f2fs_sync_file() remount_ro - f2fs_readonly - destroy_flush_cmd_control - f2fs_issue_flush - no fcc pointer! So, this patch doesn't free fcc in this case, but just stop its kernel thread which sends flush commands. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-12-07 18:56:50 -08:00
Jaegeuk Kim	f455c8a5f0	f2fs: call sync_fs when f2fs is idle The sync_fs in f2fs_balance_fs_bg must avoid interrupting current user requests. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-12-05 11:44:07 -08:00
Arnd Bergmann	19c526515f	f2fs: fix 32-bit build The addition of multiple-device support broke CONFIG_BLK_DEV_ZONED on 32-bit machines because of a 64-bit division: fs/f2fs/f2fs.o: In function `__issue_discard_async': extent_cache.c:(.text.__issue_discard_async+0xd4): undefined reference to `__aeabi_uldivmod' Fortunately, bdev_zone_size() is guaranteed to return a power-of-two number, so we can replace the % operator with a cheaper bit mask. Fixes: 792b84b74b54 ("f2fs: support multiple devices") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-25 10:16:09 -08:00
Geliang Tang	b4ceec2921	f2fs: drop duplicate header timer.h Drop duplicate header timer.h from segment.c. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-25 10:16:06 -08:00
Chao Yu	36951b38d1	f2fs: don't wait writeback for datas during checkpoint Normally, while committing checkpoint, we will wait on all pages to be writebacked no matter the page is data or metadata, so in scenario where there are lots of data IO being submitted with metadata, we may suffer long latency for waiting writeback during checkpoint. Indeed, we only care about persistence for pages with metadata, but not pages with data, as file system consistent are only related to metadate, so in order to avoid encountering long latency in above scenario, let's recognize and reference metadata in submitted IOs, wait writeback only for metadatas. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-25 10:15:59 -08:00
Jaegeuk Kim	c79b7ff1d3	f2fs: fix wrong written_valid_blocks counting Previously, written_valid_blocks was got by ckpt->valid_block_count. But if the last checkpoint has some NEW_ADDR due to power-cut, we can get wrong value. Fix it to get the number from actual written block count from sit entries. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-25 10:15:58 -08:00
Jaegeuk Kim	7702bdbe50	f2fs: avoid BG_GC in f2fs_balance_fs If many threads hit has_not_enough_free_secs() in f2fs_balance_fs() at the same time, all the threads would do FG_GC or BG_GC. In this critical path, we totally don't need to do BG_GC at all. Let's avoid that. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-25 10:15:57 -08:00
Jaegeuk Kim	3c62be17d4	f2fs: support multiple devices This patch implements multiple devices support for f2fs. Given multiple devices by mkfs.f2fs, f2fs shows them entirely as one big volume under one f2fs instance. Internal block management is very simple, but we will modify block allocation and background GC policy to boost IO speed by exploiting them accoording to each device speed. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-25 10:15:13 -08:00
Jaegeuk Kim	6ae1be13e8	f2fs: revert segment allocation for direct IO Now we don't need to be too much careful about storage alignment for dio, since its speed becomes quite fast and we'd better avoid any misalignment first. Revert: `38aa0889b2` (f2fs: align direct_io'ed data to section) Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-25 10:15:02 -08:00
Jaegeuk Kim	bdb7d964c4	f2fs: assign segments correctly for direct_io Previously, we assigned CURSEG_WARM_DATA for direct_io, but if we have two or four logs, we do not use that type at all. Let's fix it. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-23 12:11:28 -08:00
Damien Le Moal	126606c7a9	f2fs: Trace reset zone events Similarly to the regular discard, trace zone reset events. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-23 12:11:24 -08:00
Damien Le Moal	f46e8809e8	f2fs: Reset sequential zones on zoned block devices When a zoned block device is mounted, discarding sections contained in sequential zones must reset the zone write pointer. For sections contained in conventional zones, the regular discard is used if the drive supports it. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-23 12:11:23 -08:00
Jaegeuk Kim	664ba972df	f2fs: use BIO_MAX_PAGES for bio allocation We don't need to allocate bio partially in order to maximize sequential writes. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-23 12:11:07 -08:00
Chao Yu	3a2ad5672b	f2fs: don't interrupt free nids building during nid allocation Let build_free_nids support sync/async methods, in allocation flow of nids, we use synchronuous method, so that we can avoid looping in alloc_nid when free memory is low; in unblock_operations and f2fs_balance_fs_bg we use asynchronuous method in where low memory condition can interrupt us. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-23 12:11:02 -08:00
Eric Biggers	0c0b471e43	f2fs: fix sparse warnings f2fs contained a number of endianness conversion bugs. Also, one function should have been 'static'. Found with sparse by running 'make C=2 CF=-D__CHECK_ENDIAN__ fs/f2fs/' Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-23 12:10:57 -08:00
Chao Yu	933439c8f3	f2fs: give a chance to detach from dirty list If there is no dirty pages in inode, we should give a chance to detach the inode from global dirty list, otherwise it needs to call another unnecessary .writepages for detaching. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-23 12:10:51 -08:00
Christoph Hellwig	70fd76140a	block,fs: use REQ_* flags directly Remove the WRITE_* and READ_SYNC wrappers, and just use the flags directly. Where applicable this also drops usage of the bio_set_op_attrs wrapper. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-11-01 09:43:26 -06:00
Chao Yu	0f34802858	f2fs: support checkpoint error injection This patch adds to support checkpoint error injection in f2fs for testing fatal error tolerance, it will be useful that it can simulate abnormal power off by f2fs itself instead of calling godown ioctl by running apps. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-09-30 17:34:35 -07:00
Yunlei He	9c094040c5	f2fs: remove redundant value definition This patch remove redundant value definition in build_sit_entries Signed-off-by: Yunlei He <heyunlei@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-09-30 17:34:32 -07:00
Chao Yu	aaec2b1d18	f2fs: introduce cp_lock to protect updating of ckpt_flags This patch introduces spinlock to protect updating process of ckpt_flags field in struct f2fs_checkpoint, it avoids incorrectly updating in race condition. Signed-off-by: Chao Yu <yuchao0@huawei.com> [Jaegeuk Kim: add __is_set_ckpt_flags likewise __set_ckpt_flags] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-09-30 17:34:20 -07:00
Jaegeuk Kim	a468f0ef51	f2fs: use crc and cp version to determine roll-forward recovery Previously, we used cp_version only to detect recoverable dnodes. In order to avoid same garbage cp_version, we needed to truncate the next dnode during checkpoint, resulting in additional discard or data write. If we can distinguish this by using crc in addition to cp_version, we can remove this overhead. There is backward compatibility concern where it changes node_footer layout. So, this patch introduces a new checkpoint flag, CP_CRC_RECOVERY_FLAG, to detect new layout. New layout will be activated only when this flag is set. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-09-30 10:05:46 -07:00
Yunlei He	5d4c0af41f	f2fs: preallocate blocks for encrypted file This patch allow preallocates data blocks for buffered aio writes in encrypted file. Signed-off-by: Yunlei He <heyunlei@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> [Jaegeuk Kim: fix to avoid BUG_ON] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-09-22 11:43:08 -07:00
Jaegeuk Kim	7f3037a5ec	f2fs: check free_sections for defragmentation Fix wrong condition check for defragmentation of a file. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-09-12 10:30:41 -07:00
Yunlei He	ed214a1183	f2fs: forbid to do fstrim if fs has some error This patch skip fstrim if sbi set SBI_NEED_FSCK flag Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-09-12 10:30:40 -07:00
Chao Yu	6ab2a3085e	f2fs: fix minor typo Correct typo from 'destory' to 'destroy'. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-09-07 18:53:48 -07:00
Chao Yu	275b66b09e	f2fs: support async discard Like most filesystems, f2fs will issue discard command synchronously, so when user trigger fstrim through ioctl, multiple discard commands will be issued serially with sync mode, which makes poor performance. In this patch we try to support async discard, so that all discard commands can be issued and be waited for endio in batch to improve performance. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-09-07 17:27:38 -07:00
Chao Yu	74fa5f3d43	f2fs: schedule in between two continous batch discards In batch discard approach of fstrim will grab/release gc_mutex lock repeatly, it makes contention of the lock becoming more intensive. So after one batch discards were issued in checkpoint and the lock was released, it's better to do schedule() to increase opportunity of grabbing gc_mutex lock for other competitors. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-09-07 17:27:33 -07:00
Chao Yu	e932835377	f2fs: check return value of write_checkpoint during fstrim During fstrim, if one of multiple write_checkpoint failed, break off and return error number to caller. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-08-29 18:31:11 -07:00
Chao Yu	d600af236d	f2fs: avoid unneeded loop in build_sit_entries When building each sit entry in cache, firstly, we will load it from sit page, and then check all entries in sit journal, if there is one updated entry in journal, cover cached entry with the journaled one. Actually, most of check operation is unneeded since we only need to update cached entries with journaled entries in batch, so changing the flow as below for more efficient: 1. load all sit entries into cache from sit pages; 2. update sit entries with journal. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-08-29 18:31:09 -07:00
Jaegeuk Kim	3e025740b9	f2fs: do not use discard_map for hard disks We don't need to keep discard_map, if disk does not support discard command. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-08-24 10:16:14 -07:00
Linus Torvalds	4fc29c1aa3	Merge tag 'for-f2fs-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "The major change in this version is mitigating cpu overheads on write paths by replacing redundant inode page updates with mark_inode_dirty calls. And we tried to reduce lock contentions as well to improve filesystem scalability. Other feature is setting F2FS automatically when detecting host-managed SMR. Enhancements: - ioctl to move a range of data between files - inject orphan inode errors - avoid flush commands congestion - support lazytime Bug fixes: - return proper results for some dentry operations - fix deadlock in add_link failure - disable extent_cache for fcollapse/finsert" * tag 'for-f2fs-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (68 commits) f2fs: clean up coding style and redundancy f2fs: get victim segment again after new cp f2fs: handle error case with f2fs_bug_on f2fs: avoid data race when deciding checkpoin in f2fs_sync_file f2fs: support an ioctl to move a range of data blocks f2fs: fix to report error number of f2fs_find_entry f2fs: avoid memory allocation failure due to a long length f2fs: reset default idle interval value f2fs: use blk_plug in all the possible paths f2fs: fix to avoid data update racing between GC and DIO f2fs: add maximum prefree segments f2fs: disable extent_cache for fcollapse/finsert inodes f2fs: refactor __exchange_data_block for speed up f2fs: fix ERR_PTR returned by bio f2fs: avoid mark_inode_dirty f2fs: move i_size_write in f2fs_write_end f2fs: fix to avoid redundant discard during fstrim f2fs: avoid mismatching block range for discard f2fs: fix incorrect f_bfree calculation in ->statfs f2fs: use percpu_rw_semaphore ...	2016-07-27 10:36:31 -07:00
Jaegeuk Kim	9dfa1baff7	f2fs: use blk_plug in all the possible paths This patch reverts `19a5f5e2ef` (f2fs: drop any block plugging), and adds blk_plug in write paths additionally. The main reason is that blk_start_plug can be used to wake up from low-power mode before submitting further bios. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-07-15 15:21:23 -07:00
Jaegeuk Kim	44a83499dd	f2fs: add maximum prefree segments In 1TB storage, we need to admit 22841 prefree segments, which can consume too much segments. This patch sets 8GB in max. prefree segments in that case. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-07-15 15:21:21 -07:00
Chao Yu	c24a0fd655	f2fs: fix to avoid redundant discard during fstrim With below test steps, f2fs will issue redundant discard when doing fstrim, the reason is that we issue discards for both prefree segments and consecutive freed region user wants to trim, part regions they covered are overlapped, here, we change to do not to issue any discards for prefree segments in trimmed range. 1. mount -t f2fs -o discard /dev/zram0 /mnt/f2fs 2. fstrim -o 0 -l 3221225472 -m 2097152 -v /mnt/f2fs/ 3. dd if=/dev/zero of=/mnt/f2fs/a bs=2M count=1 4. dd if=/dev/zero of=/mnt/f2fs/b bs=1M count=1 5. sync 6. rm /mnt/f2fs/a /mnt/f2fs/b 7. fstrim -o 0 -l 3221225472 -m 2097152 -v /mnt/f2fs/ Before: <...>-5428 [001] ...1 9511.052125: f2fs_issue_discard: dev = (251,0), blkstart = 0x2200, blklen = 0x200 <...>-5428 [001] ...1 9511.052787: f2fs_issue_discard: dev = (251,0), blkstart = 0x2200, blklen = 0x300 After: <...>-6764 [000] ...1 9720.382504: f2fs_issue_discard: dev = (251,0), blkstart = 0x2200, blklen = 0x300 Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-07-08 10:33:34 -07:00
Yunlei He	c7b41e1613	f2fs: avoid mismatching block range for discard This patch skip discard block range smaller than trim_minlen, and can not be merged by neighbour Signed-off-by: Yunlei He <heyunlei@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-07-08 10:33:33 -07:00
Jaegeuk Kim	ad4edb8314	f2fs: produce more nids and reduce readahead nats The readahead nat pages are more likely to be reclaimed quickly, so it'd better to gather more free nids in advance. And, let's keep some free nids as much as possible. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-07-06 10:44:08 -07:00
Jaegeuk Kim	52763a4b7a	f2fs: detect host-managed SMR by feature flag If mkfs.f2fs gives a feature flag for host-managed SMR, we can set mode=lfs by default. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-07-06 10:44:07 -07:00
Jaegeuk Kim	36abef4e79	f2fs: introduce mode=lfs mount option This mount option is to enable original log-structured filesystem forcefully. So, there should be no random writes for main area. Especially, this supports host-managed SMR device. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-06-13 11:55:21 -07:00
Jaegeuk Kim	19a5f5e2ef	f2fs: drop any block plugging In f2fs, we don't need to keep block plugging for NODE and DATA writes, since we already merged bios as much as possible. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-06-08 10:25:51 -07:00
Jaegeuk Kim	7dfeaa3220	f2fs: avoid reverse IO order for NODE and DATA There is a data race between allocate_data_block() and f2fs_sbumit_page_mbio(), which incur unnecessary reversed bio submission. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-06-08 10:25:50 -07:00
Mike Christie	04d328defd	f2fs: use bio op accessors Separate the op from the rq_flag_bits and have f2fs set/get the bio using bio_set_op_attrs/bio_op. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-06-07 13:41:38 -06:00
Mike Christie	4e49ea4a3d	block/fs/drivers: remove rw argument from submit_bio This has callers of submit_bio/submit_bio_wait set the bio->bi_rw instead of passing it in. This makes that use the same as generic_make_request and how we set the other bio fields. Signed-off-by: Mike Christie <mchristi@redhat.com> Fixed up fs/ext4/crypto.c Signed-off-by: Jens Axboe <axboe@fb.com>	2016-06-07 13:41:38 -06:00
Jaegeuk Kim	e589c2c477	f2fs: control not to exceed # of cached nat entries This is to avoid cache entry management overhead including radix tree. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-06-07 10:18:08 -07:00
Jaegeuk Kim	0a87f664d1	f2fs: detect congestion of flush command issues If flush commands do not incur any congestion, we don't need to throw that to dispatching queue which causes unnecessary latency. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-06-02 18:05:15 -07:00
Jaegeuk Kim	91942321e4	f2fs: use inode pointer for {set, clear}_inode_flag This patch refactors to use inode pointer for set_inode_flag and clear_inode_flag. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-06-02 18:05:07 -07:00
Chao Yu	c81ced05b2	f2fs: fix to clear page private flag Commit `28bc106b23` ("f2fs: support revoking atomic written pages") forgot to clear page private flag correctly, fix it. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-05-03 11:20:06 -07:00
Jaegeuk Kim	63c52d7878	f2fs: don't invalidate atomic page if successful If we committed atomic write successfully, we don't need to invalidate pages. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-04-15 08:50:25 -07:00
Jaegeuk Kim	26dc3d4424	f2fs: unset atomic/volatile flag in f2fs_release_file The atomic/volatile operation should be done in pair of start and commit ioctl. For example, if a killed process remains open-ended atomic operation, we should drop its flag as well as its atomic data. Otherwise, if sqlite initiates another operation which doesn't require atomic writes, it will lose every data, since f2fs still treats with them as atomic writes; nobody will trigger its commit. Reported-by: Miao Xie <miaoxie@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-04-15 08:49:47 -07:00
Kirill A. Shutemov	09cbfeaf1a	mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced long time ago with promise that one day it will be possible to implement page cache with bigger chunks than PAGE_SIZE. This promise never materialized. And unlikely will. We have many places where PAGE_CACHE_SIZE assumed to be equal to PAGE_SIZE. And it's constant source of confusion on whether PAGE_CACHE_* or PAGE_* constant should be used in a particular case, especially on the border between fs and mm. Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much breakage to be doable. Let's stop pretending that pages in page cache are special. They are not. The changes are pretty straight-forward: - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN}; - page_cache_get() -> get_page(); - page_cache_release() -> put_page(); This patch contains automated changes generated with coccinelle using script below. For some reason, coccinelle doesn't patch header files. I've called spatch for them manually. The only adjustment after coccinelle is revert of changes to PAGE_CAHCE_ALIGN definition: we are going to drop it later. There are few places in the code where coccinelle didn't reach. I'll fix them manually in a separate patch. Comments and documentation also will be addressed with the separate patch. virtual patch @@ expression E; @@ - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ expression E; @@ - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ @@ - PAGE_CACHE_SHIFT + PAGE_SHIFT @@ @@ - PAGE_CACHE_SIZE + PAGE_SIZE @@ @@ - PAGE_CACHE_MASK + PAGE_MASK @@ expression E; @@ - PAGE_CACHE_ALIGN(E) + PAGE_ALIGN(E) @@ expression E; @@ - page_cache_get(E) + get_page(E) @@ expression E; @@ - page_cache_release(E) + put_page(E) Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-04-04 10:41:08 -07:00
Chao Yu	f28b3434af	f2fs: introduce f2fs_update_data_blkaddr for cleanup Add a new help f2fs_update_data_blkaddr to clean up redundant codes. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-26 11:52:01 -08:00
Chao Yu	4356e48e64	f2fs crypto: fix incorrect positioning for GCing encrypted data page For now, flow of GCing an encrypted data page: 1) try to grab meta page in meta inode's mapping with index of old block address of that data page 2) load data of ciphertext into meta page 3) allocate new block address 4) write the meta page into new block address 5) update block address pointer in direct node page. Other reader/writer will use f2fs_wait_on_encrypted_page_writeback to check and wait on GCed encrypted data cached in meta page writebacked in order to avoid inconsistence among data page cache, meta page cache and data on-disk when updating. However, we will use new block address updated in step 5) as an index to lookup meta page in inner bio buffer. That would be wrong, and we will never find the GCing meta page, since we use the old block address as index of that page in step 1). This patch fixes the issue by adjust the order of step 1) and step 3), and in step 1) grab page with index generated in step 3). Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-26 11:51:58 -08:00
Chao Yu	7a9d75481b	f2fs: trace old block address for CoWed page This patch enables to trace old block address of CoWed page for better debugging. f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4f0, oldaddr = 0xfe8ab, newaddr = 0xfee90 rw = WRITE_SYNC, type = NODE f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4f8, oldaddr = 0xfe8b0, newaddr = 0xfee91 rw = WRITE_SYNC, type = NODE f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4fa, oldaddr = 0xfe8ae, newaddr = 0xfee92 rw = WRITE_SYNC, type = NODE f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x96, oldaddr = 0xf049b, newaddr = 0x2bbe rw = WRITE, type = DATA f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x97, oldaddr = 0xf049c, newaddr = 0x2bbf rw = WRITE, type = DATA f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x98, oldaddr = 0xf049d, newaddr = 0x2bc0 rw = WRITE, type = DATA f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x47, oldaddr = 0xffffffff, newaddr = 0xf2631 rw = WRITE, type = DATA f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x48, oldaddr = 0xffffffff, newaddr = 0xf2632 rw = WRITE, type = DATA f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x49, oldaddr = 0xffffffff, newaddr = 0xf2633 rw = WRITE, type = DATA Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 21:40:02 -08:00
Chao Yu	b7ad7512b8	f2fs: split journal cache from curseg cache In curseg cache, f2fs caches two different parts: - datas of current summary block, i.e. summary entries, footer info. - journal info, i.e. sparse nat/sit entries or io stat info. With this approach, 1) it may cause higher lock contention when we access or update both of the parts of cache since we use the same mutex lock curseg_mutex to protect the cache. 2) current summary block with last journal info will be writebacked into device as a normal summary block when flushing, however, we treat journal info as valid one only in current summary, so most normal summary blocks contain junk journal data, it wastes remaining space of summary block. So, in order to fix above issues, we split curseg cache into two parts: a) current summary block, protected by original mutex lock curseg_mutex b) journal cache, protected by newly introduced r/w semaphore journal_rwsem When loading curseg cache during ->mount, we store summary info and journal info into different caches; When doing checkpoint, we combine datas of two cache into current summary block for persisting. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 21:39:54 -08:00
Chao Yu	e9f5b8b8d6	f2fs: enhance IO path with block plug Try to use block plug in more place as below to let process cache bios as much as possible, in order to reduce lock overhead of queue in IO scheduler. 1) sync_meta_pages 2) ra_meta_pages 3) f2fs_balance_fs_bg Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 21:39:54 -08:00
Chao Yu	dfc08a12e4	f2fs: introduce f2fs_journal struct to wrap journal info Introduce a new structure f2fs_journal to wrap journal info in struct f2fs_summary_block for readability. struct f2fs_journal { union { __le16 n_nats; __le16 n_sits; }; union { struct nat_journal nat_j; struct sit_journal sit_j; struct f2fs_extra_info info; }; } __packed; struct f2fs_summary_block { struct f2fs_summary entries[ENTRIES_IN_SUM]; struct f2fs_journal journal; struct summary_footer footer; } __packed; Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 21:39:53 -08:00
Chao Yu	28bc106b23	f2fs: support revoking atomic written pages f2fs support atomic write with following semantics: 1. open db file 2. ioctl start atomic write 3. (write db file) * n 4. ioctl commit atomic write 5. close db file With this flow we can avoid file becoming corrupted when abnormal power cut, because we hold data of transaction in referenced pages linked in inmem_pages list of inode, but without setting them dirty, so these data won't be persisted unless we commit them in step 4. But we should still hold journal db file in memory by using volatile write, because our semantics of 'atomic write support' is incomplete, in step 4, we could fail to submit all dirty data of transaction, once partial dirty data was committed in storage, then after a checkpoint & abnormal power-cut, db file will be corrupted forever. So this patch tries to improve atomic write flow by adding a revoking flow, once inner error occurs in committing, this gives another chance to try to revoke these partial submitted data of current transaction, it makes committing operation more like an atomic one. If we're not lucky, once revoking operation was failed, EAGAIN will be reported to user for suggesting doing the recovery with held journal file, or retrying current transaction again. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 16:07:23 -08:00
Chao Yu	29b96b547e	f2fs: split drop_inmem_pages from commit_inmem_pages Split drop_inmem_pages from commit_inmem_pages for code readability, and prepare for the following modification. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 16:07:23 -08:00
Jaegeuk Kim	60b286c442	f2fs: use correct errno This patch is to fix misused error number. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 16:07:23 -08:00
Chao Yu	0c3a579758	f2fs: introduce f2fs_submit_merged_bio_cond f2fs use single bio buffer per type data (META/NODE/DATA) for caching writes locating in continuous block address as many as possible, after submitting, these writes may be still cached in bio buffer, so we have to flush cached writes in bio buffer by calling f2fs_submit_merged_bio. Unfortunately, in the scenario of high concurrency, bio buffer could be flushed by someone else before we submit it as below reasons: a) there is no space in bio buffer. b) add a request of different type (SYNC, ASYNC). c) add a discontinuous block address. For this condition, f2fs_submit_merged_bio will be devastating, because it could break the following merging of writes in bio buffer, split one big bio into two smaller one. This patch introduces f2fs_submit_merged_bio_cond which can do a conditional submitting with bio buffer, before submitting it will judge whether: - page in DATA type bio buffer is matching with specified page; - page in DATA type bio buffer is belong to specified inode; - page in NODE type bio buffer is belong to specified inode; If there is no eligible page in bio buffer, we will skip submitting step, result in gaining more chance to merge consecutive block IOs in bio cache. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 16:07:23 -08:00
Jaegeuk Kim	fec1d6576c	f2fs: use wait_for_stable_page to avoid contention In write_begin, if storage supports stable_page, we don't need to wait for writeback to update its contents. This patch introduces to use wait_for_stable_page instead of wait_on_page_writeback. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 16:07:23 -08:00
Chao Yu	0ab1435631	f2fs: correct search area in get_new_segment get_new_segment starts from current segment position, tries to search a free segment among its right neighbors locate in same section. But previously our search area was set as [current segment, max segment], which means we have to search to more bits in free_segmap bitmap for some worse cases. So here we correct the search area to [current segment, last segment in section] to avoid unnecessary searching. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 16:07:23 -08:00
Chao Yu	7d768d2c26	f2fs: flush dirty nat entries when exceeding threshold When testing f2fs with xfstest, generic/251 is stuck for long time, the case uses below serials to obtain fresh released space in device, in order to prepare for following fstrim test. 1. rm -rf /mnt/dir 2. mkdir /mnt/dir/ 3. cp -axT `pwd`/ /mnt/dir/ 4. goto 1 During preparing step, all nat entries will be cached in nat cache, most of them are dirty entries with invalid blkaddr, which means nodes related to these entries have been truncated, and they could be reused after the dirty entries been checkpointed. However, there was no checkpoint been triggered, so nid allocators (e.g. mkdir, creat) will run into long journey of iterating all NAT pages, looking for free nids in alloc_nid->build_free_nids. Here, in f2fs_balance_fs_bg we give another chance to do checkpoint to flush nat entries for reusing them in free nid cache when dirty entry count exceeds 10% of max count. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 16:07:23 -08:00
Chao Yu	0fd785eb93	f2fs: relocate is_merged_page Operations in is_merged_page is related to inner bio cache, move it to data.c. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 16:07:23 -08:00
Jaegeuk Kim	42190d2a86	f2fs: monitor the number of background checkpoint This patch adds to show the number of background checkpoint. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-01-11 15:56:42 -08:00
Jaegeuk Kim	d0239e1bf5	f2fs: detect idle time depending on user behavior This patch adds last time that user requested filesystem operations. This information is used to detect whether system is idle or not later. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-01-11 15:56:37 -08:00
Jaegeuk Kim	6beceb5427	f2fs: introduce time and interval facility This patch adds time and interval arrays to store some timing variables. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-01-11 15:36:27 -08:00
Jaegeuk Kim	2c4db1a6f6	f2fs: clean up f2fs_balance_fs This patch adds one parameter to clean up all the callers of f2fs_balance_fs. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-01-08 11:45:23 -08:00
Chao Yu	c34f42e2cb	f2fs: report error of do_checkpoint do_checkpoint and write_checkpoint can fail due to reasons like triggering in a readonly fs or encountering IO error of storage device. So it's better to report such error info to user, let user be aware of failure of doing checkpoint. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-12-30 10:14:09 -08:00
Chao Yu	36b35a0dbe	f2fs: support data flush in background Previously, when finishing a checkpoint, we have persisted all fs meta info including meta inode, node inode, dentry page of directory inode, so, after a sudden power cut, f2fs can recover from last checkpoint with full directory structure. But during checkpoint, we didn't flush dirty pages of regular and symlink inode, so such dirty datas still in memory will be lost in that moment of power off. In order to reduce the chance of lost data, this patch enables f2fs_balance_fs_bg with the ability of data flushing. It will try to flush user data before starting a checkpoint. So user's data written after last checkpoint which may not be fsynced could be saved. When we mount with data_flush option, after every period of cp_interval (could be configured in sysfs: /sys/fs/f2fs/device/cp_interval) seconds user data could be flushed into device once f2fs_balance_fs_bg was called in kworker thread or gc thread. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-12-17 09:53:26 -08:00
Jaegeuk Kim	80609448cd	f2fs: enhance the bit operation for SSR This patch enhances the existing bit operation when f2fs allocates SSR blocks. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-12-09 09:50:32 -08:00
Chao Yu	855639deca	f2fs: clean up code with __has_cursum_space Clean up codes in lookup_journal_in_cursum() with __has_cursum_space(). Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-12-04 12:07:55 -08:00
Chao Yu	f478f43fa0	f2fs: clear page uptodate when dropping cache for atomic write We should clear uptodate flag for all pages atomic written when we drop them, otherwise before these cached pages were reclaimed or invalidated eventually, we will see invalid data when hitting them again. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-12-04 11:52:35 -08:00
Fan Li	692223d132	f2fs: optimize __find_rev_next_bit 1. Skip __reverse_ulong if the bitmap is empty. 2. Reduce branches and codes. According to my test, the performance of this new version is 5% higher on an empty bitmap of 64bytes, and remains about the same in the worst scenario. Signed-off-by: Fan li <fanofcode.li@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-12-04 11:52:35 -08:00
Chao Yu	7fee740697	f2fs: fix to clear GCed flag for atomic written page Atomic write page can be GCed, after committing this kind of page, we should clear the GCed flag for it. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-10-22 09:37:13 -07:00
Jaegeuk Kim	2b246fb0f6	f2fs: don't need to submit bio on error case If commit_atomic_write is failed, we don't need to submit any bio. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-10-21 19:05:53 -07:00
Jaegeuk Kim	f96999c35f	f2fs: refactor __find_rev_next_{zero}_bit This patch refactors __find_rev_next_{zero}_bit which was disabled previously due to bugs. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-10-21 15:26:00 -07:00
Chao Yu	08b39fbd59	f2fs crypto: fix racing of accessing encrypted page among different competitors Since we use different page cache (normally inode's page cache for R/W and meta inode's page cache for GC) to cache the same physical block which is belong to an encrypted inode. Writeback of these two page cache should be exclusive, but now we didn't handle writeback state well, so there may be potential racing problem: a) kworker: f2fs_gc: - f2fs_write_data_pages - f2fs_write_data_page - do_write_data_page - write_data_page - f2fs_submit_page_mbio (page#1 in inode's page cache was queued in f2fs bio cache, and be ready to write to new blkaddr) - gc_data_segment - move_encrypted_block - pagecache_get_page (page#2 in meta inode's page cache was cached with the invalid datas of physical block located in new blkaddr) - f2fs_submit_page_mbio (page#1 was submitted, later, page#2 with invalid data will be submitted) b) f2fs_gc: - gc_data_segment - move_encrypted_block - f2fs_submit_page_mbio (page#1 in meta inode's page cache was queued in f2fs bio cache, and be ready to write to new blkaddr) user thread: - f2fs_write_begin - f2fs_submit_page_bio (we submit the request to block layer to update page#2 in inode's page cache with physical block located in new blkaddr, so here we may read garbage data from new blkaddr since GC hasn't writebacked the page#1 yet) This patch fixes above potential racing problem for encrypted inode. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-10-13 09:52:34 -07:00
Chao Yu	26879fb101	f2fs: support lower priority asynchronous readahead in ra_meta_pages Now, we use ra_meta_pages to reads continuous physical blocks as much as possible to improve performance of following reads. However, ra_meta_pages uses a synchronous readahead approach by submitting bio with READ, as READ is with high priority, it can not be used in the case of preloading blocks, and it's not sure when these RAed pages will be used. This patch supports asynchronous readahead in ra_meta_pages by tagging bio with READA flag in order to allow preloading. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-10-12 14:03:15 -07:00
Chao Yu	2b947003fa	f2fs: don't tag REQ_META for temporary non-meta pages In recovery or checkpoint flow, we grab pages temporarily in meta inode's mapping for caching temporary data, actually, datas in these pages were not meta data of f2fs, but still we tag them with REQ_META flag. However, lower device like eMMC may do some optimization for data of such type. So in order to avoid wrong optimization, we'd better remove such flag for temporary non-meta pages. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-10-12 14:01:46 -07:00
Jaegeuk Kim	6e2c64ad7c	f2fs: fix SSA updates resulting in corruption The f2fs_collapse_range and f2fs_insert_range changes the block addresses directly. But that can cause uncovered SSA updates. In that case, we need to give up to change the block addresses and do buffered writes to keep filesystem consistency. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-10-12 13:38:02 -07:00
Jaegeuk Kim	60b99b486b	f2fs: introduce a periodic checkpoint flow This patch introduces a periodic checkpoint feature. Note that, this is not enforcing to conduct checkpoints very strictly in terms of trigger timing, instead just hope to help user experiences. The default value is 60 seconds. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-10-09 16:20:57 -07:00
Chao Yu	d530d4d8e2	f2fs: support synchronous gc in ioctl This patch drops in batches gc triggered through ioctl, since user can easily control the gc by designing the loop around the ->ioctl. We support synchronous gc by forcing using FG_GC in f2fs_gc, so with it, user can make sure that in this round all blocks gced were persistent in the device until ioctl returned. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-10-09 16:20:56 -07:00
Jaegeuk Kim	39307a8e24	f2fs: use vmalloc to handle -ENOMEM error This patch introduces f2fs_kvmalloc to avoid -ENOMEM during mount. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-10-09 16:20:55 -07:00
Jaegeuk Kim	80c545055d	f2fs: use __GFP_NOFAIL to avoid infinite loop __GFP_NOFAIL can avoid retrying the whole path of kmem_cache_alloc and bio_alloc. And, it also fixes the use cases of GFP_ATOMIC correctly. Suggested-by: Chao Yu <chao2.yu@samsung.com> Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-24 09:37:21 -07:00
Jaegeuk Kim	740432f835	f2fs: handle failed bio allocation As the below comment of bio_alloc_bioset, f2fs can allocate multiple bios at the same time. So, we can't guarantee that bio is allocated all the time. " * When @bs is not NULL, if %__GFP_WAIT is set then bio_alloc will always be * able to allocate a bio. This is due to the mempool guarantees. To make this * work, callers must never allocate more than 1 bio at a time from this pool. * Callers that need to allocate more than 1 bio must always submit the * previously allocated bio for IO before attempting to allocate a new one. * Failure to do so can cause deadlocks under memory pressure. " Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-20 09:00:09 -07:00
Chao Yu	31696580bf	f2fs: shrink free_nids entries This patch introduces __count_free_nids/try_to_free_nids and registers them in slab shrinker for shrinking under memory pressure. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-20 09:00:06 -07:00
Jaegeuk Kim	47e70ca46f	f2fs: do not assign a new segment for dio under space shortage If there is not enough free segment, we should not assign a new segment explicitly. Otherwise, we can run out of free segment. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-14 16:02:13 -07:00
Chao Yu	decd36b6c4	f2fs: remove inmem radix tree Previously, we use radix tree to index all registered page entries for atomic file, but now we only use radix tree to see whether current page is indexed or not, since the other user of radix tree is gone in commit `042b7816aa` ("f2fs: remove unnecessary call to invalidate inmemory pages"). So in this patch, we try to use one more efficient way: Introducing a macro ATOMIC_WRITTEN_PAGE, and setting it as page private value to indicate page indexing status. By using this way, we can save memory and lookup time. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-11 11:31:14 -07:00
Chao Yu	e90c2d2850	f2fs: invalidate temporary meta page To avoid meeting garbage data in next free node block at the end of warm node chain when doing recovery, we will try to zero out that invalid block. If the device is not support discard, our way for zeroing out block is: grabbing a temporary zeroed page in meta inode, then, issue write request with this page. But, we forget to release that temporary page, so our memory usage will increase without gaining any hit ratio benefit, so it's better to free it for saving memory. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-05 08:19:21 -07:00
Jaegeuk Kim	edb27deea7	f2fs: handle error cases in commit_inmem_pages This patch adds to handle error cases in commit_inmem_pages. If an error occurs, it stops to write the pages and return the error right away. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-05 08:08:15 -07:00
Jaegeuk Kim	554df79e52	f2fs: shrink extent_cache entries This patch registers shrinking extent_caches. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-04 14:09:55 -07:00
Jaegeuk Kim	1b38dc8e74	f2fs: shrink nat_cache entries This patch registers shrinking nat_cache entries. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-04 14:09:55 -07:00
Jaegeuk Kim	6282adbf93	f2fs: call set_page_dirty to attach i_wb for cgroup The cgroup attaches inode->i_wb via mark_inode_dirty and when set_page_writeback is called, __inc_wb_stat() updates i_wb's stat. So, we need to explicitly call set_page_dirty->__mark_inode_dirty in prior to any writebacking pages. This patch should resolve the following kernel panic reported by Andreas Reis. https://bugzilla.kernel.org/show_bug.cgi?id=101801 --- Comment #2 from Andreas Reis <andreas.reis@gmail.com> --- BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 IP: [<ffffffff8149deea>] __percpu_counter_add+0x1a/0x90 PGD 2951ff067 PUD 2df43f067 PMD 0 Oops: 0000 [#1] PREEMPT SMP Modules linked in: CPU: 7 PID: 10356 Comm: gcc Tainted: G W 4.2.0-1-cu #1 Hardware name: Gigabyte Technology Co., Ltd. G1.Sniper M5/G1.Sniper M5, BIOS T01 02/03/2015 task: ffff880295044f80 ti: ffff880295140000 task.ti: ffff880295140000 RIP: 0010:[<ffffffff8149deea>] [<ffffffff8149deea>] __percpu_counter_add+0x1a/0x90 RSP: 0018:ffff880295143ac8 EFLAGS: 00010082 RAX: 0000000000000003 RBX: ffffea000a526d40 RCX: 0000000000000001 RDX: 0000000000000020 RSI: 0000000000000001 RDI: 0000000000000088 RBP: ffff880295143ae8 R08: 0000000000000000 R09: ffff88008f69bb30 R10: 00000000fffffffa R11: 0000000000000000 R12: 0000000000000088 R13: 0000000000000001 R14: ffff88041d099000 R15: ffff880084a205d0 FS: 00007f8549374700(0000) GS:ffff88042f3c0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000000a8 CR3: 000000033e1d5000 CR4: 00000000001406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Stack: 0000000000000000 ffffea000a526d40 ffff880084a20738 ffff880084a20750 ffff880295143b48 ffffffff811cc91e ffff880000000000 0000000000000296 0000000000000000 ffff880417090198 0000000000000000 ffffea000a526d40 Call Trace: [<ffffffff811cc91e>] __test_set_page_writeback+0xde/0x1d0 
[<ffffffff813fee87>] do_write_data_page+0xe7/0x3a0 [<ffffffff813faeea>] gc_data_segment+0x5aa/0x640 [<ffffffff813fb0b8>] do_garbage_collect+0x138/0x150 [<ffffffff813fb3fe>] f2fs_gc+0x1be/0x3e0 [<ffffffff81405541>] f2fs_balance_fs+0x81/0x90 [<ffffffff813ee357>] f2fs_unlink+0x47/0x1d0 [<ffffffff81239329>] vfs_unlink+0x109/0x1b0 [<ffffffff8123e3d7>] do_unlinkat+0x287/0x2c0 [<ffffffff8123ebc6>] SyS_unlink+0x16/0x20 [<ffffffff81942e2e>] entry_SYSCALL_64_fastpath+0x12/0x71 Code: 41 5e 5d c3 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 55 49 89 f5 41 54 49 89 fc 53 48 83 ec 08 65 ff 05 e6 d9 b6 7e <48> 8b 47 20 48 63 ca 65 8b 18 48 63 db 48 01 f3 48 39 cb 7d 0a RIP [<ffffffff8149deea>] __percpu_counter_add+0x1a/0x90 RSP <ffff880295143ac8> CR2: 00000000000000a8 ---[ end trace 5132449a58ed93a3 ]--- note: gcc[10356] exited with preempt_count 2 Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-07-25 08:54:26 -07:00


318 Commits