In this round, we've added two small interfaces, 1) GC_URGENT_LOW mode for
performance, and 2) F2FS_IOC_SEC_TRIM_FILE ioctl for security. The new GC
mode allows Android to run some lower priority GCs in background, while new
ioctl discards user information without race condition when the account is
removed. In addition, some patches were merged to address latency-related
issues. We've fixed some compression-related bug fixes as well as edge race
conditions.
Enhancement:
- add GC_URGENT_LOW mode in gc_urgent
- introduce F2FS_IOC_SEC_TRIM_FILE ioctl
- bypass racy readahead to improve read latencies
- shrink node_write lock coverage to avoid long latency
Bug fix:
- fix missing compression flag control, i_size, and mount option
- fix deadlock between quota writes and checkpoint
- remove inode eviction path in synchronous path to avoid deadlock
- fix to wait GCed compressed page writeback
- fix a kernel panic in f2fs_is_compressed_page
- check page dirty status before writeback
- wait page writeback before update in node page write flow
- fix a race condition between f2fs_write_end_io and f2fs_del_fsync_node_entry
We've added some minor sanity checks and refactored trivial code blocks for
better readability and debugging information.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAl8xhqMACgkQQBSofoJI
UNLYgA//WMoOqBACDuOWwYmgQ8oq4vrH2LOwssZF9/77vEfaHKc+TSq1il54lcUl
MPEx7FK54CnfT8VLLR5ByobZFyH9FFeAw4FN4LBhcfE8jh8ysAGjeoZjwfcmJF6R
cVKtn8ltUpgH3IEUuPjTiKkVNHfVJxuuL3zHbg1CEl+AkR6NJ/U9kNLwf7ZgPWq2
I0qwyXRlUIEChhyPZB+Y6RsdGjkeievKld56DMCgG73f4yHRO/yBcrfsN875sGdM
ALL+mYiunMT6aXcfoiQiAjeImoNajuflI6Zso2Sk8Vl6sBj0QwAuEBF5x1Z5e1mt
YVYNuC4ucqsDBKpOqtsPP0MFTC2T5Rr9wWXjqv+9TjN7zvJ8xx+zDWtQxvI2bpqB
4ZRxaJP45aThLYh/SEYDmj+ppyPtfLDeG0HzUkwMmuopf9eg+kxGPjBsZewgkCKg
kmMKU0P7deGlkrWLUcz2vm0Lso+ieGm0IeLOQl9/rOLu3IQQFia0Vla7dLDgqF0P
sz+udIiBztC3JPEmEZhfayA6P6e1TyWQUdquL08jp+DZD17gPqcaZDhHr62U5rmK
7zoiZqkR03SbNaFhBhhoVOaAVcAnF0pSIgqzkCa3dVXxp1QV+JfD9CGR9NFyiIqB
HK5RPFskIUCg0K2LSaAKbyoFWa/fJ8ZD8/CbFWcnXfWzoaaSkmc=
=PjaF
-----END PGP SIGNATURE-----
Merge tag 'f2fs-for-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"In this round, we've added two small interfaces: (a) GC_URGENT_LOW
mode for performance and (b) F2FS_IOC_SEC_TRIM_FILE ioctl for
security.
The new GC mode allows Android to run some lower priority GCs in
background, while new ioctl discards user information without race
condition when the account is removed.
In addition, some patches were merged to address latency-related
issues. We've fixed some compression-related bug fixes as well as edge
race conditions.
Enhancements:
- add GC_URGENT_LOW mode in gc_urgent
- introduce F2FS_IOC_SEC_TRIM_FILE ioctl
- bypass racy readahead to improve read latencies
- shrink node_write lock coverage to avoid long latency
Bug fixes:
- fix missing compression flag control, i_size, and mount option
- fix deadlock between quota writes and checkpoint
- remove inode eviction path in synchronous path to avoid deadlock
- fix to wait GCed compressed page writeback
- fix a kernel panic in f2fs_is_compressed_page
- check page dirty status before writeback
- wait page writeback before update in node page write flow
- fix a race condition between f2fs_write_end_io and f2fs_del_fsync_node_entry
We've added some minor sanity checks and refactored trivial code
blocks for better readability and debugging information"
* tag 'f2fs-for-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (52 commits)
f2fs: prepare a waiter before entering io_schedule
f2fs: update_sit_entry: Make the judgment condition of f2fs_bug_on more intuitive
f2fs: replace test_and_set/clear_bit() with set/clear_bit()
f2fs: make file immutable even if releasing zero compression block
f2fs: compress: disable compression mount option if compression is off
f2fs: compress: add sanity check during compressed cluster read
f2fs: use macro instead of f2fs verity version
f2fs: fix deadlock between quota writes and checkpoint
f2fs: correct comment of f2fs_exist_written_data
f2fs: compress: delay temp page allocation
f2fs: compress: fix to update isize when overwriting compressed file
f2fs: space related cleanup
f2fs: fix use-after-free issue
f2fs: Change the type of f2fs_flush_inline_data() to void
f2fs: add F2FS_IOC_SEC_TRIM_FILE ioctl
f2fs: should avoid inode eviction in synchronous path
f2fs: segment.h: delete a duplicated word
f2fs: compress: fix to avoid memory leak on cc->cpages
f2fs: use generic names for generic ioctls
f2fs: don't keep meta inode pages used for compressed block migration
...
Current judgment condition of f2fs_bug_on in function update_sit_entry():
new_vblocks >> (sizeof(unsigned short) << 3) ||
new_vblocks > sbi->blocks_per_seg
which equivalents to:
new_vblocks < 0 || new_vblocks > sbi->blocks_per_seg
The latter is more intuitive.
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reported-by: Jack Qiu <jack.qiu@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Since set/clear_inode_flag() don't need to return value to show
if flag is set, we can just call set/clear_bit() here.
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
When we use F2FS_IOC_RELEASE_COMPRESS_BLOCKS ioctl, if we can't find
any compressed blocks in the file even with large file size, the
ioctl just ends up without changing the file's status as immutable.
It makes the user, who expects that the file is immutable when it
returns successfully, confused.
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
If CONFIG_F2FS_FS_COMPRESSION is off, don't allow to configure or
show compression related mount option.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In f2fs_read_multi_pages(), we don't have to check cluster's type
again, since overwrite or partial truncation need page lock in
cluster which has already been held by reader, so cluster's type
is stable, let's change check condition to sanity check.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Because fsverity_descriptor_location.version is constant,
so use macro for better reading.
Signed-off-by: Jack Qiu <jack.qiu@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Function parameter mode could be TRANS_DIR_INO.
Signed-off-by: Jack Qiu <jack.qiu@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Currently, we allocate temp pages which is used to pad hole in
cluster during read IO submission, it may take long time before
releasing them in f2fs_decompress_pages(), since they are only
used as temp output buffer in decompression context, so let's
just do the allocation in that context to reduce time of memory
pool resource occupation.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
We missed to update isize of compressed file in write_end() with
below case:
cluster size is 16KB
- write 14KB data from offset 0
- overwrite 16KB data from offset 0
Fixes: 4c8ff7095b ("f2fs: support data compression")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Just for code style, no logic change
1. delete useless space
2. change spaces into tab
Signed-off-by: Jack Qiu <jack.qiu@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
During umount, f2fs_put_super() unregisters procfs entries after
f2fs_destroy_segment_manager(), it may cause use-after-free
issue when umount races with procfs accessing, fix it by relocating
f2fs_unregister_sysfs().
[Chao Yu: change commit title/message a bit]
Signed-off-by: Li Guifu <bluce.liguifu@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
The return value of f2fs_flush_inline_data() is not used,
so delete it.
Signed-off-by: Jia Yang <jiayang5@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Added a new ioctl to send discard commands or/and zero out
to selected data area of a regular file for security reason.
The way of handling range.len of F2FS_IOC_SEC_TRIM_FILE:
1. Added -1 value support for range.len to secure trim the whole blocks
starting from range.start regardless of i_size.
2. If the end of the range passes over the end of file, it means until
the end of file (i_size).
3. ignored the case of that range.len is zero to prevent the function
from making end_addr zero and triggering different behaviour of
the function.
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
https://bugzilla.kernel.org/show_bug.cgi?id=208565
PID: 257 TASK: ecdd0000 CPU: 0 COMMAND: "init"
#0 [<c0b420ec>] (__schedule) from [<c0b423c8>]
#1 [<c0b423c8>] (schedule) from [<c0b459d4>]
#2 [<c0b459d4>] (rwsem_down_read_failed) from [<c0b44fa0>]
#3 [<c0b44fa0>] (down_read) from [<c044233c>]
#4 [<c044233c>] (f2fs_truncate_blocks) from [<c0442890>]
#5 [<c0442890>] (f2fs_truncate) from [<c044d408>]
#6 [<c044d408>] (f2fs_evict_inode) from [<c030be18>]
#7 [<c030be18>] (evict) from [<c030a558>]
#8 [<c030a558>] (iput) from [<c047c600>]
#9 [<c047c600>] (f2fs_sync_node_pages) from [<c0465414>]
#10 [<c0465414>] (f2fs_write_checkpoint) from [<c04575f4>]
#11 [<c04575f4>] (f2fs_sync_fs) from [<c0441918>]
#12 [<c0441918>] (f2fs_do_sync_file) from [<c0441098>]
#13 [<c0441098>] (f2fs_sync_file) from [<c0323fa0>]
#14 [<c0323fa0>] (vfs_fsync_range) from [<c0324294>]
#15 [<c0324294>] (do_fsync) from [<c0324014>]
#16 [<c0324014>] (sys_fsync) from [<c0108bc0>]
This can be caused by flush_dirty_inode() in f2fs_sync_node_pages() where
iput() requires f2fs_lock_op() again resulting in livelock.
Reported-by: Zhiguo Niu <Zhiguo.Niu@unisoc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Drop the repeated word "the" in a comment.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Chao Yu <chao@kernel.org>
Cc: linux-f2fs-devel@lists.sourceforge.net
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Memory allocated for storing compressed pages' poitner should be
released after f2fs_write_compressed_pages(), otherwise it will
cause memory leak issue.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Fixes: 4c8ff7095b ("f2fs: support data compression")
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Don't define F2FS_IOC_* aliases to ioctls that already have a generic
FS_IOC_* name. These aliases are unnecessary, and they make it unclear
which ioctls are f2fs-specific and which are generic.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This is an effort to eliminate the uninitialized_var() macro[1].
The use of this macro is the wrong solution because it forces off ANY
analysis by the compiler for a given variable. It even masks "unused
variable" warnings.
Quoted from Linus[2]:
"It's a horrible thing to use, in that it adds extra cruft to the
source code, and then shuts up a compiler warning (even the _reliable_
warnings from gcc)."
Fix it by remove this variable since it is not needed at all.
[1] https://github.com/KSPP/linux/issues/81
[2] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/
Suggested-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Link: https://lore.kernel.org/r/20200615085132.166470-1-yanaijie@huawei.com
Signed-off-by: Kees Cook <keescook@chromium.org>
meta inode's pages are used for encrypted, verity and compressed blocks,
so the meta inode's cache invalidation condition in do_checkpoint() should
consider compression as well, not just for verity and encryption, fix it.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Wire up f2fs to support inline encryption via the helper functions which
fs/crypto/ now provides. This includes:
- Adding a mount option 'inlinecrypt' which enables inline encryption
on encrypted files where it can be used.
- Setting the bio_crypt_ctx on bios that will be submitted to an
inline-encrypted file.
- Not adding logically discontiguous data to bios that will be submitted
to an inline-encrypted file.
- Not doing filesystem-layer crypto on inline-encrypted files.
This patch includes a fix for a race during IPU by
Sahitya Tummala <stummala@codeaurora.org>
Signed-off-by: Satya Tangirala <satyat@google.com>
Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Link: https://lore.kernel.org/r/20200702015607.1215430-4-satyat@google.com
Co-developed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
- don't panic kernel if f2fs_get_node_page() fails in
f2fs_recover_inline_data() or f2fs_recover_inline_xattr();
- return error number of f2fs_truncate_blocks() to
f2fs_recover_inline_data()'s caller;
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Added a new gc_urgent mode, GC_URGENT_LOW, in which mode
F2FS will lower the bar of checking idle in order to
process outstanding discard commands and GC a little bit
aggressively.
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
If two readahead threads having same offset enter in readpages, every read
IOs are split and issued to the disk which giving lower bandwidth.
This patch tries to avoid redundant readahead calls.
Fixes one build error reported by Randy.
Fix build error when F2FS_FS_COMPRESSION is not set/enabled.
This label is needed in either case.
../fs/f2fs/data.c: In function ‘f2fs_mpage_readpages’:
../fs/f2fs/data.c:2327:5: error: label ‘next_page’ used but not defined
goto next_page;
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
If f2fs_grab_cache_page() fails, it needs to return -ENOMEM.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
The parameter op_flag is not used in f2fs_get_read_data_page(),
but it is used in f2fs_grab_read_bio(). Obviously, op_flag is
not passed to f2fs_grab_read_bio() successfully. We need to add
parameter in f2fs_submit_page_read() to pass it.
The case:
- gc_data_segment
- f2fs_get_read_data_page(.., op_flag = REQ_RAHEAD,..)
- f2fs_submit_page_read
- f2fs_grab_read_bio(.., op_flag = 0, ..)
Signed-off-by: Jia Yang <jiayang5@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
to two independent functions:
- f2fs_allocate_new_segment() for specified type segment allocation
- f2fs_allocate_new_segments() for all data type segments allocation
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
if get_node_path() return -E2BIG and trace of
f2fs_truncate_inode_blocks_enter/exit enabled
then the matching-pair of trace_exit will lost
in log.
Signed-off-by: Yubo Feng <fengyubo3@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In the f2fs_unlink we do not add trace end for some
error paths, just add.
Signed-off-by: Lihong Kou <koulihong@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In f2fs_write_raw_pages(), we need to check page dirty status before
writeback, because there could be a racer (e.g. reclaimer) helps
writebacking the dirty page.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
The parameter compr is unused in the f2fs_cluster_blocks function
so we no longer need to pass it as a parameter.
Signed-off-by: Wang Xiaojun <wangxiaojun11@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
to show f2fs_fiemap()'s result as below:
f2fs_fiemap: dev = (251,0), ino = 7, lblock:0, pblock:1625292800, len:2097152, flags:0, ret:0
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
to show f2fs_bmap()'s result as below:
f2fs_bmap: dev = (251,0), ino = 7, lblock:0, pblock:396800
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
If compression is disable, we should return zero rather than -EOPNOTSUPP
to indicate f2fs_bmap() is not supported.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In current version, @state will only be FREE_NID. This parameter
has no real effect so remove it to keep clean.
Signed-off-by: Liu Song <liu.song11@zte.com.cn>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
stakable/stackable
Signed-off-by: Liu Song <fishland@aliyun.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Filesystem including f2fs should support stable page for special
device like software raid, however there is one missing path that
page could be updated while it is writeback state as below, fix
this.
- gc_node_segment
- f2fs_move_node_page
- __write_node_page
- set_page_writeback
- do_read_inode
- f2fs_init_extent_tree
- __f2fs_init_extent_tree
i_ext->len = 0;
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
When f2fs_ioc_gc_range performs multiple segments gc ops, the return
value of f2fs_ioc_gc_range is determined by the last segment gc ops.
If its ops failed, the f2fs_ioc_gc_range will be considered to be failed
despite some of previous segments gc ops succeeded. Therefore, so we
fix: Redefine the return value of getting victim ops and add exception
handle for f2fs_gc. In particular, 1).if target has no valid block, it
will go on. 2).if target sectoion has valid block(s), but it is current
section, we will reminder the caller.
Signed-off-by: Qilong Zhang <zhangqilong3@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Use validation of @fio to inidcate whether caller want to serialize IOs
in io.io_list or not, then @add_list will be redundant, remove it.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
- to avoid race between checkpoint and quota file writeback, it
just needs to hold read lock of node_write in writeback path.
- node_write lock has covered all LFS data write paths, it's not
necessary, we only need to hold node_write lock at write path of
quota file.
This refactors commit ca7f76e680 ("f2fs: fix wrong discard space").
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Use kfree() instead of kvfree() to free variables allocated
by match_strdup(). Because the memory is allocated with kmalloc
inside match_strdup().
Signed-off-by: Wang Xiaojun <wangxiaojun11@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Assume each section has 4 segment:
.___________________________.
|_Segment0_|_..._|_Segment3_|
. .
. .
.__________.
|_section0_|
Segment 0~2 has 0 valid block, segment 3 has 512 valid blocks.
It will fail if we want to gc section0 in this scenes,
because all 4 segments in section0 is not dirty.
So we should use dirty section bitmap instead of dirty segment bitmap
to get right victim section.
Signed-off-by: Jack Qiu <jack.qiu@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>