linux/fs
Dave Chinner a837eca241 iomap: Revert "fs/iomap.c: get/put the page in iomap_page_create/release()"
This reverts commit 61c6de6672.

The reverted commit added page reference counting to iomap page
structures that are used to track block size < page size state. This
was supposed to align the code with page migration page accounting
assumptions, but what it has done instead is break XFS filesystems.
Every fstests run I've done on sub-page block size XFS filesystems
has since picking up this commit 2 days ago has failed with bad page
state errors such as:

# ./run_check.sh "-m rmapbt=1,reflink=1 -i sparse=1 -b size=1k" "generic/038"
....
SECTION       -- xfs
FSTYP         -- xfs (debug)
PLATFORM      -- Linux/x86_64 test1 4.20.0-rc6-dgc+
MKFS_OPTIONS  -- -f -m rmapbt=1,reflink=1 -i sparse=1 -b size=1k /dev/sdc
MOUNT_OPTIONS -- /dev/sdc /mnt/scratch

generic/038 454s ...
 run fstests generic/038 at 2018-12-20 18:43:05
 XFS (sdc): Unmounting Filesystem
 XFS (sdc): Mounting V5 Filesystem
 XFS (sdc): Ending clean mount
 BUG: Bad page state in process kswapd0  pfn:3a7fa
 page:ffffea0000ccbeb0 count:0 mapcount:0 mapping:ffff88800d9b6360 index:0x1
 flags: 0xfffffc0000000()
 raw: 000fffffc0000000 dead000000000100 dead000000000200 ffff88800d9b6360
 raw: 0000000000000001 0000000000000000 00000000ffffffff
 page dumped because: non-NULL mapping
 CPU: 0 PID: 676 Comm: kswapd0 Not tainted 4.20.0-rc6-dgc+ #915
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014
 Call Trace:
  dump_stack+0x67/0x90
  bad_page.cold.116+0x8a/0xbd
  free_pcppages_bulk+0x4bf/0x6a0
  free_unref_page_list+0x10f/0x1f0
  shrink_page_list+0x49d/0xf50
  shrink_inactive_list+0x19d/0x3b0
  shrink_node_memcg.constprop.77+0x398/0x690
  ? shrink_slab.constprop.81+0x278/0x3f0
  shrink_node+0x7a/0x2f0
  kswapd+0x34b/0x6d0
  ? node_reclaim+0x240/0x240
  kthread+0x11f/0x140
  ? __kthread_bind_mask+0x60/0x60
  ret_from_fork+0x24/0x30
 Disabling lock debugging due to kernel taint
....

The failures are from anyway that frees pages and empties the
per-cpu page magazines, so it's not a predictable failure or an easy
to debug failure.

generic/038 is a reliable reproducer of this problem - it has a 9 in
10 failure rate on one of my test machines. Failure on other
machines have been at random points in fstests runs but every run
has ended up tripping this problem. Hence generic/038 was used to
bisect the failure because it was the most reliable failure.

It is too close to the 4.20 release (not to mention holidays) to
try to diagnose, fix and test the underlying cause of the problem,
so reverting the commit is the only option we have right now. The
revert has been tested against a current tot 4.20-rc7+ kernel across
multiple machines running sub-page block size XFs filesystems and
none of the bad page state failures have been seen.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Cc: Piotr Jaroszynski <pjaroszynski@nvidia.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: William Kucharski <william.kucharski@oracle.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Brian Foster <bfoster@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-20 07:22:51 -08:00
..
9p Merge branch 'work.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-11-01 19:58:52 -07:00
adfs
affs
afs Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-11-30 10:47:50 -08:00
autofs
befs
bfs bfs: add sanity check at bfs_fill_super() 2018-11-03 10:09:38 -07:00
btrfs for-4.20-rc5-tag 2018-12-05 09:58:17 -08:00
cachefiles fscache, cachefiles: remove redundant variable 'cache' 2018-11-30 16:00:58 +00:00
ceph ceph: make 'nocopyfrom' a default mount option 2018-12-11 18:22:17 +01:00
cifs CIFS: Avoid returning EBUSY to upper layer VFS 2018-12-07 00:59:23 -06:00
coda
configfs
cramfs Make the Cramfs code more robust against filesystem corruptions, 2018-10-30 12:46:25 -07:00
crypto
debugfs
devpts
dlm iov_iter: Separate type from direction and use accessor functions 2018-10-24 00:41:07 +01:00
ecryptfs ecryptfs_rename(): verify that lower dentries are still OK after lock_rename() 2018-10-09 23:33:17 -04:00
efivarfs
efs
exofs fs/exofs: only use true/false for asignment of bool type variable 2018-10-18 02:04:59 -04:00
exportfs exportfs: do not read dentry after free 2018-11-23 09:08:17 -05:00
ext2 ext2: fix potential use after free 2018-11-27 10:21:15 +01:00
ext4 A large number of ext4 bug fixes, mostly buffer and memory leaks on 2018-11-11 16:53:02 -06:00
f2fs Merge branch 'xarray' of git://git.infradead.org/users/willy/linux-dax 2018-10-28 11:35:40 -07:00
fat fat: truncate inode timestamp updates in setattr 2018-10-31 08:54:14 -07:00
freevxfs
fscache fscache: fix race between enablement and dropping of object 2018-11-30 15:57:31 +00:00
fuse fuse: continue to send FUSE_RELEASEDIR when FUSE_OPEN returns ENOSYS 2018-12-11 21:47:28 +01:00
gfs2 Merge tag 'gfs2-4.20.fixes3' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 2018-11-16 11:38:14 -06:00
hfs hfs: do not free node before using 2018-11-30 14:56:14 -08:00
hfsplus hfsplus: do not free node before using 2018-11-30 14:56:14 -08:00
hostfs
hpfs
hugetlbfs
isofs
jbd2
jffs2 Merge branch 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2018-10-24 11:22:39 +01:00
jfs
kernfs mm: zero-seek shrinkers 2018-10-26 16:26:33 -07:00
lockd lockd: fix access beyond unterminated strings in prints 2018-10-29 16:58:04 -04:00
minix
nfs nfs: don't dirty kernel pages read by direct-io 2018-12-02 09:43:56 -05:00
nfs_common
nfsd nfsd: COPY and CLONE operations require the saved filehandle to be set 2018-11-08 12:11:45 -05:00
nilfs2 nilfs2: Use xa_erase_irq 2018-11-05 14:57:05 -05:00
nls
notify fanotify: fix handling of events on child sub-directory 2018-11-08 15:43:48 +01:00
ntfs ntfs: don't open-code ERR_CAST 2018-10-12 22:46:50 -04:00
ocfs2 ocfs2: fix potential use after free 2018-11-30 14:56:15 -08:00
omfs
openpromfs
orangefs Merge branch 'work.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-11-01 19:58:52 -07:00
overlayfs Revert "ovl: relax permission checking on underlying layers" 2018-12-04 11:31:30 +01:00
proc New gcc plugin: stackleak 2018-11-01 11:46:27 -07:00
pstore pstore fix: 2018-11-30 09:03:15 -08:00
qnx4
qnx6
quota
ramfs
reiserfs reiserfs: remove workaround code for GCC 3.x 2018-10-31 08:54:14 -07:00
romfs
squashfs
sysfs
sysv sysv: return 'err' instead of 0 in __sysv_write_inode 2018-11-10 08:02:40 -05:00
tracefs
ubifs ubifs: Remove unneeded semicolon 2018-10-23 13:49:02 +02:00
udf udf: Allow mounting volumes with incorrect identification strings 2018-11-19 10:27:59 +01:00
ufs
xfs xfs: fix inverted return from xfs_btree_sblock_verify_crc 2018-12-04 08:50:49 -08:00
Kconfig
Kconfig.binfmt
Makefile
aio.c for-linus-20181214 2018-12-14 12:18:30 -08:00
anon_inodes.c
attr.c
bad_inode.c
binfmt_aout.c
binfmt_elf.c
binfmt_elf_fdpic.c
binfmt_em86.c
binfmt_flat.c
binfmt_misc.c
binfmt_script.c
block_dev.c iov_iter: Use accessor function 2018-10-24 00:40:44 +01:00
buffer.c for-linus-20181102 2018-11-02 11:25:48 -07:00
char_dev.c
compat.c
compat_binfmt_elf.c
compat_ioctl.c media updates for v4.20-rc1 2018-10-29 14:29:58 -07:00
coredump.c
d_path.c
dax.c dax: Fix unlock mismatch with updated API 2018-12-04 21:32:00 -08:00
dcache.c mm: remove include/linux/bootmem.h 2018-10-31 08:54:16 -07:00
dcookies.c
direct-io.c fs: fix lost error code in dio_complete 2018-11-30 08:35:14 -07:00
drop_caches.c
eventfd.c
eventpoll.c
exec.c Revert "exec: make de_thread() freezable" 2018-12-04 16:04:20 +01:00
fcntl.c
fhandle.c
file.c
file_table.c
filesystems.c
fs-writeback.c fs: Convert writeback to XArray 2018-10-21 10:46:42 -04:00
fs_pin.c
fs_struct.c
inode.c mm: don't reclaim inodes with many attached pages 2018-11-18 10:15:09 -08:00
internal.h
ioctl.c vfs: rework data cloning infrastructure 2018-11-02 09:33:08 -07:00
iomap.c iomap: Revert "fs/iomap.c: get/put the page in iomap_page_create/release()" 2018-12-20 07:22:51 -08:00
libfs.c
locks.c
mbcache.c
mount.h
mpage.c
namei.c
namespace.c mnt: fix __detach_mounts infinite loop 2018-11-12 01:02:34 -06:00
no-block.c
nsfs.c
open.c
pipe.c
pnode.c
pnode.h
posix_acl.c
proc_namespace.c
read_write.c vfs: allow some remap flags to be passed to vfs_clone_file_range 2018-12-04 08:50:49 -08:00
readdir.c
select.c
seq_file.c
signalfd.c
splice.c splice: don't read more than available pipe space 2018-12-04 08:50:49 -08:00
stack.c
stat.c
statfs.c
super.c
sync.c
timerfd.c
userfaultfd.c userfaultfd: check VM_MAYWRITE was set after verifying the uffd is registered 2018-12-14 15:05:45 -08:00
utimes.c
xattr.c