linux/fs
Theodore Ts'o f4a01017d6 ext4: Fix potential reclaim deadlock when truncating partial block
The ext4_block_truncate_page() function previously called
grab_cache_page(), which called find_or_create_page() with the
__GFP_FS flag potentially set.  This could cause a deadlock if the
system is low on memory and it attempts a memory reclaim, which could
potentially call back into ext4.  So we need to call
find_or_create_page() directly, and remove the __GFP_FP flag to avoid
this potential deadlock.

Thanks to Roland Dreier for reporting a lockdep warning which showed
this problem.

[20786.363249] =================================
[20786.363257] [ INFO: inconsistent lock state ]
[20786.363265] 2.6.31-2-generic #14~rbd4gitd960eea9
[20786.363270] ---------------------------------
[20786.363276] inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} usage.
[20786.363285] http/8397 [HC0[0]:SC0[0]:HE1:SE1] takes:
[20786.363291]  (jbd2_handle){+.+.?.}, at: [<ffffffff812008bb>] jbd2_journal_start+0xdb/0x150
[20786.363314] {IN-RECLAIM_FS-W} state was registered at:
[20786.363320]   [<ffffffff8108bef6>] mark_irqflags+0xc6/0x1a0
[20786.363334]   [<ffffffff8108d347>] __lock_acquire+0x287/0x430
[20786.363345]   [<ffffffff8108d595>] lock_acquire+0xa5/0x150
[20786.363355]   [<ffffffff812008da>] jbd2_journal_start+0xfa/0x150
[20786.363365]   [<ffffffff811d98a8>] ext4_journal_start_sb+0x58/0x90
[20786.363377]   [<ffffffff811cce85>] ext4_delete_inode+0xc5/0x2c0
[20786.363389]   [<ffffffff81146fa3>] generic_delete_inode+0xd3/0x1a0
[20786.363401]   [<ffffffff81147095>] generic_drop_inode+0x25/0x30
[20786.363411]   [<ffffffff81145ce2>] iput+0x62/0x70
[20786.363420]   [<ffffffff81142878>] dentry_iput+0x98/0x110
[20786.363429]   [<ffffffff81142a00>] d_kill+0x50/0x80
[20786.363438]   [<ffffffff811444c5>] dput+0x95/0x180
[20786.363447]   [<ffffffff8120de4b>] ecryptfs_d_release+0x2b/0x70
[20786.363459]   [<ffffffff81142978>] d_free+0x28/0x60
[20786.363468]   [<ffffffff81142a18>] d_kill+0x68/0x80
[20786.363477]   [<ffffffff81142ad3>] prune_one_dentry+0xa3/0xc0
[20786.363487]   [<ffffffff81142d61>] __shrink_dcache_sb+0x271/0x290
[20786.363497]   [<ffffffff81142e89>] prune_dcache+0x109/0x1b0
[20786.363506]   [<ffffffff81142f6f>] shrink_dcache_memory+0x3f/0x50
[20786.363516]   [<ffffffff810f6d3d>] shrink_slab+0x12d/0x190
[20786.363527]   [<ffffffff810f97d7>] balance_pgdat+0x4d7/0x640
[20786.363537]   [<ffffffff810f9a57>] kswapd+0x117/0x170
[20786.363546]   [<ffffffff810773ce>] kthread+0x9e/0xb0
[20786.363558]   [<ffffffff8101430a>] child_rip+0xa/0x20
[20786.363569]   [<ffffffffffffffff>] 0xffffffffffffffff
[20786.363598] irq event stamp: 15997
[20786.363603] hardirqs last  enabled at (15997): [<ffffffff81125f9d>] kmem_cache_alloc+0xfd/0x1a0
[20786.363617] hardirqs last disabled at (15996): [<ffffffff81125f01>] kmem_cache_alloc+0x61/0x1a0
[20786.363628] softirqs last  enabled at (15966): [<ffffffff810631ea>] __do_softirq+0x14a/0x220
[20786.363641] softirqs last disabled at (15861): [<ffffffff8101440c>] call_softirq+0x1c/0x30
[20786.363651] 
[20786.363653] other info that might help us debug this:
[20786.363660] 3 locks held by http/8397:
[20786.363665]  #0:  (&sb->s_type->i_mutex_key#8){+.+.+.}, at: [<ffffffff8112ed24>] do_truncate+0x64/0x90
[20786.363685]  #1:  (&sb->s_type->i_alloc_sem_key#5){+++++.}, at: [<ffffffff81147f90>] notify_change+0x250/0x350
[20786.363707]  #2:  (jbd2_handle){+.+.?.}, at: [<ffffffff812008bb>] jbd2_journal_start+0xdb/0x150
[20786.363724] 
[20786.363726] stack backtrace:
[20786.363734] Pid: 8397, comm: http Tainted: G         C 2.6.31-2-generic #14~rbd4gitd960eea9
[20786.363741] Call Trace:
[20786.363752]  [<ffffffff8108ad7c>] print_usage_bug+0x18c/0x1a0
[20786.363763]  [<ffffffff8108b0c0>] ? check_usage_backwards+0x0/0xb0
[20786.363773]  [<ffffffff8108bad2>] mark_lock_irq+0xf2/0x280
[20786.363783]  [<ffffffff8108bd97>] mark_lock+0x137/0x1d0
[20786.363793]  [<ffffffff8108c03c>] mark_held_locks+0x6c/0xa0
[20786.363803]  [<ffffffff8108c11f>] lockdep_trace_alloc+0xaf/0xe0
[20786.363813]  [<ffffffff810efbac>] __alloc_pages_nodemask+0x7c/0x180
[20786.363824]  [<ffffffff810e9411>] ? find_get_page+0x91/0xf0
[20786.363835]  [<ffffffff8111d3b7>] alloc_pages_current+0x87/0xd0
[20786.363845]  [<ffffffff810e9827>] __page_cache_alloc+0x67/0x70
[20786.363856]  [<ffffffff810eb7df>] find_or_create_page+0x4f/0xb0
[20786.363867]  [<ffffffff811cb3be>] ext4_block_truncate_page+0x3e/0x460
[20786.363876]  [<ffffffff812008da>] ? jbd2_journal_start+0xfa/0x150
[20786.363885]  [<ffffffff812008bb>] ? jbd2_journal_start+0xdb/0x150
[20786.363895]  [<ffffffff811c6415>] ? ext4_meta_trans_blocks+0x75/0xf0
[20786.363905]  [<ffffffff811e8d8b>] ext4_ext_truncate+0x1bb/0x1e0
[20786.363916]  [<ffffffff811072c5>] ? unmap_mapping_range+0x75/0x290
[20786.363926]  [<ffffffff811ccc28>] ext4_truncate+0x498/0x630
[20786.363938]  [<ffffffff8129b4ce>] ? _raw_spin_unlock+0x5e/0xb0
[20786.363947]  [<ffffffff81107306>] ? unmap_mapping_range+0xb6/0x290
[20786.363957]  [<ffffffff8108c3ad>] ? trace_hardirqs_on+0xd/0x10
[20786.363966]  [<ffffffff811ffe58>] ? jbd2_journal_stop+0x1f8/0x2e0
[20786.363976]  [<ffffffff81107690>] vmtruncate+0xb0/0x110
[20786.363986]  [<ffffffff81147c05>] inode_setattr+0x35/0x170
[20786.363995]  [<ffffffff811c9906>] ext4_setattr+0x186/0x370
[20786.364005]  [<ffffffff81147eab>] notify_change+0x16b/0x350
[20786.364014]  [<ffffffff8112ed30>] do_truncate+0x70/0x90
[20786.364021]  [<ffffffff8112f48b>] T.657+0xeb/0x110
[20786.364021]  [<ffffffff8112f4be>] sys_ftruncate+0xe/0x10
[20786.364021]  [<ffffffff81013132>] system_call_fastpath+0x16/0x1b

Reported-by: Roland Dreier <roland@digitalvampire.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-07-05 22:08:16 -04:00
..
9p 9P doesn't need BKL in ->umount_begin() 2009-06-17 00:36:36 -04:00
adfs Cleanup of adfs headers 2009-06-17 00:36:36 -04:00
affs affs: add ->sync_fs 2009-06-11 21:36:14 -04:00
afs AFS: Fix lock imbalance 2009-06-30 13:30:44 -07:00
autofs switch follow_down() 2009-06-11 21:36:01 -04:00
autofs4 switch follow_down() 2009-06-11 21:36:01 -04:00
befs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 2009-06-17 08:46:57 -07:00
bfs bfs: add ->sync_fs 2009-06-11 21:36:14 -04:00
btrfs Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable 2009-07-02 16:52:38 -07:00
cachefiles enforce ->sync_fs is only called for rw superblock 2009-06-11 21:36:06 -04:00
cifs cifs: fix fh_mutex locking in cifs_reopen_file 2009-06-27 23:46:43 +00:00
coda splice: implement default splice_read method 2009-05-11 14:13:10 +02:00
configfs configfs: Rework configfs_depend_item() locking and make lockdep happy 2009-04-30 10:48:26 -07:00
cramfs fs/cramfs: return f_fsid for statfs(2) 2009-04-02 19:05:08 -07:00
debugfs debugfs: use specified mode to possibly mark files read/write only 2009-06-15 21:30:28 -07:00
devpts devpts: remove module-related code 2009-06-24 08:15:24 -04:00
dlm dlm: use more NOFS allocation 2009-05-15 11:24:59 -05:00
ecryptfs push BKL down into ->put_super 2009-06-11 21:36:07 -04:00
efs get rid of BKL in fs/efs 2009-06-17 00:36:36 -04:00
exofs [SCSI] Merge branch 'linus' 2009-06-12 10:02:03 -05:00
exportfs Merge branch 'next' into for-linus 2008-12-25 11:40:09 +11:00
ext2 ext2: return -EIO not -ESTALE on directory traversal through deleted inode 2009-06-30 18:56:00 -07:00
ext3 helpers for acl caching + switch to those 2009-06-24 08:17:07 -04:00
ext4 ext4: Fix potential reclaim deadlock when truncating partial block 2009-07-05 22:08:16 -04:00
fat fat: Fix the removal of opts->fs_dmask 2009-06-20 21:50:47 +09:00
freevxfs push BKL down into ->put_super 2009-06-11 21:36:07 -04:00
fscache FS-Cache: Fixup renamed filenames in comments in internal.h 2009-05-27 10:20:13 -07:00
fuse fuse: invalidation reverse calls 2009-06-30 20:12:24 +02:00
gfs2 block: rename CONFIG_LBD to CONFIG_LBDAF 2009-06-19 08:08:50 +02:00
hfs hfs: add ->sync_fs 2009-06-11 21:36:15 -04:00
hfsplus hfsplus: add ->sync_fs 2009-06-11 21:36:16 -04:00
hostfs hostfs: set maximum filesize in superblock for proper LFS support 2009-06-30 18:56:03 -07:00
hpfs Push BKL down into ->remount_fs() 2009-06-11 21:36:11 -04:00
hppfs hppfs: hppfs_read_file() may return -ERROR 2009-04-02 19:04:53 -07:00
hugetlbfs Merge branch 'master' into next 2009-05-22 18:40:59 +10:00
isofs isofs: cleanup mount option processing 2009-06-18 13:03:45 -07:00
jbd jbd: clean up journal_try_to_free_buffers() 2009-06-18 13:03:45 -07:00
jbd2 jbd2: Remove GFP_ATOMIC kmalloc from inside spinlock critical region 2009-06-20 23:34:44 -04:00
jffs2 Merge git://git.infradead.org/mtd-2.6 2009-07-01 11:25:46 -07:00
jfs another race fix in jfs_check_acl() 2009-06-24 17:02:42 -04:00
lockd Merge branch 'for-2.6.31' of git://fieldses.org/git/linux-nfsd 2009-06-22 12:55:50 -07:00
minix Making fs/minix/minix.h double including safe 2009-06-22 11:34:42 -07:00
ncpfs NLS: update handling of Unicode 2009-06-15 21:44:43 -07:00
nfs NFS: Correct the NFS mount path when following a referral 2009-06-22 21:28:25 -07:00
nfs_common SUNRPC: nfsacl_encode/nfsacl_decode should be exported as GPL-only 2008-12-23 15:21:32 -05:00
nfsd NFSD: Don't hold unrefcounted creds over call to nfsd_setuser() 2009-07-03 10:21:10 -04:00
nilfs2 switch nilfs2 to inode->i_acl 2009-06-24 08:17:05 -04:00
nls NLS: update handling of Unicode 2009-06-15 21:44:43 -07:00
notify fs/notify/inotify: decrement user inotify count on close 2009-07-02 08:23:00 -04:00
ntfs ntfs: use is_power_of_2() function for clarity. 2009-06-16 19:47:48 -07:00
ocfs2 Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 2009-06-23 19:36:02 -07:00
omfs switch omfs to simple_fsync() 2009-06-11 21:36:13 -04:00
openpromfs zero i_uid/i_gid on inode allocation 2009-01-05 11:54:28 -05:00
partitions Merge branch 'for-2.6.31' of git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6 2009-06-12 09:29:42 -07:00
proc proc: vmcore - use kzalloc in get_new_element() 2009-06-18 13:03:41 -07:00
qnx4 fs/qnx4: sanitize includes 2009-06-11 21:36:12 -04:00
quota quota: cleanup dquota sync functions (version 4) 2009-06-11 21:36:04 -04:00
ramfs ramfs: ignore unknown mount options 2009-06-14 17:58:25 -07:00
reiserfs helpers for acl caching + switch to those 2009-06-24 08:17:07 -04:00
romfs ROMFS: romfs_dev_read() error ignored 2009-05-09 10:49:41 -04:00
smbfs push BKL down into ->put_super 2009-06-11 21:36:07 -04:00
squashfs push BKL down into ->put_super 2009-06-11 21:36:07 -04:00
sysfs Sysfs: fix possible memleak in sysfs_follow_link 2009-06-15 21:30:23 -07:00
sysv get rid of BKL in fs/sysv 2009-06-17 00:36:37 -04:00
ubifs helpers for acl caching + switch to those 2009-06-24 08:17:07 -04:00
udf udf: remove redundant tests on unsigned 2009-06-24 13:48:28 +02:00
ufs ufs: sector_t cannot be negative 2009-06-18 13:03:46 -07:00
xfs switch xfs to generic acl caching helpers 2009-06-24 08:17:07 -04:00
Kconfig Merge branch 'for-2.6.31' of git://fieldses.org/git/linux-nfsd 2009-06-22 12:55:50 -07:00
Kconfig.binfmt CORE_DUMP_DEFAULT_ELF_HEADERS depends on ELF_CORE 2009-01-09 16:54:41 -08:00
Makefile nilfs2: update makefile and Kconfig 2009-04-07 08:31:16 -07:00
aio.c eventfd: revised interface and cleanups 2009-06-30 18:55:58 -07:00
anon_inodes.c fs: Provide empty .set_page_dirty() aop for anon inodes 2009-06-18 14:46:10 +02:00
attr.c vfs: Use lowercase names of quota functions 2009-03-26 02:18:35 +01:00
bad_inode.c kill ->dir_notify() 2008-12-31 18:07:43 -05:00
binfmt_aout.c sanitize ifdefs in binfmt_aout 2009-01-03 11:45:54 -08:00
binfmt_elf.c elf: fix one check-after-use 2009-07-01 11:14:28 -07:00
binfmt_elf_fdpic.c elf_core_dump: use rcu_read_lock() to access ->real_parent 2009-06-18 13:03:52 -07:00
binfmt_em86.c Allow recursion in binfmt_script and binfmt_misc 2008-10-16 11:21:38 -07:00
binfmt_flat.c flat: fix data sections alignment 2009-05-29 08:40:02 -07:00
binfmt_misc.c fs/binfmt_misc.c: add terminating newline to /proc/sys/fs/binfmt_misc/status 2009-01-06 15:59:19 -08:00
binfmt_script.c Allow recursion in binfmt_script and binfmt_misc 2008-10-16 11:21:38 -07:00
binfmt_som.c Don't crap into descriptor table in binfmt_som 2009-03-31 23:00:28 -04:00
bio-integrity.c block: Create bip slabs with embedded integrity vectors 2009-07-01 10:56:25 +02:00
bio.c block: Create bip slabs with embedded integrity vectors 2009-07-01 10:56:25 +02:00
block_dev.c vfs: Rename fsync_super() to sync_filesystem() (version 4) 2009-06-11 21:36:04 -04:00
buffer.c Merge branch 'for-2.6.31' of git://git.kernel.dk/linux-2.6-block 2009-06-11 11:10:35 -07:00
char_dev.c fs: Remove i_cindex from struct inode 2009-06-11 21:36:09 -04:00
compat.c trivial: fix comment typo in fs/compat.c 2009-06-12 18:01:44 +02:00
compat_binfmt_elf.c
compat_ioctl.c fs: Add new pre-allocation ioctls to vfs for compatibility with legacy xfs ioctls 2009-06-24 08:15:27 -04:00
dcache.c dcache: extrace and use d_unlinked() 2009-06-11 21:36:06 -04:00
dcookies.c [CVE-2009-0029] System call wrapper special cases 2009-01-14 14:15:18 +01:00
direct-io.c block: Do away with the notion of hardsect_size 2009-05-22 23:22:54 +02:00
drop_caches.c mm: remove __invalidate_mapping_pages variant 2009-06-16 19:47:43 -07:00
eventfd.c eventfd: revised interface and cleanups 2009-06-30 18:55:58 -07:00
eventpoll.c epoll: fix nested calls support 2009-06-18 13:03:41 -07:00
exec.c Merge branch 'perfcounters-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-06-11 14:01:07 -07:00
fcntl.c send_sigio_to_task: sanitize the usage of fown->signum 2009-06-16 15:36:17 -07:00
fifo.c [PATCH] introduce fmode_t, do annotations 2008-10-21 07:47:06 -04:00
file.c
file_table.c fs: move mark_files_ro into file_table.c 2009-06-11 21:36:02 -04:00
filesystems.c fs: Mark get_filesystem_list() as __init function. 2009-04-20 23:02:52 -04:00
fs-writeback.c cleanup __writeback_single_inode 2009-06-24 08:15:26 -04:00
fs_struct.c Get rid of indirect include of fs_struct.h 2009-03-31 23:00:27 -04:00
generic_acl.c New helper - current_umask() 2009-03-31 23:00:26 -04:00
inode.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 2009-06-24 10:03:12 -07:00
internal.h Trim a bit of crap from fs.h 2009-06-11 21:36:07 -04:00
ioctl.c fs: Add new pre-allocation ioctls to vfs for compatibility with legacy xfs ioctls 2009-06-24 08:15:27 -04:00
ioprio.c [CVE-2009-0029] System call wrappers part 28 2009-01-14 14:15:30 +01:00
libfs.c New helper - simple_fsync() 2009-06-11 21:36:11 -04:00
locks.c lockd: call locks_release_private to cleanup per-filesystem state 2009-04-24 16:36:03 -04:00
mbcache.c
mpage.c ext4: Properly initialize the buffer_head state 2009-05-13 15:13:42 -04:00
namei.c integrity: add ima_counts_put (updated) 2009-06-29 08:59:10 +10:00
namespace.c ... and the same for vfsmount id/mount group id 2009-06-24 08:15:26 -04:00
nfsctl.c [CVE-2009-0029] System call wrappers part 27 2009-01-14 14:15:29 +01:00
no-block.c
open.c fs: Add new pre-allocation ioctls to vfs for compatibility with legacy xfs ioctls 2009-06-24 08:15:27 -04:00
pipe.c splice: implement default splice_read method 2009-05-11 14:13:10 +02:00
pnode.c
pnode.h
posix_acl.c CRED: Wrap task credential accesses in the filesystem subsystem 2008-11-14 10:39:05 +11:00
read_write.c splice: implement default splice_read method 2009-05-11 14:13:10 +02:00
read_write.h
readdir.c [CVE-2009-0029] System call wrappers part 32 2009-01-14 14:15:31 +01:00
select.c poll: avoid extra wakeups in select/poll 2009-06-16 19:47:48 -07:00
seq_file.c seq_file: add function to write binary data 2009-06-18 13:03:57 -07:00
signalfd.c [CVE-2009-0029] System call wrappers part 31 2009-01-14 14:15:31 +01:00
splice.c splice: fix kmaps in default_file_splice_write() 2009-05-19 11:37:46 +02:00
stack.c
stat.c kill vfs_stat_fd / vfs_lstat_fd 2009-04-20 23:02:52 -04:00
super.c ... and the same for vfsmount id/mount group id 2009-06-24 08:15:26 -04:00
sync.c remove the call to ->write_super in __sync_filesystem 2009-06-11 21:36:17 -04:00
timerfd.c timerfd: add flags check 2009-02-18 15:37:53 -08:00
utimes.c [CVE-2009-0029] System call wrappers part 30 2009-01-14 14:15:30 +01:00
xattr.c fs: introduce mnt_clone_write 2009-06-11 21:36:02 -04:00
xattr_acl.c