linux/lib
Michal Hocko 7e7844226f lockdep: allow to disable reclaim lockup detection
The current implementation of the reclaim lockup detection can lead to
false positives and those even happen and usually lead to tweak the code
to silence the lockdep by using GFP_NOFS even though the context can use
__GFP_FS just fine.

See

  http://lkml.kernel.org/r/20160512080321.GA18496@dastard

as an example.

  =================================
  [ INFO: inconsistent lock state ]
  4.5.0-rc2+ #4 Tainted: G           O
  ---------------------------------
  inconsistent {RECLAIM_FS-ON-R} -> {IN-RECLAIM_FS-W} usage.
  kswapd0/543 [HC0[0]:SC0[0]:HE1:SE1] takes:

  (&xfs_nondir_ilock_class){++++-+}, at: xfs_ilock+0x177/0x200 [xfs]

  {RECLAIM_FS-ON-R} state was registered at:
    mark_held_locks+0x79/0xa0
    lockdep_trace_alloc+0xb3/0x100
    kmem_cache_alloc+0x33/0x230
    kmem_zone_alloc+0x81/0x120 [xfs]
    xfs_refcountbt_init_cursor+0x3e/0xa0 [xfs]
    __xfs_refcount_find_shared+0x75/0x580 [xfs]
    xfs_refcount_find_shared+0x84/0xb0 [xfs]
    xfs_getbmap+0x608/0x8c0 [xfs]
    xfs_vn_fiemap+0xab/0xc0 [xfs]
    do_vfs_ioctl+0x498/0x670
    SyS_ioctl+0x79/0x90
    entry_SYSCALL_64_fastpath+0x12/0x6f

         CPU0
         ----
    lock(&xfs_nondir_ilock_class);
    <Interrupt>
      lock(&xfs_nondir_ilock_class);

   *** DEADLOCK ***

  3 locks held by kswapd0/543:

  stack backtrace:
  CPU: 0 PID: 543 Comm: kswapd0 Tainted: G           O    4.5.0-rc2+ #4
  Call Trace:
   lock_acquire+0xd8/0x1e0
   down_write_nested+0x5e/0xc0
   xfs_ilock+0x177/0x200 [xfs]
   xfs_reflink_cancel_cow_range+0x150/0x300 [xfs]
   xfs_fs_evict_inode+0xdc/0x1e0 [xfs]
   evict+0xc5/0x190
   dispose_list+0x39/0x60
   prune_icache_sb+0x4b/0x60
   super_cache_scan+0x14f/0x1a0
   shrink_slab.part.63.constprop.79+0x1e9/0x4e0
   shrink_zone+0x15e/0x170
   kswapd+0x4f1/0xa80
   kthread+0xf2/0x110
   ret_from_fork+0x3f/0x70

To quote Dave:
 "Ignoring whether reflink should be doing anything or not, that's a
  "xfs_refcountbt_init_cursor() gets called both outside and inside
  transactions" lockdep false positive case. The problem here is lockdep
  has seen this allocation from within a transaction, hence a GFP_NOFS
  allocation, and now it's seeing it in a GFP_KERNEL context. Also note
  that we have an active reference to this inode.

  So, because the reclaim annotations overload the interrupt level
  detections and it's seen the inode ilock been taken in reclaim
  ("interrupt") context, this triggers a reclaim context warning where
  it thinks it is unsafe to do this allocation in GFP_KERNEL context
  holding the inode ilock..."

This sounds like a fundamental problem of the reclaim lock detection.
It is really impossible to annotate such a special usecase IMHO unless
the reclaim lockup detection is reworked completely.  Until then it is
much better to provide a way to add "I know what I am doing flag" and
mark problematic places.  This would prevent from abusing GFP_NOFS flag
which has a runtime effect even on configurations which have lockdep
disabled.

Introduce __GFP_NOLOCKDEP flag which tells the lockdep gfp tracking to
skip the current allocation request.

While we are at it also make sure that the radix tree doesn't
accidentaly override tags stored in the upper part of the gfp_mask.

Link: http://lkml.kernel.org/r/20170306131408.9828-3-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Chris Mason <clm@fb.com>
Cc: David Sterba <dsterba@suse.cz>
Cc: Jan Kara <jack@suse.cz>
Cc: Brian Foster <bfoster@redhat.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03 15:52:09 -07:00
..
842 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2016-03-17 21:38:27 -07:00
fonts lib/fonts/Kconfig: keep non-Sparc fonts listed together 2017-02-27 18:43:46 -08:00
lz4 lib/lz4: remove back-compat wrappers 2017-02-24 17:46:57 -08:00
lzo
mpi mpi: Fix NULL ptr dereference in mpi_powm() [ver #3] 2016-11-25 12:57:50 +11:00
raid6 lib/raid6: Add AVX2 optimized xor_syndrome functions 2016-11-07 15:08:20 -08:00
reed_solomon
xz
zlib_deflate
zlib_inflate
.gitignore
Kconfig Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-02-28 10:00:39 -08:00
Kconfig.debug A reasonably busy cycle for documentation this time around. There is a new 2017-05-02 10:21:17 -07:00
Kconfig.kasan mm, kasan: switch SLUB to stackdepot, enable memory quarantine for SLUB 2016-07-28 16:07:41 -07:00
Kconfig.kgdb kgdb: depends on VT 2016-05-23 17:04:14 -07:00
Kconfig.kmemcheck
Kconfig.ubsan Kconfig: lib/Kconfig.ubsan fix reference to ubsan documentation 2016-12-14 16:04:08 -08:00
Makefile Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2017-05-02 15:53:46 -07:00
argv_split.c
asn1_decoder.c Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2016-05-17 09:33:39 -07:00
assoc_array.c assoc_array: don't call compare_object() on a node 2016-04-06 14:06:48 +01:00
atomic64.c locking/atomic: Implement atomic{,64,_long}_fetch_{add,sub,and,andnot,or,xor}{,_relaxed,_acquire,_release}() 2016-06-16 10:48:32 +02:00
atomic64_test.c lib: add module support to atomic64 tests 2017-02-24 17:46:57 -08:00
audit.c
bcd.c
bch.c
bitmap.c kernel-api.rst: fix some complex tags at lib/bitmap.c 2017-04-02 14:29:33 -06:00
bitrev.c
bsearch.c
btree.c
bug.c debug: Add _ONCE() logic to report_bug() 2017-03-30 09:37:20 +02:00
build_OID_registry
bust_spinlocks.c
chacha20.c random: replace non-blocking pool with a Chacha20-based CRNG 2016-07-03 00:57:23 -04:00
check_signature.c
checksum.c ipv4: Update parameters for csum_tcpudp_magic to their original types 2016-03-13 23:55:13 -04:00
clz_ctz.c
clz_tab.c
cmdline.c boot/param: Move next_arg() function to lib/cmdline.c for later reuse 2017-04-18 10:37:13 +02:00
compat_audit.c
cordic.c
cpu_rmap.c
cpumask.c cpumask: Export cpumask_any_but() 2016-02-29 09:35:20 +01:00
crc-ccitt.c
crc-itu-t.c
crc-t10dif.c
crc7.c
crc8.c
crc16.c
crc32.c lib: add module support to crc32 tests 2017-02-24 17:46:57 -08:00
crc32defs.h
crc32test.c lib: add module support to crc32 tests 2017-02-24 17:46:57 -08:00
ctype.c
debug_info.c
debug_locks.c
debugobjects.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/task_stack.h> 2017-03-02 08:42:36 +01:00
dec_and_lock.c
decompress.c
decompress_bunzip2.c
decompress_inflate.c
decompress_unlz4.c lib/decompress_unlz4: change module to work with new LZ4 module version 2017-02-24 17:46:57 -08:00
decompress_unlzma.c
decompress_unlzo.c
decompress_unxz.c
devres.c
digsig.c KEYS: Differentiate uses of rcu_dereference_key() and user_key_payload() 2017-03-02 10:09:00 +11:00
div64.c
dma-debug.c lib/dma-debug.c: make locking work for RT 2017-05-03 15:52:07 -07:00
dma-noop.c lib/dma-noop: Clarify a comment 2017-01-24 12:23:35 -05:00
dma-virt.c lib/dma-virt: Add dma_virt_ops 2017-01-24 12:23:35 -05:00
dump_stack.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/debug.h> 2017-03-02 08:42:34 +01:00
dynamic_debug.c dynamic_debug: add jump label support 2016-08-04 08:50:07 -04:00
dynamic_queue_limits.c
earlycpio.c lib/cpio: Make find_cpio_data()'s offset arg optional 2016-06-08 11:04:19 +02:00
extable.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
fault-inject.c
fdt.c
fdt_empty_tree.c
fdt_ro.c
fdt_rw.c
fdt_strerror.c
fdt_sw.c
fdt_wip.c
find_bit.c lib/find_bit.c: micro-optimise find_next_*_bit 2017-02-24 17:46:57 -08:00
flex_array.c
flex_proportions.c lib+mm: fix few spelling mistakes 2016-02-15 11:18:23 +01:00
gcd.c lib/GCD.c: use binary GCD algorithm instead of Euclidean 2016-05-20 17:58:30 -07:00
gen_crc32table.c
genalloc.c lib/genalloc.c: start search from start of chunk 2016-10-27 18:43:43 -07:00
glob.c lib: add module support to glob tests 2017-02-24 17:46:57 -08:00
globtest.c lib: add module support to glob tests 2017-02-24 17:46:57 -08:00
hexdump.c
hweight.c x86/hweight: Get rid of the special calling convention 2016-06-08 15:01:02 +02:00
idr.c idr: Add missing __rcu annotations 2017-02-13 21:44:10 -05:00
inflate.c
int_sqrt.c
interval_tree.c
interval_tree_test.c
iomap.c
iomap_copy.c
iommu-common.c
iommu-helper.c lib/iommu-helper: skip to next segment 2016-08-02 19:35:07 -04:00
ioremap.c mm: convert generic code to 5-level paging 2017-03-09 11:48:47 -08:00
iov_iter.c Merge branch 'work.uaccess' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-05-01 14:41:04 -07:00
irq_poll.c This adds a new gcc plugin named "latent_entropy". It is designed to 2016-10-15 10:03:15 -07:00
irq_regs.c
is_single_threaded.c sched/headers: Prepare to move 'init_task' and 'init_thread_union' from <linux/sched.h> to <linux/sched/task.h> 2017-03-02 08:42:38 +01:00
jedec_ddr_data.c
kasprintf.c
kfifo.c
klist.c
kobject.c kobject: Export kobject_get_unless_zero() 2017-03-22 20:11:35 -06:00
kobject_uevent.c kobject: improve function-level documentation 2016-10-28 02:42:10 -04:00
kstrtox.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
kstrtox.h
lcm.c
libcrc32c.c
list_debug.c bug: switch data corruption check to __must_check 2017-02-24 17:46:56 -08:00
list_sort.c
llist.c
locking-selftest-hardirq.h
locking-selftest-mutex.h
locking-selftest-rlock-hardirq.h
locking-selftest-rlock-softirq.h
locking-selftest-rlock.h
locking-selftest-rsem.h
locking-selftest-softirq.h
locking-selftest-spin-hardirq.h
locking-selftest-spin-softirq.h
locking-selftest-spin.h
locking-selftest-wlock-hardirq.h
locking-selftest-wlock-softirq.h
locking-selftest-wlock.h
locking-selftest-wsem.h
locking-selftest.c locking/selftest: Fix output since KERN_CONT changes 2016-11-25 07:12:19 +01:00
lockref.c locking/core: Remove cpu_relax_lowlatency() users 2016-11-16 10:15:10 +01:00
lru_cache.c
memory-notifier-error-inject.c
memweight.c
net_utils.c
netdev-notifier-error-inject.c
nlattr.c netlink: pass extended ACK struct to parsing functions 2017-04-13 13:58:22 -04:00
nmi_backtrace.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/debug.h> 2017-03-02 08:42:34 +01:00
nodemask.c include/linux/nodemask.h: create next_node_in() helper 2016-05-19 19:12:14 -07:00
notifier-error-inject.c
notifier-error-inject.h
of-reconfig-notifier-error-inject.c
oid_registry.c
once.c
parman.c lib: Introduce priority array area manager 2017-02-03 16:35:42 -05:00
parser.c parser: add u64 number parser 2016-12-06 10:17:03 +02:00
pci_iomap.c
percpu-refcount.c percpu-refcount: init ->confirm_switch member properly 2016-08-11 13:52:23 -04:00
percpu_counter.c percpu_counter: percpu_counter_hotcpu_callback() cleanup 2017-01-20 10:06:56 -05:00
percpu_ida.c sched/headers: Prepare to remove the <linux/gfp.h> include from <linux/sched.h> 2017-03-02 08:42:34 +01:00
percpu_test.c
plist.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/clock.h> 2017-03-02 08:42:27 +01:00
pm-notifier-error-inject.c
prime_numbers.c lib/prime_numbers: Suppress warn on kmalloc failure 2017-01-23 09:17:12 +01:00
radix-tree.c lockdep: allow to disable reclaim lockup detection 2017-05-03 15:52:09 -07:00
random32.c This adds a new gcc plugin named "latent_entropy". It is designed to 2016-10-15 10:03:15 -07:00
ratelimit.c ratelimit: extend to print suppressed messages on release 2016-08-02 19:35:06 -04:00
rational.c
rbtree.c rbtree: use designated initializers 2017-02-24 17:46:57 -08:00
rbtree_test.c
reciprocal_div.c
refcount.c locking/refcounts: Use atomic_try_cmpxchg() 2017-03-23 08:54:41 +01:00
rhashtable.c rhashtable: compact struct rhashtable_params 2017-05-01 16:22:40 -04:00
sbitmap.c sbitmap: add sbitmap_get_shallow() operation 2017-04-14 14:06:52 -06:00
scatterlist.c scatterlist: do not disable IRQs in sg_copy_buffer 2017-02-27 18:43:46 -08:00
seq_buf.c
sg_pool.c lib: scatterlist: move SG pool code from SCSI driver to lib/sg_pool.c 2016-04-15 16:53:14 -04:00
sg_split.c
sha1.c
show_mem.c lib/show_mem.c: teach show_mem to work with the given nodemask 2017-02-22 16:41:30 -08:00
siphash.c siphash: implement HalfSipHash1-3 for hash tables 2017-01-09 13:58:57 -05:00
smp_processor_id.c sched/core: Remove the tsk_cpus_allowed() wrapper 2017-03-02 08:42:24 +01:00
sort.c lib: add CONFIG_TEST_SORT to enable self-test of sort() 2017-02-24 17:46:57 -08:00
stackdepot.c lib/stackdepot: export save/fetch stack for drivers 2016-11-11 08:12:37 -08:00
stmp_device.c
string.c kernel-api.rst: fix a series of errors when parsing C files 2017-04-02 14:31:49 -06:00
string_helpers.c string_helpers: add kstrdup_quotable_file 2016-04-21 10:47:26 +10:00
strncpy_from_user.c lib: harden strncpy_from_user 2016-10-11 15:06:30 -07:00
strnlen_user.c unsafe_[get|put]_user: change interface to use a error target label 2016-08-08 13:02:01 -07:00
swiotlb.c swiotlb: ensure that page-sized mappings are page-aligned 2017-01-15 12:37:24 -05:00
syscall.c lib/syscall: Clear return values when no stack 2017-03-24 07:43:35 +01:00
test-kstrtox.c
test-string_helpers.c
test_bitmap.c test_bitmap: unit tests for lib/bitmap.c 2016-02-19 22:54:09 -05:00
test_bpf.c bpf, arm64: fix jit branch offset related to ldimm64 2017-05-02 15:04:50 -04:00
test_firmware.c test_firmware: add test custom fallback trigger 2017-01-25 11:52:34 +01:00
test_hash.c lib/test_hash.c: fix warning in preprocessor symbol evaluation 2016-09-01 17:52:01 -07:00
test_hexdump.c
test_kasan.c kasan: report only the first error by default 2017-03-31 17:13:30 -07:00
test_module.c
test_parman.c lib: fix spelling mistake: "actualy" -> "actually" 2017-02-26 11:03:38 -05:00
test_printf.c mm, printk: introduce new format string for flags 2016-03-15 16:55:16 -07:00
test_rhashtable.c rhashtable-test: Fix max_size parameter description 2016-08-08 12:52:42 -07:00
test_siphash.c siphash: implement HalfSipHash1-3 for hash tables 2017-01-09 13:58:57 -05:00
test_sort.c lib/test_sort.c: make it explicitly non-modular 2017-02-24 17:46:57 -08:00
test_static_key_base.c
test_static_keys.c
test_user_copy.c lib: remove check for AVR32 arch in test_user_copy 2017-05-01 09:36:30 +02:00
test_uuid.c lib/uuid: add a test module 2016-05-30 15:26:57 -07:00
textsearch.c
timerqueue.c timerqueue: Use rb_entry_safe() instead of open-coding it 2017-01-20 08:03:42 +01:00
ts_bm.c
ts_fsm.c
ts_kmp.c
ubsan.c UBSAN: fix typo in format string 2016-08-02 17:31:41 -04:00
ubsan.h
ucs2_string.c lib/ucs2_string: Speed up ucs2_utf8size() 2016-09-09 16:08:46 +01:00
usercopy.c generic ...copy_..._user primitives 2017-03-28 18:22:11 -04:00
uuid.c lib/uuid.c: use correct offset in uuid parser 2016-05-30 15:26:57 -07:00
vsprintf.c kernel-api.rst: fix output of the vsnprintf() documentation 2017-04-02 14:29:19 -06:00
win_minmax.c lib/win_minmax: windowed min or max estimator 2016-09-21 00:22:59 -04:00