linux/kernel
Yasuaki Ishimatsu f53acc0b5b workqueue: zero cpumask of wq_numa_possible_cpumask on init
commit 5a6024f160 upstream.

When hot-adding and onlining CPU, kernel panic occurs, showing following
call trace.

  BUG: unable to handle kernel paging request at 0000000000001d08
  IP: [<ffffffff8114acfd>] __alloc_pages_nodemask+0x9d/0xb10
  PGD 0
  Oops: 0000 [#1] SMP
  ...
  Call Trace:
   [<ffffffff812b8745>] ? cpumask_next_and+0x35/0x50
   [<ffffffff810a3283>] ? find_busiest_group+0x113/0x8f0
   [<ffffffff81193bc9>] ? deactivate_slab+0x349/0x3c0
   [<ffffffff811926f1>] new_slab+0x91/0x300
   [<ffffffff815de95a>] __slab_alloc+0x2bb/0x482
   [<ffffffff8105bc1c>] ? copy_process.part.25+0xfc/0x14c0
   [<ffffffff810a3c78>] ? load_balance+0x218/0x890
   [<ffffffff8101a679>] ? sched_clock+0x9/0x10
   [<ffffffff81105ba9>] ? trace_clock_local+0x9/0x10
   [<ffffffff81193d1c>] kmem_cache_alloc_node+0x8c/0x200
   [<ffffffff8105bc1c>] copy_process.part.25+0xfc/0x14c0
   [<ffffffff81114d0d>] ? trace_buffer_unlock_commit+0x4d/0x60
   [<ffffffff81085a80>] ? kthread_create_on_node+0x140/0x140
   [<ffffffff8105d0ec>] do_fork+0xbc/0x360
   [<ffffffff8105d3b6>] kernel_thread+0x26/0x30
   [<ffffffff81086652>] kthreadd+0x2c2/0x300
   [<ffffffff81086390>] ? kthread_create_on_cpu+0x60/0x60
   [<ffffffff815f20ec>] ret_from_fork+0x7c/0xb0
   [<ffffffff81086390>] ? kthread_create_on_cpu+0x60/0x60

In my investigation, I found the root cause is wq_numa_possible_cpumask.
All entries of wq_numa_possible_cpumask is allocated by
alloc_cpumask_var_node(). And these entries are used without initializing.
So these entries have wrong value.

When hot-adding and onlining CPU, wq_update_unbound_numa() is called.
wq_update_unbound_numa() calls alloc_unbound_pwq(). And alloc_unbound_pwq()
calls get_unbound_pool(). In get_unbound_pool(), worker_pool->node is set
as follow:

3592         /* if cpumask is contained inside a NUMA node, we belong to that node */
3593         if (wq_numa_enabled) {
3594                 for_each_node(node) {
3595                         if (cpumask_subset(pool->attrs->cpumask,
3596                                            wq_numa_possible_cpumask[node])) {
3597                                 pool->node = node;
3598                                 break;
3599                         }
3600                 }
3601         }

But wq_numa_possible_cpumask[node] does not have correct cpumask. So, wrong
node is selected. As a result, kernel panic occurs.

By this patch, all entries of wq_numa_possible_cpumask are allocated by
zalloc_cpumask_var_node to initialize them. And the panic disappeared.

Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: bce903809a ("workqueue: add wq_numa_tbl_len and wq_numa_possible_cpumask[]")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-17 16:21:03 -07:00
..
cpu sched/preempt: Fix up missed PREEMPT_NEED_RESCHED folding 2014-01-13 17:38:55 +01:00
debug kgdb/kdb: Fix no KDB config problem 2014-01-25 08:55:09 +01:00
events perf: Fix race in removing an event 2014-06-11 11:54:08 -07:00
gcov gcov: reuse kbasename helper 2013-11-13 12:09:34 +09:00
irq genirq: Sanitize spurious interrupt detection of threaded irqs 2014-06-30 20:12:00 -07:00
locking rtmutex: Plug slow unlock race 2014-06-30 20:11:58 -07:00
power arm, pm, vmpressure: add missing slab.h includes 2014-02-03 13:24:01 -05:00
printk printk: fix syslog() overflowing user buffer 2014-02-17 12:24:45 -08:00
rcu Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-01-28 08:38:04 -08:00
sched sched: Fix sched_policy < 0 comparison 2014-06-11 11:54:11 -07:00
time tick-sched: Check tick_nohz_enabled in tick_nohz_switch_to_nohz() 2014-05-31 13:20:28 -07:00
trace tracing: Remove ftrace_stop/start() from reading the trace file 2014-07-09 11:18:28 -07:00
.gitignore Ignore generated file kernel/x509_certificate_list 2013-12-10 18:21:34 +00:00
Kconfig.freezer
Kconfig.hz kernel: remove CONFIG_USE_GENERIC_SMP_HELPERS 2013-11-15 09:32:22 +09:00
Kconfig.locks
Kconfig.preempt
Makefile KEYS: Remove files generated when SYSTEM_TRUSTED_KEYRING=y 2013-12-13 15:59:11 +00:00
acct.c
async.c
audit.c net: Use netlink_ns_capable to verify the permisions of netlink messages 2014-06-26 15:15:38 -04:00
audit.h audit: Use struct net not pid_t to remember the network namespce to reply in 2014-02-28 04:04:33 -08:00
audit_tree.c inotify: Fix reporting of cookies for inotify events 2014-02-18 11:17:17 +01:00
audit_watch.c inotify: Fix reporting of cookies for inotify events 2014-02-18 11:17:17 +01:00
auditfilter.c audit: Update kdoc for audit_send_reply and audit_list_rules_send 2014-03-08 15:31:54 -08:00
auditsc.c audit: remove superfluous new- prefix in AUDIT_LOGIN messages 2014-07-09 11:18:28 -07:00
backtracetest.c
bounds.c mm: do not allocate page->ptl dynamically, if spinlock_t fits to long 2013-12-20 12:25:45 -08:00
capability.c fs,userns: Change inode_capable to capable_wrt_inode_uidgid 2014-06-16 13:40:32 -07:00
cgroup.c cgroup: fix a failure path in create_css() 2014-03-18 17:15:36 -04:00
cgroup_freezer.c cgroup: replace cftype->read_seq_string() with cftype->seq_show() 2013-12-05 12:28:04 -05:00
compat.c
configs.c
context_tracking.c context_tracking: Wrap static key check into more intuitive function name 2013-12-02 20:43:14 +01:00
cpu.c sched: Fix hotplug vs. set_cpus_allowed_ptr() 2014-06-11 11:54:11 -07:00
cpu_pm.c
cpuset.c cpuset,mempolicy: fix sleeping function called from invalid context 2014-07-17 16:21:03 -07:00
crash_dump.c
cred.c
delayacct.c kernel/delayacct.c: remove redundant checking in __delayacct_add_tsk() 2013-11-13 12:09:12 +09:00
dma.c
elfcore.c switch elf_core_write_extra_phdrs() to dump_emit() 2013-11-09 00:16:23 -05:00
exec_domain.c
exit.c exit: call disassociate_ctty() before exit_task_namespaces() 2014-04-26 17:19:07 -07:00
extable.c kernel/extable: fix address-checks for core_kernel and init areas 2013-11-28 09:49:41 -08:00
fork.c tracing: Fix syscall_*regfunc() vs copy_process() race 2014-07-06 18:57:29 -07:00
freezer.c libata, freezer: avoid block device removal while system is frozen 2013-12-19 13:50:32 -05:00
futex.c futex: Make lookup_pi_state more robust 2014-06-07 10:28:29 -07:00
futex_compat.c
groups.c
hrtimer.c hrtimer: Set expiry time before switch_hrtimer_base() 2014-06-07 10:28:12 -07:00
hung_task.c hung_task: Display every hung task warning 2014-01-25 12:13:33 +01:00
irq_work.c
itimer.c
jump_label.c
kallsyms.c
kcmp.c
kexec.c powerpc, kexec: Fix "Processor X is stuck" issue during kexec from ST mode 2014-06-07 10:28:28 -07:00
kmod.c execve: use 'struct filename *' for executable name passing 2014-02-05 12:54:53 -08:00
kprobes.c kprobes: use KSYM_NAME_LEN to size identifier buffers 2013-11-13 12:09:26 +09:00
ksysfs.c kdump: fix exported size of vmcoreinfo note 2014-01-23 16:37:03 -08:00
kthread.c kthread: fix return value of kthread_create() upon SIGKILL. 2014-06-30 20:11:53 -07:00
latencytop.c
module-internal.h
module.c module: remove warning about waiting module removal. 2014-06-07 10:28:11 -07:00
module_signing.c
notifier.c
nsproxy.c
padata.c padata: Fix wrong usage of rcu_dereference() 2013-12-05 21:28:42 +08:00
panic.c panic: Make panic_timeout configurable 2013-11-26 12:12:26 +01:00
params.c params: improve standard definitions 2013-12-04 14:09:46 +10:30
pid.c
pid_namespace.c pid_namespace: pidns_get() should check task_active_pid_ns() != NULL 2014-04-26 17:19:04 -07:00
posix-cpu-timers.c posix-timers: Convert abuses of BUG_ON to WARN_ON 2013-12-09 16:56:29 +01:00
posix-timers.c
profile.c mm: fix GFP_THISNODE callers and clarify 2014-03-10 17:26:19 -07:00
ptrace.c exec/ptrace: fix get_dumpable() incorrect tests 2013-11-13 12:09:33 +09:00
range.c
reboot.c kexec: migrate to reboot cpu 2013-12-18 19:04:50 -08:00
relay.c
res_counter.c
resource.c
seccomp.c
signal.c kernel/signal.c: change do_signal_stop/do_sigaction to use while_each_thread() 2014-01-23 16:37:02 -08:00
smp.c kernel/smp.c: remove cpumask_ipi 2014-01-30 16:56:54 -08:00
smpboot.c
smpboot.h
softirq.c Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-01-31 09:02:51 -08:00
stacktrace.c
stop_machine.c stop_machine: Fix^2 race between stop_two_cpus() and stop_cpus() 2014-03-11 11:33:47 +01:00
sys.c kernel/sys.c: k_getrusage() can use while_each_thread() 2014-01-23 16:37:02 -08:00
sys_ni.c
sysctl.c mm, pcp: allow restoring percpu_pagelist_fraction default 2014-07-09 11:18:26 -07:00
sysctl_binary.c kernel/sysctl_binary.c: use scnprintf() instead of snprintf() 2013-11-13 12:09:33 +09:00
system_certificates.S KEYS: correct alignment of system_certificate_list content in assembly file 2013-12-10 18:25:28 +00:00
system_keyring.c KEYS: correct alignment of system_certificate_list content in assembly file 2013-12-10 18:25:28 +00:00
task_work.c
taskstats.c genetlink: only pass array to genl_register_family_with_ops() 2013-11-19 16:39:05 -05:00
test_kprobes.c
time.c
timeconst.bc
timer.c timer: Prevent overflow in apply_slack 2014-06-07 10:28:09 -07:00
tracepoint.c tracepoint: Do not waste memory on mods with no tracepoints 2014-05-31 13:20:28 -07:00
tsacct.c
uid16.c
up.c kernel: provide a __smp_call_function_single stub for !CONFIG_SMP 2013-11-15 09:32:22 +09:00
user-return-notifier.c
user.c KEYS: fix uninitialized persistent_keyring_register_sem 2013-12-13 15:59:11 +00:00
user_namespace.c user namespace: fix incorrect memory barriers 2014-04-26 17:19:02 -07:00
utsname.c
utsname_sysctl.c
watchdog.c kernel/watchdog.c: remove preemption restrictions when restarting lockup detector 2014-07-06 18:57:27 -07:00
workqueue.c workqueue: zero cpumask of wq_numa_possible_cpumask on init 2014-07-17 16:21:03 -07:00
workqueue_internal.h