linux/drivers/edac
Daniel J Blueman a36a251710 EDAC, amd64_edac: Prevent OOPS with >16 memory controllers
commit 0c510cc83b upstream.

When DRAM errors occur on memory controllers beyond the first EDAC_MAX_MCS (16),
the kernel dereferences structures that were never allocated and oopses (see
the splat below). This occurs on at least NumaConnect systems.

Fix this by checking whether a memory controller info structure was found.
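The pattern behind the fix can be sketched in a few lines of C. This is a minimal, hypothetical illustration, not the upstream patch. The names struct mc_info, mc_table, mc_find() and decode_bus_error_sketch() are invented here, and only EDAC_MAX_MCS (16) comes from the commit text. Controller structures exist only for the first EDAC_MAX_MCS nodes, so a lookup for any other node yields NULL. The decoder must therefore return early instead of dereferencing that pointer.

```c
#include <stddef.h>

/* Only EDAC_MAX_MCS (16) is taken from the commit message; the rest is hypothetical. */
#define EDAC_MAX_MCS 16

/* Hypothetical per-controller bookkeeping. */
struct mc_info {
	int node_id;
	unsigned long ce_count;
};

/* Slots are only ever filled for the first EDAC_MAX_MCS nodes; the rest stay NULL. */
static struct mc_info *mc_table[EDAC_MAX_MCS];

/* Hypothetical lookup: NULL for out-of-range nodes or slots never allocated. */
static struct mc_info *mc_find(int node_id)
{
	if (node_id < 0 || node_id >= EDAC_MAX_MCS)
		return NULL;
	return mc_table[node_id];
}

/* Returns 0 if the error was accounted to a controller, -1 if it was dropped. */
static int decode_bus_error_sketch(int node_id)
{
	struct mc_info *mci = mc_find(node_id);

	/* The essence of the fix: bail out rather than dereference a NULL pointer. */
	if (!mci)
		return -1;

	mci->ce_count++;
	return 0;
}
```

Without the if (!mci) check, mc_find() would return NULL for node 20, or for any node whose slot was never filled, and the increment would dereference it. That matches the splat below, where CR2 is 0000000000000320: a read at a small offset from address zero. The upstream patch may use a different lookup helper than this sketch does.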

BUG: unable to handle kernel NULL pointer dereference at 0000000000000320
IP: [<ffffffff819f714f>] decode_bus_error+0x2f/0x2b0
PGD 2f8b5a3067 PUD 2f8b5a2067 PMD 0
Oops: 0000 [#2] SMP
Modules linked in:
CPU: 224 PID: 11930 Comm: stream_c.exe.gn Tainted: G   D    3.19.0 #1
Hardware name: Supermicro H8QGL/H8QGL, BIOS 3.5b    01/28/2015
task: ffff8807dbfb8c00 ti: ffff8807dd16c000 task.ti: ffff8807dd16c000
RIP: 0010:[<ffffffff819f714f>] [<ffffffff819f714f>] decode_bus_error+0x2f/0x2b0
RSP: 0000:ffff8907dfc03c48 EFLAGS: 00010297
RAX: 0000000000000001 RBX: 9c67400010080a13 RCX: 0000000000001dc6
RDX: 000000001dc61dc6 RSI: ffff8907dfc03df0 RDI: 000000000000001c
RBP: ffff8907dfc03ce8 R08: 0000000000000000 R09: 0000000000000022
R10: ffff891fffa30380 R11: 00000000001cfc90 R12: 0000000000000008
R13: 0000000000000000 R14: 000000000000001c R15: 00009c6740001000
FS: 00007fa97ee18700(0000) GS:ffff8907dfc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000320 CR3: 0000003f889b8000 CR4: 00000000000407e0
Stack:
 0000000000000000 ffff8907dfc03df0 0000000000000008 9c67400010080a13
 000000000000001c 00009c6740001000 ffff8907dfc03c88 ffffffff810e4f9a
 ffff8907dfc03ce8 ffffffff81b375b9 0000000000000000 0000000000000010
Call Trace:
 <IRQ>
 ? vprintk_default
 ? printk
 amd_decode_mce
 notifier_call_chain
 atomic_notifier_call_chain
 mce_log
 machine_check_poll
 mce_timer_fn
 ? mce_cpu_restart
 call_timer_fn.isra.29
 run_timer_softirq
 __do_softirq
 irq_exit
 smp_apic_timer_interrupt
 apic_timer_interrupt
 <EOI>
 ? down_read_trylock
 __do_page_fault
 ? __schedule
 do_page_fault
 page_fault

Signed-off-by: Daniel J Blueman <daniel@numascale.com>
Link: http://lkml.kernel.org/r/1424144078-24589-1-git-send-email-daniel@numascale.com
[ Boris: massage commit message ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06 14:43:31 -08:00
amd64_edac_dbg.c
amd64_edac_inj.c
amd64_edac.c EDAC, amd64_edac: Prevent OOPS with >16 memory controllers 2015-03-06 14:43:31 -08:00
amd64_edac.h
amd76x_edac.c
amd8111_edac.c
amd8111_edac.h
amd8131_edac.c
amd8131_edac.h
cell_edac.c
cpc925_edac.c cpc925_edac: Report UE events properly 2014-11-14 09:00:09 -08:00
e7xxx_edac.c e7xxx_edac: Report CE events properly 2014-11-14 09:00:08 -08:00
e752x_edac.c e752x_edac: Fix pci_dev usage count 2013-12-21 13:26:57 +01:00
edac_core.h
edac_device_sysfs.c
edac_device.c EDAC: Don't try to cancel workqueue when it's never setup 2014-01-10 15:57:36 +01:00
edac_mc_sysfs.c EDAC: Poll timeout cannot be zero, p2 2014-02-14 10:40:29 +01:00
edac_mc.c EDAC: Correct workqueue setup path 2014-02-14 10:40:47 +01:00
edac_module.c
edac_module.h EDAC: Poll timeout cannot be zero, p2 2014-02-14 10:40:29 +01:00
edac_pci_sysfs.c
edac_pci.c
edac_stub.c
ghes_edac.c
highbank_l2_edac.c
highbank_mc_edac.c
i7core_edac.c i7core_edac: Fix PCI device reference count 2014-02-25 08:54:45 +01:00
i3000_edac.c
i3200_edac.c i3200_edac: Report CE events properly 2014-11-14 09:00:08 -08:00
i5000_edac.c
i5100_edac.c
i5400_edac.c
i7300_edac.c i7300_edac: Fix device reference count 2014-02-25 09:43:13 +01:00
i82443bxgx_edac.c
i82860_edac.c i82860_edac: Report CE events properly 2014-11-14 09:00:08 -08:00
i82875p_edac.c
i82975x_edac.c
Kconfig
Makefile
mce_amd_inj.c
mce_amd.c
mce_amd.h
mpc85xx_edac.c
mpc85xx_edac.h
mv64x60_edac.c
mv64x60_edac.h
octeon_edac-l2c.c
octeon_edac-lmc.c
octeon_edac-pc.c
octeon_edac-pci.c
pasemi_edac.c
ppc4xx_edac.c
ppc4xx_edac.h
r82600_edac.c
sb_edac.c Merge branch 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-01-20 12:10:27 -08:00
tile_edac.c
x38_edac.c