One of our users reported that when a user-level program SIGSEGVs under
UML kernel, the resulting core dump is not very usable.
I have reproduced that with the latest kernel:
make ARCH=um defconfig; make ARCH=um
Run the resulting kernel, then "inside" run this program:
#include <pthread.h>
void *fn(void *p)
{
abort();
}
int main()
{
pthread_t tid;
pthread_create(&tid, 0, fn, 0);
pthread_join(tid, 0);
return 0;
}
Analyze the coredump with GDB. Here is what you'll see:
sudo gdb -q -ex 'set solib-absolute-prefix ../root_fs' -ex 'file ../root_fs/var/tmp/mt-abort' -ex 'core ../root_fs/var/tmp/core.762'
Reading symbols from /usr/local/google/root_fs/var/tmp/mt-abort...done.
[New Thread 763]
[New Thread 762]
Core was generated by `./mt-abort'.
Program terminated with signal 6, Aborted.
#0 0x0000000040255250 in raise () from ../root_fs/lib64/libc.so.6
(gdb) info thread
2 Thread 762 0x0000000000000000 in ?? ()
* 1 Thread 763 0x0000000040255250 in raise () from ../root_fs/lib64/libc.so.6
Note that thread#2 looks funny.
(gdb) thread 2
[Switching to thread 2 (Thread 762)]#0 0x0000000000000000 in ?? ()
(gdb) info reg
rax 0x0 0
rbx 0x0 0
rcx 0x0 0
rdx 0x0 0
rsi 0x0 0
rdi 0x0 0
rbp 0x0 0x0
rsp 0x0 0x0
r8 0x0 0
r9 0x0 0
r10 0x0 0
r11 0x0 0
r12 0x0 0
r13 0x0 0
r14 0x0 0
r15 0x0 0
rip 0x0 0
eflags 0x0 [ ]
cs 0x0 0
ss 0x0 0
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
Examining the core shows that NT_PRSTATUS notes for all threads other than
the one that crashed are zeroed out.
I believe this is happening because neither ELF_CORE_COPY_TASK_REGS nor
task_pt_regs are defined under ARCH=um, and so elf_core_copy_task_regs()
becomes a no-op.
Attached patch fixes this for SUBARCH={x86_64,i386}.
Signed-off-by: Paul Pluzhnikov <ppluzhnikov@google.com>
Cc: Jeff Dike <jdike@addtoit.com>
Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>