This commit fixes detaching on Linux when some thread exits the whole
thread group (process) just while we're detaching.
On Linux, a ptracer must detach from each LWP individually, with
PTRACE_DETACH. Since PTRACE_DETACH sets the thread running free, if
one of the already-detached threads causes the whole thread group to
exit (e.g., simply calls exit), the kernel force-kills the other
threads in the group, making them zombie, just as we're still
detaching them. Since PTRACE_DETACH against a zombie thread fails
with ESRCH, and gdb/gdbserver are not expecting this, the detach fails
with an error like: "Can't detach process: No such process.".
This patch detects this detach failure as normal, and instead of
erroring out, reaps the now-dead thread.
New test included, that exercises several different scenarios that
cause GDB/GDBserver to error out when it should not.
Tested on x86-64 GNU/Linux with {unix, native-gdbserver,
native-extended-gdbserver}
Note: without the previous fix, the "single-process + continue"
variant of the new test would fail with:
(gdb) PASS: gdb.threads/process-dies-while-detaching.exp: single-process: continue: watchpoint: switch to parent
continue
Continuing.
Warning:
Could not insert hardware watchpoint 3.
Could not insert hardware breakpoints:
You may have requested too many hardware breakpoints/watchpoints.
Command aborted.
(gdb) FAIL: gdb.threads/process-dies-while-detaching.exp: single-process: continue: watchpoint: continue
gdb/gdbserver/ChangeLog:
2016-07-01 Pedro Alves <palves@redhat.com>
Antoine Tremblay <antoine.tremblay@ericsson.com>
* linux-low.c: Change interface to take the target lwp_info
pointer directly and return void. Handle detaching from a zombie
thread.
(linux_detach_lwp_callback): New function.
(linux_detach): Detach from the leader thread after detaching from
the clone threads.
gdb/ChangeLog:
2016-07-01 Pedro Alves <palves@redhat.com>
Antoine Tremblay <antoine.tremblay@ericsson.com>
* inf-ptrace.c (inf_ptrace_detach_success): New function, factored
out from ...
(inf_ptrace_detach): ... here.
* inf-ptrace.h (inf_ptrace_detach_success): New declaration.
* linux-nat.c (get_pending_status): Rename to ...
(get_detach_signal): ... this, and return a host signal instead of
filling in a wait status.
(detach_one_lwp): New function, factored out from detach_callback
and adjusted to handle detaching from a zombie thread.
(detach_callback): Skip the leader thread.
(linux_nat_detach): No longer defer to inf_ptrace_detach to detach
the leader thread, nor build a signal string to pass down.
Instead, use target_announce_detach, detach_one_lwp and
inf_ptrace_detach_success.
gdb/testsuite/ChangeLog:
2016-07-01 Pedro Alves <palves@redhat.com>
Antoine Tremblay <antoine.tremblay@ericsson.com>
* gdb.threads/process-dies-while-detaching.c: New file.
* gdb.threads/process-dies-while-detaching.exp: New file.
And with that, we can switch the current UI to the UI whose input
descriptor woke up the event loop. IOW, if the user types in UI 2,
the event loop wakes up, switches to UI 2, and processes the input.
Next the user types in UI 3, the event loop wakes up and switches to
UI 3, etc.
gdb/ChangeLog:
2016-06-21 Pedro Alves <palves@redhat.com>
* event-top.c (input_fd): Delete.
(stdin_event_handler): Switch to the UI whose input descriptor got
the event. Adjust to per-UI input_fd.
(gdb_setup_readline): Don't set the input_fd global. Adjust to
per-UI input_fd.
(gdb_disable_readline): Adjust to per-UI input_fd.
* event-top.h (input_fd): Delete declaration.
* linux-nat.c (linux_nat_terminal_inferior): Don't remove input_fd
from the event-loop here.
(linux_nat_terminal_ours): Don't register input_fd in the
event-loop here.
* main.c (captured_main): Adjust to per-UI input_fd.
* remote.c (remote_terminal_inferior): Don't remove input_fd from
the event-loop here.
(remote_terminal_ours): Don't register input_fd in the event-loop
here.
* target.c: Include top.h and event-top.h.
(target_terminal_inferior): Remove input_fd from the event-loop
here.
(target_terminal_ours): Register input_fd in the event-loop.
* top.h (struct ui) <input_fd>: New field.
When GDB attaches to a process, it looks at the /proc/PID/task/ dir
for all clone threads of that process, and attaches to each of them.
Usually, if there is more than one clone thread, it means the program
is multi threaded and linked with pthreads. Thus when GDB soon after
attaching finds and loads a libthread_db matching the process, it'll
add a thread to the thread list for each of the initially found
lower-level LWPs.
If, however, GDB fails to find/load a matching libthread_db, nothing
is adding the LWPs to the thread list. And because of that, "detach"
hits an internal error:
(gdb) PASS: gdb.threads/clone-attach-detach.exp: fg attach 1: attach
info threads
Id Target Id Frame
* 1 LWP 6891 "clone-attach-de" 0x00007f87e5fd0790 in __nanosleep_nocancel () at ../sysdeps/unix/syscall-template.S:84
(gdb) FAIL: gdb.threads/clone-attach-detach.exp: fg attach 1: info threads shows two LWPs
detach
.../src/gdb/thread.c:1010: internal-error: is_executing: Assertion `tp' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n)
FAIL: gdb.threads/clone-attach-detach.exp: fg attach 1: detach (GDB internal error)
From here:
...
#8 0x00000000007ba7cc in internal_error (file=0x98ea68 ".../src/gdb/thread.c", line=1010, fmt=0x98ea30 "%s: Assertion `%s' failed.")
at .../src/gdb/common/errors.c:55
#9 0x000000000064bb83 in is_executing (ptid=...) at .../src/gdb/thread.c:1010
#10 0x00000000004c23bb in get_pending_status (lp=0x12c5cc0, status=0x7fffffffdc0c) at .../src/gdb/linux-nat.c:1235
#11 0x00000000004c2738 in detach_callback (lp=0x12c5cc0, data=0x0) at .../src/gdb/linux-nat.c:1317
#12 0x00000000004c1a2a in iterate_over_lwps (filter=..., callback=0x4c2599 <detach_callback>, data=0x0) at .../src/gdb/linux-nat.c:899
#13 0x00000000004c295c in linux_nat_detach (ops=0xe7bd30, args=0x0, from_tty=1) at .../src/gdb/linux-nat.c:1358
#14 0x000000000068284d in delegate_detach (self=0xe7bd30, arg1=0x0, arg2=1) at .../src/gdb/target-delegates.c:34
#15 0x0000000000694141 in target_detach (args=0x0, from_tty=1) at .../src/gdb/target.c:2241
#16 0x0000000000630582 in detach_command (args=0x0, from_tty=1) at .../src/gdb/infcmd.c:2975
...
Tested on x86-64 Fedora 23. Also confirmed the test passes against
gdbserver with "maint set target-non-stop".
gdb/ChangeLog:
2016-05-24 Pedro Alves <palves@redhat.com>
PR gdb/19828
* linux-nat.c (attach_proc_task_lwp_callback): Mark the lwp
resumed, and add the thread to GDB's thread list.
testsuite/ChangeLog:
2016-05-24 Pedro Alves <palves@redhat.com>
PR gdb/19828
* gdb.threads/clone-attach-detach.c: New file.
* gdb.threads/clone-attach-detach.exp: New file.
Working on the fix for gdb/19828, I saw
gdb.threads/attach-many-short-lived-threads.exp fail once in an
unusual way. Unfortunately I didn't keep debug logs, but it's an
issue similar to what's been fixed in remote.c a while ago --
linux-nat.c was not fetching the pending status from the right place.
gdb/ChangeLog:
2016-05-24 Pedro Alves <palves@redhat.com>
PR gdb/19828
* linux-nat.c (get_pending_status): If the thread reported the
event to the core and it's pending, use the pending status signal
number.
Hacking the gdb.threads/attach-many-short-lived-threads.exp test to
spawn thousands of threads instead of dozens, and running gdb under
perf, I saw that GDB was spending most of the time in find_lwp_pid:
- captured_main
- 93.61% catch_command_errors
- 87.41% attach_command
- 87.40% linux_nat_attach
- 87.40% linux_proc_attach_tgid_threads
- 82.38% attach_proc_task_lwp_callback
- 81.01% find_lwp_pid
5.30% ptid_get_lwp
+ 0.10% ptid_lwp_p
+ 0.64% add_thread
+ 0.26% set_running
+ 0.24% set_executing
0.12% ptid_get_lwp
+ 0.01% ptrace
+ 0.01% add_lwp
attach_proc_task_lwp_callback is called once for each LWP that we
attach to, found by listing the /proc/PID/task/ directory. In turn,
attach_proc_task_lwp_callback calls find_lwp_pid to check whether the
LWP we're about to try to attach to is already known. Since
find_lwp_pid does a linear walk over the whole LWP list, this becomes
quadratic. We do the /proc/PID/task/ listing until we get two
iterations in a row where we found no new threads. So the second and
following times we walk the /proc/PID/task/ dir, we're going to take
an even worse find_lwp_pid hit.
Fix this by adding a hash table keyed by LWP PID, for fast lookup.
The linked list embedded in the LWP structure itself is kept, and made
a double-linked list, so that removals from that list are O(1). An
earlier version of this patch got rid of this list altogether, but
that revealed hidden dependencies / assumptions on how the list is
sorted. For example, killing a process and then waiting for all the
LWPs status using iterate_over_lwps only works as is because the
leader LWP is always last in the list. So I thought it better to take
an incremental approach and make this patch concern itself _only_ with
the PID lookup optimization.
gdb/ChangeLog:
2016-05-24 Pedro Alves <palves@redhat.com>
PR gdb/19828
* linux-nat.c (lwp_lwpid_htab): New htab.
(lwp_info_hash, lwp_lwpid_htab_eq, lwp_lwpid_htab_create)
(lwp_lwpid_htab_add_lwp): New functions.
(lwp_list): Tweak comment.
(lwp_list_add, lwp_list_remove, lwp_lwpid_htab_remove_pid): New
functions.
(purge_lwp_list): Rewrite, using htab_traverse_noresize.
(add_initial_lwp): Add lwp to htab too. Use lwp_list_add.
(delete_lwp): Use lwp_list_remove. Remove htab too.
(find_lwp_pid): Search in htab.
(_initialize_linux_nat): Call lwp_lwpid_htab_create.
* linux-nat.h (struct lwp_info) <prev>: New field.
Hacking the gdb.threads/attach-many-short-lived-threads.exp test to
spawn thousands of threads instead of dozens, I saw GDB having trouble
keeping up with threads being spawned too fast, when it tried to stop
them all. This was because while gdb is doing that, it updates the
thread list to make sure no new thread has sneaked in that might need
to be paused. It does this a few times until it sees no-new-threads
twice in a row. The thread listing update itself is not that
expensive, however, in the Linux backend, updating the threads list
calls linux_common_core_of_thread for each LWP to record on which core
each LWP was last seen running, which opens/reads/closes a /proc file
for each LWP which becomes expensive when you need to do it for
thousands of LWPs.
perf shows gdb in linux_common_core_of_thread 44% of the time, in the
stop_all_threads -> update_thread_list path in this use case.
This patch simply makes linux_common_core_of_thread avoid updating the
core the thread is bound to if the thread hasn't run since the last
time we updated that info. This makes linux_common_core_of_thread
disappear into the noise in the perf report.
gdb/ChangeLog:
2016-05-24 Pedro Alves <palves@redhat.com>
PR gdb/19828
* linux-nat.c (linux_resume_one_lwp_throw): Clear the LWP's core
field.
(linux_nat_update_thread_list): Don't fetch the core if already
known.
A following patch (fix for gdb/19828) makes linux-nat.c add threads to
GDB's thread list earlier in the "attach" sequence, and that causes a
surprising regression on
gdb.threads/attach-many-short-lived-threads.exp on my machine. The
extra "thread x exited" handling and traffic slows down that test
enough that GDB core has trouble keeping up with new threads that are
spawned while trying to stop existing ones.
I saw the exact same issue with remote/gdbserver a while ago and fixed
it in 65706a29ba (Remote thread create/exit events) so part of the
fix here is the exact same -- add support for thread created events to
gdb/linux-nat.c. infrun.c:stop_all_threads enables those events when
it tries to stop threads, which ensures that new threads never get a
chance to themselves start new threads, thus fixing the race.
gdb/
2016-05-24 Pedro Alves <palves@redhat.com>
PR gdb/19828
* linux-nat.c (report_thread_events): New global.
(linux_handle_extended_wait): Report
TARGET_WAITKIND_THREAD_CREATED if thread event reporting is
enabled.
(wait_lwp, linux_nat_filter_event): Report all thread exits if
thread event reporting is enabled. Remove comment.
(filter_exit_event): New function.
(linux_nat_wait_1): Use it.
(linux_nat_thread_events): New function.
(linux_nat_add_target): Install it as target_thread_events method.
In non-stop mode, "interrupt" results in a "stop with no signal",
while in all-stop mode, it results in a remote interrupt request /
stop with SIGINT. This is currently implemented in both the Linux and
remote target backends. Move it to the core code instead, making
target_interrupt specifically always about "Interrupting as if with
Ctrl-C", just like it is documented.
gdb/ChangeLog:
2016-04-12 Pedro Alves <palves@redhat.com>
* infcmd.c (interrupt_target_1): Call target_stop is in non-stop
mode.
* linux-nat.c (linux_nat_interrupt): Delete.
(linux_nat_add_target): Don't install linux_nat_interrupt.
* remote.c (remote_interrupt_ns): Change return type to void.
Throw error if interrupting the target is not supported.
(remote_interrupt): Don't call the remote_stop_ns/remote_stop_as.
This unbreaks pending/delayed breakpoints handling, as well as
hardware watchpoints, on MIPS.
Ref: https://sourceware.org/ml/gdb-patches/2016-02/msg00681.html
The MIPS kernel reports SI_KERNEL for all kernel generated traps,
instead of TRAP_BRKPT / TRAP_HWBKPT, but GDB isn't aware of this.
Basically, this commit:
- Folds watchpoints logic into check_stopped_by_breakpoint, and
renames it to save_stop_reason.
- Adds GDB_ARCH_IS_TRAP_HWBKPT.
- Makes MIPS set both GDB_ARCH_IS_TRAP_BRPT and
GDB_ARCH_IS_TRAP_HWBKPT to SI_KERNEL. In save_stop_reason, we
handle the case of the same si_code returning true for both
TRAP_BRPT and TRAP_HWBKPT by looking at what the debug registers
say.
Tested on x86-64 Fedora 20, native and gdbserver.
gdb/ChangeLog:
2016-02-24 Pedro Alves <palves@redhat.com>
* linux-nat.c (save_sigtrap) Delete.
(stop_wait_callback): Call save_stop_reason instead of
save_sigtrap.
(check_stopped_by_breakpoint): Rename to ...
(save_stop_reason): ... this. Bits of save_sigtrap folded here.
Use GDB_ARCH_IS_TRAP_HWBKPT and handle ambiguous
GDB_ARCH_IS_TRAP_BRKPT / GDB_ARCH_IS_TRAP_HWBKPT. Factor out
common code between the USE_SIGTRAP_SIGINFO and
!USE_SIGTRAP_SIGINFO blocks.
(linux_nat_filter_event): Call save_stop_reason instead of
save_sigtrap.
* nat/linux-ptrace.h: Check for both SI_KERNEL and TRAP_BRKPT
si_code for MIPS.
* nat/linux-ptrace.h: Fix "TRAP_HWBPT" typo in x86 table. Add
comments on MIPS behavior.
(GDB_ARCH_IS_TRAP_HWBKPT): Define for all archs.
gdb/gdbserver/ChangeLog:
2016-02-24 Pedro Alves <palves@redhat.com>
* linux-low.c (check_stopped_by_breakpoint): Rename to ...
(save_stop_reason): ... this. Use GDB_ARCH_IS_TRAP_HWBKPT and
handle ambiguous GDB_ARCH_IS_TRAP_BRKPT / GDB_ARCH_IS_TRAP_HWBKPT.
Factor out common code between the USE_SIGTRAP_SIGINFO and
!USE_SIGTRAP_SIGINFO blocks.
(linux_low_filter_event): Call save_stop_reason instead of
check_stopped_by_breakpoint and check_stopped_by_watchpoint.
Update comments.
(linux_wait_1): Update comments.
linux_nat_kill relies on get_last_target_status to determine whether
the current inferior is stopped at a unfollowed fork/vfork event.
This is bad because many things can happen ever since we caught the
fork/vfork event... This commit rewrites that code to instead walk
the thread list looking for unfollowed fork events, similarly to what
was done for remote.c.
New test included. The main idea of the test is make sure that when
the program stops for a fork catchpoint, and the user kills the
parent, gdb also kills the unfollowed fork child. Since the child
hasn't been added as an inferior at that point, we need some other
portable way to detect that the child is gone. The test uses a pipe
for that. The program forks twice, so you have grandparent, child and
grandchild. The grandchild inherits the write side of the pipe. The
grandparent hangs reading from the pipe, since nothing ever writes to
it. If, when GDB kills the child, it also kills the grandchild, then
the grandparent's pipe read returns 0/EOF and the test passes.
Otherwise, if GDB doesn't kill the grandchild, then the pipe read
never returns and the test times out, like:
FAIL: gdb.base/catch-fork-kill.exp: fork-kind=fork: exit-kind=kill: fork: kill parent (timeout)
FAIL: gdb.base/catch-fork-kill.exp: fork-kind=vfork: exit-kind=kill: vfork: kill parent (timeout)
No regressions on x86_64 Fedora 20. New test passes with gdbserver as
well.
gdb/ChangeLog:
2016-01-25 Pedro Alves <palves@redhat.com>
PR gdb/19494
* linux-nat.c (kill_one_lwp): New, factored out from ...
(kill_callback): ... this.
(kill_wait_callback): New, factored out from ...
(kill_wait_one_lwp): ... this.
(kill_unfollowed_fork_children): New function.
(linux_nat_kill): Use it.
gdb/testsuite/ChangeLog:
2016-01-25 Pedro Alves <palves@redhat.com>
PR gdb/19494
* gdb.base/catch-fork-kill.c: New file.
* gdb.base/catch-fork-kill.exp: New file.
Note: this applies on top of:
[PATCH] Remove support for LinuxThreads and vendor 2.4 kernels w/ backported NPTL
https://sourceware.org/ml/gdb-patches/2015-12/msg00214.html
We try to avoid using libthread_db.so to list threads in the inferior
when debugging live processes, but the code that decides whether to
use it decides incorrectly if you have more than one inferior, and the
current inferior doesn't have execution yet. The result is visible
as:
(gdb) add-inferior
Added inferior 2
(gdb) inferior 2
[Switching to inferior 2 [<null>] (<noexec>)]
(gdb) info inferiors
Num Description Executable
1 process 15397 /home/pedro/gdb/tests/threads
* 2 <null>
(gdb) info threads
Cannot find new threads: generic error
(gdb)
Fix this by checking whether each inferior has execution rather than
just the current inferior.
By moving the core updating to linux-nat.c's update_thread_list
implementation, this also ends up fixing the
lwp-last-seen-running-on-core updating in the case we're debugging a
program that uses raw clone rather than pthreads, as linux-thread-db.c
isn't pushed in the target stack in that scenario.
Tested on x86_64 Fedora 20.
gdb/ChangeLog:
2015-12-17 Pedro Alves <palves@redhat.com>
PR threads/19354
* linux-nat.c (linux_nat_update_thread_list): Update process cores
each lwp was last seen running on here.
* linux-thread-db.c (update_thread_core): Delete.
(thread_db_update_thread_list_td_ta_thr_iter): Rename to ...
(thread_db_update_thread_list): ... this. Skip inferiors with
execution. Also call the target beneath.
(thread_db_update_thread_list): Delete.
gdb/testsuite/ChangeLog:
2015-12-17 Pedro Alves <palves@redhat.com>
PR threads/19354
* gdb.multi/info-threads.exp: New file.
Since we now rely on PTRACE_EVENT_CLONE being available (added in
Linux 2.5.46), we're relying on NPTL.
This commit removes the support for older LinuxThreads, as well as the
workarounds for vendor 2.4 kernels with NPTL backported.
- Rely on tkill being available.
- Assume gdb doesn't get cancel signals.
- Remove code that checks the LinuxThreads restart and cancel signals
in the inferior.
- Assume that __WALL is available.
- Assume that non-leader threads report WIFEXITED.
- Thus, no longer need to send signal 0 to check whether threads are
still alive.
- Update comments throughout.
Tested on x86_64 Fedora 20, native and gdbserver.
gdb/ChangeLog:
* configure.ac: Remove tkill checks.
* configure, config.in: Regenerate.
* linux-nat.c: Remove HAVE_TKILL_SYSCALL check. Update top level
comments.
(linux_nat_post_attach_wait): Remove 'cloned' parameter. Use
__WALL.
(attach_proc_task_lwp_callback): Don't set the cloned flag.
(linux_nat_attach): Adjust.
(kill_lwp): Remove HAVE_TKILL_SYSCALL check. No longer fall back
to 'kill'.
(linux_handle_extended_wait): Use __WALL. Don't set the cloned
flag.
(wait_lwp): Use __WALL. Update comments.
(running_callback, stop_and_resume_callback): Delete.
(linux_nat_filter_event): Don't stop and resume all lwps. Don't
check if the event LWP has previously exited.
(check_zombie_leaders): Update comments.
(linux_nat_wait_1): Use __WALL.
(kill_wait_callback): Don't handle clone processes separately.
Use __WALL instead.
(linux_thread_alive): Delete.
(linux_nat_thread_alive): Return true as long as the LWP is in the
LWP list.
(linux_nat_update_thread_list): Assume the kernel supports
PTRACE_EVENT_CLONE.
(get_signo): Delete.
(lin_thread_get_thread_signals): Remove LinuxThreads references.
No longer check __pthread_sig_restart / __pthread_sig_cancel in
the inferior.
* linux-nat.h (struct lwp_info) <cloned>: Delete field.
* linux-thread-db.c: Update comments.
(_initialize_thread_db): Remove LinuxThreads references.
* nat/linux-waitpid.c (my_waitpid): No longer emulate __WALL.
Pass down flags unmodified.
* linux-waitpid.h (my_waitpid): Update documentation.
gdb/gdbserver/ChangeLog:
* linux-low.c (linux_kill_one_lwp): Remove references to
LinuxThreads.
(kill_lwp): Remove HAVE_TKILL_SYSCALL check. No longer fall back
to 'kill'.
(linux_init_signals): Delete.
(initialize_low): Adjust.
* thread-db.c (thread_db_init): Remove LinuxThreads reference.
Before, on systems that did not support PTRACE_EVENT_CLONE, both GDB and
GDBServer coordinated with libthread_db.so to insert breakpoints at magic
locations in libpthread.so, in order to break at thread creation and
thread death.
Support for thread events was removed from GDBServer as patch:
https://sourceware.org/ml/gdb-patches/2015-11/msg00466.html
This patch removes support for thread events in GDB.
No regressions found on Ubuntu 14.04 x86_64.
gdb/ChangeLog:
* breakpoint.c (remove_thread_event_breakpoints): Remove.
* breakpoint.h (remove_thread_event_breakpoints): Remove
declaration.
* linux-nat.c (in_pid_list_p): Remove.
(lin_lwp_attach_lwp): Remove.
* linux-nat.h (lin_lwp_attach_lwp): Remove declaration.
* linux-thread-db.c (thread_db_use_events): Remove.
(struct thread_db_info) <td_create_bp_addr>: Remove.
<td_death_bp_addr>: Likewise.
<td_ta_event_addr_p>: Likewise.
<td_ta_set_event_p>: Likewise.
<td_ta_clear_event_p>: Likewise.
<td_ta_event_getmsg_p>: Likewise.
<td_thr_event_enable_p>: Likewise.
(attach_thread): Likewise.
(detach_thread): Likewise.
(have_threads_callback): Likewise.
(have_threads): Likewise.
(enable_thread_event): Likewise.
(enable_thread_event_reporting): Likewise.
(try_thread_db_load_1): Remove td_ta_event_addr, td_ta_set_event,
td_ta_clear_event, td_ta_event_getmsg, td_thr_event_enable
initializations.
(try_thread_db_load_1): Remove enable_thread_event_reporting call.
(disable_thread_event_reporting): Remove.
(record_thread): Adapt to thread_db_use_event removal.
(detach_thread): Remove.
(thread_db_detach): Adapt to thread_db_use_event removal.
(check_event): Remove.
(thread_db_wait): Adapt to thread events support removal.
(thread_db_mourn_inferior): Likewise.
(find_new_threads_callback): Likewise.
(find_new_threads_once): Likewise.
(thread_db_update_thread_list): Likewise.
This patch adds support for thread names in the remote protocol, and
updates gdb/gdbserver to use it. The information is added to the XML
description sent in response to the qXfer:threads:read packet.
gdb/ChangeLog:
* linux-nat.c (linux_nat_thread_name): Replace implementation by call
to linux_proc_tid_get_name.
* nat/linux-procfs.c (linux_proc_tid_get_name): New function,
implementation inspired by linux_nat_thread_name.
* nat/linux-procfs.h (linux_proc_tid_get_name): New declaration.
* remote.c (struct private_thread_info) <name>: New field.
(free_private_thread_info): Free name field.
(remote_thread_name): New function.
(thread_item_t) <name>: New field.
(clear_threads_listing_context): Free name field.
(start_thread): Get name xml attribute.
(thread_attributes): Add "name" attribute.
(remote_update_thread_list): Copy name field.
(init_remote_ops): Assign remote_thread_name callback.
* target.h (target_thread_name): Update comment.
* NEWS: Mention remote thread name support.
gdb/gdbserver/ChangeLog:
* linux-low.c (linux_target_ops): Use linux_proc_tid_get_name.
* server.c (handle_qxfer_threads_worker): Refactor to include thread
name in reply.
* target.h (struct target_ops) <thread_name>: New field.
(target_thread_name): New macro.
gdb/doc/ChangeLog:
* gdb.texinfo (Thread List Format): Mention thread names.
Since this code path returns a string owned by the target (we don't know how
it's allocated, could be a static read-only string), it's safer if we return
a constant string. If, for some reasons, the caller wishes to modify the
string, it should make itself a copy.
gdb/ChangeLog:
* linux-nat.c (linux_nat_thread_name): Constify return value.
* target.h (struct target_ops) <to_thread_name>: Likewise.
(target_thread_name): Likewise.
* target.c (target_thread_name): Likewise.
* target-delegates.c (debug_thread_name): Regenerate.
* python/py-infthread.c (thpy_get_name): Constify local variables.
* thread.c (print_thread_info): Likewise.
(thread_find_command): Likewise.
The existing logic was simply to flip syscall entry/return state when a
syscall trap was seen, and even then only with active 'catch syscall'.
That can get out of sync if 'catch syscall' is toggled at odd times.
This patch updates the entry/return state for all syscall traps,
regardless of catching state, and also updates known syscall state for
other kinds of traps. Almost all PTRACE_EVENT stops are delivered from
the middle of a syscall, so this can act like an entry. Every other
kind of ptrace stop is only delivered outside of syscall event pairs, so
marking them ignored ensures the next syscall trap looks like an entry.
Three new test scenarios are added to catch-syscall.exp:
- Disable 'catch syscall' from an entry to deliberately miss the return
event, then re-enable to make sure a new entry is recognized.
- Enable 'catch syscall' for the first time from a vfork event, which is
a PTRACE_EVENT_VFORK in the middle of the syscall. Make sure the next
syscall event is recognized as the return.
- Make sure entry and return are recognized for an ENOSYS syscall. This
is to defeat a common x86 hack that uses the pre-filled ENOSYS return
value as a sign of being on the entry side.
gdb/ChangeLog:
2015-10-19 Josh Stone <jistone@redhat.com>
* linux-nat.c (linux_handle_syscall_trap): Always update entry/
return state, even when not actively catching syscalls at all.
(linux_handle_extended_wait): Mark syscall_state like an entry.
(wait_lwp): Set syscall_state ignored for other traps.
(linux_nat_filter_event): Likewise.
gdb/testsuite/ChangeLog:
2015-10-19 Josh Stone <jistone@redhat.com>
* gdb.base/catch-syscall.c: Include <sched.h>.
(unknown_syscall): New variable.
(main): Trigger a vfork and an unknown syscall.
* gdb.base/catch-syscall.exp (vfork_syscalls): New variable.
(unknown_syscall_number): Likewise.
(check_call_to_syscall): Accept an optional syscall pattern.
(check_return_from_syscall): Likewise.
(check_continue): Likewise.
(test_catch_syscall_without_args): Check for vfork and ENOSYS.
(test_catch_syscall_skipping_return): New test toggling off 'catch
syscall' to step over the syscall return, then toggling back on.
(test_catch_syscall_mid_vfork): New test turning on 'catch syscall'
during a PTRACE_EVENT_VFORK stop, in the middle of a vfork syscall.
(do_syscall_tests): Call test_catch_syscall_without_args and
test_catch_syscall_mid_vfork.
(test_catch_syscall_without_args_noxml): Check for vfork and ENOSYS.
(fill_all_syscalls_numbers): Initialize unknown_syscall_number.
Since the record-btrace target now supports non-stop mode, we no
longer need to force-disable as-ns on x86.
gdb/ChangeLog:
2015-09-30 Pedro Alves <palves@redhat.com>
* linux-nat.c (linux_nat_always_non_stop_p): Always return 1.
* x86-linux-nat.c (x86_linux_always_non_stop_p): Delete.
(x86_linux_create_target): Don't install
x86_linux_always_non_stop_p.
This patch makes the execution control code use largely the same
mechanisms in both sync- and async-capable targets. This means using
continuations and use the event loop to react to target events on sync
targets as well. The trick is to immediately mark infrun's event loop
source after resume instead of calling wait_for_inferior. Then
fetch_inferior_event is adjusted to do a blocking wait on sync
targets.
Tested on x86_64 Fedora 20, native and gdbserver, with and without
"maint set target-async off".
gdb/ChangeLog:
2015-09-09 Pedro Alves <palves@redhat.com>
* breakpoint.c (bpstat_do_actions_1, until_break_command): Don't
check whether the target can async.
* inf-loop.c (inferior_event_handler): Only call target_async if
the target can async.
* infcall.c: Include top.h and interps.h.
(run_inferior_call): For the interpreter to sync mode while
running the infcall. Call wait_sync_command_done instead of
wait_for_inferior plus normal_stop.
* infcmd.c (prepare_execution_command): Don't check whether the
target can async when running in the foreground.
(step_1): Delete synchronous case handling.
(step_once): Always install a continuation, even in sync mode.
(until_next_command, finish_forward): Don't check whether the
target can async.
(attach_command_post_wait, notice_new_inferior): Always install a
continuation, even in sync mode.
* infrun.c (mark_infrun_async_event_handler): New function.
(proceed): In sync mode, mark infrun's event source instead of
waiting for events here.
(fetch_inferior_event): If the target can't async, do a blocking
wait.
(prepare_to_wait): In sync mode, mark infrun's event source.
(infrun_async_inferior_event_handler): No longer bail out if the
target can't async.
* infrun.h (mark_infrun_async_event_handler): New declaration.
* linux-nat.c (linux_nat_wait_1): Remove calls to
set_sigint_trap/clear_sigint_trap.
(linux_nat_terminal_inferior): No longer check whether the target
can async.
* mi/mi-interp.c (mi_on_sync_execution_done): Update and simplify
comment.
(mi_execute_command_input_handler): No longer check whether the
target is async. Update and simplify comment.
* target.c (default_target_wait): New function.
* target.h (struct target_ops) <to_wait>: Now defaults to
default_target_wait.
(default_target_wait): Declare.
* top.c (wait_sync_command_done): New function, factored out from
...
(maybe_wait_sync_command_done): ... this.
* top.h (wait_sync_command_done): Declare.
* target-delegates.c: Regenerate.
The Linux target and gdbserver now check the siginfo si_code
reported on a SIGTRAP to detect whether the trap indicates
a software breakpoint was hit.
Unfortunately, on Cell/B.E., the kernel uses an si_code value
of TRAP_BRKPT when a SW breakpoint was hit in PowerPC code,
but a si_code value of SI_KERNEL when a SW breakpoint was
hit in SPU code.
This patch updates Linux target and gdbserver to accept both
si_code values to indicate SW breakpoint on PowerPC.
ChangeLog:
* nat/linux-ptrace.h (GDB_ARCH_TRAP_BRKPT): Replace by ...
(GDB_ARCH_IS_TRAP_BRKPT): ... this. Add __powerpc__ case.
* linux-nat.c (check_stopped_by_breakpoint): Use
GDB_ARCH_IS_TRAP_BRKPT instead of GDB_ARCH_TRAP_BRKPT.
gdbserver/ChangeLog:
* linux-low.c (check_stopped_by_breakpoint): Use
GDB_ARCH_IS_TRAP_BRKPT instead of GDB_ARCH_TRAP_BRKPT.
GDB provides no indicator of progress during file operations, and can
appear to have locked up during slow remote transfers. This commit
updates GDB to print a warning each time a file is accessed over RSP.
An additional message detailing how to avoid remote transfers is
printed for the first transfer only.
gdb/ChangeLog:
* target.h (struct target_ops) <to_fileio_open>: New argument
warn_if_slow. Update comment. All implementations updated.
(target_fileio_open_warn_if_slow): New declaration.
* target.c (target_fileio_open): Renamed as...
(target_fileio_open_1): ...this. New argument warn_if_slow.
Pass warn_if_slow to implementation. Update debug printing.
(target_fileio_open): New function.
(target_fileio_open_warn_if_slow): Likewise.
* gdb_bfd.c (gdb_bfd_iovec_fileio_open): Use new function
target_fileio_open_warn_if_slow.
gdb/testsuite/ChangeLog:
* gdb.trace/pending.exp: Cope with remote transfer warnings.
Markus reported that ASNS breaks target record-btrace. In particular,
the gdb.btrace/multi-thread-step.exp test fails (both with BTS and PT
tracing) with a crash in py-inferior.c:
Program received signal SIGSEGV, Segmentation fault.
0x00000000006aa40d in add_thread_object (tp=0x27d32d0)
at /users/mmetzger/team/gdb/git/gdb/python/py-inferior.c:337
337 entry->next = inf_obj->threads;
My machine doesn't support BTS nor PT, so I missed this...
Disabling ASNS temporarily on x86 until this is addressed.
Tested on x86_64 Fedora 20.
gdb/ChangeLog:
2015-08-18 Pedro Alves <palves@redhat.com>
* linux-nat.c (linux_nat_always_non_stop_p): If the linux_ops
target implements to_always_non_stop_p, call it.
* x86-linux-nat.c (x86_linux_always_non_stop_p): New function.
(x86_linux_create_target): Install it as to_always_non_stop_p
method.
The testsuite shows no regressions with this forced on, on:
- Native x86_64 Fedora 20, with and output "set displaced off".
- Native x86_64 Fedora 20, on top of x86 software single-step series.
- PPC64 Fedora 18.
- S/390 RHEL 7.1.
Let's try making it the default.
gdb/ChangeLog:
2015-08-07 Pedro Alves <palves@redhat.com>
* linux-nat.c (linux_nat_always_non_stop_p): Return 1.
With "maint set target-non-stop on" we get:
@@ -66,13 +66,16 @@ Continuing.
interrupt
(gdb) PASS: gdb.base/interrupt-noterm.exp: interrupt
-Program received signal SIGINT, Interrupt.
-PASS: gdb.base/interrupt-noterm.exp: inferior received SIGINT
-testcase src/gdb/testsuite/gdb.base/interrupt-noterm.exp completed in 0 seconds
+[process 12119] #1 stopped.
+0x0000003615ebc6d0 in __nanosleep_nocancel () at ../sysdeps/unix/syscall-template.S:81
+81 T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
+FAIL: gdb.base/interrupt-noterm.exp: inferior received SIGINT (timeout)
+testcase src/gdb/testsuite/gdb.base/interrupt-noterm.exp completed in 10 seconds
That is, we get "[$thread] #1 stopped" instead of SIGINT.
The issue is that we don't currently distinguish send
"interrupt/ctrl-c" to target terminal vs "stop/pause" thread well;
both cases go through "target_stop".
And then, the native Linux backend (linux-nat.c) implements
target_stop with SIGSTOP in non-stop mode, and SIGINT in all-stop
mode. Since "maint set target-non-stop on" forces the backend to be
always running in non-stop mode, even though the user-visible behavior
is "set non-stop" is "off", "interrupt" causes a SIGSTOP instead of
the SIGINT the test expects.
Fix this by introducing a target_interrupt method to use in the
"interrupt/ctrl-c" case, so "set non-stop off" can always work the
same irrespective of "maint set target-non-stop on/off". I'm
explictly considering changing the "set non-stop on" behavior as out
of scope here.
Most of the patch is an across-the-board rename of to_stop hook
implementations to to_interrupt. The only targets where something
more than a rename is being done are linux-nat.c and remote.c, which
are the only targets that support async, and thus are the only ones
the core side calls target_stop on.
gdb/ChangeLog:
2015-08-07 Pedro Alves <palves@redhat.com>
* darwin-nat.c (darwin_stop): Rename to ...
(darwin_interrupt): ... this.
(_initialize_darwin_inferior): Adjust.
* gnu-nat.c (gnu_stop): Delete.
(gnu_target): Don't install gnu_stop.
* inf-ptrace.c (inf_ptrace_stop): Rename to ...
(inf_ptrace_interrupt): ... this.
(inf_ptrace_target): Adjust.
* infcmd.c (interrupt_target_1): Use target_interrupt instead of
target_stop.
* linux-nat (linux_nat_stop): Rename to ...
(linux_nat_interrupt): ... this.
(linux_nat_stop): Reimplement.
(linux_nat_add_target): Install linux_nat_interrupt.
* nto-procfs.c (nto_interrupt_twice): Rename to ...
(nto_handle_sigint_twice): ... this.
(nto_interrupt): Rename to ...
(nto_handle_sigint): ... this. Call target_interrupt instead of
target_stop.
(procfs_wait): Adjust.
(procfs_stop): Rename to ...
(procfs_interrupt): ... this.
(init_procfs_targets): Adjust.
* procfs.c (procfs_stop): Rename to ...
(procfs_interrupt): ... this.
(procfs_target): Adjust.
* remote-m32r-sdi.c (m32r_stop): Rename to ...
(m32r_interrupt): ... this.
(init_m32r_ops): Adjust.
* remote-sim.c (gdbsim_stop_inferior): Rename to ...
(gdbsim_interrupt_inferior): ... this.
(gdbsim_stop): Rename to ...
(gdbsim_interrupt): ... this.
(gdbsim_cntrl_c): Adjust.
(init_gdbsim_ops): Adjust.
* remote.c (sync_remote_interrupt): Adjust comments.
(remote_stop_as): Rename to ...
(remote_interrupt_as): ... this.
(remote_stop): Adjust comment.
(remote_interrupt): New function.
(init_remote_ops): Install remote_interrupt.
* target.c (target_interrupt): New function.
* target.h (struct target_ops) <to_interrupt>: New field.
(target_interrupt): New declaration.
* windows-nat.c (windows_stop): Rename to ...
(windows_interrupt): ... this.
* target-delegates.c: Regenerate.
This finally implements user-visible all-stop mode running with the
target_ops backend always in non-stop mode. This is a stepping stone
towards finer-grained control of threads, being able to do interesting
things like thread groups, associating groups with breakpoints, etc.
From the user's perspective, all-stop mode is really just a special
case of being able to stop and resume specific sets of threads, so it
makes sense to do this step first.
With this, even in all-stop, the target is no longer in charge of
stopping all threads before reporting an event to the core -- the core
takes care of it when it sees fit. For example, when "next"- or
"step"-ing, we can avoid stopping and resuming all threads at each
internal single-step, and instead only stop all threads when we're
about to present the stop to the user.
The implementation is almost straight forward, as the heavy lifting
has been done already in previous patches. Basically, we replace
checks for "set non-stop on/off" (the non_stop global), with calls to
a new target_is_non_stop_p function. In a few places, if "set
non-stop off", we stop all threads explicitly, and in a few other
places we resume all threads explicitly, making use of existing
methods that were added for teaching non-stop to step over breakpoints
without displaced stepping.
This adds a new "maint set target-non-stop on/off/auto" knob that
allows both disabling the feature if we find problems, and
force-enable it for development (useful when teaching a target about
this. The default is "auto", which means the feature is enabled if a
new target method says it should be enabled. The patch implements the
method in linux-nat.c, just for illustration, because it still returns
false. We'll need a few follow up fixes before turning it on by
default. This is a separate target method from indicating regular
non-stop support, because e.g., while e.g., native linux-nat.c is
close to regression free with all-stop-non-stop (with following
patches will fixing the remaining regressions), remote.c+gdbserver
will still need more fixing, even though it supports "set non-stop
on".
Tested on x86_64 Fedora 20, native, with and without "set displaced
off", and with and without "maint set target-non-stop on"; and also
against gdbserver.
gdb/ChangeLog:
2015-08-07 Pedro Alves <palves@redhat.com>
* NEWS: Mention "maint set/show target-non-stop".
* breakpoint.c (update_global_location_list): Check
target_is_non_stop_p instead of non_stop.
* infcmd.c (attach_command_post_wait, attach_command): Likewise.
* infrun.c (show_can_use_displaced_stepping)
(can_use_displaced_stepping_p, start_step_over_inferior):
Likewise.
(internal_resume_ptid): New function.
(resume): Use it.
(proceed): Check target_is_non_stop_p instead of non_stop. If in
all-stop mode but the target is always in non-stop mode, start all
the other threads that are implicitly resumed too.
(for_each_just_stopped_thread, fetch_inferior_event)
(adjust_pc_after_break, stop_all_threads): Check
target_is_non_stop_p instead of non_stop.
(handle_inferior_event): Likewise. Handle detach-fork in all-stop
with the target always in non-stop mode.
(handle_signal_stop) <random signal>: Check target_is_non_stop_p
instead of non_stop.
(switch_back_to_stepped_thread): Check target_is_non_stop_p
instead of non_stop.
(keep_going_stepped_thread): Use internal_resume_ptid.
(stop_waiting): If in all-stop mode, and the target is in non-stop
mode, stop all threads.
(keep_going_pass): Likewise, when starting a new in-line step-over
sequence.
* linux-nat.c (get_pending_status, select_event_lwp)
(linux_nat_filter_event, linux_nat_wait_1, linux_nat_wait): Check
target_is_non_stop_p instead of non_stop.
(linux_nat_always_non_stop_p): New function.
(linux_nat_stop): Check target_is_non_stop_p instead of non_stop.
(linux_nat_add_target): Install linux_nat_always_non_stop_p.
* target-delegates.c: Regenerate.
* target.c (target_is_non_stop_p): New function.
(target_non_stop_enabled, target_non_stop_enabled_1): New globals.
(maint_set_target_non_stop_command)
(maint_show_target_non_stop_command): New functions.
(_initilize_target): Install "maint set/show target-non-stop"
commands.
* target.h (struct target_ops) <to_always_non_stop_p>: New field.
(target_non_stop_enabled): New declaration.
(target_is_non_stop_p): New declaration.
gdb/doc/ChangeLog:
2015-08-07 Pedro Alves <palves@redhat.com>
* gdb.texinfo (Maintenance Commands): Document "maint set/show
target-non-stop".
The new gdb.threads/fork-plus-threads.exp test exposes one more
problem. When one types "info inferiors" after running the program,
one see's a couple inferior left still, while there should only be
inferior #1 left. E.g.:
(gdb) info inferiors
Num Description Executable
4 process 8393 /home/pedro/bugs/src/test
2 process 8388 /home/pedro/bugs/src/test
* 1 <null> /home/pedro/bugs/src/test
(gdb) info threads
Calling prune_inferiors() manually at this point (from a top gdb) does
not remove them, because they still have inf->pid != 0 (while they
shouldn't). This suggests that we never mourned those inferiors.
Enabling logs (master + previous patch) we see:
...
WL: waitpid Thread 0x7ffff7fc2740 (LWP 9513) received Trace/breakpoint trap (stopped)
WL: Handling extended status 0x03057f
LHEW: Got clone event from LWP 9513, new child is LWP 9579
[New Thread 0x7ffff37b8700 (LWP 9579)]
WL: waitpid Thread 0x7ffff7fc2740 (LWP 9508) received 0 (exited)
WL: Thread 0x7ffff7fc2740 (LWP 9508) exited.
^^^^^^^^
[Thread 0x7ffff7fc2740 (LWP 9508) exited]
WL: waitpid Thread 0x7ffff7fc2740 (LWP 9499) received 0 (exited)
WL: Thread 0x7ffff7fc2740 (LWP 9499) exited.
[Thread 0x7ffff7fc2740 (LWP 9499) exited]
RSRL: resuming stopped-resumed LWP Thread 0x7ffff37b8700 (LWP 9579) at 0x3615ef4ce1: step=0
...
(gdb) info inferiors
Num Description Executable
5 process 9508 /home/pedro/bugs/src/test
^^^^
4 process 9503 /home/pedro/bugs/src/test
3 process 9500 /home/pedro/bugs/src/test
2 process 9499 /home/pedro/bugs/src/test
* 1 <null> /home/pedro/bugs/src/test
(gdb)
...
Note the "Thread 0x7ffff7fc2740 (LWP 9508) exited." line.
That's this in wait_lwp:
/* Check if the thread has exited. */
if (WIFEXITED (status) || WIFSIGNALED (status))
{
thread_dead = 1;
if (debug_linux_nat)
fprintf_unfiltered (gdb_stdlog, "WL: %s exited.\n",
target_pid_to_str (lp->ptid));
}
}
That was the leader thread reporting an exit, meaning the whole
process is gone. So the problem is that this code doesn't understand
that an WIFEXITED status of the leader LWP should be reported to
infrun as process exit.
gdb/ChangeLog:
2015-07-30 Pedro Alves <palves@redhat.com>
PR threads/18600
* linux-nat.c (wait_lwp): Report to the core when thread group
leader exits.
gdb/testsuite/ChangeLog:
2015-07-30 Pedro Alves <palves@redhat.com>
PR threads/18600
* gdb.threads/fork-plus-threads.exp: Test that "info inferiors"
only shows inferior 1.
When a program forks and another process start threads while gdb is
handling the fork event, newly created threads are left stuck stopped
by gdb, even though gdb presents them as "running", to the user.
This can be seen with the test added by this patch. The test has the
inferior fork a certain number of times and waits for all children to
exit. Each fork child spawns a number of threads that do nothing and
joins them immediately. Normally, the program should run unimpeded
(from the point of view of the user) and exit very quickly. Without
this fix, it doesn't because of some threads left stopped by gdb, so
inferior 1 never exits.
The program triggers when a new clone thread is found while inside the
linux_stop_and_wait_all_lwps call in linux-thread-db.c:
linux_stop_and_wait_all_lwps ();
ALL_LWPS (lp)
if (ptid_get_pid (lp->ptid) == pid)
thread_from_lwp (lp->ptid);
linux_unstop_all_lwps ();
Within linux_stop_and_wait_all_lwps, we reach
linux_handle_extended_wait with the "stopping" parameter set to 1, and
because of that we don't mark the new lwp as resumed. As consequence,
the subsequent resume_stopped_resumed_lwps, called from
linux_unstop_all_lwps, never resumes the new LWP.
There's lots of cruft in linux_handle_extended_wait that no longer
makes sense. On systems with CLONE events support, we don't rely on
libthread_db for thread listing anymore, so the code that preserves
stop_requested and the handling of last_resume_kind is all dead.
So the fix is to remove all that, and simply always mark the new LWP
as resumed, so that resume_stopped_resumed_lwps re-resumes it.
gdb/ChangeLog:
2015-07-30 Pedro Alves <palves@redhat.com>
Simon Marchi <simon.marchi@ericsson.com>
PR threads/18600
* linux-nat.c (linux_handle_extended_wait): On CLONE event, always
mark the new thread as resumed. Remove STOPPING parameter.
(wait_lwp): Adjust call to linux_handle_extended_wait.
(linux_nat_filter_event): Adjust call to
linux_handle_extended_wait.
(resume_stopped_resumed_lwps): Add debug output.
gdb/testsuite/ChangeLog:
2015-07-30 Simon Marchi <simon.marchi@ericsson.com>
Pedro Alves <palves@redhat.com>
PR threads/18600
* gdb.threads/fork-plus-threads.c: New file.
* gdb.threads/fork-plus-threads.exp: New file.
If a non-leader thread exits the process while all other threads are
ptrace-stopped, native gdb fails an assertion. The test added by this
commit catches it:
/home/pedro/gdb/mygit/build/../src/gdb/linux-nat.c:3198: internal-error: linux_nat_filter_event: Assertion `lp->resumed' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n)
FAIL: gdb.threads/non-leader-exit-process.exp: program exits normally (GDB internal error)
The fix is just to remove the assertion.
With that out of the way, neither GDB not GDBserver handle this
perfectly though, so I'm adding a KFAIL:
(gdb) continue
Continuing.
[Thread 0x7ffff7fc0700 (LWP 15350) exited]
No unwaited-for children left.
Couldn't get registers: No such process.
(gdb) KFAIL: gdb.threads/non-ldr-exit.exp: program exits normally (PRMS: gdb/18717)
gdb/ChangeLog:
2015-07-24 Pedro Alves <palves@redhat.com>
PR gdb/18717
* linux-nat.c (linux_nat_filter_event): Don't assert that the lwp
is resumed, and extend the debug log.
gdb/testsuite/ChangeLog:
2015-07-24 Pedro Alves <palves@redhat.com>
PR gdb/18717
* gdb.threads/non-ldr-exit.c: New file.
* gdb.threads/non-ldr-exit.exp: New file.
have_ptrace_getregset is a tri-state variable (-1, 0, 1), and we have
some conditions like "if (have_ptrace_getregset)", which is not correct.
I'll explain why it is not correct in the following example. This fix
to this problem to replace the test (have_ptrace_getregset) to test
(have_ptrace_getregset == 1) or (have_ptrace_getregset == -1) etc.
However Doug thinks it hinders readability
https://sourceware.org/ml/gdb-patches/2015-05/msg00692.html so I decide
to add a new enum tribool and change have_ptrace_getregset to it, in
order to make these tests more readable.
have_ptrace_getregset is initialised to -1, and is adjusted to 0 or 1 in
$ARCH_linux_read_description according to the capability of the kernel.
However, it is possible that have_ptrace_getregset is used before it is
set to 0 or 1, which means it is still -1. This is shown below.
(gdb) run
Starting program: gdb/testsuite/gdb.base/break
Breakpoint 2, amd64_linux_fetch_inferior_registers (ops=0xceaa80, regcache=0xe72000, regnum=16) at git/gdb/amd64-linux-nat.c:128
128 {
top?p have_ptrace_getregset
$1 = TRIBOOL_UNKNOWN
top?c
Continuing.
Breakpoint 2, amd64_linux_fetch_inferior_registers (ops=0xceaa80, regcache=0xe72000, regnum=16) at git/gdb/amd64-linux-nat.c:128
128 {
top?c
Continuing.
Breakpoint 1, x86_linux_read_description (ops=0xceaa80) at git/gdb/x86-linux-nat.c:117
117 {
PTRACE_GETREGSET command is used even GDB doesn't know whether
PTRACE_GETREGSET is supported or not. It is wrong, but works on x86.
However it doesn't work on arm-linux if the kernel doesn't support
PTRACE_GETREGSET at all. We'll get:
(gdb) run
Starting program: gdb/testsuite/gdb.base/break
warning: Unable to fetch general register.
PC register is not available
gdb:
2015-06-23 Yao Qi <yao.qi@linaro.org>
* amd64-linux-nat.c (amd64_linux_fetch_inferior_registers):
Check whether have_ptrace_getregset is TRIBOOL_TRUE explicitly.
(amd64_linux_store_inferior_registers): Likewise.
* arm-linux-nat.c (fetch_fpregister): Likewise.
(fetch_fpregs, store_fpregister): Likewise.
(store_fpregister, store_fpregs): Likewise.
(fetch_register, fetch_regs): Likewise.
(store_register, store_regs): Likewise.
(fetch_vfp_regs, store_vfp_regs): Likewise.
(arm_linux_read_description): Check have_ptrace_getregset is
TRIBOOL_UNKNOWN. Set have_ptrace_getregset to TRIBOOL_TRUE
or TRIBOOL_FALSE.
* i386-linux-nat.c (fetch_xstateregs): Check
have_ptrace_getregset is not TRIBOOL_TRUE.
(store_xstateregs): Likewise.
* linux-nat.c (have_ptrace_getregset): Change its type to
enum tribool.
* linux-nat.h (tribool): New enum.
* x86-linux-nat.c (x86_linux_read_description): Use enum tribool.
Check whether have_ptrace_getregset is TRIBOOL_TRUE.
This commit allows GDB to access executables and shared libraries
on native Linux targets where GDB and the inferior have different
mount namespaces.
gdb/ChangeLog:
* linux-nat.c (nat/linux-namespaces.h): New include.
(fileio.h): Likewise.
(linux_nat_filesystem_is_local): New function.
(linux_nat_fileio_pid_of): Likewise.
(linux_nat_fileio_open): Likewise.
(linux_nat_fileio_readlink): Likewise.
(linux_nat_fileio_unlink): Likewise.
(linux_nat_add_target): Initialize to_filesystem_is_local,
to_fileio_open, to_fileio_readlink and to_fileio_unlink.
(_initialize_linux_nat): New "set/show debug linux-namespaces"
commands.
* NEWS: Mention new "set/show debug linux-namespaces" commands.
gdb/doc/ChangeLog:
* gdb.texinfo (Debugging Output): Document the "set/show debug
linux-namespaces" command.
I'll let arm-linux-nat.c to use PTRACE_GETREGSET if kernel supports,
so this patch is to move have_ptrace_getregset from x86-linux-nat.c
to linux-nat.c.
gdb:
2015-06-01 Yao Qi <yao.qi@linaro.org>
* x86-linux-nat.c (have_ptrace_getregset): Move it to ...
* linux-nat.c: ... here.
* x86-linux-nat.h (have_ptrace_getregset): Move the declaration
to ...
* linux-nat.h: ... here.
This patch implements basic support for follow-fork and detach-on-fork on
extended-remote Linux targets. Only 'fork' is supported in this patch;
'vfork' support is added n a subsequent patch. This patch depends on
the previous patches in the patch series.
Sufficient extended-remote functionality has been implemented here to pass
gdb.base/multi-forks.exp, as well as gdb.base/foll-fork.exp with the
catchpoint tests commented out. Some other fork tests fail with this
patch because it doesn't provide the architecture support needed for
watchpoint inheritance or fork catchpoints.
The implementation follows the same general structure as for the native
implementation as much as possible.
This implementation includes:
* enabling fork events in linux-low.c in initialize_low and
linux_enable_extended_features
* handling fork events in gdbserver/linux-low.c:handle_extended_wait
- when a fork event occurs in gdbserver, we must do the full creation
of the new process, thread, lwp, and breakpoint lists. This is
required whether or not the new child is destined to be
detached-on-fork, because GDB will make target calls that require all
the structures. In particular we need the breakpoint lists in order
to remove the breakpoints from a detaching child. If we are not
detaching the child we will need all these structures anyway.
- as part of this event handling we store the target_waitstatus in a new
member of the parent lwp_info structure, 'waitstatus'. This
is used to store extended event information for reporting to GDB.
- handle_extended_wait is given a return value, denoting whether the
handled event should be reported to GDB. Previously it had only
handled clone events, which were never reported.
* using a new predicate in gdbserver to control handling of the fork event
(and eventually all extended events) in linux_wait_1. The predicate,
extended_event_reported, checks a target_waitstatus.kind for an
extended ptrace event.
* implementing a new RSP 'T' Stop Reply Packet stop reason: "fork", in
gdbserver/remote-utils.c and remote.c.
* implementing new target and RSP support for target_follow_fork with
target extended-remote. (The RSP components were actually defined in
patch 1, but they see their first use here).
- remote target routine remote_follow_fork, which just sends the 'D;pid'
detach packet to detach the new fork child cleanly. We can't just
call target_detach because the data structures for the forked child
have not been allocated on the host side.
Tested on x64 Ubuntu Lucid, native, remote, extended-remote.
gdb/gdbserver/ChangeLog:
* linux-low.c (handle_extended_wait): Implement return value,
rename argument 'event_child' to 'event_lwp', handle
PTRACE_EVENT_FORK, call internal_error for unrecognized event.
(linux_low_ptrace_options): New function.
(linux_low_filter_event): Call linux_low_ptrace_options,
use different argument fo linux_enable_event_reporting,
use return value from handle_extended_wait.
(extended_event_reported): New function.
(linux_wait_1): Call extended_event_reported and set
status to report fork events.
(linux_write_memory): Add pid to debug message.
(reset_lwp_ptrace_options_callback): New function.
(linux_handle_new_gdb_connection): New function.
(linux_target_ops): Initialize new structure member.
* linux-low.h (struct lwp_info) <waitstatus>: New member.
* lynx-low.c: Initialize new structure member.
* remote-utils.c (prepare_resume_reply): Implement stop reason
"fork" for "T" stop message.
* server.c (handle_query): Call handle_new_gdb_connection.
* server.h (report_fork_events): Declare global flag.
* target.h (struct target_ops) <handle_new_gdb_connection>:
New member.
(target_handle_new_gdb_connection): New macro.
* win32-low.c: Initialize new structure member.
gdb/ChangeLog:
* linux-nat.c (linux_nat_ptrace_options): New function.
(linux_init_ptrace, wait_lwp, linux_nat_filter_event):
Call linux_nat_ptrace_options and use different argument to
linux_enable_event_reporting.
(_initialize_linux_nat): Delete call to
linux_ptrace_set_additional_flags.
* nat/linux-ptrace.c (current_ptrace_options): Rename to
supported_ptrace_options.
(additional_flags): Delete variable.
(linux_check_ptrace_features): Use supported_ptrace_options.
(linux_test_for_tracesysgood, linux_test_for_tracefork):
Likewise, and remove additional_flags check.
(linux_enable_event_reporting): Change 'attached' argument to
'options'. Use supported_ptrace_options.
(ptrace_supports_feature): Change comment. Use
supported_ptrace_options.
(linux_ptrace_set_additional_flags): Delete function.
* nat/linux-ptrace.h (linux_ptrace_set_additional_flags):
Delete function prototype.
* remote.c (remote_fork_event_p): New function.
(remote_detach_pid): New function.
(remote_detach_1): Call remote_detach_pid, don't mourn inferior
if doing detach-on-fork.
(remote_follow_fork): New function.
(remote_parse_stop_reply): Handle new "T" stop reason "fork".
(remote_pid_to_str): Print "process" strings for pid/0/0 ptids.
(init_extended_remote_ops): Initialize to_follow_fork.
This commit introduces a new function linux_proc_pid_to_exec_file
that shared Linux code can use to discover the filename of the
executable that was run to create a process on the system.
gdb/ChangeLog:
* nat/linux-procfs.h (linux_proc_pid_to_exec_file):
New declaration.
* nat/linux-procfs.c (linux_proc_pid_to_exec_file):
New function, factored out from...
* linux-nat.c (linux_child_pid_to_exec_file): ...here.
On GNU/Linux, if the running kernel supports clone events, then
linux-thread-db.c defers thread listing to the target beneath:
static void
thread_db_update_thread_list (struct target_ops *ops)
{
...
if (target_has_execution && !thread_db_use_events ())
ops->beneath->to_update_thread_list (ops->beneath);
else
thread_db_update_thread_list_td_ta_thr_iter (ops);
...
}
However, when live debugging, the target beneath, linux-nat.c, does
not implement the to_update_thread_list method. The result is that if
a thread is marked exited (because it can't be deleted right now,
e.g., it was the selected thread), then it won't ever be deleted,
until the process exits or is killed/detached.
A similar thing happens with the remote.c target. Because its
target_update_thread_list implementation skips exited threads when it
walks the current thread list looking for threads that no longer exits
on the target side, using ALL_NON_EXITED_THREADS_SAFE, stale exited
threads are never deleted.
This is not a big deal -- I can't think of any way this might be user
visible, other than gdb's memory growing a tiny bit whenever a thread
gets stuck in exited state. Still, might as well clean things up
properly.
All other targets use prune_threads, so are unaffected.
The fix adds a ALL_THREADS_SAFE macro, that like
ALL_NON_EXITED_THREADS_SAFE, walks the thread list and allows deleting
the iterated thread, and uses that in places that are walking the
thread list in order to delete threads. Actually, after converting
linux-nat.c and remote.c to use this, we find the only other user of
ALL_NON_EXITED_THREADS_SAFE is also walking the list to delete
threads. So we convert that too, and end up deleting
ALL_NON_EXITED_THREADS_SAFE.
Tested on x86_64 Fedora 20, native and gdbserver.
gdb/ChangeLog
2015-04-07 Pedro Alves <palves@redhat.com>
* gdbthread.h (ALL_NON_EXITED_THREADS_SAFE): Rename to ...
(ALL_THREADS_SAFE): ... this, and don't skip exited threads.
(delete_exited_threads): New declaration.
* infrun.c (follow_exec): Use ALL_THREADS_SAFE.
* linux-nat.c (linux_nat_update_thread_list): New function.
(linux_nat_add_target): Install it.
* remote.c (remote_update_thread_list): Use ALL_THREADS_SAFE.
* thread.c (prune_threads): Use ALL_THREADS_SAFE.
(delete_exited_threads): New function.
My all-stop-on-top-of-non-stop series manages to trip on a bug in the
linux-nat.c backend while running the testsuite. If a thread is
discovered while threads are being momentarily paused (without the
core's intervention), the thread ends up stuck in THREAD_STOPPED
state, even though from the user's perspective, the thread is running
even while it is paused.
From inspection, in the current sources, this can happen if we call
stop_and_resume_callback, though there's no way to test that with
current Linux kernels.
(While trying to come up with test to exercise this, I stumbled on:
https://sourceware.org/ml/gdb-patches/2015-03/msg00850.html
... which does include a non-trivial test, so I think I can still
claim I come out net positive. :-) )
Tested on x86_64 Fedora 20.
gdb/ChangeLog:
2015-04-01 Pedro Alves <palves@redhat.com>
* linux-nat.c (linux_handle_extended_wait): Always call set_running.
All callers of target_async pass it the same callback
(inferior_event_handler). Since both common code and target backends
need to be able to put the target in and out of target async mode at
any given time, there's really no way that a different callback could
be passed. This commit simplifies things, and removes the indirection
altogether. Bonus: with this, gdb's target_async method ends up with
the same signature as gdbserver's.
Tested on x86_64 Fedora 20, native and gdbserver.
gdb/ChangeLog:
2015-03-25 Pedro Alves <palves@redhat.com>
* target.h <to_async>: Replace 'callback' and 'context' parameters
with boolean 'enable' parameter.
(target_async): Replace CALLBACK and CONTEXT parameters with
boolean ENABLE parameter.
* inf-loop.c (inferior_event_handler): Adjust.
* linux-nat.c (linux_nat_attach, linux_nat_resume)
(linux_nat_resume): Adjust.
(async_client_callback, async_client_context): Delete.
(handle_target_event): Call inferior_event_handler directly.
(linux_nat_async): Replace 'callback' and 'context' parameters
with boolean 'enable' parameter. Adjust. Remove references to
async_client_callback and async_client_context.
(linux_nat_close): Adjust.
* record-btrace.c (record_btrace_async): Replace 'callback' and
'context' parameters with boolean 'enable' parameter. Adjust.
(record_btrace_resume): Adjust.
* record-full.c (record_full_async): Replace 'callback' and
'context' parameters with boolean 'enable' parameter. Adjust.
(record_full_resume, record_full_core_resume): Adjust.
* remote.c (struct remote_state) <async_client_callback,
async_client_context>: Delete fields.
(remote_start_remote, extended_remote_attach_1, remote_resume)
(extended_remote_create_inferior): Adjust.
(remote_async_serial_handler): Call inferior_event_handler
directly.
(remote_async): Replace 'callback' and 'context' parameters with
boolean 'enable' parameter. Adjust.
* top.c (gdb_readline_wrapper_cleanup, gdb_readline_wrapper):
Adjust.
* target-delegates.c: Regenerate.
This adds/tweaks a few debug logs I found useful recently.
gdb/gdbserver/ChangeLog:
2015-03-24 Pedro Alves <palves@redhat.com>
* linux-low.c (check_stopped_by_breakpoint): Tweak debug log
output. Also dump TRAP_TRACE.
(linux_low_filter_event): In debug output, distinguish a
resume_stop SIGSTOP from a delayed SIGSTOP.
gdb/ChangeLog:
2015-03-24 Pedro Alves <palves@redhat.com>
* linux-nat.c (linux_nat_resume): Output debug logs before trying
to resume the event lwp. Use the lwp's ptid instead of the passed
in (maybe wildcard) ptid.
(stop_wait_callback): Tweak debug log output.
(check_stopped_by_breakpoint): Tweak debug log output. Also dump
TRAP_TRACE.
(linux_nat_filter_event): In debug output, distinguish a
resume_stop SIGSTOP from a delayed SIGSTOP. Output debug logs
before trying to resume the lwp.
This commit moves the code to handle lwp_info.arch_private for
Linux x86 into a new shared file, nat/x86-linux.c.
gdb/ChangeLog:
* nat/x86-linux.h: New file.
* nat/x86-linux.c: Likewise.
* Makefile.in (HFILES_NO_SRCDIR): Add nat/x86-linux.h.
(x86-linux.o): New rule.
* config/i386/linux.mh (NATDEPFILES): Add x86-linux.o.
* config/i386/linux64.mh (NATDEPFILES): Likewise.
* nat/linux-nat.h (struct arch_lwp_info): New forward declaration.
(lwp_set_arch_private_info): New declaration.
(lwp_arch_private_info): Likewise.
* linux-nat.c (lwp_set_arch_private_info): New function.
(lwp_arch_private_info): Likewise.
* x86-linux-nat.c: Include nat/x86-linux.h.
(arch_lwp_info): Removed structure.
(update_debug_registers_callback):
Use lwp_set_debug_registers_changed.
(x86_linux_prepare_to_resume): Use lwp_debug_registers_changed
and lwp_set_debug_registers_changed.
(x86_linux_new_thread): Use lwp_set_debug_registers_changed.
gdb/gdbserver/ChangeLog:
* Makefile.in (x86-linux.o): New rule.
* configure.srv: Add x86-linux.o to relevant targets.
* linux-low.c (lwp_set_arch_private_info): New function.
(lwp_arch_private_info): Likewise.
* linux-x86-low.c: Include nat/x86-linux.h.
(arch_lwp_info): Removed structure.
(update_debug_registers_callback):
Use lwp_set_debug_registers_changed.
(x86_linux_prepare_to_resume): Use lwp_debug_registers_changed
and lwp_set_debug_registers_changed.
(x86_linux_new_thread): Use lwp_set_debug_registers_changed.
This commit introduces three accessors that shared Linux code can
use to access fields of struct lwp_info. The GDB and gdbserver
Linux x86 code is modified to use them.
gdb/ChangeLog:
* nat/linux-nat.h (ptid_of_lwp): New declaration.
(lwp_is_stopped): Likewise.
(lwp_stop_reason): Likewise.
* linux-nat.c (ptid_of_lwp): New function.
(lwp_is_stopped): Likewise.
(lwp_is_stopped_by_watchpoint): Likewise.
* x86-linux-nat.c (update_debug_registers_callback):
Use lwp_is_stopped.
(x86_linux_prepare_to_resume): Use ptid_of_lwp and
lwp_stop_reason.
gdb/gdbserver/ChangeLog:
* linux-low.c (ptid_of_lwp): New function.
(lwp_is_stopped): Likewise.
(lwp_stop_reason): Likewise.
* linux-x86-low.c (update_debug_registers_callback):
Use lwp_is_stopped.
(x86_linux_prepare_to_resume): Use ptid_of_lwp and
lwp_stop_reason.
This commit introduces a new function, iterate_over_lwps, that
shared Linux code can use to call a function for each LWP that
matches certain criteria. This function already existed in GDB
and was in use by GDB's various low-level Linux x86 debug register
setters. An equivalent was written for gdbserver and gdbserver's
low-level Linux x86 debug register setters were modified to use
it.
gdb/ChangeLog:
* linux-nat.h: Include nat/linux-nat.h.
(iterate_over_lwps): Move declaration to nat/linux-nat.h.
* nat/linux-nat.h (struct lwp_info): New forward declaration.
(iterate_over_lwps_ftype): New typedef.
(iterate_over_lwps): New declaration.
* linux-nat.h (iterate_over_lwps): Update comment. Use
iterate_over_lwps_ftype. Update callback return value check.
gdb/gdbserver/ChangeLog:
* linux-low.h: Include nat/linux-nat.h.
* linux-low.c (iterate_over_lwps_args): New structure.
(iterate_over_lwps_filter): New function.
(iterate_over_lwps): Likewise.
* linux-x86-low.c (update_debug_registers_callback):
Update signature to what iterate_over_lwps expects.
Remove PID check that iterate_over_lwps now performs.
(x86_dr_low_set_addr): Use iterate_over_lwps.
(x86_dr_low_set_control): Likewise.
This commit introduces a new function, current_lwp_ptid, that
shared Linux code can use to obtain the ptid of the current
lightweight process.
gdb/ChangeLog:
* nat/linux-nat.h (current_lwp_ptid): New declaration.
* linux-nat.c (current_lwp_ptid): New function.
* x86-linux-nat.c: Include nat/linux-nat.h.
(x86_linux_dr_get_addr): Use current_lwp_ptid.
(x86_linux_dr_get_control): Likewise.
(x86_linux_dr_get_status): Likewise.
(x86_linux_dr_set_control): Likewise.
(x86_linux_dr_set_addr): Likewise.
gdb/gdbserver/ChangeLog:
* linux-low.c (current_lwp_ptid): New function.
* linux-x86-low.c: Include nat/linux-nat.h.
(x86_dr_low_get_addr): Use current_lwp_ptid.
(x86_dr_low_get_control): Likewise.
(x86_dr_low_get_status): Likewise.
On GNU/Linux, this test sometimes FAILs like this:
(gdb) run
Starting program: /home/pedro/gdb/mygit/build/gdb/testsuite/gdb.threads/killed
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
ptrace: No such process.
(gdb)
Program terminated with signal SIGKILL, Killed.
The program no longer exists.
FAIL: gdb.threads/killed.exp: run program to completion (timeout)
Note the suspicious "No such process" line (that's errno==ESRCH).
Adding debug output we see:
linux_nat_wait: [process -1], [TARGET_WNOHANG]
LLW: enter
LNW: waitpid(-1, ...) returned 18465, ERRNO-OK
LLW: waitpid 18465 received Stopped (signal) (stopped)
LNW: waitpid(-1, ...) returned 18461, ERRNO-OK
LLW: waitpid 18461 received Trace/breakpoint trap (stopped)
LLW: Handling extended status 0x03057f
LHEW: Got clone event from LWP 18461, new child is LWP 18465
LNW: waitpid(-1, ...) returned 0, ERRNO-OK
RSRL: resuming stopped-resumed LWP LWP 18465 at 0x3b36af4b51: step=0
RSRL: resuming stopped-resumed LWP LWP 18461 at 0x3b36af4b51: step=0
sigchld
ptrace: No such process.
(gdb) linux_nat_wait: [process -1], [TARGET_WNOHANG]
LLW: enter
LNW: waitpid(-1, ...) returned 18465, ERRNO-OK
LLW: waitpid 18465 received Killed (terminated)
LLW: LWP 18465 exited.
LNW: waitpid(-1, ...) returned 18461, No child processes
LLW: waitpid 18461 received Killed (terminated)
Process 18461 exited
LNW: waitpid(-1, ...) returned -1, No child processes
LLW: exit
sigchld
infrun: target_wait (-1, status) =
infrun: 18461 [process 18461],
infrun: status->kind = signalled, signal = GDB_SIGNAL_KILL
infrun: TARGET_WAITKIND_SIGNALLED
Program terminated with signal SIGKILL, Killed.
The program no longer exists.
infrun: stop_waiting
FAIL: gdb.threads/killed.exp: run program to completion (timeout)
The issue is that here:
RSRL: resuming stopped-resumed LWP LWP 18465 at 0x3b36af4b51: step=0
RSRL: resuming stopped-resumed LWP LWP 18461 at 0x3b36af4b51: step=0
The first line shows we had just resumed LWP 18465, which does:
void *
child_func (void *dummy)
{
kill (pid, SIGKILL);
exit (1);
}
So if the kernel manages to schedule that thread fast enough, the
process may be killed before GDB has a chance to resume LWP 18461.
GDBserver has code at the tail end of linux_resume_one_lwp to cope
with this:
~~~
ptrace (step ? PTRACE_SINGLESTEP : PTRACE_CONT, lwpid_of (thread),
(PTRACE_TYPE_ARG3) 0,
/* Coerce to a uintptr_t first to avoid potential gcc warning
of coercing an 8 byte integer to a 4 byte pointer. */
(PTRACE_TYPE_ARG4) (uintptr_t) signal);
current_thread = saved_thread;
if (errno)
{
/* ESRCH from ptrace either means that the thread was already
running (an error) or that it is gone (a race condition). If
it's gone, we will get a notification the next time we wait,
so we can ignore the error. We could differentiate these
two, but it's tricky without waiting; the thread still exists
as a zombie, so sending it signal 0 would succeed. So just
ignore ESRCH. */
if (errno == ESRCH)
return;
perror_with_name ("ptrace");
}
~~~
However, that's not a complete fix, because between starting to handle
the resume request and getting that PTRACE_CONTINUE, we run other
ptrace calls that can also fail with ESRCH, and that end up throwing
an error (with perror_with_name).
In the case above, I indeed sometimes see resume_stopped_resumed_lwps
fail in the registers read:
resume_stopped_resumed_lwps (struct lwp_info *lp, void *data)
{
...
CORE_ADDR pc = regcache_read_pc (regcache);
Or e.g., in 32-bit mode, i386_linux_resume has several calls that can
throw too.
Whether to ignore ptrace errors or not depends on context that is only
available somewhere up the call chain. So the fix is to let ptrace
errors throw as they do today, and wrap the resume request in a
TRY/CATCH that swallows it iff the lwp that we were trying to resume
is no longer ptrace-stopped.
gdb/gdbserver/ChangeLog:
2015-03-19 Pedro Alves <palves@redhat.com>
* linux-low.c (linux_resume_one_lwp): Rename to ...
(linux_resume_one_lwp_throw): ... this. Don't handle ESRCH here,
instead call perror_with_name.
(check_ptrace_stopped_lwp_gone): New function.
(linux_resume_one_lwp): Reimplement as wrapper around
linux_resume_one_lwp_throw that swallows errors if the LWP is
gone.
gdb/ChangeLog:
2015-03-19 Pedro Alves <palves@redhat.com>
* linux-nat.c (linux_resume_one_lwp): Rename to ...
(linux_resume_one_lwp_throw): ... this. Don't handle ESRCH here,
instead call perror_with_name.
(check_ptrace_stopped_lwp_gone): New function.
(linux_resume_one_lwp): Reimplement as wrapper around
linux_resume_one_lwp_throw that swallows errors if the LWP is
gone.
(resume_stopped_resumed_lwps): Try register reads in TRY/CATCH and
swallows errors if the LWP is gone. Use
linux_resume_one_lwp_throw instead of linux_resume_one_lwp.
Wanting to make sure the new continue-pending-status.exp test tests
both cases of threads 2 and 3 reporting an event, I added counters to
the test, to make it FAIL if events for both threads aren't seen.
Assuming a well behaved backend, and given a reasonable number of
iterations, it should PASS.
However, running that against GNU/Linux gdbserver, I found that
surprisingly, that FAILed. GDBserver always reported the breakpoint
hit for the same thread.
Turns out that I broke gdbserver's thread event randomization
recently, with git commit 582511be ([gdbserver] linux-low.c: better
starvation avoidance, handle non-stop mode too). In that commit I
missed that the thread structure also has a status_pending_p field...
The end result was that count_events_callback always returns 0, and
then if no thread is stepping, select_event_lwp always returns the
event thread. IOW, no randomization is happening at all. Quite
curious how all the other changes in that patch were sufficient to fix
non-stop-fair-events.exp anyway even with that broken.
Tested on x86_64 Fedora 20, native and gdbserver.
gdb/gdbserver/ChangeLog:
2015-03-19 Pedro Alves <palves@redhat.com>
* linux-low.c (count_events_callback, select_event_lwp_callback):
Use the lwp's status_pending_p field, not the thread's.
gdb/testsuite/ChangeLog:
2015-03-19 Pedro Alves <palves@redhat.com>
* gdb.threads/continue-pending-status.exp (saw_thread_2)
(saw_thread_3): New globals.
(top level): Increment them when an event for the corresponding
thread is seen.
(no thread starvation): New test.
If the linux_nat_resume's short-circuits the resume because the
current thread has a pending status, and, a thread with a higher
number was previously stopped for a breakpoint, GDB internal errors,
like:
/home/pedro/gdb/mygit/src/gdb/linux-nat.c:2590: internal-error: status_callback: Assertion `lp->status != 0' failed.
Fix this by make status_callback bail out earlier. GDBserver is
already doing the same.
New test added that exercises this.
gdb/ChangeLog:
2015-03-19 Pedro Alves <palves@redhat.com>
* linux-nat.c (status_callback): Return early if the LWP has no
status pending.
gdb/testsuite/ChangeLog:
2015-03-19 Pedro Alves <palves@redhat.com>
* gdb.threads/continue-pending-status.c: New file.
* gdb.threads/continue-pending-status.exp: New file.