binutils-gdb/gdb/testsuite/gdb.base/gdb-sigterm.exp

95 lines
2.7 KiB
Plaintext
Raw Normal View History

# This testcase is part of GDB, the GNU debugger.
#
# Copyright 2013-2018 Free Software Foundation, Inc.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
standard_testfile
gdb.base/gdb-sigterm.exp: Fix spurious FAILs The buildbot shows that some machines FAIL this test frequently. E.g.: https://sourceware.org/ml/gdb-testers/2015-q1/msg00997.html If I stress my machine, I can sometimes see it fail too. Bumping the 200 limit and tweaking the test to show the step count, I get: ... PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 12 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 8 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 13 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 7 times --> FAIL: gdb.base/gdb-sigterm.exp: SIGTERM stepped 228 times <-- PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 11 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 13 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 12 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 8 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 9 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 7 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 11 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 8 times ... Thinking that this might be a problem of SIGTERM reaching GDB, but then the event loop taking too long to handle it, I hacked GDB to print a debug log whenever the SIGTERM handler was called, and, whenever the event loop finally calls the async SIGTERM handler. Here's what I see: infrun: 30011 [Thread 30011], infrun: status->kind = stopped, signal = GDB_SIGNAL_TRAP infrun: TARGET_WAITKIND_STOPPED infrun: stop_pc = 0x4005de --> infrun: got SIGTERM <-- infrun: stepping inside range [0x4005de-0x4005e0] infrun: resume (step=1, signal=GDB_SIGNAL_0), ... infrun: prepare_to_wait --> infrun: handling async SIGTERM <-- Cannot execute this command while the target is running. Use the "interrupt" command to stop the target and then try again. gdb.base/gdb-sigterm.exp: expect eof #27 FAIL: gdb.base/gdb-sigterm.exp: SIGTERM stepped 228 times So, no delay on the GDB side. It just happens that occasionally it takes more than 200 single-steps before SIGTERM even reaches GDB. This just looks like a kernel/scheduling issue --- some extra usage spike in the system (e.g., an I/O spike) might cause it for me. For the build slaves, I'm guessing they're frequently busy enough to trip on this often. Particularly more so now that we're having them run tests in parallel mode. The fix is to detect failure by timeout instead of counting single steps. This should be more reliable. Indeed for me, after this commit, I couldn't trigger a FAIL anymore, even after letting the test run for an hour. By timeout is also nicer in that a board file for a slow host/target can increase it (like, e.g., an embedded GNU/Linux board). Tested on x86_64 Fedora 20, native, gdbserver, and extended-remote gdbserver. gdb/testsuite/ 2015-02-06 Pedro Alves <palves@redhat.com> * gdb.base/gdb-sigterm.c (main): Use the TIMEOUT define to determine how many seconds to pass to 'alarm'. * gdb.base/gdb-sigterm.exp (top level): Build program with -DTIMEOUT=$timeout. (do_test): Return success/failure indication. Add more verbose logging. Don't fail if 200 single steps are seen. Instead, fail when the test times out. (passes): New global. (top level): Break the testing loop if testing fails on any iteration. Use gdb_assert.
2015-02-06 11:09:42 +01:00
# The test program exits after a while, in case GDB crashes. Make it
# wait at least as long as we may wait before declaring a time out
# failure.
set options { "additional_flags=-DTIMEOUT=$timeout" debug }
if { [build_executable ${testfile}.exp ${testfile} $srcfile $options] == -1 } {
return -1
}
gdb.base/gdb-sigterm.exp: Fix spurious FAILs The buildbot shows that some machines FAIL this test frequently. E.g.: https://sourceware.org/ml/gdb-testers/2015-q1/msg00997.html If I stress my machine, I can sometimes see it fail too. Bumping the 200 limit and tweaking the test to show the step count, I get: ... PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 12 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 8 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 13 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 7 times --> FAIL: gdb.base/gdb-sigterm.exp: SIGTERM stepped 228 times <-- PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 11 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 13 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 12 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 8 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 9 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 7 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 11 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 8 times ... Thinking that this might be a problem of SIGTERM reaching GDB, but then the event loop taking too long to handle it, I hacked GDB to print a debug log whenever the SIGTERM handler was called, and, whenever the event loop finally calls the async SIGTERM handler. Here's what I see: infrun: 30011 [Thread 30011], infrun: status->kind = stopped, signal = GDB_SIGNAL_TRAP infrun: TARGET_WAITKIND_STOPPED infrun: stop_pc = 0x4005de --> infrun: got SIGTERM <-- infrun: stepping inside range [0x4005de-0x4005e0] infrun: resume (step=1, signal=GDB_SIGNAL_0), ... infrun: prepare_to_wait --> infrun: handling async SIGTERM <-- Cannot execute this command while the target is running. Use the "interrupt" command to stop the target and then try again. gdb.base/gdb-sigterm.exp: expect eof #27 FAIL: gdb.base/gdb-sigterm.exp: SIGTERM stepped 228 times So, no delay on the GDB side. It just happens that occasionally it takes more than 200 single-steps before SIGTERM even reaches GDB. This just looks like a kernel/scheduling issue --- some extra usage spike in the system (e.g., an I/O spike) might cause it for me. For the build slaves, I'm guessing they're frequently busy enough to trip on this often. Particularly more so now that we're having them run tests in parallel mode. The fix is to detect failure by timeout instead of counting single steps. This should be more reliable. Indeed for me, after this commit, I couldn't trigger a FAIL anymore, even after letting the test run for an hour. By timeout is also nicer in that a board file for a slow host/target can increase it (like, e.g., an embedded GNU/Linux board). Tested on x86_64 Fedora 20, native, gdbserver, and extended-remote gdbserver. gdb/testsuite/ 2015-02-06 Pedro Alves <palves@redhat.com> * gdb.base/gdb-sigterm.c (main): Use the TIMEOUT define to determine how many seconds to pass to 'alarm'. * gdb.base/gdb-sigterm.exp (top level): Build program with -DTIMEOUT=$timeout. (do_test): Return success/failure indication. Add more verbose logging. Don't fail if 200 single steps are seen. Instead, fail when the test times out. (passes): New global. (top level): Break the testing loop if testing fails on any iteration. Use gdb_assert.
2015-02-06 11:09:42 +01:00
# Return 0 on success, non-zero otherwise.
proc do_test { pass } {
global testfile gdb_prompt binfile pf_prefix
if ![runto_main] {
return -1
}
gdb_breakpoint "${testfile}.c:[gdb_get_line_number "loop-line" ${testfile}.c]" \
temporary
# gdb_continue_to_breakpoint would print a pass message.
gdb_test "continue" "Temporary breakpoint .* loop-line .*" ""
gdb_test_no_output "set range-stepping off" ""
gdb_test_no_output "set debug infrun 1" ""
set test "run a bit #$pass"
set abort 1
gdb_test_multiple "step" $test {
-re "infrun: stepping inside range" {
# Suppress pass $test
gdb.base/gdb-sigterm.exp: Fix spurious FAILs The buildbot shows that some machines FAIL this test frequently. E.g.: https://sourceware.org/ml/gdb-testers/2015-q1/msg00997.html If I stress my machine, I can sometimes see it fail too. Bumping the 200 limit and tweaking the test to show the step count, I get: ... PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 12 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 8 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 13 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 7 times --> FAIL: gdb.base/gdb-sigterm.exp: SIGTERM stepped 228 times <-- PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 11 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 13 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 12 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 8 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 9 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 7 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 11 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 8 times ... Thinking that this might be a problem of SIGTERM reaching GDB, but then the event loop taking too long to handle it, I hacked GDB to print a debug log whenever the SIGTERM handler was called, and, whenever the event loop finally calls the async SIGTERM handler. Here's what I see: infrun: 30011 [Thread 30011], infrun: status->kind = stopped, signal = GDB_SIGNAL_TRAP infrun: TARGET_WAITKIND_STOPPED infrun: stop_pc = 0x4005de --> infrun: got SIGTERM <-- infrun: stepping inside range [0x4005de-0x4005e0] infrun: resume (step=1, signal=GDB_SIGNAL_0), ... infrun: prepare_to_wait --> infrun: handling async SIGTERM <-- Cannot execute this command while the target is running. Use the "interrupt" command to stop the target and then try again. gdb.base/gdb-sigterm.exp: expect eof #27 FAIL: gdb.base/gdb-sigterm.exp: SIGTERM stepped 228 times So, no delay on the GDB side. It just happens that occasionally it takes more than 200 single-steps before SIGTERM even reaches GDB. This just looks like a kernel/scheduling issue --- some extra usage spike in the system (e.g., an I/O spike) might cause it for me. For the build slaves, I'm guessing they're frequently busy enough to trip on this often. Particularly more so now that we're having them run tests in parallel mode. The fix is to detect failure by timeout instead of counting single steps. This should be more reliable. Indeed for me, after this commit, I couldn't trigger a FAIL anymore, even after letting the test run for an hour. By timeout is also nicer in that a board file for a slow host/target can increase it (like, e.g., an embedded GNU/Linux board). Tested on x86_64 Fedora 20, native, gdbserver, and extended-remote gdbserver. gdb/testsuite/ 2015-02-06 Pedro Alves <palves@redhat.com> * gdb.base/gdb-sigterm.c (main): Use the TIMEOUT define to determine how many seconds to pass to 'alarm'. * gdb.base/gdb-sigterm.exp (top level): Build program with -DTIMEOUT=$timeout. (do_test): Return success/failure indication. Add more verbose logging. Don't fail if 200 single steps are seen. Instead, fail when the test times out. (passes): New global. (top level): Break the testing loop if testing fails on any iteration. Use gdb_assert.
2015-02-06 11:09:42 +01:00
verbose -log "$pf_prefix $test: ran"
set abort 0
}
}
if $abort {
gdb.base/gdb-sigterm.exp: Fix spurious FAILs The buildbot shows that some machines FAIL this test frequently. E.g.: https://sourceware.org/ml/gdb-testers/2015-q1/msg00997.html If I stress my machine, I can sometimes see it fail too. Bumping the 200 limit and tweaking the test to show the step count, I get: ... PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 12 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 8 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 13 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 7 times --> FAIL: gdb.base/gdb-sigterm.exp: SIGTERM stepped 228 times <-- PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 11 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 13 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 12 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 8 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 9 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 7 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 11 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 8 times ... Thinking that this might be a problem of SIGTERM reaching GDB, but then the event loop taking too long to handle it, I hacked GDB to print a debug log whenever the SIGTERM handler was called, and, whenever the event loop finally calls the async SIGTERM handler. Here's what I see: infrun: 30011 [Thread 30011], infrun: status->kind = stopped, signal = GDB_SIGNAL_TRAP infrun: TARGET_WAITKIND_STOPPED infrun: stop_pc = 0x4005de --> infrun: got SIGTERM <-- infrun: stepping inside range [0x4005de-0x4005e0] infrun: resume (step=1, signal=GDB_SIGNAL_0), ... infrun: prepare_to_wait --> infrun: handling async SIGTERM <-- Cannot execute this command while the target is running. Use the "interrupt" command to stop the target and then try again. gdb.base/gdb-sigterm.exp: expect eof #27 FAIL: gdb.base/gdb-sigterm.exp: SIGTERM stepped 228 times So, no delay on the GDB side. It just happens that occasionally it takes more than 200 single-steps before SIGTERM even reaches GDB. This just looks like a kernel/scheduling issue --- some extra usage spike in the system (e.g., an I/O spike) might cause it for me. For the build slaves, I'm guessing they're frequently busy enough to trip on this often. Particularly more so now that we're having them run tests in parallel mode. The fix is to detect failure by timeout instead of counting single steps. This should be more reliable. Indeed for me, after this commit, I couldn't trigger a FAIL anymore, even after letting the test run for an hour. By timeout is also nicer in that a board file for a slow host/target can increase it (like, e.g., an embedded GNU/Linux board). Tested on x86_64 Fedora 20, native, gdbserver, and extended-remote gdbserver. gdb/testsuite/ 2015-02-06 Pedro Alves <palves@redhat.com> * gdb.base/gdb-sigterm.c (main): Use the TIMEOUT define to determine how many seconds to pass to 'alarm'. * gdb.base/gdb-sigterm.exp (top level): Build program with -DTIMEOUT=$timeout. (do_test): Return success/failure indication. Add more verbose logging. Don't fail if 200 single steps are seen. Instead, fail when the test times out. (passes): New global. (top level): Break the testing loop if testing fails on any iteration. Use gdb_assert.
2015-02-06 11:09:42 +01:00
verbose -log "$pf_prefix $test: did not run"
return $abort
}
set gdb_pid [exp_pid -i [board_info host fileid]]
remote_exec host "kill -TERM ${gdb_pid}"
set test "expect eof #$pass"
set abort 1
set stepping 0
gdb.base/gdb-sigterm.exp: Fix spurious FAILs The buildbot shows that some machines FAIL this test frequently. E.g.: https://sourceware.org/ml/gdb-testers/2015-q1/msg00997.html If I stress my machine, I can sometimes see it fail too. Bumping the 200 limit and tweaking the test to show the step count, I get: ... PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 12 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 8 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 13 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 7 times --> FAIL: gdb.base/gdb-sigterm.exp: SIGTERM stepped 228 times <-- PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 11 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 13 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 12 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 8 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 9 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 7 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 11 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 8 times ... Thinking that this might be a problem of SIGTERM reaching GDB, but then the event loop taking too long to handle it, I hacked GDB to print a debug log whenever the SIGTERM handler was called, and, whenever the event loop finally calls the async SIGTERM handler. Here's what I see: infrun: 30011 [Thread 30011], infrun: status->kind = stopped, signal = GDB_SIGNAL_TRAP infrun: TARGET_WAITKIND_STOPPED infrun: stop_pc = 0x4005de --> infrun: got SIGTERM <-- infrun: stepping inside range [0x4005de-0x4005e0] infrun: resume (step=1, signal=GDB_SIGNAL_0), ... infrun: prepare_to_wait --> infrun: handling async SIGTERM <-- Cannot execute this command while the target is running. Use the "interrupt" command to stop the target and then try again. gdb.base/gdb-sigterm.exp: expect eof #27 FAIL: gdb.base/gdb-sigterm.exp: SIGTERM stepped 228 times So, no delay on the GDB side. It just happens that occasionally it takes more than 200 single-steps before SIGTERM even reaches GDB. This just looks like a kernel/scheduling issue --- some extra usage spike in the system (e.g., an I/O spike) might cause it for me. For the build slaves, I'm guessing they're frequently busy enough to trip on this often. Particularly more so now that we're having them run tests in parallel mode. The fix is to detect failure by timeout instead of counting single steps. This should be more reliable. Indeed for me, after this commit, I couldn't trigger a FAIL anymore, even after letting the test run for an hour. By timeout is also nicer in that a board file for a slow host/target can increase it (like, e.g., an embedded GNU/Linux board). Tested on x86_64 Fedora 20, native, gdbserver, and extended-remote gdbserver. gdb/testsuite/ 2015-02-06 Pedro Alves <palves@redhat.com> * gdb.base/gdb-sigterm.c (main): Use the TIMEOUT define to determine how many seconds to pass to 'alarm'. * gdb.base/gdb-sigterm.exp (top level): Build program with -DTIMEOUT=$timeout. (do_test): Return success/failure indication. Add more verbose logging. Don't fail if 200 single steps are seen. Instead, fail when the test times out. (passes): New global. (top level): Break the testing loop if testing fails on any iteration. Use gdb_assert.
2015-02-06 11:09:42 +01:00
# If GDB mishandles the SIGTERM and doesn't exit, this should FAIL
# with timeout. We don't expect a GDB prompt, so we see one,
# we'll FAIL too.
gdb_test_multiple "" $test {
eof {
gdb.base/gdb-sigterm.exp: Fix spurious FAILs The buildbot shows that some machines FAIL this test frequently. E.g.: https://sourceware.org/ml/gdb-testers/2015-q1/msg00997.html If I stress my machine, I can sometimes see it fail too. Bumping the 200 limit and tweaking the test to show the step count, I get: ... PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 12 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 8 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 13 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 7 times --> FAIL: gdb.base/gdb-sigterm.exp: SIGTERM stepped 228 times <-- PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 11 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 13 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 12 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 8 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 9 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 7 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 11 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 8 times ... Thinking that this might be a problem of SIGTERM reaching GDB, but then the event loop taking too long to handle it, I hacked GDB to print a debug log whenever the SIGTERM handler was called, and, whenever the event loop finally calls the async SIGTERM handler. Here's what I see: infrun: 30011 [Thread 30011], infrun: status->kind = stopped, signal = GDB_SIGNAL_TRAP infrun: TARGET_WAITKIND_STOPPED infrun: stop_pc = 0x4005de --> infrun: got SIGTERM <-- infrun: stepping inside range [0x4005de-0x4005e0] infrun: resume (step=1, signal=GDB_SIGNAL_0), ... infrun: prepare_to_wait --> infrun: handling async SIGTERM <-- Cannot execute this command while the target is running. Use the "interrupt" command to stop the target and then try again. gdb.base/gdb-sigterm.exp: expect eof #27 FAIL: gdb.base/gdb-sigterm.exp: SIGTERM stepped 228 times So, no delay on the GDB side. It just happens that occasionally it takes more than 200 single-steps before SIGTERM even reaches GDB. This just looks like a kernel/scheduling issue --- some extra usage spike in the system (e.g., an I/O spike) might cause it for me. For the build slaves, I'm guessing they're frequently busy enough to trip on this often. Particularly more so now that we're having them run tests in parallel mode. The fix is to detect failure by timeout instead of counting single steps. This should be more reliable. Indeed for me, after this commit, I couldn't trigger a FAIL anymore, even after letting the test run for an hour. By timeout is also nicer in that a board file for a slow host/target can increase it (like, e.g., an embedded GNU/Linux board). Tested on x86_64 Fedora 20, native, gdbserver, and extended-remote gdbserver. gdb/testsuite/ 2015-02-06 Pedro Alves <palves@redhat.com> * gdb.base/gdb-sigterm.c (main): Use the TIMEOUT define to determine how many seconds to pass to 'alarm'. * gdb.base/gdb-sigterm.exp (top level): Build program with -DTIMEOUT=$timeout. (do_test): Return success/failure indication. Add more verbose logging. Don't fail if 200 single steps are seen. Instead, fail when the test times out. (passes): New global. (top level): Break the testing loop if testing fails on any iteration. Use gdb_assert.
2015-02-06 11:09:42 +01:00
verbose -log "$pf_prefix $test: got eof"
set abort 0
}
-re "infrun: stepping inside range" {
incr stepping
gdb.base/gdb-sigterm.exp: Fix spurious FAILs The buildbot shows that some machines FAIL this test frequently. E.g.: https://sourceware.org/ml/gdb-testers/2015-q1/msg00997.html If I stress my machine, I can sometimes see it fail too. Bumping the 200 limit and tweaking the test to show the step count, I get: ... PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 12 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 8 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 13 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 7 times --> FAIL: gdb.base/gdb-sigterm.exp: SIGTERM stepped 228 times <-- PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 11 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 13 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 12 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 8 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 9 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 7 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 11 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 8 times ... Thinking that this might be a problem of SIGTERM reaching GDB, but then the event loop taking too long to handle it, I hacked GDB to print a debug log whenever the SIGTERM handler was called, and, whenever the event loop finally calls the async SIGTERM handler. Here's what I see: infrun: 30011 [Thread 30011], infrun: status->kind = stopped, signal = GDB_SIGNAL_TRAP infrun: TARGET_WAITKIND_STOPPED infrun: stop_pc = 0x4005de --> infrun: got SIGTERM <-- infrun: stepping inside range [0x4005de-0x4005e0] infrun: resume (step=1, signal=GDB_SIGNAL_0), ... infrun: prepare_to_wait --> infrun: handling async SIGTERM <-- Cannot execute this command while the target is running. Use the "interrupt" command to stop the target and then try again. gdb.base/gdb-sigterm.exp: expect eof #27 FAIL: gdb.base/gdb-sigterm.exp: SIGTERM stepped 228 times So, no delay on the GDB side. It just happens that occasionally it takes more than 200 single-steps before SIGTERM even reaches GDB. This just looks like a kernel/scheduling issue --- some extra usage spike in the system (e.g., an I/O spike) might cause it for me. For the build slaves, I'm guessing they're frequently busy enough to trip on this often. Particularly more so now that we're having them run tests in parallel mode. The fix is to detect failure by timeout instead of counting single steps. This should be more reliable. Indeed for me, after this commit, I couldn't trigger a FAIL anymore, even after letting the test run for an hour. By timeout is also nicer in that a board file for a slow host/target can increase it (like, e.g., an embedded GNU/Linux board). Tested on x86_64 Fedora 20, native, gdbserver, and extended-remote gdbserver. gdb/testsuite/ 2015-02-06 Pedro Alves <palves@redhat.com> * gdb.base/gdb-sigterm.c (main): Use the TIMEOUT define to determine how many seconds to pass to 'alarm'. * gdb.base/gdb-sigterm.exp (top level): Build program with -DTIMEOUT=$timeout. (do_test): Return success/failure indication. Add more verbose logging. Don't fail if 200 single steps are seen. Instead, fail when the test times out. (passes): New global. (top level): Break the testing loop if testing fails on any iteration. Use gdb_assert.
2015-02-06 11:09:42 +01:00
exp_continue
}
}
gdb.base/gdb-sigterm.exp: Fix spurious FAILs The buildbot shows that some machines FAIL this test frequently. E.g.: https://sourceware.org/ml/gdb-testers/2015-q1/msg00997.html If I stress my machine, I can sometimes see it fail too. Bumping the 200 limit and tweaking the test to show the step count, I get: ... PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 12 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 8 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 13 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 7 times --> FAIL: gdb.base/gdb-sigterm.exp: SIGTERM stepped 228 times <-- PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 11 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 13 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 12 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 8 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 9 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 7 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 11 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 8 times ... Thinking that this might be a problem of SIGTERM reaching GDB, but then the event loop taking too long to handle it, I hacked GDB to print a debug log whenever the SIGTERM handler was called, and, whenever the event loop finally calls the async SIGTERM handler. Here's what I see: infrun: 30011 [Thread 30011], infrun: status->kind = stopped, signal = GDB_SIGNAL_TRAP infrun: TARGET_WAITKIND_STOPPED infrun: stop_pc = 0x4005de --> infrun: got SIGTERM <-- infrun: stepping inside range [0x4005de-0x4005e0] infrun: resume (step=1, signal=GDB_SIGNAL_0), ... infrun: prepare_to_wait --> infrun: handling async SIGTERM <-- Cannot execute this command while the target is running. Use the "interrupt" command to stop the target and then try again. gdb.base/gdb-sigterm.exp: expect eof #27 FAIL: gdb.base/gdb-sigterm.exp: SIGTERM stepped 228 times So, no delay on the GDB side. It just happens that occasionally it takes more than 200 single-steps before SIGTERM even reaches GDB. This just looks like a kernel/scheduling issue --- some extra usage spike in the system (e.g., an I/O spike) might cause it for me. For the build slaves, I'm guessing they're frequently busy enough to trip on this often. Particularly more so now that we're having them run tests in parallel mode. The fix is to detect failure by timeout instead of counting single steps. This should be more reliable. Indeed for me, after this commit, I couldn't trigger a FAIL anymore, even after letting the test run for an hour. By timeout is also nicer in that a board file for a slow host/target can increase it (like, e.g., an embedded GNU/Linux board). Tested on x86_64 Fedora 20, native, gdbserver, and extended-remote gdbserver. gdb/testsuite/ 2015-02-06 Pedro Alves <palves@redhat.com> * gdb.base/gdb-sigterm.c (main): Use the TIMEOUT define to determine how many seconds to pass to 'alarm'. * gdb.base/gdb-sigterm.exp (top level): Build program with -DTIMEOUT=$timeout. (do_test): Return success/failure indication. Add more verbose logging. Don't fail if 200 single steps are seen. Instead, fail when the test times out. (passes): New global. (top level): Break the testing loop if testing fails on any iteration. Use gdb_assert.
2015-02-06 11:09:42 +01:00
verbose -log "$pf_prefix $test: stepped $stepping times"
return $abort
}
# Testcase was FAILing approx. on 10th pass with unpatched GDB.
# 50 runs should be approx. a safe number to be sure it is fixed now.
gdb.base/gdb-sigterm.exp: Fix spurious FAILs The buildbot shows that some machines FAIL this test frequently. E.g.: https://sourceware.org/ml/gdb-testers/2015-q1/msg00997.html If I stress my machine, I can sometimes see it fail too. Bumping the 200 limit and tweaking the test to show the step count, I get: ... PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 12 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 8 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 13 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 7 times --> FAIL: gdb.base/gdb-sigterm.exp: SIGTERM stepped 228 times <-- PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 11 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 13 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 12 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 8 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 9 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 7 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 11 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 8 times ... Thinking that this might be a problem of SIGTERM reaching GDB, but then the event loop taking too long to handle it, I hacked GDB to print a debug log whenever the SIGTERM handler was called, and, whenever the event loop finally calls the async SIGTERM handler. Here's what I see: infrun: 30011 [Thread 30011], infrun: status->kind = stopped, signal = GDB_SIGNAL_TRAP infrun: TARGET_WAITKIND_STOPPED infrun: stop_pc = 0x4005de --> infrun: got SIGTERM <-- infrun: stepping inside range [0x4005de-0x4005e0] infrun: resume (step=1, signal=GDB_SIGNAL_0), ... infrun: prepare_to_wait --> infrun: handling async SIGTERM <-- Cannot execute this command while the target is running. Use the "interrupt" command to stop the target and then try again. gdb.base/gdb-sigterm.exp: expect eof #27 FAIL: gdb.base/gdb-sigterm.exp: SIGTERM stepped 228 times So, no delay on the GDB side. It just happens that occasionally it takes more than 200 single-steps before SIGTERM even reaches GDB. This just looks like a kernel/scheduling issue --- some extra usage spike in the system (e.g., an I/O spike) might cause it for me. For the build slaves, I'm guessing they're frequently busy enough to trip on this often. Particularly more so now that we're having them run tests in parallel mode. The fix is to detect failure by timeout instead of counting single steps. This should be more reliable. Indeed for me, after this commit, I couldn't trigger a FAIL anymore, even after letting the test run for an hour. By timeout is also nicer in that a board file for a slow host/target can increase it (like, e.g., an embedded GNU/Linux board). Tested on x86_64 Fedora 20, native, gdbserver, and extended-remote gdbserver. gdb/testsuite/ 2015-02-06 Pedro Alves <palves@redhat.com> * gdb.base/gdb-sigterm.c (main): Use the TIMEOUT define to determine how many seconds to pass to 'alarm'. * gdb.base/gdb-sigterm.exp (top level): Build program with -DTIMEOUT=$timeout. (do_test): Return success/failure indication. Add more verbose logging. Don't fail if 200 single steps are seen. Instead, fail when the test times out. (passes): New global. (top level): Break the testing loop if testing fails on any iteration. Use gdb_assert.
2015-02-06 11:09:42 +01:00
set passes 50
gdb.base/gdb-sigterm.exp: Fix spurious FAILs The buildbot shows that some machines FAIL this test frequently. E.g.: https://sourceware.org/ml/gdb-testers/2015-q1/msg00997.html If I stress my machine, I can sometimes see it fail too. Bumping the 200 limit and tweaking the test to show the step count, I get: ... PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 12 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 8 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 13 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 7 times --> FAIL: gdb.base/gdb-sigterm.exp: SIGTERM stepped 228 times <-- PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 11 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 13 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 12 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 8 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 9 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 7 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 11 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 8 times ... Thinking that this might be a problem of SIGTERM reaching GDB, but then the event loop taking too long to handle it, I hacked GDB to print a debug log whenever the SIGTERM handler was called, and, whenever the event loop finally calls the async SIGTERM handler. Here's what I see: infrun: 30011 [Thread 30011], infrun: status->kind = stopped, signal = GDB_SIGNAL_TRAP infrun: TARGET_WAITKIND_STOPPED infrun: stop_pc = 0x4005de --> infrun: got SIGTERM <-- infrun: stepping inside range [0x4005de-0x4005e0] infrun: resume (step=1, signal=GDB_SIGNAL_0), ... infrun: prepare_to_wait --> infrun: handling async SIGTERM <-- Cannot execute this command while the target is running. Use the "interrupt" command to stop the target and then try again. gdb.base/gdb-sigterm.exp: expect eof #27 FAIL: gdb.base/gdb-sigterm.exp: SIGTERM stepped 228 times So, no delay on the GDB side. It just happens that occasionally it takes more than 200 single-steps before SIGTERM even reaches GDB. This just looks like a kernel/scheduling issue --- some extra usage spike in the system (e.g., an I/O spike) might cause it for me. For the build slaves, I'm guessing they're frequently busy enough to trip on this often. Particularly more so now that we're having them run tests in parallel mode. The fix is to detect failure by timeout instead of counting single steps. This should be more reliable. Indeed for me, after this commit, I couldn't trigger a FAIL anymore, even after letting the test run for an hour. By timeout is also nicer in that a board file for a slow host/target can increase it (like, e.g., an embedded GNU/Linux board). Tested on x86_64 Fedora 20, native, gdbserver, and extended-remote gdbserver. gdb/testsuite/ 2015-02-06 Pedro Alves <palves@redhat.com> * gdb.base/gdb-sigterm.c (main): Use the TIMEOUT define to determine how many seconds to pass to 'alarm'. * gdb.base/gdb-sigterm.exp (top level): Build program with -DTIMEOUT=$timeout. (do_test): Return success/failure indication. Add more verbose logging. Don't fail if 200 single steps are seen. Instead, fail when the test times out. (passes): New global. (top level): Break the testing loop if testing fails on any iteration. Use gdb_assert.
2015-02-06 11:09:42 +01:00
for {set pass 0} {$pass < $passes} {incr pass} {
clean_restart ${testfile}
gdb.base/gdb-sigterm.exp: Fix spurious FAILs The buildbot shows that some machines FAIL this test frequently. E.g.: https://sourceware.org/ml/gdb-testers/2015-q1/msg00997.html If I stress my machine, I can sometimes see it fail too. Bumping the 200 limit and tweaking the test to show the step count, I get: ... PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 12 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 8 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 13 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 7 times --> FAIL: gdb.base/gdb-sigterm.exp: SIGTERM stepped 228 times <-- PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 11 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 13 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 12 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 8 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 9 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 7 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 11 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 8 times ... Thinking that this might be a problem of SIGTERM reaching GDB, but then the event loop taking too long to handle it, I hacked GDB to print a debug log whenever the SIGTERM handler was called, and, whenever the event loop finally calls the async SIGTERM handler. Here's what I see: infrun: 30011 [Thread 30011], infrun: status->kind = stopped, signal = GDB_SIGNAL_TRAP infrun: TARGET_WAITKIND_STOPPED infrun: stop_pc = 0x4005de --> infrun: got SIGTERM <-- infrun: stepping inside range [0x4005de-0x4005e0] infrun: resume (step=1, signal=GDB_SIGNAL_0), ... infrun: prepare_to_wait --> infrun: handling async SIGTERM <-- Cannot execute this command while the target is running. Use the "interrupt" command to stop the target and then try again. gdb.base/gdb-sigterm.exp: expect eof #27 FAIL: gdb.base/gdb-sigterm.exp: SIGTERM stepped 228 times So, no delay on the GDB side. It just happens that occasionally it takes more than 200 single-steps before SIGTERM even reaches GDB. This just looks like a kernel/scheduling issue --- some extra usage spike in the system (e.g., an I/O spike) might cause it for me. For the build slaves, I'm guessing they're frequently busy enough to trip on this often. Particularly more so now that we're having them run tests in parallel mode. The fix is to detect failure by timeout instead of counting single steps. This should be more reliable. Indeed for me, after this commit, I couldn't trigger a FAIL anymore, even after letting the test run for an hour. By timeout is also nicer in that a board file for a slow host/target can increase it (like, e.g., an embedded GNU/Linux board). Tested on x86_64 Fedora 20, native, gdbserver, and extended-remote gdbserver. gdb/testsuite/ 2015-02-06 Pedro Alves <palves@redhat.com> * gdb.base/gdb-sigterm.c (main): Use the TIMEOUT define to determine how many seconds to pass to 'alarm'. * gdb.base/gdb-sigterm.exp (top level): Build program with -DTIMEOUT=$timeout. (do_test): Return success/failure indication. Add more verbose logging. Don't fail if 200 single steps are seen. Instead, fail when the test times out. (passes): New global. (top level): Break the testing loop if testing fails on any iteration. Use gdb_assert.
2015-02-06 11:09:42 +01:00
if { [do_test $pass] != 0 } {
break
}
}
gdb.base/gdb-sigterm.exp: Fix spurious FAILs The buildbot shows that some machines FAIL this test frequently. E.g.: https://sourceware.org/ml/gdb-testers/2015-q1/msg00997.html If I stress my machine, I can sometimes see it fail too. Bumping the 200 limit and tweaking the test to show the step count, I get: ... PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 12 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 8 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 13 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 7 times --> FAIL: gdb.base/gdb-sigterm.exp: SIGTERM stepped 228 times <-- PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 11 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 13 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 12 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 8 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 9 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 7 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 11 times PASS: gdb.base/gdb-sigterm.exp: SIGTERM stepped 8 times ... Thinking that this might be a problem of SIGTERM reaching GDB, but then the event loop taking too long to handle it, I hacked GDB to print a debug log whenever the SIGTERM handler was called, and, whenever the event loop finally calls the async SIGTERM handler. Here's what I see: infrun: 30011 [Thread 30011], infrun: status->kind = stopped, signal = GDB_SIGNAL_TRAP infrun: TARGET_WAITKIND_STOPPED infrun: stop_pc = 0x4005de --> infrun: got SIGTERM <-- infrun: stepping inside range [0x4005de-0x4005e0] infrun: resume (step=1, signal=GDB_SIGNAL_0), ... infrun: prepare_to_wait --> infrun: handling async SIGTERM <-- Cannot execute this command while the target is running. Use the "interrupt" command to stop the target and then try again. gdb.base/gdb-sigterm.exp: expect eof #27 FAIL: gdb.base/gdb-sigterm.exp: SIGTERM stepped 228 times So, no delay on the GDB side. It just happens that occasionally it takes more than 200 single-steps before SIGTERM even reaches GDB. This just looks like a kernel/scheduling issue --- some extra usage spike in the system (e.g., an I/O spike) might cause it for me. For the build slaves, I'm guessing they're frequently busy enough to trip on this often. Particularly more so now that we're having them run tests in parallel mode. The fix is to detect failure by timeout instead of counting single steps. This should be more reliable. Indeed for me, after this commit, I couldn't trigger a FAIL anymore, even after letting the test run for an hour. By timeout is also nicer in that a board file for a slow host/target can increase it (like, e.g., an embedded GNU/Linux board). Tested on x86_64 Fedora 20, native, gdbserver, and extended-remote gdbserver. gdb/testsuite/ 2015-02-06 Pedro Alves <palves@redhat.com> * gdb.base/gdb-sigterm.c (main): Use the TIMEOUT define to determine how many seconds to pass to 'alarm'. * gdb.base/gdb-sigterm.exp (top level): Build program with -DTIMEOUT=$timeout. (do_test): Return success/failure indication. Add more verbose logging. Don't fail if 200 single steps are seen. Instead, fail when the test times out. (passes): New global. (top level): Break the testing loop if testing fails on any iteration. Use gdb_assert.
2015-02-06 11:09:42 +01:00
gdb_assert {$pass == $passes} "$passes SIGTERM passes"