This patch makes sure that the lifetimes of the CUDA_ONE_CALL macro (which is
defined twice in plugin-nvptx.c) are minimized, to make it obvious that the
definitions are used only in the cuda-lib.def include.
Built on x86_64 with nvptx accelerator and reg-tested libgomp.
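The define/undef pattern described above can be sketched as follows. This is
only an illustration: the CUDA_CALL_LIST macro is a hypothetical stand-in for
the .def include (so that the example fits in one file), and struct call_table
and init_call_table are invented names, not the plugin's real code.

```c
#include <assert.h>

/* Hypothetical stand-in for the .def include: one CUDA_ONE_CALL (name)
   entry per function.  The names are used purely as identifiers here.  */
#define CUDA_CALL_LIST \
  CUDA_ONE_CALL (cuInit) \
  CUDA_ONE_CALL (cuCtxCreate) \
  CUDA_ONE_CALL (cuMemAlloc)

/* First use: one struct member per entry.  The macro is defined right
   before the list is expanded and undefined right after.  */
struct call_table
{
#define CUDA_ONE_CALL(call) const char *call;
  CUDA_CALL_LIST
#undef CUDA_ONE_CALL
};

/* Second use: a different definition, again scoped to one expansion.  */
void
init_call_table (struct call_table *t)
{
  assert (t);
#define CUDA_ONE_CALL(call) t->call = #call;
  CUDA_CALL_LIST
#undef CUDA_ONE_CALL
}
```

Keeping each definition adjacent to its single use means that no stale
CUDA_ONE_CALL definition can leak into unrelated code further down the file.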
2018-08-07 Tom de Vries <tdevries@suse.de>
* plugin/plugin-nvptx.c (struct cuda_lib_s, init_cuda_lib): Put
CUDA_ONE_CALL defines right before the cuda-lib.def include, and the
corresponding undefs right after.
From-SVN: r263345
* tree-ssa-dom.c (dom_opt_dom_walker::optimize_stmt): Pass down
the vr_values instance to cprop_into_stmt.
(cprop_into_stmt): Pass vr_values instance down to cprop_operand.
(cprop_operand): Also query EVRP to determine if OP is a constant.
From-SVN: r263342
"make selftest-valgrind" shows:
187 bytes in 1 blocks are definitely lost in loss record 567 of 669
at 0x4A081D4: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
by 0x1F08260: xcalloc (xmalloc.c:162)
by 0xB24F32: init_emit() (emit-rtl.c:5843)
by 0xC10080: prepare_function_start() (function.c:4803)
by 0xC10254: init_function_start(tree_node*) (function.c:4877)
by 0x1CDF92A: selftest::test_expansion_to_rtl() (function-tests.c:595)
by 0x1CE007C: selftest::function_tests_c_tests() (function-tests.c:676)
by 0x1E010E7: selftest::run_tests() (selftest-run-tests.c:98)
by 0x1062D1E: toplev::run_self_tests() (toplev.c:2225)
by 0x1062F40: toplev::main(int, char**) (toplev.c:2303)
by 0x1E5B90A: main (main.c:39)
The allocation in question is:
crtl->emit.regno_pointer_align
= XCNEWVEC (unsigned char, crtl->emit.regno_pointer_align_length);
This patch fixes this leak (and makes the output of
"make selftest-valgrind" clean) by calling free_after_compilation at the
end of the selftest in question.
gcc/ChangeLog:
* function-tests.c (selftest::test_expansion_to_rtl): Call
free_after_compilation.
From-SVN: r263339
gcc/ChangeLog:
2018-08-06 Andreas Krebbel <krebbel@linux.ibm.com>
* config/s390/s390.c (s390_loop_unroll_adjust): Prevent small
loops with memory block operations from getting unrolled.
gcc/testsuite/ChangeLog:
2018-08-06 Andreas Krebbel <krebbel@linux.ibm.com>
* gcc.target/s390/nomemloopunroll-1.c: New test.
From-SVN: r263336
The SPU processor is not affected by speculation, so this macro can
safely be defined as speculation_safe_value_not_needed.
gcc/ChangeLog:
PR target/86807
* config/spu/spu.c (TARGET_HAVE_SPECULATION_SAFE_VALUE):
Define to speculation_safe_value_not_needed.
From-SVN: r263335
Ensure clobber high is a register expression.
Info is passed through for the error case.
gcc/
* emit-rtl.c (verify_rtx_sharing): Check for CLOBBER_HIGH.
(copy_insn_1): Likewise.
(gen_hard_reg_clobber_high): New gen function.
* genconfig.c (walk_insn_part): Check for CLOBBER_HIGH.
* genemit.c (gen_exp): Likewise.
(gen_emit_seq): Pass through info.
(gen_insn): Check for CLOBBER_HIGH.
(gen_expand): Pass through info.
(gen_split): Likewise.
(output_add_clobbers): Likewise.
* genrecog.c (validate_pattern): Check for CLOBBER_HIGH.
(remove_clobbers): Likewise.
* rtl.h (gen_hard_reg_clobber_high): New declaration.
From-SVN: r263327
Zlib is not a dependency of libbacktrace, and so it shouldn't be added
to LIBS.
libbacktrace/
* configure.ac: Move define of HAVE_ZLIB into check for -lz.
* Makefile.in: Regenerate.
* config.h.in: Likewise.
* configure: Likewise.
From-SVN: r263320
cfun->machine->max_used_stack_alignment is used to decide how the stack frame
should be aligned. This is independent of the psABI and of 32-bit vs 64-bit.
It is always safe to compute max_used_stack_alignment, but we compute it only
if a 128-bit aligned load/store may be generated on a misaligned stack slot,
which would lead to a segfault.
gcc/
PR target/86386
* config/i386/i386.c (ix86_finalize_stack_frame_flags): Set
cfun->machine->max_used_stack_alignment if needed.
gcc/testsuite/
PR target/86386
* gcc.target/i386/pr86386.c: New file.
From-SVN: r263317
Using libgomp configure option --with-cuda-driver=<dir> we can indicate what
cuda driver to use to build the libgomp nvptx plugin. Without such an option,
the system cuda driver is used, if available. If not available, a dlopen
interface is used instead.
However, when we use --without-cuda-driver (or the equivalent
--with-cuda-driver=no) the system cuda driver is still used if available.
This patch fixes that, making sure that --without-cuda-driver selects the dlopen
interface.
Built on x86_64 with nvptx accelerator and tested libgomp testsuite, with and
without option --without-cuda-driver.
2018-08-04 Tom de Vries <tdevries@suse.de>
* plugin/configfrag.ac: For --without-cuda-driver, set
CUDA_DRIVER_INCLUDE and CUDA_DRIVER_LIB to no. Handle
CUDA_DRIVER_INCLUDE == no and CUDA_DRIVER_LIB == no.
* configure: Regenerate.
From-SVN: r263310
gcc/cp/ChangeLog:
PR c++/85523
* decl.c: Include "gcc-rich-location.h".
(add_return_star_this_fixit): New function.
(finish_function): When warning about missing return statements in
functions returning non-void, add a "return *this;" fix-it hint for
assignment operators.
gcc/testsuite/ChangeLog:
PR c++/85523
* g++.dg/pr85523.C: New test.
Co-Authored-By: Jonathan Wakely <jwakely@redhat.com>
From-SVN: r263298
If a struct contains an anonymous union and both have a field with the
same name, detect_field_duplicates_hash() will replace one of them
with NULL. If compilation doesn't stop immediately, it may later call
lookup_field() on the union, which falsely assumes the union's
LANG_SPECIFIC array is sorted, and may loop indefinitely because of
this.
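The failure mode above can be illustrated with a small sketch. It assumes a
simplified table of field names; struct field_table, lookup_field_index and
the had_errors flag are hypothetical and do not mirror GCC's real data
structures, which compare identifier nodes rather than strings. The point is
only that the binary search is taken when the table can be trusted, and that a
linear scan tolerating NULL entries is used otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified field table; not GCC's representation.  */
struct field_table
{
  const char **names;  /* Entries may be NULL after a duplicate-field error.  */
  size_t count;
  int sorted;          /* Nonzero if NAMES was sorted when it was built.  */
};

/* Return the index of NAME in T, or -1 if it is not present.  */
long
lookup_field_index (const struct field_table *t, const char *name,
                    int had_errors)
{
  assert (t && name);

  /* Fast path: only valid while the sorted invariant still holds.  */
  if (t->sorted && !had_errors)
    {
      size_t lo = 0, hi = t->count;
      while (lo < hi)
        {
          size_t mid = lo + (hi - lo) / 2;
          int cmp = strcmp (t->names[mid], name);
          if (cmp == 0)
            return (long) mid;
          if (cmp < 0)
            lo = mid + 1;
          else
            hi = mid;
        }
      return -1;
    }

  /* Slow path: linear scan that skips entries cleared by error recovery.  */
  for (size_t i = 0; i < t->count; i++)
    if (t->names[i] && strcmp (t->names[i], name) == 0)
      return (long) i;
  return -1;
}
```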
2018-08-03 Bogdan Harjoc <harjoc@gmail.com>
PR c/86690
gcc/c:
* c-typeck.c (lookup_field): Do not use TYPE_LANG_SPECIFIC after
errors.
gcc/testsuite:
* gcc.dg/union-duplicate-field.c: New test.
From-SVN: r263294
This partially reverts r262482, as it broke Canadian builds.
2018-08-03 Pierre-Marie de Rodat <derodat@adacore.com>
gcc/ada/
Reverts
2018-07-06 Jim Wilson <jimw@sifive.com>
* Make-generated.in (treeprs.ads): Use $(GNATMAKE) instead of gnatmake.
(einfo.h, sinfo.h, stamp-snames, stamp-nmake): Likewise.
* gcc-interface/Makefile.in (xoscons): Likewise.
From-SVN: r263291
We couldn't vectorise:
  for (int j = 0; j < n; ++j)
    {
      for (int i = 0; i < 16; ++i)
        a[i] = (b[i] + c[i]) >> 1;
      a += step;
      b += step;
      c += step;
    }
at -O3 because cunrolli unrolled the inner loop and SLP couldn't handle
AVG_FLOOR patterns (see also PR86504). The problem was some overly
strict checking of pattern statements compared to normal statements
in vect_get_and_check_slp_defs:
  switch (gimple_code (def_stmt))
    {
    case GIMPLE_PHI:
    case GIMPLE_ASSIGN:
      break;
    default:
      if (dump_enabled_p ())
        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                         "unsupported defining stmt:\n");
      return -1;
    }
The easy fix would have been to add GIMPLE_CALL to the switch,
but I don't think the switch is doing anything useful. We only create
pattern statements that the rest of the vectoriser can handle, and the
other checks in this function and elsewhere check whether SLP is possible.
I'm also not sure why:
  if (!first && !oprnd_info->first_pattern
      /* Allow different pattern state for the defs of the
         first stmt in reduction chains.  */
      && (oprnd_info->first_dt != vect_reduction_def
is necessary. All that should matter is that the statements in the
node are "similar enough". It turned out to be quite hard to find a
convincing example that used a mixture of pattern and non-pattern
statements, so bb-slp-pow-1.c is the best I could come up with.
But it does show that the combination of "xi * xi" statements and
"pow (xj, 2) -> xj * xj" patterns is handled correctly.
The patch therefore just removes the whole if block.
The loop also needed commutative swapping to be extended to at least
AVG_FLOOR.
This gives +3.9% on 525.x264_r at -O3.
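For anyone who wants to experiment with the kernel quoted at the top of this
message, here is a self-contained version. The element type is an assumption
(unsigned char, so that the addition is carried out in a wider type after
integer promotion); the message itself does not state it, and avg_floor_rows
is an invented name.

```c
#include <assert.h>
#include <stddef.h>

/* Floor-average N rows of 16 elements each.  A, B and C advance by STEP
   elements per row.  The element type is assumed, not taken from the
   commit.  */
void
avg_floor_rows (unsigned char *a, const unsigned char *b,
                const unsigned char *c, int n, ptrdiff_t step)
{
  assert (n >= 0);
  for (int j = 0; j < n; ++j)
    {
      for (int i = 0; i < 16; ++i)
        /* b[i] + c[i] is computed in int, so the sum cannot wrap.  */
        a[i] = (unsigned char) ((b[i] + c[i]) >> 1);
      a += step;
      b += step;
      c += step;
    }
}
```

Swapping b and c gives the same result, which is the property the commutative
operand swapping mentioned above relies on.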
2018-08-03 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* internal-fn.h (first_commutative_argument): Declare.
* internal-fn.c (first_commutative_argument): New function.
* tree-vect-slp.c (vect_get_and_check_slp_defs): Remove extra
restrictions for pattern statements. Use first_commutative_argument
to look for commutative operands in calls to internal functions.
gcc/testsuite/
* gcc.dg/vect/bb-slp-over-widen-1.c: Expect AVG_FLOOR to be used
on vect_avg_qi targets.
* gcc.dg/vect/bb-slp-over-widen-2.c: Likewise.
* gcc.dg/vect/bb-slp-pow-1.c: New test.
* gcc.dg/vect/vect-avg-15.c: Likewise.
From-SVN: r263290
* src/c++11/system_error.cc
(system_error_category::default_error_condition): Add workaround for
ENOTEMPTY and EEXIST having the same value on AIX.
* testsuite/19_diagnostics/error_category/system_category.cc: Add
extra testcases for EDOM, EILSEQ, ERANGE, EEXIST and ENOTEMPTY.
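To illustrate why two errno macros with the same value need a workaround:
duplicate case labels in a switch do not compile, so one of the labels has to
be guarded by the preprocessor. The sketch below is written in C with an
invented errno_label function; it is not the code in system_error.cc.

```c
#include <errno.h>

/* Return a short label for a few errno values, or "other".  */
const char *
errno_label (int ev)
{
  switch (ev)
    {
    case EDOM:
      return "EDOM";
    case EILSEQ:
      return "EILSEQ";
    case ERANGE:
      return "ERANGE";
    case EEXIST:
      return "EEXIST";
#if defined ENOTEMPTY && ENOTEMPTY != EEXIST
    /* Skipped where the two macros share a value, because a duplicate
       case label would be a compile error.  */
    case ENOTEMPTY:
      return "ENOTEMPTY";
#endif
    default:
      return "other";
    }
}
```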
From-SVN: r263289
If a target does not support exceptions, it can indicate this by returning
UI_NONE in TARGET_EXCEPT_UNWIND_INFO. Currently the compiler still emits
exception tables for such a target.
This patch makes sure that no exception tables are emitted if the target does
not support exceptions. This allows us to remove a workaround in
TARGET_ASM_BYTE_OP in the nvptx port.
Built on x86_64 with nvptx accelerator, and tested libgomp.
Built and reg-tested on x86_64.
2018-08-03 Tom de Vries <tdevries@suse.de>
* common/config/nvptx/nvptx-common.c (nvptx_except_unwind_info): Return
UI_NONE.
* config/nvptx/nvptx.c (TARGET_ASM_BYTE_OP): Remove define.
* except.c (output_function_exception_table): Do early exit if
targetm_common.except_unwind_info (&global_options) == UI_NONE.
From-SVN: r263287
There was a typo in the pipeline description where DUP was assigned to
the vector pipes for quad mode ops when it really only uses the VTOG
pipes. Fixing this does not show any noticeable difference in
performance (there's a very small bump of 1.7% in x264 but that's
about it) in my tests but is the more precise description of operations
for falkor.
* config/aarch64/falkor.md (falkor_am_1_vxvy_vxvy): Move
neon_dup_q to...
(falkor_am_1_gtov_gtov): ... a new insn reservation.
From-SVN: r263285
We were rather sloppy about handling the ownership of prefixes for
pretty_printer, and this led to a memory leak any time a
diagnostic_show_locus call emits multiple line spans.
This showed up in "make selftest-valgrind" as:
3,976 bytes in 28 blocks are definitely lost in loss record 632 of 669
at 0x4A0645D: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
by 0x1F08227: xmalloc (xmalloc.c:147)
by 0x1F083E6: xvasprintf (xvasprintf.c:58)
by 0x1E7EC7D: build_message_string(char const*, ...) (diagnostic.c:78)
by 0x1E7F438: diagnostic_get_location_text(diagnostic_context*, expanded_location) (diagnostic.c:328)
by 0x1E7FD54: default_diagnostic_start_span_fn(diagnostic_context*, expanded_location) (diagnostic.c:626)
by 0x1EB3508: selftest::test_diagnostic_context::start_span_cb(diagnostic_context*, expanded_location) (selftest-diagnostic.c:57)
by 0x1E89215: diagnostic_show_locus(diagnostic_context*, rich_location*, diagnostic_t) (diagnostic-show-locus.c:1992)
by 0x1E8ECAD: selftest::test_fixit_insert_containing_newline_2(selftest::line_table_case const&) (diagnostic-show-locus.c:3044)
by 0x1EB0606: selftest::for_each_line_table_case(void (*)(selftest::line_table_case const&)) (input.c:3525)
by 0x1E8F3F5: selftest::diagnostic_show_locus_c_tests() (diagnostic-show-locus.c:3164)
by 0x1E010BF: selftest::run_tests() (selftest-run-tests.c:88)
4,004 bytes in 28 blocks are definitely lost in loss record 633 of 669
at 0x4A0645D: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
by 0x1F08227: xmalloc (xmalloc.c:147)
by 0x1F083E6: xvasprintf (xvasprintf.c:58)
by 0x1E7EC7D: build_message_string(char const*, ...) (diagnostic.c:78)
by 0x1E7F438: diagnostic_get_location_text(diagnostic_context*, expanded_location) (diagnostic.c:328)
by 0x1E7FD54: default_diagnostic_start_span_fn(diagnostic_context*, expanded_location) (diagnostic.c:626)
by 0x1EB3508: selftest::test_diagnostic_context::start_span_cb(diagnostic_context*, expanded_location) (selftest-diagnostic.c:57)
by 0x1E89215: diagnostic_show_locus(diagnostic_context*, rich_location*, diagnostic_t) (diagnostic-show-locus.c:1992)
by 0x1E8B373: selftest::test_diagnostic_show_locus_fixit_lines(selftest::line_table_case const&) (diagnostic-show-locus.c:2500)
by 0x1EB0606: selftest::for_each_line_table_case(void (*)(selftest::line_table_case const&)) (input.c:3525)
by 0x1E8F3B9: selftest::diagnostic_show_locus_c_tests() (diagnostic-show-locus.c:3159)
by 0x1E010BF: selftest::run_tests() (selftest-run-tests.c:88)
This patch fixes the leaks by ensuring that the pretty_printer "owns"
the prefix if it's non-NULL, freeing it in the dtor and in pp_set_prefix.
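The ownership rule can be sketched in plain C as below. This is a hypothetical
miniature (struct mini_printer and the mini_* functions are invented for
illustration), not GCC's pretty_printer: setting a prefix frees the old one,
"taking" the prefix transfers ownership to the caller, and the destroy
function plays the role of the dtor.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical miniature printer; not GCC's pretty_printer.  */
struct mini_printer
{
  char *prefix;  /* Owned by the printer when non-NULL.  */
};

void
mini_init (struct mini_printer *pp)
{
  pp->prefix = NULL;
}

/* Take ownership of PREFIX (malloc-allocated or NULL), freeing any
   prefix the printer already owned.  */
void
mini_set_prefix (struct mini_printer *pp, char *prefix)
{
  free (pp->prefix);
  pp->prefix = prefix;
}

/* Hand the current prefix to the caller, who becomes responsible for
   freeing it or passing it back via mini_set_prefix.  */
char *
mini_take_prefix (struct mini_printer *pp)
{
  char *result = pp->prefix;
  pp->prefix = NULL;
  return result;
}

/* Release the prefix, as the dtor would.  */
void
mini_destroy (struct mini_printer *pp)
{
  free (pp->prefix);
  pp->prefix = NULL;
}

/* Return a malloc-allocated copy of S.  */
char *
dup_string (const char *s)
{
  size_t n = strlen (s) + 1;
  char *p = malloc (n);
  assert (p);
  memcpy (p, s, n);
  return p;
}
```

A caller that wants to override the prefix temporarily would take the old
prefix, set a new one, and later pass the saved pointer back to
mini_set_prefix, which frees the temporary prefix in the process.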
gcc/cp/ChangeLog:
* error.c (cxx_print_error_function): Duplicate "file" before
passing it to pp_set_prefix.
(cp_print_error_function): Use pp_take_prefix when saving the
existing prefix.
gcc/ChangeLog:
* diagnostic-show-locus.c (diagnostic_show_locus): Use
pp_take_prefix when saving the existing prefix.
* diagnostic.c (diagnostic_append_note): Likewise.
* langhooks.c (lhd_print_error_function): Likewise.
* pretty-print.c (pp_set_prefix): Drop the "const" from "prefix"
param's type. Free the existing prefix.
(pp_take_prefix): New function.
(pretty_printer::pretty_printer): Drop the prefix parameter.
Rename the length parameter to match the comment.
(pretty_printer::~pretty_printer): Free the prefix.
* pretty-print.h (pretty_printer::pretty_printer): Drop the prefix
parameter.
(struct pretty_printer): Drop the "const" from "prefix" field's
type and clarify memory management.
(pp_set_prefix): Drop the "const" from the 2nd param.
(pp_take_prefix): New decl.
From-SVN: r263275