The CUDA driver API starting version 6.5 offers a set of runtime functions to
calculate several occupancy-related measures, as a replacement for the occupancy
calculator spreadsheet.
This patch adds a heuristic for default runtime launch geometry, based on the
new runtime function cuOccupancyMaxPotentialBlockSize.
Build on x86_64 with nvptx accelerator and ran libgomp testsuite.
2018-08-13 Cesar Philippidis <cesar@codesourcery.com>
Tom de Vries <tdevries@suse.de>
PR target/85590
* plugin/cuda/cuda.h (CUoccupancyB2DSize): New typedef.
(cuOccupancyMaxPotentialBlockSize): Declare.
* plugin/cuda-lib.def (cuOccupancyMaxPotentialBlockSize): New
CUDA_ONE_CALL_MAYBE_NULL.
* plugin/plugin-nvptx.c (CUDA_VERSION < 6050): Define
CUoccupancyB2DSize and declare
cuOccupancyMaxPotentialBlockSize.
(nvptx_exec): Use cuOccupancyMaxPotentialBlockSize to set the
default num_gangs and num_workers when the driver supports it.
Co-Authored-By: Tom de Vries <tdevries@suse.de>
From-SVN: r263505
2018-08-12 Paul Thomas <pault@gcc.gnu.org>
PR fortran/66679
* trans-intrinsic.c (gfc_conv_intrinsic_transfer): Class array
elements are returned as references to the data element. Get
the class expression by stripping back the references. Use this
for the element size.
2018-08-12 Paul Thomas <pault@gcc.gnu.org>
PR fortran/66679
* gfortran.dg/transfer_class_3.f90: New test.
From-SVN: r263499
2018-08-12 Paul Thomas <pault@gcc.gnu.org>
PR fortran/86906
* resolve.c (resolve_fl_variable_derived): Check if the derived
type is use associated before checking for the host association
error.
2018-08-12 Paul Thomas <pault@gcc.gnu.org>
PR fortran/86906
* gfortran.dg/use_rename_9.f90: New test.
From-SVN: r263494
PR tree-optimization/86835
* tree-ssa-math-opts.c (insert_reciprocals): Even when inserting
new_stmt after def_gsi, make sure to insert new_square_stmt after
that stmt, not 2 stmts before it.
* gcc.dg/pr86835.c: New test.
From-SVN: r263487
Ensure that nothrow versions of new and delete call the ordinary
versions of new or delete, instead of calling malloc or free directly.
These files are all compiled with -std=gnu++14 so can use noexcept and
nullptr to make the code more readable.
PR libstdc++/68210
* doc/xml/manual/intro.xml: Document LWG 206 change.
* libsupc++/del_op.cc: Replace _GLIBCXX_USE_NOEXCEPT with noexcept.
* libsupc++/del_opa.cc: Likewise.
* libsupc++/del_opant.cc: Likewise.
* libsupc++/del_opnt.cc: Likewise. Call operator delete(ptr) instead
of free(ptr).
* libsupc++/del_ops.cc: Replace _GLIBCXX_USE_NOEXCEPT with noexcept.
* libsupc++/del_opsa.cc: Likewise.
* libsupc++/del_opva.cc: Likewise.
* libsupc++/del_opvant.cc: Likewise.
* libsupc++/del_opvnt.cc: Likewise. Call operator delete[](ptr)
instead of operator delete(ptr).
* libsupc++/del_opvs.cc: Replace _GLIBCXX_USE_NOEXCEPT with noexcept.
* libsupc++/del_opvsa.cc: Likewise.
* libsupc++/new_op.cc: Use __builtin_expect in check for zero size.
* libsupc++/new_opa.cc: Use nullptr instead of literal 0.
* libsupc++/new_opant.cc: Likewise. Replace _GLIBCXX_USE_NOEXCEPT
with noexcept.
* libsupc++/new_opnt.cc: Likewise. Call operator new(sz) instead of
malloc(sz).
* libsupc++/new_opvant.cc: Use nullptr and noexcept.
* libsupc++/new_opvnt.cc: Likewise. Call operator new[](sz) instead of
operator new(sz, nothrow).
* testsuite/18_support/new_nothrow.cc: New test.
From-SVN: r263478
2018-08-10 Janus Weil <janus@gcc.gnu.org>
PR fortran/57160
* invoke.texi (frontend-optimize): Mention short-circuiting.
* options.c (gfc_post_options): Disable -ffrontend-optimize with -Og.
* resolve.c (resolve_operator): Warn about short-circuiting only with
-ffrontend-optimize.
* trans-expr.c (gfc_conv_expr_op): Use short-circuiting operators only
with -ffrontend-optimize. Without that flag, make sure that both
operands are evaluated.
2018-08-10 Janus Weil <janus@gcc.gnu.org>
PR fortran/57160
* gfortran.dg/actual_pointer_function_1.f90: Fix invalid test case.
* gfortran.dg/inline_matmul_23.f90: Add option "-ffrontend-optimize".
* gfortran.dg/short_circuiting_2.f90: New test case.
* gfortran.dg/short_circuiting_3.f90: New test case.
From-SVN: r263471
2018-08-10 Martin Liska <mliska@suse.cz>
PR target/83610
* builtin-types.def (BT_FN_LONG_LONG_LONG_DOUBLE): Add new
function type.
* builtins.c (expand_builtin_expect_with_probability):
New function.
(expand_builtin_expect_with_probability): New function.
(build_builtin_expect_predicate): Add new argumnet probability
for BUILT_IN_EXPECT_WITH_PROBABILITY.
(fold_builtin_expect):
(fold_builtin_2):
(fold_builtin_3):
* builtins.def (BUILT_IN_EXPECT_WITH_PROBABILITY):
* builtins.h (fold_builtin_expect): Set new argument.
* doc/extend.texi: Document __builtin_expect_with_probability.
* doc/invoke.texi: Likewise.
* gimple-fold.c (gimple_fold_call): Pass new argument.
* ipa-fnsummary.c (find_foldable_builtin_expect): Handle
also BUILT_IN_EXPECT_WITH_PROBABILITY.
* predict.c (get_predictor_value): New function.
(expr_expected_value): Add new argument probability. Assume
that predictor and probability are always non-null.
(expr_expected_value_1): Likewise. For __builtin_expect and
__builtin_expect_with_probability set probability. Handle
combination in binary expressions.
(tree_predict_by_opcode): Simplify code by simply calling
get_predictor_value.
(pass_strip_predict_hints::execute): Add handling of
BUILT_IN_EXPECT_WITH_PROBABILITY.
* predict.def (PRED_BUILTIN_EXPECT_WITH_PROBABILITY): Add
new predictor.
* tree.h (DECL_BUILT_IN_P): New function.
2018-08-10 Martin Liska <mliska@suse.cz>
PR target/83610
* gcc.dg/predict-17.c: New test.
* gcc.dg/predict-18.c: New test.
* gcc.dg/predict-19.c: New test.
From-SVN: r263466
2018-08-10 Martin Liska <mliska@suse.cz>
PR tree-optimization/85799
* passes.def: Add argument for pass_strip_predict_hints.
* predict.c (class pass_strip_predict_hints): Add new argument
early_p.
(strip_predictor_early): New function.
(pass_strip_predict_hints::execute): Call the function to
strip predictors.
(strip_predict_hints): New function.
* predict.def: Fix comment.
2018-08-10 Martin Liska <mliska@suse.cz>
PR tree-optimization/85799
* gcc.dg/pr85799.c: New test.
From-SVN: r263465
When tm.texi.in is updated in the source tree, the following message
gets displayed:
Verify that you have permission to grant a GFDL license for all
new text in tm.texi, then copy it to <gcc src dir>/gcc/doc/tm.texi.
Having been myself and some colleagues confused several time by that
message as to what tm.texi to copy, I think it would be clearer to
indicate the absolute path for the source as well. This patch achieves
that.
2018-08-10 Thomas Preud'homme <thomas.preudhomme@linaro.org>
gcc/
* Makefile.in: Clarify which tm.texi to copy over to assert the
right to grant a GFDL license for all.
From-SVN: r263464
While building for Newlib, some configure checks must be hard coded.
The aligned_alloc() is supported since 2015 in Newlib.
libstdc++-v3/
PR target/85904
* configure.ac: Define HAVE_ALIGNED_ALLOC if building for
Newlib.
* configure: Regenerate.
From-SVN: r263461
These aliases are placed in the top-level header, e.g. <vector> not
<bits/stl_vector.h>. This ensures that they refer to whichever of
std::vector or __debug::vector or __profile::vector is in use when the
header is included.
* include/std/deque (std::pmr::deque): Declare alias.
* include/std/forward_list (std::pmr::forward_list): Likewise.
* include/std/list (std::pmr::list): Likewise.
* include/std/map (std::pmr::map, std::pmr::multimap): Likewise.
* include/std/regex (std::pmr::match_results, std::pmr::cmatch)
(std::pmr::smatch, std::pmr::wcmatch, std::pmr::wsmatch): Likewise.
* include/std/set (std::pmr::set, std::pmr::multiset): Likewise.
* include/std/string (std::pmr::basic_string, std::pmr::string)
(std::pmr::u16string, std::pmr::u32string, std::pmr::wstring):
Likewise.
* include/std/unordered_map (std::pmr::unordered_map)
(std::pmr::unordered_multimap): Likewise.
* include/std/unordered_set (std::pmr::unordered_set)
(std::pmr::unordered_multiset): Likewise.
* include/std/vector (std::pmr::vector): Likewise.
* testsuite/21_strings/basic_string/types/pmr_typedefs.cc: New test.
* testsuite/23_containers/deque/types/pmr_typedefs.cc: New test.
* testsuite/23_containers/forward_list/pmr_typedefs.cc: New test.
* testsuite/23_containers/list/pmr_typedefs.cc: New test.
* testsuite/23_containers/map/pmr_typedefs.cc: New test.
* testsuite/23_containers/multimap/pmr_typedefs.cc: New test.
* testsuite/23_containers/multiset/pmr_typedefs.cc: New test.
* testsuite/23_containers/set/pmr_typedefs.cc: New test.
* testsuite/23_containers/unordered_map/pmr_typedefs.cc: New test.
* testsuite/23_containers/unordered_multimap/pmr_typedefs.cc: New
test.
* testsuite/23_containers/unordered_multiset/pmr_typedefs.cc: New
test.
* testsuite/23_containers/unordered_set/pmr_typedefs.cc: New test.
* testsuite/23_containers/vector/pmr_typedefs.cc: New test.
* testsuite/28_regex/match_results/pmr_typedefs.cc: New test.
From-SVN: r263456
I nearly missed this patch for my accumulated back-porting list since
it didn't have the PR number in it.
Just adding it so that I can track things properly. The original
commit landed as r263301.
From-SVN: r263453
While working on PR 86871, I noticed we were being overly restrictive
when handling variable-length vectors. For:
for (i : ...)
{
res = ...;
for (j : ...)
res op= ...;
a[i] = res;
}
we don't need a reduction operation (although we do for double
reductions like:
res = ...;
for (i : ...)
for (j : ...)
res op= ...;
a[i] = res;
which must still be rejected).
2018-08-08 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vect-loop.c (vectorizable_reduction): Allow inner-loop
reductions for variable-length vectors.
gcc/testsuite/
* gcc.target/aarch64/sve/reduc_8.c: New test.
From-SVN: r263451
This patch adds a left margin to the lines of source (and annotations)
printed by diagnostic_show_locus, so that e.g. rather than:
test.c: In function 'test':
test.c:12:15: error: 'struct foo' has no member named 'm_bar'; did you mean 'bar'?
return ptr->m_bar;
^~~~~
bar
we print:
test.c: In function 'test':
test.c:12:15: error: 'struct foo' has no member named 'm_bar'; did you mean 'bar'?
12 | return ptr->m_bar;
| ^~~~~
| bar
Similarly, for a multiline case (in C++ this time), this:
bad-binary-ops.C: In function 'int test_2()':
bad-binary-ops.C:26:4: error: no match for 'operator+' (operand types are 's' and 't')
return (some_function ()
~~~~~~~~~~~~~~~~
+ some_other_function ());
^~~~~~~~~~~~~~~~~~~~~~~~
becomes:
bad-binary-ops.C: In function 'int test_2()':
bad-binary-ops.C:26:4: error: no match for 'operator+' (operand types are 's' and 't')
25 | return (some_function ()
| ~~~~~~~~~~~~~~~~
26 | + some_other_function ());
| ^~~~~~~~~~~~~~~~~~~~~~~~
I believe this slightly improves the readability of the output, in that it:
- distinguishes between the user's source code vs the annotation lines
that we're adding (the underlinings and fix-it hints here)
- shows the line numbers in another place (potentially helpful for
multiline diagnostics, where the user can see the line numbers directly,
rather than have to figure them out relative to the caret: in the 2nd
example, note how the diagnostic is reported at line 26, but the first
line printed is actually line 25)
I'm not sure that this is the precise format we want to go with [1], but
I think it's an improvement over the status quo, and we're in stage 1
of gcc 9, so there's plenty of time to shake out issues.
I've turned it on by default; it can be disabled via
-fno-diagnostics-show-line-numbers (it's also turned off in the testsuite, to
avoid breaking numerous existing test cases).
[1] Some possible variants:
- maybe just "LL|" rather than "LL | "
- maybe ':' rather than '|'
- maybe we should have some leading indentation, to better split up
the diagnostics visually via the left-hand column
- etc
gcc/ChangeLog:
PR other/84889
* common.opt (fdiagnostics-show-line-numbers): New option.
* diagnostic-show-locus.c (class layout): Add fields
"m_show_line_numbers_p" and "m_linenum_width";
(num_digits): New function.
(test_num_digits): New function.
(layout::layout): Initialize new fields. Update m_x_offset
logic to handle any left margin.
(layout::print_source_line): Print line number when requested.
(layout::start_annotation_line): New member function.
(layout::print_annotation_line): Call it.
(layout::print_leading_fixits): Likewise.
(layout::print_trailing_fixits): Likewise. Update calls to
move_to_column for new parameter.
(layout::get_x_bound_for_row): Add "add_left_margin" param and use
it to potentially call start_annotation_line.
(layout::show_ruler): Call start_annotation_line.
(selftest::test_line_numbers_multiline_range): New selftest.
(selftest::diagnostic_show_locus_c_tests): Call test_num_digits
and selftest::test_line_numbers_multiline_range.
* diagnostic.c (diagnostic_initialize): Initialize
show_line_numbers_p.
* diagnostic.h (struct diagnostic_context): Add field
"show_line_numbers_p".
* doc/invoke.texi (Diagnostic Message Formatting Options): Add
-fno-diagnostics-show-line-numbers.
* dwarf2out.c (gen_producer_string): Add
OPT_fdiagnostics_show_line_numbers to the ignored options.
* lto-wrapper.c (merge_and_complain): Likewise to the "pick
one setting" options.
(append_compiler_options): Likewise to the dropped options.
(append_diag_options): Likewise to the passed-on options.
* opts.c (common_handle_option): Handle the new option.
* toplev.c (general_init): Set up global_dc->show_line_numbers_p.
gcc/testsuite/ChangeLog:
PR other/84889
* gcc.dg/plugin/diagnostic-test-show-locus-bw-line-numbers.c: New
test.
* gcc.dg/plugin/diagnostic-test-show-locus-color-line-numbers.c:
New test.
* gcc.dg/plugin/plugin.exp (plugin_test_list): Add the new tests.
* lib/prune.exp: Add -fno-diagnostics-show-line-numbers to
TEST_ALWAYS_FLAGS.
From-SVN: r263450
gcc/ChangeLog:
2018-08-09 Kelvin Nilsen <kelvin@gcc.gnu.org>
* doc/extend.texi (PowerPC AltiVec Built-in Functions Available on
ISA 2.07): Correct spelling of bcdsub to be __builtin_bcdsub. Add
third argument of type "const signed char" to descriptions of
__builtin_bcdadd, __builtin_bcdadd_lt, __builtin_bcdadd_eq,
__builtin_bcdadd_gt, __builtin_bcdadd_ov, __builtin_bcdsub,
__builtin_bcdsub_lt, __builtin_bcdsub_eq, __builtin_bcdsub_gt,
__builtin_bcdsub_ov functions.
From-SVN: r263449
The series to remove vinfo_for_stmt also removed tests of
flow_bb_inside_loop_p if the call was simply testing whether the
statement was in the vectorisation region. I'd tried to keep calls
that were testing whether the statement was in a particular loop
(inner or outer), but messed up in vect_is_simple_reduction and
removed calls that were actually needed. This patch restores them.
I double-checked the other removed calls and I think these are
the only ones affected.
2018-08-08 Richard Sandiford <richard.sandiford@arm.com>
gcc/
PR tree-optimization/86858
* tree-vect-loop.c (vect_is_simple_reduction): Restore
flow_bb_inside_loop_p calls.
gcc/testsuite/
PR tree-optimization/86858
* gcc.dg/vect/pr86858.c: New test.
From-SVN: r263448
The handling of outer-loop uses of inner-loop definitions assumed
that anything that wasn't a PHI would be a gassign. It's also
possible for it to be a gcall.
2018-08-08 Richard Sandiford <richard.sandiford@arm.com>
gcc/
PR tree-optimization/86871
* tree-vect-stmts.c (vect_transform_stmt): Use gimple_get_lhs
instead of gimple_assign_lhs.
gcc/testsuite/
PR tree-optimization/86871
* gcc.dg/vect/pr86871.c: New test.
From-SVN: r263447
Some of the carryin insn patterns are missing a register constraint.
That means that the register allocator can pick practically anything
to hold that value, including memory locations, or registers of the
wrong class.
PR target/86887
* config/aarch64/aarch64.md (add<mode>3_carryinC_zero): Add missing
register constraint to operand 0.
(add<mode>3_carryinC): Likewise.
(add<mode>3_carryinV_zero, add<mode>3_carryinV): Likewise.
From-SVN: r263446
2018-08-09 Martin Liska <mliska@suse.cz>
* params.def (PARAM_ALIGN_LOOP_ITERATIONS): Remove double dots
at the end of a line, make first letter capital and end up
a sentence with a dot.
(PARAM_LOOP_INTERCHANGE_STRIDE_RATIO): Likewise.
(PARAM_LOOP_BLOCK_TILE_SIZE): Likewise.
(PARAM_GRAPHITE_MAX_NB_SCOP_PARAMS): Likewise.
(PARAM_GRAPHITE_MAX_ARRAYS_PER_SCOP): Likewise.
(PARAM_MAX_ISL_OPERATIONS): Likewise.
(PARAM_GRAPHITE_ALLOW_CODEGEN_ERRORS): Likewise.
(PARAM_PROFILE_FUNC_INTERNAL_ID): Likewise.
(PARAM_INDIR_CALL_TOPN_PROFILE): Likewise.
(PARAM_SLP_MAX_INSNS_IN_BB): Likewise.
(PARAM_IPA_CP_EVAL_THRESHOLD): Likewise.
(PARAM_IPA_CP_RECURSION_PENALTY): Likewise.
(PARAM_IPA_CP_SINGLE_CALL_PENALTY): Likewise.
(PARAM_IPA_CP_LOOP_HINT_BONUS): Likewise.
(PARAM_IPA_CP_ARRAY_INDEX_HINT_BONUS): Likewise.
(PARAM_TREE_REASSOC_WIDTH): Likewise.
(PARAM_HSA_GEN_DEBUG_STORES): Likewise.
(PARAM_MAX_SPECULATIVE_DEVIRT_MAYDEFS): Likewise.
(PARAM_MAX_VRP_SWITCH_ASSERTIONS): Likewise.
From-SVN: r263442
Our implementation of the stack probe requires the probe interval to
be used as displacement in an address operand. The maximum probe
interval currently is 64k. This would exceed short displacements.
Trim that value down to 4k if that happens. This might result in too
many probes being generated only on the oldest supported machine level
z900.
gcc/ChangeLog:
2018-08-09 Andreas Krebbel <krebbel@linux.ibm.com>
PR target/84332
* config/s390/s390.c (s390_option_override_internal): Reduce the
stack-clash-protection-probe-interval param if it would be too big
for z900.
gcc/testsuite/ChangeLog:
2018-08-09 Andreas Krebbel <krebbel@linux.ibm.com>
PR target/84332
* gcc.target/s390/pr84332.c: New testcase.
From-SVN: r263441
PR target/46179
* config/m68k/m68k.h (FINAL_PRESCAN_INSN): Don't define.
* config/m68k/m68k.c (handle_move_double): Don't call
m68k_final_prescan_insn.
(m68k_adjust_decorated_operand): Renamed from
m68k_final_prescan_insn, remove first and third operand and
simplify.
(print_operand): Call it.
(print_operand_address): Call it.
PR target/46179
* gcc.target/m68k/tls-dimode.c: New file.
From-SVN: r263432
If configure fails to detect aligned_alloc we will try to define our
own in new_opa.cc but that could clash with the libcversion in
<stdlib.h>. Use a namespace to keep them distinct.
* libsupc++/new_opa.cc (aligned_alloc): Declare inside namespace to
avoid clashing with an ::aligned_alloc function that was not detected
by configure.
From-SVN: r263409
Cuda driver api functions cuLinkAddData and cuLinkCreate are available starting
version 5.5. In version 6.5, they are remapped onto _v2 versions.
The dlopen interface of the libgomp nvptx plugin uses the _v2 versions, so it
won't work with a cuda driver with driver api version lower than 6.5.
This patch fixes the problem by testing for the presence of the _v2 versions,
and falling back to the original versions in case of absence of the _v2
versions.
Build on x86_64 with nvptx accelerator and reg-tested libgomp, both with and
without --without-cuda-driver.
2018-08-08 Tom de Vries <tdevries@suse.de>
* plugin/cuda-lib.def (cuLinkAddData_v2, cuLinkCreate_v2): Declare using
CUDA_ONE_CALL_MAYBE_NULL.
* plugin/plugin-nvptx.c (cuLinkAddData, cuLinkCreate): Undef and declare.
(cuLinkAddData_v2, cuLinkCreate_v2): Declare.
(link_ptx): Fall back to cuLinkAddData/cuLinkCreate if the _v2 versions
are not found.
From-SVN: r263408