Many thanks to Richard Biener for approving the midde-end
patch that cleared the way for this one. This nvptx patch
defines the target hook TARGET_TRULY_NOOP_TRUNCATION to
false, indicating that integer truncations require explicit
instructions. nvptx.c already defines TARGET_MODES_TIEABLE_P
and TARGET_CAN_CHANGE_MODE_CLASS to false, and as (previously)
documented that may require TARGET_TRULY_NOOP_TRUNCATION to
be defined likewise.
This patch decreases the number of unexpected failures in
the testsuite by 10, and increases the number of expected
passes by 4, including these previous FAILs/ICEs:
gcc.c-torture/compile/opout.c
gcc.dg/torture/pr79125.c
gcc.dg/tree-ssa/pr92085-1.c
Unfortunately there is one testsuite failure that used to
pass gcc.target/nvptx/v2si-cvt.c, but this isn't an ICE or
incorrect code. This regression has been filed as PR96403,
and the failing scan-assembler directives have been replaced
by a reference to the PR.
This patch has been tested on nvptx-none hosted on
x86_64-pc-linux-gnu with "make" and "make check" with
fewer ICEs and no wrong code regressions.
2020-07-31 Roger Sayle <roger@nextmovesoftware.com>
Tom de Vries <tdevries@suse.de>
gcc/ChangeLog:
PR target/90928
* config/nvptx/nvptx.c (nvptx_truly_noop_truncation): Implement.
(TARGET_TRULY_NOOP_TRUNCATION): Define.
gcc/testsuite/ChangeLog:
* gcc.target/nvptx/v2si-cvt.c: Simplify source. Remove
scan-assembler directives. Mention PR96403.
This was probably copied from a std::filesystem test and the -std option
wasn't removed.
libstdc++-v3/ChangeLog:
* testsuite/experimental/filesystem/filesystem_error/cons.cc:
Remove -std=gnu++17 option.
libstdc++-v3/ChangeLog:
* testsuite/20_util/is_aggregate/value.cc: Adjust for changes to
definition of aggregates in C++20.
* testsuite/20_util/optional/requirements.cc: Adjust for
defaulted comparisons in C++20.
libstdc++-v3/ChangeLog:
* testsuite/20_util/tuple/78939.cc: Suppress warnings about
deprecation of volatile-qualified structured bindings in C++20.
* testsuite/20_util/variable_templates_for_traits.cc: Likewise
for deprecation of is_pod in C++20
Also add an effective target to clarify it should only run for C++17 and
later.
libstdc++-v3/ChangeLog:
* testsuite/20_util/time_point_cast/rounding.cc: Remove
duplicate dg-do directive and add c++17 effective target.
The majority of tests in runnable are really compilable/ICE tests, and
have have dg-do adjusted where necessary. Tests that had a dependency
on Phobos have also been reproduced and reduced with all imports
stripped from the test.
The end result is a collection of tests that only check the compiler bug
that was being fixed, rather than the library, and a reduction in time
spent running all tests.
gcc/testsuite/ChangeLog:
* gdc.dg/compilable.d: Removed.
* gdc.dg/gdc108.d: New test.
* gdc.dg/gdc115.d: New test.
* gdc.dg/gdc121.d: New test.
* gdc.dg/gdc122.d: New test.
* gdc.dg/gdc127.d: New test.
* gdc.dg/gdc131.d: New test.
* gdc.dg/gdc133.d: New test.
* gdc.dg/gdc141.d: New test.
* gdc.dg/gdc142.d: New test.
* gdc.dg/gdc15.d: New test.
* gdc.dg/gdc17.d: New test.
* gdc.dg/gdc170.d: New test.
* gdc.dg/gdc171.d: New test.
* gdc.dg/gdc179.d: New test.
* gdc.dg/gdc183.d: New test.
* gdc.dg/gdc186.d: New test.
* gdc.dg/gdc187.d: New test.
* gdc.dg/gdc19.d: New test.
* gdc.dg/gdc191.d: New test.
* gdc.dg/gdc194.d: New test.
* gdc.dg/gdc196.d: New test.
* gdc.dg/gdc198.d: New test.
* gdc.dg/gdc200.d: New test.
* gdc.dg/gdc204.d: New test.
* gdc.dg/gdc210.d: New test.
* gdc.dg/gdc212.d: New test.
* gdc.dg/gdc213.d: New test.
* gdc.dg/gdc218.d: New test.
* gdc.dg/gdc223.d: New test.
* gdc.dg/gdc231.d: New test.
* gdc.dg/gdc239.d: New test.
* gdc.dg/gdc24.d: New test.
* gdc.dg/gdc240.d: New test.
* gdc.dg/gdc241.d: New test.
* gdc.dg/gdc242a.d: New test.
* gdc.dg/gdc242b.d: New test.
* gdc.dg/gdc248.d: New test.
* gdc.dg/gdc250.d: New test.
* gdc.dg/gdc251.d: New test.
* gdc.dg/gdc253a.d: New test.
* gdc.dg/gdc253b.d: New test.
* gdc.dg/gdc255.d: New test.
* gdc.dg/gdc256.d: New test.
* gdc.dg/gdc261.d: New test.
* gdc.dg/gdc27.d: New test.
* gdc.dg/gdc273.d: New test.
* gdc.dg/gdc280.d: New test.
* gdc.dg/gdc284.d: New test.
* gdc.dg/gdc285.d: New test.
* gdc.dg/gdc286.d: New test.
* gdc.dg/gdc300.d: New test.
* gdc.dg/gdc309.d: New test.
* gdc.dg/gdc31.d: New test.
* gdc.dg/gdc35.d: New test.
* gdc.dg/gdc36.d: New test.
* gdc.dg/gdc37.d: New test.
* gdc.dg/gdc4.d: New test.
* gdc.dg/gdc43.d: New test.
* gdc.dg/gdc47.d: New test.
* gdc.dg/gdc51.d: New test.
* gdc.dg/gdc57.d: New test.
* gdc.dg/gdc66.d: New test.
* gdc.dg/gdc67.d: New test.
* gdc.dg/gdc71.d: New test.
* gdc.dg/gdc77.d: New test.
* gdc.dg/imports/gdc239.d: Remove phobos dependency.
* gdc.dg/imports/gdc241a.d: Updated imports.
* gdc.dg/imports/gdc241b.d: Likewise.
* gdc.dg/imports/gdc251a.d: Likewise.
* gdc.dg/imports/gdc253.d: Rename to...
* gdc.dg/imports/gdc253a.d: ...this.
* gdc.dg/imports/gdc253b.d: New.
* gdc.dg/imports/gdc36.d: New.
* gdc.dg/imports/runnable.d: Removed.
* gdc.dg/link.d: Removed.
* gdc.dg/runnable.d: Removed.
* gdc.dg/runnable2.d: Removed.
* gdc.dg/simd.d: Remove phobos dependency.
For 32-bit btr(), BIT_NOT_EXPR was being generated twice, something that
was not seen with the 64-bit variant. Removed the second call to fix
the generated code.
gcc/d/ChangeLog:
PR d/96393
* intrinsics.cc (expand_intrinsic_bt): Don't generate BIT_NOT_EXPR for
btr32 intrinsic.
Resolves:
PR c++/96003 spurious -Wnonnull calling a member on the result of static_cast
gcc/c-family/ChangeLog:
PR c++/96003
* c-common.c (check_function_arguments_recurse): Return early when
no-warning bit is set.
gcc/cp/ChangeLog:
PR c++/96003
* class.c (build_base_path): Set no-warning bit on the synthesized
conditional expression in static_cast.
gcc/testsuite/ChangeLog:
PR c++/96003
* g++.dg/warn/Wnonnull7.C: New test.
This makes sure to emit full declaration DIEs including
formal parameters for used external functions. This helps
debugging when debug information of the external entity is
not available and also helps external tools cross-checking
ABI compatibility which was the bug reporters use case.
For cc1 this affects debug information size as follows:
VM SIZE FILE SIZE
++++++++++++++ GROWING ++++++++++++++
[ = ] 0 .debug_info +1.63Mi +1.3%
[ = ] 0 .debug_str +263Ki +3.4%
[ = ] 0 .debug_abbrev +101Ki +4.9%
[ = ] 0 .debug_line +5.71Ki +0.0%
+44% +16 [Unmapped] +48 +1.2%
-------------- SHRINKING --------------
[ = ] 0 .debug_loc -213 -0.0%
-0.0% -48 .text -48 -0.0%
[ = ] 0 .debug_ranges -16 -0.0%
-0.0% -32 TOTAL +1.99Mi +0.6%
and DWARF compression via DWZ can only shave off minor bits
here.
Previously we emitted no DIEs for external functions at all
unless they were referenced via DW_TAG_GNU_call_site which
for some GCC revs caused a regular DIE to appear and since
GCC 4.9 only a stub without formal parameters. This means
at -O0 we did not emit any DIE for external functions
but with optimization we emitted stubs.
2020-07-30 Richard Biener <rguenther@suse.de>
PR debug/96383
* langhooks-def.h (lhd_finalize_early_debug): Declare.
(LANG_HOOKS_FINALIZE_EARLY_DEBUG): Define.
(LANG_HOOKS_INITIALIZER): Amend.
* langhooks.c: Include cgraph.h and debug.h.
(lhd_finalize_early_debug): Default implementation from
former code in finalize_compilation_unit.
* langhooks.h (lang_hooks::finalize_early_debug): Add.
* cgraphunit.c (symbol_table::finalize_compilation_unit):
Call the finalize_early_debug langhook.
gcc/c-family/
* c-common.h (c_common_finalize_early_debug): Declare.
* c-common.c: Include debug.h.
(c_common_finalize_early_debug): finalize_early_debug langhook
implementation generating debug for extern declarations.
gcc/c/
* c-objc-common.h (LANG_HOOKS_FINALIZE_EARLY_DEBUG):
Define to c_common_finalize_early_debug.
gcc/cp/
* cp-objcp-common.h (LANG_HOOKS_FINALIZE_EARLY_DEBUG):
Define to c_common_finalize_early_debug.
gcc/testsuite/
* gcc.dg/debug/dwarf2/pr96383-1.c: New testcase.
* gcc.dg/debug/dwarf2/pr96383-2.c: Likewise.
libstdc++-v3/
* testsuite/20_util/assume_aligned/3.cc: Use -g0.
This adds a ! marker to result expressions that should simplify
(and if not fail the simplification). This can for example be
used like
(simplify
(plus (vec_cond:s @0 @1 @2) @3)
(vec_cond @0 (plus! @1 @3) (plus! @2 @3)))
to make the simplification only apply in case both plus operations
in the result end up simplified to a simple operand.
2020-07-31 Richard Biener <rguenther@suse.de>
* genmatch.c (expr::force_leaf): Add and initialize.
(expr::gen_transform): Honor force_leaf by passing
NULL as sequence argument to maybe_push_res_to_seq.
(parser::parse_expr): Allow ! marker on result expression
operations.
* doc/match-and-simplify.texi: Amend.
Currently vectorizer cost modeling counts branch taken costs for
prologue and epilogue if the number of iterations is unknown.
But it isn't sensible if there are no peeled iterations.
This patch is to guard them under peel_iters_prologue > 0 or
peel_iters_epilogue > 0.
Bootstrapped/regtested on powerpc64le-linux-gnu and aarch64-linux-gnu.
gcc/ChangeLog:
* tree-vect-loop.c (vect_get_known_peeling_cost): Don't consider branch
taken costs for prologue and epilogue if they don't exist.
(vect_estimate_min_profitable_iters): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/cost_model_2.c: Adjust due to cost model
change.
We noticed that when analyzing LTRANS memory peak that happens right
after the start:
https://gist.github.com/marxin/223890df4d8d8e490b6b2918b77dacad
In case of chrome, we have symtab->order == 200M, so we allocate
16B * 200M = 3.2GB.
gcc/ChangeLog:
* cgraph.h: Remove leading empty lines.
* cgraphunit.c (enum cgraph_order_sort_kind): Remove
ORDER_UNDEFINED.
(struct cgraph_order_sort): Add constructors.
(cgraph_order_sort::process): New.
(cgraph_order_cmp): New.
(output_in_order): Simplify and push nodes to vector.
This makes the special case of constant evaluated LHS for a
short-circuiting or/and explicit rather than doing range
merging and eventually exposing a side-effect that shouldn't be
evaluated.
2020-07-31 Richard Biener <rguenther@suse.de>
PR middle-end/96369
* fold-const.c (fold_range_test): Special-case constant
LHS for short-circuiting operations.
* c-c++-common/pr96369.c: New testcase.
This builds upon the rev_post_order_and_mark_dfs_back_seme improvements
and makes vt_find_locations iterate over the dataflow problems for
each toplevel SCC separately, improving memory locality and avoiding
to process nodes after the SCC before the SCC itself stabilized.
On the asan_interceptors.cc testcase this for example reduces the
number of visited blocks from 3751 to 2867. For stage3-gcc
this reduces the number of visited blocks by ~4%.
2020-07-28 Richard Biener <rguenther@suse.de>
PR debug/78288
* var-tracking.c (vt_find_locations): Use
rev_post_order_and_mark_dfs_back_seme and separately iterate
over toplevel SCCs.
This produces a more optimal RPO order for iteration processing
by making sure that SCC exits are processed before SCC members
themselves.. This avoids iterating blocks unrelated to the current
iteration for RPO VN and has the chance to improve code-generation
for the non-iterative mode of RPO VN. The patch also exposes toplevel
SCCs and gets rid of the ad-hoc max_rpo computation in RPO VN.
For simplicity it also removes the odd reverse ordering of the RPO
array returned from rev_post_order_and_mark_dfs_back_seme.
Overall reduction in the number of visited blocks isn't spectacular
for bootstrap (~2.5%) but single cases see up to a 10% reduction.
The same function can be used to optimize var-tracking iteration order
as seen in the followup.
2020-07-28 Richard Biener <rguenther@suse.de>
* cfganal.h (rev_post_order_and_mark_dfs_back_seme): Adjust
prototype.
* cfganal.c (rpoamdbs_bb_data): New struct with pre BB data.
(tag_header): New helper.
(cmp_edge_dest_pre): Likewise.
(rev_post_order_and_mark_dfs_back_seme): Compute SCCs,
find SCC exits and perform a DFS walk with extra edges to
compute a RPO with adjacent SCC members when requesting an
iteration optimized order and populate the toplevel SCC array.
* tree-ssa-sccvn.c (do_rpo_vn): Remove ad-hoc computation
of max_rpo and fill it in from SCC extent info instead.
* gcc.dg/torture/20200727-0.c: New testcase.
In the testcase from the PR we're seeing excessive memory use (> 5GB)
during constexpr evaluation, almost all of which is due to the call to
decl_constant_value in the VAR_DECL/CONST_DECL branch of
cxx_eval_constant_expression. We reach here every time we evaluate an
ARRAY_REF of a constexpr VAR_DECL, and from there decl_constant_value
makes an unshared copy of the VAR_DECL's initializer. But unsharing
here is unnecessary because callers of cxx_eval_constant_expression
already unshare its result when necessary.
To fix this excessive unsharing, this patch adds a new defaulted
parameter unshare_p to decl_really_constant_value and
decl_constant_value so that callers can control whether to unshare.
As a simplification, we can also move the call to unshare_expr in
constant_value_1 outside of the loop, since doing unshare_expr on a
DECL_P is a no-op.
Now that we no longer unshare the result of decl_constant_value and
decl_really_constant_value from cxx_eval_constant_expression, memory use
during constexpr evaluation for the testcase from the PR falls from ~5GB
to 15MB according to -ftime-report.
gcc/cp/ChangeLog:
PR c++/96197
* constexpr.c (cxx_eval_constant_expression) <case CONST_DECL>:
Pass false to decl_constant_value and decl_really_constant_value
so that they don't unshare their result.
* cp-tree.h (decl_constant_value): New declaration with an added
bool parameter.
(decl_really_constant_value): Add bool parameter defaulting to
true to existing declaration.
* init.c (constant_value_1): Add bool parameter which controls
whether to unshare the initializer before returning. Call
unshare_expr at most once.
(scalar_constant_value): Pass true to constant_value_1's new
bool parameter.
(decl_really_constant_value): Add bool parameter and forward it
to constant_value_1.
(decl_constant_value): Likewise, but instead define a new
overload with an added bool parameter.
gcc/testsuite/ChangeLog:
PR c++/96197
* g++.dg/cpp1y/constexpr-array8.C: New test.
Associative array literal keys with alignment holes are now filled using
memset() prior to usage, with LTR evaluation of side-effects enforced.
gcc/d/ChangeLog:
PR d/96152
* d-codegen.cc (build_array_from_exprs): New function.
* d-tree.h (build_array_from_exprs): Declare.
* expr.cc (ExprVisitor::visit (AssocArrayLiteralExp *)): Use
build_array_from_exprs to generate key and value arrays.
gcc/testsuite/ChangeLog:
PR d/96152
* gdc.dg/pr96152.d: New test.
The D front-end has C-style variadic functions and va_start/va_arg, so
it is right to also have warnings for inproper use.
gcc/d/ChangeLog:
PR d/96154
* gdc.texi (Warnings): Document -Wvarargs.
* lang.opt: Add -Wvarargs
gcc/testsuite/ChangeLog:
PR d/96154
* gdc.dg/pr96154a.d: New test.
* gdc.dg/pr96154b.d: New test.
Both intrinsics did not handle the case where the va_list object comes
from a ref parameter.
gcc/d/ChangeLog:
PR d/96140
* intrinsics.cc (expand_intrinsic_vaarg): Handle ref parameters as
arguments to va_arg().
(expand_intrinsic_vastart): Handle ref parameters as arguments to
va_start().
gcc/testsuite/ChangeLog:
PR d/96140
* gdc.dg/pr96140.d: New test.
When compiled as C++20 the COW std::string fails due to assuming that
the allocator always defines size_type and difference_type. That has
been incorrect since C++11, but we got away with it for specializations
using std::allocator until those members were removed in C++20.
libstdc++-v3/ChangeLog:
* include/bits/basic_string.h (size_type, difference_type):
Use allocator_traits to obtain the allocator's size_type and
difference_type.
On broken systems we only have strtod, not strtof and strtold. Just use
strtod for all types, even though that will produce incorrect results in
some cases.
Similarly, if _GLIBCXX_USE_C99_MATH is not defined then std::isinf won't
be declared. Just refer to it unqualified, which should find the C
library's isinf macro if that hasn't been #undef'd by <cmath>.
libstdc++-v3/ChangeLog:
* src/c++17/floating_from_chars.cc (from_chars_impl): Use
isinf unqualified.
[!_GLIBCXX_USE_C99_STDLIB]: Use strtod for float and long
double.
libstdc++-v3/ChangeLog:
* testsuite/23_containers/unordered_multiset/cons/noexcept_default_construct.cc:
Use allocator with the correct value type.
* testsuite/23_containers/unordered_set/cons/noexcept_default_construct.cc:
Likewise.
Add support for new instructions to test LSB by Byte.
2020-07-29 Will Schmidt <will_schmidt@vnet.ibm.com>
gcc/ChangeLog:
* config/rs6000/altivec.h (vec_test_lsbb_all_ones): New define.
(vec_test_lsbb_all_zeros): New define.
* config/rs6000/rs6000-builtin.def (BU_P10_VSX_1): New built-in
handling macro.
(XVTLSBB_ZEROS, XVTLSBB_ONES): New builtin defines.
(xvtlsbb_all_zeros, xvtlsbb_all_ones): New builtin overloads.
* config/rs6000/rs6000-call.c (P10_BUILTIN_VEC_XVTLSBB_ZEROS,
P10_BUILTIN_VEC_XVTLSBB_ONES): New altivec_builtin_types entries.
* config/rs6000/rs6000.md (UNSPEC_XVTLSBB): New unspec.
* config/rs6000/vsx.md (*xvtlsbb_internal): New instruction define.
(xvtlsbbo, xvtlsbbz): New instruction expands.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/lsbb-runnable.c: New test.
* gcc.target/powerpc/lsbb.c: New test.
This optimizes the code generation of simple array assignments, inlining
the array bounds checking code so there is no reliance on the library
routine _d_arraycopy(), which also deals with postblit and copy
constructors for non-trivial arrays.
gcc/d/ChangeLog:
* expr.cc (ExprVisitor::visit (AssignExp *)): Inline bounds checking
for simple array assignments.
gcc/testsuite/ChangeLog:
* gdc.dg/array1.d: New test.
Generating calls to memset, memcpy, and memcmp is frequent enough that
it becomes beneficial to put them into their own routine. All parts of
the front-end have been updated to call the new helper functions instead
of doing it themselves.
gcc/d/ChangeLog:
* d-codegen.cc (build_memcmp_call): New function.
(build_memcpy_call): New function.
(build_memset_call): New function.
(build_float_identity): Call build_memcmp_call.
(lower_struct_comparison): Likewise.
(build_struct_comparison): Likewise.
* d-tree.h (build_memcmp_call): Declare.
(build_memcpy_call): Declare.
(build_memset_call): Declare.
* expr.cc (ExprVisitor::visit (EqualExp *)): Call build_memcmp_call.
(ExprVisitor::visit (AssignExp *)): Call build_memset_call.
(ExprVisitor::visit (ArrayLiteralExp *)): Call build_memcpy_call.
(ExprVisitor::visit (StructLiteralExp *)): Call build_memset_call.
None of these functions need access to the context pointer of the
visitor class, so have been made free standing.
gcc/d/ChangeLog:
* expr.cc (needs_postblit): Move out of ExprVisitor as a static
function. Update all callers.
(needs_dtor): Likewise.
(lvalue_p): Likewise.
(binary_op): Likewise.
(binop_assignment): Likewise.
The COW string doesn't accept const_iterator arguments in insert and
related member functions. Pass a mutable iterator instead.
libstdc++-v3/ChangeLog:
* testsuite/20_util/from_chars/4.cc: Pass non-const iterator
to string::insert.
With --enable-cet, require CET support only for the final GCC build.
Don't enable CET without CET support for non-bootstrap build, in stage1
nor for build support.
config/
PR bootstrap/96202
* cet.m4 (GCC_CET_HOST_FLAGS): Don't enable CET without CET
support in stage1 nor for build support.
gcc/
PR bootstrap/96202
* configure: Regenerated.
libbacktrace/
PR bootstrap/96202
* configure: Regenerated.
libcc1/
PR bootstrap/96202
* configure: Regenerated.
libcpp/
PR bootstrap/96202
* configure: Regenerated.
libdecnumber/
PR bootstrap/96202
* configure: Regenerated.
libiberty/
PR bootstrap/96202
* configure: Regenerated.
lto-plugin/
PR bootstrap/96202
* configure: Regenerated.
Previously it was not possible to add -fno-exceptions to the testsuite
flags, because some files that are compiled by the v3-build_support
procedure failed with exceptions disabled.
This adjusts those files to still compile without exceptions (with
degraded functionality in some cases).
The sole testcase that explicitly checks for -fno-exceptions has also
been adjusted to use the more robust exceptions_enabled effective-target
keyword from gcc/testsuite/lib/target-supports.exp.
libstdc++-v3/ChangeLog:
* testsuite/23_containers/vector/bool/72847.cc: Use the
exceptions_enabled effective-target keyword instead of
checking for an explicit -fno-exceptions option.
* testsuite/util/testsuite_abi.cc (examine_symbol): Remove
redundant try-catch.
* testsuite/util/testsuite_allocator.h [!__cpp_exceptions]:
Do not define check_allocate_max_size and memory_resource.
* testsuite/util/testsuite_containers.h: Replace comment with
#error if wrong standard dialect used.
* testsuite/util/testsuite_shared.cc: Likewise.
Intrinsics are now matched explicitly, rather than through a common
alias where there are multiple overrides for a common intrinsic.
Where there is a corresponding DECL_FUNCTION_CODE, that is now stored in
the D intrinsic array. All run-time std.math intrinsics have been
removed, as the library implementation already forwards to core.math.
gcc/d/ChangeLog:
* d-tree.h (DEF_D_INTRINSIC): Rename second argument from A to B.
* intrinsics.cc (intrinsic_decl): Add built_in field.
(DEF_D_INTRINSIC): Rename second argument from ALIAS to BUILTIN.
(maybe_set_intrinsic): Handle new intrinsic codes.
(expand_intrinsic_bt): Likewise.
(expand_intrinsic_checkedint): Likewise.
(expand_intrinsic_bswap): Remove.
(expand_intrinsic_sqrt): Remove.
(maybe_expand_intrinsic): Group together intrinsic cases that map
directly to gcc built-ins.
* intrinsics.def (DEF_D_BUILTIN): Rename second argument from A to B.
Update all callers to pass equivalent DECL_FUNCTION_CODE.
(DEF_CTFE_BUILTIN): Likewise.
(STD_COS): Remove intrinsic.
(STD_FABS): Remove intrinsic.
(STD_LDEXP): Remove intrinsic.
(STD_RINT): Remove intrinsic.
(STD_RNDTOL): Remove intrinsic.
(STD_SIN): Remove intrinsic.
(STD_SQRTF): Remove intrinsic.
(STD_SQRT): Remove intrinsic.
(STD_SQRTL): Remove intrinsic.
gcc/testsuite/ChangeLog:
* gdc.dg/intrinsics.d: New test.
In the face of the more complex tricks in reassoc with respect
to negate processing it can happen that the expression rewrite
is fooled to recurse on a leaf and pick up a bogus expression
code. The following patch makes the expression rewrite more
robust in providing the expression code to it directly since
it is the same for all operations in a chain.
2020-07-30 Richard Biener <rguenther@suse.de>
PR tree-optimization/96370
* tree-ssa-reassoc.c (rewrite_expr_tree): Add operation
code parameter and use it instead of picking it up from
the stmt that is being rewritten.
(reassociate_bb): Pass down the operation code.
* gcc.dg/pr96370.c: New testcase.
This patch provides standard vec_extract and vec_set patterns to the
nvptx backend, to extract an element from a PTX vector and set an
element of a PTX vector respectively. PTX vectors (I hesitate to
call them SIMD vectors) may contain up to four elements, so vector
modes up to size four are supported by this patch even though the
nvptx backend currently only allows V2SI and V2DI, i.e. two out
of the ten possible vector modes.
As an example of the improvement, the following C function:
typedef int __v2si __attribute__((__vector_size__(8)));
int foo (__v2si arg) { return arg[0]+arg[1]; }
previously generated this code using a shift:
mov.u64 %r25, %ar0;
ld.v2.u32 %r26, [%r25];
mov.b64 %r28, %r26;
shr.s64 %r30, %r28, 32;
cvt.u32.u32 %r31, %r26.x;
cvt.u32.u64 %r32, %r30;
add.u32 %value, %r31, %r32;
but with this patch now generates:
mov.u64 %r25, %ar0;
ld.v2.u32 %r26, [%r25];
mov.u32 %r28, %r26.x;
mov.u32 %r29, %r26.y;
add.u32 %value, %r28, %r29;
I've implemented these getters and setters as their own instructions
instead of attempting the much more intrusive patch of changing the
backend's definition of register_operand. Given the limited utility
of PTX vectors, I'm not convinced that attempting to support them as
operands in every instruction would be worth the effort involved.
This patch has been tested on nvptx-none hosted on x86_64-pc-linux-gnu
with "make" and "make check" with no new regressions.
2020-07-15 Roger Sayle <roger@nextmovesoftware.com>
Tom de Vries <tdevries@suse.de>
gcc/ChangeLog:
* config/nvptx/nvptx.md (nvptx_vector_index_operand): New predicate.
(VECELEM): New mode attribute for a vector's uppercase element mode.
(Vecelem): New mode attribute for a vector's lowercase element mode.
(*vec_set<mode>_0, *vec_set<mode>_1, *vec_set<mode>_2)
(*vec_set<mode>_3): New instructions.
(vec_set<mode>): New expander to generate one of the above insns.
(vec_extract<mode><Vecelem>): New instruction.
gcc/testsuite/ChangeLog:
* gcc.target/nvptx/v2si-vec-set-extract.c: New test.
Based on the collected numbers in PR95435, I suggest the following
tuning changes:
gcc/ChangeLog:
PR target/95435
* config/i386/x86-tune-costs.h: Use libcall for large sizes for
-m32. Start using libcall from 128+ bytes.
In the testcase below, template argument deduction for the call
g(id<int>) goes wrong because the functions in the overload set id<int>
each have a yet-undeduced auto return type, and this undeduced return
type makes try_one_overload fail to match up any of the overloads with
g's parameter type, leading to g's template argument going undeduced and
to the overload set going unresolved.
This patch fixes this issue by performing return type deduction via
instantiation before doing try_one_overload, in a manner similar to what
resolve_address_of_overloaded_function does.
gcc/cp/ChangeLog:
PR c++/64194
* pt.c (resolve_overloaded_unification): If the function
template specialization has a placeholder return type,
then instantiate it before attempting unification.
gcc/testsuite/ChangeLog:
PR c++/64194
* g++.dg/cpp1y/auto-fn60.C: New test.
In the below testcase, we're ICEing from alias_ctad_tweaks ultimately
because the implied deduction guide for X's user-defined constructor
already has constraints associated with it. We then carry over these
constraints to 'fprime', the overlying deduction guide for the alias
template Y, via tsubst_decl from alias_ctad_tweaks. Later in
alias_ctad_tweaks we call get_constraints followed by set_constraints
without doing remove_constraints in between, which triggers the !found
assert in set_constraints.
This patch fixes this issue by adding an intervening call to
remove_constraints.
gcc/cp/ChangeLog:
PR c++/95486
* pt.c (alias_ctad_tweaks): Call remove_constraints before
calling set_constraints.
gcc/testsuite/ChangeLog:
PR c++/95486
* g++.dg/cpp2a/class-deduction-alias3.C: New test.
In the below testcase, duplicate_decls wasn't merging the tsubsted
friend declaration for 'void add(auto)' with its definition, because
reduce_template_parm_level (during tsubst_friend_function) lost the
DECL_VIRTUAL_P flag on the auto's invented template parameter, which
caused template_heads_equivalent_p to deem the two template heads as not
equivalent in C++20 mode.
This patch makes reduce_template_parm_level carry over the
DECL_VIRTUAL_P flag from the original TEMPLATE_PARM_DECL.
gcc/cp/ChangeLog:
PR c++/96106
* pt.c (reduce_template_parm_level): Propagate DECL_VIRTUAL_P
from the original TEMPLATE_PARM_DECL to the new lowered one.
gcc/testsuite/ChangeLog:
PR c++/96106
* g++.dg/concepts/abbrev7.C: New test.
When considering to instantiate a member of a class template as part of
an explicit instantiation of the class template, we need to first check
the member's constraints before proceeding with the instantiation of the
member.
gcc/cp/ChangeLog:
PR c++/96164
* constraint.cc (constraints_satisfied_p): Return true if
!flags_concepts.
* pt.c (do_type_instantiation): Update a paragraph taken from
[temp.explicit] to reflect the latest specification. Don't
instantiate a member with unsatisfied constraints.
gcc/testsuite/ChangeLog:
PR c++/96164
* g++.dg/cpp2a/concepts-explicit-inst5.C: New test.
The following patch addds support for PTX's rcp.rn.f32 and rcp.rn.f64
instructions. Note that the "rcp.rn" forms of this instruction
calculate the fully IEEE compliant result for the reciprocal, unlike
the rcp.approx variants that just provide fast approximations.
This patch has been tested on nvptx-none hosted on x86_64-pc-linux-gnu
with "make" and "make check" with no new regressions.
2020-07-12 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog:
* config/nvptx/nvptx.md (recip<mode>2): New instruction.
gcc/testsuite/ChangeLog:
* gcc.target/nvptx/recip-1.c: New test.