Co-authored-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:
* include/bits/uniform_int_dist.h (uniform_int_distribution::_S_nd):
New member function implementing Lemire's "nearly divisionless"
algorithm.
(uniform_int_distribution::operator()): Use _S_nd when the range
of the URBG is the full width of the result type.
We usually export variables in recipes this way. I'm not sure it's
necessary, but it's consistent.
libstdc++-v3/ChangeLog:
* testsuite/Makefile.am: Set and export variable separately.
* testsuite/Makefile.in: Regenerate.
It looks like our check-performance target runs completely unoptimized,
which is a bit silly. This exports the CXXFLAGS from the parent make
process to the check_performance script.
libstdc++-v3/ChangeLog:
* scripts/check_performance: Use gnu++11 instead of gnu++0x.
* testsuite/Makefile.am (check-performance): Export CXXFLAGS to
child process.
* testsuite/Makefile.in: Regenerate.
This tests std::uniform_int_distribution with various parameters and
engines.
libstdc++-v3/ChangeLog:
* testsuite/performance/26_numerics/random_dist.cc: New test.
For sources which can't use any vector instructions, <x86intrin.h> and
<immintrin.h> cannot be included for compiler intrinsics:
$ echo "#include <x86intrin.h>" | gcc -S -O2 -mno-sse -mno-mmx -x c -
In file included from /usr/include/stdlib.h:1013,
from /usr/lib/gcc/x86_64-redhat-linux/10/include/mm_malloc.h:27,
from /usr/lib/gcc/x86_64-redhat-linux/10/include/xmmintrin.h:34,
from /usr/lib/gcc/x86_64-redhat-linux/10/include/immintrin.h:29,
from /usr/lib/gcc/x86_64-redhat-linux/10/include/x86intrin.h:32,
from <stdin>:1:
/usr/include/bits/stdlib-float.h: In function ‘atof’:
/usr/include/bits/stdlib-float.h:26:1: error: SSE register return with SSE disabled
26 | {
| ^
$
libgcc/config/i386/shadow-stack-unwind.h has a workaround:
/* NB: We need _get_ssp and _inc_ssp from <cetintrin.h>. But we can't
include <x86intrin.h> which ends up including <mm_malloc.h>, which
includes <stdlib.h> and <errno.h> unconditionally. But we can't
include any libc system headers unconditionally from libgcc. Avoid
including <mm_malloc.h> here by defining _IMMINTRIN_H_INCLUDED. */
#define _IMMINTRIN_H_INCLUDED
#include <cetintrin.h>
#undef _IMMINTRIN_H_INCLUDED
Add a standalone intrinsic header file, <x86gprintrin.h>, to provide
integer only intrinsics. All integer only intrinsics are placed in
<x86gprintrin.h>. <x86intrin.h> and <immintrin.h> simply include
<x86gprintrin.h>.
gcc/
PR target/97148
* config.gcc (extra_headers): Add x86gprintrin.h.
* config/i386/adxintrin.h: Check _X86GPRINTRIN_H_INCLUDED for
<x86gprintrin.h>.
* config/i386/bmi2intrin.h: Likewise.
* config/i386/bmiintrin.h: Likewise.
* config/i386/cetintrin.h: Likewise.
* config/i386/cldemoteintrin.h: Likewise.
* config/i386/clflushoptintrin.h: Likewise.
* config/i386/clwbintrin.h: Likewise.
* config/i386/enqcmdintrin.h: Likewise.
* config/i386/fxsrintrin.h: Likewise.
* config/i386/ia32intrin.h: Likewise.
* config/i386/lwpintrin.h: Likewise.
* config/i386/lzcntintrin.h: Likewise.
* config/i386/movdirintrin.h: Likewise.
* config/i386/pconfigintrin.h: Likewise.
* config/i386/pkuintrin.h: Likewise.
* config/i386/rdseedintrin.h: Likewise.
* config/i386/rtmintrin.h: Likewise.
* config/i386/serializeintrin.h: Likewise.
* config/i386/tbmintrin.h: Likewise.
* config/i386/tsxldtrkintrin.h: Likewise.
* config/i386/waitpkgintrin.h: Likewise.
* config/i386/wbnoinvdintrin.h: Likewise.
* config/i386/xsavecintrin.h: Likewise.
* config/i386/xsaveintrin.h: Likewise.
* config/i386/xsaveoptintrin.h: Likewise.
* config/i386/xsavesintrin.h: Likewise.
* config/i386/xtestintrin.h: Likewise.
* config/i386/immintrin.h: Include <x86gprintrin.h> instead of
<fxsrintrin.h>, <xsaveintrin.h>, <xsaveoptintrin.h>,
<xsavesintrin.h>, <xsavecintrin.h>, <lzcntintrin.h>,
<bmiintrin.h>, <bmi2intrin.h>, <xtestintrin.h>, <cetintrin.h>,
<movdirintrin.h>, <sgxintrin.h, <pconfigintrin.h>,
<waitpkgintrin.h>, <cldemoteintrin.h>, <enqcmdintrin.h>,
<serializeintrin.h>, <tsxldtrkintrin.h>, <adxintrin.h>,
<clwbintrin.h>, <clflushoptintrin.h>, <wbnoinvdintrin.h> and
<pkuintrin.h>.
(_wbinvd): Moved to config/i386/x86gprintrin.h.
(_rdrand16_step): Likewise.
(_rdrand32_step): Likewise.
(_rdpid_u32): Likewise.
(_readfsbase_u32): Likewise.
(_readfsbase_u64): Likewise.
(_readgsbase_u32): Likewise.
(_readgsbase_u64): Likewise.
(_writefsbase_u32): Likewise.
(_writefsbase_u64): Likewise.
(_writegsbase_u32): Likewise.
(_writegsbase_u64): Likewise.
(_rdrand64_step): Likewise.
(_ptwrite64): Likewise.
(_ptwrite32): Likewise.
* config/i386/x86gprintrin.h: New file.
* config/i386/x86intrin.h: Include <x86gprintrin.h>. Don't
include <ia32intrin.h>, <lwpintrin.h>, <tbmintrin.h>,
<popcntintrin.h>, <mwaitxintrin.h> and <clzerointrin.h>.
gcc/testsuite/
* gcc.target/i386/avx-1.c (__builtin_ia32_lwpval32): New to
support <lwpintrin.h> included in <x86gprintrin.h>.
(__builtin_ia32_lwpval64): Likewise.
(__builtin_ia32_lwpins32): Likewise.
(__builtin_ia32_lwpins64): Likewise.
(__builtin_ia32_bextri_u32): New to support <tbmintrin.h>
included in <x86gprintrin.h>.
(__builtin_ia32_bextri_u64): Likewise.
* gcc.target/i386/x86gprintrin-1.c: New test.
* gcc.target/i386/x86gprintrin-2.c: Likewise.
* gcc.target/i386/x86gprintrin-3.c: Likewise.
* gcc.target/i386/x86gprintrin-4.c: Likewise.
* gcc.target/i386/x86gprintrin-4a.c: Likewise.
* gcc.target/i386/x86gprintrin-5.c: Likewise.
* gcc.target/i386/x86gprintrin-5a.c: Likewise.
* gcc.target/i386/x86gprintrin-5b.c: Likewise.
* gcc.target/i386/x86gprintrin-6.c: Likewise.
libgcc/
PR target/97148
* config/i386/shadow-stack-unwind.h: Include <x86gprintrin.h>
instead of <cetintrin.h>.
The nvptx-as assembler verifies the ptx code using ptxas, if there's any
in the PATH.
The default in the nvptx port for -misa=sm_xx is sm_30, but the ptxas of the
latest cuda release (11.1) no longer supports sm_30.
Consequently we cannot build gcc against that release (although we should
still be able to build without any cuda release).
Fix this by setting -misa=sm_35 by default.
Tested check-gcc on nvptx.
Tested libgomp on x86_64-linux with nvpx accelerator.
Both build again cuda 9.1.
gcc/ChangeLog:
2020-10-09 Tom de Vries <tdevries@suse.de>
PR target/97348
* config/nvptx/nvptx.h (ASM_SPEC): Also pass -m to nvptx-as if
default is used.
* config/nvptx/nvptx.opt (misa): Init with PTX_ISA_SM35.
The following adds a effective target to properly allow
the gcc.dg/vect/pr65947-3.c expected vectorization to be adjusted
when run with, say, -march=cascadelake.
2020-10-09 Richard Biener <rguenther@suse.de>
gcc/
* doc/sourcebuild.texi (vect_masked_load): Document.
gcc/testsuite
* lib/target-supports.exp (check_effective_target_vect_masked_load):
New effective target.
* gcc.dg/vect/pr65947-3.c: Update.
We're running into a multiplication with one unvectorizable
operand we expect to build from scalars but SLP discovery
fatally fails the build of both since one stmt is commutated:
_60 = _58 * _59;
_63 = _59 * _62;
_66 = _59 * _65;
...
where _59 is the "bad" operand. The following patch makes the
case work where the first stmt has a good operand by not fatally
failing the SLP build for the operand but communicating upwards
how to commutate.
2020-10-09 Richard Biener <rguenther@suse.de>
PR tree-optimization/97334
* tree-vect-slp.c (vect_build_slp_tree_1): Do not fatally
fail lanes other than zero when BB vectorizing.
* gcc.dg/vect/bb-slp-pr65935.c: Amend.
Just use edge insertion which will appropriately handle the situation
from botan.
2020-10-09 Richard Biener <rguenther@suse.de>
PR tree-optimization/97347
* tree-vect-slp.c (vect_create_constant_vectors): Use
edge insertion when inserting on the fallthru edge,
appropriately insert at the start of BBs when inserting
after PHIs.
* g++.dg/vect/pr97347.cc: New testcase.
gcc/ChangeLog:
PR tree-optimization/97317
* range-op.cc (operator_cast::op1_range): Handle casts where the precision
of the RHS is only 1 greater than the precision of the LHS.
gcc/testsuite/ChangeLog:
* gcc.dg/pr97317.c: New test.
This fixes leaks discovered checking whether I introduced new ones
with the last vectorizer changes.
2020-10-09 Richard Biener <rguenther@suse.de>
* cgraphunit.c (expand_all_functions): Free tp_first_run_order.
* ipa-modref.c (pass_ipa_modref::execute): Free order.
* tree-ssa-loop-niter.c (estimate_numbers_of_iterations): Free
loop body.
* tree-vect-data-refs.c (vect_find_stmt_data_reference): Free
data references upon failure.
* tree-vect-loop.c (update_epilogue_loop_vinfo): Free BBs
array of the original loop.
* tree-vect-slp.c (vect_slp_bbs): Use an auto_vec for
dataref_groups to release its memory.
> Perhaps another way out of this would be document and enforce that
> __builtin_c[lt]z{,l,ll} etc calls are undefined at zero, but C[TL]Z ifn
> calls are defined there based on *_DEFINED_VALUE_AT_ZERO (*) == 2
The following patch implements that, i.e. __builtin_c?z* now take full
advantage of them being UB at zero, while the ifns are well defined at zero
if *_DEFINED_VALUE_AT_ZERO (*) == 2. That is what fixes PR94801.
Furthermore, to fix PR97312, if it is well defined at zero and the value at
zero is prec, we don't lower the maximum unless the argument is known to be
non-zero.
For gimple-range.cc I guess we could improve it if needed e.g. by returning
a [0,7][32,32] range for .CTZ of e.g. [0,137], but for now it (roughly)
matches what vr-values.c does.
2020-10-09 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/94801
PR target/97312
* vr-values.c (vr_values::extract_range_basic) <CASE_CFN_CLZ,
CASE_CFN_CTZ>: When stmt is not an internal-fn call or
C?Z_DEFINED_VALUE_AT_ZERO is not 2, assume argument is not zero
and thus use [0, prec-1] range unless it can be further improved.
For CTZ, don't update maxi from upper bound if it was previously prec.
* gimple-range.cc (gimple_ranger::range_of_builtin_call) <CASE_CFN_CLZ,
CASE_CFN_CTZ>: Likewise.
* gcc.dg/tree-ssa/pr94801.c: New test.
And no testcase was included, I'm including one below.
Anyway, this PR and the other CTZ related discussions led me to discover a
bug I've made earlier, CLZ/CTZ builtins have unsigned arguments and e.g.
both the vr-values.cc and now gimple-range.cc code heavily relies on that,
but __builtin_ffs has a signed operand and this optimization was incorrectly
making the operand signed too, so I guess it would greatly confuse VRP in
some cases.
2020-10-09 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/97325
* match.pd (FFS(nonzero) -> CTZ(nonzero) + 1): Cast argument to
corresponding unsigned type.
* gcc.c-torture/execute/pr97325.c: New test.
This fixes a vector CTOR insertion issue when we try to insert after
a PHI node.
2020-10-09 Richard Biener <rguenther@suse.de>
* tree-vect-slp.c (vect_create_constant_vectors): Properly insert
after PHIs.
* gcc.dg/vect/bb-slp-phis-1.c: New testcase.
This rewrites ranges::construct_at in terms of std::construct_at so
that we can piggyback on the compiler's existing support for
intercepting placement new within std::construct_at during constexpr
evaluation, instead of having to additionally teach the compiler about
ranges::construct_at.
While we're making changes to ranges::construct_at, this patch also
declares it conditionally noexcept and qualifies the calls to declval in
its requires-clause.
libstdc++-v3/ChangeLog:
PR libstdc++/95788
* include/bits/ranges_uninitialized.h:
(__construct_at_fn::operator()): Rewrite in terms of
std::construct_at. Declare it conditionally noexcept. Qualify
calls to declval in its requires-clause.
* testsuite/20_util/specialized_algorithms/construct_at/95788.cc:
New test.
Here we're trying to push into a<T>::c<N> in order to instantiate t<N>, but
were building a TYPENAME_TYPE for it because a<T> isn't open yet. Don't
do that when we know we're trying to enter the scope.
gcc/cp/ChangeLog:
PR c++/96805
PR c++/96199
* pt.c (tsubst_aggr_type): Don't build a TYPENAME_TYPE when
entering_scope.
(tsubst_template_decl): Use tsubst_aggr_type.
gcc/testsuite/ChangeLog:
PR c++/96805
* g++.dg/cpp0x/alias-decl-pr96805.C: New test.
This is a first step towards enabling the sincos optimization in Ada.
The issue this patch solves is that sincos takes the type to be looked
up with mathfn_built_in from variables or temporaries passed as
arguments to SIN and COS intrinsics. In Ada, different float types
may be used but, despite their representation equivalence, their
distinctness causes the optimization to be skipped, because they are
not the types that mathfn_built_in expects.
This patch introduces a function that maps intrinsics to the type
they're associated with, and uses that type, obtained from the
intrinsics used in calls to be optimized, to look up the correspoding
CEXPI intrinsic.
For the sake of defensive programming, when using the type obtained
from the intrinsic, it now checks that, if different types are found
for the used argument, or for other calls that use it, that the types
are interchangeable.
for gcc/ChangeLog
* builtins.c (mathfn_built_in_type): New.
* builtins.h (mathfn_built_in_type): Declare.
* tree-ssa-math-opts.c (execute_cse_sincos_1): Use it to
obtain the type expected by the intrinsic.
Using the tokenizer to sniff for an initial line marker for
preprocessed input is a little brittle, particularly with
-fdirectives-only. If there is no marker we'll happily munch initial
comments. This patch directly sniffs the buffer. This is safe
because the initial line marker was machine generated and must be
right at the beginning of the file. Anything else is not such a line
marker. The same is true for the initial directory marker. For that
tokenizing the string is simplest, but at that point it's either a
regular line marker or a directory marker. If it's a regular marker,
unwinding tokens is fine.
libcpp/
* internal.h (enum include_type): Rename IT_MAIN_INJECT to
IT_PRE_MAIN.
* init.c (cpp_read_main_file): If there is no line marker, adjust
the initial line marker.
(read_original_filename): Return bool, peek the buffer directly
before trying to tokenize.
(read_original_directory): Likewise. Directly prod the string
literal.
* files.c (_cpp_stack_file): Adjust for IT_PRE_MAIN change.
Rename our BU_P10_MISC_2 built-in define macro to be
BU_P10_POWERPC64_MISC_2. This more accurately reflects
that the macro includes the RS6000_BTM_POWERPC64 entry,
and matches the style we used for the P7 equivalent.
gcc/ChangeLog:
* config/rs6000/rs6000-builtin.def (BU_P10_MISC_2): Rename
to BU_P10_POWERPC64_MISC_2.
CFUGED, CNTLZDM, CNTTZDM, PDEPD, PEXTD): Call renamed macro.
When gas outputs DWARF5 .debug_line[_str] then we have to tell it the
comp_dir and main file name for the zero entry line table. Otherwise
gas has to guess at the CU compilation directory and file.
Before a gcc -gdwarf-5 ../src/hello.c line table looked like:
Directory table:
0 ../src (24)
1 ../src (24)
2 /usr/include (31)
File name table:
0 hello.c (16), 0
1 hello.c (16), 1
2 stdio.h (44), 2
With this patch it looks like:
Directory table:
0 /tmp/obj (0)
1 ../src (24)
2 /usr/include (31)
File name table:
0 ../src/hello.c (9), 0
1 hello.c (16), 1
2 stdio.h (44), 2
gcc/ChangeLog:
* dwarf2out.c (dwarf2out_finish): Emit .file 0 entry when
generating DWARF5 .debug_line table through gas.
These three distributions all require 0 < S where S is the sum of the
weights. When the sum is zero there's an undefined FP division by zero.
Add assertions to help users diagnose the problem.
libstdc++-v3/ChangeLog:
PR libstdc++/82584
* include/bits/random.tcc
(discrete_distribution::param_type::_M_initialize)
(piecewise_constant_distribution::param_type::_M_initialize)
(piecewise_linear_distribution::param_type::_M_initialize):
Add assertions for positive sums..
* testsuite/26_numerics/random/pr60037-neg.cc: Adjust dg-error
line.
__arm_vcvtnq_u32_f32 was missing from arm_mve.h, although the s32_f32 and
[su]16_f16 versions were present.
This patch adds the missing version and testcase, which are
cut-and-paste from the other versions.
2020-10-08 Christophe Lyon <christophe.lyon@linaro.org>
gcc/
PR target/96914
* config/arm/arm_mve.h (__arm_vcvtnq_u32_f32): New.
gcc/testsuite/
PR target/96914
* gcc.target/arm/mve/intrinsics/vcvtnq_u32_f32.c: New test.
This work from Martin Liska was motivated by gcc.dg/vect/bb-slp-22.c
which shows how poorly we currently BB vectorize code like
a0 = in[0] + 23;
a1 = in[1] + 142;
a2 = in[2] + 2;
a3 = in[3] + 31;
if (x > y)
{
b[0] = a0;
b[1] = a1;
b[2] = a2;
b[3] = a3;
}
else
{
out[0] = a0 * (x + 1);
out[1] = a1 * (y + 1);
out[2] = a2 * (x + 1);
out[3] = a3 * (y + 1);
}
namely by vectorizing the stores but not the common load (and add)
they are feeded with.
Thus with the following patch we change the BB vectorizer from
operating on a single basic-block at a time to consider somewhat
larger regions (but not the whole function yet because of issues
with vector size iteration).
I took the opportunity to remove the fancy region iterations again
now that we operate on BB granularity and in the end need to visit
PHI nodes as well.
2020-10-08 Martin Liska <mliska@suse.cz>
Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (_bb_vec_info::const_iterator): Remove.
(_bb_vec_info::const_reverse_iterator): Likewise.
(_bb_vec_info::region_stmts): Likewise.
(_bb_vec_info::reverse_region_stmts): Likewise.
(_bb_vec_info::_bb_vec_info): Adjust.
(_bb_vec_info::bb): Remove.
(_bb_vec_info::region_begin): Remove.
(_bb_vec_info::region_end): Remove.
(_bb_vec_info::bbs): New vector of BBs.
(vect_slp_function): Declare.
* tree-vect-patterns.c (vect_determine_precisions): Use
regular stmt iteration.
(vect_pattern_recog): Likewise.
* tree-vect-slp.c: Include cfganal.h, tree-eh.h and tree-cfg.h.
(vect_build_slp_tree_1): Properly refuse to vectorize
volatile and throwing stmts.
(vect_build_slp_tree_2): Pass group-size down to
get_vectype_for_scalar_type.
(_bb_vec_info::_bb_vec_info): Use regular stmt iteration,
adjust for changed region specification.
(_bb_vec_info::~_bb_vec_info): Likewise.
(vect_slp_check_for_constructors): Likewise.
(vect_slp_region): Likewise.
(vect_slp_bbs): New worker operating on a vector of BBs.
(vect_slp_bb): Wrap it.
(vect_slp_function): New function splitting the function
into multi-BB regions.
(vect_create_constant_vectors): Handle the case of inserting
after a throwing def.
(vect_schedule_slp_instance): Adjust.
* tree-vectorizer.c (vec_info::remove_stmt): Simplify again.
(vec_info::insert_seq_on_entry): Adjust.
(pass_slp_vectorize::execute): Also init PHIs. Call
vect_slp_function.
* gcc.dg/vect/bb-slp-22.c: Adjust.
* gfortran.dg/pr68627.f: Likewise.
The new constructors that C++11 added to std::ios_base::failure were
missing for the old ABI. This adds them, but just ignores the
std::error_code argument (because there's nowhere to store it).
This also adds a code() member, which should be provided by the
std::system_error base class, but that base class isn't present in the
old ABI.
This allows the old ios::failure to be used in code that expects the new
API, although with reduced functionality.
libstdc++-v3/ChangeLog:
* include/bits/ios_base.h (ios_base::failure): Add constructors
takeing error_code argument. Add code() member function.
* testsuite/27_io/ios_base/failure/cxx11.cc: Allow test to
run for the old ABI but do not check for derivation from
std::system_error.
* testsuite/27_io/ios_base/failure/error_code.cc: New test.
My previous attempt to fix this only worked when m is a power of two.
There is still a bug when a=00 and !has_single_bit(m).
Instead of trying to make _Mod work for a==0 this change ensures that we
never instantiate it with a==0. For C++17 we can use if-constexpr, but
otherwise we need to use a different multipler. It doesn't matter what
we use, as it won't actually be called, only instantiated.
libstdc++-v3/ChangeLog:
* include/bits/random.h (__detail::_Mod): Revert last change.
(__detail::__mod): Do not use _Mod for a==0 case.
* testsuite/26_numerics/random/linear_congruential_engine/operators/call.cc:
Check other cases with a==0. Also check runtime results.
* testsuite/26_numerics/random/pr60037-neg.cc: Adjust dg-error
line.
This fixes bad placement of sunk loads.
2020-10-08 Richard Biener <rguenther@suse.de>
PR tree-optimization/97330
* tree-ssa-sink.c (statement_sink_location): Avoid skipping
PHIs when they dominate the insert location.
* gcc.dg/torture/pr97330-1.c: New testcase.
* gcc.dg/torture/pr97330-2.c: Likewise.
The arm target hook for divmod wasn't prepared to handle constants passed to
the function.
2020-10-08 Jakub Jelinek <jakub@redhat.com>
PR target/97322
* config/arm/arm.c (arm_expand_divmod_libfunc): Pass mode instead of
GET_MODE (op0) or GET_MODE (op1) to emit_library_call_value.
* gcc.dg/pr97322.c: New test.
gcc/ChangeLog:
PR tree-optimization/97315
* range-op.cc (value_range_with_overflow): Change any
non-overflow calculation in which both bounds are
overflow/underflow to be undefined.
gcc/testsuite/ChangeLog:
* gcc.dg/pr97315-2.c: New test.
gcc/ChangeLog:
PR tree-optimization/97315
* gimple-ssa-evrp.c (hybrid_folder::choose_value): Removes the
trap and instead annotates the listing.
gcc/testsuite/ChangeLog:
* gcc.dg/pr97315-1.c: New test.
The following testcase FAILs, because we don't mark the child OpenMP function
as cfun->calls_alloca when it does call alloca. When optimizing, during DCE we
reset those flags and recompute them again, but with -O0 DCE is not performed.
Fixed by calling notice_special_calls when moving insns to the child function.
cfun->calls_alloca is normally set during gimplification and most of the
alloca calls omp-low.c does go through the gimplifier, but one spot didn't
and built the gcall directly, so that one needs to set calls_alloca too.
2020-10-08 Jakub Jelinek <jakub@redhat.com>
PR sanitizer/97294
* tree-cfg.c (move_block_to_fn): Call notice_special_calls on
call stmts being moved into dest_cfun.
* omp-low.c (lower_rec_input_clauses): Set cfun->calls_alloca when
adding __builtin_alloca_with_align call without gimplification.
* gcc.dg/asan/pr97294.c: New test.
Using this patch, when using GOMP_DEBUG=1 and launching a kernel in
GOMP_OFFLOAD_run (used by the omp implementation), we see the kernel launch
dimensions:
...
GOMP_OFFLOAD_run: kernel main$_omp_fn$0: \
launch [(teams: 1), 1, 1] [(lanes: 32), (threads: 1), 1]
...
Build on x86_64-linux with nvptx accelerator, tested libgomp.
libgomp/ChangeLog:
2020-10-08 Tom de Vries <tdevries@suse.de>
PR libgomp/81802
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_run): Report launch
dimensions.
This patch fixes an "unguarded" call to coerce_template_parms in
build_standard_check: processing_template_decl could be zero if we
get here during processing of the first 'auto' parameter of an
abbreviated function template, or if we're processing the type
constraint of a non-templated variable. In the testcase below, this
leads to an ICE when coerce_template_parms instantiates C's dependent
default template argument.
gcc/cp/ChangeLog:
PR c++/97052
* constraint.cc (build_type_constraint): Temporarily increment
processing_template_decl before calling build_concept_check.
* pt.c (make_constrained_placeholder_type): Likewise.
gcc/testsuite/ChangeLog:
PR c++/97052
* g++.dg/cpp2a/concepts-defarg2.C: New test.
In the testcase below, during processing (at parse time) of Y's base
class X<Y>, convert_template_argument calls is_compatible_template_arg
to check if the template argument Y is no more constrained than the
parameter P. But at this point we haven't yet set Y's constraints, so
get_normalized_constraints_from_decl yields NULL_TREE as the normal form
and caches this result into the normalized_map.
We set Y's constraints later in cp_parser_class_specifier_1 but the
stale normal form in the normalized_map remains. This ultimately causes
us to miss the constraint failure for Y<Z> because according to the
cached normal form, Y is not constrained.
This patch fixes this issue by moving up the call to
associate_classtype_constraints so that we set constraints before we
start processing a class's bases.
gcc/cp/ChangeLog:
PR c++/96229
* parser.c (cp_parser_class_specifier_1): Move call to
associate_classtype_constraints from here to ...
(cp_parser_class_head): ... here.
* pt.c (is_compatible_template_arg): Correct documentation to
say "argument is _no_ more constrained than the parameter".
gcc/testsuite/ChangeLog:
PR c++/96229
* g++.dg/cpp2a/concepts-class2.C: New test.
My recent changes to std::exception_ptr moved some members to be inline
in the header but didn't replace the variable names with reserved names.
The "tmp" variable must be fixed. The "other" parameter is actually a
reserved name because of std::allocator<T>::rebind<U>::other but should
be fixed anyway.
There are also some bad uses of "ForwardIterator" in <ranges>.
There's also a "il" parameter in a std::seed_seq constructor in <random>
which is only reserved since C++14.
libstdc++-v3/ChangeLog:
* include/bits/random.h (seed_seq(initializer_list<T>)): Rename
parameter to use reserved name.
* include/bits/ranges_algo.h (shift_left, shift_right): Rename
template parameters to use reserved name.
* libsupc++/exception_ptr.h (exception_ptr): Likewise for
parameters and local variables.
* testsuite/17_intro/names.cc: Check "il". Do not check "d" and
"y" in C++20 mode.
To quickly recap, P0846 says that a name is also considered to refer to
a template if it is an unqualified-id followed by a < and name lookup
finds either one or more functions or finds nothing.
In a template, when parsing a function call that has type-dependent
arguments, we can't perform ADL right away so we set KOENIG_LOOKUP_P in
the call to remember to do it when instantiating the call
(tsubst_copy_and_build/CALL_EXPR). When the called function is a
function template, we represent the call with a TEMPLATE_ID_EXPR;
usually the operand is an OVERLOAD.
In the P0846 case though, the operand can be an IDENTIFIER_NODE, when
name lookup found nothing when parsing the template name. But we
weren't handling this correctly in tsubst_copy_and_build. First
we need to pass the FUNCTION_P argument from <case TEMPLATE_ID_EXPR> to
<case IDENTIFIER_NODE>, otherwise we give a bogus error. And then in
<case CALL_EXPR> we need to perform ADL. The rest of the changes is to
give better errors when ADL didn't find anything.
gcc/cp/ChangeLog:
PR c++/97010
* pt.c (tsubst_copy_and_build) <case TEMPLATE_ID_EXPR>: Call
tsubst_copy_and_build explicitly instead of using the RECUR macro.
Handle a TEMPLATE_ID_EXPR with an IDENTIFIER_NODE as its operand.
<case CALL_EXPR>: Perform ADL for a TEMPLATE_ID_EXPR with an
IDENTIFIER_NODE as its operand.
gcc/testsuite/ChangeLog:
PR c++/97010
* g++.dg/cpp2a/fn-template21.C: New test.
* g++.dg/cpp2a/fn-template22.C: New test.