This patch implements several C++ specific mapping capabilities introduced for
OpenMP 5.0, including implicit mapping of this[:1] for non-static member
functions, zero-length array section mapping of pointer-typed members,
lambda captured variable access in target regions, and use of lambda objects
inside target regions.
Several adjustments to the C/C++ front-ends to allow more member-access syntax
as valid is also included.
PR middle-end/92120
gcc/cp/ChangeLog:
* cp-tree.h (finish_omp_target): New declaration.
(finish_omp_target_clauses): Likewise.
* parser.c (cp_parser_omp_clause_map): Adjust call to
cp_parser_omp_var_list_no_open to set 'allow_deref' argument to true.
(cp_parser_omp_target): Factor out code, adjust into calls to new
function finish_omp_target.
* pt.c (tsubst_expr): Add call to finish_omp_target_clauses for
OMP_TARGET case.
* semantics.c (handle_omp_array_sections_1): Add handling to create
'this->member' from 'member' FIELD_DECL. Remove case of rejecting
'this' when not in declare simd.
(handle_omp_array_sections): Likewise.
(finish_omp_clauses): Likewise. Adjust to allow 'this[]' in OpenMP
map clauses. Handle 'A->member' case in map clauses. Remove case of
rejecting 'this' when not in declare simd.
(struct omp_target_walk_data): New struct for walking over
target-directive tree body.
(finish_omp_target_clauses_r): New function for tree walk.
(finish_omp_target_clauses): New function.
(finish_omp_target): New function.
gcc/c/ChangeLog:
* c-parser.c (c_parser_omp_clause_map): Set 'allow_deref' argument in
call to c_parser_omp_variable_list to 'true'.
* c-typeck.c (handle_omp_array_sections_1): Add strip of MEM_REF in
array base handling.
(c_finish_omp_clauses): Handle 'A->member' case in map clauses.
gcc/ChangeLog:
* gimplify.c ("tree-hash-traits.h"): Add include.
(gimplify_scan_omp_clauses): Change struct_map_to_clause to type
hash_map<tree_operand, tree> *. Adjust struct map handling to handle
cases of *A and A->B expressions. Under !DECL_P case of
GOMP_CLAUSE_MAP handling, add STRIP_NOPS for indir_p case, add to
struct_deref_set for map(*ptr_to_struct) cases. Add MEM_REF case when
handling component_ref_p case. Add unshare_expr and gimplification
when created GOMP_MAP_STRUCT is not a DECL. Add code to add
firstprivate pointer for *pointer-to-struct case.
(gimplify_adjust_omp_clauses): Move GOMP_MAP_STRUCT removal code for
exit data directives code to earlier position.
* omp-low.c (lower_omp_target):
Handle GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION, and
GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION map kinds.
* tree-pretty-print.c (dump_omp_clause): Likewise.
gcc/testsuite/ChangeLog:
* gcc.dg/gomp/target-3.c: New testcase.
* g++.dg/gomp/target-3.C: New testcase.
* g++.dg/gomp/target-lambda-1.C: New testcase.
* g++.dg/gomp/target-lambda-2.C: New testcase.
* g++.dg/gomp/target-this-1.C: New testcase.
* g++.dg/gomp/target-this-2.C: New testcase.
* g++.dg/gomp/target-this-3.C: New testcase.
* g++.dg/gomp/target-this-4.C: New testcase.
* g++.dg/gomp/target-this-5.C: New testcase.
* g++.dg/gomp/this-2.C: Adjust testcase.
include/ChangeLog:
* gomp-constants.h (enum gomp_map_kind):
Add GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION, and
GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION map kinds.
(GOMP_MAP_POINTER_P):
Include GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION.
libgomp/ChangeLog:
* libgomp.h (gomp_attach_pointer): Add bool parameter.
* oacc-mem.c (acc_attach_async): Update call to gomp_attach_pointer.
(goacc_enter_data_internal): Likewise.
* target.c (gomp_map_vars_existing): Update assert condition to
include GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION.
(gomp_map_pointer): Add 'bool allow_zero_length_array_sections'
parameter, add support for mapping a pointer with NULL target.
(gomp_attach_pointer): Add 'bool allow_zero_length_array_sections'
parameter, add support for attaching a pointer with NULL target.
(gomp_map_vars_internal): Update calls to gomp_map_pointer and
gomp_attach_pointer, add handling for
GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION, and
GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION cases.
* testsuite/libgomp.c++/target-23.C: New testcase.
* testsuite/libgomp.c++/target-lambda-1.C: New testcase.
* testsuite/libgomp.c++/target-lambda-2.C: New testcase.
* testsuite/libgomp.c++/target-this-1.C: New testcase.
* testsuite/libgomp.c++/target-this-2.C: New testcase.
* testsuite/libgomp.c++/target-this-3.C: New testcase.
* testsuite/libgomp.c++/target-this-4.C: New testcase.
* testsuite/libgomp.c++/target-this-5.C: New testcase.
This rewrites _Sp_counted_base::_M_release to skip the two atomic
instructions that decrement each of the use count and the weak count
when both are 1.
Benefits: Save the cost of the last atomic decrements of each of the use
count and the weak count in _Sp_counted_base. Atomic instructions are
significantly slower than regular loads and stores across major
architectures.
How current code works: _M_release() atomically decrements the use
count, checks if it was 1, if so calls _M_dispose(), atomically
decrements the weak count, checks if it was 1, and if so calls
_M_destroy().
How the proposed algorithm works: _M_release() loads both use count and
weak count together atomically (assuming suitable alignment, discussed
later), checks if the value corresponds to a 0x1 value in the individual
count members, and if so calls _M_dispose() and _M_destroy().
Otherwise, it follows the original algorithm.
Why it works: When the current thread executing _M_release() finds each
of the counts is equal to 1, then no other threads could possibly hold
use or weak references to this control block. That is, no other threads
could possibly access the counts or the protected object.
There are two crucial high-level issues that I'd like to point out first:
- Atomicity of access to the counts together
- Proper alignment of the counts together
The patch is intended to apply the proposed algorithm only to the case of
64-bit mode, 4-byte counts, and 8-byte aligned _Sp_counted_base.
** Atomicity **
- The proposed algorithm depends on the mutual atomicity among 8-byte
atomic operations and 4-byte atomic operations on each of the 4-byte halves
of the 8-byte aligned 8-byte block.
- The standard does not guarantee atomicity of 8-byte operations on a pair
of 8-byte aligned 4-byte objects.
- To my knowledge this works in practice on systems that guarantee native
implementation of 4-byte and 8-byte atomic operations.
- __atomic_always_lock_free is used to check for native atomic operations.
** Alignment **
- _Sp_counted_base is an internal base class, with a virtual destructor,
so it has a vptr at the beginning of the class, and will be aligned to
alignof(void*) i.e. 8 bytes.
- The first members of the class are the 4-byte use count and 4-byte
weak count, which will occupy 8 contiguous bytes immediately after the
vptr, i.e. they form an 8-byte aligned 8 byte range.
Other points:
- The proposed algorithm can interact correctly with the current algorithm.
That is, multiple threads using different versions of the code with and
without the patch operating on the same objects should always interact
correctly. The intent for the patch is to be ABI compatible with the
current implementation.
- The proposed patch involves a performance trade-off between saving the
costs of atomic instructions when the counts are both 1 vs adding the cost
of loading the 8-byte combined counts and comparison with {0x1, 0x1}.
- I noticed a big difference between the code generated by GCC vs LLVM. GCC
seems to generate noticeably more code and what seems to be redundant null
checks and branches.
- The patch has been in use (built using LLVM) in a large environment for
many months. The performance gains outweigh the losses (roughly 10 to 1)
across a large variety of workloads.
Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
Co-authored-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:
* include/bits/c++config (_GLIBCXX_TSAN): Define macro
indicating that TSan is in use.
* include/bits/shared_ptr_base.h (_Sp_counted_base::_M_release):
Replace definition in primary template with explicit
specializations for _S_mutex and _S_atomic policies.
(_Sp_counted_base<_S_mutex>::_M_release): New specialization.
(_Sp_counted_base<_S_atomic>::_M_release): New specialization,
using a single atomic load to access both reference counts at
once.
(_Sp_counted_base::_M_release_last_use): New member function.
Add support for architectures such as AMD GCN, in which the pointer size is
larger than the register size. This allows the CFI information to include
multi-register locations for the stack pointer, frame pointer, and return
address.
This patch was originally posted by Andrew Stubbs in
https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552873.html
It has now been re-worked according to the review comments. It does not use
DW_OP_piece or DW_OP_LLVM_piece_end. Instead it uses
DW_OP_bregx/DW_OP_shl/DW_OP_bregx/DW_OP_plus to build the CFA from multiple
consecutive registers. Here is how .debug_frame looks before and after this
patch:
$ cat factorial.c
int factorial(int n) {
if (n == 0) return 1;
return n * factorial (n - 1);
}
$ amdgcn-amdhsa-gcc -g factorial.c -O0 -c -o fac.o
$ llvm-dwarfdump -debug-frame fac.o
*** without this patch (edited for brevity)***
00000000 00000014 ffffffff CIE
DW_CFA_def_cfa: reg48 +0
DW_CFA_register: reg16 reg50
00000018 0000002c 00000000 FDE cie=00000000 pc=00000000...000001ac
DW_CFA_advance_loc4: 96
DW_CFA_offset: reg46 0
DW_CFA_offset: reg47 4
DW_CFA_offset: reg50 8
DW_CFA_offset: reg51 12
DW_CFA_offset: reg16 8
DW_CFA_advance_loc4: 4
DW_CFA_def_cfa_sf: reg46 -16
*** with this patch (edited for brevity)***
00000000 00000024 ffffffff CIE
DW_CFA_def_cfa_expression: DW_OP_bregx SGPR49+0, DW_OP_const1u 0x20, DW_OP_shl, DW_OP_bregx SGPR48+0, DW_OP_plus
DW_CFA_expression: reg16 DW_OP_bregx SGPR51+0, DW_OP_const1u 0x20, DW_OP_shl, DW_OP_bregx SGPR50+0, DW_OP_plus
00000028 0000003c 00000000 FDE cie=00000000 pc=00000000...000001ac
DW_CFA_advance_loc4: 96
DW_CFA_offset: reg46 0
DW_CFA_offset: reg47 4
DW_CFA_offset: reg50 8
DW_CFA_offset: reg51 12
DW_CFA_offset: reg16 8
DW_CFA_advance_loc4: 4
DW_CFA_def_cfa_expression: DW_OP_bregx SGPR47+0, DW_OP_const1u 0x20, DW_OP_shl, DW_OP_bregx SGPR46+0, DW_OP_plus, DW_OP_lit16, DW_OP_minus
gcc/ChangeLog:
* dwarf2cfi.c (dw_stack_pointer_regnum): Change type to struct cfa_reg.
(dw_frame_pointer_regnum): Likewise.
(new_cfi_row): Use set_by_dwreg.
(get_cfa_from_loc_descr): Use set_by_dwreg. Support register spans.
handle DW_OP_bregx with DW_OP_breg{0-31}. Support DW_OP_lit*,
DW_OP_const*, DW_OP_minus, DW_OP_shl and DW_OP_plus.
(lookup_cfa_1): Use set_by_dwreg.
(def_cfa_0): Update for cfa_reg and support register spans.
(reg_save): Change sreg parameter to struct cfa_reg. Support register
spans.
(dwf_cfa_reg): New function.
(dwarf2out_flush_queued_reg_saves): Use dwf_cfa_reg instead of
dwf_regno.
(dwarf2out_frame_debug_def_cfa): Likewise.
(dwarf2out_frame_debug_adjust_cfa): Likewise.
(dwarf2out_frame_debug_cfa_offset): Likewise. Update reg_save usage.
(dwarf2out_frame_debug_cfa_register): Likewise.
(dwarf2out_frame_debug_expr): Likewise.
(create_pseudo_cfg): Use set_by_dwreg.
(initial_return_save): Use set_by_dwreg and dwf_cfa_reg,
(create_cie_data): Use dwf_cfa_reg.
(execute_dwarf2_frame): Use dwf_cfa_reg.
(dump_cfi_row): Use set_by_dwreg.
* dwarf2out.c (build_span_loc, build_breg_loc): New function.
(build_cfa_loc): Support register spans.
(build_cfa_aligned_loc): Update cfa_reg usage.
(convert_cfa_to_fb_loc_list): Use set_by_dwreg.
* dwarf2out.h (struct cfa_reg): New type.
(struct dw_cfa_location): Use struct cfa_reg.
(build_span_loc): New prototype.
co-authored-By: Hafiz Abid Qadeer <abidh@codesourcery.com>
When hardening compares or conditional branches, we perform redundant
tests, and to prevent them from being optimized out, we use asm
statements that preserve a value used in a compare, but in a way that
the compiler can no longer assume it's the same value, so it can't
optimize the redundant test away.
We used to use +g, but that requires general regs or mem. You might
think that, if a reg constraint can't be satisfied, the register
allocator will fall back to memory, but that's not so: we decide on
matching MEMs very early on, by using the same addressable operand on
both input and output, and only if the constraint does not allow
registers. If it does, we use gimple registers and then pseudos as
inputs and outputs, and then inputs can be substituted by equivalent
expressions, and then, if no register contraint fits (e.g. because
that mode won't fit in general regs, or won't fit in regs at all), the
register allocator will give up before even trying to allocate some
temporary memory to unify input and output.
This patch arranges for us to create and use the temporary stack slot
if we can tell the mode requires memory, or won't otherwise fit in
general regs, and thus to use +m for that asm.
for gcc/ChangeLog
PR middle-end/103149
* gimple-harden-conditionals.cc (detach_value): Use memory if
general regs won't do.
for gcc/testsuite/ChangeLog
PR middle-end/103149
* gcc.target/aarch64/pr103149.c: New.
gcc/fortran/ChangeLog:
PR fortran/103607
* frontend-passes.c (do_subscript): Ensure that array bounds are
of type INTEGER before performing checks on array subscripts.
gcc/testsuite/ChangeLog:
PR fortran/103607
* gfortran.dg/pr103607.f90: New test.
This test was failing on i?86 because of:
warning: width of 'A::l' exceeds its type
so change the type to 'long long' and make the test run only on arches
where sizeof(long long) == 8 to avoid failing like this again.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/decltype-bitfield1.C: Change a type to unsigned
long long. Only run on longlong64 targets.
The new rop_ok effective target test doesn't correctly compute its expression
result because a new line starts a new statement. Solution is to remove
the new line.
2021-12-07 Peter Bergner <bergner@linux.ibm.com>
gcc/testsuite/
PR testsuite/103556
PR testsuite/103586
* lib/target-supports.exp (check_effective_target_rop_ok): Remove '\n'.
gcc/fortran/ChangeLog:
PR fortran/103588
* array.c (gfc_ref_dimen_size): Do not generate internal error on
failed simplification of stride expression; just return failure.
gcc/testsuite/ChangeLog:
PR fortran/103588
* gfortran.dg/pr103588.f90: New test.
gcc/fortran/ChangeLog:
PR fortran/103591
* match.c (match_case_selector): Check type of upper bound in case
range.
gcc/testsuite/ChangeLog:
PR fortran/103591
* gfortran.dg/select_9.f90: New test.
PR middle-end/103438
gcc/ChangeLog:
* config/s390/s390.c (s390_valid_target_attribute_inner_p):
Use new enum CLVC_INTEGER.
* opt-functions.awk: Use new CLVC_INTEGER.
* opts-common.c (set_option): Likewise.
(option_enabled): Return -1,0,1 for CLVC_INTEGER.
(get_option_state): Use new CLVC_INTEGER.
(control_warning_option): Likewise.
* opts.h (enum cl_var_type): Likewise.
Here, decltype deduces the wrong type for certain expressions involving
bit-fields. Unlike in C, in C++ bit-field width is explicitly not part
of the type, so I think decltype should never deduce to 'int:N'. The
problem isn't that we're not calling unlowered_expr_type--we are--it's
that is_bitfield_expr_with_lowered_type only handles certain codes, but
not others. For example, += works fine but ++ does not.
This also fixes decltype-bitfield2.C where we were crashing (!), but
unfortunately it does not fix 84516 or 70733 where the problem is likely
a missing call to unlowered_expr_type. It occurs to me now that typeof
likely has had the same issue, but this patch should fix that too.
PR c++/95009
gcc/cp/ChangeLog:
* typeck.c (is_bitfield_expr_with_lowered_type) <case MODIFY_EXPR>:
Handle UNARY_PLUS_EXPR, NEGATE_EXPR, NON_LVALUE_EXPR, BIT_NOT_EXPR,
P*CREMENT_EXPR too.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/decltype-bitfield1.C: New test.
* g++.dg/cpp0x/decltype-bitfield2.C: New test.
may_propagate_copy unnecessarily restricts propagating non-abnormals
into places that currently contain an abnormal SSA name but are
not the PHI argument for an abnormal edge. This causes VN to
not elide a CFG path that it assumes is elided, resulting in
released SSA names in the IL.
The fix is to enhance the may_propagate_copy API to specify the
destination is _not_ a PHI argument. I chose to not update only
the relevant caller in VN and the may_propagate_copy_into_stmt API
at this point because this is a regression and needs backporting.
2021-12-07 Richard Biener <rguenther@suse.de>
PR tree-optimization/103596
* tree-ssa-sccvn.c (eliminate_dom_walker::eliminate_stmt):
Note we are not propagating into a PHI argument to may_propagate_copy.
* tree-ssa-propagate.h (may_propagate_copy): Add
argument specifying whether we propagate into a PHI arg.
* tree-ssa-propagate.c (may_propagate_copy): Likewise.
When not doing so we can replace an abnormal with
something else.
(may_propagate_into_stmt): Update may_propagate_copy calls.
(replace_exp_1): Move propagation checking code to
propagate_value and rename to ...
(replace_exp): ... this and elide previous wrapper.
(propagate_value): Perform checking with adjusted
may_propagate_copy call and dispatch to replace_exp.
* gcc.dg/torture/pr103596.c: New testcase.
The hash_map::traverse overload taking a non-const Value pointer breaks
if the callback returns false. The other overload should behave the
same.
Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
gcc/ChangeLog:
* hash-map.h (hash_map::traverse): Let both overloads behave the
same.
* predict.c (assert_is_empty): Return true, thus not changing
behavior.
Newlib has reverted the commit that caused us to require a
workaround. As such we can now revert the workaround.
This reverts commit 0e510ab534.
libstdc++-v3/ChangeLog:
PR libstdc++/103305
* config/os/newlib/ctype_base.h (upper, lower, alpha, digit, xdigit,
space, print, graph, cntrl, punct, alnum, blank): Revert.
MIPS release 6 requires the lw/ld/sw/sd can work with
unaligned address, while it can be implemented by
full hardware or trap&emulate.
Since it doesn't have to be fully done by hardware, we add a
pair of options -m(no-)unaligned-access. Kernels may need them.
gcc/ChangeLog:
* config/mips/mips.h (ISA_HAS_UNALIGNED_ACCESS, STRICT_ALIGNMENT):
R6 can unaligned access.
* config/mips/mips.md (movmisalign<mode>): Likewise.
* config/mips/mips.opt: add -m(no-)unaligned-access
* doc/invoke.texi: Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/mips/mips.exp: add unaligned-access
* gcc.target/mips/unaligned-2.c: New test.
* gcc.target/mips/unaligned-3.c: New test.
When a basic block A has been annotated with a count and it has only one
successor (or predecessor) B, we can propagate the A's count to B.
The algoritm without this change could leave B without an annotation if B had
other unannotated predecessors (or successors). For example, in the test case I added,
the loop header block was left unannotated, which prevented loop unrolling.
gcc/ChangeLog:
* auto-profile.c (afdo_propagate_edge): Improve count propagation algorithm.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-prof/init-array.c: New test for unrolling inner loops.
Whilst debugging state explosions seen when enabling taint detection
with -fanalyzer (PR analyzer/103533), I noticed that constraint
manager instances could contain stray, redundant constants, such
as this instance:
constraint_manager:
equiv classes:
ec0: {(int)0 == [m_constant]‘0’}
ec1: {(size_t)4 == [m_constant]‘4’}
constraints:
where there are two equivalence classes, each just containing a
constant, with no constraints using them.
This patch makes constraint_manager::canonicalize more aggressive
about purging state, handling the case of purging a redundant
EC containing just a constant.
gcc/analyzer/ChangeLog:
PR analyzer/103533
* constraint-manager.cc (equiv_class::contains_non_constant_p):
New.
(constraint_manager::canonicalize): Call it when determining
redundant ECs.
(selftest::test_purging): New selftest.
(selftest::run_constraint_manager_tests): Likewise.
* constraint-manager.h (equiv_class::contains_non_constant_p):
New decl.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
This patch does a little bit of cleanup by removing some unused
arguments, or marking them as unused. It also removes the function
ctfc_debuginfo_early_finish_p and the corresponding hook macro
definition, which are not used by GCC.
gcc/
* config/bpf/bpf.c (bpf_handle_preserve_access_index_attribute):
Mark arguments `args' and flags' as unused.
(bpf_core_newdecl): Remove unused local `newdecl'.
(bpf_core_newdecl): Remove unused argument `loc'.
(ctfc_debuginfo_early_finish_p): Remove unused function.
(TARGET_CTFC_DEBUGINFO_EARLY_FINISH_P): Remove definition.
(bpf_core_walk): Do not pass a location to bpf_core_newdecl.
When compiling an optabs.ii at -O2 with a release-checking build,
there were 6,643,575 calls to gimple_outgoing_range_stmt_p. 96.8% of
them were for blocks with a single successor, which never have a control
statement that generates new range info. This patch therefore adds a
shortcut for that case.
This gives a ~1% compile-time improvement for the test.
I tried making the function inline (in the header) so that the
single_succ_p didn't need to be repeated, but it seemed to make things
slightly worse.
gcc/
* gimple-range-edge.cc (gimple_outgoing_range::edge_range_p): Add
a shortcut for blocks with single successors.
* gimple-range-gori.cc (gori_map::calculate_gori): Likewise.
When compiling an optabs.ii at -O2 with a release-checking build,
the hottest function in the profile was irange_union. This patch
tries to optimise it a bit. The specific changes are:
- Use quick_push rather than safe_push, since the final number
of entries is known in advance.
- Avoid assigning wi::to_wide & co. to a temporary wide_int,
such as in:
wide_int val_j = wi::to_wide (res[j]);
wi::to_wide returns a wide_int "view" of the in-place INTEGER_CST
storage. Assigning the result to wide_int forces an unnecessary
copy to temporary storage.
This is one area where "auto" helps a lot. In the end though,
it seemed more readable to inline the wi::to_*s rather than
use auto.
- Use to_widest_int rather than to_wide_int. Both are functionally
correct, but to_widest_int is more efficient, for three reasons:
- to_wide returns a wide-int representation in which the most
significant element might not be canonically sign-extended.
This is because we want to allow the storage of an INTEGER_CST
like 0x1U << 31 to be accessed directly with both a wide_int view
(where only 32 bits matter) and a widest_int view (where many more
bits matter, and where the 32 bits are zero-extended to match the
unsigned type). However, operating on uncanonicalised wide_int
forms is less efficient than operating on canonicalised forms.
- to_widest_int has a constant rather than variable precision and
there are never any redundant upper bits to worry about.
- Using widest_int avoids the need for an overflow check, since
there is enough precision to add 1 to any IL constant without
wrap-around.
This gives a ~2% compile-time speed up with the test above.
I also tried adding a path for two single-pair ranges, but it
wasn't a win.
gcc/
* value-range.cc (irange::irange_union): Use quick_push rather
than safe_push. Use widest_int rather than wide_int. Avoid
assigning wi::to_* results to wide*_int temporaries.
Before walking the CFG and filling all cache entries, check if the
same information is available in a dominator.
* gimple-range-cache.cc (ranger_cache::fill_block_cache): Check for
a range from dominators before filling the cache.
(ranger_cache::range_from_dom): New.
* gimple-range-cache.h (ranger_cache::range_from_dom): Add prototype.
There are times we only need to know if any edge from a block can calculate
a range.
* gimple-range-gori.h (class gori_compute):: Add prototypes.
* gimple-range-gori.cc (gori_compute::has_edge_range_p): Add alternate
API for basic block. Call for edge alterantive.
(gori_compute::may_recompute_p): Ditto.
Add
commit 70b043845d
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Tue Nov 30 05:31:26 2021 -0800
libsanitizer: Use SSE to save and restore XMM registers
to LOCAL_PATCHES.
* LOCAL_PATCHES: Add commit 70b043845d.
Use SSE, instead of AVX, to save and restore XMM registers to support
processors without AVX. The affected codes are unused in upstream since
https://github.com/llvm/llvm-project/commit/66d4ce7e26a5
and will be removed in
https://reviews.llvm.org/D112604
This fixed
FAIL: g++.dg/tsan/pthread_cond_clockwait.C -O0 execution test
FAIL: g++.dg/tsan/pthread_cond_clockwait.C -O2 execution test
on machines without AVX.
PR sanitizer/103466
* tsan/tsan_rtl_amd64.S (__tsan_trace_switch_thunk): Replace
vmovdqu with movdqu.
(__tsan_report_race_thunk): Likewise.
The recent fix to PR103527 exposed an issue with how the various
special casing for AVX512 masks in vect_build_gather_load_calls
are handled. The following makes that more obvious, fixing the
miscompile of 403.gcc.
2021-12-06 Richard Biener <rguenther@suse.de>
PR tree-optimization/103581
* tree-vect-stmts.c (vect_build_gather_load_calls): Properly
guard all the AVX512 mask cases.
* gcc.dg/vect/pr103581.c: New testcase.
When SLP reduction chain vectorization support added handling of
an outer conversion in the chain picking a failed reduction up
as SLP reduction that broke the invariant that the whole reduction
was forward reachable. The following plugs that hole noting
a future enhancement possibility.
2021-12-06 Richard Biener <rguenther@suse.de>
PR tree-optimization/103544
* tree-vect-slp.c (vect_analyze_slp): Only add a SLP reduction
opportunity if the stmt in question is the reduction root.
(dot_slp_tree): Add missing check for NULL child.
* gcc.dg/vect/pr103544.c: New testcase.
CSE uses equivalence classes to keep track of expressions that all have the same
values at the current point in the program.
Normal equivalences through SETs only insert and perform lookups in this set but
equivalence determined from comparisons, e.g.
(insn 46 44 47 7 (set (reg:CCZ 17 flags)
(compare:CCZ (reg:SI 105 [ iD.2893 ])
(const_int 0 [0]))) "cse.c":18:22 7 {*cmpsi_ccno_1}
(expr_list:REG_DEAD (reg:SI 105 [ iD.2893 ])
(nil)))
creates the equivalence EQ on (reg:SI 105 [ iD.2893 ]) and (const_int 0 [0]).
This causes a merge to happen between the two equivalence sets denoted by
(const_int 0 [0]) and (reg:SI 105 [ iD.2893 ]) respectively.
The operation happens through merge_equiv_classes however this function has an
invariant that the classes to be merge not contain any duplicates. This is
because it frees entries before merging.
The given testcase when using the supplied flags trigger an ICE due to the
equivalence set being
(rr) p dump_class (class1)
Equivalence chain for (reg:SI 105 [ iD.2893 ]):
(reg:SI 105 [ iD.2893 ])
$3 = void
(rr) p dump_class (class2)
Equivalence chain for (const_int 0 [0]):
(const_int 0 [0])
(reg:SI 97 [ _10 ])
(reg:SI 97 [ _10 ])
$4 = void
This happens because the original INSN being recorded is
(insn 18 17 24 2 (set (subreg:V1SI (reg:SI 97 [ _10 ]) 0)
(const_vector:V1SI [
(const_int 0 [0])
])) "cse.c":11:9 1363 {*movv1si_internal}
(expr_list:REG_UNUSED (reg:SI 97 [ _10 ])
(nil)))
and we end up generating two equivalences. the first one is simply that
reg:SI 97 is 0. The second one is that 0 can be extracted from the V1SI, so
subreg (subreg:V1SI (reg:SI 97) 0) 0 == 0. This nested subreg gets folded away
to just reg:SI 97 and we re-insert the same equivalence.
This patch changes it so that if the nunits of a subreg is 1 then don't generate
a vec_select from the subreg as the subreg will be folded away and we get a dup.
gcc/ChangeLog:
PR rtl-optimization/103404
* cse.c (find_sets_in_insn): Don't select elements out of a V1 mode
subreg.
gcc/testsuite/ChangeLog:
PR rtl-optimization/103404
* gcc.target/i386/pr103404.c: New test.
When moves between integer and sse registers are cheap.
2021-12-06 Hongtao Liu <Hongtao.liu@intel.com>
Uroš Bizjak <ubizjak@gmail.com>
gcc/ChangeLog:
PR target/95740
* config/i386/i386.c (ix86_preferred_reload_class): Allow
integer regs when moves between register units are cheap.
* config/i386/i386.h (INT_SSE_CLASS_P): New.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr95740.c: New test.
This is the original binutils bugzilla report,
https://sourceware.org/bugzilla/show_bug.cgi?id=28509
And this is the first version of the proposed binutils patch,
https://sourceware.org/pipermail/binutils/2021-November/118398.html
After applying the binutils patch, I get the the unexpected error when
building libgcc,
/scratch/nelsonc/riscv-gnu-toolchain/riscv-gcc/libgcc/config/riscv/div.S:42:
/scratch/nelsonc/build-upstream/rv64gc-linux/build-install/riscv64-unknown-linux-gnu/bin/ld: relocation R_RISCV_JAL against `__udivdi3' which may bind externally can not be used when making a shared object; recompile with -fPIC
Therefore, this patch add an extra hidden alias symbol for __udivdi3, and
then use HIDDEN_JUMPTARGET to target a non-preemptible symbol instead.
The solution is similar to glibc as follows,
https://sourceware.org/git/?p=glibc.git;a=commit;h=68389203832ab39dd0dbaabbc4059e7fff51c29b
libgcc/ChangeLog:
* config/riscv/div.S: Add the hidden alias symbol for __udivdi3, and
then use HIDDEN_JUMPTARGET to target it since it is non-preemptible.
* config/riscv/riscv-asm.h: Added new macros HIDDEN_JUMPTARGET and
HIDDEN_DEF.
This moves the GTY declaration of the meta-data indentifier
array into the header that enumerates these and provides
shorthand defines for them. This avoids a problem seen with
a relocatable PCH implementation.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
gcc/objc/ChangeLog:
* objc-next-metadata-tags.h (objc_rt_trees): Declare here.
* objc-next-runtime-abi-01.c: Remove from here.
* objc-next-runtime-abi-02.c: Likewise.
* objc-runtime-shared-support.c: Reorder headers, provide
a GTY declaration the definition of objc_rt_trees.
The new builtin machinery has an early exit, so move the AIX-specific
builtins before the new machinery.
gcc/ChangeLog:
* config/rs6000/rs6000-call.c (rs6000_init_builtins): Move
AIX math builtin initialization before new_builtins_are_live.
Implements moste of OpenMP 5.1 atomic extensions,
except that 'compare' is parsed but rejected during
resolution. (As the trans-openmp.c handling is missing.)
gcc/fortran/ChangeLog:
* dump-parse-tree.c (show_omp_clauses): Handle
weak/compare/fail clause.
* gfortran.h (gfc_omp_clauses): Add weak, compare, fail.
* openmp.c (enum omp_mask1, gfc_match_omp_clauses,
OMP_ATOMIC_CLAUSES): Update for new clauses.
(gfc_match_omp_atomic): Update for 5.1 atomic changes.
(is_conversion): Support widening in one go.
(is_scalar_intrinsic_expr): New.
(resolve_omp_atomic): Update for 5.1 atomic changes.
* parse.c (parse_omp_oacc_atomic): Update for compare.
* resolve.c (gfc_resolve_blocks): Update asserts.
* trans-openmp.c (gfc_trans_omp_atomic): Handle new clauses.
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/atomic-2.f90: Move now supported code to ...
* gfortran.dg/gomp/atomic.f90: here.
* gfortran.dg/gomp/atomic-10.f90: New test.
* gfortran.dg/gomp/atomic-12.f90: New test.
* gfortran.dg/gomp/atomic-15.f90: New test.
* gfortran.dg/gomp/atomic-16.f90: New test.
* gfortran.dg/gomp/atomic-17.f90: New test.
* gfortran.dg/gomp/atomic-18.f90: New test.
* gfortran.dg/gomp/atomic-19.f90: New test.
* gfortran.dg/gomp/atomic-20.f90: New test.
* gfortran.dg/gomp/atomic-22.f90: New test.
* gfortran.dg/gomp/atomic-24.f90: New test.
* gfortran.dg/gomp/atomic-25.f90: New test.
* gfortran.dg/gomp/atomic-26.f90: New test.
libgomp/ChangeLog
* libgomp.texi (OpenMP 5.1): Update status.
This fixes a -Wuninitialized warning for std::cmatch m1, m2; m1=m2;
Also name the template parameters in the forward declaration, to get rid
of the <template-parameter-1-1> noise in diagnostics.
libstdc++-v3/ChangeLog:
PR libstdc++/103549
* include/bits/regex.h (match_results): Give names to template
parameters in first declaration.
(match_results::_M_begin): Add default member-initializer.
P1272R4 has added to the std::byteswap new stuff to me quite unrelated
clarification for std::bit_cast.
The patch treats it as DR, applying to all languages.
We no longer diagnose if padding bits are stored into unsigned char
or std::byte result, fields or bitfields, instead arrange for that result,
those fields or bitfields to get indeterminate value (empty
CONSTRUCTOR with CONSTRUCTOR_NO_ZEROING or just leaving the member's
initializer out and setting CONSTRUCTOR_NO_ZEROING on parent).
We still have a bug that we don't diagnose in lots of places lvalue-to-rvalue
conversions of indeterminate values or class objects with some indeterminate
members.
2021-12-04 Jakub Jelinek <jakub@redhat.com>
* cp-tree.h (is_byte_access_type_not_plain_char): Declare.
* tree.c (is_byte_access_type_not_plain_char): New function.
* constexpr.c (clear_uchar_or_std_byte_in_mask): New function.
(cxx_eval_bit_cast): Don't error about padding bits if target
type is unsigned char or std::byte, instead return no clearing
ctor. Use clear_uchar_or_std_byte_in_mask.
* g++.dg/cpp2a/bit-cast11.C: New test.
* g++.dg/cpp2a/bit-cast12.C: New test.
* g++.dg/cpp2a/bit-cast13.C: New test.
* g++.dg/cpp2a/bit-cast14.C: New test.
The https://gcc.gnu.org/pipermail/gcc-patches/2020-November/557903.html
change broke the following testcases. The problem is when a pragma
namespace allows expansion (i.e. p->is_nspace && p->allow_expansion),
e.g. the omp or acc namespaces do, then when parsing the second pragma
token we do it with pfile->state.in_directive set,
pfile->state.prevent_expansion clear and pfile->state.in_deferred_pragma
clear (the last one because we don't know yet if it will be a deferred
pragma or not). If the pragma line only contains a single name
and newline after it, and there exists a function-like macro with the
same name, the preprocessor needs to peek in funlike_invocation_p
the next token whether it isn't ( but in this case it will see a newline.
As pfile->state.in_directive is set, we don't read anything after the
newline, pfile->buffer->need_line is set and CPP_EOF is lexed, which
funlike_invocation_p doesn't push back. Because name is a function-like
macro and on the pragma line there is no ( after the name, it isn't
expanded, and control flow returns to do_pragma. If name is valid
deferred pragma, we set pfile->state.in_deferred_pragma (and really
need it set so that e.g. end_directive later on doesn't eat all the
tokens from the pragma line).
Before Nathan's change (which unfortunately didn't contain rationale
on why it is better to do it like that), this wasn't a problem,
next _cpp_lex_direct called when we want next token would return
CPP_PRAGMA_EOF when it saw buffer->need_line, which would turn off
pfile->state.in_deferred_pragma and following get token would already
read the next line. But Nathan's patch replaced it with an assertion
failure that now triggers and CPP_PRAGMA_EOL is done only when lexing
the '\n'. Except for this special case that works fine, but in
this case it doesn't because when peeking the token we still didn't know
that it will be a deferred pragma.
I've tried to fix that up in do_pragma by detecting this and pushing
CPP_PRAGMA_EOL as lookahead, but that doesn't work because end_directive
still needs to see pfile->state.in_deferred_pragma set.
So, this patch affectively reverts part of Nathan's change, CPP_PRAGMA_EOL
addition isn't done only when parsing the '\n', but is now done in both
places, in the first one instead of the assertion failure.
2021-12-04 Jakub Jelinek <jakub@redhat.com>
PR preprocessor/102432
* lex.c (_cpp_lex_direct): If buffer->need_line while
pfile->state.in_deferred_pragma, return CPP_PRAGMA_EOL token instead
of assertion failure.
* c-c++-common/gomp/pr102432.c: New test.
* c-c++-common/goacc/pr102432.c: New test.
When -fif-conversion2 is enabled, we attempt to replace conditional
branches around unconditional traps with conditional traps. That
canonicalizes compares, which may change an immediate that barely fits
into one that doesn't.
The compare for the trap is first checked using the predicates of
cbranch predicates, and then, compare and conditional trap insns are
emitted and recognized.
In the failing s390x testcase, i <=u 0xffff_ffff is canonicalized into
i <u 0x1_0000_0000, and the latter immediate doesn't fit. The insn
predicates (both cbranch and cmpdi_ccu) happily accept it, since the
register allocator has no trouble getting them into registers. The
problem is that ifcvt2 runs after reload, so we recognize the compare
insn successfully, but later on we barf when we find that none of the
constraints fit.
This patch arranges for the trap_if-issuing bits in ifcvt to validate
post-reload insns using a stricter test that also checks that operands
fit the constraints.
for gcc/ChangeLog
PR rtl-optimization/103028
* ifcvt.c (find_cond_trap): Validate new insns more strictly
after reload.
for gcc/testsuite/ChangeLog
PR rtl-optimization/103028
* gcc.dg/pr103028.c: New.