Fortran: Fix attributes and bounds in ISO_Fortran_binding.
2021-07-26 José Rui Faustino de Sousa <jrfsousa@gmail.com>
Tobias Burnus <tobias@codesourcery.com>
PR fortran/93308
PR fortran/93963
PR fortran/94327
PR fortran/94331
PR fortran/97046
gcc/fortran/ChangeLog:
* trans-decl.c (convert_CFI_desc): Only copy out the descriptor
if necessary.
* trans-expr.c (gfc_conv_gfc_desc_to_cfi_desc): Update attribute
handling, which reflected a previous intermediate version of the
standard. Only copy out the descriptor if necessary.
libgfortran/ChangeLog:
* runtime/ISO_Fortran_binding.c (cfi_desc_to_gfc_desc): Add code
to verify the descriptor. Correct bounds calculation.
(gfc_desc_to_cfi_desc): Add code to verify the descriptor.
gcc/testsuite/ChangeLog:
* gfortran.dg/ISO_Fortran_binding_1.f90: Add pointer attribute;
this test is still erroneous but now it compiles.
* gfortran.dg/bind_c_array_params_2.f90: Update regex to match
code changes.
* gfortran.dg/PR93308.f90: New test.
* gfortran.dg/PR93963.f90: New test.
* gfortran.dg/PR94327.c: New test.
* gfortran.dg/PR94327.f90: New test.
* gfortran.dg/PR94331.c: New test.
* gfortran.dg/PR94331.f90: New test.
* gfortran.dg/PR97046.f90: New test.
VRP simplifies conditionals involving casted values outside of the main
folding mechanism, because this optimization inhibits the VRP jump
threader from threading through the comparison.
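For illustration, the shape of conditional involved is a compare of a
casted SSA value (a hedged sketch; whether it can be simplified depends
on the computed range of the operand):

  extern void g (void);
  void f (long x)
  {
    if (x >= 0 && x <= 100)   /* range of x is known to fit in int */
      {
        int i = (int) x;
        if (i == 42)          /* candidate: the compare can be rewritten
                                 in terms of x, dropping the cast.  */
          g ();
      }
  }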
As part of replacing VRP with an evrp instance, I am making sure we do
everything VRP does. Hence, I am abstracting this functionality out so
we can call it from elsewhere.
ISTM that when the proposed ranger-based jump threader can handle
everything the forward threader does, there will be no need for this
optimization to be done outside of the evrp folder. Perhaps we can fold
this into the substitute_using_ranges class. But that's further down
the line.
Also, there is no need to pass a vr_values around when the base
range_query class will do. I fixed this, as it makes it trivial to pass
down a ranger or evrp instance.
Tested on x86-64 Linux.
gcc/ChangeLog:
* tree-vrp.c (vrp_simplify_cond_using_ranges): Replace vr_values
with range_query.
(execute_vrp): Abstract out simplification of conditionals...
(simplify_casted_conds): ...here.
I have changed the use of the array_bounds_checker in VRP to use a
ranger in my local tree to make sure there are no regressions when using
either VRP or the ranger. In doing so I noticed that the checker
does not pass context to get_value_range, which causes the ranger to miss a
few cases. This patch fixes the oversight.
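A minimal sketch of the change (names follow gimple-array-bounds.cc,
but the body here is illustrative):

  const value_range *
  array_bounds_checker::get_value_range (const_tree op, gimple *stmt)
  {
    /* Passing the statement gives a ranger the program point it needs
       for a contextual range; a statement-less query cannot do better
       than the global range.  */
    return ranges->get_value_range (op, stmt);
  }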
Tested on x86-64 Linux using the array bounds checker both with VRP and
the ranger.
gcc/ChangeLog:
* gimple-array-bounds.cc (array_bounds_checker::get_value_range):
Add gimple argument.
(array_bounds_checker::check_array_ref): Same.
(array_bounds_checker::check_addr_expr): Same.
(array_bounds_checker::check_array_bounds): Pass statement to
check_array_ref and check_addr_expr.
* gimple-array-bounds.h (check_array_bounds): Add gimple argument.
(check_addr_expr): Same.
(get_value_range): Same.
The previous fix for this problem was wrong due to a subtle difference
between where NEON expects the RMW values and where the intrinsics
expect them.
The insn pattern is modeled after the intrinsics and so needs an expand for
the vectorizer optab to switch the RTL.
However, operand[3] is not expected to be written to, so the current
pattern is bogus.
Instead, I rewrite the RTL to be in canonical ordering and merge them.
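For reference, the canonical shape intended here is roughly as follows
(a hedged sketch; VS and DOT stand in for the real mode iterator and
unspec, and operand 3 is the read-only accumulator of the dot_prod
optab):

  (set (match_operand:VS 0 "register_operand")
       (plus:VS
         (unspec:VS [(match_operand 1) (match_operand 2)] DOT)
         (match_operand:VS 3 "register_operand")))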
gcc/ChangeLog:
* config/aarch64/aarch64-simd-builtins.def (sdot, udot): Rename to...
(sdot_prod, udot_prod): ... This.
* config/aarch64/aarch64-simd.md (aarch64_<sur>dot<vsi2qi>): Merged
into...
(<sur>dot_prod<vsi2qi>): ... this.
(aarch64_<sur>dot_lane<vsi2qi>, aarch64_<sur>dot_laneq<vsi2qi>):
Change operands order.
(<sur>sadv16qi): Use new operands order.
* config/aarch64/arm_neon.h (vdot_u32, vdotq_u32, vdot_s32,
vdotq_s32): Use new RTL ordering.
There's a slight mismatch between the vectorizer optabs and the intrinsics
patterns for NEON. The vectorizer expects operands[3] and operands[0] to be
the same, but the aarch64 intrinsics expanders expect operands[0] and
operands[1] to be the same.
This means we need different patterns here. This adds a separate usdot
vectorizer pattern which just shuffles around the RTL params.
There's also an inconsistency between the usdot and (u|s)dot intrinsics RTL
patterns which is not corrected here.
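For context, these are the semantics in play (signatures as in
arm_neon.h; the per-lane arithmetic in the comment is a sketch, and the
intrinsic requires the i8mm extension):

  #include <arm_neon.h>
  int32x2_t f (int32x2_t acc, uint8x8_t a, int8x8_t b)
  {
    /* For each 32-bit lane i:
       acc[i] += a[4*i]*b[4*i] + ... + a[4*i+3]*b[4*i+3],
       with a treated as unsigned and b as signed.  */
    return vusdot_s32 (acc, a, b);
  }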
gcc/ChangeLog:
* config/aarch64/aarch64-builtins.c (TYPES_TERNOP_SUSS,
aarch64_types_ternop_suss_qualifiers): New.
* config/aarch64/aarch64-simd-builtins.def (usdot_prod): Use it.
* config/aarch64/aarch64-simd.md (usdot_prod<vsi2qi>): Re-organize RTL.
* config/aarch64/arm_neon.h (vusdot_s32, vusdotq_s32): Use it.
This patch adds support for expressing the section and scan directives
using the attribute syntax and additionally fixes some bugs in the attribute
syntax directive handling.
For now it requires that the scan and section directives appear as the only
attribute, not combined with other OpenMP or non-OpenMP attributes on the same
statement.
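As a hedged illustration of the new syntax (modeled on the new tests;
the loop is assumed to sit inside a parallel region, and the comma
after "scan" is the optional one this patch accepts):

  [[omp::directive (for reduction (inscan, +: r))]]
  for (int i = 0; i < n; i++)
    {
      r += a[i];
      [[omp::directive (scan, inclusive (r))]]
      b[i] = r;
    }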
2021-07-26 Jakub Jelinek <jakub@redhat.com>
* parser.h (struct cp_lexer): Add orphan_p member.
* parser.c (cp_parser_statement): Don't change in_omp_attribute_pragma
upon restart from CPP_PRAGMA handling. Fix up condition when a lexer
should be destroyed and adjust saved_tokens if it records tokens from
the to-be-destroyed lexer.
(cp_parser_omp_section_scan): New function.
(cp_parser_omp_scan_loop_body): Use it. If
parser->lexer->in_omp_attribute_pragma, allow optional comma
after scan.
(cp_parser_omp_sections_scope): Use cp_parser_omp_section_scan.
* g++.dg/gomp/attrs-1.C: Use attribute syntax even for section
and scan directives.
* g++.dg/gomp/attrs-2.C: Likewise.
* g++.dg/gomp/attrs-6.C: New test.
* g++.dg/gomp/attrs-7.C: New test.
* g++.dg/gomp/attrs-8.C: New test.
This quashes -Wundef warnings in ansidecl.h when compiled as C or C++.
In C, __cpp_constexpr and __cplusplus aren't defined, so the
preprocessor evaluates them to 0; conversely, __STDC_VERSION__ is not
defined in C++.
This has caused grief when -Wundef is used with -Werror.
I've also tested -traditional-cpp.
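The guard pattern is the usual one (an illustrative sketch, not the
verbatim ansidecl.h text):

  #if defined (__cplusplus) && __cplusplus >= 201103L
  /* C++11 and later: constexpr is available.  */
  #elif !defined (__cplusplus) && defined (__STDC_VERSION__) \
        && __STDC_VERSION__ >= 199901L
  /* C99 and later.  */
  #endif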
include/ChangeLog:
* ansidecl.h: Check if __cplusplus is defined before checking
the value of __cpp_constexpr and __cplusplus. Don't check
__STDC_VERSION__ in C++.
gcc/fortran/ChangeLog:
PR fortran/101536
* check.c (array_check): Adjust check for the case of CLASS
arrays.
gcc/testsuite/ChangeLog:
PR fortran/101536
* gfortran.dg/pr101536.f90: New test.
Our documentation says that paradoxical subregs shouldn't appear
in strict_low_part:
'(strict_low_part (subreg:M (reg:N R) 0))'
This expression code is used in only one context: as the
destination operand of a 'set' expression. In addition, the
operand of this expression must be a non-paradoxical 'subreg'
expression.
but on the testcase below, which triggers UB at runtime,
store_integral_bit_field emits exactly that.
The following patch fixes it by ensuring the requirement is satisfied.
2021-07-23 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/101562
* expmed.c (store_integral_bit_field): Only use movstrict_optab
if the operand isn't paradoxical.
* gcc.c-torture/compile/pr101562.c: New test.
Now that all dependencies of array_bounds_checker take a range_query, we
can sever the relationship with vr_values. Changing this will allow us
to use the array_bounds_checker with VRP, evrp, or the ranger.
Tested on x86-64 Linux.
gcc/ChangeLog:
* gimple-array-bounds.h (class array_bounds_checker): Change
ranges type to range_query.
Use __builtin_memcpy to copy vector structures instead of building
a new opaque structure one vector at a time in each of the vst1[q]_x2
Neon intrinsics in arm_neon.h. This simplifies the header file and
also improves code generation - superfluous move instructions were
emitted for every register extraction/set in this additional
structure.
Add new code generation tests to verify that superfluous move
instructions are not generated for the vst1q_x2 intrinsics.
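The change follows this pattern (a simplified sketch, not the verbatim
arm_neon.h text):

  __builtin_aarch64_simd_oi __o;
  /* Before: built one vector at a time, e.g.
       __o = __builtin_aarch64_set_qregoiv2di (__o, __val.val[0], 0);
       __o = __builtin_aarch64_set_qregoiv2di (__o, __val.val[1], 1);
     After: one copy of the whole structure.  */
  __builtin_memcpy (&__o, &__val, sizeof (__val));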
gcc/ChangeLog:
2021-07-23 Jonathan Wright <jonathan.wright@arm.com>
* config/aarch64/arm_neon.h (vst1_s64_x2): Use
__builtin_memcpy instead of constructing
__builtin_aarch64_simd_oi one vector at a time.
(vst1_u64_x2): Likewise.
(vst1_f64_x2): Likewise.
(vst1_s8_x2): Likewise.
(vst1_p8_x2): Likewise.
(vst1_s16_x2): Likewise.
(vst1_p16_x2): Likewise.
(vst1_s32_x2): Likewise.
(vst1_u8_x2): Likewise.
(vst1_u16_x2): Likewise.
(vst1_u32_x2): Likewise.
(vst1_f16_x2): Likewise.
(vst1_f32_x2): Likewise.
(vst1_p64_x2): Likewise.
(vst1q_s8_x2): Likewise.
(vst1q_p8_x2): Likewise.
(vst1q_s16_x2): Likewise.
(vst1q_p16_x2): Likewise.
(vst1q_s32_x2): Likewise.
(vst1q_s64_x2): Likewise.
(vst1q_u8_x2): Likewise.
(vst1q_u16_x2): Likewise.
(vst1q_u32_x2): Likewise.
(vst1q_u64_x2): Likewise.
(vst1q_f16_x2): Likewise.
(vst1q_f32_x2): Likewise.
(vst1q_f64_x2): Likewise.
(vst1q_p64_x2): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/vector_structure_intrinsics.c: Add new
tests.
Use __builtin_memcpy to copy vector structures instead of building
a new opaque structure one vector at a time in each of the vst1[q]_x3
Neon intrinsics in arm_neon.h. This simplifies the header file and
also improves code generation - superfluous move instructions were
emitted for every register extraction/set in this additional
structure.
Add new code generation tests to verify that superfluous move
instructions are not generated for the vst1q_x3 intrinsics.
gcc/ChangeLog:
2021-07-23 Jonathan Wright <jonathan.wright@arm.com>
* config/aarch64/arm_neon.h (vst1_s64_x3): Use
__builtin_memcpy instead of constructing
__builtin_aarch64_simd_ci one vector at a time.
(vst1_u64_x3): Likewise.
(vst1_f64_x3): Likewise.
(vst1_s8_x3): Likewise.
(vst1_p8_x3): Likewise.
(vst1_s16_x3): Likewise.
(vst1_p16_x3): Likewise.
(vst1_s32_x3): Likewise.
(vst1_u8_x3): Likewise.
(vst1_u16_x3): Likewise.
(vst1_u32_x3): Likewise.
(vst1_f16_x3): Likewise.
(vst1_f32_x3): Likewise.
(vst1_p64_x3): Likewise.
(vst1q_s8_x3): Likewise.
(vst1q_p8_x3): Likewise.
(vst1q_s16_x3): Likewise.
(vst1q_p16_x3): Likewise.
(vst1q_s32_x3): Likewise.
(vst1q_s64_x3): Likewise.
(vst1q_u8_x3): Likewise.
(vst1q_u16_x3): Likewise.
(vst1q_u32_x3): Likewise.
(vst1q_u64_x3): Likewise.
(vst1q_f16_x3): Likewise.
(vst1q_f32_x3): Likewise.
(vst1q_f64_x3): Likewise.
(vst1q_p64_x3): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/vector_structure_intrinsics.c: Add new
tests.
Don't return a hard register in ix86_gen_scratch_sse_rtx when LRA is in
progress, to avoid an ICE when there are no available hard registers
for LRA.
gcc/
PR target/101504
* config/i386/i386.c (ix86_gen_scratch_sse_rtx): Don't return
hard register when LRA is in progress.
gcc/testsuite/
PR target/101504
* gcc.target/i386/pr101504.c: New test.
The <future> header only needs std::atomic_flag, so can include
<bits/atomic_base.h> instead of the whole of <atomic>.
libstdc++-v3/ChangeLog:
* include/std/future: Include <bits/atomic_base.h> instead of
<atomic>.
Use __builtin_memcpy to copy vector structures instead of using a
union in each of the vst1[q]_x4 Neon intrinsics in arm_neon.h.
Add new code generation tests to verify that superfluous move
instructions are not generated for the vst1q_x4 intrinsics.
gcc/ChangeLog:
2021-07-21 Jonathan Wright <jonathan.wright@arm.com>
* config/aarch64/arm_neon.h (vst1_s8_x4): Use
__builtin_memcpy instead of using a union.
(vst1q_s8_x4): Likewise.
(vst1_s16_x4): Likewise.
(vst1q_s16_x4): Likewise.
(vst1_s32_x4): Likewise.
(vst1q_s32_x4): Likewise.
(vst1_u8_x4): Likewise.
(vst1q_u8_x4): Likewise.
(vst1_u16_x4): Likewise.
(vst1q_u16_x4): Likewise.
(vst1_u32_x4): Likewise.
(vst1q_u32_x4): Likewise.
(vst1_f16_x4): Likewise.
(vst1q_f16_x4): Likewise.
(vst1_f32_x4): Likewise.
(vst1q_f32_x4): Likewise.
(vst1_p8_x4): Likewise.
(vst1q_p8_x4): Likewise.
(vst1_p16_x4): Likewise.
(vst1q_p16_x4): Likewise.
(vst1_s64_x4): Likewise.
(vst1_u64_x4): Likewise.
(vst1_p64_x4): Likewise.
(vst1q_s64_x4): Likewise.
(vst1q_u64_x4): Likewise.
(vst1q_p64_x4): Likewise.
(vst1_f64_x4): Likewise.
(vst1q_f64_x4): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/vector_structure_intrinsics.c: Add new
tests.
Use __builtin_memcpy to copy vector structures instead of building
a new opaque structure one vector at a time in each of the vst2[q]
Neon intrinsics in arm_neon.h. This simplifies the header file and
also improves code generation - superfluous move instructions were
emitted for every register extraction/set in this additional
structure.
Add new code generation tests to verify that superfluous move
instructions are no longer generated for the vst2q intrinsics.
gcc/ChangeLog:
2021-07-21 Jonathan Wright <jonathan.wright@arm.com>
* config/aarch64/arm_neon.h (vst2_s64): Use __builtin_memcpy
instead of constructing __builtin_aarch64_simd_oi one vector
at a time.
(vst2_u64): Likewise.
(vst2_f64): Likewise.
(vst2_s8): Likewise.
(vst2_p8): Likewise.
(vst2_s16): Likewise.
(vst2_p16): Likewise.
(vst2_s32): Likewise.
(vst2_u8): Likewise.
(vst2_u16): Likewise.
(vst2_u32): Likewise.
(vst2_f16): Likewise.
(vst2_f32): Likewise.
(vst2_p64): Likewise.
(vst2q_s8): Likewise.
(vst2q_p8): Likewise.
(vst2q_s16): Likewise.
(vst2q_p16): Likewise.
(vst2q_s32): Likewise.
(vst2q_s64): Likewise.
(vst2q_u8): Likewise.
(vst2q_u16): Likewise.
(vst2q_u32): Likewise.
(vst2q_u64): Likewise.
(vst2q_f16): Likewise.
(vst2q_f32): Likewise.
(vst2q_f64): Likewise.
(vst2q_p64): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/vector_structure_intrinsics.c: Add new
tests.
Use __builtin_memcpy to copy vector structures instead of building
a new opaque structure one vector at a time in each of the vst3[q]
Neon intrinsics in arm_neon.h. This simplifies the header file and
also improves code generation - superfluous move instructions were
emitted for every register extraction/set in this additional
structure.
Add new code generation tests to verify that superfluous move
instructions are no longer generated for the vst3q intrinsics.
gcc/ChangeLog:
2021-07-21 Jonathan Wright <jonathan.wright@arm.com>
* config/aarch64/arm_neon.h (vst3_s64): Use __builtin_memcpy
instead of constructing __builtin_aarch64_simd_ci one vector
at a time.
(vst3_u64): Likewise.
(vst3_f64): Likewise.
(vst3_s8): Likewise.
(vst3_p8): Likewise.
(vst3_s16): Likewise.
(vst3_p16): Likewise.
(vst3_s32): Likewise.
(vst3_u8): Likewise.
(vst3_u16): Likewise.
(vst3_u32): Likewise.
(vst3_f16): Likewise.
(vst3_f32): Likewise.
(vst3_p64): Likewise.
(vst3q_s8): Likewise.
(vst3q_p8): Likewise.
(vst3q_s16): Likewise.
(vst3q_p16): Likewise.
(vst3q_s32): Likewise.
(vst3q_s64): Likewise.
(vst3q_u8): Likewise.
(vst3q_u16): Likewise.
(vst3q_u32): Likewise.
(vst3q_u64): Likewise.
(vst3q_f16): Likewise.
(vst3q_f32): Likewise.
(vst3q_f64): Likewise.
(vst3q_p64): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/vector_structure_intrinsics.c: Add new
tests.
Use __builtin_memcpy to copy vector structures instead of building
a new opaque structure one vector at a time in each of the vst4[q]
Neon intrinsics in arm_neon.h. This simplifies the header file and
also improves code generation - superfluous move instructions were
emitted for every register extraction/set in this additional
structure.
Add new code generation tests to verify that superfluous move
instructions are no longer generated for the vst4q intrinsics.
gcc/ChangeLog:
2021-07-20 Jonathan Wright <jonathan.wright@arm.com>
* config/aarch64/arm_neon.h (vst4_s64): Use __builtin_memcpy
instead of constructing __builtin_aarch64_simd_xi one vector
at a time.
(vst4_u64): Likewise.
(vst4_f64): Likewise.
(vst4_s8): Likewise.
(vst4_p8): Likewise.
(vst4_s16): Likewise.
(vst4_p16): Likewise.
(vst4_s32): Likewise.
(vst4_u8): Likewise.
(vst4_u16): Likewise.
(vst4_u32): Likewise.
(vst4_f16): Likewise.
(vst4_f32): Likewise.
(vst4_p64): Likewise.
(vst4q_s8): Likewise.
(vst4q_p8): Likewise.
(vst4q_s16): Likewise.
(vst4q_p16): Likewise.
(vst4q_s32): Likewise.
(vst4q_s64): Likewise.
(vst4q_u8): Likewise.
(vst4q_u16): Likewise.
(vst4q_u32): Likewise.
(vst4q_u64): Likewise.
(vst4q_f16): Likewise.
(vst4q_f32): Likewise.
(vst4q_f64): Likewise.
(vst4q_p64): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/vector_structure_intrinsics.c: Add new
tests.
Use __builtin_memcpy to copy vector structures instead of building
a new opaque structure one vector at a time in each of the vtbx4
Neon intrinsics in arm_neon.h. This simplifies the header file and
also improves code generation - superfluous move instructions were
emitted for every register extraction/set in this additional
structure.
gcc/ChangeLog:
2021-07-19 Jonathan Wright <jonathan.wright@arm.com>
* config/aarch64/arm_neon.h (vtbx4_s8): Use __builtin_memcpy
instead of constructing __builtin_aarch64_simd_oi one vector
at a time.
(vtbx4_u8): Likewise.
(vtbx4_p8): Likewise.
Use __builtin_memcpy to copy vector structures instead of building
a new opaque structure one vector at a time in each of the vtbl[34]
Neon intrinsics in arm_neon.h. This simplifies the header file and
also improves code generation - superfluous move instructions were
emitted for every register extraction/set in this additional
structure.
gcc/ChangeLog:
2021-07-08 Jonathan Wright <jonathan.wright@arm.com>
* config/aarch64/arm_neon.h (vtbl3_s8): Use __builtin_memcpy
instead of constructing __builtin_aarch64_simd_oi one vector
at a time.
(vtbl3_u8): Likewise.
(vtbl3_p8): Likewise.
(vtbl4_s8): Likewise.
(vtbl4_u8): Likewise.
(vtbl4_p8): Likewise.
Use __builtin_memcpy to copy vector structures instead of building
a new opaque structure one vector at a time in each of the vqtbx[234]
Neon intrinsics in arm_neon.h. This simplifies the header file and
also improves code generation - superfluous move instructions were
emitted for every register extraction/set in this additional
structure.
Add new code generation tests to verify that superfluous move
instructions are no longer generated for the vqtbx[234] intrinsics.
gcc/ChangeLog:
2021-07-08 Jonathan Wright <jonathan.wright@arm.com>
* config/aarch64/arm_neon.h (vqtbx2_s8): Use __builtin_memcpy
instead of constructing __builtin_aarch64_simd_oi one vector
at a time.
(vqtbx2_u8): Likewise.
(vqtbx2_p8): Likewise.
(vqtbx2q_s8): Likewise.
(vqtbx2q_u8): Likewise.
(vqtbx2q_p8): Likewise.
(vqtbx3_s8): Use __builtin_memcpy instead of constructing
__builtin_aarch64_simd_ci one vector at a time.
(vqtbx3_u8): Likewise.
(vqtbx3_p8): Likewise.
(vqtbx3q_s8): Likewise.
(vqtbx3q_u8): Likewise.
(vqtbx3q_p8): Likewise.
(vqtbx4_s8): Use __builtin_memcpy instead of constructing
__builtin_aarch64_simd_xi one vector at a time.
(vqtbx4_u8): Likewise.
(vqtbx4_p8): Likewise.
(vqtbx4q_s8): Likewise.
(vqtbx4q_u8): Likewise.
(vqtbx4q_p8): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/vector_structure_intrinsics.c: New tests.
Use __builtin_memcpy to copy vector structures instead of building
a new opaque structure one vector at a time in each of the vqtbl[234]
Neon intrinsics in arm_neon.h. This simplifies the header file and
also improves code generation - superfluous move instructions were
emitted for every register extraction/set in this additional
structure.
Add new code generation tests to verify that superfluous move
instructions are no longer generated for the vqtbl[234] intrinsics.
gcc/ChangeLog:
2021-07-08 Jonathan Wright <jonathan.wright@arm.com>
* config/aarch64/arm_neon.h (vqtbl2_s8): Use __builtin_memcpy
instead of constructing __builtin_aarch64_simd_oi one vector
at a time.
(vqtbl2_u8): Likewise.
(vqtbl2_p8): Likewise.
(vqtbl2q_s8): Likewise.
(vqtbl2q_u8): Likewise.
(vqtbl2q_p8): Likewise.
(vqtbl3_s8): Use __builtin_memcpy instead of constructing
__builtin_aarch64_simd_ci one vector at a time.
(vqtbl3_u8): Likewise.
(vqtbl3_p8): Likewise.
(vqtbl3q_s8): Likewise.
(vqtbl3q_u8): Likewise.
(vqtbl3q_p8): Likewise.
(vqtbl4_s8): Use __builtin_memcpy instead of constructing
__builtin_aarch64_simd_xi one vector at a time.
(vqtbl4_u8): Likewise.
(vqtbl4_p8): Likewise.
(vqtbl4q_s8): Likewise.
(vqtbl4q_u8): Likewise.
(vqtbl4q_p8): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/vector_structure_intrinsics.c: New test.
The comments in <bits/stl_relops.h> describe problems that were solved
years ago (for GCC 3.1). The comparison operators in <iterator> are no
longer ambiguous with the rel_ops ones, so the linked mailing list
thread and FAQ entry aren't relevant now. The reference to std_utility.h
is also outdated as it's just called utility now, both in the source
tree and when installed.
The use of rel_ops is still frowned upon though, so replace the
discussion of ambiguities within libstdc++ headers with an admonition
about using rel_ops in user code.
Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:
* include/bits/stl_relops.h: Update documentation comments.
The C++ FE now supports these attributes, though not by registering
them in the attributes tables (they work quite differently from other
attributes), so this teaches c_common_has_attribute about them.
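With this, code can detect the support in the usual way:

  #if __has_cpp_attribute (omp::directive)
  /* [[omp::directive (...)]] is understood here.  */
  #endif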
2021-07-23 Jakub Jelinek <jakub@redhat.com>
* c-lex.c (c_common_has_attribute): Call canonicalize_attr_name also
on attr_id. Return 1 for omp::directive or omp::sequence in C++11
and later.
* c-c++-common/gomp/attrs-1.c: New test.
* c-c++-common/gomp/attrs-2.c: New test.
* c-c++-common/gomp/attrs-3.c: New test.
The OpenMP 5.1 spec says that the attribute and pragma syntax directives
should not be mixed on the same statement. The following patch adds a
diagnostic for that,
[[omp::directive (...)]]
#pragma omp ...
is always an error and for the other order
#pragma omp ...
[[omp::directive (...)]]
it depends on whether the pragma directive is an OpenMP construct
(then it is an error because it needs a structured block or loop
or statement as body) or e.g. a standalone directive (then it is fine).
Only block scope is handled for now though; namespace scope and class
scope still need even the basic support to be implemented.
2021-07-23 Jakub Jelinek <jakub@redhat.com>
gcc/c-family/
* c-pragma.h (enum pragma_kind): Add PRAGMA_OMP__START_ and
PRAGMA_OMP__LAST_ enumerators.
gcc/cp/
* parser.h (struct cp_parser): Add omp_attrs_forbidden_p member.
* parser.c (cp_parser_handle_statement_omp_attributes): Diagnose
mixing of attribute and pragma syntax directives when seeing
omp::directive if parser->omp_attrs_forbidden_p or if attribute syntax
directives are followed by OpenMP pragma.
(cp_parser_statement): Clear parser->omp_attrs_forbidden_p after
the cp_parser_handle_statement_omp_attributes call.
(cp_parser_omp_structured_block): Add disallow_omp_attrs argument,
if true, set parser->omp_attrs_forbidden_p.
(cp_parser_omp_scan_loop_body, cp_parser_omp_sections_scope): Pass
false as disallow_omp_attrs to cp_parser_omp_structured_block.
(cp_parser_omp_parallel, cp_parser_omp_task): Set
parser->omp_attrs_forbidden_p.
gcc/testsuite/
* g++.dg/gomp/attrs-4.C: New test.
* g++.dg/gomp/attrs-5.C: New test.
CFI_allocate and CFI_select_part were incorrectly treating
CFI_type_signed_char as a Fortran character type for the purpose of
deciding whether or not to use the elem_len argument. It is a Fortran
integer type per table 18.2 in the 2018 Fortran standard.
Other functions in ISO_Fortran_binding.c appeared to handle this case
correctly already.
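A sketch of the corrected logic (illustrative; size_from_type is a
hypothetical helper, not the exact libgfortran code):

  /* Only a true Fortran character type takes its element size from
     the elem_len argument; CFI_type_signed_char is an integer kind
     per F2018 table 18.2.  */
  if (type == CFI_type_char)
    size = elem_len;                /* character: caller-specified  */
  else
    size = size_from_type (type);   /* hypothetical helper  */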
2021-07-15 Sandra Loosemore <sandra@codesourcery.com>
libgfortran/
* runtime/ISO_Fortran_binding.c (CFI_allocate): Don't use elem_len
for CFI_type_signed_char.
(CFI_select_part): Likewise.
When I added the new mixin to _Hashtable, I forgot to explicitly
construct it in each non-default constructor. That means you can't
use any constructors unless the hash function, equality function,
and allocator are all default constructible.
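For example, this sort of code failed to compile before the fix (a
hedged reconstruction of the PR):

  #include <cstddef>
  #include <unordered_set>

  struct Hash
  {
    Hash (int) { }                                    // no default ctor
    std::size_t operator() (int x) const { return x; }
  };

  std::unordered_set<int, Hash> s (10, Hash (1));     // previously rejected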
libstdc++-v3/ChangeLog:
PR libstdc++/101583
* include/bits/hashtable.h (_Hashtable): Replace mixin with
_Enable_default_ctor. Construct it explicitly in all
non-forwarding, non-defaulted constructors.
* testsuite/23_containers/unordered_map/cons/default.cc: Check
non-default constructors can be used.
* testsuite/23_containers/unordered_set/cons/default.cc:
Likewise.
The problem here is we try to create an initial value from a scalar
constant. For vectors we need to do a vec_dup instead. This fixes
that issue by using build_{one,zero}_cst instead of
integer_{one,zero}_node when calling create_tailcall_accumulator.
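An illustrative trigger (hedged; the committed testcases are the
pr10153-*.c files listed below):

  typedef int v4si __attribute__ ((vector_size (16)));

  v4si f (v4si a, int n)
  {
    /* Tail recursion with a '+' accumulator: the initial accumulator
       must be a zero vector, not the scalar integer_zero_node.  */
    return n ? a + f (a, n - 1) : a;
  }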
Changes from v1:
* v2: Use build_{one,zero}_cst and get the correct type before.
OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions.
gcc/ChangeLog:
PR tree-optimization/10153
* tree-tailcall.c (create_tailcall_accumulator):
Don't call fold_convert as the type should be correct already.
(tree_optimize_tail_calls_1): Use build_{one,zero}_cst instead
of integer_{one,zero}_node for the call of create_tailcall_accumulator.
gcc/testsuite/ChangeLog:
PR tree-optimization/10153
* gcc.c-torture/compile/pr10153-1.c: New test.
* gcc.c-torture/compile/pr10153-2.c: New test.
Fix non_null_ref::adjust_range so it always adjusts ranges, not just
varying ranges. This will allow pointers that have a range, but are not
necessarily non-null, to be adjusted.
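A sketch of the new behaviour (illustrative, not the exact
gimple-range-cache.cc code):

  /* Before: only a varying_p () range was replaced by non-null.
     After: any known range is intersected with the non-null range.  */
  if (!r.undefined_p ())
    r.intersect (nonnull);   /* 'nonnull' standing for ~[0, 0]  */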
gcc/ChangeLog:
* gimple-range-cache.cc (non_null_ref::adjust_range): Replace the
varying_p check with a null/non-null check.
AIX math.h provides C++ overloaded inlined math functions, which should
not be present for G++. The definitions have been guarded by
__COMPATMATH__, but that macro had other uses in IBM xlC++. A new
macro has been introduced with the sole purpose of guarding the functions.
This patch updates libstdc++ os_defines.h to define the additional macro.
The earlier macro definition is retained to guard the functions in the
math.h header of earlier AIX releases.
libstdc++-v3/ChangeLog:
* config/os/aix/os_defines.h (__LIBC_NO_CPP_MATH_OVERLOADS__): Define.
Clang provides __builtin_operator_new and __builtin_operator_delete,
which have the same semantics as ::operator new and ::operator delete
except that the compiler is allowed to elide calls to them. This changes
std::allocator to use those built-in functions so that memory allocated
by std::allocator can be optimized away when using Clang. This avoids an
abstraction penalty for using std::allocator to allocate storage rather
than a new-expression.
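The selection boils down to this shape (a hedged sketch of the macro
logic; the real header also handles sized deallocation through
_GLIBCXX_SIZED_DEALLOC):

  #if __has_builtin (__builtin_operator_new) >= 201802L
  # define _GLIBCXX_OPERATOR_NEW    __builtin_operator_new
  # define _GLIBCXX_OPERATOR_DELETE __builtin_operator_delete
  #else
  # define _GLIBCXX_OPERATOR_NEW    ::operator new
  # define _GLIBCXX_OPERATOR_DELETE ::operator delete
  #endif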
Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:
PR libstdc++/94295
* include/ext/new_allocator.h (_GLIBCXX_OPERATOR_NEW)
(_GLIBCXX_OPERATOR_DELETE, _GLIBCXX_SIZED_DEALLOC): Define.
(allocator::allocate, allocator::deallocate): Use new macros.
Make the ranges::uninitialized_xxx algorithms use std::addressof to
protect against iterator types that overload operator&.
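The hazard being guarded against (hedged illustration):

  struct Evil
  {
    void operator& () const = delete;   // hostile overload
  };
  /* '&obj' is ill-formed for such a type; std::addressof (obj)
     still yields a real Evil*.  */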
Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:
PR libstdc++/101571
* include/bits/ranges_uninitialized.h (_DestroyGuard): Change
constructor parameter to reference and use addressof.
* testsuite/util/testsuite_iterators.h: Define deleted operator&
overloads for test iterators.
The std::function::swap member swaps each data member unconditionally,
resulting in -Wmaybe-uninitialized warnings for a default constructed
object. This happens because the _M_invoker and _M_functor members are
only initialized if the function has a target.
This change ensures that all subobjects are zero-initialized on
construction.
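A sketch of the shape of the fix (illustrative, not the verbatim
std_function.h text):

  _Function_base () = default;      // now defaulted
  _Any_data     _M_functor{};       // default member initializers
  _Manager_type _M_manager{};       // ensure zero-initialization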
Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:
* include/bits/std_function.h (_Function_base): Add
default member initializers and define constructor as defaulted.
(function::_M_invoker): Add default member initializer.
As the PR points out, we removed the debug version of std::array without
any period of deprecation. Although std::array contains all the actual
debug checks now, removing the <debug/array> header breaks any code
that was using that explicitly. The manual still lists doing that as
supported.
This restores the <debug/array> header, but simply defines
__gnu_debug::array as an alias for std::array, and declares the alias
with the deprecated attribute. The docs are updated to match.
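The replacement is tiny (a hedged sketch of the new header's content):

  namespace __gnu_debug
  {
    template<typename _Tp, std::size_t _Nm>
      using array [[deprecated ("use std::array instead")]]
        = std::array<_Tp, _Nm>;
  }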
Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:
PR libstdc++/100682
* doc/xml/manual/debug_mode.xml: Update documentation about
debug capability of std::array.
* doc/html/*: Regenerate.
* include/debug/array: New file.
Don't trap if equivalences are processed out of DOM order and aren't
completely symmetrical. We will eventually resolve this, but it's OK
for now.
gcc/
PR tree-optimization/101511
* value-relation.cc (relation_oracle::query_relation): Check if ssa1
is in ssa2's equiv set, and don't trap if so.
gcc/testsuite/
* g++.dg/pr101511.C: New.
Eventually all functionality will be subsumed. Until then, call it only
if needed.
gcc/
PR tree-optimization/101496
* vr-values.c (simplify_using_ranges::fold_cond): Call range_of_stmt
first, then vrp_visit_cond_stmt.
gcc/testsuite
* gcc.dg/pr101496.c: New.
By optimizing vector moves into broadcasts in ix86_expand_vector_move
during pass_expand, pass_reload/LRA can automatically generate an
avx512 embedded broadcast, so pass_cpb is not needed.
Even in the absence of avx512f, broadcasting from memory is still
slightly faster than loading the entire constant from memory, so
broadcast is always enabled.
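An illustrative case (hedged; the codegen described is the expected
outcome, not a verified dump):

  typedef float v8sf __attribute__ ((vector_size (32)));

  v8sf add1 (v8sf a)
  {
    /* With avx512vl, the constant can become an embedded broadcast,
       e.g. vaddps ...{1to8}, instead of a 32-byte constant-pool load;
       without avx512f a plain vbroadcastss is still emitted.  */
    return a + (v8sf) { 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f };
  }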
benchmark:
https://gitlab.com/x86-benchmarks/microbenchmark/-/tree/vaddps/broadcast
The performance diff:
strategy : cycles
memory : 1046611188
memory : 1255420817
memory : 1044720793
memory : 1253414145
average : 1097868397
broadcast : 1044430688
broadcast : 1044477630
broadcast : 1253554603
broadcast : 1044561934
average : 1096756213
However, broadcast has larger code size.
The size diff:
size broadcast.o
text data bss dec hex filename
137 0 0 137 89 broadcast.o
size memory.o
text data bss dec hex filename
115 0 0 115 73 memory.o
gcc/ChangeLog:
* config/i386/i386-expand.c
(ix86_broadcast_from_integer_constant): Rename to ...
(ix86_broadcast_from_constant): ... this, and extend it to
handle float mode.
(ix86_expand_vector_move): Extend to float mode.
* config/i386/i386-features.c
(replace_constant_pool_with_broadcast): Remove.
(remove_partial_avx_dependency_gate): Ditto.
(constant_pool_broadcast): Ditto.
(class pass_constant_pool_broadcast): Ditto.
(make_pass_constant_pool_broadcast): Ditto.
(remove_partial_avx_dependency): Adjust gate.
* config/i386/i386-passes.def: Remove pass_constant_pool_broadcast.
* config/i386/i386-protos.h
(make_pass_constant_pool_broadcast): Remove.
gcc/testsuite/ChangeLog:
* gcc.target/i386/fuse-caller-save-xmm.c: Adjust testcase.