OpenE2K/gcc - gcc - Expired Mentality Git

Author	SHA1	Message	Date
Richard Biener	a4066d3a50	tree-optimization/104676 - free nb_iterations after loop distribution Loop distribution can release SSA names used in nb_iterations, make sure to release those. 2022-02-24 Richard Biener <rguenther@suse.de> PR tree-optimization/104676 * tree-loop-distribution.cc (loop_distribution::execute): Do a full scev_reset. * gcc.dg/torture/pr104676.c: New testcase.	2022-02-24 15:57:55 +01:00
Jakub Jelinek	9251b457eb	sccvn: Fix visit_reference_op_call value numbering of vdefs [PR104601] The following testcase is miscompiled, because -fipa-pure-const discovers that bar is const, but when sccvn during fre3 sees # .MEM_140 = VDEF <.MEM_96> __pred$__d_43 = _50 (_49); where _50 value numbers to &bar, it value numbers .MEM_140 to vuse_ssa_val (gimple_vuse (stmt)). For const/pure calls that return a SSA_NAME (or don't have lhs) that is fine, those calls don't store anything, but if the lhs is present and not an SSA_NAME, value numbering the vdef to anything but itself means that e.g. walk_non_aliased_vuses won't consider the call, but the call acts as a store to its lhs. When it is ignored, sccvn will return whatever has been stored to the lhs earlier. I've bootstrapped/regtested an earlier version of this patch, which did the if (!lhs && gimple_call_lhs (stmt)) changed |= set_ssa_val_to (vdef, vdef); part before else if (vnresult->result_vdef), and that regressed +FAIL: gcc.dg/pr51879-16.c scan-tree-dump-times pre "foo \\\\(" 1 +FAIL: gcc.dg/pr51879-16.c scan-tree-dump-times pre "foo2 \\\\(" 1 so this updated patch uses result_vdef there as before and only otherwise (which I think must be the const/pure case) decides based on whether the lhs is non-SSA_NAME. 2022-02-24 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/104601 * tree-ssa-sccvn.cc (visit_reference_op_call): For calls with non-SSA_NAME lhs value number vdef to itself instead of e.g. the vuse value number. * g++.dg/torture/pr104601.C: New test.	2022-02-24 15:29:02 +01:00
Tom de Vries	59b8ade887	[libgomp, testsuite, nvptx] Add libgomp.c/declare-variant-3-sm.c Add openmp test-cases that test the omp declare variant construct: ... #pragma omp declare variant (f30) match (device={isa("sm_30")}) ... using the available nvptx isas. Only the one for sm_30 is a dg-do run test-case, the other ones are dg-do link. Tested on x86_64 with nvptx accelerator. libgomp/ChangeLog: 2022-02-24 Tom de Vries <tdevries@suse.de> * testsuite/libgomp.c/declare-variant-3-sm30.c: New test. * testsuite/libgomp.c/declare-variant-3-sm35.c: New test. * testsuite/libgomp.c/declare-variant-3-sm53.c: New test. * testsuite/libgomp.c/declare-variant-3-sm70.c: New test. * testsuite/libgomp.c/declare-variant-3-sm75.c: New test. * testsuite/libgomp.c/declare-variant-3-sm80.c: New test. * testsuite/libgomp.c/declare-variant-3.h: New header file.	2022-02-24 11:41:03 +01:00
Tom de Vries	a046033ea0	[nvptx] Add missing t-omp-device isas In t-omp-device we list isas that can be used in omp declare variant like so: ... #pragma omp declare variant (f30) match (device={isa("sm_30")}) ... and in nvptx_omp_device_kind_arch_isa we handle them. Update both to reflect the current list of isas. Tested on x86_64-linux with nvptx accelerator. gcc/ChangeLog: 2022-02-23 Tom de Vries <tdevries@suse.de> * config/nvptx/nvptx.cc (nvptx_omp_device_kind_arch_isa): Handle sm_70, sm_75 and sm_80. * config/nvptx/t-omp-device: Add sm_53, sm_70, sm_75 and sm_80. Co-Authored-By: Tobias Burnus <tobias@codesourcery.com>	2022-02-24 09:19:01 +01:00
Tom de Vries	c982d02ffe	[nvptx] Add shf.{l,r}.wrap insn Ptx contains funnel shift operations shf.l.wrap and shf.r.wrap that can be used to implement 32-bit left or right rotate. Add define_insns rotlsi3 and rotrsi3. Tested on nvptx. gcc/ChangeLog: 2022-02-23 Tom de Vries <tdevries@suse.de> * config/nvptx/nvptx.md (define_insn "rotlsi3", define_insn "rotrsi3"): New define_insn. gcc/testsuite/ChangeLog: 2022-02-23 Tom de Vries <tdevries@suse.de> * gcc.target/nvptx/rotate-run.c: New test. * gcc.target/nvptx/rotate.c: New test.	2022-02-24 09:18:47 +01:00
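The relationship between a funnel shift and a 32-bit rotate described in the entry above can be sketched in plain C. This is only an illustrative model, not the nvptx back-end code: the helper names `funnel_shl_wrap`, `rotl32` and `rotr32` are hypothetical, and the model assumes that the wrap variant masks the shift count to the range 0..31 and returns the upper 32 bits of the shifted 64-bit pair.

```c
#include <stdint.h>

/* Hypothetical model of a left funnel shift in "wrap" mode: concatenate
   hi:lo into a 64-bit value, shift left by (n mod 32) and keep the upper
   32 bits.  The exact PTX semantics are an assumption of this sketch.  */
static inline uint32_t funnel_shl_wrap(uint32_t lo, uint32_t hi, unsigned n)
{
    uint64_t v = ((uint64_t)hi << 32) | lo;
    return (uint32_t)((v << (n & 31u)) >> 32);
}

/* Portable rotate-left; masking both counts avoids shifting by 32,
   which would be undefined behaviour in C.  */
static inline uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31u;
    return (x << n) | (x >> ((32u - n) & 31u));
}

/* Portable rotate-right, written the same way.  */
static inline uint32_t rotr32(uint32_t x, unsigned n)
{
    n &= 31u;
    return (x >> n) | (x << ((32u - n) & 31u));
}
```

Feeding the same value into both inputs of the funnel shift yields a rotate, which is presumably why a single instruction suffices for the rotlsi3 pattern; the right rotate is the mirror image.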
Tom de Vries	7862f6ccd8	[nvptx] Fix dummy location in gen_comment I committed "[nvptx] Add -mptx-comment", but tested it in combination with the proposed "[final] Handle compiler-generated asm insn" ( https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590721.html ), so by itself the commit introduced some regressions: ... FAIL: gcc.dg/20020426-2.c (internal compiler error: Segmentation fault) FAIL: gcc.dg/analyzer/zlib-3.c (internal compiler error: Segmentation fault) FAIL: gcc.dg/pr101223.c (internal compiler error: Segmentation fault) FAIL: gcc.dg/torture/pr80764.c -O2 (internal compiler error: Segmentation fault) ... These are due to cfun->function_start_locus == 0. Fix these by using DECL_SOURCE_LOCATION (cfun->decl) instead. Tested on nvptx. gcc/ChangeLog: 2022-02-23 Tom de Vries <tdevries@suse.de> * config/nvptx/nvptx.cc (gen_comment): Use DECL_SOURCE_LOCATION (cfun->decl) instead of cfun->function_start_locus.	2022-02-24 09:17:27 +01:00
liuhongt	ffb2c67170	Fix typo in <code>v1ti3. For evex encoding vp{xor,or,and}, suffix is needed. Or there would be an error for vpxor %xmm0, %xmm31, %xmm1 Error: unsupported instruction `vpxor' gcc/ChangeLog: * config/i386/sse.md (<code>v1ti3): Add suffix and replace isa attr of alternative 2 from avx to avx512vl. gcc/testsuite/ChangeLog: * gcc.target/i386/avx512vl-logicsuffix-1.c: New test.	2022-02-24 09:05:10 +08:00
GCC Administrator	4bf3bac151	Daily bump.	2022-02-24 00:16:22 +00:00
David Malcolm	aee1adf2cd	analyzer: handle __attribute__((const)) [PR104434] When testing -fanalyzer on openblas-0.3, I noticed slightly over 2000 false positives from -Wanalyzer-malloc-leak on code like this: if( LAPACKE_lsame( vect, 'b' ) || LAPACKE_lsame( vect, 'p' ) ) { pt_t = (lapack_complex_float*) LAPACKE_malloc( sizeof(lapack_complex_float) * ldpt_t * MAX(1,n) ); [...snip...] } [...snip lots of code...] if( LAPACKE_lsame( vect, 'b' ) || LAPACKE_lsame( vect, 'q' ) ) { LAPACKE_free( pt_t ); } where LAPACKE_lsame is a char-comparison function implemented in a different TU. The analyzer naively considers the execution path where: LAPACKE_lsame( vect, 'b' ) || LAPACKE_lsame( vect, 'p' ) is true at the malloc guard, but then false at the free guard, which is thus a memory leak. This patch makes -fanalyzer respect __attribute__((const)), so that the analyzer treats such functions as returning the same value when given the same inputs. I've filed https://github.com/xianyi/OpenBLAS/issues/3543 suggesting that LAPACKE_lsame be annotated with __attribute__((const)); with that, and with this patch, the false positives seem to be fixed. gcc/analyzer/ChangeLog: PR analyzer/104434 * analyzer.h (class const_fn_result_svalue): New decl. * region-model-impl-calls.cc (call_details::get_manager): New. * region-model-manager.cc (region_model_manager::get_or_create_const_fn_result_svalue): New. (region_model_manager::log_stats): Log m_const_fn_result_values_map. * region-model.cc (const_fn_p): New. (maybe_get_const_fn_result): New. (region_model::on_call_pre): Handle fndecls with __attribute__((const)) by calling the above rather than making a conjured_svalue. * region-model.h (visitor::visit_const_fn_result_svalue): New. (region_model_manager::get_or_create_const_fn_result_svalue): New decl. (region_model_manager::const_fn_result_values_map_t): New typedef. (region_model_manager::m_const_fn_result_values_map): New field. (call_details::get_manager): New decl.
* svalue.cc (svalue::cmp_ptr): Handle SK_CONST_FN_RESULT. (const_fn_result_svalue::dump_to_pp): New. (const_fn_result_svalue::dump_input): New. (const_fn_result_svalue::accept): New. * svalue.h (enum svalue_kind): Add SK_CONST_FN_RESULT. (svalue::dyn_cast_const_fn_result_svalue): New. (class const_fn_result_svalue): New. (is_a_helper <const const_fn_result_svalue *>::test): New. (template <> struct default_hash_traits<const_fn_result_svalue::key_t>): New. gcc/testsuite/ChangeLog: PR analyzer/104434 * gcc.dg/analyzer/attr-const-1.c: New test. * gcc.dg/analyzer/attr-const-2.c: New test. * gcc.dg/analyzer/attr-const-3.c: New test. * gcc.dg/analyzer/pr104434-const.c: New test. * gcc.dg/analyzer/pr104434-nonconst.c: New test. * gcc.dg/analyzer/pr104434.h: New test. Signed-off-by: David Malcolm <dmalcolm@redhat.com>	2022-02-23 18:51:26 -05:00
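The guarded allocate/free pattern from the entry above can be reduced to a small C sketch. Everything here is hypothetical and only loosely modelled on the OpenBLAS snippet: `same_char_ci` stands in for LAPACKE_lsame, `process` is an invented caller, and both guards deliberately use the same condition. In the reported scenario the comparison function lives in another translation unit, so the attribute is the only thing telling a checker that repeated calls with the same arguments agree; it is defined inline here purely to keep the example self-contained.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a char-comparison helper.  The const
   attribute promises that the result depends only on the arguments
   and that the function reads no global memory.  */
__attribute__((const))
int same_char_ci(char a, char b)
{
    /* ASCII-only case folding keeps the function free of locale state.  */
    char la = (a >= 'A' && a <= 'Z') ? (char)(a - 'A' + 'a') : a;
    char lb = (b >= 'A' && b <= 'Z') ? (char)(b - 'A' + 'a') : b;
    return la == lb;
}

/* Invented caller: allocation and release sit behind identical guards,
   so the buffer leaks only if the two guard evaluations could disagree.  */
int process(char vect, size_t n)
{
    float *buf = NULL;

    if (same_char_ci(vect, 'b') || same_char_ci(vect, 'p')) {
        buf = malloc((n ? n : 1) * sizeof *buf);
        if (buf == NULL)
            return -1;
        buf[0] = 0.0f;
    }

    /* ... unrelated work elided ... */

    if (same_char_ci(vect, 'b') || same_char_ci(vect, 'p'))
        free(buf);

    return 0;
}
```

Without the attribute, a path-sensitive checker has to consider the guard being true at the allocation and false at the release, which is the false positive the commit message describes.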
Marek Polacek	cdcea7c1ef	c++: Add new test [PR79493] A nice side effect of r12-1822 was improving the diagnostic we emit for the following test. PR c++/79493 gcc/testsuite/ChangeLog: * g++.dg/diagnostic/undeclared1.C: New test.	2022-02-23 12:47:24 -05:00
Marek Polacek	9675ecf7f9	c++: Add fixed test [PR70077] Fixed with r10-1280. PR c++/70077 gcc/testsuite/ChangeLog: * g++.dg/cpp0x/noexcept76.C: New test.	2022-02-23 12:38:02 -05:00
Richard Biener	fdc46830f1	middle-end/104644 - recursion with bswap match.pd pattern The following patch avoids infinite recursion during generic folding. The (cmp (bswap @0) INTEGER_CST@1) simplification relies on (bswap @1) actually being simplified, if it is not simplified, we just move the bswap from one operand to the other and if @0 is also INTEGER_CST, we apply the same rule next. The reason why bswap @1 isn't folded to INTEGER_CST is that the INTEGER_CST has TREE_OVERFLOW set on it and fold-const-call.cc predicate punts in such cases: static inline bool integer_cst_p (tree t) { return TREE_CODE (t) == INTEGER_CST && !TREE_OVERFLOW (t); } The patch uses ! modifier to ensure the bswap is simplified and extends support to GENERIC by means of requiring !EXPR_P which is not perfect but a conservative approximation. 2022-02-22 Richard Biener <rguenther@suse.de> PR tree-optimization/104644 * doc/match-and-simplify.texi: Amend ! documentation. * genmatch.cc (expr::gen_transform): Code-generate ! support for GENERIC. (parser::parse_expr): Allow ! for GENERIC. * match.pd (cmp (bswap @0) INTEGER_CST@1): Use ! modifier on bswap. * gcc.dg/pr104644.c: New test. Co-Authored-by: Jakub Jelinek <jakub@redhat.com>	2022-02-23 13:51:43 +01:00
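The rewrite that the bswap pattern in the entry above performs can be illustrated in C for the equality case. This is a sketch of the arithmetic identity only, not of match.pd or genmatch; the function names are hypothetical, and `__builtin_bswap32` is the GCC built-in.

```c
#include <stdint.h>

/* Original form: byte-swap the variable, then compare with a constant.  */
int cmp_swapped(uint32_t x)
{
    return __builtin_bswap32(x) == 0x11223344u;
}

/* Folded form: byte-swap the constant once, then compare x directly.
   0x44332211 is 0x11223344 with its four bytes reversed.  */
int cmp_folded(uint32_t x)
{
    return x == 0x44332211u;
}
```

The folded form is only a win when the swapped constant really does fold to a constant. If it stays as a bswap expression, the rule has merely moved the bswap to the other operand and can fire again, which is the recursion the commit guards against with the `!` modifier.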
Richard Biener	f4ed267fa5	Support SSA name declarations with pointer type Currently we fail to parse int * _3; as SSA name and instead get a VAR_DECL because of the way the C frontends declarator specs work. That causes havoc if those supposed SSA names are used in PHIs or in other places where VAR_DECLs are not allowed. The following fixes the pointer case in an ad-hoc way - for more complex type declarators we probably have to find a way to re-use the C frontend grokdeclarator without actually creating a VAR_DECL there (or maybe make it create an SSA name). Pointers appear too often to be neglected though, thus the following ad-hoc fix for this. This also adds verification that we do not end up with SSA names without definitions as can happen when reducing a GIMPLE testcase. Instead of working through segfaults one-by-one we emit errors for all of those at once now. 2022-02-23 Richard Biener <rguenther@suse.de> gcc/c * gimple-parser.cc (c_parser_parse_gimple_body): Diagnose SSA names without definition. (c_parser_gimple_declaration): Handle pointer typed SSA names. gcc/testsuite/ * gcc.dg/gimplefe-49.c: New testcase. * gcc.dg/gimplefe-error-13.c: Likewise.	2022-02-23 12:15:30 +01:00
Richard Biener	6e80c4f1ad	tree-optimization/101636 - CTOR vectorization ICE The following fixes an ICE when vectorizing the defs of a CTOR results in a different vector type than expected. That can happen with AARCH64 SVE and a fixed vector length as noted in r10-5979 and on x86 with AVX512 mask CTORs and trying to re-vectorize using SSE as shown in this bug. The fix is simply to reject the vectorization when it didn't produce the desired type. 2022-02-23 Richard Biener <rguenther@suse.de> PR tree-optimization/101636 * tree-vect-slp.cc (vect_print_slp_tree): Dump the vector type of the node. (vect_slp_analyze_operations): Make sure the CTOR is vectorized with an expected type. (vectorize_slp_instance_root_stmt): Revert r10-5979 fix. * gcc.target/i386/pr101636.c: New testcase. * c-c++-common/torture/pr101636.c: Likewise.	2022-02-23 12:14:14 +01:00
Jakub Jelinek	c8cb5098c7	warn-recursion: Don't warn for __builtin_calls in gnu_inline extern inline functions [PR104633] The first two testcases show different ways how e.g. the glibc _FORTIFY_SOURCE wrappers are implemented, and on Winfinite-recursion-3.c the new -Winfinite-recursion warning emits a false positive warning. It is a false positive because when a builtin with 2 names is called through the __builtin_ name (but not all builtins have a name prefixed exactly like that) from extern inline function with gnu_inline semantics, it doesn't mean the compiler will ever attempt to use the user inline wrapper for the call, the __builtin_ just does what the builtin function is expected to do and either expands into some compiler generated code, or if the compiler decides to emit a call it will use an actual definition of the function, but that is not the extern inline gnu_inline function which is never emitted out of line. Compared to that, in Winfinite-recursion-5.c the extern inline gnu_inline wrapper calls the builtin by the same name as the function's name and in that case it is infinite recursion, we actually try to inline the recursive call and also error because the recursion is infinite during inlining; without always_inline we wouldn't error but it is still infinite recursion, the user has no control on how many recursive calls we actually inline. 2022-02-22 Jakub Jelinek <jakub@redhat.com> PR c/104633 * gimple-warn-recursion.cc (pass_warn_recursion::find_function_exit): Don't warn about calls to corresponding builtin from extern inline gnu_inline wrappers. * gcc.dg/Winfinite-recursion-3.c: New test. * gcc.dg/Winfinite-recursion-4.c: New test. * gcc.dg/Winfinite-recursion-5.c: New test.	2022-02-23 12:03:55 +01:00
Roger Sayle	0677014871	nvptx: Back-end portion of a fix for PR target/104489. This one line fix/tweak is the back-end specific change for a fix for PR target/104489, that allows the ISA for GCC's nvptx backend to be bumped to sm_53. The machine-independent middle-end pieces were posted here: https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590139.html 2022-02-23 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog PR target/104489 * config/nvptx/nvptx.md (*movhf_insn): Add subregs_ok attribute.	2022-02-23 07:24:50 +00:00
Christophe Lyon	fd0ab7c734	arm: Fix typo in auto-vectorized MVE comparisons I made a last minute renaming of mve_const_bool_vec_to_hi () into mve_bool_vec_to_const () and forgot to update the call sites in vfp.md accordingly. Committed as obvious. 2022-02-23 Christophe Lyon <christophe.lyon@arm.com> gcc/ PR target/100757 PR target/101325 * config/arm/vfp.md (thumb2_movhi_vfp, thumb2_movhi_fp16): Fix typo.	2022-02-23 06:44:12 +00:00
Cui,Lili	2f0c93326f	x86: Update Intel architectures ISA support in documentation. Since the ISAs listed for Intel architectures in the documentation are inconsistent with what is actually supported, update them all. gcc/Changelog: * doc/invoke.texi: Update documents for Intel architectures.	2022-02-23 10:24:21 +08:00
GCC Administrator	2cfb33fc1e	Daily bump.	2022-02-23 00:16:24 +00:00
Ian Lance Taylor	3d54f1ffaf	libgo: update README.gcc Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/387514	2022-02-22 15:34:46 -08:00
Paul A. Clarke	96ee5ce5f8	rs6000: Move g++.dg/ext powerpc tests to g++.target Also adjust DejaGnu directives, as specifically requiring "powerpc*-*-*" is no longer required. 2021-02-22 Paul A. Clarke <pc@us.ibm.com> gcc/testsuite * g++.dg/ext/altivec-1.C: Move to g++.target/powerpc, adjust dg directives. * g++.dg/ext/altivec-2.C: Likewise. * g++.dg/ext/altivec-3.C: Likewise. * g++.dg/ext/altivec-4.C: Likewise. * g++.dg/ext/altivec-5.C: Likewise. * g++.dg/ext/altivec-6.C: Likewise. * g++.dg/ext/altivec-7.C: Likewise. * g++.dg/ext/altivec-8.C: Likewise. * g++.dg/ext/altivec-9.C: Likewise. * g++.dg/ext/altivec-10.C: Likewise. * g++.dg/ext/altivec-11.C: Likewise. * g++.dg/ext/altivec-12.C: Likewise. * g++.dg/ext/altivec-13.C: Likewise. * g++.dg/ext/altivec-14.C: Likewise. * g++.dg/ext/altivec-15.C: Likewise. * g++.dg/ext/altivec-16.C: Likewise. * g++.dg/ext/altivec-17.C: Likewise. * g++.dg/ext/altivec-18.C: Likewise. * g++.dg/ext/altivec-cell-1.C: Likewise. * g++.dg/ext/altivec-cell-2.C: Likewise. * g++.dg/ext/altivec-cell-3.C: Likewise. * g++.dg/ext/altivec-cell-4.C: Likewise. * g++.dg/ext/altivec-cell-5.C: Likewise. * g++.dg/ext/altivec-types-1.C: Likewise. * g++.dg/ext/altivec-types-2.C: Likewise. * g++.dg/ext/altivec-types-3.C: Likewise. * g++.dg/ext/altivec-types-4.C: Likewise. * g++.dg/ext/undef-bool-1.C: Likewise.	2022-02-22 17:26:15 -06:00
Harald Anlauf	bc66b471d1	Fortran: skip compile-time shape check if constructor shape is not known gcc/fortran/ChangeLog: PR fortran/104619 * resolve.cc (resolve_structure_cons): Skip shape check if shape of constructor cannot be determined at compile time. gcc/testsuite/ChangeLog: PR fortran/104619 * gfortran.dg/derived_constructor_comps_7.f90: New test.	2022-02-22 21:34:58 +01:00
Roger Sayle	9d1796d82d	Restore bootstrap on x86_64-pc-linux-gnu This patch resolves the bootstrap failure on x86_64-pc-linux-gnu. 2022-02-22 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog * config/i386/i386-expand.cc (ix86_expand_cmpxchg_loop): Restore bootstrap.	2022-02-22 18:17:24 +00:00
Thomas Schwinge	54f7450232	Get rid of 'gcc/omp-oacc-neuter-broadcast.cc:oacc_build_component_ref' Clean-up for commit `e2a58ed6dc` "openacc: Middle-end worker-partitioning support": as of commit `2a3f9f6532` "openacc: Shared memory layout optimisation", we're no longer running into the vectorizer ICEs for '!ADDR_SPACE_GENERIC_P'. gcc/ * omp-low.cc (omp_build_component_ref): Move function... * omp-general.cc (omp_build_component_ref): ... here. Remove 'static'. * omp-general.h (omp_build_component_ref): Declare function. * omp-oacc-neuter-broadcast.cc (oacc_build_component_ref): Remove function. (build_receiver_ref, build_sender_ref): Call 'omp_build_component_ref' instead.	2022-02-22 17:53:10 +01:00
Thomas Schwinge	0fe9176f41	Further simplify 'gcc/omp-oacc-neuter-broadcast.cc:record_field_map_t' Now that I've resolved GCC 'hash_map' issues (a while ago already), we may further simplify this after commit `049eda8274` "Avoid 'GTY' use for 'gcc/omp-oacc-neuter-broadcast.cc:field_map'": as 'hash_map' Value, directly store 'field_map_t' objects, not pointers to manually allocated 'field_map_t' objects. gcc/ * omp-oacc-neuter-broadcast.cc (record_field_map_t): Further simplify. Adjust all users.	2022-02-22 17:43:39 +01:00
Thomas Schwinge	f8187b5c0d	Fix OpenACC gang-redundant execution in 'libgomp.oacc-fortran/privatized-ref-2.f90' This was a latent problem, and this commit here now resolves a regression that after recent commit `a78b1ab1df` "amdgcn: Tune default OpenMP/OpenACC GPU utilization" we had (only) seen on a GCN offloading '-march=gfx908' system: {+WARNING: program timed out.+} [-PASS:-]{+FAIL:+} libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa -O0 execution test Same for other optimization levels. Make sure that we're not executing non-parallelized code in gang-redundant mode, by putting these parts into their own 'parallel' constructs, which then default to 'num_gangs(1)'. libgomp/ * testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Fix OpenACC gang-redundant execution.	2022-02-22 17:32:03 +01:00
Segher Boessenkool	537c965880	rs6000: Fix GC on rs6000.c decls for atomic handling (PR88134) In PR88134 it is pointed out that we do not have GTY markup for some variables we use for atomic. So, let's add that. 2022-02-22 Segher Boessenkool <segher@kernel.crashing.org> PR target/88134 * config/rs6000/rs6000.cc (atomic_hold_decl, atomic_clear_decl, atomic_update_decl): Add GTY markup.	2022-02-22 16:20:23 +00:00
Christophe Lyon	e9f8443a91	arm: Add VPR_REG to ALL_REGS VPR_REG should be part of ALL_REGS, this patch fixes this omission. Most of the work of this patch series was carried out while I was working at STMicroelectronics as a Linaro assignee. 2022-02-22 Christophe Lyon <christophe.lyon@arm.com> gcc/ * config/arm/arm.h (REG_CLASS_CONTENTS): Add VPR_REG to ALL_REGS.	2022-02-22 15:55:09 +00:00
Christophe Lyon	c6b4ea7ab1	arm: Convert more MVE/CDE builtins to predicate qualifiers This patch covers a few non-load/store builtins where we do not use the <mode> iterator and thus we cannot use <MVE_vpred>. Most of the work of this patch series was carried out while I was working at STMicroelectronics as a Linaro assignee. 2022-02-22 Christophe Lyon <christophe.lyon@arm.com> gcc/ PR target/100757 PR target/101325 * config/arm/arm-builtins.cc (CX_UNARY_UNONE_QUALIFIERS): Use predicate. (CX_BINARY_UNONE_QUALIFIERS): Likewise. (CX_TERNARY_UNONE_QUALIFIERS): Likewise. (TERNOP_NONE_NONE_NONE_UNONE_QUALIFIERS): Delete. (QUADOP_NONE_NONE_NONE_NONE_UNONE_QUALIFIERS): Delete. (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE_QUALIFIERS): Delete. * config/arm/arm_mve_builtins.def: Use predicated qualifiers. * config/arm/mve.md: Use VxBI instead of HI.	2022-02-22 15:55:09 +00:00
Christophe Lyon	6a7c13a0cf	arm: Convert more load/store MVE builtins to predicate qualifiers This patch covers a few builtins where we do not use the <mode> iterator and thus we cannot use <MVE_vpred>. For v2di instructions, we keep the HI mode for predicates. Most of the work of this patch series was carried out while I was working at STMicroelectronics as a Linaro assignee. 2022-02-22 Christophe Lyon <christophe.lyon@arm.com> gcc/ PR target/100757 PR target/101325 * config/arm/arm-builtins.cc (STRSBS_P_QUALIFIERS): Use predicate qualifier. (STRSBU_P_QUALIFIERS): Likewise. (LDRGBS_Z_QUALIFIERS): Likewise. (LDRGBU_Z_QUALIFIERS): Likewise. (LDRGBWBXU_Z_QUALIFIERS): Likewise. (LDRGBWBS_Z_QUALIFIERS): Likewise. (LDRGBWBU_Z_QUALIFIERS): Likewise. (STRSBWBS_P_QUALIFIERS): Likewise. (STRSBWBU_P_QUALIFIERS): Likewise. * config/arm/mve.md: Use VxBI instead of HI.	2022-02-22 15:55:09 +00:00
Christophe Lyon	724d6566cd	arm: Convert more MVE builtins to predicate qualifiers This patch covers all builtins that have an HI operand and use the <mode> iterator, thus we can replace HI with <MVE_vpred>. Most of the work of this patch series was carried out while I was working at STMicroelectronics as a Linaro assignee. 2022-02-22 Christophe Lyon <christophe.lyon@arm.com> gcc/ PR target/100757 PR target/101325 * config/arm/arm-builtins.cc (TERNOP_UNONE_UNONE_NONE_UNONE_QUALIFIERS): Change to ... (TERNOP_UNONE_UNONE_NONE_PRED_QUALIFIERS): ... this. (TERNOP_UNONE_UNONE_IMM_UNONE_QUALIFIERS): Change to ... (TERNOP_UNONE_UNONE_IMM_PRED_QUALIFIERS): ... this. (TERNOP_NONE_NONE_IMM_UNONE_QUALIFIERS): Change to ... (TERNOP_NONE_NONE_IMM_PRED_QUALIFIERS): ... this. (TERNOP_NONE_NONE_UNONE_UNONE_QUALIFIERS): Change to ... (TERNOP_NONE_NONE_UNONE_PRED_QUALIFIERS): ... this. (QUADOP_UNONE_UNONE_NONE_NONE_UNONE_QUALIFIERS): Change to ... (QUADOP_UNONE_UNONE_NONE_NONE_PRED_QUALIFIERS): ... this. (QUADOP_NONE_NONE_NONE_NONE_PRED_QUALIFIERS): New. (QUADOP_NONE_NONE_NONE_IMM_UNONE_QUALIFIERS): Change to ... (QUADOP_NONE_NONE_NONE_IMM_PRED_QUALIFIERS): ... this. (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED_QUALIFIERS): New. (QUADOP_UNONE_UNONE_NONE_IMM_UNONE_QUALIFIERS): Change to ... (QUADOP_UNONE_UNONE_NONE_IMM_PRED_QUALIFIERS): ... this. (QUADOP_NONE_NONE_UNONE_IMM_UNONE_QUALIFIERS): Change to ... (QUADOP_NONE_NONE_UNONE_IMM_PRED_QUALIFIERS): ... this. (QUADOP_UNONE_UNONE_UNONE_IMM_UNONE_QUALIFIERS): Change to ... (QUADOP_UNONE_UNONE_UNONE_IMM_PRED_QUALIFIERS): ... this. (QUADOP_UNONE_UNONE_UNONE_NONE_UNONE_QUALIFIERS): Change to ... (QUADOP_UNONE_UNONE_UNONE_NONE_PRED_QUALIFIERS): ... this. (STRS_P_QUALIFIERS): Use predicate qualifier. (STRU_P_QUALIFIERS): Likewise. (STRSU_P_QUALIFIERS): Likewise. (STRSS_P_QUALIFIERS): Likewise. (LDRGS_Z_QUALIFIERS): Likewise. (LDRGU_Z_QUALIFIERS): Likewise. (LDRS_Z_QUALIFIERS): Likewise. (LDRU_Z_QUALIFIERS): Likewise.
(QUINOP_UNONE_UNONE_UNONE_UNONE_IMM_UNONE_QUALIFIERS): Change to ... (QUINOP_UNONE_UNONE_UNONE_UNONE_IMM_PRED_QUALIFIERS): ... this. (BINOP_NONE_NONE_PRED_QUALIFIERS): New. (BINOP_UNONE_UNONE_PRED_QUALIFIERS): New. * config/arm/arm_mve_builtins.def: Use new predicated qualifiers. * config/arm/mve.md: Use MVE_VPRED instead of HI.	2022-02-22 15:55:08 +00:00
Christophe Lyon	e6a4aefce8	arm: Convert remaining MVE vcmp builtins to predicate qualifiers This is mostly a mechanical change, only tested by the intrinsics expansion tests. Most of the work of this patch series was carried out while I was working at STMicroelectronics as a Linaro assignee. 2022-02-22 Christophe Lyon <christophe.lyon@arm.com> gcc/ PR target/100757 PR target/101325 * config/arm/arm-builtins.cc (BINOP_UNONE_NONE_NONE_QUALIFIERS): Delete. (TERNOP_UNONE_NONE_NONE_UNONE_QUALIFIERS): Change to ... (TERNOP_PRED_NONE_NONE_PRED_QUALIFIERS): ... this. (TERNOP_PRED_UNONE_UNONE_PRED_QUALIFIERS): New. * config/arm/arm_mve_builtins.def (vcmpq_n_, vcmpq_m_f): Use new predicated qualifiers. * config/arm/mve.md (mve_vcmp<mve_cmp_op>q_n_<mode>) (mve_vcmp*q_m_f<mode>): Use MVE_VPRED instead of HI.	2022-02-22 15:55:08 +00:00
Christophe Lyon	df0e57c2c0	arm: Fix vcond_mask expander for MVE (PR target/100757) The problem in this PR is that we call VPSEL with a mask of vector type instead of HImode. This happens because operand 3 in vcond_mask is the pre-computed vector comparison and has vector type. This patch fixes it by implementing TARGET_VECTORIZE_GET_MASK_MODE, returning the appropriate VxBI mode when targeting MVE. In turn, this implies implementing vec_cmp<mode><MVE_vpred>, vec_cmpu<mode><MVE_vpred> and vcond_mask_<mode><MVE_vpred>, and we can move vec_cmp<mode><v_cmp_result>, vec_cmpu<mode><mode> and vcond_mask_<mode><v_cmp_result> back to neon.md since they are not used by MVE anymore. The new <MVE_vpred> patterns listed above are implemented in mve.md since they are only valid for MVE. However this may make maintenance/comparison more painful than having all of them in vec-common.md. In the process, we can get rid of the recently added vcond_mve parameter of arm_expand_vector_compare. Compared to neon.md's vcond_mask_<mode><v_cmp_result> before my "arm: Auto-vectorization for MVE: vcmp" patch (r12-834), it keeps the VDQWH iterator added in r12-835 (to have V4HF/V8HF support), as well as the (!<Is_float_mode> || flag_unsafe_math_optimizations) condition which was not present before r12-834 although SF modes were enabled by VDQW (I think this was a bug). Using TARGET_VECTORIZE_GET_MASK_MODE has the advantage that we no longer need to generate vpsel with vectors of 0 and 1: the masks are now merged via scalar 'ands' instructions operating on 16-bit masks after converting the boolean vectors. In addition, this patch fixes a problem in arm_expand_vcond() where the result would be a vector of 0 or 1 instead of operand 1 or 2. Since we want to skip gcc.dg/signbit-2.c for MVE, we also add a new arm_mve effective target.
Reducing the number of iterations in pr100757-3.c from 32 to 8, we generate the code below: float a[32]; float fn1(int d) { float c = 4.0f; for (int b = 0; b < 8; b++) if (a[b] != 2.0f) c = 5.0f; return c; } fn1: ldr r3, .L3+48 vldr.64 d4, .L3 // q2=(2.0,2.0,2.0,2.0) vldr.64 d5, .L3+8 vldrw.32 q0, [r3] // q0=a(0..3) adds r3, r3, #16 vcmp.f32 eq, q0, q2 // cmp a(0..3) == (2.0,2.0,2.0,2.0) vldrw.32 q1, [r3] // q1=a(4..7) vmrs r3, P0 vcmp.f32 eq, q1, q2 // cmp a(4..7) == (2.0,2.0,2.0,2.0) vmrs r2, P0 @ movhi ands r3, r3, r2 // r3=select(a(0..3]) & select(a(4..7)) vldr.64 d4, .L3+16 // q2=(5.0,5.0,5.0,5.0) vldr.64 d5, .L3+24 vmsr P0, r3 vldr.64 d6, .L3+32 // q3=(4.0,4.0,4.0,4.0) vldr.64 d7, .L3+40 vpsel q3, q3, q2 // q3=vcond_mask(4.0,5.0) vmov.32 r2, q3[1] // keep the scalar max vmov.32 r0, q3[3] vmov.32 r3, q3[2] vmov.f32 s11, s12 vmov s15, r2 vmov s14, r3 vmaxnm.f32 s15, s11, s15 vmaxnm.f32 s15, s15, s14 vmov s14, r0 vmaxnm.f32 s15, s15, s14 vmov r0, s15 bx lr .L4: .align 3 .L3: .word 1073741824 // 2.0f .word 1073741824 .word 1073741824 .word 1073741824 .word 1084227584 // 5.0f .word 1084227584 .word 1084227584 .word 1084227584 .word 1082130432 // 4.0f .word 1082130432 .word 1082130432 .word 1082130432 This patch adds tests that trigger an ICE without this fix. The pr100757.c testcases are derived from gcc.c-torture/compile/20160205-1.c, forcing the use of MVE, and using various types and return values different from 0 and 1 to avoid commonalization with boolean masks. In addition, since we should not need these masks, the tests make sure they are not present. Most of the work of this patch series was carried out while I was working at STMicroelectronics as a Linaro assignee. 2022-02-22 Christophe Lyon <christophe.lyon@arm.com> PR target/100757 gcc/ * config/arm/arm-protos.h (arm_get_mask_mode): New prototype. (arm_expand_vector_compare): Update prototype. * config/arm/arm.cc (TARGET_VECTORIZE_GET_MASK_MODE): New. 
(arm_vector_mode_supported_p): Add support for VxBI modes. (arm_expand_vector_compare): Remove useless generation of vpsel. (arm_expand_vcond): Fix select operands. (arm_get_mask_mode): New. * config/arm/mve.md (vec_cmp<mode><MVE_vpred>): New. (vec_cmpu<mode><MVE_vpred>): New. (vcond_mask_<mode><MVE_vpred>): New. * config/arm/vec-common.md (vec_cmp<mode><v_cmp_result>) (vec_cmpu<mode><mode, vcond_mask_<mode><v_cmp_result>): Move to ... * config/arm/neon.md (vec_cmp<mode><v_cmp_result>) (vec_cmpu<mode><mode, vcond_mask_<mode><v_cmp_result>): ... here and disable for MVE. * doc/sourcebuild.texi (arm_mve): Document new effective-target. gcc/testsuite/ PR target/100757 * gcc.target/arm/simd/pr100757-2.c: New. * gcc.target/arm/simd/pr100757-3.c: New. * gcc.target/arm/simd/pr100757-4.c: New. * gcc.target/arm/simd/pr100757.c: New. * gcc.dg/signbit-2.c: Skip when targeting ARM/MVE. * lib/target-supports.exp (check_effective_target_arm_mve): New.	2022-02-22 15:55:07 +00:00
Christophe Lyon	91224cf625	arm: Implement auto-vectorized MVE comparisons with vectors of boolean predicates We make use of qualifier_predicate to describe MVE builtins prototypes, restricting to auto-vectorizable vcmp* and vpsel builtins, as they are exercised by the tests added earlier in the series. Special handling is needed for mve_vpselq because it has a v2di variant, which has no natural VPR.P0 representation: we keep HImode for it. The vector_compare expansion code is updated to use the right VxBI mode instead of HI for the result. We extend the existing thumb2_movhi_vfp and thumb2_movhi_fp16 patterns to use the new MVE_7_HI iterator which covers HI and the new VxBI modes, in conjunction with the new DB constraint for a constant vector of booleans. This patch also adds tests derived from the one provided in PR target/101325: there is a compile-only test because I did not have access to anything that could execute MVE code until recently. I have been able to add an executable test since QEMU supports MVE. Instead of adding arm_v8_1m_mve_hw, I update arm_mve_hw so that it uses add_options_for_arm_v8_1m_mve_fp, like arm_neon_hw does. This ensures arm_mve_hw passes even if the toolchain does not generate MVE code by default. Most of the work of this patch series was carried out while I was working at STMicroelectronics as a Linaro assignee. 2022-02-22 Christophe Lyon <christophe.lyon@arm.com> Richard Sandiford <richard.sandiford@arm.com> gcc/ PR target/100757 PR target/101325 * config/arm/arm-builtins.cc (BINOP_PRED_UNONE_UNONE_QUALIFIERS) (BINOP_PRED_NONE_NONE_QUALIFIERS) (TERNOP_NONE_NONE_NONE_PRED_QUALIFIERS) (TERNOP_UNONE_UNONE_UNONE_PRED_QUALIFIERS): New. * config/arm/arm-protos.h (mve_bool_vec_to_const): New. * config/arm/arm.cc (arm_hard_regno_mode_ok): Handle new VxBI modes. (arm_mode_to_pred_mode): New. (arm_expand_vector_compare): Use the right VxBI mode instead of HI. (arm_expand_vcond): Likewise. (simd_valid_immediate): Handle MODE_VECTOR_BOOL. 
(mve_bool_vec_to_const): New. (neon_make_constant): Call mve_bool_vec_to_const when needed. * config/arm/arm_mve_builtins.def (vcmpneq_, vcmphiq_, vcmpcsq_) (vcmpltq_, vcmpleq_, vcmpgtq_, vcmpgeq_, vcmpeqq_, vcmpneq_f) (vcmpltq_f, vcmpleq_f, vcmpgtq_f, vcmpgeq_f, vcmpeqq_f, vpselq_u) (vpselq_s, vpselq_f): Use new predicated qualifiers. * config/arm/constraints.md (DB): New. * config/arm/iterators.md (MVE_7, MVE_7_HI): New mode iterators. (MVE_VPRED, MVE_vpred): New attribute iterators. * config/arm/mve.md (@mve_vcmp<mve_cmp_op>q_<mode>) (@mve_vcmp<mve_cmp_op>q_f<mode>, @mve_vpselq_<supf><mode>) (@mve_vpselq_f<mode>): Use MVE_VPRED instead of HI. (@mve_vpselq_<supf>v2di): Define separately. (mov<mode>): New expander for VxBI modes. * config/arm/vfp.md (thumb2_movhi_vfp, thumb2_movhi_fp16): Use MVE_7_HI iterator and add support for DB constraint. gcc/testsuite/ PR target/100757 PR target/101325 * gcc.dg/rtl/arm/mve-vxbi.c: New test. * gcc.target/arm/simd/pr101325.c: New. * gcc.target/arm/simd/pr101325-2.c: New. * lib/target-supports.exp (check_effective_target_arm_mve_hw): Use add_options_for_arm_v8_1m_mve_fp.	2022-02-22 15:55:07 +00:00
Christophe Lyon	884f77b422	arm: Implement MVE predicates as vectors of booleans This patch implements support for vectors of booleans to support MVE predicates, instead of HImode. Since the ABI mandates pred16_t (aka uint16_t) to represent predicates in intrinsics prototypes, we introduce a new "predicate" type qualifier so that we can map the HImode arguments and return values of the relevant builtins to the appropriate vector of booleans (VxBI). We have to update test_vector_ops_duplicate, because it iterates using an offset in bytes, where we would need to iterate in bits: we stop iterating when we reach the end of the vector of booleans. In addition, we have to fix the underlying definition of vectors of booleans because ARM/MVE needs a different representation than AArch64/SVE. With ARM/MVE the 'true' bit is duplicated over the element size, so that a true element of V4BI is represented by '0b1111'. This patch updates the aarch64 definition of VNxBI as needed. Most of the work of this patch series was carried out while I was working at STMicroelectronics as a Linaro assignee. 2022-02-22 Christophe Lyon <christophe.lyon@arm.com> Richard Sandiford <richard.sandiford@arm.com> gcc/ PR target/100757 PR target/101325 * config/aarch64/aarch64-modes.def (VNx16BI, VNx8BI, VNx4BI, VNx2BI): Update definition. * config/arm/arm-builtins.cc (arm_init_simd_builtin_types): Add new simd types. (arm_init_builtin): Map predicate vectors arguments to HImode. (arm_expand_builtin_args): Move HImode predicate arguments to VxBI rtx. Move return value to HImode rtx. * config/arm/arm-builtins.h (arm_type_qualifiers): Add qualifier_predicate. * config/arm/arm-modes.def (B2I, B4I, V16BI, V8BI, V4BI): New modes. * config/arm/arm-simd-builtin-types.def (Pred1x16_t, Pred2x8_t, Pred4x4_t): New. * emit-rtl.cc (init_emit_once): Handle all boolean modes. * genmodes.cc (mode_data): Add boolean field. (blank_mode): Initialize it. (make_complex_modes): Fix handling of boolean modes. 
(make_vector_modes): Likewise. (VECTOR_BOOL_MODE): Use new COMPONENT parameter. (make_vector_bool_mode): Likewise. (BOOL_MODE): New. (make_bool_mode): New. (emit_insn_modes_h): Fix generation of boolean modes. (emit_class_narrowest_mode): Likewise. * machmode.def (VECTOR_BOOL_MODE): Document new COMPONENT parameter. Use new BOOL_MODE instead of FRACTIONAL_INT_MODE to define BImode. * rtx-vector-builder.cc (rtx_vector_builder::find_cached_value): Fix handling of constm1_rtx for VECTOR_BOOL. * simplify-rtx.cc (native_encode_rtx): Fix support for VECTOR_BOOL. (native_decode_vector_rtx): Likewise. (test_vector_ops_duplicate): Skip vec_merge test with vectors of booleans. * varasm.cc (output_constant_pool_2): Likewise.	2022-02-22 15:55:07 +00:00
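The predicate layout described in the commit above ("the 'true' bit is duplicated over the element size, so that a true element of V4BI is represented by '0b1111'") can be illustrated with a small sketch. This is not GCC code: `pack_predicate` is a hypothetical helper, and placing lane 0 in the least significant bits is an assumption made for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not part of GCC): pack per-lane booleans into a
   16-bit predicate in which each true lane sets every bit that the lane
   covers.  With 4 lanes each lane covers 4 bits, so a true lane
   contributes 0b1111.  Lane 0 is assumed to occupy the low bits.  */
uint16_t pack_predicate(const bool *lanes, unsigned nlanes)
{
    /* Only the V16BI, V8BI and V4BI shapes are modelled here.  */
    assert(nlanes == 4 || nlanes == 8 || nlanes == 16);

    unsigned bits_per_lane = 16 / nlanes;
    uint16_t lane_mask = (uint16_t)((1u << bits_per_lane) - 1u);
    uint16_t pred = 0;

    for (unsigned i = 0; i < nlanes; i++)
        if (lanes[i])
            pred |= (uint16_t)(lane_mask << (i * bits_per_lane));

    return pred;
}
```

Under these assumptions, four lanes with only the first and last lane true pack to 0xF00F, whereas the same pattern expressed with one bit per lane (16 lanes, lanes 0 and 3 true) packs to 0x0009.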
Christophe Lyon	0d0aaea105	arm: Fix mve_vmvnq_n_<supf><mode> argument mode The vmvnq_n* intrinsics have [u]int[16|32]_t arguments, so use the <V_elem> iterator instead of HI in mve_vmvnq_n_<supf><mode>. Most of the work of this patch series was carried out while I was working at STMicroelectronics as a Linaro assignee. 2022-02-22 Christophe Lyon <christophe.lyon@arm.com> gcc/ * config/arm/mve.md (mve_vmvnq_n_<supf><mode>): Use V_elem mode for operand 1.	2022-02-22 15:55:06 +00:00
Christophe Lyon	6769084fdf	arm: Add support for VPR_REG in arm_class_likely_spilled_p VPR_REG is the only register in its class, so it should be handled by TARGET_CLASS_LIKELY_SPILLED_P, which is achieved by calling default_class_likely_spilled_p. No test fails without this patch, but it seems it should be implemented. Most of the work of this patch series was carried out while I was working at STMicroelectronics as a Linaro assignee. 2022-02-22 Christophe Lyon <christophe.lyon@arm.com> gcc/ * config/arm/arm.cc (arm_class_likely_spilled_p): Handle VPR_REG.	2022-02-22 15:55:06 +00:00
Christophe Lyon	bf3e36fbf1	arm: Add GENERAL_AND_VPR_REGS regclass At some point during the development of this patch series, it appeared that in some cases the register allocator wants “VPR or general” rather than “VPR or general or FP” (which is the same thing as ALL_REGS). The series does not seem to require this anymore, but it seems to be a good thing to do anyway, to give the register allocator more freedom. CLASS_MAX_NREGS and arm_hard_regno_nregs need adjustment to avoid a regression in gcc.dg/stack-usage-1.c when compiled with -mthumb -mfloat-abi=hard -march=armv8.1-m.main+mve.fp+fp.dp. Most of the work of this patch series was carried out while I was working at STMicroelectronics as a Linaro assignee. 2022-02-22 Christophe Lyon <christophe.lyon@arm.com> gcc/ * config/arm/arm.h (reg_class): Add GENERAL_AND_VPR_REGS. (REG_CLASS_NAMES): Likewise. (REG_CLASS_CONTENTS): Likewise. (CLASS_MAX_NREGS): Handle VPR. * config/arm/arm.cc (arm_hard_regno_nregs): Handle VPR.	2022-02-22 15:55:06 +00:00
Christophe Lyon	7b1cce9273	arm: Add new tests for comparison vectorization with Neon and MVE This patch mainly adds Neon tests similar to existing MVE ones, to make sure we do not break Neon when fixing MVE. mve-vcmp-f32-2.c is similar to mve-vcmp-f32.c but uses a conditional with 2.0f and 3.0f constants to help scan-assembler-times. Most of the work of this patch series was carried out while I was working at STMicroelectronics as a Linaro assignee. 2022-02-22 Christophe Lyon <christophe.lyon@arm.com> gcc/testsuite/ * gcc.target/arm/simd/mve-vcmp-f32-2.c: New. * gcc.target/arm/simd/neon-compare-1.c: New. * gcc.target/arm/simd/neon-compare-2.c: New. * gcc.target/arm/simd/neon-compare-3.c: New. * gcc.target/arm/simd/neon-compare-scalar-1.c: New. * gcc.target/arm/simd/neon-vcmp-f16.c: New. * gcc.target/arm/simd/neon-vcmp-f32-2.c: New. * gcc.target/arm/simd/neon-vcmp-f32-3.c: New. * gcc.target/arm/simd/neon-vcmp-f32.c: New. * gcc.target/arm/simd/neon-vcmp.c: New.	2022-02-22 15:55:05 +00:00
Christophe Lyon	39c0b8f1ac	MAINTAINERS: Update my email address. * MAINTAINERS (Write After Approval): Update my e-mail address.	2022-02-22 15:55:05 +00:00
Tom de Vries	5ed77fb3ed	[libgomp, nvptx] Fix hang in gomp_team_barrier_wait_end Consider the following omp fragment. ... #pragma omp target #pragma omp parallel num_threads (2) #pragma omp task ; ... This hangs at -O0 for nvptx. Investigating the behaviour gives us the following trace of events: - both threads execute GOMP_task, where they: - deposit a task, and - execute gomp_team_barrier_wake - thread 1 executes gomp_team_barrier_wait_end and, not being the last thread, proceeds to wait at the team barrier - thread 0 executes gomp_team_barrier_wait_end and, being the last thread, it calls gomp_barrier_handle_tasks, where it: - executes both tasks and marks the team barrier done - executes a gomp_team_barrier_wake which wakes up thread 1 - thread 1 exits the team barrier - thread 0 returns from gomp_barrier_handle_tasks and goes to wait at the team barrier. - thread 0 hangs. To understand why there is a hang here, it's good to understand how things are set up for nvptx. The libgomp/config/nvptx/bar.c implementation is a copy of the libgomp/config/linux/bar.c implementation, with uses of both futex_wake and do_wait replaced with uses of ptx insn bar.sync: ... if (bar->total > 1) asm ("bar.sync 1, %0;" : : "r" (32 * bar->total)); ... The point where thread 0 goes to wait at the team barrier, corresponds in the linux implementation with a do_wait. In the linux case, the call to do_wait doesn't hang, because it's waiting for bar->generation to become a certain value, and if bar->generation already has that value, it just proceeds, without any need for coordination with other threads. In the nvptx case, the bar.sync waits until thread 1 joins it in the same logical barrier, which never happens: thread 1 is lingering in the thread pool at the thread pool barrier (using a different logical barrier), waiting to join a new team. The easiest way to fix this is to revert to the posix implementation for bar.{c,h}. 
That however falls back on a busy-waiting approach, and does not take advantage of the ptx bar.sync insn. Instead, we revert to the linux implementation for bar.c, and implement bar.c local functions futex_wait and futex_wake using the bar.sync insn. The bar.sync insn takes an argument specifying how many threads are participating, and that doesn't play well with the futex syntax where it's not clear in advance how many threads will be woken up. This is solved by waking up all waiting threads each time a futex_wait or futex_wake happens, and possibly going back to sleep with an updated thread count. Tested libgomp on x86_64 with nvptx accelerator. libgomp/ChangeLog: 2021-04-20 Tom de Vries <tdevries@suse.de> PR target/99555 * config/nvptx/bar.c (generation_to_barrier): New function, copied from config/rtems/bar.c. (futex_wait, futex_wake): New function. (do_spin, do_wait): New function, copied from config/linux/wait.h. (gomp_barrier_wait_end, gomp_barrier_wait_last) (gomp_team_barrier_wake, gomp_team_barrier_wait_end): (gomp_team_barrier_wait_cancel_end, gomp_team_barrier_cancel): Remove and replace with include of config/linux/bar.c. * config/nvptx/bar.h (gomp_barrier_t): Add fields waiters and lock. (gomp_barrier_init): Init new fields. * testsuite/libgomp.c-c++-common/task-detach-6.c: Remove nvptx-specific workarounds. * testsuite/libgomp.c/pr99555-1.c: Same. * testsuite/libgomp.fortran/task-detach-6.f90: Same.	2022-02-22 15:48:03 +01:00
Tobias Burnus	bd73d8dd31	nvptx: Add -misa=sm_70 Add -misa=sm_70, and use it to specify the misa value in test-case gcc.target/nvptx/atomic-store-2.c. Tested on nvptx. gcc/ChangeLog: * config/nvptx/nvptx-c.cc (nvptx_cpu_cpp_builtins): Handle SM70. * config/nvptx/nvptx.cc (first_ptx_version_supporting_sm): Likewise. * config/nvptx/nvptx.opt (misa): Add sm_70 alias PTX_ISA_SM70. gcc/testsuite/ChangeLog: 2022-02-22 Tom de Vries <tdevries@suse.de> * gcc.target/nvptx/atomic-store-2.c: Use -misa=sm_70. * gcc.target/nvptx/uniform-simt-3.c: Same. Co-Authored-By: Tom de Vries <tdevries@suse.de>	2022-02-22 15:38:55 +01:00
Patrick Palka	5e1b17f038	libstdc++: Implement P2415R2 changes to viewable_range / views::all This implements the wording changes in P2415R2 "What is a view?", which is a DR for C++20. libstdc++-v3/ChangeLog: * include/bits/ranges_base.h (__detail::__is_initializer_list): Define. (viewable_range): Adjust as per P2415R2. * include/bits/ranges_cmp.h (__cpp_lib_ranges): Adjust value. * include/std/ranges (owning_view): Define as per P2415R2. (enable_borrowed_range<owning_view>): Likewise. (views::__detail::__can_subrange): Replace with ... (views::__detail::__can_owning_view): ... this. (views::_All::_S_noexcept): Sync with operator(). (views::_All::operator()): Use owning_view instead of subrange as per P2415R2. * include/std/version (__cpp_lib_ranges): Adjust value. * testsuite/std/ranges/adaptors/all.cc (test06): Adjust now that views::all uses owning_view instead of subrange. (test08): New test. * testsuite/std/ranges/adaptors/lazy_split.cc (test09): Adjust now that rvalue non-view non-borrowed ranges are viewable. * testsuite/std/ranges/adaptors/split.cc (test06): Likewise.	2022-02-22 09:37:58 -05:00
Tobias Burnus	bc91cb8d8c	nvptx: Add -mptx=6.0 Currently supported internally are 3.1, 6.0, 6.3 and 7.0. However, -mptx= supports 3.1, 6.3, 7.0 – but not the internal default 6.0. Add -mptx=6.0 for consistency. Tested on nvptx. gcc/ChangeLog: * config/nvptx/nvptx.opt (mptx): Add 6.0 alias PTX_VERSION_6_0. * doc/invoke.texi (-mptx): Update for new values and defaults. Co-Authored-By: Tom de Vries <tdevries@suse.de>	2022-02-22 14:57:28 +01:00
Tom de Vries	c2b23aaaf4	[nvptx] Add -mptx-comment Add functionality that indicates which insns are added by -minit-regs, such that for instance we have for pr53465.s: ... // #APP // 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1 // Start: Added by -minit-regs=3: // #NO_APP mov.u32 %r26, 0; // #APP // 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1 // End: Added by -minit-regs=3: // #NO_APP ... Can be switched off using -mno-ptx-comment. Tested on nvptx. gcc/ChangeLog: 2022-02-21 Tom de Vries <tdevries@suse.de> * config/nvptx/nvptx.cc (gen_comment): New function. (workaround_uninit_method_1, workaround_uninit_method_2) (workaround_uninit_method_3): Use gen_comment. * config/nvptx/nvptx.opt (mptx-comment): New option.	2022-02-22 14:51:59 +01:00
Richard Biener	d669237f7d	Dump def that we use for a splat This makes the SLP vectorizer dump the def we use for a splat to aid debugging. 2022-02-22 Richard Biener <rguenther@suse.de> * tree-vect-slp.cc (vect_build_slp_tree_2): Dump the def used for a splat.	2022-02-22 14:28:02 +01:00
Roger Sayle	2ef0e75d0b	Implement constant-folding simplifications of reductions. This patch addresses a code quality regression in GCC 12 by implementing some constant folding/simplification transformations for REDUC_PLUS_EXPR in match.pd. The motivating example is gcc.dg/vect/pr89440.c which with -O2 -ffast-math (with vectorization now enabled) gets optimized to: float f (float x) { vector(4) float vect_x_14.11; vector(4) float _2; float _32; _2 = {x_9(D), 0.0, 0.0, 0.0}; vect_x_14.11_29 = _2 + { 1.0e+1, 2.6e+1, 4.2e+1, 5.8e+1 }; _32 = .REDUC_PLUS (vect_x_14.11_29); [tail call] return _32; } With these proposed new transformations, we can simplify the above code even further. float f (float x) { float _32; _32 = x_9(D) + 1.36e+2; return _32; } [which happens to match what we'd produce with -fno-tree-vectorize, and with GCC 11]. 2022-02-22 Roger Sayle <roger@nextmovesoftware.com> Richard Biener <rguenther@suse.de> gcc/ChangeLog * fold-const.cc (ctor_single_nonzero_element): New function to return the single non-zero element of a (vector) constructor. * fold-const.h (ctor_single_nonzero_element): Prototype here. * match.pd (reduc (constructor@0)): Simplify reductions of a constructor containing a single non-zero element. (reduc (@0 op VECTOR_CST) -> (reduc @0) op CONST): Simplify reductions of vector operations of the same operator with constant vector operands. gcc/testsuite/ChangeLog * gcc.dg/fold-reduc-1.c: New test case.	2022-02-22 12:32:22 +00:00
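The arithmetic behind the simplification in the commit above can be checked with a scalar sketch. This is plain C, not GIMPLE or match.pd syntax, and the function names `reduc_plus`, `f_vectorized` and `f_folded` are hypothetical. It assumes, as -ffast-math does, that the additions may be reassociated: summing {x, 0, 0, 0} + {10, 26, 42, 58} lane by lane gives x + (10 + 26 + 42 + 58) = x + 136.

```c
#include <stddef.h>

/* Scalar stand-in for a plus-reduction over a vector.  */
float reduc_plus(const float *v, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum;
}

/* Shape of the code before the new folding: build {x, 0, 0, 0},
   add a constant vector, then reduce.  */
float f_vectorized(float x)
{
    const float lhs[4] = { x, 0.0f, 0.0f, 0.0f };
    const float cst[4] = { 10.0f, 26.0f, 42.0f, 58.0f };
    float sum[4];

    for (size_t i = 0; i < 4; i++)
        sum[i] = lhs[i] + cst[i];

    return reduc_plus(sum, 4);
}

/* Shape of the code after folding: the constant vector is reduced at
   compile time (10 + 26 + 42 + 58 = 136), and the reduction of
   {x, 0, 0, 0} is just x.  */
float f_folded(float x)
{
    return x + 136.0f;
}
```

For inputs where every intermediate sum is exactly representable, such as x = 1.0f, both functions return the same value; in general the two can differ in rounding, which is why the transformation depends on reassociation being permitted.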
Jakub Jelinek	2f59f06761	libiberty: Fix up debug.temp.o creation if .o has 64K+ sections [PR104617] On #define A(n) int foo1##n(void) { return 1##n; } #define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) A(n##5) A(n##6) A(n##7) A(n##8) A(n##9) #define C(n) B(n##0) B(n##1) B(n##2) B(n##3) B(n##4) B(n##5) B(n##6) B(n##7) B(n##8) B(n##9) #define D(n) C(n##0) C(n##1) C(n##2) C(n##3) C(n##4) C(n##5) C(n##6) C(n##7) C(n##8) C(n##9) #define E(n) D(n##0) D(n##1) D(n##2) D(n##3) D(n##4) D(n##5) D(n##6) D(n##7) D(n##8) D(n##9) E(0) E(1) E(2) D(30) D(31) C(320) C(321) C(322) C(323) C(324) C(325) B(3260) B(3261) B(3262) B(3263) A(32640) A(32641) A(32642) testcase with ./xgcc -B ./ -c -g -fpic -ffat-lto-objects -flto -O0 -o foo1.o foo1.c -ffunction-sections ./xgcc -B ./ -shared -g -fpic -flto -O0 -o foo1.so foo1.o /tmp/ccTW8mBm.debug.temp.o: file not recognized: file format not recognized (testcase too slow to be included into testsuite). The problem is clearly reported by readelf: readelf: foo1.o.debug.temp.o: Warning: Section 2 has an out of range sh_link value of 65321 readelf: foo1.o.debug.temp.o: Warning: Section 5 has an out of range sh_link value of 65321 readelf: foo1.o.debug.temp.o: Warning: Section 10 has an out of range sh_link value of 65323 readelf: foo1.o.debug.temp.o: Warning: [ 2]: Link field (65321) should index a symtab section. readelf: foo1.o.debug.temp.o: Warning: [ 5]: Link field (65321) should index a symtab section. readelf: foo1.o.debug.temp.o: Warning: [10]: Link field (65323) should index a string section. because simple_object_elf_copy_lto_debug_sections doesn't adjust sh_info and sh_link fields in ElfNN_Shdr if they are in between SHN_{LO,HI}RESERVE inclusive. 
Not adjusting those is incorrect though, SHN_{LO,HI}RESERVE range is only relevant to the 16-bit fields, mainly st_shndx in ElfNN_Sym where if one needs >= SHN_LORESERVE section number, SHN_XINDEX should be used instead and .symtab_shndx section should contain the real section index, and in ElfNN_Ehdr e_shnum and e_shstrndx fields, where if >= SHN_LORESERVE value is needed it should put those into Shdr[0].sh_{size,link}. But, sh_{link,info} are 32-bit fields which can contain any section index. Note, as simple-object-elf.c mentions, binutils from 2.12 to 2.18 (so before 2011) used to mishandle the > 63.75K sections case and assumed there is a hole in between the sections, but what simple_object_elf_copy_lto_debug_sections does wouldn't help in that case for the debug temp object creation, we'd need to detect the case also in that routine and take it into account in the remapping etc. I think it is not worth it given that those releases are over 10 years old; anybody who needs 63.75K or more sections had better use more recent binutils. 2022-02-22 Jakub Jelinek <jakub@redhat.com> PR lto/104617 * simple-object-elf.c (simple_object_elf_match): Fix up URL in comment. (simple_object_elf_copy_lto_debug_sections): Remap sh_info and sh_link even if they are in the SHN_LORESERVE .. SHN_HIRESERVE range (inclusive).	2022-02-22 11:33:45 +01:00
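The distinction drawn in the commit above, between 16-bit fields where the SHN_LORESERVE .. SHN_HIRESERVE range has a special meaning and 32-bit sh_link/sh_info fields where it does not, can be sketched as follows. This is an illustration only and not the libiberty code: the helper names, the `new_index` remapping table and the `SKETCH_` constants are hypothetical, although the values 0xff00 and 0xffff are the ELF definitions of SHN_LORESERVE and SHN_HIRESERVE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ELF reserved section index range, named locally so that the sketch
   does not depend on <elf.h>.  */
enum {
    SKETCH_SHN_LORESERVE = 0xff00,
    SKETCH_SHN_HIRESERVE = 0xffff
};

/* sh_link and sh_info are 32-bit fields holding ordinary section
   indices, so every value is remapped through the table, including
   values that happen to fall in 0xff00 .. 0xffff.  */
uint32_t remap_sh_link(uint32_t old_index, const uint32_t *new_index,
                       size_t nsections)
{
    assert(old_index < nsections);
    return new_index[old_index];
}

/* st_shndx is a 16-bit field; values in the reserved range are special
   markers rather than section indices and are left alone.  Handling of
   SHN_XINDEX with a separate index table is omitted from this sketch.  */
uint16_t remap_st_shndx(uint16_t old_index, const uint32_t *new_index,
                        size_t nsections)
{
    if (old_index >= SKETCH_SHN_LORESERVE)
        return old_index;

    assert(old_index < nsections);
    uint32_t mapped = new_index[old_index];
    assert(mapped < SKETCH_SHN_LORESERVE);
    return (uint16_t)mapped;
}
```

With a table that maps old section 65321 to new section 7, `remap_sh_link (65321, ...)` returns 7, while `remap_st_shndx (0xfff1, ...)` (SHN_ABS) returns 0xfff1 unchanged.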
Jakub Jelinek	d44dc131f4	ranger: Fix up REALPART_EXPR/IMAGPART_EXPR handling [PR104604] The following testcase is miscompiled since r12-3328. That change assumed that if rhs1 of a GIMPLE_ASSIGN is COMPLEX_CST, then that is the value of the lhs of the stmt, but that is not always the case, only if it is a GIMPLE_SINGLE_RHS stmt. If it is e.g. GIMPLE_UNARY_RHS or GIMPLE_BINARY_RHS (the latter happens in the testcase), then it can be e.g. __complex__ (3, 0) / var and the REALPART_EXPR of that isn't 3, but the realpart of the division. I assume once the ranger can do complex numbers adjust_part_expr will just fetch one or the other range from an underlying complex range, but until then, we should limit this to what r12-3328 meant to do. 2022-02-22 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/104604 * gimple-range-fold.cc (adjust_imagpart_expr, adjust_realpart_expr): Only check if gimple_assign_rhs1 is COMPLEX_CST if gimple_assign_rhs_code is COMPLEX_CST. * gcc.c-torture/execute/pr104604.c: New test.	2022-02-22 10:43:13 +01:00
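The point made in the commit above, that the real part of `__complex__ (3, 0) / var` is not 3 even though the first operand is the constant (3, 0), can be seen at the source level with C99 complex arithmetic. This is not the testcase from the PR; the function names are hypothetical.

```c
#include <complex.h>

/* The constant is the whole right-hand side, so the real part of the
   result is the real part of the constant.  */
double realpart_of_constant(void)
{
    double complex c = 3.0 + 0.0 * I;
    return creal(c);
}

/* The constant is only the first operand of a division, so the real
   part of the result depends on the divisor as well.  */
double realpart_of_quotient(double complex var)
{
    double complex q = (3.0 + 0.0 * I) / var;
    return creal(q);
}
```

Calling `realpart_of_quotient (2.0 + 0.0 * I)` yields 1.5 rather than 3.0, which is the kind of value that would be mispredicted by looking only at the first operand.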
Jakub Jelinek	7e691189ca	i386: Fix up copysign/xorsign expansion [PR104612] We ICE on the following testcase for -m32 since r12-3435 because operands[2] is (subreg:SF (reg:DI ...) 0) and lowpart_subreg (V4SFmode, operands[2], SFmode) returns NULL, and that is what we use in AND etc. insns we emit. My earlier version of the patch fixes that by calling force_reg for the input operands, to make sure they are really REGs and so lowpart_subreg will succeed on them - even for theoretical MEMs using REGs there seems desirable, we don't want to read following memory slots for the paradoxical subreg. For the outputs, I thought we'd get better code by always computing result into a new pseudo and then move lowpart of that pseudo into dest. Unfortunately it regressed FAIL: gcc.target/i386/pr89984-2.c scan-assembler-not vmovaps on which the patch changes: vandps .LC0(%rip), %xmm1, %xmm1 - vxorps %xmm0, %xmm1, %xmm0 + vxorps %xmm0, %xmm1, %xmm1 + vmovaps %xmm1, %xmm0 ret The RA sees: (insn 8 4 9 2 (set (reg:V4SF 85) (and:V4SF (subreg:V4SF (reg:SF 90) 0) (mem/u/c:V4SF (symbol_ref/u:DI (".LC0") [flags 0x2]) [0 S16 A128]))) "pr89984-2.c":7:12 2838 {andv4sf3} (expr_list:REG_DEAD (reg:SF 90) (nil))) (insn 9 8 10 2 (set (reg:V4SF 87) (xor:V4SF (reg:V4SF 85) (subreg:V4SF (reg:SF 89) 0))) "pr89984-2.c":7:12 2842 {xorv4sf3} (expr_list:REG_DEAD (reg:SF 89) (expr_list:REG_DEAD (reg:V4SF 85) (nil)))) (insn 10 9 14 2 (set (reg:SF 82 [ <retval> ]) (subreg:SF (reg:V4SF 87) 0)) "pr89984-2.c":7:12 142 {movsf_internal} (expr_list:REG_DEAD (reg:V4SF 87) (nil))) (insn 14 10 15 2 (set (reg/i:SF 20 xmm0) (reg:SF 82 [ <retval> ])) "pr89984-2.c":8:1 142 {movsf_internal} (expr_list:REG_DEAD (reg:SF 82 [ <retval> ]) (nil))) (insn 15 14 0 2 (use (reg/i:SF 20 xmm0)) "pr89984-2.c":8:1 -1 (nil)) and doesn't know that if it would use xmm0 not just for pseudo 82 but also for pseudo 87, it could create a noop move in insn 10 and so could avoid an extra register copy and nothing later on is able to figure that 
out either. I don't know how the RA should know that though. So that we don't regress, this version of the patch will do this stuff (i.e. use fresh vector pseudo as destination and then move lowpart of that to dest) over what it used before (i.e. use paradoxical subreg of the dest) only if lowpart_subreg returns NULL. 2022-02-22 Jakub Jelinek <jakub@redhat.com> PR target/104612 * config/i386/i386-expand.cc (ix86_expand_copysign): Call force_reg on input operands before calling lowpart_subreg on it. For output operand, use a vmode pseudo as destination and then move its lowpart subreg into operands[0] if lowpart_subreg fails on dest. (ix86_expand_xorsign): Likewise. * gcc.dg/pr104612.c: New test.	2022-02-22 10:38:37 +01:00
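For readers unfamiliar with why the copysign/xorsign expansion in the commit above emits AND and XOR instructions against a constant mask, the bit-level idea can be sketched in scalar C. This is not the i386 expander: the helper names are hypothetical, and the sketch assumes IEEE 754 single precision with the sign in bit 31.

```c
#include <stdint.h>
#include <string.h>

/* Sign bit of an IEEE 754 single-precision value (assumed layout).  */
#define SKETCH_SIGN_MASK 0x80000000u

/* Reinterpret a float as its bit pattern and back without aliasing
   violations.  */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

static float bits_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* copysign: magnitude of x combined with the sign of y.  */
float copysign_bits(float x, float y)
{
    uint32_t magnitude = float_bits(x) & ~SKETCH_SIGN_MASK;
    uint32_t sign = float_bits(y) & SKETCH_SIGN_MASK;
    return bits_float(magnitude | sign);
}

/* xorsign: x with its sign flipped when y is negative, i.e. an AND to
   isolate the sign of y followed by an XOR into x.  */
float xorsign_bits(float x, float y)
{
    uint32_t sign = float_bits(y) & SKETCH_SIGN_MASK;
    return bits_float(float_bits(x) ^ sign);
}
```

For example, `copysign_bits (2.0f, -1.0f)` gives -2.0f and `xorsign_bits (-2.0f, -3.0f)` gives 2.0f.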

191931 Commits