OpenE2K/gcc - gcc - Expired Mentality Git

Author	SHA1	Message	Date
Richard Biener	8865133614	tree-optimization/103188 - avoid running ranger on not-up-to-date SSA The following splits loop header copying into an analysis phase that uses ranger and a transform phase that can do without it, to avoid running ranger on IL whose SSA form has not been updated. 2021-11-11 Richard Biener <rguenther@suse.de> PR tree-optimization/103188 * tree-ssa-loop-ch.c (should_duplicate_loop_header_p): Remove query parameter, split out check for size optimization. (ch_base::m_ranger, ch_base::m_query): Remove. (ch_base::copy_headers): Split processing loop into analysis around which we allocate and use ranger and transform where we do not. (pass_ch::execute): Do not allocate/free ranger here. (pass_ch_vect::execute): Likewise. * gcc.dg/torture/pr103188.c: New testcase.	2021-11-11 15:01:26 +01:00
Jan Hubicka	6e30c48120	Fix recursion discovery in ipa-pure-const We mark self-recursive functions as looping for fear of endless recursion. This is done correctly for local pure/const and for non-trivial SCCs in the callgraph, but for trivial SCCs we miss the flag. I think it is a bad decision, since infinite recursion will run out of stack, but changing it upsets some testcases and should be done independently. So this patch just makes the current behaviour consistent. gcc/ChangeLog: 2021-11-11 Jan Hubicka <hubicka@ucw.cz> * ipa-pure-const.c (propagate_pure_const): Self recursion is a side effect.	2021-11-11 14:39:19 +01:00
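As a small illustration of the kind of function this commit is about, here is a hypothetical example (the name `count_down` is invented). It reads only its argument and writes no memory, so apart from the recursion it looks const; but for negative `n` it never reaches its base case, which is why self-recursion is treated as possible looping, i.e. a side effect.

```c
/* Hypothetical example: const-looking apart from self-recursion.
   count_down (5) terminates, but count_down (-1) would recurse until the
   stack runs out (or loop forever if the tail call becomes a jump), so an
   IPA pass cannot assume the call always returns.  */
int count_down (int n)
{
  if (n == 0)
    return 0;
  return count_down (n - 1);
}
```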
Jan Hubicka	61396dfb2a	Fix noreturn discovery. Fix ipa-pure-const handling of noreturn flags. It is not safe to set it for interposable symbols and we should also set it for aliases (just like we do for other flags). This patch merely copies other flag handling and implements it here. gcc/ChangeLog: 2021-11-11 Jan Hubicka <hubicka@ucw.cz> * cgraph.c (set_noreturn_flag_1): New function. (cgraph_node::set_noreturn_flag): New member function * cgraph.h (cgraph_node::set_noreturn_flags): Declare. * ipa-pure-const.c (pass_local_pure_const::execute): Use it.	2021-11-11 14:35:10 +01:00
Patrick Palka	e106221db2	c++: use auto_vec in cp_parser_template_argument_list gcc/cp/ChangeLog: * parser.c (cp_parser_template_argument_list): Use auto_vec instead of manual memory management.	2021-11-11 08:10:20 -05:00
Jakub Jelinek	fa4fcb111a	libgomp: Use TLS storage for omp_get_num_teams()/omp_get_team_num() values When thinking about GOMP_teams3, I've realized that using global variables for the values returned by omp_get_num_teams()/omp_get_team_num() calls is incorrect even with our current dumb way of implementing host teams. There are two problems. One is if host teams is used from multiple pthread_create created threads: the spec says that host teams can't be nested inside of explicit parallel or other teams constructs, but the standard obviously says nothing about pthread_create. Another, more important, one is host fallback. Right now we don't do anything for omp_get_num_teams() or omp_get_team_num(), which was fine before host teams was introduced and before the 5.1 requirement that the num_teams clause specifies the minimum number of teams, but with the global vars it means that inside of target teams num_teams (2) we happily return omp_get_num_teams() == 4 if the target teams is inside of host teams with num_teams(4). With target fallback being invoked from parallel regions, global vars simply can't work right on the host. So, this patch moves them to struct gomp_thread and propagates those for parallel to child threads. For host fallback, the implicit zeroing of thr results in us returning omp_get_num_teams () == 1 and omp_get_team_num () == 0, which is fine for target teams without a num_teams clause; target teams with a num_teams clause is something to work on, and for target without teams nested in it I've asked on omp-lang what should be done. 2021-11-11 Jakub Jelinek <jakub@redhat.com> * libgomp.h (struct gomp_thread): Add num_teams and team_num members. * team.c (struct gomp_thread_start_data): Likewise. (gomp_thread_start): Initialize thr->num_teams and thr->team_num. (gomp_team_start): Initialize start_data->num_teams and start_data->team_num. Update nthr->num_teams and nthr->team_num. * teams.c (gomp_num_teams, gomp_team_num): Remove. (GOMP_teams_reg): Set and restore thr->num_teams and thr->team_num instead of gomp_num_teams and gomp_team_num. (omp_get_num_teams): Use thr->num_teams + 1 instead of gomp_num_teams. (omp_get_team_num): Use thr->team_num instead of gomp_team_num. * testsuite/libgomp.c/teams-4.c: New test.	2021-11-11 13:57:31 +01:00
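The core idea above (per-thread instead of global storage, with `num_teams` stored as "count minus one" so zeroed storage means one team, team 0) can be sketched in plain C11. This is a hypothetical model, not libgomp code: `sketch_thread`, `sketch_run_host_teams` and the other names are invented, and the loop that runs the body once per team on the calling thread is an assumption about how host teams behave.

```c
/* Hypothetical per-thread team state; field names mirror the ChangeLog
   (num_teams, team_num) but this is not libgomp's struct gomp_thread.
   num_teams holds "teams - 1", so zero-initialised storage reads as one
   team with team number 0.  */
struct sketch_thread
{
  unsigned num_teams;
  unsigned team_num;
};

/* One instance per thread instead of one global per process.  */
static _Thread_local struct sketch_thread thr;

int sketch_get_num_teams (void) { return (int) thr.num_teams + 1; }
int sketch_get_team_num (void) { return (int) thr.team_num; }

/* Run BODY once per team on the calling thread (assumes num_teams >= 1),
   saving and restoring the previous values the way the ChangeLog
   describes for GOMP_teams_reg.  */
void sketch_run_host_teams (unsigned num_teams, void (*body) (void))
{
  struct sketch_thread saved = thr;
  thr.num_teams = num_teams - 1;
  for (unsigned t = 0; t < num_teams; t++)
    {
      thr.team_num = t;
      body ();
    }
  thr = saved;
}

/* Example team body that records what it observes.  */
static int seen_num_teams, team_num_sum;
static void record_team (void)
{
  seen_num_teams = sketch_get_num_teams ();
  team_num_sum += sketch_get_team_num ();
}
```

Outside any teams region the zeroed thread-local state reports 1 team and team 0; after `sketch_run_host_teams (4, record_team)` the body has seen 4 teams, and the previous values are restored afterwards.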
Aldy Hernandez	3e5a190533	Resolve entry loop condition for the edge remaining in the loop. There is a known failure for gfortran.dg/vector_subscript_1.f90. It was previously failing for all optimization levels except -Os. Getting the loop header copying right now makes it fail at all levels :-). Tested on x86-64 Linux. Co-authored-by: Richard Biener <rguenther@suse.de> gcc/ChangeLog: * tree-ssa-loop-ch.c (entry_loop_condition_is_static): Resolve statically to the edge remaining in the loop.	2021-11-11 13:17:32 +01:00
Richard Biener	a5fed4063f	middle-end/103181 - fix operation_could_trap_p for vector division For integer vector division we only checked for all zero vector constants rather than checking whether any element in the constant vector is zero. 2021-11-11 Richard Biener <rguenther@suse.de> PR middle-end/103181 * tree-eh.c (operation_could_trap_helper_p): Properly check vector constants for a zero element for integer division. Separate floating point and integer division code. * gcc.dg/torture/pr103181.c: New testcase.	2021-11-11 10:32:51 +01:00
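The difference between the old and new check can be shown on plain arrays standing in for vector constants. This is a hypothetical sketch (the helper names are invented, and it only models the zero-divisor case, not the rest of `operation_could_trap_helper_p`): the old test asked whether the divisor was the all-zero vector, while a divisor such as {1, 0, 2, 3} can also trap.

```c
#include <stdbool.h>
#include <stddef.h>

/* Old-style test: is the divisor the all-zero vector?  */
static bool all_elements_zero (const long *elts, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (elts[i] != 0)
      return false;
  return true;
}

/* Fixed test: does ANY element of the divisor equal zero?  */
static bool any_element_zero (const long *elts, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (elts[i] == 0)
      return true;
  return false;
}

/* Integer vector division by a constant may trap if any lane divides
   by zero (hypothetical helper, zero-divisor case only).  */
bool vector_int_div_could_trap (const long *divisor, size_t n)
{
  return any_element_zero (divisor, n);
}
```

For {1, 0, 2, 3} the old test says "not all zero" and would have wrongly treated the division as non-trapping; the per-element test catches the zero lane.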
Jakub Jelinek	10db757301	dwarf2out: Fix up field_byte_offset [PR101378] For PCC_BITFIELD_TYPE_MATTERS, field_byte_offset has had quite large code to deal with it for many years (see it e.g. in GCC 3.2, although it used to work on HOST_WIDE_INTs, then on double_ints, and now on offset_ints). But that code apparently isn't able to cope with members of empty class types with the [[no_unique_address]] attribute, because the empty classes have non-zero type size but zero decl size, and so the computation can end up with a negative offset or an offset 1 byte smaller than it should be. For !PCC_BITFIELD_TYPE_MATTERS, we just use tree_result = byte_position (decl); which seems exactly right even for the empty classes or anything which is not a bitfield (and for which we don't add a DW_AT_bit_offset attribute). So, instead of trying to handle those no_unique_address members in the current, already very complicated code, this limits it to bitfields. The stor-layout.c PCC_BITFIELD_TYPE_MATTERS handling also affects only bitfields: twice it checks DECL_BIT_FIELD and once DECL_BIT_FIELD_TYPE. As discussed, this patch uses the DECL_BIT_FIELD_TYPE check, because DECL_BIT_FIELD might be cleared for some bitfields with bitsizes that are multiples of BITS_PER_UNIT, and e.g. struct S { int e; int a : 1, b : 7, c : 8, d : 16; } s; struct T { int a : 1, b : 7; long long c : 8; int d : 16; } t; int main () { s.c = 0x55; s.d = 0xaaaa; t.c = 0x55; t.d = 0xaaaa; s.e++; } has different debug info with the DECL_BIT_FIELD check. 2021-11-11 Jakub Jelinek <jakub@redhat.com> PR debug/101378 * dwarf2out.c (field_byte_offset): Do the PCC_BITFIELD_TYPE_MATTERS handling only for DECL_BIT_FIELD_TYPE decls. * g++.dg/debug/dwarf2/pr101378.C: New test.	2021-11-11 10:16:45 +01:00
Prathamesh Kulkarni	145be5efaf	[aarch64] PR102376 - Emit better diagnostic for arch extensions in target attr. gcc/ChangeLog: PR target/102376 * config/aarch64/aarch64.c (aarch64_process_target_attr): Check if token is arch extension without leading '+' and emit appropriate diagnostic for the same. gcc/testsuite/ChangeLog: PR target/102376 * gcc.target/aarch64/pr102376.c: New test.	2021-11-11 14:40:21 +05:30
Jakub Jelinek	48d7327f2a	openmp: Add support for 2 argument num_teams clause In OpenMP 5.1, the num_teams clause can accept either one expression as before, but in that case its meaning changed: rather than creating <= expression teams, it now creates == expression teams. Or it accepts two expressions separated by :, meaning that the first is the lower bound and the second the upper bound on how many teams should be created. The other ways to set the number of teams are upper bounds with a lower bound of 1. The following patch does the parsing of this for C/C++. For host teams, we actually don't need to do anything further right now: we always create (pretend to create) exactly the requested number of teams, so we can just evaluate and throw away the lower bound for now. For teams nested in target, we don't guarantee that though, and further work will be needed. In particular, omplower now turns the teams part of: struct S { S (); S (const S &); ~S (); int s; }; void bar (S &, S &); int baz (); _Pragma ("omp declare target to (baz)"); void foo (void) { S a, b; #pragma omp target private (a) map (b) { #pragma omp teams firstprivate (b) num_teams (baz ()) { bar (a, b); } } } into: retval.0 = baz (); retval.1 = retval.0; { unsigned int retval.3; struct S * D.2549; struct S b; retval.3 = (unsigned int) retval.1; D.2549 = .omp_data_i->b; S::S (&b, D.2549); #pragma omp teams num_teams(retval.1) firstprivate(b) shared(a) __builtin_GOMP_teams (retval.3, 0); { bar (&a, &b); } S::~S (&b); #pragma omp return(nowait) } IMHO we want a new API, say GOMP_teams3, which will take 3 arguments instead of 2 (the lower and upper bounds from num_teams and thread_limit) and will return a bool saying whether it should do the teams body or not. And we should add, right before the outermost {} above, while (__builtin_GOMP_teams3 ((unsigned) retval.1, (unsigned) retval.1, 0)) and remove the __builtin_GOMP_teams call. The current function performs the equivalent of exit (at least on NVPTX), which seems bad because that means the destructors of e.g. private variables on target aren't invoked, and at the current placement neither are the destructors of the already constructed privatized variables in teams. I'll do this next on the compiler side, but I'm afraid I'll need help with the nvptx and amdgcn implementations. E.g. for nvptx, we won't be able to use %ctaid.x. I think the ideal would be to use a .shared integer variable for the omp_get_team_num value, but I don't have any experience with that: are .shared variables zero-initialized by default, or do they have a random value at start? The PTX docs say they aren't initializable. 2021-11-11 Jakub Jelinek <jakub@redhat.com> gcc/ * tree.h (OMP_CLAUSE_NUM_TEAMS_EXPR): Rename to ... (OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR): ... this. (OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR): Define. * tree.c (omp_clause_num_ops): Increase num ops for OMP_CLAUSE_NUM_TEAMS to 2. * tree-pretty-print.c (dump_omp_clause): Print optional lower bound for OMP_CLAUSE_NUM_TEAMS. * gimplify.c (gimplify_scan_omp_clauses): Gimplify OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR if non-NULL. (optimize_target_teams): Use OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR instead of OMP_CLAUSE_NUM_TEAMS_EXPR. Handle OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR. * omp-low.c (lower_omp_teams): Use OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR instead of OMP_CLAUSE_NUM_TEAMS_EXPR. * omp-expand.c (expand_teams_call, get_target_arguments): Likewise. gcc/c/ * c-parser.c (c_parser_omp_clause_num_teams): Parse optional lower-bound and store it into OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR. Use OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR instead of OMP_CLAUSE_NUM_TEAMS_EXPR. (c_parser_omp_target): For OMP_CLAUSE_NUM_TEAMS evaluate before combined target teams even lower-bound expression. gcc/cp/ * parser.c (cp_parser_omp_clause_num_teams): Parse optional lower-bound and store it into OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR. Use OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR instead of OMP_CLAUSE_NUM_TEAMS_EXPR. (cp_parser_omp_target): For OMP_CLAUSE_NUM_TEAMS evaluate before combined target teams even lower-bound expression. * semantics.c (finish_omp_clauses): Handle OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR of OMP_CLAUSE_NUM_TEAMS clause. * pt.c (tsubst_omp_clauses): Likewise. (tsubst_expr): For OMP_CLAUSE_NUM_TEAMS evaluate before combined target teams even lower-bound expression. gcc/fortran/ * trans-openmp.c (gfc_trans_omp_clauses): Use OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR instead of OMP_CLAUSE_NUM_TEAMS_EXPR. gcc/testsuite/ * c-c++-common/gomp/clauses-1.c (bar): Supply lower-bound expression to half of the num_teams clauses. * c-c++-common/gomp/num-teams-1.c: New test. * c-c++-common/gomp/num-teams-2.c: New test. * g++.dg/gomp/attrs-1.C (bar): Supply lower-bound expression to half of the num_teams clauses. * g++.dg/gomp/attrs-2.C (bar): Likewise. * g++.dg/gomp/num-teams-1.C: New test. * g++.dg/gomp/num-teams-2.C: New test. libgomp/ * testsuite/libgomp.c-c++-common/teams-1.c: New test.	2021-11-11 09:42:47 +01:00
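The three cases the commit describes (one expression means exactly that many teams, `lo:hi` gives a range, any other way of setting the count is an upper bound with lower bound 1) map naturally onto a lower/upper pair, which is roughly what splitting the clause into LOWER and UPPER operands expresses. The sketch below is hypothetical: the struct and function names are invented, and the validity rule `1 <= lower <= upper` is an assumption about what counts as a usable pair.

```c
#include <stdbool.h>

/* Hypothetical lower/upper pair for the number of teams.  */
struct teams_bounds
{
  unsigned lower;
  unsigned upper;
};

/* num_teams (n): in OpenMP 5.1, exactly n teams (previously "at most n").  */
struct teams_bounds bounds_from_num_teams (unsigned n)
{
  struct teams_bounds b = { n, n };
  return b;
}

/* num_teams (lo : hi): at least lo and at most hi teams.  */
struct teams_bounds bounds_from_num_teams_range (unsigned lo, unsigned hi)
{
  struct teams_bounds b = { lo, hi };
  return b;
}

/* Any other way of setting the number of teams: upper bound only.  */
struct teams_bounds bounds_from_upper_only (unsigned hi)
{
  struct teams_bounds b = { 1, hi };
  return b;
}

/* Assumption: a usable pair needs 1 <= lower <= upper.  */
bool bounds_valid (struct teams_bounds b)
{
  return b.lower >= 1 && b.lower <= b.upper;
}

/* Whether a runtime that chose to create COUNT teams honours B.  */
bool count_satisfies (struct teams_bounds b, unsigned count)
{
  return count >= b.lower && count <= b.upper;
}
```

Under the 5.1 reading, `num_teams (4)` is no longer satisfied by creating 3 teams, whereas `num_teams (2 : 8)` accepts anything from 2 to 8.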
Richard Biener	0136f25ac0	Remove find_pdom and find_dom This removes now useless wrappers around get_immediate_dominator. 2021-11-11 Richard Biener <rguenther@suse.de> * cfganal.c (find_pdom): Remove. (control_dependences::find_control_dependence): Remove special-casing of entry block, call get_immediate_dominator directly. * gimple-predicate-analysis.cc (find_pdom): Remove. (find_dom): Likewise. (find_control_equiv_block): Call get_immediate_dominator directly. (compute_control_dep_chain): Likewise. (predicate::init_from_phi_def): Likewise.	2021-11-11 09:20:15 +01:00
Richard Biener	a11afa7af8	Apply TLC to control dependence compute This makes the control dependence compute avoid a find_edge and optimizes allocation by embedding the bitmap head into the vector of control dependences instead of allocating all of them. It also uses a local bitmap obstack. The bitmap changes make it necessary to shuffle some includes. 2021-11-10 Richard Biener <rguenther@suse.de> * cfganal.h (control_dependences::control_dependence_map): Embed bitmap_head. (control_dependences::m_bitmaps): New. * cfganal.c (control_dependences::set_control_dependence_map_bit): Adjust. (control_dependences::clear_control_dependence_bitmap): Likewise. (control_dependences::find_control_dependence): Do not find_edge for the abnormal edge test. (control_dependences::control_dependences): Instead do not add abnormal edges to the edge list. Adjust. (control_dependences::~control_dependences): Likewise. (control_dependences::get_edges_dependent_on): Likewise. * function-tests.c: Include bitmap.h. gcc/analyzer/ * supergraph.cc: Include bitmap.h. gcc/c/ * gimple-parser.c: Shuffle bitmap.h include.	2021-11-11 09:19:49 +01:00
Kewen Lin	a97fdde627	rs6000/doc: Rename future cpu with power10 Commit `5d9d0c9458` renamed future to power10 and `ace60939fd` updated the documentation for the "future" renaming. This patch renames the remaining "future architecture" references in the documentation and polishes the wording for float128. gcc/ChangeLog: * doc/invoke.texi: Change references to "future cpu" to "power10", "-mcpu=future" to "-mcpu=power10". Adjust wording for float128.	2021-11-10 19:59:18 -06:00
Cui,Lili	4f442a3bcb	x86: Update -mtune=alderlake Update -mtune for Alder Lake. Alder Lake Intel Hybrid Technology will not support Intel® AVX-512; ISA features such as Intel® AVX, AVX-VNNI, Intel® AVX2, and UMONITOR/UMWAIT/TPAUSE are supported. gcc/ChangeLog: * config/i386/i386-options.c (m_CORE_AVX2): Remove Alderlake from m_CORE_AVX2. (processor_cost_table): Use alderlake_cost for Alderlake. * config/i386/i386.c (ix86_sched_init_global): Handle Alderlake. * config/i386/x86-tune-costs.h (struct processor_costs): Add alderlake cost. * config/i386/x86-tune-sched.c (ix86_issue_rate): Change Alderlake issue rate to 4. (ix86_adjust_cost): Handle Alderlake. * config/i386/x86-tune.def (X86_TUNE_SCHEDULE): Enable for Alderlake. (X86_TUNE_PARTIAL_REG_DEPENDENCY): Likewise. (X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY): Likewise. (X86_TUNE_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY): Likewise. (X86_TUNE_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY): Likewise. (X86_TUNE_MEMORY_MISMATCH_STALL): Likewise. (X86_TUNE_USE_LEAVE): Likewise. (X86_TUNE_PUSH_MEMORY): Likewise. (X86_TUNE_USE_INCDEC): Likewise. (X86_TUNE_INTEGER_DFMODE_MOVES): Likewise. (X86_TUNE_MISALIGNED_MOVE_STRING_PRO_EPILOGUES): Likewise. (X86_TUNE_USE_SAHF): Likewise. (X86_TUNE_USE_BT): Likewise. (X86_TUNE_AVOID_FALSE_DEP_FOR_BMI): Likewise. (X86_TUNE_ONE_IF_CONV_INSN): Likewise. (X86_TUNE_AVOID_MFENCE): Likewise. (X86_TUNE_USE_SIMODE_FIOP): Likewise. (X86_TUNE_EXT_80387_CONSTANTS): Likewise. (X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL): Likewise. (X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL): Likewise. (X86_TUNE_SSE_TYPELESS_STORES): Likewise. (X86_TUNE_SSE_LOAD0_BY_PXOR): Likewise. (X86_TUNE_AVOID_4BYTE_PREFIXES): Likewise. (X86_TUNE_USE_GATHER): Disable for Alderlake. (X86_TUNE_AVX256_MOVE_BY_PIECES): Likewise. (X86_TUNE_AVX256_STORE_BY_PIECES): Likewise.	2021-11-11 09:28:23 +08:00
liuhongt	e166cada08	Extend vpcmov to handle V8HF/V16HFmode under TARGET_XOP. gcc/ChangeLog: PR target/103151 * config/i386/sse.md (V_128_256): Extend to V8HF/V16HF. (avxsizesuffix): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/pr103151.c: New test.	2021-11-11 09:25:53 +08:00
Kito Cheng	402d28998f	RISC-V: Fix wrong zifencei handling in riscv_subset_list::to_string This issue caused zifencei never to be appended correctly to the ISA string. gcc/ChangeLog * common/config/riscv/riscv-common.c (riscv_subset_list::to_string): Fix wrong macro checking.	2021-11-11 08:46:53 +08:00
GCC Administrator	8d36a0d288	Daily bump.	2021-11-11 00:16:28 +00:00
Aldy Hernandez	e82c382971	Allow loop header copying when first iteration condition is known. As discussed in the PR, the loop header copying pass avoids doing so when optimizing for size. However, sometimes we can determine the loop entry conditional statically for the first iteration of the loop. This patch uses the path solver to try to determine the outgoing edge out of preheader->header->xx. If it can, it allows header copying. Doing this in the loop optimizer saves us from doing gymnastics in the threader, which doesn't have the context to determine if a loop transformation is profitable. I am only returning true in entry_loop_condition_is_static for a true conditional. Technically a false conditional is also provably static, but allowing any boolean value causes a regression in gfortran.dg/vector_subscript_1.f90. I would have preferred not to pass around the query object, but the layout of pass_ch and should_duplicate_loop_header_p makes it a bit awkward to get it right without an outright refactor of the pass. Tested on x86-64 Linux. gcc/ChangeLog: PR tree-optimization/102906 * tree-ssa-loop-ch.c (entry_loop_condition_is_static): New. (should_duplicate_loop_header_p): Call entry_loop_condition_is_static. (class ch_base): Add m_ranger and m_query. (ch_base::copy_headers): Pass m_query to entry_loop_condition_is_static. (pass_ch::execute): Allocate and deallocate m_ranger and m_query. (pass_ch_vect::execute): Same. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/pr102906.c: New test.	2021-11-10 23:13:27 +01:00
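Loop header copying (loop rotation) is easiest to see at the source level. The sketch below is hand-written C, not pass output: it shows a top-tested loop and the same loop after the header's condition has been copied in front as an entry guard, leaving a bottom-tested do-while. The copy normally costs code size, which is why the pass avoids it when optimizing for size; the commit's point is that when the first-iteration test can be resolved statically (here, for example, if `n` were a known positive constant), the guard folds away and the copy is free.

```c
/* Source form: the loop condition is tested at the top (the header).  */
int sum_while (const int *a, int n)
{
  int s = 0, i = 0;
  while (i < n)
    {
      s += a[i];
      i++;
    }
  return s;
}

/* After header copying, sketched by hand: the condition is duplicated in
   front of the loop as a first-iteration guard, and the loop itself tests
   at the bottom.  */
int sum_rotated (const int *a, int n)
{
  int s = 0, i = 0;
  if (i < n)            /* copied header: entry test for iteration 0 */
    do
      {
        s += a[i];
        i++;
      }
    while (i < n);
  return s;
}
```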
Andrew Pinski	c744ae0897	[COMMITTED] aarch64: [PR103170] Fix aarch64_simd_dup<mode> The problem here is that aarch64_simd_dup<mode> uses the vw iterator rather than the vwcore iterator. This causes problems for the V4SF and V2DF modes. I changed both of the aarch64_simd_dup<mode> patterns to be consistent. Committed as obvious after a bootstrap/test on aarch64-linux-gnu. PR target/103170 gcc/ChangeLog: * config/aarch64/aarch64-simd.md (aarch64_simd_dup<mode>): Use vwcore iterator for the r constraint output string. gcc/testsuite/ChangeLog: * gcc.c-torture/compile/vector-dup-1.c: New test.	2021-11-10 22:06:23 +00:00
Harald Anlauf	abc2f01914	Fortran: avoid NULL pointer dereferences CLASS(), PARAMETER is not yet properly implemented in gfortran. Using it in declarations could lead to subsequent NULL pointer dereferences during checking or simplification of expressions involving those CLASS variables. gcc/fortran/ChangeLog: PR fortran/103137 PR fortran/103138 * check.c (gfc_check_shape): Avoid NULL pointer dereference on missing ref. * simplify.c (gfc_simplify_cshift): Avoid NULL pointer dereference when shape not set. (gfc_simplify_transpose): Likewise.	2021-11-10 20:30:27 +01:00
H.J. Lu	b83705b477	Add a testcase for PR tree-optimization/102892 PR tree-optimization/102892 is fixed by commit `4b3a325f07` Author: Aldy Hernandez <aldyh@redhat.com> Date: Thu Oct 28 15:35:21 2021 +0200 Remove VRP threader passes in exchange for better threading pre-VRP. PR tree-optimization/102892 * gcc.dg/pr102892-1.c: New file. * gcc.dg/pr102892-2.c: Likewise.	2021-11-10 11:27:38 -08:00
Martin Sebor	7c8a416da8	Adjust test to avoid target-specific failures [PR103161]. Resolves: PR testsuite/103161 - Better ranges cause builtin-sprintf-warn-16.c failure gcc/testsuite: PR testsuite/103161 * gcc.dg/tree-ssa/builtin-sprintf-warn-16.c: Avoid relying on argument evaluation order. Cast width and precision to signed to avoid undefined behavior.	2021-11-10 11:39:35 -07:00
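Both test fixes follow general C rules: the order in which function arguments are evaluated is unspecified, and a `*` field width or precision in a printf-style format is supplied by an `int` argument. A minimal sketch, with the invented helper `next_value` standing in for any argument expression with side effects: evaluating into named temporaries pins down the order, and the casts give the width and precision the signed type the format expects.

```c
#include <stdio.h>

/* Hypothetical side-effecting argument source.  */
static unsigned counter;
static unsigned next_value (void) { return ++counter; }

/* Writing snprintf (buf, size, "%*.*i", next_value (), next_value (),
   next_value ()) would not specify which call supplies the width.
   Sequencing the calls through temporaries makes the order explicit, and
   the casts turn the unsigned values into the ints that '*' expects.  */
int format_with_width (char *buf, size_t size)
{
  int width = (int) next_value ();      /* first call: width */
  int precision = (int) next_value ();  /* second call: precision */
  int value = (int) next_value ();      /* third call: value */
  return snprintf (buf, size, "%*.*i", width, precision, value);
}
```

With the counter starting at 0 this formats 3 with width 1 and precision 2, producing "03".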
Qing Zhao	1c04af34c9	Apply pattern initialization only when have_insn_for returns true. For -ftrivial-auto-var-init=pattern, initialize the variable with patterns only when have_insn_for (SET, mode) returns true. Otherwise initialize it with zeros. With this change, _Complex long double on x86 is initialized to zero for pattern initialization. gcc/ChangeLog: 2021-11-10 qing zhao <qing.zhao@oracle.com> * internal-fn.c (expand_DEFERRED_INIT): Apply pattern initialization only when have_insn_for returns true for the mode. Fix a memory leak. gcc/testsuite/ChangeLog: 2021-11-10 qing zhao <qing.zhao@oracle.com> * gcc.target/i386/auto-init-6.c: _Complex long double is initialized to zero now with -ftrivial-auto-var-init=pattern.	2021-11-10 17:59:31 +00:00
Christophe Lyon	1200e211a8	arm: Initialize vector costing fields The movi, dup and extract costing fields were recently added to struct vector_cost_table, but their initialization is missing for the arm (aarch32) specific descriptions. Although the arm port does not use these fields (only aarch64 does), this causes warnings during the build, and even build failures when using gcc-4.8.5 as the host compiler: /gccsrc/gcc/config/arm/arm.c:1194:1: error: uninitialized const member 'vector_cost_table::movi' }; ^ /gccsrc/gcc/config/arm/arm.c:1194:1: warning: missing initializer for member 'vector_cost_table::movi' [-Wmissing-field-initializers] /gccsrc/gcc/config/arm/arm.c:1194:1: error: uninitialized const member 'vector_cost_table::dup' /gccsrc/gcc/config/arm/arm.c:1194:1: warning: missing initializer for member 'vector_cost_table::dup' [-Wmissing-field-initializers] /gccsrc/gcc/config/arm/arm.c:1194:1: error: uninitialized const member 'vector_cost_table::extract' /gccsrc/gcc/config/arm/arm.c:1194:1: warning: missing initializer for member 'vector_cost_table::extract' [-Wmissing-field-initializers] This patch uses the same initialization values as in aarch64 for consistency: + COSTS_N_INSNS (1), /* movi. */ + COSTS_N_INSNS (2), /* dup. */ + COSTS_N_INSNS (2) /* extract. */ 2021-11-10 Christophe Lyon <christophe.lyon@foss.st.com> gcc/ * config/arm/arm.c (cortexa9_extra_costs, cortexa8_extra_costs, cortexa5_extra_costs, cortexa7_extra_costs, cortexa12_extra_costs, cortexa15_extra_costs, v7m_extra_costs): Initialize movi, dup and extract costing fields.	2021-11-10 16:58:50 +00:00
Aldy Hernandez	b0c83d59f4	path solver: Adjustments for use outside of the backward threader. Here are some enhancements to make it easier for other clients to use the path solver. First, I've made the imports to the solver optional, since we can calculate them ourselves. However, I've left the ability to set them, since the backward threader adds a few SSA names in addition to the default ones. As a follow-up I may move all the import setup code from the threader to the solver, as the extra imports tend to improve the behavior slightly. Second, Richi suggested an entry point where you just feed the solver an edge, which will be quite convenient for a subsequent patch adding a client in the header copying pass. This required some shuffling, since we'll be adding the blocks on the fly. There's now a vector copy, but the impact will be minimal, since these are just 5-6 entries at the most. Tested on ppc64le Linux. gcc/ChangeLog: * gimple-range-path.cc (path_range_query::path_range_query): Do not init m_path. (path_range_query::dump): Change m_path uses to non-pointer. (path_range_query::defined_outside_path): Same. (path_range_query::set_path): Same. (path_range_query::add_copies_to_imports): Same. (path_range_query::range_of_stmt): Same. (path_range_query::compute_outgoing_relations): Same. (path_range_query::compute_ranges): Imports are now optional. Implement overload that takes an edge. * gimple-range-path.h (class path_range_query): Make imports optional for compute_ranges. Add compute_ranges(edge) overload. Make m_path an auto_vec instead of a pointer and adjust accordingly.	2021-11-10 17:45:01 +01:00
Tamar Christina	86ffc845b2	AArch64: do not keep negated mask and inverse mask live at the same time The following example: void f11(double * restrict z, double * restrict w, double * restrict x, double * restrict y, int n) { for (int i = 0; i < n; i++) { z[i] = (w[i] > 0) ? w[i] : y[i]; } } Generates currently: ptrue p2.b, all ld1d z0.d, p0/z, [x1, x2, lsl 3] fcmgt p1.d, p2/z, z0.d, #0.0 bic p3.b, p2/z, p0.b, p1.b ld1d z1.d, p3/z, [x3, x2, lsl 3] and after the previous patches generates: ptrue p3.b, all ld1d z0.d, p0/z, [x1, x2, lsl 3] fcmgt p1.d, p0/z, z0.d, #0.0 fcmgt p2.d, p3/z, z0.d, #0.0 not p1.b, p0/z, p1.b ld1d z1.d, p1/z, [x3, x2, lsl 3] where a duplicate comparison is performed for w[i] > 0. This is because in the vectorizer we're emitting a comparison for both a and ~a where we just need to emit one of them and invert the other. After this patch we generate: ld1d z0.d, p0/z, [x1, x2, lsl 3] fcmgt p1.d, p0/z, z0.d, #0.0 mov p2.b, p1.b not p1.b, p0/z, p1.b ld1d z1.d, p1/z, [x3, x2, lsl 3] In order to perform the check I have to fully expand the NOT stmts when recording them as the SSA names for the top level expressions differ but their arguments don't. e.g. in _31 = ~_34 the value of _34 differs but not the operands in _34. But we only do this when the operation is an ordered one because mixing ordered and unordered expressions can lead to de-optimized code. Note: This patch series is working incrementally towards generating the most efficient code for this and other loops in small steps. The mov is created by postreload when it does a late CSE. gcc/ChangeLog: * tree-vectorizer.h (struct scalar_cond_masked_key): Add inverted_p. (default_hash_traits<scalar_conf_masked_key>): Likewise. * tree-vect-stmts.c (vectorizable_condition): Check if inverse of mask is live. * tree-vectorizer.c (scalar_cond_masked_key::get_cond_ops_from_tree): Register mask inverses. gcc/testsuite/ChangeLog: * gcc.target/aarch64/sve/pred-not-gen-1.c: Update testcase. * gcc.target/aarch64/sve/pred-not-gen-2.c: Update testcase. * gcc.target/aarch64/sve/pred-not-gen-3.c: Update testcase. * gcc.target/aarch64/sve/pred-not-gen-4.c: Update testcase.	2021-11-10 16:03:18 +00:00
Tamar Christina	8ed62c929c	middle-end: Add an RPO pass after successful vectorization Following my current SVE predicate optimization series, a problem has presented itself: the way vector masks are generated for masked operations relies on CSE to share masks efficiently. The issue, however, is that masking is done using the & operator, and & is associative, so reassoc decides to reassociate the masked operations. This then makes CSE unable to CSE an unmasked and a masked operation, leading to duplicate operations being performed. To counter this we want to add an RPO pass over the vectorized loop body when vectorization succeeds. This means it is no longer reliant on RTL-level CSE. I have not added a testcase for this as it requires the changes in my patch series; however, the entire series relies on this patch to work, so all the tests there cover it. gcc/ChangeLog: * tree-vectorizer.c (vectorize_loops): Do local CSE through RPO VN upon successful vectorization.	2021-11-10 15:58:15 +00:00
Andrew MacLeod	eaec20fde5	Grow sbr_vector in ranger's on-entry cache as needed. The on-entry cache does not expect the number of BBs to change. This could happen in various scenarios, recently in the suggestion to use ranger with loop unswitching and also with a work in progress to use the path solver in the loopch pass. This patch fixes both. This is a patch from Andrew, who tested it on x86-64 Linux. gcc/ChangeLog: * gimple-range-cache.cc (sbr_vector::grow): New. (sbr_vector::set_bb_range): Call grow. (sbr_vector::get_bb_range): Same. (sbr_vector::bb_range_p): Remove assert.	2021-11-10 16:51:30 +01:00
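The fix pattern (a cache indexed by basic-block number that grows on demand instead of asserting when new blocks appear) can be sketched in C. Everything here is hypothetical: ranges are simplified to `int`, the names are invented, and unlike the ChangeLog, where `get_bb_range` also calls `grow`, this sketch simply reports "no range" for a block number past the end rather than growing on a read.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-basic-block cache that tolerates new blocks.  */
struct bb_cache
{
  int *ranges;     /* simplified stand-in for a value range */
  bool *present;   /* whether ranges[bb] holds a value */
  size_t size;
};

/* Make sure index BB is in bounds, doubling the capacity as needed.  */
static bool bb_cache_grow (struct bb_cache *c, size_t bb)
{
  if (bb < c->size)
    return true;
  size_t new_size = c->size ? c->size : 8;
  while (new_size <= bb)
    new_size *= 2;
  int *r = realloc (c->ranges, new_size * sizeof *r);
  if (!r)
    return false;
  c->ranges = r;
  bool *p = realloc (c->present, new_size * sizeof *p);
  if (!p)
    return false;
  c->present = p;
  /* New slots start empty.  */
  memset (c->present + c->size, 0, (new_size - c->size) * sizeof *p);
  c->size = new_size;
  return true;
}

bool bb_cache_set (struct bb_cache *c, size_t bb, int range)
{
  if (!bb_cache_grow (c, bb))
    return false;
  c->ranges[bb] = range;
  c->present[bb] = true;
  return true;
}

/* A block created after the cache was sized just has no entry yet.  */
bool bb_cache_get (const struct bb_cache *c, size_t bb, int *range)
{
  if (bb >= c->size || !c->present[bb])
    return false;
  *range = c->ranges[bb];
  return true;
}

void bb_cache_free (struct bb_cache *c)
{
  free (c->ranges);
  free (c->present);
  c->ranges = NULL;
  c->present = NULL;
  c->size = 0;
}
```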
Tamar Christina	5ba247ade1	AArch64: Remove shuffle pattern for rounding variant. This removes the patterns that optimize the rounding shift and narrow. The optimization is valid only for the truncating rounding shift and narrow; for the rounding shift and narrow we need a different pattern, which I will submit separately. This wasn't noticed before because the benchmarks did not run conformance checks as part of the run; they now do, and this now passes again. gcc/ChangeLog: * config/aarch64/aarch64-simd.md (aarch64_topbits_shuffle<mode>_le, aarch64_topbits_shuffle<mode>_be): Remove. gcc/testsuite/ChangeLog: * gcc.target/aarch64/shrn-combine-8.c: Update. * gcc.target/aarch64/shrn-combine-9.c: Update.	2021-11-10 15:10:09 +00:00
Jan Hubicka	992644c351	Extend modref by side-effect analysis Make modref also collect info on whether a function has side effects. This allows pure/const function detection, and also handling functions which do store some memory in a similar way to how we handle pure/consts now. The code is symmetric to what ipa-pure-const does. Modref is actually more capable of proving that a given function is pure/const (since it understands that a non-pure function can be called when it only modifies data on the stack), so we could retire ipa-pure-const's pure-const discovery at some point. However, this patch only does the analysis; the consumers of this flag will come next. Bootstrapped/regtested x86_64-linux. I plan to commit it later today if there are no complaints. gcc/ChangeLog: * ipa-modref.c: Include tree-eh.h. (modref_summary::modref_summary): Initialize side_effects. (struct modref_summary_lto): New bool field side_effects. (modref_summary_lto::modref_summary_lto): Initialize side_effects. (modref_summary::dump): Dump side_effects. (modref_summary_lto::dump): Dump side_effects. (merge_call_side_effects): Merge side effects. (process_fnspec): Calls to non-const/pure or looping functions are a side effect. (analyze_call): Self-recursion is a side-effect; handle special builtins. (analyze_load): Watch for volatile and throwing memory. (analyze_store): Likewise. (analyze_stmt): Watch for volatile asm. (analyze_function): Handle side_effects. (modref_summaries::duplicate): Duplicate side_effects. (modref_summaries_lto::duplicate): Likewise. (modref_write): Stream side_effects. (read_section): Likewise. (update_signature): Update. (propagate_unknown_call): Handle side_effects. (modref_propagate_in_scc): Likewise. * ipa-modref.h (struct modref_summary): Add side_effects. * ipa-pure-const.c (special_builtin_state): Rename to ... (builtin_safe_for_const_function_p): ... this one. (check_call): Update. (finite_function_p): Break out from ... (propagate_pure_const): ... here. * ipa-utils.h (finite_function): Declare.	2021-11-10 16:00:40 +01:00
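Here is a hypothetical illustration of "a non-pure function can be called when it only modifies data on the stack" (both function names are invented). `fill_local` writes through its pointer argument, so on its own it is not pure or const. `sum_squares` calls it only on its own local array, so callers observe no side effect. A rule of the form "calls a non-pure function, therefore not const" rejects `sum_squares`, while an analysis that tracks which memory is modified can in principle accept it. Per the commit message, this patch only computes the information; whether GCC actually marks such a function const depends on the follow-up consumers.

```c
/* Writes through BUF: not pure/const by itself.  */
static void fill_local (int *buf, int n)
{
  for (int i = 0; i < n; i++)
    buf[i] = i * i;
}

/* Only modifies its own stack frame, so it has no side effects visible
   to callers.  */
int sum_squares (int n)
{
  int buf[16];
  if (n > 16)
    n = 16;
  fill_local (buf, n);
  int s = 0;
  for (int i = 0; i < n; i++)
    s += buf[i];
  return s;
}
```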
Jan Hubicka	a5c9b9bc2b	Fix typo in modref-13.c gcc/testsuite/ChangeLog: 2021-11-10 Jan Hubicka <hubicka@ucw.cz> * gcc.dg/tree-ssa/modref-13.c: Fix typo.	2021-11-10 15:48:47 +01:00
Lucas A. M. Magalhaes	9598134a05	rs6000: Remove LINK_OS_EXTRA_SPEC{32,64} from --with-advance-toolchain Historically this was added to fill gaps from ld.so.cache on early AT releases. These are now just causing errors and rework. Since AT5.0, the AT's ld.so uses a correctly configured ld.so.cache and sets DT_INTERP to the AT's ld.so. These two factors are sufficient for an AT-built program to get the correct libraries. GCC configured with --with-advance-toolchain has issues building glibc releases because it adds DT_RUNPATH to ld.so, and that's unsupported. 2021-11-10 Lucas A. M. Magalhães <lamm@linux.ibm.com> gcc/ * config.gcc (powerpc*-*-*): Remove -rpath from --with-advance-toolchain.	2021-11-10 14:32:09 +00:00
Marek Polacek	a1ad0d84d7	attribs: Implement -Wno-attributes=vendor::attr [PR101940] It is desirable for -Wattributes to warn about e.g. [[deprecate]] void g(); // typo, should warn However, -Wattributes also warns about vendor-specific attributes (that's because lookup_scoped_attribute_spec -> find_attribute_namespace finds nothing), which, with -Werror, causes grief. We don't want the -Wattributes warning for [[company::attr]] void f(); GCC warns because it doesn't know the "company" namespace; it only knows the "gnu" and "omp" namespaces. We could entirely disable warning about attributes in unknown scopes, but then the compiler would also miss typos like [[company::attrx]] void f(); or [[gmu::warn_used_result]] int write(); so that is not a viable solution. A workaround is to use a #pragma: #pragma GCC diagnostic push #pragma GCC diagnostic ignored "-Wattributes" [[company::attr]] void f() {} #pragma GCC diagnostic pop but that's a mouthful, awkward to use, and could also hide typos. In fact, any macro-based solution doesn't seem like a way forward. This patch implements -Wno-attributes=, which takes arguments of the form company::attr or company::. This option should go well with using @file: the user could have a file containing -Wno-attributes=vendor::attr1,vendor::attr2 and then invoke gcc with '@attrs' or similar. I've also added a new pragma, #pragma GCC diagnostic ignored_attributes, which has the same effect. The pragma along with the new option should help with various static analysis tools. PR c++/101940 gcc/ChangeLog: * attribs.c (struct scoped_attributes): Add a bool member. (lookup_scoped_attribute_spec): Forward declare. (register_scoped_attributes): New bool parameter, defaulted to false. Use it. (handle_ignored_attributes_option): New function. (free_attr_data): New function. (init_attributes): Call handle_ignored_attributes_option. (attr_namespace_ignored_p): New function. (decl_attributes): Check attr_namespace_ignored_p before warning. * attribs.h (free_attr_data): Declare. (register_scoped_attributes): Adjust declaration. (handle_ignored_attributes_option): Declare. (canonicalize_attr_name): New function template. (canonicalize_attr_name): Use it. * common.opt (Wattributes=): New option with a variable. * doc/extend.texi: Document #pragma GCC diagnostic ignored_attributes. * doc/invoke.texi: Document -Wno-attributes=. * opts.c (common_handle_option) <case OPT_Wattributes_>: Handle. * plugin.h (register_scoped_attributes): Adjust declaration. * toplev.c (compile_file): Call free_attr_data. gcc/c-family/ChangeLog: * c-pragma.c (handle_pragma_diagnostic): Handle #pragma GCC diagnostic ignored_attributes. gcc/testsuite/ChangeLog: * c-c++-common/Wno-attributes-1.c: New test. * c-c++-common/Wno-attributes-2.c: New test. * c-c++-common/Wno-attributes-3.c: New test.	2021-11-10 09:17:13 -05:00
Przemyslaw Wirkus	9701f153f6	arm: enable cortex-a710 CPU This patch adds support for the Cortex-A710 CPU to the Arm back end. gcc/ChangeLog: * config/arm/arm-cpus.in (cortex-a710): New CPU. * config/arm/arm-tables.opt: Regenerate. * config/arm/arm-tune.md: Regenerate. * doc/invoke.texi: Update docs.	2021-11-10 14:11:09 +00:00
Andre Vieira	03f7843c3f	[AArch64] Fix bootstrap failure due to missing ATTRIBUTE_UNUSED gcc/ChangeLog: * config/aarch64/aarch64-builtins.c (aarch64_general_gimple_fold_builtin): Mark argument as unused.	2021-11-10 12:58:10 +00:00
Martin Liska	c905e72471	lto-wrapper: fix memory corruption. The first argument of merge_and_complain is actually the vector into which we merge options, so changes to it must be propagated back to the caller. Fixes: ==6656== Invalid read of size 8 ==6656== at 0x408056: merge_and_complain (lto-wrapper.c:335) ==6656== by 0x408056: find_and_merge_options(int, long, char const, vec<cl_decoded_option, va_heap, vl_ptr>, vec<cl_decoded_option, va_heap, vl_ptr>, char const) (lto-wrapper.c:1139) ==6656== by 0x408AFC: run_gcc(unsigned int, char) (lto-wrapper.c:1505) ==6656== by 0x4061A2: main (lto-wrapper.c:2138) ==6656== Address 0x4e69b18 is 344 bytes inside a block of size 1,768 free'd ==6656== at 0x484339F: realloc (vg_replace_malloc.c:1192) ==6656== by 0x4993C0: xrealloc (xmalloc.c:181) ==6656== by 0x406A82: reserve<cl_decoded_option> (vec.h:290) ==6656== by 0x406A82: reserve (vec.h:1858) ==6656== by 0x406A82: vec<cl_decoded_option, va_heap, vl_ptr>::safe_push(cl_decoded_option const&) [clone .isra.0] (vec.h:1967) ==6656== by 0x4077E0: merge_and_complain (lto-wrapper.c:457) ==6656== by 0x4077E0: find_and_merge_options(int, long, char const, vec<cl_decoded_option, va_heap, vl_ptr>, vec<cl_decoded_option, va_heap, vl_ptr>, char const) (lto-wrapper.c:1139) ==6656== by 0x408AFC: run_gcc(unsigned int, char*) (lto-wrapper.c:1505) ==6656== by 0x4061A2: main (lto-wrapper.c:2138) ==6656== Block was alloc'd at ==6656== at 0x483E70F: malloc (vg_replace_malloc.c:380) ==6656== by 0x4993D7: xrealloc (xmalloc.c:179) ==6656== by 0x407476: reserve<cl_decoded_option> (vec.h:290) ==6656== by 0x407476: reserve (vec.h:1858) ==6656== by 0x407476: reserve_exact (vec.h:1878) ==6656== by 0x407476: create (vec.h:1893) ==6656== by 0x407476: get_options_from_collect_gcc_options(char const, char const) (lto-wrapper.c:163) ==6656== by 0x407674: find_and_merge_options(int, long, char const, vec<cl_decoded_option, va_heap, vl_ptr>, vec<cl_decoded_option, va_heap, vl_ptr>, char const) (lto-wrapper.c:1132) ==6656== by 0x408AFC: run_gcc(unsigned int, char*) (lto-wrapper.c:1505) ==6656== by 0x4061A2: main (lto-wrapper.c:2138) gcc/ChangeLog: lto-wrapper.c (merge_and_complain): Make the first argument a reference type.	2021-11-10 13:43:24 +01:00
Richard Sandiford	6d331688fc	aarch64: Tweak FMAX/FMIN iterators There was some duplication between the maxmin_uns (uns for unspec rather than unsigned) int attribute and the optab int attribute. The difficulty for FMAXNM and FMINNM is that the instructions really correspond to two things: the smax/smin optabs for floats (used only for fast-math-like flags) and the fmax/fmin optabs (used for built-in functions). The optab attribute was consistently for the former but maxmin_uns had a mixture of both. This patch renames maxmin_uns to fmaxmin and only uses it for the fmax and fmin optabs. The reductions that previously used the maxmin_uns attribute now use the optab attribute instead. FMAX and FMIN are awkward in that they don't correspond to any optab. It's nevertheless useful to define them alongside the “real” optabs. Previously they were known as “smax_nan” and “smin_nan”, but the problem with those names is that smax and smin are only used for floats if NaNs don't matter. This patch therefore uses fmax_nan and fmin_nan instead. There is still some inconsistency, in that the optab attribute handles UNSPEC_COND_FMAX but the fmaxmin attribute handles UNSPEC_FMAX. This is because the SVE FP instructions, being predicated, have to use unspecs in cases where the Advanced SIMD ones could use rtl codes. At least there are no duplicate entries though, so this seemed like the best compromise for now. gcc/ * config/aarch64/iterators.md (optab): Use fmax_nan instead of smax_nan and fmin_nan instead of smin_nan. (maxmin_uns): Rename to... (fmaxmin): ...this and make the same changes. Remove entries unrelated to fmax* and fmin. * config/aarch64/aarch64.md (<maxmin_uns><mode>3): Rename to... (<fmaxmin><mode>3): ...this. * config/aarch64/aarch64-simd.md (aarch64_<maxmin_uns>p<mode>): Rename to... (aarch64_<optab>p<mode>): ...this. (<maxmin_uns><mode>3): Rename to... (<fmaxmin><mode>3): ...this. (reduc_<maxmin_uns>_scal_<mode>): Rename to... (reduc_<optab>_scal_<mode>): ...this and update gen* call. (aarch64_reduc_<maxmin_uns>_internal<mode>): Rename to... (aarch64_reduc_<optab>_internal<mode>): ...this. (aarch64_reduc_<maxmin_uns>_internalv2si): Rename to... (aarch64_reduc_<optab>_internalv2si): ...this. * config/aarch64/aarch64-sve.md (<maxmin_uns><mode>3): Rename to... (<fmaxmin><mode>3): ...this. * config/aarch64/aarch64-simd-builtins.def (smax_nan, smin_nan): Rename to... (fmax_nan, fmin_nan): ...this. * config/aarch64/arm_neon.h (vmax_f32, vmax_f64, vmaxq_f32, vmaxq_f64) (vmin_f32, vmin_f64, vminq_f32, vminq_f64, vmax_f16, vmaxq_f16) (vmin_f16, vminq_f16): Update accordingly.	2021-11-10 12:38:43 +00:00
Richard Sandiford	0612883d9d	vect: Pass scalar_costs to finish_cost When finishing the vector costs, it can be useful to know what the associated scalar costs were. This allows targets to read information collected about the original scalar loop when trying to make a final judgement about the cost of the vector code. This patch therefore passes the scalar costs to vector_costs::finish_cost. The parameter is null for the scalar costs themselves. gcc/ * tree-vectorizer.h (vector_costs::finish_cost): Take the corresponding scalar costs as a parameter. (finish_cost): Likewise. * tree-vect-loop.c (vect_compute_single_scalar_iteration_cost) (vect_estimate_min_profitable_iters): Update accordingly. * tree-vect-slp.c (vect_bb_vectorization_profitable_p): Likewise. * tree-vectorizer.c (vector_costs::finish_cost): Likewise. * config/aarch64/aarch64.c (aarch64_vector_costs::finish_cost): Likewise. * config/rs6000/rs6000.c (rs6000_cost_data::finish_cost): Likewise.	2021-11-10 12:31:02 +00:00
Richard Sandiford	6ddc6a57a7	vect: Keep scalar costs around longer The scalar costs for a loop are fleeting, with only the final single_scalar_iteration_cost being kept for later comparison. This patch replaces single_scalar_iteration_cost with the cost structure, so that (with later patches) it's possible for targets to examine other target-specific cost properties as well. This will be done by passing the scalar costs to hooks where appropriate; targets shouldn't try to read the information directly from loop_vec_infos. gcc/ * tree-vectorizer.h (_loop_vec_info::scalar_costs): New member variable. (_loop_vec_info::single_scalar_iteration_cost): Delete. (LOOP_VINFO_SINGLE_SCALAR_ITERATION_COST): Delete. (vector_costs::total_cost): New function. * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Update after above changes. (_loop_vec_info::~_loop_vec_info): Delete scalar_costs. (vect_compute_single_scalar_iteration_cost): Store the costs in loop_vinfo->scalar_costs. (vect_estimate_min_profitable_iters): Get the scalar cost from loop_vinfo->scalar_costs.	2021-11-10 12:31:01 +00:00
Richard Sandiford	5720a9d5be	vect: Hookize better_loop_vinfo_p One of the things we want to do on AArch64 is compare vector loops side-by-side and pick the best one. For some targets, we want this to be based on issue rates as well as the usual latency-based costs (at least for loops with relatively high iteration counts). The current approach to doing this is: when costing vectorisation candidate A, try to guess what the other main candidate B will look like and adjust A's latency-based cost up or down based on the likely difference between A and B's issue rates. This effectively means that we try to cost parts of B at the same time as A, without actually being able to see B. This is needlessly indirect and complex. It was a compromise due to the code being added (too) late in the GCC 11 cycle, so that target-independent changes weren't possible. The target-independent code already compares two candidate loop_vec_infos side-by-side, so the information about A and B above is available directly. This patch creates a way for targets to hook into this comparison. The AArch64 code can therefore hook into better_main_loop_than_p to compare issue rates. If the issue rate comparison isn't decisive, the code can fall back to the normal latency-based comparison instead. gcc/ * tree-vectorizer.h (vector_costs::better_main_loop_than_p) (vector_costs::better_epilogue_loop_than_p) (vector_costs::compare_inside_loop_cost) (vector_costs::compare_outside_loop_cost): Likewise. * tree-vectorizer.c (vector_costs::better_main_loop_than_p) (vector_costs::better_epilogue_loop_than_p) (vector_costs::compare_inside_loop_cost) (vector_costs::compare_outside_loop_cost): New functions, containing code moved from... * tree-vect-loop.c (vect_better_loop_vinfo_p): ...here.	2021-11-10 12:31:01 +00:00
Richard Sandiford	772d76acb5	vect: Remove vec_outside/inside_cost fields The vector costs now use a common base class instead of being completely abstract. This means that there's no longer a need to record the inside and outside costs separately. gcc/ * tree-vectorizer.h (_loop_vec_info): Remove vec_outside_cost and vec_inside_cost. (vector_costs::outside_cost): New function. * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Update after above. (vect_estimate_min_profitable_iters): Likewise. (vect_better_loop_vinfo_p): Get the inside and outside costs from the loop_vec_infos' vector_costs.	2021-11-10 12:31:00 +00:00
Richard Sandiford	4725f62789	vect: Move vector costs to loop_vec_info target_cost_data is in vec_info but is really specific to loop_vec_info. This patch moves it there and renames it to vector_costs, to distinguish it from scalar target costs. gcc/ * tree-vectorizer.h (vec_info::target_cost_data): Replace with... (_loop_vec_info::vector_costs): ...this. (LOOP_VINFO_TARGET_COST_DATA): Delete. * tree-vectorizer.c (vec_info::vec_info): Remove target_cost_data initialization. (vec_info::~vec_info): Remove corresponding delete. * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize vector_costs to null. (_loop_vec_info::~_loop_vec_info): Delete vector_costs. (vect_analyze_loop_operations): Update after above changes. (vect_analyze_loop_2): Likewise. (vect_estimate_min_profitable_iters): Likewise. * tree-vect-slp.c (vect_slp_analyze_operations): Likewise.	2021-11-10 12:31:00 +00:00
Jan Hubicka	d70ef65692	Make EAF flags more regular (and expressive) I had hoped that I was done with EAF flags related changes, but while looking into the Fortran testcases I noticed that I had designed them in an unnecessarily restricted way. I followed the scheme of NOESCAPE and NODIRECTESCAPE, which is however the only property that is naturally transitive. This patch replaces the existing flags by 9 flags: EAF_UNUSED; EAF_NO_DIRECT_CLOBBER and EAF_NO_INDIRECT_CLOBBER; EAF_NO_DIRECT_READ and EAF_NO_INDIRECT_READ; EAF_NO_DIRECT_ESCAPE and EAF_NO_INDIRECT_ESCAPE; EAF_NOT_RETURNED_DIRECTLY and EAF_NOT_RETURNED_INDIRECTLY. So I have removed the unified EAF_DIRECT flag and made each of the flags come in a direct and an indirect variant. The indirect variant is no longer implied by the direct one (except for escape, but that is not special-cased in the code). Consequently we can analyse e.g. the case where a function reads directly and clobbers indirectly, as in the following testcase: struct wrap { void **array; }; __attribute__ ((noinline)) void write_array (struct wrap *ptr) { ptr->array[0]=0; } int test () { void *arrayval; struct wrap w = {&arrayval}; write_array (&w); return w.array == &arrayval; } This is pretty common in array descriptors and also C++ pointer wrappers or structures containing pointers to arrays. Another advantage is that for !binds_to_current_def_p functions we can still track the fact that the value is not clobbered indirectly, while previously we implied EAF_DIRECT for all three cases. Finally, the propagation becomes more regular and, I hope, easier to understand because the flags are handled in a symmetric way. In tree-ssa-structalias I now produce a "callarg" var_info as before and, if necessary, also an "indircallarg" for the indirect accesses. I added some logic to optimize the common case where we cannot distinguish direct from indirect accesses. gcc/ChangeLog: 2021-11-09 Jan Hubicka <hubicka@ucw.cz> tree-core.h (EAF_DIRECT): Remove. (EAF_NOCLOBBER): Remove. (EAF_UNUSED): Remove. (EAF_NOESCAPE): Remove. (EAF_NO_DIRECT_CLOBBER): New. (EAF_NO_INDIRECT_CLOBBER): New. (EAF_NODIRECTESCAPE): Remove. (EAF_NO_DIRECT_ESCAPE): New. (EAF_NO_INDIRECT_ESCAPE): New. (EAF_NOT_RETURNED): Remove. (EAF_NOT_RETURNED_INDIRECTLY): New. (EAF_NOREAD): Remove. (EAF_NO_DIRECT_READ): New. (EAF_NO_INDIRECT_READ): New. * gimple.c (gimple_call_arg_flags): Update for new flags. (gimple_call_retslot_flags): Update for new flags. * ipa-modref.c (dump_eaf_flags): Likewise. (remove_useless_eaf_flags): Likewise. (deref_flags): Likewise. (modref_lattice::init): Likewise. (modref_lattice::merge): Likewise. (modref_lattice::merge_direct_load): Likewise. (modref_lattice::merge_direct_store): Likewise. (modref_eaf_analysis::merge_call_lhs_flags): Likewise. (callee_to_caller_flags): Likewise. (modref_eaf_analysis::analyze_ssa_name): Likewise. (modref_eaf_analysis::propagate): Likewise. (modref_merge_call_site_flags): Likewise. * ipa-modref.h (interposable_eaf_flags): Likewise. * tree-ssa-alias.c: (ref_maybe_used_by_call_p_1) Likewise. * tree-ssa-structalias.c (handle_call_arg): Likewise. (handle_rhs_call): Likewise. * tree-ssa-uninit.c (maybe_warn_pass_by_reference): Likewise. gcc/testsuite/ChangeLog: * g++.dg/ipa/modref-1.C: Update template. * gcc.dg/ipa/modref-3.c: Update template. * gcc.dg/lto/modref-3_0.c: Update template. * gcc.dg/lto/modref-4_0.c: Update template. * gcc.dg/tree-ssa/modref-10.c: Update template. * gcc.dg/tree-ssa/modref-11.c: Update template. * gcc.dg/tree-ssa/modref-5.c: Update template. * gcc.dg/tree-ssa/modref-6.c: Update template. * gcc.dg/tree-ssa/modref-13.c: New test.	2021-11-10 13:08:41 +01:00
Tamar Christina	0cf6065ce4	testsuite: change vect_long to vect_long_long in complex tests. These tests are still failing on SPARC and it looks like this is because I need to use vect_long_long instead of vect_long. gcc/testsuite/ChangeLog: PR testsuite/103042 * gcc.dg/vect/complex/bb-slp-complex-add-pattern-long.c: Use vect_long_long instead of vect_long. * gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-long.c: Likewise. * gcc.dg/vect/complex/vect-complex-add-pattern-long.c: Likewise. * gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-long.c: Likewise.	2021-11-10 12:05:50 +00:00
Tamar Christina	92617a8e2a	middle-end: Fix signbit tests when run on ISAs with support for masks. These tests don't work on vector ISAs where the truth type doesn't match the vector mode of the operation. However, I still want the tests to run on these architectures, just with the ISA modes that enable masks turned off. This thus turns off SVE if it's on and turns off AVX512 if it's on. gcc/testsuite/ChangeLog: * gcc.dg/signbit-2.c: Turn off masks. * gcc.dg/signbit-5.c: Likewise.	2021-11-10 12:05:50 +00:00
Tamar Christina	5cfa174e08	vect: remove unused variable in complex numbers detection code. This removes an unused variable that Clang catches when compiling GCC. gcc/ChangeLog: * tree-vect-slp-patterns.c (complex_mul_pattern::matches): Remove l1node.	2021-11-10 12:05:49 +00:00
Jonathan Wakely	77963796ae	libstdc++: Fix test for libstdc++ not including <unistd.h> [PR100117] The <cxxx> headers for the C library are not under our control, so we can't prevent them from including <unistd.h>. Change the PR 49745 test to only include the C++ library headers, not the <cxxx> ones. To ensure <bits/stdc++.h> isn't included automatically we need to use no_pch to disable PCH. libstdc++-v3/ChangeLog: PR libstdc++/100117 * testsuite/17_intro/headers/c++1998/49745.cc: Explicitly list all C++ headers instead of including <bits/stdc++.h>	2021-11-10 12:03:29 +00:00
Jonathan Wakely	80fe172ba9	libstdc++: Disable gthreads weak symbols for glibc 2.34 [PR103133] Since glibc 2.34, all pthreads symbols are defined directly in libc, not libpthread, and since glibc 2.32 we have used __libc_single_threaded to avoid unnecessary locking in single-threaded programs. This means there is no reason to avoid linking to libpthread now, and so no reason to use weak symbols defined in gthr-posix.h for all the pthread_xxx functions. libstdc++-v3/ChangeLog: PR libstdc++/100748 PR libstdc++/103133 * config/os/gnu-linux/os_defines.h (_GLIBCXX_GTHREAD_USE_WEAK): Define for glibc 2.34 and later.	2021-11-10 12:01:27 +00:00
Richard Biener	b2cd32b743	testsuite/102690 - XFAIL g++.dg/warn/Warray-bounds-16.C This XFAILs the bogus diagnostic test and rectifies the expectation on the optimization. 2021-11-10 Richard Biener <rguenther@suse.de> PR testsuite/102690 * g++.dg/warn/Warray-bounds-16.C: XFAIL diagnostic part and optimization.	2021-11-10 11:09:56 +01:00
Andre Vieira	0f68560161	[AArch64] Fix TBAA information when lowering NEON loads and stores to gimple This patch fixes the wrong TBAA information when lowering NEON loads and stores to gimple that showed up when bootstrapping with UBSAN. gcc/ChangeLog: * config/aarch64/aarch64-builtins.c (aarch64_general_gimple_fold_builtin): Change pointer alignment and alias. gcc/testsuite/ChangeLog: * gcc.target/aarch64/simd/lowering_tbaa.c: New test.	2021-11-10 09:52:49 +00:00

189764 Commits