When using ranger's private callback mechanism to provide context
to fold_stmt calls, we are only supposed to use the cache in read-only
mode and never calculate new values.
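A minimal Python sketch of the idea follows. It assumes a cache whose lookup takes a "calculate new values" switch. RangeCache, get_range and range_of_expr_for_fold are hypothetical names, not ranger's real API. The fold_stmt callback path only reads what is already cached and otherwise reports "unknown" (None).

```python
class RangeCache:
    """Hypothetical range cache; ranges are (low, high) tuples."""

    def __init__(self, compute):
        self._entries = {}         # SSA name -> cached range
        self._compute = compute    # solves a range from scratch
        self.computations = 0      # number of fresh computations

    def get_range(self, name, calc_new_values=True):
        if name in self._entries:
            return self._entries[name]
        if not calc_new_values:
            # Read-only mode: report "unknown" instead of filling the cache.
            return None
        self.computations += 1
        value = self._compute(name)
        self._entries[name] = value
        return value


def range_of_expr_for_fold(cache, name):
    # The fold_stmt callback path must never calculate new values.
    return cache.get_range(name, calc_new_values=False)


cache = RangeCache(lambda name: (0, 10))
print(range_of_expr_for_fold(cache, "x_1"))  # None: nothing cached yet
print(cache.get_range("x_1"))                # (0, 10), computed once
print(range_of_expr_for_fold(cache, "x_1"))  # (0, 10), read from the cache
```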
gcc/
PR tree-optimization/103122
* gimple-range.cc (gimple_ranger::range_of_expr): Request the cache
entry with "calculate new values" set to false.
gcc/testsuite/
* g++.dg/pr103122.C: New.
For nested functions we output a call to builtin_dwarf_cfa, which
initializes a frame entry used only for debugging. This, however,
prevents us from detecting functions that contain nested functions
as const/pure, and from analyzing their side effects in modref.
builtin_dwarf_cfa is not documented, and I wonder whether it should be
turned into an internal function. However, I think we could consider
functions using it const, even though in theory one can do things like
test the return address and see the difference between different frame
addresses.
While doing so I also noticed that special_builtin_state handles quite a few
builtins that are not special-cased by ipa-modref. They do not make
user-visible loads/stores, so I think they should be annotated with
".c" to make this explicit for both modref and PTA.
Finally, I added dwarf_cfa and the similar return_address to the list of
simple builtins, since they compile to a simple stack frame load (and we
already consider other builtins that do so simple).
* builtins.c (is_simple_builtin): Add builtin_dwarf_cfa
and builtin_return_address.
(builtin_fnspec): Annotate builtin_return,
builtin_eh_pointer, builtin_eh_filter, builtin_unwind_resume,
builtin_cxa_end_cleanup, builtin_eh_copy_values,
builtin_frame_address, builtin_apply_args,
builtin_asan_before_dynamic_init, builtin_asan_after_dynamic_init,
builtin_prefetch, builtin_dwarf_cfa, builtin_return_address
as ".c".
* ipa-pure-const.c (special_builtin_state): Add builtin_dwarf_cfa
and builtin_return_address.
Moves uncprop after the modref and pure/const passes and adds a comment
saying that this pass should always be last, since it is only supposed
to help PHI lowering. The pass replaces constants with SSA names that are
known to hold those constants at that point, which hardly helps other
passes.
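The following toy Python model shows the transformation. It assumes PHI arguments are keyed by incoming edge and that each edge has a map of known equivalences of the form "SSA name == constant". The data layout and the name uncprop_phi are hypothetical. Turning a constant back into a name hides that constant from later optimizers. The intended benefit is that out-of-SSA can coalesce the name instead of materializing a constant copy on the edge, which is why the pass belongs at the end.

```python
def uncprop_phi(phi_args, edge_equivalences):
    """phi_args: {edge: argument}; edge_equivalences: {edge: {constant: ssa_name}}.

    Returns new PHI arguments in which a constant argument is replaced by an
    SSA name known to hold that constant whenever the edge is taken.
    """
    result = {}
    for edge, arg in phi_args.items():
        known = edge_equivalences.get(edge, {})
        if isinstance(arg, int) and arg in known:
            result[edge] = known[arg]   # constant -> equivalent SSA name
        else:
            result[edge] = arg          # leave names and unknown constants
    return result


# if (x_1 == 0) goto bb4; ... bb4: y_5 = PHI <0(2), y_3(3)>
phi = {"2->4": 0, "3->4": "y_3"}
equivs = {"2->4": {0: "x_1"}}           # on edge 2->4, x_1 == 0 holds
print(uncprop_phi(phi, equivs))          # {'2->4': 'x_1', '3->4': 'y_3'}
```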
gcc/ChangeLog:
PR tree-optimization/103177
* passes.def: Move uncprop after pure/const and modref.
My recent patch to improve debug experience when there are removed
parameters (by ipa-sra or ipa-split) was not careful to unshare the
expressions that were then put into debug statements, which manifests
itself as PR 103099. This patch adds unsharing them using
unshare_expr_without_location which is a bit more careful with stripping
locations than what we were doing manually and so also fixes PR 103107.
gcc/ChangeLog:
2021-11-08 Martin Jambor <mjambor@suse.cz>
PR ipa/103099
PR ipa/103107
* tree-inline.c (remap_gimple_stmt): Unshare the expression without
location before invoking remap_with_debug_expressions on it.
* ipa-param-manipulation.c
(ipa_param_body_adjustments::prepare_debug_expressions): Likewise.
gcc/testsuite/ChangeLog:
2021-11-08 Martin Jambor <mjambor@suse.cz>
PR ipa/103099
PR ipa/103107
* g++.dg/ipa/pr103099.C: New test.
* gcc.dg/ipa/pr103107.c: Likewise.
The vsx_splat_v4si_di pattern uses a Power8 and a Power9 instruction.
The final condition of TARGET_DIRECT_MODE_64BIT implicitly requires Power8.
The "we" constraint requires Power9, but it also requires 64-bit mode.
Because the DImode pattern already requires 64-bit mode, this isn't
horrible, but it would be best to remove all uses of the "we" constraint.
The mtvsrws instruction itself does not require 64-bit mode.
This patch reverts the previous change to fix the breakage.
gcc/ChangeLog:
* config/rs6000/vsx.md (vsx_splat_v4si_di): Revert "wa"
constraint to "we".
The sbitmap bitmap_{set,clear}_bit changes trigger spurious
uninitialized-value-use reports from valgrind. Because we now read the
old value before setting or clearing a bit, valgrind reports
verify_loop_structure's optimization of not clearing the sbitmap.
Fixed by using a temporary BB flag instead, which should also be more
efficient in terms of cache reuse.
2021-11-08 Richard Biener <rguenther@suse.de>
* cfgloop.c (verify_loop_structure): Use a temporary BB flag
instead of an sbitmap to cache irreducible state.
The problem here is an ordering issue with a path that starts
with 19->3:
<bb 3> [local count: 916928331]:
# value_20 = PHI <value_17(19), value_7(D)(17)>
# n_27 = PHI <n_16(19), 1(17)>
n_16 = n_27 + 4;
value_17 = value_20 / 10000;
if (value_20 > 42949672959999)
goto <bb 19>; [89.00%]
else
goto <bb 4>; [11.00%]
The problem here is that both value_17 and value_20 are in the set of
imports we must pre-calculate. The value_17 name occurs first in the
bitmap, so we try to resolve it first, which causes us to recursively
solve the value_20 range. We do so correctly and put them both in the
cache. However, when we try to solve value_20 from the bitmap, we
ignore that it already has a cached entry and try to resolve the PHI
with the wrong value of value_17:
# value_20 = PHI <value_17(19), value_7(D)(17)>
The right thing to do is to avoid recalculating definitions already
solved.
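Here is a Python sketch of the guard. It models ranges as plain integers and each definition as a list of operand names plus a function. PathSolver and its members are hypothetical stand-ins for path_range_query. Resolving value_17 first solves value_20 recursively. The cache check then prevents the later request for value_20 from recomputing it.

```python
class PathSolver:
    """Hypothetical model: resolve the imports of a path in bitmap order."""

    def __init__(self, defs):
        self.defs = defs          # name -> (operand names, function)
        self.cache = {}           # name -> value already solved on this path
        self.solve_count = {}     # name -> number of actual computations

    def range_defined_in_block(self, name):
        # The fix: bail if there is already a cache entry, instead of
        # recomputing a definition that a recursive call has solved.
        if name in self.cache:
            return self.cache[name]
        operands, fn = self.defs[name]
        args = [self.range_defined_in_block(op) for op in operands]
        value = fn(*args)
        self.cache[name] = value
        self.solve_count[name] = self.solve_count.get(name, 0) + 1
        return value

    def precompute_imports(self, imports):
        for name in imports:
            self.range_defined_in_block(name)


defs = {
    "value_20": ([], lambda: 420000),
    "value_17": (["value_20"], lambda v: v // 10000),
}
solver = PathSolver(defs)
solver.precompute_imports(["value_17", "value_20"])  # bitmap order
print(solver.cache)        # {'value_20': 420000, 'value_17': 42}
print(solver.solve_count)  # {'value_20': 1, 'value_17': 1}
```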
Regstrapped, and checked the number of threads before and after, on x86-64 Linux.
gcc/ChangeLog:
PR tree-optimization/103120
* gimple-range-path.cc (path_range_query::range_defined_in_block):
Bail if there's a cache entry.
gcc/testsuite/ChangeLog:
* gcc.dg/pr103120.c: New test.
There are a few leftover places where we use the old rs6000_builtin_decls
array, but we need to use rs6000_builtin_decls_x instead when the new
builtins infrastructure is in play.
2021-11-08 Bill Schmidt <wschmidt@linux.ibm.com>
gcc/
* config/rs6000/rs6000.c (rs6000_builtin_reciprocal): Use
rs6000_builtin_decls_x when appropriate.
(add_condition_to_bb): Likewise.
(rs6000_atomic_assign_expand_fenv): Likewise.
Create a new version of this function that uses the new infrastructure,
and particularly checks for supported builtins the new way.
2021-11-08 Bill Schmidt <wschmidt@linux.ibm.com>
gcc/
* config/rs6000/rs6000-call.c (rs6000_new_builtin_decl): New function.
(rs6000_builtin_decl): Call it.
Running 'contrib/update-copyright.py' currently fails:
[...]
Traceback (most recent call last):
File "contrib/update-copyright.py", line 365, in update_copyright
canon_form = self.canonicalise_years (dir, filename, filter, years)
File "contrib/update-copyright.py", line 270, in canonicalise_years
(min_year, max_year) = self.year_range (years)
File "contrib/update-copyright.py", line 253, in year_range
year_list = [self.parse_year (year)
File "contrib/update-copyright.py", line 253, in <listcomp>
year_list = [self.parse_year (year)
File "contrib/update-copyright.py", line 250, in parse_year
raise self.BadYear (string)
TypeError: exceptions must derive from BaseException
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "contrib/update-copyright.py", line 796, in <module>
GCCCmdLine().main()
File "contrib/update-copyright.py", line 527, in main
self.copyright.process_tree (dir, filter)
File "contrib/update-copyright.py", line 458, in process_tree
self.process_file (dir, filename, filter)
File "contrib/update-copyright.py", line 421, in process_file
res = self.update_copyright (dir, filename, filter,
File "contrib/update-copyright.py", line 366, in update_copyright
except self.BadYear as e:
TypeError: catching classes that do not inherit from BaseException is not allowed
Fix up for commit 3b25e83536
"Port update-copyright.py to Python3".
contrib/
* update-copyright.py (class BadYear): Derive from 'Exception'.
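The following short Python 3 example reproduces the first error in the traceback and shows the fix. BadYearOld stands in for the pre-fix class (its exact original body is not shown here). parse_year and try_parse are hypothetical helpers, not the script's real methods.

```python
class BadYearOld:
    # Stand-in for the pre-fix class: no exception base class.
    def __init__(self, year):
        self.year = year


class BadYear(Exception):
    # The fix: derive from Exception (and therefore from BaseException).
    def __init__(self, year):
        super().__init__(year)
        self.year = year


def raise_old():
    try:
        raise BadYearOld("20xx")
    except TypeError as err:
        return str(err)


def parse_year(string):
    if not string.isdigit():
        raise BadYear(string)
    return int(string)


def try_parse(string):
    try:
        return parse_year(string)
    except BadYear as e:
        return "bad year: " + e.year


print(raise_old())         # exceptions must derive from BaseException
print(try_parse("2021"))   # 2021
print(try_parse("20xx"))   # bad year: 20xx
```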
The LD3/ST3 and LD4/ST4 address cost code had no test coverage (oops).
This patch fixes that and updates it for the new structure modes.
The test only covers Advanced SIMD because SVE doesn't have
post-increment forms.
gcc/
* config/aarch64/aarch64.c (aarch64_ldn_stn_vectors): New function.
(aarch64_address_cost): Use it instead of testing for CImode and
XImode directly.
gcc/testsuite/
* gcc.target/aarch64/neoverse_v1_1.c: New test.
I was working on a patch that needed to calculate the number of
modes in a particular class. It seemed better to have genmodes
generate this directly rather than do the kind of dance that
expmed.h had.
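Below is a Python sketch of generating these per-class counts. It uses a small hypothetical mode table and a simplified "#define NUM_MODE_<CLASS> <count>" output format. The real genmodes.c reads the mode definitions itself, and the text it emits may differ.

```python
from collections import Counter

# Hypothetical mode table: (mode, mode class).
MODES = [
    ("QI", "MODE_INT"), ("HI", "MODE_INT"), ("SI", "MODE_INT"), ("DI", "MODE_INT"),
    ("SF", "MODE_FLOAT"), ("DF", "MODE_FLOAT"),
    ("V4SI", "MODE_VECTOR_INT"),
]


def emit_num_mode_macros(modes, classes):
    counts = Counter(mode_class for _, mode_class in modes)
    lines = []
    for mode_class in classes:
        # One NUM_MODE_* macro per class; Counter yields 0 for absent classes.
        suffix = mode_class[len("MODE_"):]
        lines.append(f"#define NUM_MODE_{suffix} {counts[mode_class]}")
    return "\n".join(lines)


print(emit_num_mode_macros(MODES, ["MODE_INT", "MODE_FLOAT", "MODE_DECIMAL_FLOAT"]))
```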
gcc/
* genmodes.c (emit_insn_modes_h): Define NUM_MODE_* macros.
* expmed.h (NUM_MODE_INT): Delete in favor of genmodes definitions.
(NUM_MODE_PARTIAL_INT, NUM_MODE_VECTOR_INT): Likewise.
* real.h (real_format_for_mode): Use NUM_MODE_FLOAT and
NUM_MODE_DECIMAL_FLOAT.
(REAL_MODE_FORMAT): Likewise.
This fixes an oversight that caused vectorized epilogues to have
versioning for niters applied.
2021-11-08 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (vect_create_loop_vinfo): Add main_loop_info
parameter.
* tree-vect-loop.c (vect_create_loop_vinfo): Likewise. Set
LOOP_VINFO_ORIG_LOOP_INFO and conditionalize set of
LOOP_VINFO_NITERS_ASSUMPTIONS.
(vect_analyze_loop_1): Adjust.
(vect_analyze_loop): Move loop constraint setting and
SCEV/niter reset here from vect_create_loop_vinfo to perform
it only once.
(vect_analyze_loop_form): Move dumping of symbolic niters
here from vect_create_loop_vinfo.
Adds tracking of accesses relative to the static chain to the modref
load/store analysis. This helps some Fortran benchmarks, but it is still
quite limited. One problem is that we never discover functions with nested
functions as const, pure or not accessing global memory, because they contain
a __builtin_dwarf_cfa call, which we believe to be non-pure.
Bootstrapped/regtested x86_64-linux. I plan to commit it tomorrow if there are
no complaints and once the periodic testers pick up today's modref changes.
Honza
gcc/ChangeLog:
* ipa-modref-tree.h (enum modref_special_parms): New enum.
(struct modref_access_node): Update for special parms.
(struct modref_ref_node): Likewise.
(struct modref_parm_map): Likewise.
(struct modref_tree): Likewise.
* ipa-modref.c (dump_access): Likewise.
(get_access): Detect static chain.
(parm_map_for_arg): Take tree as arg instead of
stmt and index.
(merge_call_side_effects): Compute map for static chain.
(process_fnspec): Update.
(struct escape_point): Remove retslot_arg and static_chain_arg.
(analyze_parms): Update.
(compute_parm_map): Update.
(propagate_unknown_call): Update.
(modref_propagate_in_scc): Update.
(modref_merge_call_site_flags): Update.
(ipa_merge_modref_summary_after_inlining): Update.
* tree-ssa-alias.c (modref_may_conflict): Handle static chain.
* ipa-modref-tree.c (test_merge): Update.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/modref-12.c: New test.
gcc/
* config/rs6000/rs6000-call.c (rs6000_gimple_fold_builtin): Disable
gimple fold for VSX_BUILTIN_XVMINDP, ALTIVEC_BUILTIN_VMINFP,
VSX_BUILTIN_XVMAXDP, ALTIVEC_BUILTIN_VMAXFP when fast-math is not
set.
gcc/testsuite/
* gcc.target/powerpc/vec-minmax-1.c: New test.
* gcc.target/powerpc/vec-minmax-2.c: Likewise.
gcc/ChangeLog:
PR tree-optimization/103077
* doc/invoke.texi (Options That Control Optimization):
Update documentation for -ftree-loop-vectorize and
-ftree-slp-vectorize, which are enabled by default at -O2.
> Note that this is not safe with -fsignaling-nans, so needs to be disabled
> for that option (if there isn't already logic somewhere with that effect),
> because the extend will convert a signaling NaN to quiet (raising
> "invalid"), but copysign won't, so this transformation could result in a
> signaling NaN being wrongly returned when the original code would never
> have returned a signaling NaN.
>
> --
> Joseph S. Myers
> joseph@codesourcery.com
gcc/ChangeLog:
PR target/102464
* match.pd (Simplification (trunc)copysign((extend)a, (extend)b)
to .COPYSIGN (a, b)): Add !HONOR_SNANS.
a, b and c have the same type as the truncation type, which has less
precision than the extend type. The optimization is guarded by
flag_unsafe_math_optimizations.
gcc/ChangeLog:
PR target/102464
* match.pd: Simplify
(trunc)fma ((extend)a, (extend)b, (extend)c) to IFN_FMA (a, b,
c) under flag_unsafe_math_optimizations.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr102464-fma.c: New test.
Now that things have stabilized, we can remove the old code.
I have left the hybrid threader in tree-ssa-threadedge, even though the
VRP threader was the only user, because we may need it as an interim
step for DOM threading removal.
Tested on x86-64 Linux.
gcc/ChangeLog:
* tree-pass.h (make_pass_vrp_threader): Remove.
* tree-ssa-threadbackward.c
(back_threader_profitability::profitable_path_p): Remove
ASSERT_EXPR references.
* tree-ssa-threadedge.c (jt_state::register_equivs_stmt): Same.
* tree-vrp.c (vrp_folder::simplify_casted_conds): Same.
(execute_vrp): Same.
(class hybrid_threader): Remove.
(hybrid_threader::hybrid_threader): Remove.
(hybrid_threader::~hybrid_threader): Remove.
(hybrid_threader::before_dom_children): Remove.
(hybrid_threader::after_dom_children): Remove.
(execute_vrp_threader): Remove.
(class pass_vrp_threader): Remove.
(make_pass_vrp_threader): Remove.
While proofreading the code for handling EAF flags of !binds_to_current_def_p, I
noticed that the interprocedural dataflow actually ignores the flag. This can
introduce wrong code on quite complex interposable functions in non-trivial
recursion cycles (or at ltrans partition boundaries).
This patch unifies the flag changes into a single place (remove_useless_eaf_flags)
and extends modref_merge_call_site_flags to do the right thing.
lto-bootstrapped/regtested x86_64-linux. I plan to commit it today after a bit
more testing (firefox/clang build).
gcc/ChangeLog:
* gimple.c (gimple_call_arg_flags): Use interposable_eaf_flags.
(gimple_call_retslot_flags): Likewise.
(gimple_call_static_chain_flags): Likewise.
* ipa-modref.c (remove_useless_eaf_flags): Do not remove everything for
NOVOPS.
(modref_summary::useful_p): Likewise.
(modref_summary_lto::useful_p): Likewise.
(analyze_parms): Do not give up on NOVOPS.
(analyze_function): When dumping, report changes in EAF flags
between the IPA and local passes.
(modref_merge_call_site_flags): Compute implicit eaf flags
based on callee ecf_flags and fnspec; if the function does not
bind to current defs use interposable_eaf_flags.
(modref_propagate_flags_in_scc): Update.
* ipa-modref.h (interposable_eaf_flags): New function.
This patch forms the meat of the improvements for this patch series.
We develop a replacement for rs6000_expand_builtin and its supporting
functions, which are inefficient and difficult to maintain.
Differences between the old and new support in this patch include:
- Make use of the new builtin data structures, directly looking up
a function's information rather than searching for the function
multiple times;
- Test for enablement of builtins at expand time, to support #pragma
target changes within a compilation unit;
- Use the builtin function attributes (e.g., bif_is_cpu) to control
special handling;
- Refactor common code into one place; and
- Provide common error handling in one place for operands that are
restricted to specific values or ranges.
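The last point (common error handling for restricted operands) can be sketched in Python as a table-driven check. The table, the builtin names and the error wording are all hypothetical. They are for illustration only and are not GCC's diagnostics.

```python
# Hypothetical table: builtin name -> {operand index: (lowest, highest)}.
RESTRICTIONS = {
    "hyp_splat_s8": {0: (-16, 15)},
    "hyp_shift_left": {2: (0, 15)},
}


def check_operands(bif_name, args, restrictions=RESTRICTIONS):
    """Return None if every restricted operand is an in-range literal,
    otherwise an error message naming the first offending operand."""
    for idx, (lo, hi) in sorted(restrictions.get(bif_name, {}).items()):
        value = args[idx]
        if not isinstance(value, int) or not lo <= value <= hi:
            return f"argument {idx + 1} must be a literal between {lo} and {hi}"
    return None


print(check_operands("hyp_splat_s8", [3]))               # None
print(check_operands("hyp_shift_left", ["a", "b", 16]))  # argument 3 must be ...
```

Keeping the ranges in one table means each builtin needs only a data entry rather than its own error-reporting code.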
2021-11-07 Bill Schmidt <wschmidt@linux.ibm.com>
gcc/
* config/rs6000/rs6000-call.c (rs6000_expand_new_builtin): New
forward decl.
(rs6000_invalid_new_builtin): New function.
(rs6000_expand_builtin): Call rs6000_expand_new_builtin.
(rs6000_expand_ldst_mask): New function.
(new_cpu_expand_builtin): Likewise.
(elemrev_icode): Likewise.
(ldv_expand_builtin): Likewise.
(lxvrse_expand_builtin): Likewise.
(lxvrze_expand_builtin): Likewise.
(stv_expand_builtin): Likewise.
(new_mma_expand_builtin): Likewise.
(new_htm_spr_num): Likewise.
(new_htm_expand_builtin): Likewise.
(rs6000_expand_new_builtin): Likewise.
(rs6000_init_builtins): Initialize altivec_builtin_mask_for_load.
Implement the (long-promised) intraprocedural dataflow for
propagating EAF flags, so we can handle parameters that participate
in loops in SSA graphs. Typical examples are accessors that walk linked
lists.
I implemented dataflow using the standard iteration over BBs in RPO some time
ago, but did not like it because it had a measurable compile-time impact with
very little effect on code quality. This is why I kept mainline using the DFS
walk instead. We care about the flags of SSA names that correspond to
parameters, and those can often be determined from a small fraction of the
SSA graph, so solving dataflow for all SSA names in a function is a waste.
This patch implements dataflow more carefully. The DFS walk is kept in place to
solve acyclic cases and to discover the relevant part of the SSA graph, which
it records in a new graph. This graph is similar to the one used for
interprocedural dataflow: we only need to know the edges and whether each
access is direct or dereferenced. The RPO iterative dataflow then works on
this simplified graph.
This seems to be fast in practice. For GCC linktime we do dataflow for 4881
functions. Of those, 4726 finish in one iteration, 144 in two and 10 in three.
Overall 31979 functions are analysed, so we do dataflow for only about 15% of
them. The solver visits 131123 edges. I measured no compile-time impact from
this.
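Below is a Python sketch of an optimistic iterative solver running on an already-discovered graph. It makes several assumptions: flags are a two-bit mask (not clobbered / does not escape), nodes are given in the desired visiting order, and dereferenced edges use a hypothetical transfer rule. solve_eaf_flags is not modref's real API. The linked-list walker p_1 = PHI <param_0, p_2>, p_2 = p_1->next forms a cycle that the DFS walk alone cannot resolve. Optimistic initialization settles it in one iteration.

```python
NOCLOBBER = 0b01   # hypothetical flag: the value is not clobbered
NOESCAPE = 0b10    # hypothetical flag: the value does not escape
ALL_FLAGS = NOCLOBBER | NOESCAPE


def solve_eaf_flags(nodes, local, edges):
    """nodes: names in visiting order; local[n]: flags n's own uses allow;
    edges[n]: (succ, deref) pairs meaning n flows into succ, directly or
    through a dereference. Returns (flags per node, iterations)."""
    # Optimistic initialization: start from the best value, only remove flags.
    value = {n: ALL_FLAGS for n in nodes}
    iterations = 0
    changed = True
    while changed:
        changed = False
        iterations += 1
        for n in nodes:
            new = local[n] & value[n]
            for succ, deref in edges.get(n, []):
                # Hypothetical transfer rule for this sketch: escaping the
                # loaded value does not make the pointer itself escape.
                new &= value[succ] | (NOESCAPE if deref else 0)
            if new != value[n]:
                value[n] = new
                changed = True
    return value, iterations


nodes = ["param_0", "p_1", "p_2"]
edges = {
    "param_0": [("p_1", False)],   # p_1 = PHI <param_0, p_2>
    "p_1": [("p_2", True)],        # p_2 = p_1->next (dereference)
    "p_2": [("p_1", False)],       # back edge into the PHI
}
print(solve_eaf_flags(nodes, {n: ALL_FLAGS for n in nodes}, edges))
# ({'param_0': 3, 'p_1': 3, 'p_2': 3}, 1)
escaping = {"param_0": ALL_FLAGS, "p_1": NOCLOBBER, "p_2": ALL_FLAGS}
print(solve_eaf_flags(nodes, escaping, edges))
# ({'param_0': 1, 'p_1': 1, 'p_2': 1}, 3)
```

In the second call, p_1 is marked as escaping locally. That loss reaches param_0 on the second pass, and the solver stops after a third pass that changes nothing.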
gcc/ChangeLog:
* ipa-modref.c (modref_lattice): Add do_dataflow,
changed and propagate_to fields.
(modref_lattice::release): Free propagate_to.
(modref_lattice::merge): Do not give up early on unknown
lattice values.
(modref_lattice::merge_deref): Likewise.
(modref_eaf_analysis): Update toplevel comment.
(modref_eaf_analysis::analyze_ssa_name): Record postponed ssa names;
do optimistic dataflow initialization.
(modref_eaf_analysis::merge_with_ssa_name): Build dataflow graph.
(modref_eaf_analysis::propagate): New member function.
(analyze_parms): Update to new API of modref_eaf_analysis.
gcc/ChangeLog:
* cgraph.h (cgraph_node::can_be_discarded_p): Do not
return true on functions from other partition.
gcc/lto/ChangeLog:
PR ipa/103070
PR ipa/103058
* lto-partition.c (must_not_rename): Update comment.
(promote_symbol): Set resolution to LDPR_PREVAILING_DEF_IRONLY.
Tamar's recent patch to teach CSE to perform vector extract exercises
VSX splat more frequently, which exposed a constraint error for the
vsx_splat patterns. The pattern could be created for Power9, but
the "we" constraint only provided alternatives in 64-bit mode. The
instructions are valid in 32-bit mode, and SImode is allowed in VSX
registers. This patch updates the constraints from "we" to "wa" to
allow the pattern and fix the failing testcases.
gcc/ChangeLog:
* config/rs6000/vsx.md (vsx_splat_v4si): Change constraints to "wa".
(vsx_splat_v4si_di): Change constraint to "wa".
The problem here is that we are incorrectly threading 41->20->21 here:
<bb 35> [local count: 56063504182]:
_134 = M.10_120 + 1;
if (_71 <= _134)
goto <bb 19>; [11.00%]
else
goto <bb 41>; [89.00%]
...
...
...
<bb 41> [local count: 49896518755]:
<bb 20> [local count: 56063503181]:
# lb_75 = PHI <_134(41), 1(18)>
_117 = mstep_49 + lb_75;
_118 = _117 + -1;
_119 = mstep_49 + _118;
M.10_120 = MIN_EXPR <_119, _71>;
if (lb_75 > M.10_120)
goto <bb 21>; [11.00%]
else
goto <bb 22>; [89.00%]
First, lb_75 == _134 because of the PHI.
Second, _134 > M.10_120 because of _134 = M.10_120 + 1.
We then assume that lb_75 > M.10_120, but this is incorrect because
M.10_120 was killed along the path.
This incorrect thread causes the miscompilation in 527.cam4_r.
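Here is a Python sketch of the fix. It stores relations as (name, name) pairs. PathOracle reuses the method names from the ChangeLog, but it is a simplified, hypothetical model with no transitive reasoning. In this model, once M.10_120 is redefined on the path, the oracle no longer consults the root-oracle fact _134 > M.10_120, which was derived before the redefinition.

```python
class PathOracle:
    """Hypothetical path relation oracle that tracks killed definitions."""

    def __init__(self, root_relations):
        self.root = root_relations   # {(a, b): relation} known off the path
        self.path = {}               # relations registered along the path
        self.killed_defs = set()     # names redefined along the path

    def killing_def(self, name):
        # A new definition on the path: drop path relations mentioning the
        # name and remember that root-oracle facts about it are stale.
        self.killed_defs.add(name)
        self.path = {k: v for k, v in self.path.items() if name not in k}

    def register_relation(self, a, b, rel):
        self.path[(a, b)] = rel

    def query_relation(self, a, b):
        if (a, b) in self.path:
            return self.path[(a, b)]
        if a in self.killed_defs or b in self.killed_defs:
            return None              # do not look at the root oracle
        return self.root.get((a, b))


o = PathOracle({("_134", "M.10_120"): ">"})   # from _134 = M.10_120 + 1
print(o.query_relation("_134", "M.10_120"))   # >
o.register_relation("lb_75", "_134", "==")   # PHI on the path
o.killing_def("M.10_120")                     # M.10_120 redefined in bb 20
print(o.query_relation("_134", "M.10_120"))   # None
print(o.query_relation("lb_75", "_134"))      # ==
```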
Tested on x86-64 and ppc64le Linux.
gcc/ChangeLog:
PR tree-optimization/103061
* value-relation.cc (path_oracle::path_oracle): Initialize
m_killed_defs.
(path_oracle::killing_def): Set m_killed_defs.
(path_oracle::query_relation): Do not look at the root oracle for
killed defs.
* value-relation.h (class path_oracle): Add m_killed_defs.
AIX does not provide memalign, so the testcases must use
posix_memalign for portability on AIX.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/tsvc/tsvc.h (init): Use posix_memalign on AIX.
The main path discovery function was due for a cleanup. First,
there's a nagging goto and second, my bitmap use was sloppy. Hopefully
this makes the code easier for others to read.
Regstrapped on x86-64 Linux. I also made sure there were no differences
in the number of threads with this patch.
No functional changes.
gcc/ChangeLog:
* tree-ssa-threadbackward.c (back_threader::find_paths_to_names):
Remove gotos and other cleanups.
gcc/ChangeLog:
PR ipa/103073
* ipa-modref-tree.h (modref_tree::insert): Do nothing for
paradoxical and zero sized accesses.
gcc/testsuite/ChangeLog:
PR ipa/103073
* g++.dg/torture/pr103073.C: New test.
* gcc.dg/tree-ssa/modref-11.c: New test.
gcc/fortran/ChangeLog:
PR fortran/69419
* match.c (gfc_match_common): Check array spec of a symbol in a
COMMON object list and reject it if it is a coarray.
gcc/testsuite/ChangeLog:
PR fortran/69419
* gfortran.dg/pr69419.f90: New test.
These declarations should be noexcept, now that I have added it to the
definitions in <valarray>.
libstdc++-v3/ChangeLog:
* include/bits/range_access.h (begin(valarray), end(valarray)):
Add noexcept.
2021-11-05 Sandra Loosemore <sandra@codesourcery.com>
PR fortran/35276
gcc/fortran/
* gfortran.texi (Mixed-Language Programming): Talk about C++,
and how to link.
Currently all the tsvc tests fail to build on Darwin because
they assume that <malloc.h> and memalign() are available.
For Darwin, <stdlib.h> is sufficient to obtain the declarations
for malloc and the port has posix_memalign () but not memalign.
Fixed as below.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
gcc/testsuite/ChangeLog:
* gcc.dg/vect/tsvc/tsvc.h: Do not try to include malloc.h
on Darwin; also use posix_memalign () there.