gcc/ada/
* sem_ch13.adb (Freeze_Entity_Checks): Analyze the expression of
a pragma Predicate associated with an aspect at the freeze point
of the type, to ensure that references to globals get saved when
the aspect occurs within a generic body. Also, add
Aspect_Static_Predicate to the choices of the membership test of
the enclosing guard.
gcc/ada/
* exp_ch4.adb (Arr_Attr): Refine type of the parameter from Int
to Pos; refine name of the parameter from Num to Dim; fix
reference to "Expr" in comment.
gcc/ada/
* libgnat/s-regexp.adb (Compile.Check_Well_Formed_Pattern): When
a "|" operator is encountered in a pattern, check that it is not
the last character of the pattern.
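The check described above can be sketched roughly as follows. This is a
Python illustration only, not the Ada code in s-regexp.adb; the function name
check_no_trailing_alternation is hypothetical, and the sketch deliberately
ignores escaping and the other well-formedness rules.

```python
def check_no_trailing_alternation(pattern: str) -> None:
    """Reject a pattern whose last character is the '|' operator.

    Hypothetical helper for illustration: it does not handle escaped
    characters or any other well-formedness rule.
    """
    last = len(pattern) - 1
    for index, char in enumerate(pattern):
        # When a "|" is encountered, make sure something follows it.
        if char == "|" and index == last:
            raise ValueError(
                f"'|' operator must not be the last character of {pattern!r}"
            )


if __name__ == "__main__":
    check_no_trailing_alternation("a|b")  # accepted silently
    try:
        check_no_trailing_alternation("a|")
    except ValueError as err:
        print(err)
```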
gcc/ada/
* checks.adb (Apply_Constraint_Check): Guard against calling
Choices when the first association in an array aggregate is a
N_Iterated_Component_Association node.
gcc/ada/
* sem_prag.adb (Check_Usage): Guard against calling Usage_Error
with illegal Item_Id. The intention to do this was already
described in the comment but not implemented.
The following patch converts the strlen pass from evrp to ranger,
leaving DOM as the last remaining user.
No additional cleanups have been done. For example, the strlen pass
still has uses of VR_ANTI_RANGE, and the sprintf still passes around
pairs of integers instead of using a proper range. Fixing this
could further improve these passes.
Basically the entire patch is just adjusting the calls to range_of_expr
to include context. The previous context of si->stmt was mostly
empty, so not really useful ;-).
With ranger we are now able to remove the range calculation from
before_dom_children entirely. Just working with the ranger on-demand
catches all the strlen and sprintf testcases with the exception of
builtin-sprintf-warn-22.c, which fails due to a limitation of the sprintf
code. I have XFAILed the test and documented what the problem is.
On a positive note, these changes found two possible sprintf overflow
bugs in the C++ and Fortran front-ends which I have fixed below.
Tested on x86-64 Linux.
gcc/ChangeLog:
* tree-ssa-strlen.c (compare_nonzero_chars): Pass statement
context to ranger.
(get_addr_stridx): Same.
(get_stridx): Same.
(get_range_strlen_dynamic): Same.
(handle_builtin_strlen): Same.
(handle_builtin_strchr): Same.
(handle_builtin_strcpy): Same.
(maybe_diag_stxncpy_trunc): Same.
(handle_builtin_stxncpy_strncat): Same.
(handle_builtin_memcpy): Same.
(handle_builtin_strcat): Same.
(handle_alloc_call): Same.
(handle_builtin_memset): Same.
(handle_builtin_string_cmp): Same.
(handle_pointer_plus): Same.
(count_nonzero_bytes_addr): Same.
(count_nonzero_bytes): Same.
(handle_store): Same.
(fold_strstr_to_strncmp): Same.
(handle_integral_assign): Same.
(check_and_optimize_stmt): Same.
(class strlen_dom_walker): Replace evrp with ranger.
(strlen_dom_walker::before_dom_children): Remove evrp.
(strlen_dom_walker::after_dom_children): Remove evrp.
* gimple-ssa-warn-access.cc (maybe_check_access_sizes):
Restrict sprintf output.
gcc/cp/ChangeLog:
* ptree.c (cxx_print_xnode): Add more space to pfx array.
gcc/fortran/ChangeLog:
* misc.c (gfc_dummy_typename): Make sure ts->kind is
non-negative.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/builtin-sprintf-warn-22.c: XFAIL.
According to
https://gcc.gnu.org/legacy-ml/gcc-patches/2008-03/msg01698.html, the
TLS support, including the __tls_lookup function, was added to VxWorks
in 6.6.
It certainly doesn't exist on our VxWorks 5 platform, but the fallback
code in eh_globals.cc using __gthread_key_create() etc. used to work
just fine.
libstdc++-v3/ChangeLog:
* config/os/vxworks/os_defines.h (_GLIBCXX_HAVE_TLS): Only
define for VxWorks >= 6.6.
The first issue is that the !gotoff_operand path of legitimize_pic_address
in the large PIC model does not make use of REG when it is available, which
breaks for thunks because new pseudo-registers can no longer be created.
And the second issue is that the system compiler (LLVM) generates @GOTOFF
in large model even for RTP, so we do the same.
gcc/
* config/i386/i386.c (legitimize_pic_address): Adjust comment and
use the REG argument on the CM_LARGE_PIC code path as well.
* config/i386/predicates.md (gotoff_operand): Do not treat VxWorks
specially with the large code models.
When using ranger's private callback mechanism to provide context
to fold_stmt calls, we are only supposed to use the cache in read-only
mode and never calculate new values.
gcc/
PR tree-optimization/103122
* gimple-range.cc (gimple_ranger::range_of_expr): Request the cache
entry with "calculate new values" set to false.
gcc/testsuite/
* g++.dg/pr103122.C: New.
For nested functions we output a call to builtin_dwarf_cfa, which
initializes a frame entry used only for debugging. This, however,
prevents us from detecting functions containing nested functions
as const/pure or analyzing their side effects in modref.
builtin_dwarf_cfa is not documented and I wonder if it should be turned into an
internal function. But I think we could consider functions using it const even
if in theory one can do things like test the return address and see the
difference between different frame addresses.
While doing so I also noticed that special_builtin_state handles quite a few
builtins that are not special-cased by ipa-modref. They do not make
user-visible loads/stores and thus I think they should be annotated with
".c" to make this explicit for both modref and PTA.
Finally I added dwarf_cfa and the similar return_address to the list of simple
builtins since they compile to a simple stack frame load (and we consider
other builtins doing so simple).
* builtins.c (is_simple_builtin): Add builtin_dwarf_cfa
and builtin_return_address.
(builtin_fnspec): Annotate builtin_return,
builtin_eh_pointer, builtin_eh_filter, builtin_unwind_resume,
builtin_cxa_end_cleanup, builtin_eh_copy_values,
builtin_frame_address, builtin_apply_args,
builtin_asan_before_dynamic_init, builtin_asan_after_dynamic_init,
builtin_prefetch, builtin_dwarf_cfa, builtin_return_address
as ".c".
* ipa-pure-const.c (special_builtin_state): Add builtin_dwarf_cfa
and builtin_return_address.
Move uncprop after the modref and pure/const passes and add a comment that
this pass should always be last since it is only supposed to help PHI lowering.
The pass replaces constants with SSA names that are known to be constant at
that place, which hardly helps other passes.
gcc/ChangeLog:
PR tree-optimization/103177
* passes.def: Move uncprop after pure/const and modref.
My recent patch to improve debug experience when there are removed
parameters (by ipa-sra or ipa-split) was not careful to unshare the
expressions that were then put into debug statements, which manifests
itself as PR 103099. This patch unshares them using
unshare_expr_without_location, which is a bit more careful with stripping
locations than what we were doing manually and so also fixes PR 103107.
gcc/ChangeLog:
2021-11-08 Martin Jambor <mjambor@suse.cz>
PR ipa/103099
PR ipa/103107
* tree-inline.c (remap_gimple_stmt): Unshare the expression without
location before invoking remap_with_debug_expressions on it.
* ipa-param-manipulation.c
(ipa_param_body_adjustments::prepare_debug_expressions): Likewise.
gcc/testsuite/ChangeLog:
2021-11-08 Martin Jambor <mjambor@suse.cz>
PR ipa/103099
PR ipa/103107
* g++.dg/ipa/pr103099.C: New test.
* gcc.dg/ipa/pr103107.c: Likewise.
The vsx_splat_v4si_di pattern uses a Power8 and a Power9 instruction.
The final condition of TARGET_DIRECT_MODE_64BIT implicitly requires Power8.
The "we" constraint requires Power9, but also requires 64 bit. Because
the DImode pattern already requires 64 bit mode, this isn't horrible,
but it would be best to remove all uses of "we" constraint. The
mtvsrws instruction itself does not require 64 bit mode.
This patch reverts the previous change to fix the breakage.
gcc/ChangeLog:
* config/rs6000/vsx.md (vsx_splat_v4si_di): Revert "wa"
constraint to "we".
The sbitmap bitmap_{set,clear}_bit changes trigger spurious
uninitialized value use reports from valgrind, since we now
read the old value before setting/clearing a bit, so the
verify_loop_structure optimization of not clearing the sbitmap is reported.
Fixed by using a temporary BB flag, which should also be more
efficient in terms of cache re-use.
2021-11-08 Richard Biener <rguenther@suse.de>
* cfgloop.c (verify_loop_structure): Use a temporary BB flag
instead of an sbitmap to cache irreducible state.
The problem here is an ordering issue with a path that starts
with 19->3:
<bb 3> [local count: 916928331]:
# value_20 = PHI <value_17(19), value_7(D)(17)>
# n_27 = PHI <n_16(19), 1(17)>
n_16 = n_27 + 4;
value_17 = value_20 / 10000;
if (value_20 > 42949672959999)
goto <bb 19>; [89.00%]
else
goto <bb 4>; [11.00%]
The problem here is that both value_17 and value_20 are in the set of
imports we must pre-calculate. The value_17 name occurs first in the
bitmap, so we try to resolve it first, which causes us to recursively
solve the value_20 range. We do so correctly and put them both in the
cache. However, when we try to solve value_20 from the bitmap, we
ignore that it already has a cached entry and try to resolve the PHI
with the wrong value of value_17:
# value_20 = PHI <value_17(19), value_7(D)(17)>
The right thing to do is to avoid recalculating definitions already
solved.
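A toy model of the ordering issue and the fix, sketched in Python. Nothing
here is the path_range_query code: resolve_imports, toy_solve and the ranges
are hypothetical stand-ins, chosen only to show why a name that already has a
cache entry must be skipped.

```python
def resolve_imports(imports, solve, cache):
    """Pre-calculate each import, skipping names that are already cached.

    Returns the names that were actually solved by this loop.
    """
    solved_now = []
    for name in imports:
        if name in cache:
            # Bail: an earlier, recursive solve already produced this entry.
            continue
        cache[name] = solve(name, cache)
        solved_now.append(name)
    return solved_now


def toy_solve(name, cache):
    """Hypothetical solver: value_17 is derived from value_20."""
    if name == "value_17":
        # Solving value_17 recursively solves (and caches) value_20 first.
        cache.setdefault("value_20", (0, 100000))
        low, high = cache["value_20"]
        return (low // 10000, high // 10000)
    # Solving value_20 a second time would use the wrong value_17 and
    # clobber the correct entry; model that with a bogus range.
    return (-1, -1)


if __name__ == "__main__":
    cache = {}
    print(resolve_imports(["value_17", "value_20"], toy_solve, cache))
    print(cache)
```

Without the `if name in cache` test, the second loop iteration would overwrite
the correct value_20 entry with the bogus one.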
Regstrapped and checked for # threads before and after on x86-64 Linux.
gcc/ChangeLog:
PR tree-optimization/103120
* gimple-range-path.cc (path_range_query::range_defined_in_block):
Bail if there's a cache entry.
gcc/testsuite/ChangeLog:
* gcc.dg/pr103120.c: New test.
There are a few leftover places where we use the old rs6000_builtin_decls
array, but we need to use rs6000_builtin_decls_x instead when the new
builtins infrastructure is in play.
2021-11-08 Bill Schmidt <wschmidt@linux.ibm.com>
gcc/
* config/rs6000/rs6000.c (rs6000_builtin_reciprocal): Use
rs6000_builtin_decls_x when appropriate.
(add_condition_to_bb): Likewise.
(rs6000_atomic_assign_expand_fenv): Likewise.
Create a new version of this function that uses the new infrastructure,
and particularly checks for supported builtins the new way.
2021-11-08 Bill Schmidt <wschmidt@linux.ibm.com>
gcc/
* config/rs6000/rs6000-call.c (rs6000_new_builtin_decl): New function.
(rs6000_builtin_decl): Call it.
Running 'contrib/update-copyright.py' currently fails:
[...]
Traceback (most recent call last):
File "contrib/update-copyright.py", line 365, in update_copyright
canon_form = self.canonicalise_years (dir, filename, filter, years)
File "contrib/update-copyright.py", line 270, in canonicalise_years
(min_year, max_year) = self.year_range (years)
File "contrib/update-copyright.py", line 253, in year_range
year_list = [self.parse_year (year)
File "contrib/update-copyright.py", line 253, in <listcomp>
year_list = [self.parse_year (year)
File "contrib/update-copyright.py", line 250, in parse_year
raise self.BadYear (string)
TypeError: exceptions must derive from BaseException
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "contrib/update-copyright.py", line 796, in <module>
GCCCmdLine().main()
File "contrib/update-copyright.py", line 527, in main
self.copyright.process_tree (dir, filter)
File "contrib/update-copyright.py", line 458, in process_tree
self.process_file (dir, filename, filter)
File "contrib/update-copyright.py", line 421, in process_file
res = self.update_copyright (dir, filename, filter,
File "contrib/update-copyright.py", line 366, in update_copyright
except self.BadYear as e:
TypeError: catching classes that do not inherit from BaseException is not allowed
Fix up for commit 3b25e83536bcd1b2977659a2c6d9f0f9bf2a3152
"Port update-copyright.py to Python3".
contrib/
* update-copyright.py (class BadYear): Derive from 'Exception'.
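A minimal reproduction of the failure mode and the fix, independent of
update-copyright.py. The class bodies below are assumptions made for
illustration (the real BadYear class may carry different fields);
BadYearLegacy and try_raise are hypothetical names.

```python
class BadYearLegacy:
    """Plain class, not derived from BaseException (the broken form)."""

    def __init__(self, year):
        self.year = year


class BadYear(Exception):
    """Fixed form: derives from Exception, so it can be raised and caught."""

    def __init__(self, year):
        super().__init__(year)
        self.year = year


def try_raise(exc_class, year):
    """Raise exc_class(year) and report which exception actually surfaced."""
    try:
        raise exc_class(year)
    except TypeError as err:
        # Python 3 refuses to raise objects not derived from BaseException.
        return "TypeError: " + str(err)
    except Exception as err:
        return type(err).__name__


if __name__ == "__main__":
    print(try_raise(BadYearLegacy, "20x1"))
    print(try_raise(BadYear, "20x1"))
```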
The LD3/ST3 and LD4/ST4 address cost code had no test coverage (oops).
This patch fixes that and updates it for the new structure modes.
The test only covers Advanced SIMD because SVE doesn't have
post-increment forms.
gcc/
* config/aarch64/aarch64.c (aarch64_ldn_stn_vectors): New function.
(aarch64_address_cost): Use it instead of testing for CImode and
XImode directly.
gcc/testsuite/
* gcc.target/aarch64/neoverse_v1_1.c: New test.
I was working on a patch that needed to calculate the number of
modes in a particular class. It seemed better to have genmodes
generate this directly rather than do the kind of dance that
expmed.h had.
gcc/
* genmodes.c (emit_insn_modes_h): Define NUM_MODE_* macros.
* expmed.h (NUM_MODE_INT): Delete in favor of genmodes definitions.
(NUM_MODE_PARTIAL_INT, NUM_MODE_VECTOR_INT): Likewise.
* real.h (real_format_for_mode): Use NUM_MODE_FLOAT and
NUM_MODE_DECIMAL_FLOAT.
(REAL_MODE_FORMAT): Likewise.
This fixes an oversight that caused vectorized epilogues to have
versioning for niters applied.
2021-11-08 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (vect_create_loop_vinfo): Add main_loop_info
parameter.
* tree-vect-loop.c (vect_create_loop_vinfo): Likewise. Set
LOOP_VINFO_ORIG_LOOP_INFO and conditionalize set of
LOOP_VINFO_NITERS_ASSUMPTIONS.
(vect_analyze_loop_1): Adjust.
(vect_analyze_loop): Move loop constraint setting and
SCEV/niter reset here from vect_create_loop_vinfo to perform
it only once.
(vect_analyze_loop_form): Move dumping of symbolic niters
here from vect_create_loop_vinfo.
Add tracking of accesses relative to the static chain to the modref
load/store analysis. This helps some Fortran benchmarks; however, it is still
quite limited. One problem is that we never discover functions with nested
functions as const, pure or not accessing global memory because they contain a
__builtin_dwarf_cfa call, which we believe to be non-pure.
Bootstrapped/regtested x86_64-linux. Plan to commit it tomorrow if there are
no complaints and once the periodic testers pick up today's modref changes.
Honza
gcc/ChangeLog:
* ipa-modref-tree.h (enum modref_special_parms): New enum.
(struct modref_access_node): Update for special parms.
(struct modref_ref_node): Likewise.
(struct modref_parm_map): Likewise.
(struct modref_tree): Likewise.
* ipa-modref.c (dump_access): Likewise.
(get_access): Detect static chain.
(parm_map_for_arg): Take tree as arg instead of
stmt and index.
(merge_call_side_effects): Compute map for static chain.
(process_fnspec): Update.
(struct escape_point): Remove retslot_arg and static_chain_arg.
(analyze_parms): Update.
(compute_parm_map): Update.
(propagate_unknown_call): Update.
(modref_propagate_in_scc): Update.
(modref_merge_call_site_flags): Update.
(ipa_merge_modref_summary_after_inlining): Update.
* tree-ssa-alias.c (modref_may_conflict): Handle static chain.
* ipa-modref-tree.c (test_merge): Update.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/modref-12.c: New test.
gcc/
* config/rs6000/rs6000-call.c (rs6000_gimple_fold_builtin): Disable
gimple fold for VSX_BUILTIN_XVMINDP, ALTIVEC_BUILTIN_VMINFP,
VSX_BUILTIN_XVMAXDP, ALTIVEC_BUILTIN_VMAXFP when fast-math is not
set.
gcc/testsuite/
* gcc.target/powerpc/vec-minmax-1.c: New test.
* gcc.target/powerpc/vec-minmax-2.c: Likewise.
gcc/ChangeLog:
PR tree-optimization/103077
* doc/invoke.texi (Options That Control Optimization):
Update documentation for -ftree-loop-vectorize and
-ftree-slp-vectorize which are enabled by default at -O2.
> Note that this is not safe with -fsignaling-nans, so needs to be disabled
> for that option (if there isn't already logic somewhere with that effect),
> because the extend will convert a signaling NaN to quiet (raising
> "invalid"), but copysign won't, so this transformation could result in a
> signaling NaN being wrongly returned when the original code would never
> have returned a signaling NaN.
>
> --
> Joseph S. Myers
> joseph@codesourcery.com
gcc/ChangeLog
PR target/102464
* match.pd (Simplification (trunc)copysign((extend)a, (extend)b)
to .COPYSIGN (a, b)): Add !HONOR_SNANS.
a, b and c have the same type as the truncation type, which has less
precision than the extend type; the optimization is guarded by
flag_unsafe_math_optimizations.
gcc/ChangeLog:
PR target/102464
* match.pd: Simplify
(trunc)fma ((extend)a, (extend)b, (extend)c) to IFN_FMA (a, b,
c) under flag_unsafe_math_optimizations.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr102464-fma.c: New test.
Now that things have stabilized, we can remove the old code.
I have left the hybrid threader in tree-ssa-threadedge, even though the
VRP threader was the only user, because we may need it as an interim
step for DOM threading removal.
Tested on x86-64 Linux.
gcc/ChangeLog:
* tree-pass.h (make_pass_vrp_threader): Remove.
* tree-ssa-threadbackward.c
(back_threader_profitability::profitable_path_p): Remove
ASSERT_EXPR references.
* tree-ssa-threadedge.c (jt_state::register_equivs_stmt): Same.
* tree-vrp.c (vrp_folder::simplify_casted_conds): Same.
(execute_vrp): Same.
(class hybrid_threader): Remove.
(hybrid_threader::hybrid_threader): Remove.
(hybrid_threader::~hybrid_threader): Remove.
(hybrid_threader::before_dom_children): Remove.
(hybrid_threader::after_dom_children): Remove.
(execute_vrp_threader): Remove.
(class pass_vrp_threader): Remove.
(make_pass_vrp_threader): Remove.
While proofreading the code for handling EAF flags of !binds_to_current_def_p I
noticed that the interprocedural dataflow actually ignores the flag, possibly
introducing wrong code on quite complex interposable functions in non-trivial
recursion cycles (or at an ltrans partition boundary).
This patch unifies the flag changes in a single place (remove_useless_eaf_flags)
and extends modref_merge_call_site_flags to do the right thing.
lto-bootstrapped/regtested x86_64-linux. Plan to commit it today after a bit
more testing (firefox/clang build).
gcc/ChangeLog:
* gimple.c (gimple_call_arg_flags): Use interposable_eaf_flags.
(gimple_call_retslot_flags): Likewise.
(gimple_call_static_chain_flags): Likewise.
* ipa-modref.c (remove_useless_eaf_flags): Do not remove everything for
NOVOPS.
(modref_summary::useful_p): Likewise.
(modref_summary_lto::useful_p): Likewise.
(analyze_parms): Do not give up on NOVOPS.
(analyze_function): When dumping, report changes in EAF flags
between IPA and local pass.
(modref_merge_call_site_flags): Compute implicit eaf flags
based on callee ecf_flags and fnspec; if the function does not
bind to current defs use interposable_eaf_flags.
(modref_propagate_flags_in_scc): Update.
* ipa-modref.h (interposable_eaf_flags): New function.
This patch forms the meat of the improvements for this patch series.
We develop a replacement for rs6000_expand_builtin and its supporting
functions, which are inefficient and difficult to maintain.
Differences between the old and new support in this patch include:
- Make use of the new builtin data structures, directly looking up
a function's information rather than searching for the function
multiple times;
- Test for enablement of builtins at expand time, to support #pragma
target changes within a compilation unit;
- Use the builtin function attributes (e.g., bif_is_cpu) to control
special handling;
- Refactor common code into one place; and
- Provide common error handling in one place for operands that are
restricted to specific values or ranges.
2021-11-07 Bill Schmidt <wschmidt@linux.ibm.com>
gcc/
* config/rs6000/rs6000-call.c (rs6000_expand_new_builtin): New
forward decl.
(rs6000_invalid_new_builtin): New function.
(rs6000_expand_builtin): Call rs6000_expand_new_builtin.
(rs6000_expand_ldst_mask): New function.
(new_cpu_expand_builtin): Likewise.
(elemrev_icode): Likewise.
(ldv_expand_builtin): Likewise.
(lxvrse_expand_builtin): Likewise.
(lxvrze_expand_builtin): Likewise.
(stv_expand_builtin): Likewise.
(new_mma_expand_builtin): Likewise.
(new_htm_spr_num): Likewise.
(new_htm_expand_builtin): Likewise.
(rs6000_expand_new_builtin): Likewise.
(rs6000_init_builtins): Initialize altivec_builtin_mask_for_load.
Implement the (long promised) intraprocedural dataflow for
propagating EAF flags, so we can handle parameters that participate
in loops in SSA graphs. Typical examples are accessors that walk linked
lists.
I implemented dataflow using the standard iteration over BBs in RPO some time
ago, but did not like it because it had measurable compile time impact with
very small code quality effect. This is why I kept mainline doing the DFS walk
instead. The reason is that we care about flags of SSA names that correspond
to parameters, and those can often be determined from a small fraction of the
SSA graph, so solving dataflow for all SSA names in a function is a waste.
This patch implements dataflow more carefully. The DFS walk is kept in place to
solve acyclic cases and to collect the relevant part of the SSA graph into a new
graph (which is similar to the one used for inter-procedural dataflow - we only
need to know the edges and whether the access is direct or dereferenced). The RPO
iterative dataflow then works on this simplified graph.
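As a rough illustration of iterating to a fixed point over such a simplified
graph, here is a Python sketch. It is not the ipa-modref.c implementation:
the propagate function, the bitmask encoding and the example graph are
assumptions, and the direct/dereferenced distinction between edges is left
out entirely.

```python
def propagate(order, deps, flags):
    """Iterate to a fixed point over a small dependency graph.

    order: nodes in iteration order (for example RPO).
    deps:  dict mapping a node to the nodes whose flags it must merge in.
    flags: dict mapping a node to an int bitmask, optimistically initialized.
    Returns the number of passes over the nodes, including the final pass
    that detects no change.
    """
    iterations = 0
    changed = True
    while changed:
        changed = False
        iterations += 1
        for node in order:
            merged = flags[node]
            for dep in deps.get(node, ()):
                # A flag survives only if every dependency also has it.
                merged &= flags[dep]
            if merged != flags[node]:
                flags[node] = merged
                changed = True
    return iterations


if __name__ == "__main__":
    # "a" and "b" form a cycle (as in a loop over a linked list);
    # "b" also depends on "c", which lacks the second flag bit.
    flags = {"a": 0b11, "b": 0b11, "c": 0b01}
    deps = {"a": ["b"], "b": ["a", "c"]}
    passes = propagate(["a", "b", "c"], deps, flags)
    print(passes, flags)
```

The optimistic initialization is what lets the cycle between "a" and "b" keep
the flag bit that all of its inputs share, rather than dropping everything as
soon as a cycle is seen.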
This seems to be fast in practice. For GCC linktime we do dataflow for 4881
functions. Out of those, 4726 finish in one iteration, 144 in two and 10 in 3.
Overall 31979 functions are analysed, so we do dataflow only for a bit over
10% of cases. 131123 edges are visited by the solver. I measured no compile
time impact of this.
gcc/ChangeLog:
* ipa-modref.c (modref_lattice): Add do_dataflow,
changed and propagate_to fields.
(modref_lattice::release): Free propagate_to.
(modref_lattice::merge): Do not give up early on unknown
lattice values.
(modref_lattice::merge_deref): Likewise.
(modref_eaf_analysis): Update toplevel comment.
(modref_eaf_analysis::analyze_ssa_name): Record postponed ssa names;
do optimistic dataflow initialization.
(modref_eaf_analysis::merge_with_ssa_name): Build dataflow graph.
(modref_eaf_analysis::propagate): New member function.
(analyze_parms): Update to new API of modref_eaf_analysis.