Commit Graph

189307 Commits

Author SHA1 Message Date
Richard Biener d136035016 rtl-optimization/103075 - avoid ICEing on unfolded int-to-float converts
The following avoids asserting in exact_int_to_float_conversion_p that
the argument is not constant which it in fact can be with
-frounding-math and inexact int-to-float conversions.  Say so.

2021-11-04  Richard Biener  <rguenther@suse.de>

	PR rtl-optimization/103075
	* simplify-rtx.c (exact_int_to_float_conversion_p): Return
	false for a VOIDmode operand.

	* gcc.dg/pr103075.c: New testcase.
2021-11-04 13:33:19 +01:00
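As a rough illustration of what "inexact" means in the message above, the sketch below checks whether a 32-bit integer magnitude converts to `float` without rounding. It is not the logic of `exact_int_to_float_conversion_p`; the helper name `fits_in_float_significand` is hypothetical, and the check assumes the radix-2 `float` that `std::numeric_limits` describes.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper, not part of GCC: returns true if MAGNITUDE converts
// to float exactly, so the result does not depend on the rounding mode.
inline bool fits_in_float_significand (std::uint32_t magnitude)
{
  static_assert (std::numeric_limits<float>::radix == 2,
                 "sketch assumes a binary floating-point format");
  if (magnitude == 0)
    return true;
  // Trailing zero bits only affect the exponent, so drop them.
  while ((magnitude & 1u) == 0)
    magnitude >>= 1;
  // What remains must fit in the significand (24 bits for IEEE binary32).
  constexpr int digits = std::numeric_limits<float>::digits;
  return magnitude < (std::uint64_t{1} << digits);
}
```

With an IEEE binary32 `float`, 16777216 (2^24) passes this check while 16777217 does not, so only the former has a conversion result that is independent of the rounding mode.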
Richard Sandiford d43fc1df73 aarch64: Move more code into aarch64_vector_costs
This patch moves more code into aarch64_vector_costs and reuses
some of the information that is now available in the base class.

I'm planning to significantly rework this code, with more hooks
into the vectoriser, but this seemed worth doing as a first step.

gcc/
	* config/aarch64/aarch64.c (aarch64_vector_costs): Make member
	variables private and add "m_" to their names.  Remove is_loop.
	(aarch64_record_potential_advsimd_unrolling): Replace with...
	(aarch64_vector_costs::record_potential_advsimd_unrolling): ...this.
	(aarch64_analyze_loop_vinfo): Replace with...
	(aarch64_vector_costs::analyze_loop_vinfo): ...this.
	Move initialization of (m_)vec_flags to add_stmt_cost.
	(aarch64_analyze_bb_vinfo): Delete.
	(aarch64_count_ops): Replace with...
	(aarch64_vector_costs::count_ops): ...this.
	(aarch64_vector_costs::add_stmt_cost): Set m_vec_flags,
	using m_costing_for_scalar to test whether we're costing
	scalar or vector code.
	(aarch64_adjust_body_cost_sve): Replace with...
	(aarch64_vector_costs::adjust_body_cost_sve): ...this.
	(aarch64_adjust_body_cost): Replace with...
	(aarch64_vector_costs::adjust_body_cost): ...this.
	(aarch64_vector_costs::finish_cost): Use m_vinfo instead of is_loop.
2021-11-04 12:31:17 +00:00
Richard Sandiford 6239dd0512 vect: Convert cost hooks to classes
The current vector cost interface has quite a bit of redundancy
built in.  Each target that defines its own hooks has to replicate
the basic unsigned[3] management.  Currently each target also
duplicates the cost adjustment for inner loops.

This patch instead defines a vector_costs class for holding
the scalar or vector cost and allows targets to subclass it.
There is then only one costing hook: to create a new costs
structure of the appropriate type.  Everything else can be
virtual functions, with common concepts implemented in the
base class rather than in each target's derivation.

This might seem like excess C++-ification, but it shaves
~100 LOC.  I've also got some follow-on changes that become
significantly easier with this patch.  Maybe it could help
with things like weighting blocks based on frequency too.

This will clash with Andre's unrolling patches.  His patches
have priority so this patch should queue behind them.

The x86 and rs6000 parts fully convert to a self-contained class.
The equivalent aarch64 changes are more complex, so this patch
just does the bare minimum.  A later patch will rework the
aarch64 bits.

gcc/
	* target.def (targetm.vectorize.init_cost): Replace with...
	(targetm.vectorize.create_costs): ...this.
	(targetm.vectorize.add_stmt_cost): Delete.
	(targetm.vectorize.finish_cost): Likewise.
	(targetm.vectorize.destroy_cost_data): Likewise.
	* doc/tm.texi.in (TARGET_VECTORIZE_INIT_COST): Replace with...
	(TARGET_VECTORIZE_CREATE_COSTS): ...this.
	(TARGET_VECTORIZE_ADD_STMT_COST): Delete.
	(TARGET_VECTORIZE_FINISH_COST): Likewise.
	(TARGET_VECTORIZE_DESTROY_COST_DATA): Likewise.
	* doc/tm.texi: Regenerate.
	* tree-vectorizer.h (vec_info::vec_info): Remove target_cost_data
	parameter.
	(vec_info::target_cost_data): Change from a void * to a vector_costs *.
	(vector_costs): New class.
	(init_cost): Take a vec_info and return a vector_costs.
	(dump_stmt_cost): Remove data parameter.
	(add_stmt_cost): Replace vinfo and data parameters with a vector_costs.
	(add_stmt_costs): Likewise.
	(finish_cost): Replace data parameter with a vector_costs.
	(destroy_cost_data): Delete.
	* tree-vectorizer.c (dump_stmt_cost): Remove data argument and
	don't print it.
	(vec_info::vec_info): Remove the target_cost_data parameter and
	initialize the member variable to null instead.
	(vec_info::~vec_info): Delete target_cost_data instead of calling
	destroy_cost_data.
	(vector_costs::add_stmt_cost): New function.
	(vector_costs::finish_cost): Likewise.
	(vector_costs::record_stmt_cost): Likewise.
	(vector_costs::adjust_cost_for_freq): Likewise.
	* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Update
	call to vec_info::vec_info.
	(vect_compute_single_scalar_iteration_cost): Update after above
	changes to costing interface.
	(vect_analyze_loop_operations): Likewise.
	(vect_estimate_min_profitable_iters): Likewise.
	(vect_analyze_loop_2): Initialize LOOP_VINFO_TARGET_COST_DATA
	at the start_over point, where it needs to be recreated after
	trying without slp.  Update retry code accordingly.
	* tree-vect-slp.c (_bb_vec_info::_bb_vec_info): Update call
	to vec_info::vec_info.
	(vect_slp_analyze_operation): Update after above changes to costing
	interface.
	(vect_bb_vectorization_profitable_p): Likewise.
	* targhooks.h (default_init_cost): Replace with...
	(default_vectorize_create_costs): ...this.
	(default_add_stmt_cost): Delete.
	(default_finish_cost, default_destroy_cost_data): Likewise.
	* targhooks.c (default_init_cost): Replace with...
	(default_vectorize_create_costs): ...this.
	(default_add_stmt_cost): Delete, moving logic to vector_costs instead.
	(default_finish_cost, default_destroy_cost_data): Delete.
	* config/aarch64/aarch64.c (aarch64_vector_costs): Inherit from
	vector_costs.  Add a constructor.
	(aarch64_init_cost): Replace with...
	(aarch64_vectorize_create_costs): ...this.
	(aarch64_add_stmt_cost): Replace with...
	(aarch64_vector_costs::add_stmt_cost): ...this.  Use record_stmt_cost
	to adjust the cost for inner loops.
	(aarch64_finish_cost): Replace with...
	(aarch64_vector_costs::finish_cost): ...this.
	(aarch64_destroy_cost_data): Delete.
	(TARGET_VECTORIZE_INIT_COST): Replace with...
	(TARGET_VECTORIZE_CREATE_COSTS): ...this.
	(TARGET_VECTORIZE_ADD_STMT_COST): Delete.
	(TARGET_VECTORIZE_FINISH_COST): Likewise.
	(TARGET_VECTORIZE_DESTROY_COST_DATA): Likewise.
	* config/i386/i386.c (ix86_vector_costs): New structure.
	(ix86_init_cost): Replace with...
	(ix86_vectorize_create_costs): ...this.
	(ix86_add_stmt_cost): Replace with...
	(ix86_vector_costs::add_stmt_cost): ...this.  Use adjust_cost_for_freq
	to adjust the cost for inner loops.
	(ix86_finish_cost, ix86_destroy_cost_data): Delete.
	(TARGET_VECTORIZE_INIT_COST): Replace with...
	(TARGET_VECTORIZE_CREATE_COSTS): ...this.
	(TARGET_VECTORIZE_ADD_STMT_COST): Delete.
	(TARGET_VECTORIZE_FINISH_COST): Likewise.
	(TARGET_VECTORIZE_DESTROY_COST_DATA): Likewise.
	* config/rs6000/rs6000.c (TARGET_VECTORIZE_INIT_COST): Replace with...
	(TARGET_VECTORIZE_CREATE_COSTS): ...this.
	(TARGET_VECTORIZE_ADD_STMT_COST): Delete.
	(TARGET_VECTORIZE_FINISH_COST): Likewise.
	(TARGET_VECTORIZE_DESTROY_COST_DATA): Likewise.
	(rs6000_cost_data): Inherit from vector_costs.
	Add a constructor.  Drop loop_info, cost and costing_for_scalar
	in favor of the corresponding vector_costs member variables.
	Add "m_" to the names of the remaining member variables and
	initialize them.
	(rs6000_density_test): Replace with...
	(rs6000_cost_data::density_test): ...this.
	(rs6000_init_cost): Replace with...
	(rs6000_vectorize_create_costs): ...this.
	(rs6000_update_target_cost_per_stmt): Replace with...
	(rs6000_cost_data::update_target_cost_per_stmt): ...this.
	(rs6000_add_stmt_cost): Replace with...
	(rs6000_cost_data::add_stmt_cost): ...this.  Use adjust_cost_for_freq
	to adjust the cost for inner loops.
	(rs6000_adjust_vect_cost_per_loop): Replace with...
	(rs6000_cost_data::adjust_vect_cost_per_loop): ...this.
	(rs6000_finish_cost): Replace with...
	(rs6000_cost_data::finish_cost): ...this.  Group loop code
	into a single if statement and pass the loop_vinfo down to
	subroutines.
	(rs6000_destroy_cost_data): Delete.
2021-11-04 12:31:17 +00:00
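The design described in the message above (one creation hook, virtual functions, shared bookkeeping in the base class) can be sketched roughly as follows. This is a simplified illustration, not GCC's `vector_costs`: the class names, the `create_costs` signature, the inner-loop factor of 50 and the "+1" target rule are all assumptions made for the example.

```cpp
#include <cassert>
#include <memory>

// All names below are illustrative; this is not GCC's vector_costs.
enum cost_where { cost_prologue, cost_body, cost_epilogue };

class costs_base
{
public:
  explicit costs_base (bool costing_for_scalar)
    : m_costing_for_scalar (costing_for_scalar) {}
  virtual ~costs_base () = default;

  // Default behaviour; targets may override.
  virtual unsigned add_stmt_cost (unsigned count, unsigned stmt_cost,
                                  bool in_inner_loop, cost_where where)
  {
    return record_stmt_cost (where,
                             adjust_cost_for_freq (in_inner_loop,
                                                   count * stmt_cost));
  }

  unsigned total (cost_where where) const { return m_costs[where]; }

protected:
  // Shared bookkeeping for the three accumulated costs.
  unsigned record_stmt_cost (cost_where where, unsigned cost)
  {
    m_costs[where] += cost;
    return cost;
  }

  // Shared inner-loop weighting; the factor is an arbitrary placeholder.
  unsigned adjust_cost_for_freq (bool in_inner_loop, unsigned cost) const
  {
    return in_inner_loop ? cost * 50 : cost;
  }

  bool m_costing_for_scalar;

private:
  unsigned m_costs[3] = { 0, 0, 0 };
};

class example_target_costs : public costs_base
{
public:
  using costs_base::costs_base;

  unsigned add_stmt_cost (unsigned count, unsigned stmt_cost,
                          bool in_inner_loop, cost_where where) override
  {
    // Made-up target rule: vector statements cost one unit more.
    unsigned per_stmt = m_costing_for_scalar ? stmt_cost : stmt_cost + 1;
    return record_stmt_cost (where,
                             adjust_cost_for_freq (in_inner_loop,
                                                   count * per_stmt));
  }
};

// The single remaining "hook": create a costs object of the right type.
inline std::unique_ptr<costs_base> create_costs (bool costing_for_scalar)
{
  return std::make_unique<example_target_costs> (costing_for_scalar);
}
```

In this sketch the target only supplies its own per-statement rule, while the array of three totals and the inner-loop adjustment live once in the base class.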
Martin Liska af976d90fa libsanitizer: update LOCAL_PATCHES
libsanitizer/ChangeLog:

	* LOCAL_PATCHES: Update git revision.
2021-11-04 13:26:58 +01:00
H.J. Lu 65ade6a34c libsanitizer: Apply local patches 2021-11-04 13:26:17 +01:00
Martin Liska 0cedf1fb76 libsanitizer: Apply autoreconf. 2021-11-04 13:26:05 +01:00
Martin Liska cb0437584b libsanitizer: merge from master (c86b4503a94c277534ce4b9a5c015a6ac151b98a). 2021-11-04 13:24:53 +01:00
Aldy Hernandez bb27f5e9ec Convert arrays in ssa pointer_equiv_analyzer to auto_vec's.
The problem in this PR is an off-by-one bug.  We should've allocated
num_ssa_names + 1.  However, in fixing this, I noticed that
num_ssa_names can change between queries, so I have replaced the array
with an auto_vec and added code to grow the vector as necessary.

Tested on x86-64 Linux.

	PR tree-optimization/103062

gcc/ChangeLog:

	PR tree-optimization/103062
	* value-pointer-equiv.cc (ssa_equiv_stack::ssa_equiv_stack):
	Increase size of allocation by 1.
	(ssa_equiv_stack::push_replacement): Grow as needed.
	(ssa_equiv_stack::get_replacement): Same.
	(pointer_equiv_analyzer::pointer_equiv_analyzer): Same.
	(pointer_equiv_analyzer::~pointer_equiv_analyzer): Remove delete.
	(pointer_equiv_analyzer::set_global_equiv): Grow as needed.
	(pointer_equiv_analyzer::get_equiv): Same.
	(pointer_equiv_analyzer::get_equiv_expr): Remove const.
	* value-pointer-equiv.h (class pointer_equiv_analyzer): Remove
	const markers.  Use auto_vec instead of tree *.

gcc/testsuite/ChangeLog:

	* gcc.dg/pr103062.c: New test.
2021-11-04 11:48:04 +01:00
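The fix described in the message above, a fixed-size array replaced by a vector that grows on demand, might look roughly like this. `std::vector` stands in for GCC's `auto_vec`, and `version_table` with its members is a hypothetical name, not code from `value-pointer-equiv.cc`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a table indexed by SSA version number.
class version_table
{
public:
  // One extra slot, mirroring the "num_ssa_names + 1" in the message.
  explicit version_table (std::size_t num_names)
    : m_slots (num_names + 1, nullptr) {}

  void set (std::size_t version, const char *value)
  {
    grow_to_fit (version);
    m_slots[version] = value;
  }

  const char *get (std::size_t version)
  {
    grow_to_fit (version);
    return m_slots[version];
  }

  std::size_t size () const { return m_slots.size (); }

private:
  // The number of names can increase between queries, so grow on demand
  // instead of trusting the size seen at construction time.
  void grow_to_fit (std::size_t version)
  {
    if (version >= m_slots.size ())
      m_slots.resize (version + 1, nullptr);
  }

  std::vector<const char *> m_slots;
};
```

Growing in both `set` and `get` keeps every index valid even when a query arrives for a name created after the table was built.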
Jonathan Wakely a45d577b2b libstdc++: Refactor emplace-like functions in std::variant
libstdc++-v3/ChangeLog:

	* include/std/variant (__detail::__variant::__emplace): New
	function template.
	(_Copy_assign_base::operator=): Reorder conditions to match
	bulleted list of effects in the standard. Use __emplace instead
	of _M_reset followed by _Construct.
	(_Move_assign_base::operator=): Likewise.
	(__construct_by_index): Remove.
	(variant::emplace): Use __emplace instead of _M_reset followed
	by __construct_by_index.
	(variant::swap): Hoist valueless cases out of visitor. Use
	__emplace to replace _M_reset followed by _Construct.
2021-11-04 09:36:10 +00:00
Jonathan Wakely 30ab6d9e43 libstdc++: Optimize std::variant traits and improve diagnostics
By defining additional partial specializations of _Nth_type we can
reduce the number of recursive instantiations needed to get from N to 0.
We can also use _Nth_type in variant_alternative, to take advantage of
that new optimization.

By adding a static_assert to variant_alternative we get a nicer error
than 'invalid use of incomplete type'.

By defining partial specializations of std::variant_size_v for the
common case we can avoid instantiating the std::variant_size class
template.

The __tuple_count class template and __tuple_count_v variable template
can be simplified to a single variable template, __count.

By adding a deleted constructor to the _Variant_union primary template
we can (very slightly) improve diagnostics for invalid attempts to
construct a std::variant with an out-of-range index. Instead of a
confusing error about "too many initializers for ..." we get a call to a
deleted function.

By using _Nth_type instead of variant_alternative (for cv-unqualified
variant types) we avoid instantiating variant_alternative.

By adding deleted overloads of variant::emplace we get better
diagnostics for emplace<invalid-index> or emplace<invalid-type>. Instead
of getting errors explaining why each of the four overloads wasn't
valid, we just get one error about calling a deleted function.

libstdc++-v3/ChangeLog:

	* include/std/variant (_Nth_type): Define partial
	specializations to reduce number of instantiations.
	(variant_size_v): Define partial specializations to avoid
	instantiations.
	(variant_alternative): Use _Nth_type. Add static assert.
	(__tuple_count, __tuple_count_v): Replace with ...
	(__count): New variable template.
	(_Variant_union): Add deleted constructor.
	(variant::__to_type): Use _Nth_type.
	(variant::emplace): Use _Nth_type. Add deleted overloads for
	invalid types and indices.
2021-11-04 09:36:09 +00:00
Jonathan Wakely 7551a99574 libstdc++: Fix handling of const types in std::variant [PR102912]
Prior to r12-4447 (implementing P2231R1 constexpr changes) we didn't
construct the correct member of the union in __variant_construct_single,
we just plopped an object in the memory occupied by the union:

  void* __storage = std::addressof(__lhs._M_u);
  using _Type = remove_reference_t<decltype(__rhs_mem)>;
  ::new (__storage) _Type(std::forward<decltype(__rhs_mem)>(__rhs_mem));

We didn't care whether we had variant<int, const int>, we would just
place an int (or const int) into the storage, and then set the _M_index
to say which one it was.

In the new constexpr-friendly code we use std::construct_at to construct
the union object, which constructs the active member of the right type.
But now we need to know exactly the right type. We have to distinguish
between alternatives of type int and const int, and we have to be able
to find a const int (or const std::string, as in the OP) among the
alternatives. So my change from remove_reference_t<decltype(__rhs_mem)>
to remove_cvref_t<_Up> was wrong. It strips the const from const int,
and then we can't find the index of the const int alternative.

But just using remove_reference_t doesn't work either. When the copy
assignment operator of std::variant<int> uses __variant_construct_single
it passes a const int& as __rhs_mem, but if we don't strip the const
then we try to find const int among the alternatives, and *that* fails.
Similarly for the copy constructor, which also uses a const int& as the
initializer for a non-const int alternative.

The root cause of the problem is that __variant_construct_single doesn't
know the index of the type it's supposed to construct, and the new
_Variant_storage::__index_of<_Type> helper doesn't work if __rhs_mem and
the alternative being constructed have different const-qualification. We
need to replace __variant_construct_single with something that knows the
index of the alternative being constructed. All uses of that function do
actually know the index, but that context is lost by the time we call
__variant_construct_single. This patch replaces that function and
__variant_construct, inlining their effects directly into the callers.

libstdc++-v3/ChangeLog:

	PR libstdc++/102912
	* include/std/variant (_Variant_storage::__index_of): Remove.
	(__variant_construct_single): Remove.
	(__variant_construct): Remove.
	(_Copy_ctor_base::_Copy_ctor_base(const _Copy_ctor_base&)): Do
	construction directly instead of using __variant_construct.
	(_Move_ctor_base::_Move_ctor_base(_Move_ctor_base&&)): Likewise.
	(_Move_ctor_base::_M_destructive_move()): Remove.
	(_Move_ctor_base::_M_destructive_copy()): Remove.
	(_Copy_assign_base::operator=(const _Copy_assign_base&)): Do
	construction directly instead of using _M_destructive_copy.
	(variant::swap): Do construction directly instead of using
	_M_destructive_move.
	* testsuite/20_util/variant/102912.cc: New test.
2021-11-04 09:36:09 +00:00
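The type-lookup problem described in the message above can be reproduced in miniature. The sketch below uses a hypothetical `index_of` helper (not the removed `_Variant_storage::__index_of`) to show that neither stripping nor keeping `const` gives the right answer in every case, whereas selecting the alternative by index sidesteps the question.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <variant>

// Hypothetical helper: position of T among Ts..., or sizeof...(Ts) if absent.
template<typename T, typename... Ts>
constexpr std::size_t index_of ()
{
  constexpr bool matches[] = { std::is_same_v<T, Ts>... };
  for (std::size_t i = 0; i < sizeof... (Ts); ++i)
    if (matches[i])
      return i;
  return sizeof... (Ts);
}

// Stripping cv-qualifiers loses the distinction between int and const int.
using stripped = std::remove_cv_t<std::remove_reference_t<const int &>>;
static_assert (std::is_same_v<stripped, int>);

// variant<int, const int>: the const alternative is index 1, but a lookup
// with the stripped type finds index 0 instead.
static_assert (index_of<const int, int, const int> () == 1);
static_assert (index_of<stripped, int, const int> () == 0);

// variant<int> initialized from a const int&: keeping the const finds
// nothing (1 is the "not found" value for a one-element list).
static_assert (index_of<const int, int> () == 1);

// Selecting the alternative by index avoids the ambiguity entirely.
inline std::size_t construct_const_alternative ()
{
  std::variant<int, const int> v (std::in_place_index<1>, 42);
  return v.index ();
}
```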
Richard Biener fa62db42b9 VN/PRE TLC
This removes an always true parameter of vn_nary_op_insert_into and moves
valueization to the two callers of vn_nary_op_compute_hash instead of doing it
therein where this function name does not suggest such a thing.
Also remove extra valueization from PRE phi-translation.

2021-11-03  Richard Biener  <rguenther@suse.de>

	* tree-ssa-sccvn.c (vn_nary_op_insert_into): Remove always
	true parameter and inline valueization.
	(vn_nary_op_lookup_1): Inline valueization from ...
	(vn_nary_op_compute_hash): ... here and remove it here.
	* tree-ssa-pre.c (phi_translate_1): Do not valueize
	before vn_nary_lookup_pieces.
	(get_representative_for): Mark created SSA representatives
	as visited.
2021-11-04 10:15:36 +01:00
Jiufu Guo f75e56f46d Update dg-require-effective-target for pr101145 cases
For test cases pr101145*.c, some types are not able to be
vectorized on some targets. This patch updates
dg-require-effective-target according to test cases.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/pr101145_1.c: Update case.
	* gcc.dg/vect/pr101145_2.c: Update case.
	* gcc.dg/vect/pr101145_3.c: Update case.
2021-11-04 17:13:14 +08:00
Martin Liska b9003cf734 Disable warning for an ASAN test-case.
gcc/testsuite/ChangeLog:

	* g++.dg/asan/asan_test.C: Disable one warning.
2021-11-04 09:54:00 +01:00
Richard Sandiford 518f865f4b simplify-rtx: Fix vec_select index check
Vector lane indices follow memory (array) order, so lane 0 corresponds
to the high element rather than the low element on big-endian targets.

This was causing quite a few execution failures on aarch64_be,
such as gcc.c-torture/execute/pr47538.c.

gcc/
	* simplify-rtx.c (simplify_context::simplify_gen_vec_select): Assert
	that the operand has a vector mode.  Use subreg_lowpart_offset
	to test whether an index corresponds to the low part.

gcc/testsuite/
	* gcc.dg/rtl/aarch64/big-endian-cse-1.c: New test.
2021-11-04 08:28:44 +00:00
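The endianness point in the message above can be sketched with plain byte offsets. The two functions below are hypothetical simplifications (they ignore targets whose byte and word endianness differ) and are not the GCC implementations of `subreg_lowpart_offset` or `simplify_gen_vec_select`.

```cpp
#include <cassert>

// Byte offset of the low OUTER_BYTES of an INNER_BYTES-wide value.
// Simplified: assumes byte and word endianness agree.
constexpr unsigned lowpart_offset (unsigned outer_bytes, unsigned inner_bytes,
                                   bool big_endian)
{
  return big_endian ? inner_bytes - outer_bytes : 0;
}

// Lanes are numbered in memory order, so lane LANE starts at byte
// LANE * ELT_BYTES.  It is the low part only if that matches the offset above.
constexpr bool lane_is_lowpart (unsigned lane, unsigned elt_bytes,
                                unsigned vec_bytes, bool big_endian)
{
  return lane * elt_bytes == lowpart_offset (elt_bytes, vec_bytes, big_endian);
}
```

For a 16-byte vector of 4-byte elements, lane 0 is the low part on a little-endian target, while on a big-endian target the low part is lane 3.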
Richard Sandiford 95318d469f Fix RTL frontend handling of const_vectors
The RTL frontend makes sure that CONST_INTs use shared rtxes where
appropriate.  We should do the same thing for CONST_VECTORs,
reusing CONST0_RTX, CONST1_RTX and CONSTM1_RTX.  This also has
the effect of setting CONST_VECTOR_NELTS_PER_PATTERN and
CONST_VECTOR_NPATTERNS.

While looking at where to add that, I noticed we had some dead #includes
in read-rtl.c.  Some of the stuff that read-rtl-function.c does was once
in that file instead.

gcc/
	* read-rtl.c: Remove dead !GENERATOR_FILE block.
	* read-rtl-function.c (function_reader::consolidate_singletons):
	Generate canonical CONST_VECTORs.
2021-11-04 08:28:44 +00:00
liuhongt bc9c8e5f8a Extend vternlog define_insn_and_split to memory_operand to enable more optimization.
gcc/ChangeLog:

	PR target/101989
	* config/i386/predicates.md (reg_or_notreg_operand): Rename to ..
	(regmem_or_bitnot_regmem_operand): .. and extend to handle
	memory_operand.
	* config/i386/sse.md (*<avx512>_vpternlog<mode>_1): Force_reg
	the operands which are required to be register_operand.
	(*<avx512>_vpternlog<mode>_2): Ditto.
	(*<avx512>_vpternlog<mode>_3): Ditto.
	(*<avx512>_vternlog<mode>_all): Disallow embedded broadcast for
	vector HFmodes since it's not a real AVX512FP16 instruction.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pr101989-3.c: New test.
2021-11-04 16:09:52 +08:00
liuhongt 22ce7382fc Simplify (trunc)copysign((extend)a, (extend)b) to .COPYSIGN (a,b).
a and b have the same type as the truncation type and have less precision
than the extended type.

gcc/ChangeLog:

	PR target/102464
	* match.pd: simplify (trunc)copysign((extend)a, (extend)b) to
	.COPYSIGN (a,b) when a and b are same type as the truncation
	type and has less precision than extend type.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pr102464-copysign-1.c: New test.
2021-11-04 16:09:46 +08:00
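At the source level, the simplification above corresponds to the equivalence sketched below, with `float` as the narrow type and `double` as the extended type. The function names are hypothetical, and this is ordinary C++ rather than the `match.pd` pattern itself.

```cpp
#include <cassert>
#include <cmath>

// (trunc) copysign ((extend) a, (extend) b): widen, copy the sign, narrow.
inline float narrowed_copysign (float a, float b)
{
  return static_cast<float> (std::copysign (static_cast<double> (a),
                                            static_cast<double> (b)));
}

// The simplified form: copy the sign directly in the narrow type.
inline float direct_copysign (float a, float b)
{
  return std::copysign (a, b);
}
```

Widening a `float` to `double` preserves both its magnitude and its sign, and copying a sign never changes the magnitude, so narrowing afterwards returns the same value as the direct form.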
Richard Biener d0d428c4ce Update TARGET_MEM_REF documentation
This updates the internals manual documentation of TARGET_MEM_REF
and amends MEM_REF.  The former was seriously out of date.

2021-11-04  Richard Biener  <rguenther@suse.de>

gcc/
	* doc/generic.texi: Update TARGET_MEM_REF and MEM_REF
	documentation.
2021-11-04 08:41:58 +01:00
Hongyu Wang 3fd0723f0a i386: Auto vectorize sdot_prod, usdot_prod with VNNI instruction.
AVX512VNNI/AVXVNNI has vpdpwssd for HImode, vpdpbusd for QImode, so
adjust HImode sdot_prod expander and add QImode usdot_prod expander
to enhance vectorization for dotprod.

gcc/ChangeLog:

	* config/i386/sse.md (VI2_AVX512VNNIBW): New mode iterator.
	(VI1_AVX512VNNI): Likewise.
	(SDOT_VPDP_SUF): New mode_attr.
	(VI1SI): Likewise.
	(vi1si): Likewise.
	(sdot_prod<mode>): Use VI2_AVX512F iterator, expand to
	vpdpwssd when VNNI targets available.
	(usdot_prod<mode>): New expander for vector QImode.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/vnni-auto-vectorize-1.c: New test.
	* gcc.target/i386/vnni-auto-vectorize-2.c: Ditto.
2021-11-04 14:41:30 +08:00
Hongyu Wang 7fcc22dae7 i386: Fix wrong result for AMX-TILE intrinsic when parsing expression.
_tile_loadd, _tile_stored, _tile_streamloadd intrinsics are defined by
macro, so the parameters should be wrapped by parentheses to accept
expressions.

gcc/ChangeLog:

	* config/i386/amxtileintrin.h (_tile_loadd_internal): Add
	parentheses to base and stride.
	(_tile_stream_loadd_internal): Likewise.
	(_tile_stored_internal): Likewise.

gcc/testsuite/ChangeLog:
	* gcc.target/i386/amxtile-3.c: New test.
2021-11-04 13:01:16 +08:00
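The class of bug described above, a macro parameter used without parentheses, is easy to show in isolation. The macros below are hypothetical and are not the definitions from `amxtileintrin.h`.

```cpp
#include <cassert>

// Hypothetical macros, not the real amxtileintrin.h definitions.
#define SCALE_UNSAFE(stride) (stride * 4)
#define SCALE_SAFE(stride)   ((stride) * 4)

// Expands to (a + b * 4): the multiplication binds to b only.
inline int unsafe_scale (int a, int b) { return SCALE_UNSAFE (a + b); }

// Expands to ((a + b) * 4): the whole argument is scaled.
inline int safe_scale (int a, int b) { return SCALE_SAFE (a + b); }
```

Passing an expression such as `a + b` gives a different result in the unparenthesized version, which is why a wrapped parameter is needed for a macro to accept arbitrary expressions.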
Marek Polacek cd389e5f94 testsuite: Fix g++.dg/opt/pr102970.C
This test uses a generic lambda, only available since C++14, so don't
run it in earlier modes.

gcc/testsuite/ChangeLog:

	* g++.dg/opt/pr102970.C: Only run in C++14 and up.
2021-11-03 20:40:28 -04:00
GCC Administrator 18ae471f7b Daily bump. 2021-11-04 00:16:32 +00:00
Maciej W. Rozycki c79399c7e1 MAINTAINERS: Clarify the policy WRT the Write After Approval list
* MAINTAINERS: Clarify the policy WRT the Write After Approval
	list.
2021-11-03 17:05:48 +00:00
Maciej W. Rozycki a31056e919 RISC-V: Fix register class subset checks for CLASS_MAX_NREGS
Fix the register class subset checks in the determination of the maximum
number of consecutive registers needed to hold a value of a given mode.

The number depends on whether a register is a general-purpose or a
floating-point register, so check whether the register class requested
is a subset (argument 1 to `reg_class_subset_p') rather than superset
(argument 2) of GR_REGS or FP_REGS class respectively.

	gcc/
	* config/riscv/riscv.c (riscv_class_max_nregs): Swap the
	arguments to `reg_class_subset_p'.
2021-11-03 17:05:48 +00:00
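The argument-order issue above can be sketched with register classes modelled as bit sets. Everything here is an assumption for illustration: the masks, the register widths, `SMALL_GR_REGS`, and the helper names `subset_p` and `max_nregs` are not taken from `riscv.c`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical register classes modelled as bit sets of register numbers.
using reg_set = std::uint32_t;
constexpr reg_set GR_REGS       = 0x0000ffff;
constexpr reg_set FP_REGS       = 0xffff0000;
constexpr reg_set ALL_REGS      = GR_REGS | FP_REGS;
constexpr reg_set SMALL_GR_REGS = 0x000000f0;  // made-up subset of GR_REGS

// True if every register in A is also in B (A, argument 1, is the subset).
constexpr bool subset_p (reg_set a, reg_set b) { return (a & ~b) == 0; }

// Registers needed to hold SIZE_BYTES in class RCLASS.  The requested class
// goes first, so narrower classes such as SMALL_GR_REGS are handled too.
// The 8-byte FP and 4-byte GR widths are made up; 0 means "no single answer".
constexpr unsigned max_nregs (reg_set rclass, unsigned size_bytes)
{
  if (subset_p (rclass, FP_REGS))
    return (size_bytes + 7) / 8;
  if (subset_p (rclass, GR_REGS))
    return (size_bytes + 3) / 4;
  return 0;
}
```

With the arguments swapped, a query for `SMALL_GR_REGS` would ask whether all of `GR_REGS` fits inside the smaller class, which is false, and the general-purpose case would be missed.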
Jonathan Wakely 1e7a269856 libstdc++: Fix regression in std::list::sort [PR66742]
The standard does not require const-correct comparisons in list::sort.

libstdc++-v3/ChangeLog:

	PR libstdc++/66742
	* include/bits/list.tcc (list::sort): Use mutable iterators for
	comparisons.
	* include/bits/stl_list.h (_Scratch_list::_Ptr_cmp): Likewise.
	* testsuite/23_containers/list/operations/66742.cc: Check
	non-const comparisons.
2021-11-03 15:15:27 +00:00
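A comparison that is not const-correct, of the kind the message above says `list::sort` must still accept, could look like the following. The functor and function names are hypothetical and this is not the code from the `66742.cc` test.

```cpp
#include <cassert>
#include <list>

// A comparison with a non-const call operator taking non-const references.
struct mutable_less
{
  bool operator() (int &lhs, int &rhs) { return lhs < rhs; }
};

// Sorts a copy of VALUES using the non-const comparison above.
inline std::list<int> sort_with_mutable_compare (std::list<int> values)
{
  values.sort (mutable_less ());
  return values;
}
```

If the implementation handed `const int &` to the comparison, or called it through a const object, this would fail to compile.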
Joseph Myers 600dcd74b8 c: Fold implicit integer-to-floating conversions in static initializers with -frounding-math [PR103031]
Recent fixes to avoid inappropriate folding of some conversions to
floating-point types with -frounding-math also prevented such folding
in C static initializers, when folding (in the default rounding mode,
exceptions discarded) is required for correctness.

Folding for static initializers is handled via functions in
fold-const.c calling START_FOLD_INIT and END_FOLD_INIT to adjust flags
such as flag_rounding_math that should not apply in static initializer
context, but no such function was being called for the folding of
these implicit conversions to the type of the object being
initialized, only for explicit conversions as part of the initializer.

Arrange for relevant folding (a fold call in convert, in particular)
to use this special initializer handling (via a new fold_init
function, in particular).

Because convert is used by language-independent code but defined in
each front end, this isn't as simple as just adding a new default
argument to it.  Instead, I added a new convert_init function; that
then gets called by c-family code, and C and C++ need convert_init
implementations (the C++ one does nothing different from convert and
will never actually get called because the new convert_and_check
argument will never be true from C++), but other languages don't.

Bootstrapped with no regressions for x86_64-pc-linux-gnu.

gcc/
	PR c/103031
	* fold-const.c (fold_init): New function.
	* fold-const.h (fold_init): New prototype.

gcc/c-family/
	PR c/103031
	* c-common.c (convert_and_check): Add argument init_const.  Call
	convert_init if init_const.
	* c-common.h (convert_and_check): Update prototype.
	(convert_init): New prototype.

gcc/c/
	PR c/103031
	* c-convert.c (c_convert): New function, based on convert.
	(convert): Make into wrapper of c_convert.
	(convert_init): New function.
	* c-typeck.c (enum impl_conv): Add ic_init_const.
	(convert_for_assignment): Handle ic_init_const like ic_init.  Add
	new argument to convert_and_check call.
	(digest_init): Pass ic_init_const to convert_for_assignment for
	initializers required to be constant.

gcc/cp/
	PR c/103031
	* cvt.c (convert_init): New function.

gcc/testsuite/
	PR c/103031
	* gcc.dg/init-rounding-math-1.c: New test.
2021-11-03 14:59:22 +00:00
Andrew MacLeod 502ffb1f38 Switch vrp2 to ranger.
This patch flips the default for the VRP2 pass to execute ranger vrp rather
than the assert_expr version of VRP.

	* params.opt (param_vrp2_mode): Make ranger the default for VRP2.
2021-11-03 10:37:24 -04:00
Andrew MacLeod 1410b20801 Testcase adjustments for pass vrp1.
Unify testcases for the vrp1 pass so they will work with the output from either
VRP or ranger.

	gcc/testsuite/
	* gcc.dg/tree-ssa/pr23744.c: Tweak output checks.
	* gcc.dg/tree-ssa/vrp07.c: Ditto.
	* gcc.dg/tree-ssa/vrp08.c: Ditto.
	* gcc.dg/tree-ssa/vrp09.c: Ditto.
	* gcc.dg/tree-ssa/vrp20.c: Ditto.
	* gcc.dg/tree-ssa/vrp92.c: Ditto.
	* jit.dg/test-sum-of-squares.c: Ditto.
2021-11-03 10:13:32 -04:00
Andrew MacLeod 6d936684fc For ranges, PHIs don't need to process arg == def.
If an argument of a phi is the same as the DEF of the phi, then the range
on the incoming edge doesn't need to be taken into account since it can't
be anything other than itself.

	* gimple-range-fold.cc (fold_using_range::range_of_phi): Don't import
	a range from edge if arg == phidef.
2021-11-03 10:13:32 -04:00
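The observation above can be sketched as a union over PHI arguments that skips any argument equal to the PHI's own result. The types and the function are simplified stand-ins (plain closed intervals, integer names), not the code of `fold_using_range::range_of_phi`.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Simplified stand-ins: a closed interval and a PHI argument with the
// range known on its incoming edge.
struct int_range { long lo; long hi; };
struct phi_arg { int name; int_range edge_range; };

// Union of the argument ranges of "def = PHI <args...>", skipping any
// argument that is DEF itself.  FALLBACK is returned if nothing contributes.
inline int_range range_of_phi (int def, const std::vector<phi_arg> &args,
                               int_range fallback)
{
  bool seen = false;
  int_range result = fallback;
  for (const phi_arg &arg : args)
    {
      if (arg.name == def)
        continue;  // a self-reference cannot add new values
      if (!seen)
        {
          result = arg.edge_range;
          seen = true;
        }
      else
        {
          result.lo = std::min (result.lo, arg.edge_range.lo);
          result.hi = std::max (result.hi, arg.edge_range.hi);
        }
    }
  return result;
}
```

For a PHI whose arguments are a name with range [0, 0] and the PHI's own result, the sketch returns [0, 0] whatever range is recorded on the self-referencing edge.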
Andrew MacLeod b18394ce15 Check for constant builtin value first.
The original code imported from EVRP for evaluating built_in_constant_p
didn't check to see if the value was a constant before checking the
inlining flag.  Now we check for a constant first.

	* gimple-range-fold.cc (fold_using_range::range_of_builtin_call): Test
	for constant before any other processing.
2021-11-03 10:13:32 -04:00
Andrew MacLeod 309bb7ff6e Fix --param=ranger-debug=all to include a trace.
A recent change made each debug flag its own value, but the 'all' value was
not adjusted properly and 'trace' was left out.

	* flag-types.h (RANGER_DEBUG_ALL): Fix values.
2021-11-03 10:13:32 -04:00
Andrew MacLeod fc40767520 Provide some context to folding via ranger.
Provide an internal mechanism to supply context to range_of_expr for calls
to ::fold_stmt.

	* gimple-range.cc (gimple_ranger::gimple_ranger): Initialize current_bb.
	(gimple_ranger::range_of_expr): Pick up range_on_entry when there is
	no explicit context and current_bb is set.
	(gimple_ranger::fold_stmt): New.
	* gimple-range.h (current_bb, fold_stmt): New.
	* tree-vrp.c (rvrp_folder::fold_stmt): Call ranger's fold_stmt.
2021-11-03 10:01:21 -04:00
Richard Biener 1967fd8f21 tree-optimization/102970 - remap cliques when translating over backedges
The following makes sure to remap (or rather drop for simplicity)
dependence info encoded in MR_DEPENDENCE_CLIQUE when PRE PHI translation
translates a reference over a backedge since that ends up interleaving
two different loop iterations which boils down to two different
inline copies.

2021-11-03  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/102970
	* tree-ssa-pre.c (phi_translate_1): Drop clique and base
	when translating a MEM_REF over a backedge.

	* g++.dg/opt/pr102970.C: New testcase.
2021-11-03 15:00:10 +01:00
Philipp Tomsich 67b0d47e20 aarch64: enable Ampere-1 CPU
This adds support and a basic tuning model for the Ampere Computing
"Ampere-1" CPU.

The Ampere-1 implements the ARMv8.6 architecture in A64 mode and is
modelled as a 4-wide issue (as with all modern micro-architectures,
the chosen issue rate is a compromise between the maximum dispatch
rate and the maximum rate of uops issued to the scheduler).

This adds the -mcpu=ampere1 command-line option and the relevant cost
information/tuning tables for the Ampere-1.

gcc/ChangeLog:

	* config/aarch64/aarch64-cores.def (AARCH64_CORE): New Ampere-1 core.
	* config/aarch64/aarch64-tune.md: Regenerate.
	* config/aarch64/aarch64-cost-tables.h: Add extra costs for Ampere-1.
	* config/aarch64/aarch64.c: Add tuning structures for Ampere-1.
	* doc/invoke.texi: Add documentation for Ampere-1 core.
2021-11-03 14:59:19 +01:00
Wilco Dijkstra a195c7270e AArch64: Improve GOT addressing
Improve GOT addressing by treating the instructions as a pair.  This reduces
register pressure and improves code quality significantly.  SPECINT2017
improves by 0.6% with -fPIC and codesize is 0.73% smaller.  Perlbench has
0.9% smaller codesize, 1.5% fewer executed instructions and is 1.8% faster
on Neoverse N1.

ChangeLog:
2021-11-02  Wilco Dijkstra  <wdijkstr@arm.com>

	* config/aarch64/aarch64.md (movsi): Add alternative for GOT accesses.
	(movdi): Likewise.
	(ldr_got_small_<mode>): Remove pattern.
	(ldr_got_small_sidi): Likewise.
	* config/aarch64/aarch64.c (aarch64_load_symref_appropriately): Keep
	GOT accesses as moves.
	(aarch64_print_operand): Correctly print got_lo12 in L specifier.
	(aarch64_mov_operand_p): Make GOT accesses valid move operands.
	* config/aarch64/constraints.md: Add new constraint Usw for GOT access.
2021-11-03 13:46:05 +00:00
Martin Liska 4096eb50d1 gcov: Remove dead variable.
gcc/ChangeLog:

	* gcov.c (read_line): Remove dead variable.
2021-11-03 14:30:01 +01:00
Martin Liska 2d01bef2f2 Rename predicate class to ipa_predicate
PR bootstrap/102828

gcc/ChangeLog:

	* ipa-fnsummary.c (edge_predicate_pool): Rename predicate class to ipa_predicate.
	(ipa_fn_summary::account_size_time): Likewise.
	(edge_set_predicate): Likewise.
	(set_hint_predicate): Likewise.
	(add_freqcounting_predicate): Likewise.
	(evaluate_conditions_for_known_args): Likewise.
	(evaluate_properties_for_edge): Likewise.
	(remap_freqcounting_preds_after_dup): Likewise.
	(ipa_fn_summary_t::duplicate): Likewise.
	(set_cond_stmt_execution_predicate): Likewise.
	(set_switch_stmt_execution_predicate): Likewise.
	(compute_bb_predicates): Likewise.
	(will_be_nonconstant_expr_predicate): Likewise.
	(will_be_nonconstant_predicate): Likewise.
	(phi_result_unknown_predicate): Likewise.
	(predicate_for_phi_result): Likewise.
	(analyze_function_body): Likewise.
	(compute_fn_summary): Likewise.
	(summarize_calls_size_and_time): Likewise.
	(estimate_calls_size_and_time): Likewise.
	(ipa_call_context::estimate_size_and_time): Likewise.
	(remap_edge_summaries): Likewise.
	(remap_freqcounting_predicate): Likewise.
	(ipa_merge_fn_summary_after_inlining): Likewise.
	(ipa_update_overall_fn_summary): Likewise.
	(read_ipa_call_summary): Likewise.
	(inline_read_section): Likewise.
	* ipa-fnsummary.h (struct ipa_freqcounting_predicate): Likewise.
	* ipa-predicate.c (predicate::add_clause): Likewise.
	(ipa_predicate::add_clause): Likewise.
	(predicate::or_with): Likewise.
	(ipa_predicate::or_with): Likewise.
	(predicate::evaluate): Likewise.
	(ipa_predicate::evaluate): Likewise.
	(predicate::probability): Likewise.
	(ipa_predicate::probability): Likewise.
	(dump_condition): Likewise.
	(dump_clause): Likewise.
	(predicate::dump): Likewise.
	(ipa_predicate::dump): Likewise.
	(predicate::debug): Likewise.
	(ipa_predicate::debug): Likewise.
	(predicate::remap_after_duplication): Likewise.
	(ipa_predicate::remap_after_duplication): Likewise.
	(predicate::remap_after_inlining): Likewise.
	(ipa_predicate::remap_after_inlining): Likewise.
	(predicate::stream_in): Likewise.
	(ipa_predicate::stream_in): Likewise.
	(predicate::stream_out): Likewise.
	(ipa_predicate::stream_out): Likewise.
	(add_condition): Likewise.
	* ipa-predicate.h (class predicate): Likewise.
	(class ipa_predicate): Likewise.
	(add_condition): Likewise.
2021-11-03 14:00:46 +01:00
Richard Biener 73658e70d9 Make sbitmap bitmap_set_bit and bitmap_clear_bit return changed state
The following adjusts the sbitmap bitmap_set_bit and bitmap_clear_bit
APIs to match that of bitmap by returning a bool indicating whether
the bitmap was changed.  I've also changed bitmap_bit_p to return
a bool rather than an int and made use of the sbitmap bitmap_set_bit
API change in one place.
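
To illustrate why a "changed" return value is convenient, here is a minimal sketch in C. It is not GCC's sbitmap implementation: toy_bitmap, toy_bit_p, toy_set_bit, toy_clear_bit and count_distinct are hypothetical names chosen for this example, and the fixed size of 256 bits is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_BITS 256

/* Hypothetical fixed-size bitmap; not GCC's sbitmap.  */
typedef struct { uint64_t words[TOY_BITS / 64]; } toy_bitmap;

/* Return true if BIT is set.  */
static bool toy_bit_p (const toy_bitmap *b, unsigned bit)
{
  return (b->words[bit / 64] >> (bit % 64)) & 1;
}

/* Set BIT and return true if it was previously clear.  */
static bool toy_set_bit (toy_bitmap *b, unsigned bit)
{
  uint64_t mask = (uint64_t) 1 << (bit % 64);
  bool changed = (b->words[bit / 64] & mask) == 0;
  b->words[bit / 64] |= mask;
  return changed;
}

/* Clear BIT and return true if it was previously set.  */
static bool toy_clear_bit (toy_bitmap *b, unsigned bit)
{
  uint64_t mask = (uint64_t) 1 << (bit % 64);
  bool changed = (b->words[bit / 64] & mask) != 0;
  b->words[bit / 64] &= ~mask;
  return changed;
}

/* Count the distinct values in VALS (each must be below TOY_BITS).
   A single call both tests and marks a value, so no separate
   toy_bit_p lookup is needed.  */
static size_t count_distinct (const unsigned *vals, size_t n)
{
  toy_bitmap seen = { { 0 } };
  size_t distinct = 0;
  for (size_t i = 0; i < n; i++)
    if (toy_set_bit (&seen, vals[i]))
      distinct++;
  return distinct;
}
```

The same test-and-mark idiom is what a visited-set walk typically wants: the caller acts only when the bit was newly set.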

2021-11-03  Richard Biener  <rguenther@suse.de>

	* bitmap.h (bitmap_bit_p): Change the return type to bool.
	* bitmap.c (bitmap_bit_p): Likewise.
	* sbitmap.h (bitmap_bit_p): Likewise.
	(bitmap_set_bit): Return whether the bit changed.
	(bitmap_clear_bit): Likewise.
	* tree-ssa.c (verify_vssa): Make use of the changed state
	from bitmap_set_bit.
2021-11-03 11:14:22 +01:00
Richard Biener c081d0a3b0 middle-end/103033 - drop native_interpret_expr with .DEFERRED_INIT expansion
This drops the use of native_interpret_expr, which can fail even though
can_native_interpret_expr_p returns true, in favor of simply folding
the VIEW_CONVERT_EXPR punning.

2021-11-03  Richard Biener  <rguenther@suse.de>

	PR middle-end/103033
	* internal-fn.c (expand_DEFERRED_INIT): Elide the
	native_interpret_expr path in favor of folding the
	VIEW_CONVERT_EXPR generated when punning the RHS.
2021-11-03 11:12:46 +01:00
Stefan Schulze Frielinghaus ea2ab805ac IBM Z: Free bbs in s390_loop_unroll_adjust
gcc/ChangeLog:

	* config/s390/s390.c (s390_loop_unroll_adjust): In case of early
	exit free bbs.
2021-11-03 09:39:27 +01:00
Jan Hubicka 62af7d9402 Fix wrong code caused by retslot EAF flags propagation [PR103040]
Fix a (quite nasty) thinko in how I propagate EAF flags from callee
to caller.  In this case some flags need to be changed.  In particular
  - EAF_NOT_RETURNED in callee does not really mean EAF_NOT_RETURNED in caller
    since we speak of different return values
  - if the callee escapes the parameter, the caller may return it
  - for retslot the rewriting is even a bit more funny, since escaping of the
    return slot to the return slot is not really an escape, however escape of
    an argument to itself is.
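
The second case above can be illustrated with a small C sketch. This is not the testcase from PR103040; stash, callee and caller are hypothetical names used only to show how a parameter that merely escapes in the callee can end up being returned by the caller.

```c
/* Hypothetical global through which the pointer escapes.  */
static int *stash;

/* P is not returned by the callee, but it escapes via STASH.  */
static void callee (int *p)
{
  stash = p;
}

/* Q is only passed to callee, which does not return it, yet the
   caller hands Q back by reading the location it escaped to.  */
static int *caller (int *q)
{
  callee (q);
  return stash;
}
```

So a "not returned" property of the callee's parameter cannot simply be copied onto the caller's parameter once the value escapes.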

This patch should correct all of the cases above and does fix the testcase from PR103040.

Bootstrapped/regtested x86_64 with all languages.  Also lto-bootstrapped.

gcc/ChangeLog:

	PR ipa/103040
	* ipa-modref.c (callee_to_caller_flags): New function.
	(modref_eaf_analysis::analyze_ssa_name): Use it.
	(ipa_merge_modref_summary_after_inlining): Fix whitespace.

gcc/testsuite/ChangeLog:

	* g++.dg/torture/pr103040.C: New test.
2021-11-03 01:45:47 +01:00
GCC Administrator b4df2dd3f4 Daily bump. 2021-11-03 00:16:30 +00:00
Jonathan Wakely 4f032929ac libstdc++: Add some noexcept to std::valarray
libstdc++-v3/ChangeLog:

	* include/std/valarray (valarray::valarray()): Add noexcept.
	(valarray::operator[]): Likewise.
2021-11-03 00:16:01 +00:00
Jan Hubicka 1fefb6cf62 Revert accidental commit.
2021-11-02  Jan Hubicka  <hubicka@ucw.cz>

	* ipa-modref.c (modref_eaf_analysis::analyze_ssa_name): Revert
	accidental commit.
2021-11-02 23:09:27 +01:00
Roger Sayle 2a83259f83 x86_64: Improved implementation of TImode rotations.
This simple patch improves the implementation of 128-bit (TImode)
rotations on x86_64 (a missed optimization opportunity spotted
during the recent V1TImode improvements).

Currently, the function:

unsigned __int128 rotrti3(unsigned __int128 x, unsigned int i) {
  return (x >> i) | (x << (128-i));
}

produces:

rotrti3:
        movq    %rsi, %r8
        movq    %rdi, %r9
        movl    %edx, %ecx
        movq    %rdi, %rsi
        movq    %r9, %rax
        movq    %r8, %rdx
        movq    %r8, %rdi
        shrdq   %r8, %rax
        shrq    %cl, %rdx
        xorl    %r8d, %r8d
        testb   $64, %cl
        cmovne  %rdx, %rax
        cmovne  %r8, %rdx
        negl    %ecx
        andl    $127, %ecx
        shldq   %r9, %rdi
        salq    %cl, %rsi
        xorl    %r9d, %r9d
        testb   $64, %cl
        cmovne  %rsi, %rdi
        cmovne  %r9, %rsi
        orq     %rdi, %rdx
        orq     %rsi, %rax
        ret

With this patch, GCC now generates the much nicer:

rotrti3:
        movl    %edx, %ecx
        movq    %rdi, %rdx
        shrdq   %rsi, %rdx
        shrdq   %rdi, %rsi
        andl    $64, %ecx
        movq    %rdx, %rax
        cmove   %rsi, %rdx
        cmovne  %rsi, %rax
        ret

Even I wasn't expecting the optimizer's choice of the final three
instructions; a thing of beauty.  For rotations larger than 64,
the lowpart and the highpart (%rax and %rdx) are transposed, and
it would be nice to have a conditional swap/exchange.  The inspired
solution the compiler comes up with is to store/duplicate the same
value in both %rax/%rdx, and then use complementary conditional moves
to either update the lowpart or highpart, which cleverly avoids the
potential decode-stage pipeline stall (on some microarchitectures)
from having multiple instructions conditional on the same condition.
See X86_TUNE_ONE_IF_CONV_INSN, and notice there are two such stalls
in the original expansion of rot[rl]ti3.
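
The new sequence can be modelled in portable C as a sketch, assuming the value is held as two 64-bit halves. The names u128_halves, shrd64 and rotr128_halves are hypothetical; shrd64 mimics the 64-bit x86 SHRD instruction, which masks the count to 6 bits and leaves the destination unchanged for a count of zero.

```c
#include <stdint.h>

/* Hypothetical pair of 64-bit halves standing in for a TImode value.  */
typedef struct { uint64_t lo, hi; } u128_halves;

/* Model of 64-bit SHRD: shift DST right by COUNT & 63, filling the
   vacated high bits from the low bits of SRC.  */
static uint64_t shrd64 (uint64_t dst, uint64_t src, unsigned count)
{
  count &= 63;
  if (count == 0)
    return dst;
  return (dst >> count) | (src << (64 - count));
}

/* Rotate X right by I (taken modulo 128), mirroring the new expansion:
   two double-word shifts, then swap the halves when bit 6 of I is set.  */
static u128_halves rotr128_halves (u128_halves x, unsigned i)
{
  uint64_t a = shrd64 (x.lo, x.hi, i);  /* low half for counts below 64 */
  uint64_t b = shrd64 (x.hi, x.lo, i);  /* high half for counts below 64 */
  u128_halves r;
  if (i & 64)
    {
      r.lo = b;
      r.hi = a;
    }
  else
    {
      r.lo = a;
      r.hi = b;
    }
  return r;
}
```

The if/else corresponds to the pair of complementary conditional moves at the end of the generated code.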

2021-11-02  Roger Sayle  <roger@nextmovesoftware.com>
	    Uroš Bizjak  <ubizjak@gmail.com>

	* config/i386/i386.md (<any_rotate>ti3): Provide expansion for
	rotations by non-constant amounts.
2021-11-02 21:58:32 +00:00
Jan Hubicka 18f0873d1e ipa-modref cleanup
A small refactoring of ipa-modref to make it a bit more
C++y by moving the logic analyzing ssa name flags to a class.
I also moved the anonymous namespace markers so we do not
export unnecessary stuff.  There are no functional changes.

Bootstrapped/regtested x86_64-linux, will commit it shortly.

gcc/ChangeLog:

	* ipa-modref.c: Fix anonymous namespace placement.
	(class modref_eaf_analysis): New class.
	(analyze_ssa_name_flags): Turn to ...
	(modref_eaf_analysis::analyze_ssa_name): ... this one.
	(merge_call_lhs_flags): Turn to ...
	(modref_eaf_analysis::merge_call_lhs_flags): ... this one.
	(modref_eaf_analysis::merge_with_ssa_name): New member function.
	(record_escape_points): Turn to ...
	(modref_eaf_analysis::record_escape_points): ... this one.
	(analyze_parms): Update.
	(ipa_merge_modref_summary_after_inlining): Move to the end of file.
2021-11-02 22:08:56 +01:00
Jan Hubicka a70c05120a Static chain support in ipa-modref
Teach ipa-modref about the static chain, which is, like
retslot, a hidden argument.  The patch is pretty much symmetric to what
was done for retslot handling and I verified it does the intended job
for Ada LTO bootstrap.

gcc/ChangeLog:

	* gimple.c (gimple_call_static_chain_flags): New function.
	* gimple.h (gimple_call_static_chain_flags): Declare.
	* ipa-modref.c (modref_summary::modref_summary): Initialize
	static_chain_flags.
	(modref_summary_lto::modref_summary_lto): Likewise.
	(modref_summary::useful_p): Test static_chain_flags.
	(modref_summary_lto::useful_p): Likewise.
	(struct modref_summary_lto): Add static_chain_flags.
	(modref_summary::dump): Dump static_chain_flags.
	(modref_summary_lto::dump): Likewise.
	(struct escape_point): Add static_chain_arg.
	(analyze_ssa_name_flags): Use gimple_call_static_chain_flags.
	(analyze_parms): Handle static chains.
	(modref_summaries::duplicate): Duplicate static_chain_flags.
	(modref_summaries_lto::duplicate): Likewise.
	(modref_write): Stream static_chain_flags.
	(read_section): Likewise.
	(modref_merge_call_site_flags): Handle static_chain_flags.
	* ipa-modref.h (struct modref_summary): Add static_chain_flags.
	* tree-ssa-structalias.c (handle_rhs_call): Use
	gimple_static_chain_flags.

gcc/testsuite/ChangeLog:

	* gcc.dg/ipa/modref-3.c: New test.
2021-11-02 18:57:51 +01:00
Richard Biener 164bbf701f tree-optimization/103029 - ensure vect loop versioning constraint on PHIs
PHI nodes in vectorizer loop versioning need to maintain the same
order of PHI arguments to not disturb SLP discovery.  The following
adds an assertion and mitigation in case loop versioning breaks this,
which happens more often after the recent reorg.

2021-11-02  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/103029
	* tree-vect-loop-manip.c (vect_loop_versioning): Ensure
	the PHI nodes in the loop maintain their original operand
	order.
2021-11-02 18:49:42 +01:00
Jan Hubicka f19791565d Add EAF_NOT_RETURNED_DIRECTLY
Add EAF_NOT_RETURNED_DIRECTLY, which works similarly to
EAF_NODIRECTESCAPE.  Values pointed to by a given argument may be returned but
not the argument itself.  This helps PTA quite noticeably because we mostly
care about tracking the points to which a given memory location can escape.
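
As a hedged illustration of the distinction described above, consider the two C functions below. load_inner and identity are hypothetical names, and whether GCC's analysis actually derives the flag for these exact functions has not been verified here; the sketch only shows the source-level difference between returning an argument and returning a value it points to.

```c
/* P itself is never returned; only the pointer stored in *P is.
   Under the description above, P is "not returned directly".  */
static int *load_inner (int **p)
{
  return *p;
}

/* P itself is returned, so it does not qualify.  */
static int **identity (int **p)
{
  return p;
}
```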

gcc/ChangeLog:

	* tree-core.h (EAF_NOT_RETURNED_DIRECTLY): New flag.
	(EAF_NOREAD): Renumber.
	* ipa-modref.c (dump_eaf_flags): Dump EAF_NOT_RETURNED_DIRECTLY.
	(remove_useless_eaf_flags): Handle EAF_NOT_RETURNED_DIRECTLY.
	(deref_flags): Likewise.
	(modref_lattice::init): Likewise.
	(modref_lattice::merge): Likewise.
	(merge_call_lhs_flags): Likewise.
	(analyze_ssa_name_flags): Likewise.
	(modref_merge_call_site_flags): Likewise.
	* tree-ssa-structalias.c (handle_call_arg): Likewise.

gcc/testsuite/ChangeLog:

	* g++.dg/ipa/modref-1.C: Update template.
	* gcc.dg/tree-ssa/modref-10.c: New test.
2021-11-02 18:43:17 +01:00