Whilst debugging state explosions seen when enabling taint detection
with -fanalyzer (PR analyzer/103533), I noticed that constraint
manager instances could contain stray, redundant constants, such
as this instance:
constraint_manager:
equiv classes:
ec0: {(int)0 == [m_constant]‘0’}
ec1: {(size_t)4 == [m_constant]‘4’}
constraints:
where there are two equivalence classes, each just containing a
constant, with no constraints using them.
This patch makes constraint_manager::canonicalize more aggressive
about purging state, handling the case of purging a redundant
EC containing just a constant.
gcc/analyzer/ChangeLog:
PR analyzer/103533
* constraint-manager.cc (equiv_class::contains_non_constant_p):
New.
(constraint_manager::canonicalize): Call it when determining
redundant ECs.
(selftest::test_purging): New selftest.
(selftest::run_constraint_manager_tests): Likewise.
* constraint-manager.h (equiv_class::contains_non_constant_p):
New decl.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
This patch does a little bit of cleanup by removing some unused
arguments, or marking them as unused. It also removes the function
ctfc_debuginfo_early_finish_p and the corresponding hook macro
definition, which are not used by GCC.
gcc/
* config/bpf/bpf.c (bpf_handle_preserve_access_index_attribute):
Mark arguments `args' and flags' as unused.
(bpf_core_newdecl): Remove unused local `newdecl'.
(bpf_core_newdecl): Remove unused argument `loc'.
(ctfc_debuginfo_early_finish_p): Remove unused function.
(TARGET_CTFC_DEBUGINFO_EARLY_FINISH_P): Remove definition.
(bpf_core_walk): Do not pass a location to bpf_core_newdecl.
When compiling an optabs.ii at -O2 with a release-checking build,
there were 6,643,575 calls to gimple_outgoing_range_stmt_p. 96.8% of
them were for blocks with a single successor, which never have a control
statement that generates new range info. This patch therefore adds a
shortcut for that case.
This gives a ~1% compile-time improvement for the test.
I tried making the function inline (in the header) so that the
single_succ_p didn't need to be repeated, but it seemed to make things
slightly worse.
gcc/
* gimple-range-edge.cc (gimple_outgoing_range::edge_range_p): Add
a shortcut for blocks with single successors.
* gimple-range-gori.cc (gori_map::calculate_gori): Likewise.
When compiling an optabs.ii at -O2 with a release-checking build,
the hottest function in the profile was irange_union. This patch
tries to optimise it a bit. The specific changes are:
- Use quick_push rather than safe_push, since the final number
of entries is known in advance.
- Avoid assigning wi::to_wide & co. to a temporary wide_int,
such as in:
wide_int val_j = wi::to_wide (res[j]);
wi::to_wide returns a wide_int "view" of the in-place INTEGER_CST
storage. Assigning the result to wide_int forces an unnecessary
copy to temporary storage.
This is one area where "auto" helps a lot. In the end though,
it seemed more readable to inline the wi::to_*s rather than
use auto.
- Use to_widest_int rather than to_wide_int. Both are functionally
correct, but to_widest_int is more efficient, for three reasons:
- to_wide returns a wide-int representation in which the most
significant element might not be canonically sign-extended.
This is because we want to allow the storage of an INTEGER_CST
like 0x1U << 31 to be accessed directly with both a wide_int view
(where only 32 bits matter) and a widest_int view (where many more
bits matter, and where the 32 bits are zero-extended to match the
unsigned type). However, operating on uncanonicalised wide_int
forms is less efficient than operating on canonicalised forms.
- to_widest_int has a constant rather than variable precision and
there are never any redundant upper bits to worry about.
- Using widest_int avoids the need for an overflow check, since
there is enough precision to add 1 to any IL constant without
wrap-around.
This gives a ~2% compile-time speed up with the test above.
I also tried adding a path for two single-pair ranges, but it
wasn't a win.
gcc/
* value-range.cc (irange::irange_union): Use quick_push rather
than safe_push. Use widest_int rather than wide_int. Avoid
assigning wi::to_* results to wide*_int temporaries.
Before walking the CFG and filling all cache entries, check if the
same information is available in a dominator.
* gimple-range-cache.cc (ranger_cache::fill_block_cache): Check for
a range from dominators before filling the cache.
(ranger_cache::range_from_dom): New.
* gimple-range-cache.h (ranger_cache::range_from_dom): Add prototype.
There are times we only need to know if any edge from a block can calculate
a range.
* gimple-range-gori.h (class gori_compute):: Add prototypes.
* gimple-range-gori.cc (gori_compute::has_edge_range_p): Add alternate
API for basic block. Call for edge alterantive.
(gori_compute::may_recompute_p): Ditto.
Add
commit 70b043845d
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Tue Nov 30 05:31:26 2021 -0800
libsanitizer: Use SSE to save and restore XMM registers
to LOCAL_PATCHES.
* LOCAL_PATCHES: Add commit 70b043845d.
Use SSE, instead of AVX, to save and restore XMM registers to support
processors without AVX. The affected codes are unused in upstream since
https://github.com/llvm/llvm-project/commit/66d4ce7e26a5
and will be removed in
https://reviews.llvm.org/D112604
This fixed
FAIL: g++.dg/tsan/pthread_cond_clockwait.C -O0 execution test
FAIL: g++.dg/tsan/pthread_cond_clockwait.C -O2 execution test
on machines without AVX.
PR sanitizer/103466
* tsan/tsan_rtl_amd64.S (__tsan_trace_switch_thunk): Replace
vmovdqu with movdqu.
(__tsan_report_race_thunk): Likewise.
The recent fix to PR103527 exposed an issue with how the various
special casing for AVX512 masks in vect_build_gather_load_calls
are handled. The following makes that more obvious, fixing the
miscompile of 403.gcc.
2021-12-06 Richard Biener <rguenther@suse.de>
PR tree-optimization/103581
* tree-vect-stmts.c (vect_build_gather_load_calls): Properly
guard all the AVX512 mask cases.
* gcc.dg/vect/pr103581.c: New testcase.
When SLP reduction chain vectorization support added handling of
an outer conversion in the chain picking a failed reduction up
as SLP reduction that broke the invariant that the whole reduction
was forward reachable. The following plugs that hole noting
a future enhancement possibility.
2021-12-06 Richard Biener <rguenther@suse.de>
PR tree-optimization/103544
* tree-vect-slp.c (vect_analyze_slp): Only add a SLP reduction
opportunity if the stmt in question is the reduction root.
(dot_slp_tree): Add missing check for NULL child.
* gcc.dg/vect/pr103544.c: New testcase.
CSE uses equivalence classes to keep track of expressions that all have the same
values at the current point in the program.
Normal equivalences through SETs only insert and perform lookups in this set but
equivalence determined from comparisons, e.g.
(insn 46 44 47 7 (set (reg:CCZ 17 flags)
(compare:CCZ (reg:SI 105 [ iD.2893 ])
(const_int 0 [0]))) "cse.c":18:22 7 {*cmpsi_ccno_1}
(expr_list:REG_DEAD (reg:SI 105 [ iD.2893 ])
(nil)))
creates the equivalence EQ on (reg:SI 105 [ iD.2893 ]) and (const_int 0 [0]).
This causes a merge to happen between the two equivalence sets denoted by
(const_int 0 [0]) and (reg:SI 105 [ iD.2893 ]) respectively.
The operation happens through merge_equiv_classes however this function has an
invariant that the classes to be merge not contain any duplicates. This is
because it frees entries before merging.
The given testcase when using the supplied flags trigger an ICE due to the
equivalence set being
(rr) p dump_class (class1)
Equivalence chain for (reg:SI 105 [ iD.2893 ]):
(reg:SI 105 [ iD.2893 ])
$3 = void
(rr) p dump_class (class2)
Equivalence chain for (const_int 0 [0]):
(const_int 0 [0])
(reg:SI 97 [ _10 ])
(reg:SI 97 [ _10 ])
$4 = void
This happens because the original INSN being recorded is
(insn 18 17 24 2 (set (subreg:V1SI (reg:SI 97 [ _10 ]) 0)
(const_vector:V1SI [
(const_int 0 [0])
])) "cse.c":11:9 1363 {*movv1si_internal}
(expr_list:REG_UNUSED (reg:SI 97 [ _10 ])
(nil)))
and we end up generating two equivalences. the first one is simply that
reg:SI 97 is 0. The second one is that 0 can be extracted from the V1SI, so
subreg (subreg:V1SI (reg:SI 97) 0) 0 == 0. This nested subreg gets folded away
to just reg:SI 97 and we re-insert the same equivalence.
This patch changes it so that if the nunits of a subreg is 1 then don't generate
a vec_select from the subreg as the subreg will be folded away and we get a dup.
gcc/ChangeLog:
PR rtl-optimization/103404
* cse.c (find_sets_in_insn): Don't select elements out of a V1 mode
subreg.
gcc/testsuite/ChangeLog:
PR rtl-optimization/103404
* gcc.target/i386/pr103404.c: New test.
When moves between integer and sse registers are cheap.
2021-12-06 Hongtao Liu <Hongtao.liu@intel.com>
Uroš Bizjak <ubizjak@gmail.com>
gcc/ChangeLog:
PR target/95740
* config/i386/i386.c (ix86_preferred_reload_class): Allow
integer regs when moves between register units are cheap.
* config/i386/i386.h (INT_SSE_CLASS_P): New.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr95740.c: New test.
This is the original binutils bugzilla report,
https://sourceware.org/bugzilla/show_bug.cgi?id=28509
And this is the first version of the proposed binutils patch,
https://sourceware.org/pipermail/binutils/2021-November/118398.html
After applying the binutils patch, I get the the unexpected error when
building libgcc,
/scratch/nelsonc/riscv-gnu-toolchain/riscv-gcc/libgcc/config/riscv/div.S:42:
/scratch/nelsonc/build-upstream/rv64gc-linux/build-install/riscv64-unknown-linux-gnu/bin/ld: relocation R_RISCV_JAL against `__udivdi3' which may bind externally can not be used when making a shared object; recompile with -fPIC
Therefore, this patch add an extra hidden alias symbol for __udivdi3, and
then use HIDDEN_JUMPTARGET to target a non-preemptible symbol instead.
The solution is similar to glibc as follows,
https://sourceware.org/git/?p=glibc.git;a=commit;h=68389203832ab39dd0dbaabbc4059e7fff51c29b
libgcc/ChangeLog:
* config/riscv/div.S: Add the hidden alias symbol for __udivdi3, and
then use HIDDEN_JUMPTARGET to target it since it is non-preemptible.
* config/riscv/riscv-asm.h: Added new macros HIDDEN_JUMPTARGET and
HIDDEN_DEF.
This moves the GTY declaration of the meta-data indentifier
array into the header that enumerates these and provides
shorthand defines for them. This avoids a problem seen with
a relocatable PCH implementation.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
gcc/objc/ChangeLog:
* objc-next-metadata-tags.h (objc_rt_trees): Declare here.
* objc-next-runtime-abi-01.c: Remove from here.
* objc-next-runtime-abi-02.c: Likewise.
* objc-runtime-shared-support.c: Reorder headers, provide
a GTY declaration the definition of objc_rt_trees.
The new builtin machinery has an early exit, so move the AIX-specific
builtins before the new machinery.
gcc/ChangeLog:
* config/rs6000/rs6000-call.c (rs6000_init_builtins): Move
AIX math builtin initialization before new_builtins_are_live.
Implements moste of OpenMP 5.1 atomic extensions,
except that 'compare' is parsed but rejected during
resolution. (As the trans-openmp.c handling is missing.)
gcc/fortran/ChangeLog:
* dump-parse-tree.c (show_omp_clauses): Handle
weak/compare/fail clause.
* gfortran.h (gfc_omp_clauses): Add weak, compare, fail.
* openmp.c (enum omp_mask1, gfc_match_omp_clauses,
OMP_ATOMIC_CLAUSES): Update for new clauses.
(gfc_match_omp_atomic): Update for 5.1 atomic changes.
(is_conversion): Support widening in one go.
(is_scalar_intrinsic_expr): New.
(resolve_omp_atomic): Update for 5.1 atomic changes.
* parse.c (parse_omp_oacc_atomic): Update for compare.
* resolve.c (gfc_resolve_blocks): Update asserts.
* trans-openmp.c (gfc_trans_omp_atomic): Handle new clauses.
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/atomic-2.f90: Move now supported code to ...
* gfortran.dg/gomp/atomic.f90: here.
* gfortran.dg/gomp/atomic-10.f90: New test.
* gfortran.dg/gomp/atomic-12.f90: New test.
* gfortran.dg/gomp/atomic-15.f90: New test.
* gfortran.dg/gomp/atomic-16.f90: New test.
* gfortran.dg/gomp/atomic-17.f90: New test.
* gfortran.dg/gomp/atomic-18.f90: New test.
* gfortran.dg/gomp/atomic-19.f90: New test.
* gfortran.dg/gomp/atomic-20.f90: New test.
* gfortran.dg/gomp/atomic-22.f90: New test.
* gfortran.dg/gomp/atomic-24.f90: New test.
* gfortran.dg/gomp/atomic-25.f90: New test.
* gfortran.dg/gomp/atomic-26.f90: New test.
libgomp/ChangeLog
* libgomp.texi (OpenMP 5.1): Update status.
This fixes a -Wuninitialized warning for std::cmatch m1, m2; m1=m2;
Also name the template parameters in the forward declaration, to get rid
of the <template-parameter-1-1> noise in diagnostics.
libstdc++-v3/ChangeLog:
PR libstdc++/103549
* include/bits/regex.h (match_results): Give names to template
parameters in first declaration.
(match_results::_M_begin): Add default member-initializer.
P1272R4 has added to the std::byteswap new stuff to me quite unrelated
clarification for std::bit_cast.
The patch treats it as DR, applying to all languages.
We no longer diagnose if padding bits are stored into unsigned char
or std::byte result, fields or bitfields, instead arrange for that result,
those fields or bitfields to get indeterminate value (empty
CONSTRUCTOR with CONSTRUCTOR_NO_ZEROING or just leaving the member's
initializer out and setting CONSTRUCTOR_NO_ZEROING on parent).
We still have a bug that we don't diagnose in lots of places lvalue-to-rvalue
conversions of indeterminate values or class objects with some indeterminate
members.
2021-12-04 Jakub Jelinek <jakub@redhat.com>
* cp-tree.h (is_byte_access_type_not_plain_char): Declare.
* tree.c (is_byte_access_type_not_plain_char): New function.
* constexpr.c (clear_uchar_or_std_byte_in_mask): New function.
(cxx_eval_bit_cast): Don't error about padding bits if target
type is unsigned char or std::byte, instead return no clearing
ctor. Use clear_uchar_or_std_byte_in_mask.
* g++.dg/cpp2a/bit-cast11.C: New test.
* g++.dg/cpp2a/bit-cast12.C: New test.
* g++.dg/cpp2a/bit-cast13.C: New test.
* g++.dg/cpp2a/bit-cast14.C: New test.
The https://gcc.gnu.org/pipermail/gcc-patches/2020-November/557903.html
change broke the following testcases. The problem is when a pragma
namespace allows expansion (i.e. p->is_nspace && p->allow_expansion),
e.g. the omp or acc namespaces do, then when parsing the second pragma
token we do it with pfile->state.in_directive set,
pfile->state.prevent_expansion clear and pfile->state.in_deferred_pragma
clear (the last one because we don't know yet if it will be a deferred
pragma or not). If the pragma line only contains a single name
and newline after it, and there exists a function-like macro with the
same name, the preprocessor needs to peek in funlike_invocation_p
the next token whether it isn't ( but in this case it will see a newline.
As pfile->state.in_directive is set, we don't read anything after the
newline, pfile->buffer->need_line is set and CPP_EOF is lexed, which
funlike_invocation_p doesn't push back. Because name is a function-like
macro and on the pragma line there is no ( after the name, it isn't
expanded, and control flow returns to do_pragma. If name is valid
deferred pragma, we set pfile->state.in_deferred_pragma (and really
need it set so that e.g. end_directive later on doesn't eat all the
tokens from the pragma line).
Before Nathan's change (which unfortunately didn't contain rationale
on why it is better to do it like that), this wasn't a problem,
next _cpp_lex_direct called when we want next token would return
CPP_PRAGMA_EOF when it saw buffer->need_line, which would turn off
pfile->state.in_deferred_pragma and following get token would already
read the next line. But Nathan's patch replaced it with an assertion
failure that now triggers and CPP_PRAGMA_EOL is done only when lexing
the '\n'. Except for this special case that works fine, but in
this case it doesn't because when peeking the token we still didn't know
that it will be a deferred pragma.
I've tried to fix that up in do_pragma by detecting this and pushing
CPP_PRAGMA_EOL as lookahead, but that doesn't work because end_directive
still needs to see pfile->state.in_deferred_pragma set.
So, this patch affectively reverts part of Nathan's change, CPP_PRAGMA_EOL
addition isn't done only when parsing the '\n', but is now done in both
places, in the first one instead of the assertion failure.
2021-12-04 Jakub Jelinek <jakub@redhat.com>
PR preprocessor/102432
* lex.c (_cpp_lex_direct): If buffer->need_line while
pfile->state.in_deferred_pragma, return CPP_PRAGMA_EOL token instead
of assertion failure.
* c-c++-common/gomp/pr102432.c: New test.
* c-c++-common/goacc/pr102432.c: New test.
When -fif-conversion2 is enabled, we attempt to replace conditional
branches around unconditional traps with conditional traps. That
canonicalizes compares, which may change an immediate that barely fits
into one that doesn't.
The compare for the trap is first checked using the predicates of
cbranch predicates, and then, compare and conditional trap insns are
emitted and recognized.
In the failing s390x testcase, i <=u 0xffff_ffff is canonicalized into
i <u 0x1_0000_0000, and the latter immediate doesn't fit. The insn
predicates (both cbranch and cmpdi_ccu) happily accept it, since the
register allocator has no trouble getting them into registers. The
problem is that ifcvt2 runs after reload, so we recognize the compare
insn successfully, but later on we barf when we find that none of the
constraints fit.
This patch arranges for the trap_if-issuing bits in ifcvt to validate
post-reload insns using a stricter test that also checks that operands
fit the constraints.
for gcc/ChangeLog
PR rtl-optimization/103028
* ifcvt.c (find_cond_trap): Validate new insns more strictly
after reload.
for gcc/testsuite/ChangeLog
PR rtl-optimization/103028
* gcc.dg/pr103028.c: New.
This introduces a new RAII type to simplify the emplace members which
currently use try-catch blocks to deallocate a node if an exception is
thrown by the comparisons done during insertion. The new type is created
on the stack and manages the allocation of a new node and deallocates it
in the destructor if it wasn't inserted into the tree. It also provides
helper functions for doing the insertion, releasing ownership of the
node to the tree.
Also, we don't need to use long qualified names if we put the return
type after the nested-name-specifier.
libstdc++-v3/ChangeLog:
* include/bits/stl_tree.h (_Rb_tree::_Auto_node): Define new
RAII helper for creating and inserting new nodes.
(_Rb_tree::_M_insert_node): Use trailing-return-type to simplify
out-of-line definition.
(_Rb_tree::_M_insert_lower_node): Likewise.
(_Rb_tree::_M_insert_equal_lower_node): Likewise.
(_Rb_tree::_M_emplace_unique): Likewise. Use _Auto_node.
(_Rb_tree::_M_emplace_equal): Likewise.
(_Rb_tree::_M_emplace_hint_unique): Likewise.
(_Rb_tree::_M_emplace_hint_equal): Likewise.
We can make some function signatures shorter to print by omitting redundant
nested-name-specifiers in the rest of the declarator.
gcc/cp/ChangeLog:
* error.c (current_dump_scope): New variable.
(dump_scope): Check it.
(dump_function_decl): Set it.
gcc/testsuite/ChangeLog:
* g++.dg/diagnostic/scope1.C: New test.
PR101324 shows a problem in disabling shrink-wrapping when using -mrop-protect
when there is a attribute optimize/pragma. The fix envolves moving the handling
of flag_shrink_wrap so it gets re-disbled when we change or add options.
2021-12-03 Martin Liska <mliska@suse.cz>
gcc/
PR target/101324
* config/rs6000/rs6000.c (rs6000_option_override_internal): Move the
disabling of shrink-wrapping when using -mrop-protect from here...
(rs6000_override_options_after_change): ...to here.
2021-12-03 Peter Bergner <bergner@linux.ibm.com>
gcc/testsuite/
PR target/101324
* gcc.target/powerpc/pr101324.c: New test.
This patch adds a new effective-target function that tests whether
it is safe to emit the ROP-protect instructions and updates the
ROP test cases to use it.
2021-12-03 Peter Bergner <bergner@linux.ibm.com>
gcc/testsuite/
* lib/target-supports.exp (check_effective_target_rop_ok): New function.
* gcc.target/powerpc/rop-1.c: Use it.
* gcc.target/powerpc/rop-2.c: Likewise.
* gcc.target/powerpc/rop-3.c: Likewise.
* gcc.target/powerpc/rop-4.c: Likewise.
* gcc.target/powerpc/rop-5.c: Likewise.
gcc/fortran/ChangeLog:
PR fortran/103505
* array.c (match_array_element_spec): Try to simplify array
element specifications to improve early checking.
* expr.c (gfc_try_simplify_expr): New. Try simplification of an
expression via gfc_simplify_expr. When an error occurs, roll
back.
* gfortran.h (gfc_try_simplify_expr): Declare it.
gcc/testsuite/ChangeLog:
PR fortran/103505
* gfortran.dg/pr103505.f90: New test.
Co-authored-by: Steven G. Kargl <kargl@gcc.gnu.org>
In r11-4758, I tried to fix this problem:
int &&i = 0;
decltype(auto) j = i; // should behave like int &&j = i; error
wherein do_auto_deduction was getting confused with a REFERENCE_REF_P
and it didn't realize its operand was a name, not an expression, and
deduced the wrong type.
Unfortunately that fix broke this:
int&& r = 1;
decltype(auto) rr = (r);
where 'rr' should be 'int &' since '(r)' is an expression, not a name. But
because I stripped the INDIRECT_REF with the r11-4758 change, we deduced
'rr's type as if decltype had gotten a name, resulting in 'int &&'.
I suspect I thought that the REF_PARENTHESIZED_P check when setting
'bool id' in do_auto_deduction would handle the (r) case, but that's not
the case; while the documentation for REF_PARENTHESIZED_P specifically says
it can be set in INDIRECT_REF, we don't actually do so.
This patch sets REF_PARENTHESIZED_P even on REFERENCE_REF_P, so that
do_auto_deduction can use it.
It also removes code in maybe_undo_parenthesized_ref that I think is
dead -- and we don't hit it while running dg.exp. To adduce more data,
it also looks dead here:
https://splichal.eu/lcov/gcc/cp/semantics.c.gcov.html
(It's dead since r9-1417.)
Also add a fixed test for c++/81176.
PR c++/103403
gcc/cp/ChangeLog:
* cp-gimplify.c (cp_fold): Don't recurse if maybe_undo_parenthesized_ref
doesn't change its argument.
* pt.c (do_auto_deduction): Don't strip REFERENCE_REF_P trees if they
are REF_PARENTHESIZED_P. Use stripped_init when checking for
id-expression.
* semantics.c (force_paren_expr): Set REF_PARENTHESIZED_P on
REFERENCE_REF_P trees too.
(maybe_undo_parenthesized_ref): Remove dead code.
gcc/testsuite/ChangeLog:
* g++.dg/cpp1y/decltype-auto2.C: New test.
* g++.dg/cpp1y/decltype-auto3.C: New test.
* g++.dg/cpp1y/decltype-auto4.C: New test.
* g++.dg/cpp1z/decomp-decltype1.C: New test.
Add -mmove-max=bits and -mstore-max=bits to enable 256-bit/512-bit move
and store, independent of -mprefer-vector-width=bits:
1. Add X86_TUNE_AVX512_MOVE_BY_PIECES and X86_TUNE_AVX512_STORE_BY_PIECES
which are enabled for Intel Sapphire Rapids processor.
2. Add -mmove-max=bits to set the maximum number of bits can be moved from
memory to memory efficiently. The default value is derived from
X86_TUNE_AVX512_MOVE_BY_PIECES, X86_TUNE_AVX256_MOVE_BY_PIECES, and the
preferred vector width.
3. Add -mstore-max=bits to set the maximum number of bits can be stored to
memory efficiently. The default value is derived from
X86_TUNE_AVX512_STORE_BY_PIECES, X86_TUNE_AVX256_STORE_BY_PIECES and the
preferred vector width.
gcc/
PR target/103269
* config/i386/i386-expand.c (ix86_expand_builtin): Pass PVW_NONE
and PVW_NONE to ix86_target_string.
* config/i386/i386-options.c (ix86_target_string): Add arguments
for move_max and store_max.
(ix86_target_string::add_vector_width): New lambda.
(ix86_debug_options): Pass ix86_move_max and ix86_store_max to
ix86_target_string.
(ix86_function_specific_print): Pass ptr->x_ix86_move_max and
ptr->x_ix86_store_max to ix86_target_string.
(ix86_valid_target_attribute_tree): Handle x_ix86_move_max and
x_ix86_store_max.
(ix86_option_override_internal): Set the default x_ix86_move_max
and x_ix86_store_max.
* config/i386/i386-options.h (ix86_target_string): Add
prefer_vector_width and prefer_vector_width.
* config/i386/i386.h (TARGET_AVX256_MOVE_BY_PIECES): Removed.
(TARGET_AVX256_STORE_BY_PIECES): Likewise.
(MOVE_MAX): Use 64 if ix86_move_max or ix86_store_max ==
PVW_AVX512. Use 32 if ix86_move_max or ix86_store_max >=
PVW_AVX256.
(STORE_MAX_PIECES): Use 64 if ix86_store_max == PVW_AVX512.
Use 32 if ix86_store_max >= PVW_AVX256.
* config/i386/i386.opt: Add -mmove-max=bits and -mstore-max=bits.
* config/i386/x86-tune.def (X86_TUNE_AVX512_MOVE_BY_PIECES): New.
(X86_TUNE_AVX512_STORE_BY_PIECES): Likewise.
* doc/invoke.texi: Document -mmove-max=bits and -mstore-max=bits.
gcc/testsuite/
PR target/103269
* gcc.target/i386/pieces-memcpy-17.c: New test.
* gcc.target/i386/pieces-memcpy-18.c: Likewise.
* gcc.target/i386/pieces-memcpy-19.c: Likewise.
* gcc.target/i386/pieces-memcpy-20.c: Likewise.
* gcc.target/i386/pieces-memcpy-21.c: Likewise.
* gcc.target/i386/pieces-memset-45.c: Likewise.
* gcc.target/i386/pieces-memset-46.c: Likewise.
* gcc.target/i386/pieces-memset-47.c: Likewise.
* gcc.target/i386/pieces-memset-48.c: Likewise.
* gcc.target/i386/pieces-memset-49.c: Likewise.
I discovered this bug while working on patches to remove the old built-ins
infrastructure. I missed a spot in converting from the rs6000_builtins enum to
the rs6000_gen_builtins_enum. This fixes it. The fix is technically not right
if new_builtins_are_enabled were to be set to zero, but we're not going to do
that anymore, and the remnants of that code will be removed shortly.
2021-12-02 Bill Schmidt <wschmidt@linux.ibm.com>
gcc/
* config/rs6000/rs6000.c (rs6000_builtin_reciprocal): Fix builtin
identifiers.
The following example
void f5(float * restrict z0, float * restrict z1, float *restrict x,
float * restrict y, float c, int n)
{
for (int i = 0; i < n; i++) {
float a = x[i];
float b = y[i];
if (a > b) {
z0[i] = a + b;
if (a > c) {
z1[i] = a - b;
}
}
}
}
generates currently:
ptrue p3.b, all
ld1w z1.s, p1/z, [x2, x5, lsl 2]
ld1w z2.s, p1/z, [x3, x5, lsl 2]
fcmgt p0.s, p3/z, z1.s, z0.s
fcmgt p2.s, p1/z, z1.s, z2.s
fcmgt p0.s, p0/z, z1.s, z2.s
and p0.b, p0/z, p1.b, p1.b
The conditions for a > b and a > c become separate comparisons.
After this patch we generate:
ld1w z1.s, p0/z, [x2, x5, lsl 2]
ld1w z2.s, p0/z, [x3, x5, lsl 2]
fcmgt p1.s, p0/z, z1.s, z2.s
fcmgt p1.s, p1/z, z1.s, z0.s
Where the condition a > b && a > c are folded by using the predicate result of
the previous compare and thus allows the removal of one of the compares.
When never a mask is being generated from an BIT_AND we mask the operands of
the and instead and then just AND the result.
This allows us to be able to CSE the masks and generate the right combination.
However because re-assoc will try to re-order the masks in the & we have to now
perform a small local CSE on the vectorized loop is vectorization is successful.
Note: This patch series is working incrementally towards generating the most
efficient code for this and other loops in small steps.
gcc/ChangeLog:
* tree-vect-stmts.c (prepare_load_store_mask): Rename to...
(prepare_vec_mask): ...This and record operations that have already been
masked.
(vectorizable_call): Use it.
(vectorizable_operation): Likewise.
(vectorizable_store): Likewise.
(vectorizable_load): Likewise.
* tree-vectorizer.h (class _loop_vec_info): Add vec_cond_masked_set.
(vec_cond_masked_set_type, tree_cond_mask_hash): New.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/pred-combine-and.c: New test.
1. On some targets, like PowerPC, reference to ifunc function resolver
must be non-local so that compiler will properly emit PLT call. Add
TARGET_IFUNC_REF_LOCAL_OK to allow binding indirect function resolver
locally for targets which don't require special PLT call sequence.
2. Add ix86_call_use_plt_p to call local ifunc function resolvers via
PLT.
gcc/
PR target/51469
PR target/83782
* target.def (ifunc_ref_local_ok): Add a target hook.
* varasm.c (default_binds_local_p_3): Force indirect function
resolver non-local only if targetm.ifunc_ref_local_ok returns
false.
* config/i386/i386-expand.c (ix86_expand_call): Call
ix86_call_use_plt_p to check if PLT should be used.
* config/i386/i386-protos.h (ix86_call_use_plt_p): New.
* config/i386/i386.c (output_pic_addr_const): Call
ix86_call_use_plt_p to check if "@PLT" is needed.
(ix86_call_use_plt_p): New.
(TARGET_IFUNC_REF_LOCAL_OK): New.
* doc/tm.texi.in: Add TARGET_IFUNC_REF_LOCAL_OK.
* doc/tm.texi: Regenerated.
gcc/testsuite/
PR target/51469
PR target/83782
* gcc.target/i386/pr83782-1.c: New test.
* gcc.target/i386/pr83782-2.c: Likewise.
ubsan.exp cycles through torture options, and that includes
-O2 -flto -fno-fat-lto-objects. But with those options
tree dump scans don't work for post-IPA passes, for dg-do
compile tests nothing after IPA is done. So we get an
unresolved testcase:
gcc.dg/ubsan/pr103456.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects : dump file does not exist
UNRESOLVED: gcc.dg/ubsan/pr103456.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects scan-tree-dump-not objsz1 "maximum object size 0"
Fixed by adding -ffat-lto-objects so that we perform the post-IPA
passes.
2021-12-03 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/103456
* gcc.dg/ubsan/pr103456.c: Add -ffat-lto-objects to dg-options.
The target attribute handling is very expensive and for the common case
from x86intrin.h where many functions get implicitly the same target
attribute, we can speed up compilation a lot by caching it.
The following patches both create a single entry cache, where they cache
for a particular target attribute argument list the resulting
DECL_FUNCTION_SPECIFIC_TARGET and DECL_FUNCTION_SPECIFIC_OPTIMIZATION
values from ix86_valid_target_attribute_p and use the cache if the
args are the same as last time and we start either from NULL values
of those, or from the recorded values for those from last time.
Compiling a simple:
#include <x86intrin.h>
int i;
testcase with ./cc1 -quiet -O2 -isystem include/ test.c
takes on my WS without the patches ~0.392s and with either of the
patches ~0.182s, i.e. roughly half the time as before.
For ./cc1plus -quiet -O2 -isystem include/ test.c
it is slightly worse, the speed up is from ~0.613s to ~0.403s.
The difference between the 2 patches is that the first one uses copy_list
while the second one uses a vec, so I think the second one has the advantage
of creating less GC garbage.
I've verified both patches achieve the same content of those
DECL_FUNCTION_SPECIFIC_TARGET and DECL_FUNCTION_SPECIFIC_OPTIMIZATION
nodes as before on x86intrin.h by doing debug_tree on those and comparing
the stderr from without these patches to with these patches.
2021-12-03 Jakub Jelinek <jakub@redhat.com>
* attribs.h (simple_cst_list_equal): Declare.
* attribs.c (simple_cst_list_equal): No longer static.
* config/i386/i386-options.c (target_attribute_cache): New variable.
(ix86_valid_target_attribute_p): Cache DECL_FUNCTION_SPECIFIC_TARGET
and DECL_FUNCTION_SPECIFIC_OPTIMIZATION based on args.
So, if we want to make PCH work for PIEs, I'd say we can:
1) add a new GTY option, say callback, which would act like
skip for non-PCH and for PCH would make us skip it but
remember for address bias translation
2) drop the skip for tree_translation_unit_decl::language
3) change get_unnamed_section to have const char * as
last argument instead of const void *, change
unnamed_section::data also to const char * and update
everything related to that
4) maybe add a host hook whether it is ok to support binaries
changing addresses (the only thing I'm worried is if
some host that uses function descriptors allocates them
dynamically instead of having them somewhere in the
executable)
5) maybe add a gengtype warning if it sees in GTY tracked
structure a function pointer without that new callback
option
Here is 1), 2), 3) implemented.
Note, on stdc++.h.gch/O2g.gch there are just those 10 relocations without
the second patch, with it a few more, but nothing huge. And for non-PIEs
there isn't really any extra work on the load side except freading two scalar
values and fseek.
2021-12-03 Jakub Jelinek <jakub@redhat.com>
PR pch/71934
gcc/
* ggc.h (gt_pch_note_callback): Declare.
* gengtype.h (enum typekind): Add TYPE_CALLBACK.
(callback_type): Declare.
* gengtype.c (dbgprint_count_type_at): Handle TYPE_CALLBACK.
(callback_type): New variable.
(process_gc_options): Add CALLBACK argument, handle callback
option.
(set_gc_used_type): Adjust process_gc_options caller, if callback,
set type to &callback_type.
(output_mangled_typename): Handle TYPE_CALLBACK.
(walk_type): Likewise. Handle callback option.
(write_types_process_field): Handle TYPE_CALLBACK.
(write_types_local_user_process_field): Likewise.
(write_types_local_process_field): Likewise.
(write_root): Likewise.
(dump_typekind): Likewise.
(dump_type): Likewise.
* gengtype-state.c (type_lineloc): Handle TYPE_CALLBACK.
(state_writer::write_state_callback_type): New method.
(state_writer::write_state_type): Handle TYPE_CALLBACK.
(read_state_callback_type): New function.
(read_state_type): Handle TYPE_CALLBACK.
* ggc-common.c (callback_vec): New variable.
(gt_pch_note_callback): New function.
(gt_pch_save): Stream out gt_pch_save function address and relocation
table.
(gt_pch_restore): Stream in saved gt_pch_save function address and
relocation table and apply relocations if needed.
* doc/gty.texi (callback): Document new GTY option.
* varasm.c (get_unnamed_section): Change callback argument's type and
last argument's type from const void * to const char *.
(output_section_asm_op): Change argument's type from const void *
to const char *, remove unnecessary cast.
* tree-core.h (struct tree_translation_unit_decl): Drop GTY((skip))
from language member.
* output.h (unnamed_section_callback): Change argument type from
const void * to const char *.
(struct unnamed_section): Use GTY((callback)) instead of GTY((skip))
for callback member. Change data member type from const void *
to const char *.
(struct noswitch_section): Use GTY((callback)) instead of GTY((skip))
for callback member.
(get_unnamed_section): Change callback argument's type and
last argument's type from const void * to const char *.
(output_section_asm_op): Change argument's type from const void *
to const char *.
* config/avr/avr.c (avr_output_progmem_section_asm_op): Likewise.
Remove unneeded cast.
* config/darwin.c (output_objc_section_asm_op): Change argument's type
from const void * to const char *.
* config/pa/pa.c (som_output_text_section_asm_op): Likewise.
(som_output_comdat_data_section_asm_op): Likewise.
* config/rs6000/rs6000.c (rs6000_elf_output_toc_section_asm_op):
Likewise.
(rs6000_xcoff_output_readonly_section_asm_op): Likewise. Instead
of dereferencing directive hardcode variable names and decide based on
whether directive is NULL or not.
(rs6000_xcoff_output_readwrite_section_asm_op): Change argument's type
from const void * to const char *.
(rs6000_xcoff_output_tls_section_asm_op): Likewise. Instead
of dereferencing directive hardcode variable names and decide based on
whether directive is NULL or not.
(rs6000_xcoff_output_toc_section_asm_op): Change argument's type
from const void * to const char *.
(rs6000_xcoff_asm_init_sections): Adjust get_unnamed_section callers.
gcc/c-family/
* c-pch.c (struct c_pch_validity): Remove pch_init member.
(pch_init): Don't initialize v.pch_init.
(c_common_valid_pch): Don't warn and punt if .text addresses change.
libcpp/
* include/line-map.h (class line_maps): Add GTY((callback)) to
reallocator and round_alloc_size members.
This patch fixes a case of setting array low-bounds, found for particular uses
of SOURCE=/MOLD=. This adjusts the relevant part in gfc_trans_allocate() to
set e3_has_nodescriptor only for non-named arrays.
2021-12-03 Tobias Burnus <tobias@codesourcery.com>
gcc/fortran/ChangeLog:
* trans-stmt.c (gfc_trans_allocate): Set e3_has_nodescriptor to true
only for non-named arrays.
gcc/testsuite/ChangeLog:
* gfortran.dg/allocate_with_source_26.f90: Adjust testcase.
* gfortran.dg/allocate_with_mold_4.f90: New testcase.