Commit Graph

190100 Commits

Author SHA1 Message Date
Richard Biener
2acbc4eba3 Avoid some -Wunreachable-code-ctrl
This cleans up unreachable code diagnosed by -Wunreachable-code-ctrl.
It largely follows the previous series but discovers a few extra
cases, namely dead code after break or continue or loops without
exits.

2021-11-29  Richard Biener  <rguenther@suse.de>

gcc/c/
	* gimple-parser.c (c_parser_gimple_postfix_expression):
	avoid unreachable code after break.

gcc/
	* cfgrtl.c (skip_insns_after_block): Refactor code to
	be more easily readable.
	* expr.c (op_by_pieces_d::run): Remove unreachable
	assert.
	* sched-deps.c (sched_analyze): Remove unreachable
	gcc_unreachable.
	* sel-sched-ir.c (in_same_ebb_p): Likewise.
	* tree-ssa-alias.c (nonoverlapping_refs_since_match_p):
	Remove unreachable code.
	* tree-vect-slp.c (vectorize_slp_instance_root_stmt):
	Refactor to avoid unreachable loop iteration.
	* tree.c (walk_tree_1): Remove unreachable break.
	* vec-perm-indices.c (vec_perm_indices::series_p): Remove
	unreachable return.

gcc/cp/
	* parser.c (cp_parser_postfix_expression): Remove
	unreachable code.
	* pt.c (tsubst_expr): Remove unreachable breaks.

gcc/fortran/
	* frontend-passes.c (gfc_expr_walker): Remove unreachable
	break.
	* scanner.c (skip_fixed_comments): Remove unreachable
	gcc_unreachable.
	* trans-expr.c (gfc_expr_is_variable): Refactor to make
	control flow more obvious.
2021-11-30 08:23:26 +01:00
Kewen Lin
6c7d489a1e rs6000: Remove builtin mask check from builtin_decl [PR102347]
As discussed in PR102347, builtin_decl is currently invoked very early,
while the function_decl for a builtin function is being created.  At
that time rs6000_builtin_mask can be wrong for builtins sitting in
#pragma/attribute target functions, though it is updated properly
later when LTO processes all nodes.

This patch aligns with the practice the i386 port adopts, and with
r10-7462, by relaxing builtin mask checking in some places.

gcc/ChangeLog:

	PR target/102347
	* config/rs6000/rs6000-call.c (rs6000_builtin_decl): Remove builtin mask
	check.

gcc/testsuite/ChangeLog:

	PR target/102347
	* gcc.target/powerpc/pr102347.c: New test.
2021-11-29 21:22:32 -06:00
Kewen Lin
aca68829d7 rs6000: Modify the way for extra penalized cost
This patch follows the discussions here[1][2], where Segher
pointed out the existing way to guard the extra penalized
cost for strided/elementwise loads with a magic bound does
not scale.

The previous formula, nunits * stmt_cost, can produce a much
exaggerated penalized cost: for V16QI on P8 it is
16 * 20 = 320, which is why a bound was needed.  To make it
better and more readable, the penalized cost is simplified
to:

    unsigned adjusted_cost = (nunits == 2) ? 2 : 1;
    unsigned extra_cost = nunits * adjusted_cost;

For V2DI/V2DF, it uses a penalized cost of 2 for each scalar load,
while for the other modes it uses 1.  This is mainly concluded
from the performance evaluations.  One possibly related point:
the more units a vector is constructed from, the more
instructions are used, and the more chances there are to schedule
them well (or even run them in parallel when enough units are
available at that time), so it seems reasonable not to penalize
them more.
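
The arithmetic above can be sketched as a small standalone example.  The
function names are hypothetical and are not taken from rs6000.c; only the
two formulas and the V16QI figures come from this message.

```cpp
// Sketch of the simplified penalty; the name is hypothetical.
constexpr unsigned extra_penalized_cost(unsigned nunits)
{
  // Two-element vectors (V2DI/V2DF) pay 2 per scalar load, all others 1.
  unsigned adjusted_cost = (nunits == 2) ? 2 : 1;
  return nunits * adjusted_cost;
}

// The earlier formula, nunits * stmt_cost, kept for comparison.
constexpr unsigned old_penalized_cost(unsigned nunits, unsigned stmt_cost)
{
  return nunits * stmt_cost;
}

static_assert(extra_penalized_cost(2) == 4, "V2DI/V2DF");
static_assert(extra_penalized_cost(16) == 16, "V16QI");
static_assert(old_penalized_cost(16, 20) == 320, "V16QI on P8, old formula");
```

With the new formula the cost grows linearly in nunits, so no separate
bound is needed to keep it in range.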

The SPEC2017 evaluations on Power8/Power9/Power10 at option
sets O2-vect and Ofast-unroll show this change is neutral.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579121.html
[2] https://gcc.gnu.org/pipermail/gcc-patches/2021-September/580099.html

gcc/ChangeLog:

	* config/rs6000/rs6000.c
	(rs6000_cost_data::update_target_cost_per_stmt): Adjust the way to
	compute extra penalized cost.  Remove useless parameter.
	(rs6000_cost_data::rs6000_add_stmt_cost): Adjust the call to function
	update_target_cost_per_stmt.
2021-11-29 21:22:27 -06:00
Kewen Lin
bcb163eee8 visium: Revert commit r12-5332
This reverts commit b8ce19bb1a
(r12-5332) "visium: Fix non-robust split condition in
define_insn_and_split".

Jeff found that newlib has failed to build for the visium port since
r12-5332.  As Eric confirmed, the split conditions in the
related define_insn_and_splits intentionally do not join
with the insn condition (&&): the insn condition won't hold
after reload, so the proposed concatenation would wrongly keep the
splitting from ever happening.
2021-11-29 19:36:52 -06:00
Andrew MacLeod
ab202b659d Don't reuse reference after potential resize.
When a new def chain is requested, any existing reference may no longer
be valid, so just use the object directly.
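
A minimal sketch of this kind of hazard, assuming a table backed by
std::vector.  The types and names below are hypothetical stand-ins and do
not reproduce the code in gimple-range-gori.cc.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a per-name dependency record.
struct def_chain
{
  std::vector<std::size_t> deps;
};

// Hypothetical table that grows on demand.
struct chain_table
{
  std::vector<def_chain> chains;

  // Returns the entry for NAME, growing the table if needed.  Growing may
  // reallocate, which invalidates every reference handed out earlier.
  def_chain &get(std::size_t name)
  {
    if (name >= chains.size())
      chains.resize(name + 1);
    return chains[name];
  }

  void register_dependency(std::size_t name, std::size_t dep)
  {
    // Unsafe ordering would be: take a reference from get(name), call
    // get(dep), then write through the now possibly dangling reference.
    // Safe ordering: do the lookup that may resize first, then fetch the
    // entry to modify and use it directly.
    get(dep);
    get(name).deps.push_back(dep);
  }
};
```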

	PR tree-optimization/103467
	* gimple-range-gori.cc (range_def_chain::register_dependency): Don't
	use an object reference after a potential resize.
2021-11-29 20:01:08 -05:00
GCC Administrator
87cd82c81d Daily bump. 2021-11-30 00:16:44 +00:00
David Malcolm
1329021771 analyzer: further false leak fixes due to overzealous state merging [PR103217]
Commit r12-5424-gf573d35147ca8433c102e1721d8c99fc432cb44b fixed a false
positive from -Wanalyzer-malloc-leak due to overzealous state merging,
erroneously merging two different svalues bound to a particular part
of the store when one has sm-state.

A further case was discovered by the reporter of PR analyzer/103217,
which this patch fixes.  In this variant, different states have set
different fields of a struct, and on attempting to merge them, the
states have a different set of binding keys, leading to one state
having an svalue with sm-state, and its peer state having a NULL value
for that binding key.  The state merger code was erroneously treating
them as mergeable to "UNKNOWN".  This followup patch fixes things by
rejecting such mergers if the non-NULL svalue is not mergeable with
"UNKNOWN".

gcc/analyzer/ChangeLog:
	PR analyzer/103217
	* store.cc (binding_cluster::can_merge_p): For the "key is bound"
	vs "key is not bound" merger case, check that the bound svalue
	is mergeable before merging it to "unknown", rejecting the merger
	otherwise.

gcc/testsuite/ChangeLog:
	PR analyzer/103217
	* gcc.dg/analyzer/pr103217-2.c: New test.
	* gcc.dg/analyzer/pr103217-3.c: New test.
	* gcc.dg/analyzer/pr103217-4.c: New test.
	* gcc.dg/analyzer/pr103217-5.c: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2021-11-29 18:50:56 -05:00
Uros Bizjak
ca5667e867 i386: Fix and improve movhi_internal and movhf_internal some more.
An (*v,C) alternative can be added to movhi_internal to directly load
HImode constant 0 to xmm register. Also, V4SFmode moves can be used
for xmm->xmm moves instead of TImode moves when optimizing for size.
Fix invalid %vpinsrw insn template, which needs to duplicate %xmm
register for AVX targets.

Optimize GPR moves in movhf_internal in the same way as in movhi_internal.
Fix pinsrw and pextrw templates for AVX targets. Use sselog1
instead of sselog type.  Also, handle TARGET_SSE_PARTIAL_REG_DEPENDENCY
and TARGET_SSE_SPLIT_REGS targets.

2021-11-29  Uroš Bizjak  <ubizjak@gmail.com>

gcc/ChangeLog:

	PR target/102811
	* config/i386/i386.md (*movhi_internal): Introduce (*v,C) alternative.
	Do not allocate non-GPR registers.  Optimize xmm->xmm moves when
	optimizing for size.  Fix vpinsrw insn template.
	(*movhf_internal): Fix pinsrw and pextrw insn templates for
	AVX targets. Use sselog1 type instead of sselog.  Optimize GPR moves.
	Optimize xmm->xmm moves for TARGET_SSE_PARTIAL_REG_DEPENDENCY
	and TARGET_SSE_SPLIT_REGS targets.
2021-11-29 22:17:20 +01:00
Martin Sebor
f81c5a86dc Prune out valid -Winfinite-recursion [PR103469].
gcc/testsuite/ChangeLog:
	PR testsuite/103469
	* c-c++-common/attr-retain-5.c: Prune out valid warning.
	* c-c++-common/attr-retain-6.c: Same.
	* c-c++-common/attr-retain-9.c: Same.
2021-11-29 13:16:51 -07:00
Eric Gallager
ed7894c490 Fix autoconf regeneration slip-up.
A stray _AC_FINALIZE somehow snuck into g:909b30a; this should fix it.

gcc/ChangeLog:

	* configure: Re-regenerate.
2021-11-29 14:50:02 -05:00
Eric Gallager
909b30a17e Make etags path used by build system configurable
This commit allows users to specify a path to their "etags"
executable for use when doing "make tags".
I based this patch off of this one from upstream automake:
https://git.savannah.gnu.org/cgit/automake.git/commit/m4?id=d2ccbd7eb38d6a4277d6f42b994eb5a29b1edf29
This means that I just supplied variables that the user can override
for the tags programs, rather than having the configure scripts
actually check for them. I handle etags and ctags separately because
the intl subdirectory has separate targets for them. This commit
only affects the subdirectories that use handwritten Makefiles; the
ones that use automake will have to wait until we update the version
of automake used to be 1.16.4 or newer before they'll be fixed.

Addresses #103021

gcc/ChangeLog:

	PR other/103021
	* Makefile.in: Substitute CTAGS, ETAGS, and CSCOPE
	variables. Use ETAGS variable in TAGS target.
	* configure: Regenerate.
	* configure.ac: Allow CTAGS, ETAGS, and CSCOPE
	variables to be overridden.

gcc/ada/ChangeLog:

	PR other/103021
	* gcc-interface/Make-lang.in: Use ETAGS variable in
	TAGS target.

gcc/c/ChangeLog:

	PR other/103021
	* Make-lang.in: Use ETAGS variable in TAGS target.

gcc/cp/ChangeLog:

	PR other/103021
	* Make-lang.in: Use ETAGS variable in TAGS target.

gcc/d/ChangeLog:

	PR other/103021
	* Make-lang.in: Use ETAGS variable in TAGS target.

gcc/fortran/ChangeLog:

	PR other/103021
	* Make-lang.in: Use ETAGS variable in TAGS target.

gcc/go/ChangeLog:

	PR other/103021
	* Make-lang.in: Use ETAGS variable in TAGS target.

gcc/objc/ChangeLog:

	PR other/103021
	* Make-lang.in: Use ETAGS variable in TAGS target.

gcc/objcp/ChangeLog:

	PR other/103021
	* Make-lang.in: Use ETAGS variable in TAGS target.

intl/ChangeLog:

	PR other/103021
	* Makefile.in: Use ETAGS variable in TAGS target,
	CTAGS variable in CTAGS target, and MKID variable
	in ID target.
	* configure: Regenerate.
	* configure.ac: Allow CTAGS, ETAGS, and MKID
	variables to be overridden.

libcpp/ChangeLog:

	PR other/103021
	* Makefile.in: Use ETAGS variable in TAGS target.
	* configure: Regenerate.
	* configure.ac: Allow ETAGS variable to be overridden.

libiberty/ChangeLog:

	PR other/103021
	* Makefile.in: Use ETAGS variable in TAGS target.
	* configure: Regenerate.
	* configure.ac: Allow ETAGS variable to be overridden.
2021-11-29 13:24:12 -05:00
Paul A. Clarke
85289ba36c rs6000: Add Power10 optimization for most _mm_movemask*
Power10 ISA added `vextract*` instructions which are realized in the
`vec_extractm` intrinsic.

Use `vec_extractm` for `_mm_movemask_ps`, `_mm_movemask_pd`, and
`_mm_movemask_epi8` compatibility intrinsics, when `_ARCH_PWR10`.

2021-11-29  Paul A. Clarke  <pc@us.ibm.com>

gcc
	* config/rs6000/xmmintrin.h (_mm_movemask_ps): Use vec_extractm
	when _ARCH_PWR10.
	* config/rs6000/emmintrin.h (_mm_movemask_pd): Likewise.
	(_mm_movemask_epi8): Likewise.
2021-11-29 09:50:43 -06:00
Richard Biener
e2194a8b39 Fix RTL FE issue with premature return
This fixes an issue discovered by -Wunreachable-code-return

2021-11-29  Richard Biener  <rguenther@suse.de>

	* read-rtl-function.c (function_reader::read_rtx_operand):
	Return only after resetting m_in_call_function_usage.
2021-11-29 16:18:45 +01:00
Patrick Palka
1420ff3efc c++: redundant explicit 'this' capture before C++20 [PR100493]
As described in detail in the PR, in C++20 implicitly capturing 'this'
via a '=' capture default is deprecated, and in C++17 adding an explicit
'this' capture alongside a '=' capture default is diagnosed as redundant
(and is strictly speaking ill-formed).  This means it's impossible to
write, in a forward-compatible way, a C++17 lambda that has a '=' capture
default and that also captures 'this' (implicitly or explicitly):

  [=] { this; }      // #1 deprecated in C++20, OK in C++17
		     // GCC issues a -Wdeprecated warning in C++20 mode

  [=, this] { }      // #2 ill-formed in C++17, OK in C++20
		     // GCC issues an unconditional warning in C++17 mode

This patch resolves this dilemma by downgrading the warning for #2 into
a -pedantic one.  In passing, move it into the -Wc++20-extensions class
of warnings and adjust its wording accordingly.
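
For illustration, a sketch of the workaround that such code needs when it
must compile without diagnostics in both modes: select the capture list
by language version.  The struct and member names are hypothetical.

```cpp
struct counter
{
  int value = 0;

  int next()
  {
#if __cplusplus >= 202002L
    // C++20 and later: name 'this' explicitly next to the '=' default.
    auto f = [=, this] { return value + 1; };
#else
    // Before C++20: rely on the implicit capture of 'this' by '='.
    auto f = [=] { return value + 1; };
#endif
    return f();
  }
};
```

With the warning downgraded to a -pedantic one, the C++20 spelling can be
used in C++17 code as well, unless -pedantic is in effect.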

	PR c++/100493

gcc/cp/ChangeLog:

	* parser.c (cp_parser_lambda_introducer): In C++17, don't
	diagnose a redundant 'this' capture alongside a by-copy
	capture default unless -pedantic.  Move the diagnostic into
	-Wc++20-extensions and adjust wording accordingly.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp1z/lambda-this1.C: Adjust expected diagnostics.
	* g++.dg/cpp1z/lambda-this8.C: New test.
	* g++.dg/cpp2a/lambda-this3.C: Compile with -pedantic in C++17
	to continue to diagnose redundant 'this' captures.
2021-11-29 07:52:47 -05:00
Roger Sayle
a5d269f0c1 x86_64: Improved V1TImode rotations by non-constant amounts.
This patch builds on the recent improvements to TImode rotations (and
Jakub's fixes to shldq/shrdq patterns).  Now that expanding a TImode
rotation can never fail, it is safe to allow general_operand constraints
on the QImode shift amounts in rotlv1ti3 and rotrv1ti3 patterns.
I've also made an additional tweak to ix86_expand_v1ti_to_ti to use
vec_extract via V2DImode, which avoids using memory and takes advantage
of vpextrq on recent hardware.

For the following test case:

typedef unsigned __int128 uv1ti __attribute__ ((__vector_size__ (16)));
uv1ti rotr(uv1ti x, unsigned int i) { return (x >> i) | (x << (128-i)); }

GCC with -O2 -mavx2 would previously generate:

rotr:   vmovdqa %xmm0, -24(%rsp)
        movq    -16(%rsp), %rdx
        movl    %edi, %ecx
        xorl    %esi, %esi
        movq    -24(%rsp), %rax
        shrdq   %rdx, %rax
        shrq    %cl, %rdx
        testb   $64, %dil
        cmovne  %rdx, %rax
        cmovne  %rsi, %rdx
        negl    %ecx
        xorl    %edi, %edi
        andl    $127, %ecx
        vmovq   %rax, %xmm2
        movq    -24(%rsp), %rax
        vpinsrq $1, %rdx, %xmm2, %xmm1
        movq    -16(%rsp), %rdx
        shldq   %rax, %rdx
        salq    %cl, %rax
        testb   $64, %cl
        cmovne  %rax, %rdx
        cmovne  %rdi, %rax
        vmovq   %rax, %xmm3
        vpinsrq $1, %rdx, %xmm3, %xmm0
        vpor    %xmm1, %xmm0, %xmm0
        ret

with this patch, we now generate:

rotr:   movl    %edi, %ecx
        vpextrq $1, %xmm0, %rax
        vmovq   %xmm0, %rdx
        shrdq   %rax, %rdx
        vmovq   %xmm0, %rsi
        shrdq   %rsi, %rax
        andl    $64, %ecx
        movq    %rdx, %rsi
        cmovne  %rax, %rsi
        cmove   %rax, %rdx
        vmovq   %rsi, %xmm0
        vpinsrq $1, %rdx, %xmm0, %xmm0
        ret

2021-11-29  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* config/i386/i386-expand.c (ix86_expand_v1ti_to_ti): Perform the
	conversion via V2DImode using vec_extractv2didi on TARGET_SSE2.
	* config/i386/sse.md (rotlv1ti3, rotrv1ti3): Change constraint
	on QImode shift amounts from const_int_operand to general_operand.

gcc/testsuite/ChangeLog
	* gcc.target/i386/sse2-v1ti-rotate.c: New test case.
2021-11-29 10:48:06 +00:00
Richard Biener
a3b31fe369 Remove unreachable gcc_unreachable () at the end of functions
It seems to be a style to place gcc_unreachable () after a
switch that handles all cases with every case returning.
Those are unreachable (well, yes!), so they will be elided
at CFG construction time and the middle-end will place
another __builtin_unreachable "after" them to note the
path doesn't lead to a return when the function is not declared
void.

So IMHO those explicit gcc_unreachable () calls serve no purpose;
at most they could be replaced by a comment.  And since all of these
cases involve switches, a switch that fails to handle a case or fails
to return will likely cause some diagnostic to be emitted, which is
better than running into an ICE only at runtime.
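
A small sketch of the pattern being discussed, using a hypothetical enum
rather than any of the touched GCC functions.  __builtin_unreachable ()
is the GCC/Clang builtin mentioned above; it is spelled out here only to
mark where the trailing gcc_unreachable () used to sit.

```cpp
enum class color { red, green, blue };

// Every enumerator is handled and every case returns.  If an enumerator
// is added later and not handled, -Wswitch (enabled by -Wall) reports it
// at compile time.
const char *color_name(color c)
{
  switch (c)
    {
    case color::red:
      return "red";
    case color::green:
      return "green";
    case color::blue:
      return "blue";
    }
  // Position of the former gcc_unreachable ().
  __builtin_unreachable();
}
```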

2021-11-24  Richard Biener  <rguenther@suse.de>

	* tree.h (reverse_storage_order_for_component_p): Remove
	spurious gcc_unreachable.
	* cfganal.c (dfs_find_deadend): Likewise.
	* fold-const-call.c (fold_const_logb): Likewise.
	(fold_const_significand): Likewise.
	* gimple-ssa-store-merging.c (lhs_valid_for_store_merging_p):
	Likewise.

gcc/c-family/
	* c-format.c (check_format_string): Remove spurious
	gcc_unreachable.
2021-11-29 11:18:35 +01:00
Richard Biener
16507dea75 Remove unreachable returns
This removes unreachable return statements as diagnosed by
the -Wunreachable-code patch.  Some cases are more obviously
an improvement than others - in fact some may get you the idea
to replace them with gcc_unreachable () instead, leading to
cases of the 'Remove unreachable gcc_unreachable () at the end
of functions' patch.

2021-11-25  Richard Biener  <rguenther@suse.de>

	* vec.c (qsort_chk): Do not return the void return value
	from the noreturn qsort_chk_error.
	* ccmp.c (expand_ccmp_expr_1): Remove unreachable return.
	* df-scan.c (df_ref_equal_p): Likewise.
	* dwarf2out.c (is_base_type): Likewise.
	(add_const_value_attribute): Likewise.
	* fixed-value.c (fixed_arithmetic): Likewise.
	* gimple-fold.c (gimple_fold_builtin_fputs): Likewise.
	* gimple-ssa-strength-reduction.c (stmt_cost): Likewise.
	* graphite-isl-ast-to-gimple.c
	(gcc_expression_from_isl_expr_op): Likewise.
	(gcc_expression_from_isl_expression): Likewise.
	* ipa-fnsummary.c (will_be_nonconstant_expr_predicate):
	Likewise.
	* lto-streamer-in.c (lto_input_mode_table): Likewise.

gcc/c-family/
	* c-opts.c (c_common_post_options): Remove unreachable return.
	* c-pragma.c (handle_pragma_target): Likewise.
	(handle_pragma_optimize): Likewise.

gcc/c/
	* c-typeck.c (c_tree_equal): Remove unreachable return.
	* c-parser.c (get_matching_symbol): Likewise.

libgomp/
	* oacc-plugin.c (GOMP_PLUGIN_acc_default_dim): Remove unreachable
	return.
2021-11-29 11:17:22 +01:00
liuhongt
11d0a2af33 Optimize _Float16 usage for non AVX512FP16.
1. No memory is needed to move HI/HFmode between GPR and SSE registers
under TARGET_SSE2 and above, pinsrw/pextrw are used for them w/o
AVX512FP16.
2. Use gen_sse2_pinsrph/gen_vec_setv4sf_0 to replace
ix86_expand_vector_set in extendhfsf2/truncsfhf2 so that redundant
initialization could be eliminated.

gcc/ChangeLog:

	PR target/102811
	* config/i386/i386.c (inline_secondary_memory_needed): HImode
	move between GPR and SSE registers is supported under
	TARGET_SSE2 and above.
	* config/i386/i386.md (extendhfsf2): Optimize expander.
	(truncsfhf2): Ditto.
	* config/i386/sse.md (sse2p4_1): Adjust attr for V8HFmode to
	align with V8HImode.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pr102811-2.c: New test.
	* gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c: Add new
	scan-assembler-times.
2021-11-29 17:46:00 +08:00
liuhongt
9519b694af Fix regression introduced by r12-5536.
There are several failures:
1. Unsupported instruction `pextrw` for "pextrw $0, %xmm31, 16(%rax)";
%vpextrw should be used in output templates.
2. ICE in get_attr_memory for movhi_internal since some alternatives
are marked as TYPE_SSELOG; use TYPE_SSELOG1 instead.

Also this patch fixes a typo and some latent bugs which are related to
moving HImode from/to sse register w/o TARGET_AVX512FP16.

gcc/ChangeLog:

	PR target/102811
	PR target/103463
	* config/i386/i386.c (ix86_secondary_reload): Without
	TARGET_SSE4_1, General register is needed to move HImode from
	sse register to memory.
	* config/i386/sse.md (*vec_extrachf): Use %vpextrw instead of
	pextrw in output templates.
	* config/i386/i386.md (movhi_internal): Ditto, also fix typo of
	MEM_P (operands[1]) and adjust mode/prefix/type attribute for
	alternatives related to sse register.
2021-11-29 17:45:57 +08:00
Richard Biener
85e91ad55a tree-optimization/103458 - avoid creating new loops in CD-DCE
When creating forwarders in CD-DCE we have to avoid creating loops
where we formerly did not consider those because of abnormal
predecessors.  At this point simply excuse us when there are any
abnormal predecessors.

2021-11-29  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/103458
	* tree-ssa-dce.c (make_forwarders_with_degenerate_phis): Do not
	create forwarders for blocks with abnormal predecessors.

	* gcc.dg/torture/pr103458.c: New testcase.
2021-11-29 09:20:27 +01:00
Richard Biener
5e5f880d04 Restore can_be_invalidated_p semantics to before refactoring
This restores the semantics of can_be_invalidated_p to the original
semantics of the function this was split out from tree-ssa-uninit.c.
The current semantics only ever look at the first predicate which
cannot be correct.

2021-11-26  Richard Biener  <rguenther@suse.de>

	* gimple-predicate-analysis.cc (can_be_invalidated_p):
	Restore semantics to the one before the split from
	tree-ssa-uninit.c.
2021-11-29 09:20:09 +01:00
Rasmus Villemoes
3e15df63ca libgcc: remove crt{begin,end}.o from powerpc-wrs-vxworks target
Since commit 78e49fb1bc (Introduce vxworks specific crtstuff support),
the generic crtbegin.o/crtend.o have been unnecessary to build. So
remove them from extra_parts.

This is effectively a revert of commit 9a5b8df70 (libgcc: add
crt{begin,end} for powerpc-wrs-vxworks target).

libgcc/
	* config.host (powerpc-wrs-vxworks): Do not add crtbegin.o and
	crtend.o to extra_parts.
2021-11-29 08:41:33 +01:00
Kewen Lin
300dbea126 rs6000/test: Add emulated gather test case
As verified, the emulated gather capability of vectorizer
(r12-2733) can help to speed up SPEC2017 510.parest_r on
Power8/9/10 by 5% ~ 9% with option sets Ofast unroll and
Ofast lto.

This patch is to add a test case similar to the one in i386
to add testing coverage for 510.parest_r hotspots.

btw, different from the one in i386, this uses unsigned int
as INDEXTYPE since the unpack support for unsigned int
(r12-3134) also matters for the hotspots vectorization.

gcc/testsuite/ChangeLog:

	* gcc.target/powerpc/vect-gather-1.c: New test.
2021-11-28 19:59:59 -06:00
Andrew Pinski
68332ab7ec Fix PR 19089: Environment variable TMP may yield gcc: abort
Even though I cannot reproduce the ICE any more, this is still
a bug. We check already to see if we can access the directory
but never check to see if the path is actually a directory.

This adds the check and now we reject the file as not usable
as a tmp directory.
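
A sketch of the combined check on a POSIX system.  The helper name is
hypothetical and this is not the libiberty try_dir code; it only shows an
access test followed by a directory test.

```cpp
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: returns true only when PATH is accessible and is
// actually a directory.
bool usable_tmp_dir(const char *path)
{
  if (path == nullptr)
    return false;
  // The access check alone can also succeed for a non-directory, for
  // example an executable regular file.
  if (access(path, R_OK | W_OK | X_OK) != 0)
    return false;
  struct stat st;
  if (stat(path, &st) != 0)
    return false;
  return S_ISDIR(st.st_mode);
}
```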

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

libiberty/ChangeLog:

	* make-temp-file.c (try_dir): Check to see if the dir
	is actually a directory.
2021-11-29 00:42:45 +00:00
GCC Administrator
2f0dd172bc Daily bump. 2021-11-29 00:16:16 +00:00
Andrew Pinski
32377c1019 Fix PR 62157: distclean in libsanitizer not working
So what is happening is DIST_SUBDIRS contains the conditional
directories which is wrong, so we need to force DIST_SUBDIRS
to be the same as SUBDIRS as recommended by the automake manual.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
Also now make distclean works inside libsanitizer directory.

libsanitizer/ChangeLog:

	PR sanitizer/62157
	* Makefile.am: Force DIST_SUBDIRS to be SUBDIRS.
	* Makefile.in: Regenerate.
	* asan/Makefile.in: Likewise.
	* hwasan/Makefile.in: Likewise.
	* interception/Makefile.in: Likewise.
	* libbacktrace/Makefile.in: Likewise.
	* lsan/Makefile.in: Likewise.
	* sanitizer_common/Makefile.in: Likewise.
	* tsan/Makefile.in: Likewise.
	* ubsan/Makefile.in: Likewise.
2021-11-28 22:40:36 +00:00
Jan Hubicka
2899d49e37 Compare guessed and feedback frequencies during profile feedback stream-in
This patch adds simple code to dump and compare frequencies of basic blocks
read from the profile feedback and frequencies guessed statically.
It dumps basic blocks in the order of decreasing frequencies from feedback
along with guessed frequencies and histograms.

It makes it possible to spot basic blocks in hot regions that are considered
cold by guessed profile or vice versa.

I am trying to figure out how realistic our profile estimate is compared to
the read one on exchange2 (looking again into PR98782).  There IRA now places spills
into hot regions of code while with the older (and worse) profile it did not.
The catch is that the function is very large and has 9 nested loops, so it is hard
to figure out how to improve the profile estimate and/or IRA.

gcc/ChangeLog:

2021-11-28  Jan Hubicka  <hubicka@ucw.cz>

	* profile.c: Include sreal.h
	(struct bb_stats): New.
	(cmp_stats): New function.
	(compute_branch_probabilities): Output bb stats.
2021-11-28 19:42:45 +01:00
Jan Hubicka
d1471457fc Improve -fprofile-report
Profile-report was never properly updated after the switch to the new profile
representation.  This patch fixes the way profile mismatches are calculated:
we used to collect count and freq mismatches separately, while now we have
only counts & probabilities.  So we verify
 - in count: that the total count of incoming edges is close to the actual count of
   the BB
 - out prob: that the total sum of outgoing edge probabilities is close
   to 1 (except for BBs containing noreturn calls or EH).

Moreover I added dumping of absolute data, which is useful for plotting: with
Martin Liska we plan to set up regular testing so we keep the optimizers' profile
updates a bit under control.

Finally I added both static and dynamic stats about mismatches - the static one is
simply the number of inconsistencies in the cfg while the dynamic one is scaled by the
profile - I think in order to keep an eye on optimizers the first number is quite
relevant, while when tracking why code quality regressed the second number
matters more.

2021-11-28  Jan Hubicka  <hubicka@ucw.cz>

	* cfghooks.c: Include sreal.h, profile.h.
	(profile_record_check_consistency): Fix checking of count consistency;
	record also dynamic mismatches.
	* cfgrtl.c (rtl_account_profile_record): Similarly.
	* tree-cfg.c (gimple_account_profile_record): Likewise.
	* cfghooks.h (struct profile_record): Remove num_mismatched_freq_in,
	num_mismatched_freq_out, turn time to double, add
	dyn_mismatched_prob_out, dyn_mismatched_count_in,
	num_mismatched_prob_out; remove num_mismatched_count_out.
	* passes.c (account_profile_1): New function.
	(account_profile_in_list): New function.
	(pass_manager::dump_profile_report): Rewrite.
	(execute_one_ipa_transform_pass): Check profile consistency after
	running all passes.
	(execute_all_ipa_transforms): Remove cfun test; record all transform
	methods.
	(execute_one_pass): Fix collecting of profile stats.
2021-11-28 19:25:33 +01:00
Jakub Jelinek
7393fa8b1d libstdc++: Implement std::byteswap for C++23
This patch attempts to implement P1272R4 (except for the std::bit_cast
changes in there which seem quite unrelated to this and will need to be
fixed on the compiler side).
While at least for GCC __builtin_bswap{16,32,64,128} should work fine
in constant expressions, I wonder about other compilers, so I'm using
a fallback implementation for constexpr evaluation always.
If you think that is unnecessary, I can drop the
__cpp_if_consteval >= 202106L &&
if !consteval
  {
and
  }
and reformat.
The fallback implementation is an attempt to make it work even for integral
types that don't have number of bytes divisible by 2 or when __CHAR_BIT__
is e.g. 16.
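
A sketch of what such a fallback can look like for unsigned integer
types.  The function name is hypothetical and this is not the libstdc++
implementation; it only shows a byte-at-a-time loop that does not depend
on the byte count being a power of two or on CHAR_BIT being 8.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical fallback: reverses the bytes of an unsigned integer one
// CHAR_BIT-wide chunk at a time.
template <typename T>
constexpr T byteswap_fallback(T value)
{
  static_assert(std::is_unsigned<T>::value, "sketch handles unsigned types");
  T result = 0;
  for (std::size_t i = 0; i < sizeof(T); ++i)
    {
      // Append the current lowest byte of VALUE to the low end of RESULT.
      result = static_cast<T>((result << CHAR_BIT) | (value & UCHAR_MAX));
      value = static_cast<T>(value >> CHAR_BIT);
    }
  return result;
}
```

Because the function is constexpr, it can serve during constant
evaluation while a builtin handles the run-time path.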

2021-11-28  Jakub Jelinek  <jakub@redhat.com>

	* include/std/bit (__cpp_lib_byteswap, byteswap): Define.
	* include/std/version (__cpp_lib_byteswap): Define.
	* testsuite/26_numerics/bit/bit.byteswap/byteswap.cc: New test.
	* testsuite/26_numerics/bit/bit.byteswap/version.cc: New test.
2021-11-28 16:33:33 +01:00
Martin Liska
7a66c4909f d: fix thinko in optimize attr parsing
gcc/d/ChangeLog:

	* d-attribs.cc (parse_optimize_options): Fix thinko.
2021-11-28 09:39:40 +01:00
GCC Administrator
d62c8c747c Daily bump. 2021-11-28 00:16:20 +00:00
John David Anglin
14dd0921fe Fix typo in t-dimode
2021-11-27  John David Anglin  <danglin@gcc.gnu.org>

libgcc/ChangeLog:

	* config/pa/t-dimode (lib2difuncs): Fix typo.
2021-11-27 21:47:47 +00:00
Petter Tomner
1e53408452 jit: Change printf specifiers for size_t to %zu
Change four occurrences of %ld specifier for size_t to %zu for clean 32bit builds.

Signed-off-by
2021-11-27	Petter Tomner	<tomner@kth.se>

gcc/jit/
	* libgccjit.c: %ld -> %zu
2021-11-27 16:45:41 +01:00
Jakub Jelinek
f7e4f57f1c x86: Fix up x86_{,64_}sh{l,r}d patterns [PR103431]
The following testcase is miscompiled because the x86_{,64_}sh{l,r}d
patterns don't properly describe what the instructions do.  One thing
is left out, in particular that there is an initial count &= 63 for
sh{l,r}dq and an initial count &= 31 for sh{l,r}d{l,w}.  Another thing is
not described properly, in particular the behavior when count (after the
masking) is 0.  The pattern says it is e.g.
res = (op0 << op2) | (op1 >> (64 - op2))
but that triggers UB on op1 >> 64.  For op2 0 we actually want
res = (op0 << op2) | 0
When constants are propagated to these patterns during RTL optimizations,
both such problems trigger wrong-code issues.
This patch represents the patterns as e.g.
res = (op0 << (op2 & 63)) | (unsigned long long) ((uint128_t) op1 >> (64 - (op2 & 63)))
so there is both the initial masking and the op2 == 0 behavior, which results in
zero being ored.
The patch introduces alternate patterns for constant op2 where
simplify-rtx.c will fold those expressions into simple numbers,
and define_insn_and_split pre-reload splitter for how the patterns
looked before into the new form, so that it can pattern match during
combine even computations that assumed the shift amount will be in
the range of 1 .. bitsize-1.
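
The intended 64-bit semantics can be modelled in plain C++ as a sketch.
The function name is hypothetical, and the zero count is handled with an
explicit test instead of the 128-bit widening used in the formula above;
the two forms give the same result.

```cpp
#include <cstdint>

// Models the 64-bit double shift left: the count is masked to 6 bits
// first, and a masked count of 0 leaves the first operand unchanged.
constexpr std::uint64_t shld64(std::uint64_t op0, std::uint64_t op1,
                               unsigned count)
{
  unsigned c = count & 63;
  // Handle c == 0 separately: op1 >> 64 would be undefined behavior.
  return c == 0 ? op0 : (op0 << c) | (op1 >> (64 - c));
}
```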

2021-11-27  Jakub Jelinek  <jakub@redhat.com>

	PR middle-end/103431
	* config/i386/i386.md (x86_64_shld, x86_shld, x86_64_shrd, x86_shrd):
	Change insn pattern to accurately describe the instructions.
	(*x86_64_shld_1, *x86_shld_1, *x86_64_shrd_1, *x86_shrd_1): New
	define_insn patterns.
	(*x86_64_shld_2, *x86_shld_2, *x86_64_shrd_2, *x86_shrd_2): New
	define_insn_and_split patterns.
	(*ashl<dwi>3_doubleword_mask, *ashl<dwi>3_doubleword_mask_1,
	*<insn><dwi>3_doubleword_mask, *<insn><dwi>3_doubleword_mask_1,
	ix86_rotl<dwi>3_doubleword, ix86_rotr<dwi>3_doubleword): Adjust
	splitters for x86_{,64_}sh{l,r}d pattern changes.

	* gcc.dg/pr103431.c: New test.
2021-11-27 13:02:06 +01:00
Jakub Jelinek
567d5f3d62 bswap: Fix UB in find_bswap_or_nop_finalize [PR103435]
On gcc.c-torture/execute/pr103376.c in the following code we trigger UB
in the compiler.  n->range is 8 because it is 64-bit load and rsize is 0
because it is a bswap sequence with load and known to be 0:
  /* Find real size of result (highest non-zero byte).  */
  if (n->base_addr)
    for (tmpn = n->n, rsize = 0; tmpn; tmpn >>= BITS_PER_MARKER, rsize++);
  else
    rsize = n->range;
The shifts then shift uint64_t by 64 bits.  For this case mask is 0
and we want both *cmpxchg and *cmpnop as 0, the operation can be done as
both nop and bswap and callers will prefer nop.

2021-11-27  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/103435
	* gimple-ssa-store-merging.c (find_bswap_or_nop_finalize): Avoid UB if
	n->range - rsize == 8, just clear both *cmpnop and *cmpxchg in that
	case.
2021-11-27 13:00:55 +01:00
Roger Sayle
d9c8a0238f [Committed] Fix new ivopts-[89].c test cases for -m32.
2021-11-27  Roger Sayle  <roger@nextmovesoftware.com>

gcc/testsuite/ChangeLog
	* gcc.dg/tree-ssa/ivopts-8.c: Fix new test case for -m32.
	* gcc.dg/tree-ssa/ivopts-9.c: Likewise.
2021-11-27 10:13:31 +00:00
GCC Administrator
f4ed2e3ae7 Daily bump. 2021-11-27 00:16:19 +00:00
Martin Jambor
9e2e47391b ipa: Fix CFG fix-up in IPA-CP transform phase (PR 103441)
I forgot that IPA passes before ipa-inline must not return
TODO_cleanup_cfg from their transformation function because ordinary
CFG cleanup does not remove call graph edges associated with removed
call statements but must use
delete_unreachable_blocks_update_callgraph instead.  This patch fixes
that error.

gcc/ChangeLog:

2021-11-26  Martin Jambor  <mjambor@suse.cz>

	PR ipa/103441
	* ipa-prop.c (ipcp_transform_function): Call
	delete_unreachable_blocks_update_callgraph instead of returning
	TODO_cleanup_cfg.
2021-11-27 01:01:46 +01:00
Jonathan Wakely
52b769437a libstdc++: Fix test that fails in C++20 mode
This test was written to verify that the LWG 3265 changes work. But
those changes were superseded by LWG 3435, and the test is now incorrect
according to the current draft. The assignment operator is now
constrained to also require convertibility, which makes the test fail.

Change the Iter type to be convertible from int*, but make it throw an
exception if that conversion is used. Change the test from compile-only
to run, so we verify that the exception isn't thrown.

libstdc++-v3/ChangeLog:

	* testsuite/24_iterators/move_iterator/dr3265.cc: Fix test to
	account for LWG 3435 resolution.
2021-11-26 22:56:51 +00:00
Jonathan Wakely
33adfd0d42 libstdc++: Fix trivial relocation for constexpr std::vector
When implementing constexpr std::vector I added a check for constant
evaluation in vector::_S_use_relocate(), so that we would not try to relocate
trivial objects by using memmove. But I put it in the constexpr function
that decides whether to relocate or not, and calls to that function are
always constant evaluated. This had the effect of disabling relocation
entirely, even in non-constexpr vectors.

This removes the check in _S_use_relocate() and modifies the actual
relocation algorithm, __relocate_a_1, to use the non-trivial
implementation instead of memmove when called during constant
evaluation.

libstdc++-v3/ChangeLog:

	* include/bits/stl_uninitialized.h (__relocate_a_1): Do not use
	memmove during constant evaluation.
	* include/bits/stl_vector.h (vector::_S_use_relocate()): Do not
	check is_constant_evaluated in always-constexpr function.
2021-11-26 22:28:48 +00:00
Jonathan Wakely
76c6be48b7 libstdc++: Remove workaround for FE bug in std::tuple [PR96592]
The FE bug was fixed, so we don't need this workaround now.

libstdc++-v3/ChangeLog:

	PR libstdc++/96592
	* include/std/tuple (tuple::is_constructible): Remove.
2021-11-26 22:26:08 +00:00
Harald Anlauf
4d540c7a4a Fortran: improve check of arguments to the RESHAPE intrinsic
gcc/fortran/ChangeLog:

	PR fortran/103411
	* check.c (gfc_check_reshape): Improve check of size of source
	array for the RESHAPE intrinsic against the given shape when pad
	is not given, and shape is a parameter.  Try other simplifications
	of shape.

gcc/testsuite/ChangeLog:

	PR fortran/103411
	* gfortran.dg/pr68153.f90: Adjust test to improved check.
	* gfortran.dg/reshape_7.f90: Likewise.
	* gfortran.dg/reshape_9.f90: New test.
2021-11-26 21:00:35 +01:00
Iain Sandoe
caa04517e6 libitm: Fix bootstrap for targets without HAVE_ELF_STYLE_WEAKREF.
Recent improvements to null address warnings notice that for
targets that do not support HAVE_ELF_STYLE_WEAKREF the dummy stub
implementation of __cxa_get_globals() means that the address can
never be null.

Fixed by removing the test for such targets.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

libitm/ChangeLog:

	* eh_cpp.cc (GTM::gtm_thread::init_cpp_exceptions): If the
	target does not support HAVE_ELF_STYLE_WEAKREF then do not
	try to test the __cxa_get_globals against NULL.
2021-11-26 19:40:27 +00:00
Siddhesh Poyarekar
4a2007594c tree-object-size: Abstract object_sizes array
Put all accesses to object_sizes behind functions so that we can add
dynamic capability more easily.
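
As an illustration of the approach, the sketch below hides a size table
behind a few accessors.  It is not the code added to tree-object-size.c:
the names size_table, unknown_size, sizes_grow, sizes_unknown_p, sizes_get
and sizes_set_force, as well as the choice of sentinel, are assumptions
made for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the table of computed sizes.
using size_table = std::vector<std::uint64_t>;

// Assumed sentinel meaning "size not known".
constexpr std::uint64_t unknown_size = ~std::uint64_t{0};

// Make sure the table has a slot for INDEX; new slots start as unknown.
inline void sizes_grow(size_table& table, std::size_t index) {
    if (table.size() <= index)
        table.resize(index + 1, unknown_size);
}

// True if no size has been recorded for INDEX.
inline bool sizes_unknown_p(const size_table& table, std::size_t index) {
    return index >= table.size() || table[index] == unknown_size;
}

// Read the recorded size, or the sentinel if there is none.
inline std::uint64_t sizes_get(const size_table& table, std::size_t index) {
    return index < table.size() ? table[index] : unknown_size;
}

// Unconditionally record VALUE for INDEX, growing the table if needed.
inline void sizes_set_force(size_table& table, std::size_t index,
                            std::uint64_t value) {
    sizes_grow(table, index);
    table[index] = value;
}
```

Because callers never index the table directly, the representation of an
entry can change later without touching every use.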

gcc/ChangeLog:

	* tree-object-size.c (object_sizes_grow, object_sizes_release,
	object_sizes_unknown_p, object_sizes_get, object_size_set_force,
	object_sizes_set): New functions.
	(addr_object_size, compute_builtin_object_size,
	expr_object_size, call_object_size, unknown_object_size,
	merge_object_sizes, plus_stmt_object_size,
	cond_expr_object_size, collect_object_sizes_for,
	check_for_plus_in_loops_1, init_object_sizes,
	fini_object_sizes): Adjust.

Signed-off-by: Siddhesh Poyarekar <siddhesh@gotplt.org>
2021-11-26 23:33:59 +05:30
Siddhesh Poyarekar
35c8bbe96b tree-object-size: Replace magic numbers with enums
A simple cleanup to allow inserting dynamic size code more easily.

gcc/ChangeLog:

	* tree-object-size.c: New enum.
	(object_sizes, computed, addr_object_size,
	compute_builtin_object_size, expr_object_size, call_object_size,
	merge_object_sizes, plus_stmt_object_size,
	collect_object_sizes_for, init_object_sizes, fini_object_sizes,
	object_sizes_execute): Replace magic numbers with enums.

Signed-off-by: Siddhesh Poyarekar <siddhesh@gotplt.org>
2021-11-26 23:33:56 +05:30
Roger Sayle
b41be002ed ivopts: Improve code generated for very simple loops.
This patch tidies up the code that GCC generates for simple loops,
by selecting/generating a simpler loop bound expression in ivopts.
The original motivation came from looking at the following loop (from
gcc.target/i386/pr90178.c)

int *find_ptr (int* mem, int sz, int val)
{
  for (int i = 0; i < sz; i++)
    if (mem[i] == val)
      return &mem[i];
  return 0;
}

which GCC currently compiles to:

find_ptr:
        movq    %rdi, %rax
        testl   %esi, %esi
        jle     .L4
        leal    -1(%rsi), %ecx
        leaq    4(%rdi,%rcx,4), %rcx
        jmp     .L3
.L7:    addq    $4, %rax
        cmpq    %rcx, %rax
        je      .L4
.L3:    cmpl    %edx, (%rax)
        jne     .L7
        ret
.L4:    xorl    %eax, %eax
        ret

Notice the relatively complex leal/leaq instructions, which result
from ivopts using the following expression for the loop bound:
inv_expr 2:     ((unsigned long) ((unsigned int) sz_8(D) + 4294967295)
		* 4 + (unsigned long) mem_9(D)) + 4

which results from NITERS being (unsigned int) sz_8(D) + 4294967295,
i.e. (sz - 1), and the logic in cand_value_at determining the bound
as BASE + NITERS*STEP at the start of the final iteration and as
BASE + NITERS*STEP + STEP at the end of the final iteration.

Ideally, we'd like the middle-end optimizers to simplify
BASE + NITERS*STEP + STEP as BASE + (NITERS+1)*STEP, especially
when NITERS already has the form BOUND-1, but with type conversions
and possible overflow to worry about, the above "inv_expr 2" is the
best that can be done by fold (without additional context information).
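
The overflow concern can be made concrete with a small sketch.  The function
names bound_via_niters and bound_via_sz are hypothetical, and a 64-bit
unsigned long (as on x86_64) is assumed; the arithmetic mirrors the two
forms of the bound discussed above rather than any code in ivopts.

```cpp
#include <cstdint>

// BASE + NITERS*STEP + STEP, with NITERS computed as sz - 1 in 32-bit
// unsigned arithmetic, matching "(unsigned int) sz + 4294967295".
inline std::uint64_t bound_via_niters(std::uint64_t base, int sz,
                                      std::uint64_t step) {
    std::uint32_t niters = static_cast<std::uint32_t>(sz) + 4294967295u;
    return static_cast<std::uint64_t>(niters) * step + base + step;
}

// BASE + (NITERS+1)*STEP, using the original bound sz for NITERS+1.
inline std::uint64_t bound_via_sz(std::uint64_t base, int sz,
                                  std::uint64_t step) {
    return static_cast<std::uint64_t>(sz) * step + base;
}
```

For sz > 0 the two functions agree, which is the case the loop actually
reaches.  For sz == 0 the 32-bit NITERS wraps to 4294967295 and the two
results differ, so the simplification is only valid once it is known that
NITERS+1 does not overflow.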

This patch improves ivopts' cand_value_at: instead of passing just
the tree expression for NITERS, it passes the data structure that
explains how that expression was derived.  This allows us to peek
under the surface to check that NITERS+1 doesn't overflow, and in
this patch to use the SSA_NAME already holding the required value.

In the motivating loop above, inv_expr 2 now becomes:
(unsigned long) sz_8(D) * 4 + (unsigned long) mem_9(D)

And as a result, on x86_64 we now generate:

find_ptr:
        movq    %rdi, %rax
        testl   %esi, %esi
        jle     .L4
        movslq  %esi, %rsi
        leaq    (%rdi,%rsi,4), %rcx
        jmp     .L3
.L7:    addq    $4, %rax
        cmpq    %rcx, %rax
        je      .L4
.L3:    cmpl    %edx, (%rax)
        jne     .L7
        ret
.L4:    xorl    %eax, %eax
        ret

This improvement required one minor tweak to GCC's testsuite for
gcc.dg/wrapped-binop-simplify.c, where we again generate better
code, and therefore no longer find as many optimization opportunities
in later passes (vrp2).

Previously:

void v1 (unsigned long *in, unsigned long *out, unsigned int n)
{
  int i;
  for (i = 0; i < n; i++) {
    out[i] = in[i];
  }
}

on x86_64 generated:
v1:	testl   %edx, %edx
        je      .L1
        movl    %edx, %edx
        xorl    %eax, %eax
.L3:	movq    (%rdi,%rax,8), %rcx
        movq    %rcx, (%rsi,%rax,8)
        addq    $1, %rax
        cmpq    %rax, %rdx
        jne     .L3
.L1:	ret

and now instead generates:
v1:	testl   %edx, %edx
        je      .L1
        movl    %edx, %edx
        xorl    %eax, %eax
        leaq    0(,%rdx,8), %rcx
.L3:	movq    (%rdi,%rax), %rdx
        movq    %rdx, (%rsi,%rax)
        addq    $8, %rax
        cmpq    %rax, %rcx
        jne     .L3
.L1:	ret

2021-11-26  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* tree-ssa-loop-ivopts.c (cand_value_at): Take a class
	tree_niter_desc* argument instead of just a tree for NITER.
	If we require the iv candidate value at the end of the final
	loop iteration, try using the original loop bound as the
	NITER for sufficiently simple loops.
	(may_eliminate_iv): Update (only) call to cand_value_at.

gcc/testsuite/ChangeLog
	* gcc.dg/wrapped-binop-simplify.c: Update expected test result.
	* gcc.dg/tree-ssa/ivopts-5.c: New test case.
	* gcc.dg/tree-ssa/ivopts-6.c: New test case.
	* gcc.dg/tree-ssa/ivopts-7.c: New test case.
	* gcc.dg/tree-ssa/ivopts-8.c: New test case.
	* gcc.dg/tree-ssa/ivopts-9.c: New test case.
2021-11-26 17:22:10 +00:00
Jonathan Wakely
665f726b8a libstdc++: Ensure dg-add-options comes after dg-options
This is what the docs say is required.

libstdc++-v3/ChangeLog:

	* testsuite/29_atomics/atomic_float/1.cc: Reorder directives.
2021-11-26 15:11:58 +00:00
Jonathan Wakely
0a12bd92d1 libstdc++: Fix dg-do directive for tests supposed to be run
libstdc++-v3/ChangeLog:

	* testsuite/23_containers/unordered_map/modifiers/move_assign.cc:
	Change dg-do compile to run.
	* testsuite/27_io/basic_istream/extractors_character/wchar_t/lwg2499.cc:
	Likewise.
2021-11-26 15:11:58 +00:00
Jonathan Wakely
1ecc9ba578 libstdc++: Remove redundant xfail selectors in dg-do compile tests
An 'xfail' selector means the test is expected to fail at runtime, so it
is ignored for a compile-only test. The way to mark a compile-only test as
failing is with dg-error (which these already do).

libstdc++-v3/ChangeLog:

	* testsuite/21_strings/basic_string_view/element_access/char/back_constexpr_neg.cc:
	Remove xfail selector.
	* testsuite/21_strings/basic_string_view/element_access/char/constexpr_neg.cc:
	Likewise.
	* testsuite/21_strings/basic_string_view/element_access/char/front_constexpr_neg.cc:
	Likewise.
	* testsuite/21_strings/basic_string_view/element_access/wchar_t/back_constexpr_neg.cc:
	Likewise.
	* testsuite/21_strings/basic_string_view/element_access/wchar_t/constexpr_neg.cc:
	Likewise.
	* testsuite/21_strings/basic_string_view/element_access/wchar_t/front_constexpr_neg.cc:
	Likewise.
	* testsuite/23_containers/span/101411.cc: Likewise.
	* testsuite/25_algorithms/copy/debug/constexpr_neg.cc: Likewise.
	* testsuite/25_algorithms/copy_backward/debug/constexpr_neg.cc:
	Likewise.
	* testsuite/25_algorithms/equal/constexpr_neg.cc: Likewise.
	* testsuite/25_algorithms/equal/debug/constexpr_neg.cc: Likewise.
	* testsuite/25_algorithms/lower_bound/debug/constexpr_partitioned_neg.cc:
	Likewise.
	* testsuite/25_algorithms/lower_bound/debug/constexpr_partitioned_pred_neg.cc:
	Likewise.
	* testsuite/25_algorithms/lower_bound/debug/constexpr_valid_range_neg.cc:
	Likewise.
	* testsuite/25_algorithms/upper_bound/debug/constexpr_partitioned_neg.cc:
	Likewise.
	* testsuite/25_algorithms/upper_bound/debug/constexpr_partitioned_pred_neg.cc:
	Likewise.
	* testsuite/25_algorithms/upper_bound/debug/constexpr_valid_range_neg.cc:
	Likewise.
2021-11-26 15:11:58 +00:00
Martin Liska
f1ec39c86c d: fix ASAN in option processing
Fixes:

==129444==ERROR: AddressSanitizer: global-buffer-overflow on address 0x00000666ca5c at pc 0x000000ef094b bp 0x7fffffff8180 sp 0x7fffffff8178
READ of size 4 at 0x00000666ca5c thread T0
    #0 0xef094a in parse_optimize_options ../../gcc/d/d-attribs.cc:855
    #1 0xef0d36 in d_handle_optimize_attribute ../../gcc/d/d-attribs.cc:916
    #2 0xef107e in d_handle_optimize_attribute ../../gcc/d/d-attribs.cc:887
    #3 0xff85b1 in decl_attributes(tree_node**, tree_node*, int, tree_node*) ../../gcc/attribs.c:829
    #4 0xef2a91 in apply_user_attributes(Dsymbol*, tree_node*) ../../gcc/d/d-attribs.cc:427
    #5 0xf7b7f3 in get_symbol_decl(Declaration*) ../../gcc/d/decl.cc:1346
    #6 0xf87bc7 in get_symbol_decl(Declaration*) ../../gcc/d/decl.cc:967
    #7 0xf87bc7 in DeclVisitor::visit(FuncDeclaration*) ../../gcc/d/decl.cc:808
    #8 0xf83db5 in DeclVisitor::build_dsymbol(Dsymbol*) ../../gcc/d/decl.cc:146

for the following test-case: gcc/testsuite/gdc.dg/attr_optimize1.d.

gcc/d/ChangeLog:

	* d-attribs.cc (parse_optimize_options): Check index before
	accessing cl_options.
2021-11-26 14:55:12 +01:00