Commit Graph

188124 Commits

Author SHA1 Message Date
Iain Buclaw
53a4def0dc d: Don't include terminating null pointer in string expression conversion (PR102185)
This gets re-added by the ExprVisitor when lowering StringExp back into a
STRING_CST during the code generator pass.

	PR d/102185

gcc/d/ChangeLog:

	* d-builtins.cc (d_eval_constant_expression): Don't include
	terminating null pointer in string expression conversion.

gcc/testsuite/ChangeLog:

	* gdc.dg/pr102185.d: New test.
2021-09-12 17:36:19 +02:00
Roger Sayle
b195fae7c1 Also preserve SUBREG_PROMOTED_VAR_P in expr.c's convert_move.
This patch catches another place in the middle-end where it's possible
to preserve the SUBREG_PROMOTED_VAR_P annotation on a subreg to the
benefit of later RTL optimizations.  This adds the same logic to
expr.c's convert_move as recently added to convert_modes.

On nvptx-none, the simple test program:

short foo (char c) { return c; }

currently generates three instructions:

mov.u32	%r23, %ar0;
cvt.u16.u32     %r24, %r23;
cvt.s32.s16     %value, %r24;

with this patch, we now generate just one:

mov.u32 %value, %ar0;

This patch should look familiar; it's almost identical to the recent patch
https://gcc.gnu.org/pipermail/gcc-patches/2021-August/578331.html but with
the fix https://gcc.gnu.org/pipermail/gcc-patches/2021-August/578519.html

2021-09-12  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* expr.c (convert_move): Preserve SUBREG_PROMOTED_VAR_P when
	creating a (wider) partial subreg from a SUBREG_PROMOTED_VAR_P
	subreg.
2021-09-12 15:18:57 +01:00
GCC Administrator
d71126eeea Daily bump. 2021-09-12 00:16:18 +00:00
Ian Lance Taylor
79513dc0b2 compiler: don't pad zero-sized trailing field in results struct
Nothing can take the address of that field anyhow.

Fixes PR go/101994

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/343873
2021-09-11 14:20:19 -07:00
Aldy Hernandez
5485bbebb3 Refactor jump_thread_path_registry.
In an attempt to refactor thread_through_all_blocks(), I've realized
that there is a mess of code dealing with coexisting forward and
backward thread types.  However, this is an impossible scenario, as
the registry contains either forward/old-style threads, or backward
threads (EDGE_FSM_THREADs), never both.

The fact that both types of threads cannot coexist simplifies the
code considerably.  For that matter, it splits things up nicely
because there are some common bits that can go into a base class, and
some differing code that can go into derived classes.

Dividing things in this way makes it very obvious which parts belong in
the old-style copier and which parts belong to the generic copier.
Doing all this provided some nice cleanups, as well as fixing a latent
bug in adjust_paths_after_duplication.

The diff is somewhat hard to read, so perhaps looking at the final
output would be easier.

A general overview of what this patch achieves can be seen by just
looking at this simplified class layout:

// Abstract class for the jump thread registry.

class jt_path_registry
{
public:
  jt_path_registry ();
  virtual ~jt_path_registry ();
  bool register_jump_thread (vec<jump_thread_edge *> *);
  bool thread_through_all_blocks (bool peel_loop_headers);
  jump_thread_edge *allocate_thread_edge (edge e, jump_thread_edge_type t);
  vec<jump_thread_edge *> *allocate_thread_path ();
protected:
  vec<vec<jump_thread_edge *> *> m_paths;
  unsigned long m_num_threaded_edges;
private:
  virtual bool update_cfg (bool peel_loop_headers) = 0;
};

// Forward threader path registry using a custom BB copier.

class fwd_jt_path_registry : public jt_path_registry
{
public:
  fwd_jt_path_registry ();
  ~fwd_jt_path_registry ();
  void remove_jump_threads_including (edge);
private:
  bool update_cfg (bool peel_loop_headers) override;
  void mark_threaded_blocks (bitmap threaded_blocks);
  bool thread_block_1 (basic_block, bool noloop_only, bool joiners);
  bool thread_block (basic_block, bool noloop_only);
  bool thread_through_loop_header (class loop *loop,
                                   bool may_peel_loop_headers);
  class redirection_data *lookup_redirection_data (edge e, enum insert_option);
  hash_table<struct removed_edges> *m_removed_edges;
  hash_table<redirection_data> *m_redirection_data;
};

// Backward threader path registry using a generic BB copier.

class back_jt_path_registry : public jt_path_registry
{
private:
  bool update_cfg (bool peel_loop_headers) override;
  void adjust_paths_after_duplication (unsigned curr_path_num);
  bool duplicate_thread_path (edge entry, edge exit, basic_block *region,
                              unsigned n_region, unsigned current_path_no);
  bool rewire_first_differing_edge (unsigned path_num, unsigned edge_num);
};

That is, the forward and backward bits have been completely split,
while deriving from a base class for the common functionality.
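
As a rough illustration of that split, here is a minimal sketch, not the
actual GCC code.  The names registry_base, fwd_registry and back_registry
are hypothetical stand-ins, and paths are reduced to strings.  The base
class owns the shared state and the public thread_through_all_blocks ()
driver, while each derived class supplies only its own private update_cfg ().

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for jt_path_registry: owns the shared state and
// the public entry point, and defers the CFG update to derived classes.
class registry_base
{
public:
  virtual ~registry_base () = default;

  void register_path (const std::string &path) { m_paths.push_back (path); }

  // Common driver: nothing to do without registered paths, otherwise
  // hand off to the threader-specific update and forget the paths.
  bool thread_through_all_blocks ()
  {
    if (m_paths.empty ())
      return false;
    bool changed = update_cfg ();
    m_paths.clear ();
    return changed;
  }

  const std::vector<std::string> &log () const { return m_log; }

protected:
  std::vector<std::string> m_paths;
  std::vector<std::string> m_log;

private:
  virtual bool update_cfg () = 0;
};

// Stand-in for the forward registry (custom BB copier in the real code).
class fwd_registry : public registry_base
{
private:
  bool update_cfg () override
  {
    for (const std::string &p : m_paths)
      m_log.push_back ("custom copier: " + p);
    return true;
  }
};

// Stand-in for the backward registry (generic BB copier in the real code).
class back_registry : public registry_base
{
private:
  bool update_cfg () override
  {
    for (const std::string &p : m_paths)
      m_log.push_back ("generic copier: " + p);
    return true;
  }
};
```

Callers only ever see the base-class entry point; which copier runs is
decided by the type of registry they instantiated.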

Most everything is mechanical, but there are a few gotchas:

a) back_jt_path_registry::update_cfg(), which contains the backward
threading specific bits, is rather simple, since most of the code in
the original thread_through_all_blocks() only applied to the forward
threader: removed edges, mark_threaded_blocks,
thread_through_loop_header, the copy tables (*).

(*) The back threader has its own copy tables in
duplicate_thread_path.

b) In some cases, adjust_paths_after_duplication() was commoning out
so many blocks that it was removing the initial EDGE_FSM_THREAD
marker.  I've fixed this.

c) AFAICT, when run from the forward threader,
thread_through_all_blocks() attempts to remove threads starting with
an edge already seen, but it would never see anything because the loop
doing the checking only has a visited_starting_edges.contains(), and
no corresponding visited_starting_edges.add().  The add() method in
thread_through_all_blocks belongs to the backward threading bits, and
as I've explained, both types cannot coexist.  I've removed the checks
in the forward bits since they don't appear to do anything.  If this
was an oversight, and we want to avoid threading already seen edges in
the forward threader, I can move this functionality to the base class.
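
To make point (c) concrete, here is a small sketch of the pattern, under
the assumption that edges can be modelled as plain ints.  The function
drop_seen_starts and its record_visits flag are hypothetical and do not
appear in GCC.  With the insertion step missing, the membership test can
never succeed, so the filter drops nothing.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical model: a thread is identified by its starting edge, and
// edges are plain ints.  Returns the threads that survive the filter.
std::vector<int>
drop_seen_starts (const std::vector<int> &starting_edges, bool record_visits)
{
  std::unordered_set<int> visited;
  std::vector<int> kept;
  for (int e : starting_edges)
    {
      if (visited.count (e) != 0)
        continue;               // already seen: drop this thread
      if (record_visits)
        visited.insert (e);     // the step the forward bits never performed
      kept.push_back (e);
    }
  return kept;
}
```

Calling it with record_visits set to false models the forward threader as
described above: every thread is kept, duplicates included.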

Ultimately I would like to move all the registry code to
tree-ssa-threadregistry.*.  I've avoided this in this patch to aid in
review.

My apologies for this longass explanation, but I want to make sure
we're covering all of our bases.

Tested on x86-64 Linux by a very tedious process of moving chunks
around, running "make check-gcc RUNTESTFLAGS=tree-ssa.exp", and
repeating ad nauseam.  And of course, by running a full bootstrap and
tests.

OK?

p.s. In a follow-up patch I will rename the confusing EDGE_FSM_THREAD
type.

gcc/ChangeLog:

	* tree-ssa-threadbackward.c (class back_threader_registry): Use
	back_jt_path_registry.
	* tree-ssa-threadedge.c (jump_threader::jump_threader): Use
	fwd_jt_path_registry.
	* tree-ssa-threadedge.h (class jump_threader): Same.
	* tree-ssa-threadupdate.c
	(jump_thread_path_registry::jump_thread_path_registry): Rename...
	(jt_path_registry::jt_path_registry): ...to this.
	(jump_thread_path_registry::~jump_thread_path_registry): Rename...
	(jt_path_registry::~jt_path_registry): ...to this.
	(fwd_jt_path_registry::fwd_jt_path_registry): New.
	(fwd_jt_path_registry::~fwd_jt_path_registry): New.
	(jump_thread_path_registry::allocate_thread_edge): Rename...
	(jt_path_registry::allocate_thread_edge): ...to this.
	(jump_thread_path_registry::allocate_thread_path): Rename...
	(jt_path_registry::allocate_thread_path): ...to this.
	(jump_thread_path_registry::lookup_redirection_data): Rename...
	(fwd_jt_path_registry::lookup_redirection_data): ...to this.
	(jump_thread_path_registry::thread_block_1): Rename...
	(fwd_jt_path_registry::thread_block_1): ...to this.
	(jump_thread_path_registry::thread_block): Rename...
	(fwd_jt_path_registry::thread_block): ...to this.
	(jt_path_registry::thread_through_loop_header): Rename...
	(fwd_jt_path_registry::thread_through_loop_header): ...to this.
	(jump_thread_path_registry::mark_threaded_blocks): Rename...
	(fwd_jt_path_registry::mark_threaded_blocks): ...to this.
	(jump_thread_path_registry::debug_path): Rename...
	(jt_path_registry::debug_path): ...to this.
	(jump_thread_path_registry::dump): Rename...
	(jt_path_registry::debug): ...to this.
	(jump_thread_path_registry::rewire_first_differing_edge): Rename...
	(back_jt_path_registry::rewire_first_differing_edge): ...to this.
	(jump_thread_path_registry::adjust_paths_after_duplication): Rename...
	(back_jt_path_registry::adjust_paths_after_duplication): ...to this.
	(jump_thread_path_registry::duplicate_thread_path): Rename...
	(back_jt_path_registry::duplicate_thread_path): ...to this.  Also,
	drop ill-formed candidates.
	(jump_thread_path_registry::remove_jump_threads_including): Rename...
	(fwd_jt_path_registry::remove_jump_threads_including): ...to this.
	(jt_path_registry::thread_through_all_blocks): New.
	(back_jt_path_registry::update_cfg): New.
	(fwd_jt_path_registry::update_cfg): New.
	(jump_thread_path_registry::register_jump_thread): Rename...
	(jt_path_registry::register_jump_thread): ...to this.
	* tree-ssa-threadupdate.h (class jump_thread_path_registry):
	Abstract to...
	(class jt_path_registry): ...here.
	(class fwd_jt_path_registry): New.
	(class back_jt_path_registry): New.
2021-09-11 19:51:30 +02:00
Jakub Jelinek
3fca63b0b6 testsuite: Fix c-c++-common/auto-init-* tests
> > 2021-08-20  qing zhao  <qing.zhao@oracle.com>
> >
> >        * c-c++-common/auto-init-1.c: New test.
> >        * c-c++-common/auto-init-10.c: New test.
> >        * c-c++-common/auto-init-11.c: New test.
> >        * c-c++-common/auto-init-12.c: New test.
> >        * c-c++-common/auto-init-13.c: New test.
> >        * c-c++-common/auto-init-14.c: New test.
> >        * c-c++-common/auto-init-15.c: New test.
> >        * c-c++-common/auto-init-16.c: New test.
> >        * c-c++-common/auto-init-2.c: New test.
> >        * c-c++-common/auto-init-3.c: New test.
> >        * c-c++-common/auto-init-4.c: New test.
> >        * c-c++-common/auto-init-5.c: New test.
> >        * c-c++-common/auto-init-6.c: New test.
> >        * c-c++-common/auto-init-7.c: New test.
> >        * c-c++-common/auto-init-8.c: New test.
> >        * c-c++-common/auto-init-9.c: New test.
> >        * c-c++-common/auto-init-esra.c: New test.
> >        * c-c++-common/auto-init-padding-1.c: New test.
> >        * c-c++-common/auto-init-padding-2.c: New test.
> >        * c-c++-common/auto-init-padding-3.c: New test.

This fails on many targets, e.g. i686-linux or x86_64-linux with -m32.

The main problem is hardcoding type sizes and structure layout expectations
that are valid only on some lp64 targets.
On ilp32, long and pointer are 32-bit, and there are targets that are neither
ilp32 nor lp64, on which even other sizes can't be taken for granted.
Also, long double depending on target and options is either 8, 12 or 16 byte
(the first one when it is the same as double, the second e.g. for ia32
extended long double (which is under the hood 10 byte), the last either
the same hw type on x86_64 or IBM double double or IEEE quad).
In the last test, one problem is that unsigned long is on ilp32 32-bit
instead of 64-bit, but even just changing to long long is not enough,
as long long in structures on ia32 is only 4 byte aligned instead of 8.
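
The following sketch shows how those assumptions can be probed on a given
target.  It is illustrative only: struct small_hole and the three helper
functions are hypothetical and are not taken from the tests.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical struct: a small member followed by a 64-bit one, so the
// padding between them depends on the target's alignment rules.
struct small_hole
{
  char one;
  long long four;
};

// Classify the data model from the sizes the message talks about.
std::string
data_model ()
{
  if (sizeof (int) == 4 && sizeof (long) == 8 && sizeof (void *) == 8)
    return "lp64";
  if (sizeof (int) == 4 && sizeof (long) == 4 && sizeof (void *) == 4)
    return "ilp32";
  return "other";
}

// True when long double has one of the sizes named in the message.
bool
long_double_size_is_common ()
{
  return sizeof (long double) == 8 || sizeof (long double) == 12
         || sizeof (long double) == 16;
}

// Offset of the 64-bit member: 8 where long long is 8-byte aligned in
// structures, 4 where it is only 4-byte aligned (as on ia32).
std::size_t
offset_of_four ()
{
  return offsetof (small_hole, four);
}
```

A test that hardcodes the lp64 answers for any of these will fail when the
same source is built with -m32.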

Tested on x86_64-linux -m32/-m64, ok for trunk?

Note, the gcc.target/i386/auto-init* tests fail also; I just don't have time
to deal with that right now.  To reproduce, just try
make check-gcc RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} i386.exp=auto-init*'
Guess some of those tests should be restricted to lp64 in there, others
where it might be easier to check all of lp64, x32 and ia32 code generation
could have different matches.  Wonder also about the aarch64 tests, there is
also -mabi=ilp32...
+FAIL: gcc.target/i386/auto-init-2.c scan-rtl-dump-times expand "0xfefefefefefefefe" 3
+FAIL: gcc.target/i386/auto-init-2.c scan-rtl-dump-times expand "0xfffffffffefefefe" 2
+FAIL: gcc.target/i386/auto-init-3.c scan-assembler-times pxor\\t\\\\%xmm0, \\\\%xmm0 3
+FAIL: gcc.target/i386/auto-init-4.c scan-rtl-dump-times expand "0xfffffffffefefefe" 1
+FAIL: gcc.target/i386/auto-init-4.c scan-rtl-dump-times expand "0xfffffffffffffffe\\\\]\\\\) repeated x16" 1
+FAIL: gcc.target/i386/auto-init-4.c scan-rtl-dump-times expand "\\\\[0xfefefefefefefefe\\\\]" 1
+FAIL: gcc.target/i386/auto-init-5.c scan-assembler-times \\\\.long\\t0 14
+FAIL: gcc.target/i386/auto-init-6.c scan-rtl-dump-times expand "0xfffffffffffffffe\\\\]\\\\) repeated x16" 2
+FAIL: gcc.target/i386/auto-init-6.c scan-rtl-dump-times expand "\\\\[0xfefefefefefefefe\\\\]" 1
+FAIL: gcc.target/i386/auto-init-7.c scan-rtl-dump-times expand "const_int 0 \\\\[0\\\\]\\\\) repeated x16" 2
+FAIL: gcc.target/i386/auto-init-7.c scan-rtl-dump-times expand "const_int 0 \\\\[0\\\\]\\\\)\\\\)" 3
+FAIL: gcc.target/i386/auto-init-8.c scan-rtl-dump-times expand "0xfffffffffefefefe" 1
+FAIL: gcc.target/i386/auto-init-8.c scan-rtl-dump-times expand "0xfffffffffffffffe\\\\]\\\\) repeated x16" 2
+FAIL: gcc.target/i386/auto-init-8.c scan-rtl-dump-times expand "\\\\[0xfefefefefefefefe\\\\]" 2
+FAIL: gcc.target/i386/auto-init-padding-1.c scan-rtl-dump-times expand "const_int 0 \\\\[0\\\\]\\\\) repeated x16" 1
+FAIL: gcc.target/i386/auto-init-padding-10.c scan-rtl-dump-times expand "0xfffffffffffffffe\\\\]\\\\) repeated x16" 1
+FAIL: gcc.target/i386/auto-init-padding-11.c scan-rtl-dump-times expand "const_int 0 \\\\[0\\\\]\\\\) repeated x16" 1
+FAIL: gcc.target/i386/auto-init-padding-12.c scan-rtl-dump-times expand "0xfffffffffffffffe\\\\]\\\\) repeated x16" 1
+FAIL: gcc.target/i386/auto-init-padding-2.c scan-rtl-dump-times expand "0xfffffffffffffffe\\\\]\\\\) repeated x16" 1
+FAIL: gcc.target/i386/auto-init-padding-3.c scan-assembler movl\\t\\\\\$16,
+FAIL: gcc.target/i386/auto-init-padding-3.c scan-assembler rep stosq
+FAIL: gcc.target/i386/auto-init-padding-4.c scan-rtl-dump-times expand "0xfffffffffffffffe\\\\]\\\\) repeated x16" 1
+FAIL: gcc.target/i386/auto-init-padding-5.c scan-rtl-dump-times expand "const_int 0 \\\\[0\\\\]\\\\) repeated x16" 1
+FAIL: gcc.target/i386/auto-init-padding-6.c scan-rtl-dump-times expand "0xfffffffffffffffe\\\\]\\\\) repeated x16" 1
+FAIL: gcc.target/i386/auto-init-padding-7.c scan-assembler-times movq\\t\\\\\$0, 2
+FAIL: gcc.target/i386/auto-init-padding-8.c scan-assembler-times movq\\t\\\\\$0, 2
+FAIL: gcc.target/i386/auto-init-padding-9.c scan-assembler rep stosq

2021-09-11  Jakub Jelinek  <jakub@redhat.com>

	* c-c++-common/auto-init-1.c: Enable test only on ilp32 or lp64
	targets, expect different long and pointer sizes between ilp32 and
	lp64.
	* c-c++-common/auto-init-2.c: Likewise.
	* c-c++-common/auto-init-3.c: Expect one of the common long double
	sizes (8/12/16 bytes) instead of hardcoding 16 bytes.
	* c-c++-common/auto-init-4.c: Likewise.
	* c-c++-common/auto-init-5.c: Expect one of the common
	_Complex long double sizes (16/24/32 bytes) instead of hardcoding 32
	bytes.
	* c-c++-common/auto-init-6.c: Likewise.
	* c-c++-common/auto-init-padding-1.c: Enable test only on ilp32 or lp64
	targets.
	(struct test_small_hole): Change type of four to unsigned long long
	and add aligned attribute.
2021-09-11 13:48:52 +02:00
GCC Administrator
a26206ec7b Daily bump. 2021-09-11 00:16:27 +00:00
Petter Tomner
332a9f7636 libgccjit: Generate debug info for variables
Finalize declares via available helpers after location is set. Set
TYPE_NAME of primitives and friends to "int" etc. Debug info is now
set properly for variables.

Signed-off-by:
2021-09-09	Petter Tomner	<tomner@kth.se>

gcc/jit/
	* jit-playback.c: Moved global var processing to after loc handling.
	  Setting TYPE_NAME for fundamental types.
	  Using common functions for finalizing globals.
	* jit-playback.h: New method init_types().
	  Changed get_tree_node_for_type() to method.

gcc/testsuite/
	* jit.dg/test-error-array-bounds.c: Array is not unsigned
	* jit.dg/jit.exp: Helper function
	* jit.dg/test-debuginfo.c: New testcase
2021-09-11 01:00:48 +02:00
liuhongt
57b7c432cc Revert "Get rid of all float-int special cases in validate_subreg."
This reverts commit d2874d9056.

PR target/102254
PR target/102154
PR target/102211
2021-09-11 05:55:44 +08:00
Petter Tomner
f75e524278 MAINTAINERS: Adding myself to DCO and write after approval
2020-09-10	Petter Tomner	<tomner@kth.se>

ChangeLog:
	* MAINTAINERS: Add myself to DCO and write after approval.
2021-09-10 21:43:10 +02:00
Jakub Jelinek
8122fbff77 openmp: Implement OpenMP 5.1 atomics, so far for C only
This patch implements OpenMP 5.1 atomics (with clarifications from upcoming 5.2).
The most important changes are that it is now possible to write (for C/C++,
for Fortran it was possible before already) min/max atomics and more importantly
compare and exchange in various forms.
Also, acq_rel is now allowed on read/write and acq_rel/acquire are allowed on
update, and there are new compare, weak and fail clauses.
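
For readers unfamiliar with the new forms, here is a minimal sketch of a
max atomic and a compare and exchange with capture, written in the common
C/C++ subset.  The function names atomic_max and atomic_cas are
hypothetical.  Since this patch covers C only, a C++ compiler built from
this revision would presumably reject the compare clause under -fopenmp;
without -fopenmp the pragmas are ignored and the code simply runs serially.

```cpp
#include <cassert>

// Atomic max: store v into *x only if v is larger.
void
atomic_max (int *x, int v)
{
  #pragma omp atomic compare
  if (*x < v) { *x = v; }
}

// Compare and exchange with capture: returns the value observed before
// the conditional update, whether or not the update happened.
int
atomic_cas (int *x, int expected, int desired)
{
  int old;
  #pragma omp atomic compare capture
  { old = *x; if (*x == expected) { *x = desired; } }
  return old;
}
```

The caller of atomic_cas can tell whether the exchange took place by
comparing the returned value against expected.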

2021-09-10  Jakub Jelinek  <jakub@redhat.com>

gcc/
	* tree-core.h (enum omp_memory_order): Add OMP_MEMORY_ORDER_MASK,
	OMP_FAIL_MEMORY_ORDER_UNSPECIFIED, OMP_FAIL_MEMORY_ORDER_RELAXED,
	OMP_FAIL_MEMORY_ORDER_ACQUIRE, OMP_FAIL_MEMORY_ORDER_RELEASE,
	OMP_FAIL_MEMORY_ORDER_ACQ_REL, OMP_FAIL_MEMORY_ORDER_SEQ_CST and
	OMP_FAIL_MEMORY_ORDER_MASK enumerators.
	(OMP_FAIL_MEMORY_ORDER_SHIFT): Define.
	* gimple-pretty-print.c (dump_gimple_omp_atomic_load,
	dump_gimple_omp_atomic_store): Print [weak] for weak atomic
	load/store.
	* gimple.h (enum gf_mask): Change GF_OMP_ATOMIC_MEMORY_ORDER
	to 6-bit mask, adjust GF_OMP_ATOMIC_NEED_VALUE value and add
	GF_OMP_ATOMIC_WEAK.
	(gimple_omp_atomic_weak_p, gimple_omp_atomic_set_weak): New inline
	functions.
	* tree.h (OMP_ATOMIC_WEAK): Define.
	* tree-pretty-print.c (dump_omp_atomic_memory_order): Adjust for
	fail memory order being encoded in the same enum and also print
	fail clause if present.
	(dump_generic_node): Print weak clause if OMP_ATOMIC_WEAK.
	* gimplify.c (goa_stabilize_expr): Add target_expr and rhs arguments,
	handle pre_p == NULL case as a test mode that only returns value
	but doesn't change gimplify nor change anything otherwise, adjust
	recursive calls, add MODIFY_EXPR, ADDR_EXPR, COND_EXPR, TARGET_EXPR
	and CALL_EXPR handling, adjust COMPOUND_EXPR handling for
	__builtin_clear_padding calls, for !rhs gimplify as lvalue rather
	than rvalue.
	(gimplify_omp_atomic): Adjust goa_stabilize_expr caller.  Handle
	COND_EXPR rhs.  Set weak flag on gimple load/store for
	OMP_ATOMIC_WEAK.
	* omp-expand.c (omp_memory_order_to_fail_memmodel): New function.
	(omp_memory_order_to_memmodel): Adjust for fail clause encoded
	in the same enum.
	(expand_omp_atomic_cas): New function.
	(expand_omp_atomic_pipeline): Use omp_memory_order_to_fail_memmodel
	function.
	(expand_omp_atomic): Attempt to optimize atomic compare and exchange
	using expand_omp_atomic_cas.
gcc/c-family/
	* c-common.h (c_finish_omp_atomic): Add r and weak arguments.
	* c-omp.c: Include gimple-fold.h.
	(c_finish_omp_atomic): Add r and weak arguments.  Add support for
	OpenMP 5.1 atomics.
gcc/c/
	* c-parser.c (c_parser_conditional_expression): If omp_atomic_lhs and
	cond.value is >, < or == with omp_atomic_lhs as one of the operands,
	don't call build_conditional_expr, instead build a COND_EXPR directly.
	(c_parser_binary_expression): Avoid calling parser_build_binary_op
	if omp_atomic_lhs even in more cases for >, < or ==.
	(c_parser_omp_atomic): Update function comment for OpenMP 5.1 atomics,
	parse OpenMP 5.1 atomics and fail, compare and weak clauses, allow
	acq_rel on atomic read/write and acq_rel/acquire clauses on update.
	* c-typeck.c (build_binary_op): For flag_openmp only handle
	MIN_EXPR/MAX_EXPR.
gcc/cp/
	* parser.c (cp_parser_omp_atomic): Allow acq_rel on atomic read/write
	and acq_rel/acquire clauses on update.
	* semantics.c (finish_omp_atomic): Adjust c_finish_omp_atomic caller.
gcc/testsuite/
	* c-c++-common/gomp/atomic-17.c (foo): Add tests for atomic read,
	write or update with acq_rel clause and atomic update with acquire clause.
	* c-c++-common/gomp/atomic-18.c (foo): Adjust expected diagnostics
	wording, remove tests moved to atomic-17.c.
	* c-c++-common/gomp/atomic-21.c: Expect only 2 omp atomic release and
	2 omp atomic acq_rel directives instead of 4 omp atomic release.
	* c-c++-common/gomp/atomic-25.c: New test.
	* c-c++-common/gomp/atomic-26.c: New test.
	* c-c++-common/gomp/atomic-27.c: New test.
	* c-c++-common/gomp/atomic-28.c: New test.
	* c-c++-common/gomp/atomic-29.c: New test.
	* c-c++-common/gomp/atomic-30.c: New test.
	* c-c++-common/goacc-gomp/atomic.c: Expect 1 omp atomic release and
	1 omp atomic_acq_rel instead of 2 omp atomic release directives.
	* gcc.dg/gomp/atomic-5.c: Adjust expected error diagnostic wording.
	* g++.dg/gomp/atomic-18.C: Expect 4 omp atomic release and
	1 omp atomic_acq_rel instead of 5 omp atomic release directives.
libgomp/
	* testsuite/libgomp.c-c++-common/atomic-19.c: New test.
	* testsuite/libgomp.c-c++-common/atomic-20.c: New test.
	* testsuite/libgomp.c-c++-common/atomic-21.c: New test.
2021-09-10 20:41:33 +02:00
Ian Lance Taylor
b7f84702b3 compiler: correct condition for calling memclrHasPointers
When compiling append(s, make([]typ, ln)...), where typ has a pointer,
and the append fits within the existing capacity of s, the condition
used to clear out the new elements was reversed.

Fixes golang/go#47771

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/344189
2021-09-10 11:14:06 -07:00
Aldy Hernandez
01b5038718 Disable threading through latches until after loop optimizations.
The motivation for this patch was enabling the use of global ranges in
the path solver, but this destroyed certain properties of loops, which
caused subsequent loop optimizations to fail.  Consequently, this
patch's main goal is to disable jump threading involving the latch
until after loop optimizations have run.

As can be seen in the test adjustments, we mostly shift the threading
from the early threaders (ethread, thread[12]) to the late threaders
(thread[34]).  I have nuked some of the early notes in the testcases
that came as part of the jump threader rewrite.  They're mostly noise
now.

Note that we could probably relax some other restrictions in
profitable_path_p when loop optimizations have completed, but it would
require more testing, and I'm hesitant to touch more things than needed
at this point.  I have added a reminder to the function to keep this
in mind.

Finally, perhaps as a follow-up, we should apply the same restrictions to
the forward threader.  At some point I'd like to combine the cost models.

Tested on x86-64 Linux.

p.s. There is a thorough discussion involving the limitations of jump
threading involving loops here:

	https://gcc.gnu.org/pipermail/gcc/2021-September/237247.html

gcc/ChangeLog:

	* tree-pass.h (PROP_loop_opts_done): New.
	* gimple-range-path.cc (path_range_query::internal_range_of_expr):
	Intersect with global range.
	* tree-ssa-loop.c (tree_ssa_loop_done): Set PROP_loop_opts_done.
	* tree-ssa-threadbackward.c
	(back_threader_profitability::profitable_path_p): Disable
	threading through latches until after loop optimizations have run.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/ssa-dom-thread-2b.c: Adjust for disabling of
	threading through latches.
	* gcc.dg/tree-ssa/ssa-dom-thread-6.c: Same.
	* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Same.

Co-authored-by: Michael Matz <matz@suse.de>
2021-09-10 18:41:51 +02:00
David Faust
fb88bf9931 doc: document BPF -mcpu and related options
This commit adds documentation for the new BPF options -mcpu, -mjmpext,
-mjmp32, and -malu32.

gcc/ChangeLog:
	* doc/invoke.texi: Document BPF -mcpu, -mjmpext, -mjmp32 and -malu32
	options.
2021-09-10 09:06:58 -07:00
David Faust
ae1cce71fa bpf testsuite: add tests for new feature options
This commit adds tests for the new -mjmpext, -mjmp32 and -malu32 feature
options in the BPF backend.

gcc/testsuite/ChangeLog:
	* gcc.target/bpf/alu-1.c: New test.
	* gcc.target/bpf/jmp-1.c: New test.
2021-09-10 09:06:58 -07:00
David Faust
5b2ab1d35e bpf: add -mcpu and related feature options
New instructions have been added over time to the eBPF ISA, but
previously there has been no good method to select which version to
target in GCC.

This patch adds the following options to the BPF backend:

  -mcpu={v1, v2, v3}
    Select which version of the eBPF ISA to target. This enables or
    disables generation of certain instructions. The default is v3.

  -mjmpext
    Enable extra conditional branch instructions.
    Enabled for CPU v2 and above.

  -mjmp32
    Enable 32-bit jump/branch instructions.
    Enabled for CPU v3 and above.

  -malu32
    Enable 32-bit ALU instructions.
    Enabled for CPU v3 and above.
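
The defaults listed above can be summarized with a small model.  This is
a sketch of the documented behaviour only, not the backend's code; the
struct bpf_features and the function default_features are hypothetical.

```cpp
#include <cassert>

// Hypothetical model of the per-version defaults described above.
struct bpf_features
{
  bool jmpext;
  bool jmp32;
  bool alu32;
};

// cpu_version is 1, 2 or 3, mirroring -mcpu=v1, v2 and v3.
bpf_features
default_features (int cpu_version)
{
  bpf_features f;
  f.jmpext = cpu_version >= 2;  // -mjmpext: CPU v2 and above
  f.jmp32 = cpu_version >= 3;   // -mjmp32: CPU v3 and above
  f.alu32 = cpu_version >= 3;   // -malu32: CPU v3 and above
  return f;
}
```

With the default of v3, all three features come out enabled.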

gcc/ChangeLog:
	* config/bpf/bpf-opts.h (bpf_isa_version): New enum.
	* config/bpf/bpf-protos.h (bpf_expand_cbranch): New.
	* config/bpf/bpf.c (bpf_option_override): Handle -mcpu option.
	(bpf_expand_cbranch): New function.
	* config/bpf/bpf.md (AM mode iterator): Conditionalize support for SI
	mode.
	(zero_extendsidi2): Only use mov32 instruction if it is available.
	(SIM mode iterator): Conditionalize support for SI mode.
	(JM mode iterator): New.
	(cbranchdi4): Update name, use new JM iterator. Use bpf_expand_cbranch.
	(*branch_on_di): Update name, use new JM iterator.
	* config/bpf/bpf.opt (mjmpext): New option.
	(malu32): Likewise.
	(mjmp32): Likewise.
	(mcpu): Likewise.
	(bpf_isa): New enum.
2021-09-10 09:06:58 -07:00
David Faust
4f0f696fea bpf: correct zero_extend output templates
The output templates for zero_extendhidi2 and zero_extendqidi2 could
lead to incorrect code generation when zero-extending one register into
another. This patch adds a new output template to the define_insns to
handle such cases and produce correct asm.

gcc/ChangeLog:
	* config/bpf/bpf.md (zero_extendhidi2): Add new output template
	for register-to-register extensions.
	(zero_extendqidi2): Likewise.
2021-09-10 09:00:27 -07:00
Jonathan Wakely
7f8af6dc82 libstdc++: Use "test.invalid." for invalid hostname
This avoids test.invalid.some.domain being successfully resolved.

libstdc++-v3/ChangeLog:

	* testsuite/experimental/net/internet/resolver/ops/lookup.cc:
	Fix invalid hostname to only match the .invalid TLD.
2021-09-10 15:10:21 +01:00
Richard Biener
79f488de30 middle-end/102273 - avoid ICE with auto-init and nested functions
This refactors expansion to consider non-decl LHS.  I suspect
the is_val argument is not needed.

2021-09-10  Richard Biener  <rguenther@suse.de>

	PR middle-end/102273
	* internal-fn.c (expand_DEFERRED_INIT): Always expand non-SSA vars.

	* gcc.dg/pr102273.c: New testcase.
2021-09-10 13:51:42 +02:00
Thomas Schwinge
5c5c2d86e5 Fix 'dg-do run' syntax in 'c-c++-common/auto-init-padding-{2,3}.c'
Fix-up for recent commit a25e0b5e6a
 "Add -ftrivial-auto-var-init option and uninitialized variable attribute".

	gcc/testsuite/
	* c-c++-common/auto-init-padding-2.c: Fix 'dg-do run' syntax.
	* c-c++-common/auto-init-padding-3.c: Likewise.
2021-09-10 11:26:50 +02:00
Richard Biener
1dae802b68 middle-end/102269 - avoid auto-init of empty types
This avoids initializing empty types for which we'll eventually
leave a .DEFERRED_INIT call without a LHS.

2021-09-10  Richard Biener  <rguenther@suse.de>

	PR middle-end/102269
	* gimplify.c (is_var_need_auto_init): Empty types do not need
	initialization.

	* gcc.dg/pr102269.c: New testcase.
2021-09-10 11:10:59 +02:00
Richard Biener
f7523dbc2d Remove vestiges of --with-stabs
This removes the --with-stabs configure option, which has had no effect
for quite some time.

2021-09-10  Richard Biener  <rguenther@suse.de>

	* configure.ac (--with-stabs): Remove.
	* configure: Regenerate.
	* doc/install.texi: Remove --with-stabs documentation.
2021-09-10 11:10:59 +02:00
liuhongt
1e77bcbc7a AVX512FP16: Add testcase for vcmpph/vcmpsh/vcomish/vucomish.
gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fp16-helper.h
	(check_results_mask): New check_function.
	* gcc.target/i386/avx512fp16-vcmpph-1a.c: New test.
	* gcc.target/i386/avx512fp16-vcmpph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vcmpsh-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vcmpsh-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vcomish-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vcomish-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vcomish-1c.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcmpph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcmpph-1b.c: Ditto.
2021-09-10 14:59:31 +08:00
liuhongt
0f200733fe AVX512FP16: Add vcmpph/vcmpsh/vcomish/vucomish.
gcc/ChangeLog:

	* config/i386/avx512fp16intrin.h (_mm512_cmp_ph_mask):
	New intrinsic.
	(_mm512_mask_cmp_ph_mask): Likewise.
	(_mm512_cmp_round_ph_mask): Likewise.
	(_mm512_mask_cmp_round_ph_mask): Likewise.
	(_mm_cmp_sh_mask): Likewise.
	(_mm_mask_cmp_sh_mask): Likewise.
	(_mm_cmp_round_sh_mask): Likewise.
	(_mm_mask_cmp_round_sh_mask): Likewise.
	(_mm_comieq_sh): Likewise.
	(_mm_comilt_sh): Likewise.
	(_mm_comile_sh): Likewise.
	(_mm_comigt_sh): Likewise.
	(_mm_comige_sh): Likewise.
	(_mm_comineq_sh): Likewise.
	(_mm_ucomieq_sh): Likewise.
	(_mm_ucomilt_sh): Likewise.
	(_mm_ucomile_sh): Likewise.
	(_mm_ucomigt_sh): Likewise.
	(_mm_ucomige_sh): Likewise.
	(_mm_ucomineq_sh): Likewise.
	(_mm_comi_round_sh): Likewise.
	(_mm_comi_sh): Likewise.
	* config/i386/avx512fp16vlintrin.h (_mm_cmp_ph_mask): New intrinsic.
	(_mm_mask_cmp_ph_mask): Likewise.
	(_mm256_cmp_ph_mask): Likewise.
	(_mm256_mask_cmp_ph_mask): Likewise.
	* config/i386/i386-builtin-types.def: Add corresponding builtin types.
	* config/i386/i386-builtin.def: Add corresponding new builtins.
	* config/i386/i386-expand.c
	(ix86_expand_args_builtin): Handle new builtin types.
	(ix86_expand_round_builtin): Ditto.
	* config/i386/i386.md (ssevecmode): Add HF mode.
	(MODEFH): New mode iterator.
	* config/i386/sse.md
	(V48H_AVX512VL): New mode iterator to support HF vector modes.
	Adjust corresponding description.
	(ssecmpintprefix): New.
	(VI12_AVX512VL): Adjust to support HF vector modes.
	(cmp_imm_predicate): Likewise.
	(<avx512>_cmp<mode>3<mask_scalar_merge_name><round_saeonly_name>):
	Likewise.
	(avx512f_vmcmp<mode>3<round_saeonly_name>): Likewise.
	(avx512f_vmcmp<mode>3_mask<round_saeonly_name>): Likewise.
	(<sse>_<unord>comi<round_saeonly_name>): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx-1.c: Add test for new builtins.
	* gcc.target/i386/sse-13.c: Ditto.
	* gcc.target/i386/sse-23.c: Ditto.
	* gcc.target/i386/sse-14.c: Add test for new intrinsics.
	* gcc.target/i386/sse-22.c: Ditto.
2021-09-10 14:59:31 +08:00
liuhongt
98da680f69 AVX512FP16: Add testcase for vmaxph/vmaxsh/vminph/vminsh.
gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fp16-vmaxph-1a.c: New test.
	* gcc.target/i386/avx512fp16-vmaxph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vmaxsh-1.c: Ditto.
	* gcc.target/i386/avx512fp16-vmaxsh-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vminph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vminph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vminsh-1.c: Ditto.
	* gcc.target/i386/avx512fp16-vminsh-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vmaxph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vmaxph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vminph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vminph-1b.c: Ditto.
2021-09-10 14:59:31 +08:00
liuhongt
b96cb2caa9 AVX512FP16: Add vmaxph/vminph/vmaxsh/vminsh.
gcc/ChangeLog:

	* config/i386/avx512fp16intrin.h (_mm512_max_ph): New intrinsic.
	(_mm512_mask_max_ph): Likewise.
	(_mm512_maskz_max_ph): Likewise.
	(_mm512_min_ph): Likewise.
	(_mm512_mask_min_ph): Likewise.
	(_mm512_maskz_min_ph): Likewise.
	(_mm512_max_round_ph): Likewise.
	(_mm512_mask_max_round_ph): Likewise.
	(_mm512_maskz_max_round_ph): Likewise.
	(_mm512_min_round_ph): Likewise.
	(_mm512_mask_min_round_ph): Likewise.
	(_mm512_maskz_min_round_ph): Likewise.
	(_mm_max_sh): Likewise.
	(_mm_mask_max_sh): Likewise.
	(_mm_maskz_max_sh): Likewise.
	(_mm_min_sh): Likewise.
	(_mm_mask_min_sh): Likewise.
	(_mm_maskz_min_sh): Likewise.
	(_mm_max_round_sh): Likewise.
	(_mm_mask_max_round_sh): Likewise.
	(_mm_maskz_max_round_sh): Likewise.
	(_mm_min_round_sh): Likewise.
	(_mm_mask_min_round_sh): Likewise.
	(_mm_maskz_min_round_sh): Likewise.
	* config/i386/avx512fp16vlintrin.h (_mm_max_ph): New intrinsic.
	(_mm256_max_ph): Likewise.
	(_mm_mask_max_ph): Likewise.
	(_mm256_mask_max_ph): Likewise.
	(_mm_maskz_max_ph): Likewise.
	(_mm256_maskz_max_ph): Likewise.
	(_mm_min_ph): Likewise.
	(_mm256_min_ph): Likewise.
	(_mm_mask_min_ph): Likewise.
	(_mm256_mask_min_ph): Likewise.
	(_mm_maskz_min_ph): Likewise.
	(_mm256_maskz_min_ph): Likewise.
	* config/i386/i386-builtin-types.def: Add corresponding builtin types.
	* config/i386/i386-builtin.def: Add corresponding new builtins.
	* config/i386/i386-expand.c
	(ix86_expand_args_builtin): Handle new builtin types.
	* config/i386/sse.md
	(<code><mode>3<mask_name><round_saeonly_name>): Adjust to
	support HF vector modes.
	(*<code><mode>3<mask_name><round_saeonly_name>): Likewise.
	(ieee_<ieee_maxmin><mode>3<mask_name><round_saeonly_name>):
	Likewise.
	(<sse>_vm<code><mode>3<mask_scalar_name><round_saeonly_scalar_name>):
	Likewise.
	* config/i386/subst.md (round_saeonly_mode512bit_condition):
	Adjust for HF vector modes.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx-1.c: Add test for new builtins.
	* gcc.target/i386/sse-13.c: Ditto.
	* gcc.target/i386/sse-23.c: Ditto.
	* gcc.target/i386/sse-14.c: Add test for new intrinsics.
	* gcc.target/i386/sse-22.c: Ditto.
2021-09-10 14:59:30 +08:00
liuhongt
63d7c9dd66 AVX512FP16: Add testcase for vaddsh/vsubsh/vmulsh/vdivsh.
gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fp16-vaddsh-1a.c: New test.
	* gcc.target/i386/avx512fp16-vaddsh-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vdivsh-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vdivsh-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vmulsh-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vmulsh-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vsubsh-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vsubsh-1b.c: Ditto.
	* gcc.target/i386/pr54855-11.c: Ditto.
2021-09-10 14:59:30 +08:00
Liu, Hongtao
71838266e7 AVX512FP16: Add vaddsh/vsubsh/vmulsh/vdivsh.
gcc/ChangeLog:

	* config/i386/avx512fp16intrin.h (_mm_add_sh): New intrinsic.
	(_mm_mask_add_sh): Likewise.
	(_mm_maskz_add_sh): Likewise.
	(_mm_sub_sh): Likewise.
	(_mm_mask_sub_sh): Likewise.
	(_mm_maskz_sub_sh): Likewise.
	(_mm_mul_sh): Likewise.
	(_mm_mask_mul_sh): Likewise.
	(_mm_maskz_mul_sh): Likewise.
	(_mm_div_sh): Likewise.
	(_mm_mask_div_sh): Likewise.
	(_mm_maskz_div_sh): Likewise.
	(_mm_add_round_sh): Likewise.
	(_mm_mask_add_round_sh): Likewise.
	(_mm_maskz_add_round_sh): Likewise.
	(_mm_sub_round_sh): Likewise.
	(_mm_mask_sub_round_sh): Likewise.
	(_mm_maskz_sub_round_sh): Likewise.
	(_mm_mul_round_sh): Likewise.
	(_mm_mask_mul_round_sh): Likewise.
	(_mm_maskz_mul_round_sh): Likewise.
	(_mm_div_round_sh): Likewise.
	(_mm_mask_div_round_sh): Likewise.
	(_mm_maskz_div_round_sh): Likewise.
	* config/i386/i386-builtin-types.def: Add corresponding builtin types.
	* config/i386/i386-builtin.def: Add corresponding new builtins.
	* config/i386/i386-expand.c
	(ix86_expand_round_builtin): Handle new builtins.
	* config/i386/sse.md (VF_128): Change description.
	(<sse>_vm<plusminus_insn><mode>3<mask_scalar_name><round_scalar_name>):
	Adjust to support HF vector modes.
	(<sse>_vm<multdiv_mnemonic><mode>3<mask_scalar_name><round_scalar_name>):
	Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx-1.c: Add test for new builtins.
	* gcc.target/i386/sse-13.c: Ditto.
	* gcc.target/i386/sse-23.c: Ditto.
	* gcc.target/i386/sse-14.c: Add test for new intrinsics.
	* gcc.target/i386/sse-22.c: Ditto.
2021-09-10 14:59:30 +08:00
H.J. Lu
d959312b42 AVX512FP16: Enable _Float16 autovectorization
gcc/ChangeLog:

	* config/i386/i386-expand.c
	(ix86_avx256_split_vector_move_misalign): Handle V16HF mode.
	* config/i386/i386.c
	(ix86_preferred_simd_mode): Handle HF mode.
	* config/i386/sse.md (V_256H): New mode iterator.
	(avx_vextractf128<mode>): Use it.
	(VEC_INIT_MODE): Align vector HFmode condition to vector
	HImodes since no real HF instructions are used.
	(VEC_INIT_HALF_MODE): Ditto.
	(VIHF): Ditto.
	(VIHF_AVX512BW): Ditto.
	(*vec_extracthf): Ditto.
	(VEC_EXTRACT_MODE): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/vect-float16-1.c: New test.
	* gcc.target/i386/vect-float16-10.c: Ditto.
	* gcc.target/i386/vect-float16-11.c: Ditto.
	* gcc.target/i386/vect-float16-12.c: Ditto.
	* gcc.target/i386/vect-float16-2.c: Ditto.
	* gcc.target/i386/vect-float16-3.c: Ditto.
	* gcc.target/i386/vect-float16-4.c: Ditto.
	* gcc.target/i386/vect-float16-5.c: Ditto.
	* gcc.target/i386/vect-float16-6.c: Ditto.
	* gcc.target/i386/vect-float16-7.c: Ditto.
	* gcc.target/i386/vect-float16-8.c: Ditto.
	* gcc.target/i386/vect-float16-9.c: Ditto.
2021-09-10 14:59:30 +08:00
Richard Biener
0458154caa Remove dbx.h, do not set PREFERRED_DEBUGGING_TYPE from dbxcoff.h, lynx.h
The following removes the unused config/dbx.h file and removes the
setting of PREFERRED_DEBUGGING_TYPE from dbxcoff.h which is
overridden by all users (djgpp/mingw/cygwin) via either including
config/i386/djgpp.h or config/i386/cygming.h

There are still circumstances where mingw and cygwin default to
STABS, namely when HAVE_GAS_PE_SECREL32_RELOC is not defined and
the target defaults to 32bit code generation.

The new style handling DBX_DEBUGGING_INFO is in line with
dbxelf.h which does not define PREFERRED_DEBUGGING_TYPE either.

The patch also removes the PREFERRED_DEBUGGING_TYPE define from
lynx.h which always follows elfos.h already defaulting to DWARF,
so the comment about STABS being the default is misleading and
outdated.

2021-09-09  Richard Biener  <rguenther@suse.de>

	PR target/102255
	* config/dbx.h: Remove.
	* config/dbxcoff.h: Do not define PREFERRED_DEBUGGING_TYPE.
	* config/lynx.h: Likewise.
2021-09-10 07:59:15 +02:00
liuhongt
60efb1fee9 Remove copysign post_reload splitter for scalar modes.
It can generate better code just like avx512dq-abs-copysign-1.c
shows.

gcc/ChangeLog:

	* config/i386/i386-expand.c (ix86_expand_copysign): Expand
	right into ANDNOT + AND + IOR, using paradoxical subregs.
	(ix86_split_copysign_const): Remove.
	(ix86_split_copysign_var): Ditto.
	* config/i386/i386-protos.h (ix86_split_copysign_const): Ditto.
	(ix86_split_copysign_var): Ditto.
	* config/i386/i386.md (@copysign<mode>3_const): Ditto.
	(@copysign<mode>3_var): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512dq-abs-copysign-1.c: Adjust testcase.
	* gcc.target/i386/avx512vl-abs-copysign-1.c: Adjust testcase.
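
The ANDNOT + AND + IOR expansion named in the ix86_expand_copysign entry
above can be pictured in scalar C roughly as follows.  This is only a
sketch, assuming IEEE 754 binary32 floats; copysign_bits is a hypothetical
name, and the real expander emits RTL on SSE registers through paradoxical
subregs rather than integer code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns MAG with its sign bit replaced by the sign bit of SGN.
   Illustrative only; not the code GCC generates.  */
float copysign_bits(float mag, float sgn)
{
  const uint32_t sign_mask = UINT32_C(0x80000000);
  uint32_t m, s, r;
  float out;

  assert(sizeof(float) == sizeof(uint32_t));
  memcpy(&m, &mag, sizeof m);
  memcpy(&s, &sgn, sizeof s);

  /* ANDNOT clears the sign of MAG, AND isolates the sign of SGN,
     IOR merges the two.  */
  r = (~sign_mask & m) | (sign_mask & s);

  memcpy(&out, &r, sizeof out);
  return out;
}
```

Calling copysign_bits (3.0f, -1.0f) yields -3.0f under these assumptions.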
2021-09-10 12:29:28 +08:00
GCC Administrator
f84e2f0b7b Daily bump. 2021-09-10 00:16:31 +00:00
qing zhao
a25e0b5e6a Add -ftrivial-auto-var-init option and uninitialized variable attribute.
Initialize automatic variables with either a pattern or with zeroes to increase
the security and predictability of a program by preventing uninitialized memory
disclosure and use.
GCC still considers an automatic variable that doesn't have an explicit
initializer as uninitialized, so -Wuninitialized will still report warnings
for such automatic variables.
With this option, GCC will also initialize any padding of automatic variables
that have structure or union types to zeroes.
You can control this behavior for a specific variable by using the variable
attribute "uninitialized" to control runtime overhead.
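
A minimal usage sketch of the option and the attribute described above.
The names NO_AUTO_INIT, byte_sum, packet_value and struct packet are
hypothetical and exist only for illustration; the file would be built with
something like "gcc -ftrivial-auto-var-init=zero" (or =pattern).  The
program never reads an uninitialized value, so its result does not depend
on the option.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Use the attribute only where the compiler knows it; other compilers
   get an empty macro.  NO_AUTO_INIT is a hypothetical helper name.  */
#if defined(__has_attribute)
# if __has_attribute(uninitialized)
#  define NO_AUTO_INIT __attribute__((uninitialized))
# endif
#endif
#ifndef NO_AUTO_INIT
# define NO_AUTO_INIT
#endif

struct packet {
  char tag;    /* usually followed by padding, which the option zeroes */
  int value;
};

/* The scratch buffer is always written before it is read, so it is
   opted out of automatic initialization to avoid the runtime cost.  */
int byte_sum(const char *s)
{
  NO_AUTO_INIT char scratch[256];
  size_t n;
  int sum = 0;

  assert(s != NULL);
  n = strlen(s);
  if (n > sizeof scratch)
    n = sizeof scratch;
  memcpy(scratch, s, n);
  for (size_t i = 0; i < n; i++)
    sum += (unsigned char) scratch[i];
  return sum;
}

/* With the option enabled, P (including its padding) is initialized
   before the explicit stores below; without it, nothing changes here.  */
int packet_value(void)
{
  struct packet p;
  p.tag = 'a';
  p.value = 7;
  return p.value;
}
```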

gcc/ChangeLog:

2021-09-09  qing zhao  <qing.zhao@oracle.com>

	* builtins.c (expand_builtin_memset): Make externally visible.
	* builtins.h (expand_builtin_memset): Declare extern.
	* common.opt (ftrivial-auto-var-init=): New option.
	* doc/extend.texi: Document the uninitialized attribute.
	* doc/invoke.texi: Document -ftrivial-auto-var-init.
	* flag-types.h (enum auto_init_type): New enumerated type
	auto_init_type.
	* gimple-fold.c (clear_padding_type): Add one new parameter.
	(clear_padding_union): Likewise.
	(clear_padding_emit_loop): Likewise.
	(clear_type_padding_in_mask): Likewise.
	(gimple_fold_builtin_clear_padding): Handle this new parameter.
	* gimplify.c (gimple_add_init_for_auto_var): New function.
	(gimple_add_padding_init_for_auto_var): New function.
	(is_var_need_auto_init): New function.
	(gimplify_decl_expr): Add initialization to automatic variables per
	users' requests.
	(gimplify_call_expr): Add one new parameter for call to
	__builtin_clear_padding.
	(gimplify_init_constructor): Add padding initialization in the end.
	* internal-fn.c (INIT_PATTERN_VALUE): New macro.
	(expand_DEFERRED_INIT): New function.
	* internal-fn.def (DEFERRED_INIT): New internal function.
	* tree-cfg.c (verify_gimple_call): Verify calls to .DEFERRED_INIT.
	* tree-sra.c (generate_subtree_deferred_init): New function.
	(scan_function): Avoid setting cannot_scalarize_away_bitmap for
	calls to .DEFERRED_INIT.
	(sra_modify_deferred_init): New function.
	(sra_modify_function_body): Handle calls to DEFERRED_INIT specially.
	* tree-ssa-structalias.c (find_func_aliases_for_call): Likewise.
	* tree-ssa-uninit.c (warn_uninit): Handle calls to DEFERRED_INIT
	specially.
	(check_defs): Likewise.
	(warn_uninitialized_vars): Likewise.
	* tree-ssa.c (ssa_undefined_value_p): Likewise.
	* tree.c (build_common_builtin_nodes): Build tree node for
	BUILT_IN_CLEAR_PADDING when needed.

gcc/c-family/ChangeLog:

2021-09-09  qing zhao  <qing.zhao@oracle.com>

	* c-attribs.c (handle_uninitialized_attribute): New function.
	(c_common_attribute_table): Add "uninitialized" attribute.

gcc/testsuite/ChangeLog:

2021-09-09  qing zhao  <qing.zhao@oracle.com>

	* c-c++-common/auto-init-1.c: New test.
	* c-c++-common/auto-init-10.c: New test.
	* c-c++-common/auto-init-11.c: New test.
	* c-c++-common/auto-init-12.c: New test.
	* c-c++-common/auto-init-13.c: New test.
	* c-c++-common/auto-init-14.c: New test.
	* c-c++-common/auto-init-15.c: New test.
	* c-c++-common/auto-init-16.c: New test.
	* c-c++-common/auto-init-2.c: New test.
	* c-c++-common/auto-init-3.c: New test.
	* c-c++-common/auto-init-4.c: New test.
	* c-c++-common/auto-init-5.c: New test.
	* c-c++-common/auto-init-6.c: New test.
	* c-c++-common/auto-init-7.c: New test.
	* c-c++-common/auto-init-8.c: New test.
	* c-c++-common/auto-init-9.c: New test.
	* c-c++-common/auto-init-esra.c: New test.
	* c-c++-common/auto-init-padding-1.c: New test.
	* c-c++-common/auto-init-padding-2.c: New test.
	* c-c++-common/auto-init-padding-3.c: New test.
	* g++.dg/auto-init-uninit-pred-1_a.C: New test.
	* g++.dg/auto-init-uninit-pred-2_a.C: New test.
	* g++.dg/auto-init-uninit-pred-3_a.C: New test.
	* g++.dg/auto-init-uninit-pred-4.C: New test.
	* gcc.dg/auto-init-sra-1.c: New test.
	* gcc.dg/auto-init-sra-2.c: New test.
	* gcc.dg/auto-init-uninit-1.c: New test.
	* gcc.dg/auto-init-uninit-12.c: New test.
	* gcc.dg/auto-init-uninit-13.c: New test.
	* gcc.dg/auto-init-uninit-14.c: New test.
	* gcc.dg/auto-init-uninit-15.c: New test.
	* gcc.dg/auto-init-uninit-16.c: New test.
	* gcc.dg/auto-init-uninit-17.c: New test.
	* gcc.dg/auto-init-uninit-18.c: New test.
	* gcc.dg/auto-init-uninit-19.c: New test.
	* gcc.dg/auto-init-uninit-2.c: New test.
	* gcc.dg/auto-init-uninit-20.c: New test.
	* gcc.dg/auto-init-uninit-21.c: New test.
	* gcc.dg/auto-init-uninit-22.c: New test.
	* gcc.dg/auto-init-uninit-23.c: New test.
	* gcc.dg/auto-init-uninit-24.c: New test.
	* gcc.dg/auto-init-uninit-25.c: New test.
	* gcc.dg/auto-init-uninit-26.c: New test.
	* gcc.dg/auto-init-uninit-3.c: New test.
	* gcc.dg/auto-init-uninit-34.c: New test.
	* gcc.dg/auto-init-uninit-36.c: New test.
	* gcc.dg/auto-init-uninit-37.c: New test.
	* gcc.dg/auto-init-uninit-4.c: New test.
	* gcc.dg/auto-init-uninit-5.c: New test.
	* gcc.dg/auto-init-uninit-6.c: New test.
	* gcc.dg/auto-init-uninit-8.c: New test.
	* gcc.dg/auto-init-uninit-9.c: New test.
	* gcc.dg/auto-init-uninit-A.c: New test.
	* gcc.dg/auto-init-uninit-B.c: New test.
	* gcc.dg/auto-init-uninit-C.c: New test.
	* gcc.dg/auto-init-uninit-H.c: New test.
	* gcc.dg/auto-init-uninit-I.c: New test.
	* gcc.target/aarch64/auto-init-1.c: New test.
	* gcc.target/aarch64/auto-init-2.c: New test.
	* gcc.target/aarch64/auto-init-3.c: New test.
	* gcc.target/aarch64/auto-init-4.c: New test.
	* gcc.target/aarch64/auto-init-5.c: New test.
	* gcc.target/aarch64/auto-init-6.c: New test.
	* gcc.target/aarch64/auto-init-7.c: New test.
	* gcc.target/aarch64/auto-init-8.c: New test.
	* gcc.target/aarch64/auto-init-padding-1.c: New test.
	* gcc.target/aarch64/auto-init-padding-10.c: New test.
	* gcc.target/aarch64/auto-init-padding-11.c: New test.
	* gcc.target/aarch64/auto-init-padding-12.c: New test.
	* gcc.target/aarch64/auto-init-padding-2.c: New test.
	* gcc.target/aarch64/auto-init-padding-3.c: New test.
	* gcc.target/aarch64/auto-init-padding-4.c: New test.
	* gcc.target/aarch64/auto-init-padding-5.c: New test.
	* gcc.target/aarch64/auto-init-padding-6.c: New test.
	* gcc.target/aarch64/auto-init-padding-7.c: New test.
	* gcc.target/aarch64/auto-init-padding-8.c: New test.
	* gcc.target/aarch64/auto-init-padding-9.c: New test.
	* gcc.target/i386/auto-init-1.c: New test.
	* gcc.target/i386/auto-init-2.c: New test.
	* gcc.target/i386/auto-init-21.c: New test.
	* gcc.target/i386/auto-init-22.c: New test.
	* gcc.target/i386/auto-init-23.c: New test.
	* gcc.target/i386/auto-init-24.c: New test.
	* gcc.target/i386/auto-init-3.c: New test.
	* gcc.target/i386/auto-init-4.c: New test.
	* gcc.target/i386/auto-init-5.c: New test.
	* gcc.target/i386/auto-init-6.c: New test.
	* gcc.target/i386/auto-init-7.c: New test.
	* gcc.target/i386/auto-init-8.c: New test.
	* gcc.target/i386/auto-init-padding-1.c: New test.
	* gcc.target/i386/auto-init-padding-10.c: New test.
	* gcc.target/i386/auto-init-padding-11.c: New test.
	* gcc.target/i386/auto-init-padding-12.c: New test.
	* gcc.target/i386/auto-init-padding-2.c: New test.
	* gcc.target/i386/auto-init-padding-3.c: New test.
	* gcc.target/i386/auto-init-padding-4.c: New test.
	* gcc.target/i386/auto-init-padding-5.c: New test.
	* gcc.target/i386/auto-init-padding-6.c: New test.
	* gcc.target/i386/auto-init-padding-7.c: New test.
	* gcc.target/i386/auto-init-padding-8.c: New test.
	* gcc.target/i386/auto-init-padding-9.c: New test.
2021-09-09 15:44:49 -07:00
Harald Anlauf
5fe0865ab7 Fortran - out of bounds in array constructor with implied do loop
gcc/fortran/ChangeLog:

	PR fortran/98490
	* trans-expr.c (gfc_conv_substring): Do not generate substring
	bounds check for implied do loop index variable before it actually
	becomes defined.

gcc/testsuite/ChangeLog:

	PR fortran/98490
	* gfortran.dg/bounds_check_23.f90: New test.
2021-09-09 21:34:01 +02:00
H.J. Lu
de515ce0b2 x86-64: Update AVX512FP16 ABI tests for x32
On x32, long is the same as int and pointer is 32 bits.  Update AVX512FP16
ABI tests:

1. Replace long with long long for 64-bit integers.
2. Update type and alignment for long and pointer.
3. Skip tests for long on x32.

	* gcc.target/x86_64/abi/avx512fp16/args.h: Replace long with
	long long.
	(XMM_T): Rename _long to _longlong and _ulong to _ulonglong.
	(X87_T): Rename _ulong to _ulonglong.
	* gcc.target/x86_64/abi/avx512fp16/defines.h (TYPE_SIZE_LONG):
	Define to 4 if __ILP32__ is defined.
	(TYPE_SIZE_POINTER): Likewise.
	(TYPE_ALIGN_LONG): Likewise.
	(TYPE_ALIGN_POINTER): Likewise.
	* gcc.target/x86_64/abi/avx512fp16/test_3_element_struct_and_unions.c
	(main): Skip test for long if __ILP32__ is defined.
	* gcc.target/x86_64/abi/avx512fp16/test_m64m128_returning.c
	(do_test): Replace _long with _longlong.
	* gcc.target/x86_64/abi/avx512fp16/test_struct_returning.c
	(check_300): Replace _ulong with _ulonglong.
	* gcc.target/x86_64/abi/avx512fp16/m256h/args.h: Replace long
	with long long.
	(YMM_T): Rename _long to _longlong and _ulong to _ulonglong.
	(X87_T): Rename _ulong to _ulonglong.
	* gcc.target/x86_64/abi/avx512fp16/m512h/args.h: Replace long
	with long long.
	(ZMM_T): Rename _long to _longlong and _ulong to _ulonglong.
	(X87_T): Rename _ulong to _ulonglong.
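
The __ILP32__ handling described above might look roughly like this.  It
is a sketch, not the contents of defines.h: the source only states that
the macros become 4 when __ILP32__ is defined, so the value 8 in the #else
branch is an assumption for LP64 x86-64, and i64 and pointer_size_ok are
hypothetical names.

```c
#include <assert.h>

/* On x32, long and pointers are 32 bits wide.  */
#ifdef __ILP32__
# define TYPE_SIZE_LONG 4
# define TYPE_SIZE_POINTER 4
#else
# define TYPE_SIZE_LONG 8      /* assumed LP64 value */
# define TYPE_SIZE_POINTER 8   /* assumed LP64 value */
#endif

/* long long stays 64 bits on both x32 and LP64, which is why the tests
   switch to it for 64-bit integers.  */
typedef long long i64;

int pointer_size_ok(void)
{
  assert(sizeof(i64) == 8);
  return sizeof(void *) == TYPE_SIZE_POINTER;
}
```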
2021-09-09 08:42:35 -07:00
Richard Biener
013cfc6484 Improve LIM fill_always_executed_in computation
Currently the DOM walk over a loop body does not walk into not
always executed subloops to avoid scalability issues since doing
so makes the walk quadratic in the loop depth.  It turns out this
is not an issue in practice and even with a loop depth of 1800
this function is way off the radar.

So the following patch removes the limitation, replacing it with
a comment.

2021-09-09  Richard Biener  <rguenther@suse.de>

	* tree-ssa-loop-im.c (fill_always_executed_in_1): Walk
	into all subloops.

	* gcc.dg/tree-ssa/ssa-lim-17.c: New testcase.
2021-09-09 11:50:20 +02:00
Richard Biener
6e27bc2b88 Avoid full DOM walk in LIM fill_always_executed_in
This avoids a full DOM walk via get_loop_body_in_dom_order in the
loop body walk of fill_always_executed_in which is often terminating
the walk of a loop body early by integrating the DOM walk of
get_loop_body_in_dom_order with the actual processing done by
fill_always_executed_in.  This trades the fully populated loop
body array with a worklist allocation of the same size and thus
should be a strict improvement over the recursive approach of
get_loop_body_in_dom_order.

2021-09-09  Richard Biener  <rguenther@suse.de>

	* tree-ssa-loop-im.c (fill_always_executed_in_1): Integrate
	DOM walk from get_loop_body_in_dom_order using a worklist
	approach.
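
The worklist idea can be sketched on a toy dominator tree as below.  This
is not the code in tree-ssa-loop-im.c: struct block, stops_walk and
walk_dom are hypothetical, and the stop condition merely stands in for
whatever makes the real walk terminate early.  The point is that blocks
are processed as they are popped, so nothing past the stopping block is
ever collected.

```c
#include <assert.h>

#define MAX_BLOCKS 16
#define MAX_KIDS 4

struct block {
  int nkids;
  int kids[MAX_KIDS];   /* dominator-tree children */
  int stops_walk;       /* hypothetical early-termination marker */
};

/* Visits blocks in dominator order starting at HEADER, recording them
   in VISITED, and stops as soon as a block with stops_walk is seen.
   Returns the number of blocks processed.  */
int walk_dom(const struct block *blocks, int nblocks, int header,
             int *visited)
{
  int worklist[MAX_BLOCKS];   /* each block is pushed at most once */
  int top = 0, count = 0;

  assert(nblocks <= MAX_BLOCKS);
  worklist[top++] = header;
  while (top > 0)
    {
      int bb = worklist[--top];
      visited[count++] = bb;          /* "processing" happens here */
      if (blocks[bb].stops_walk)
        break;                        /* remaining blocks never walked */
      /* Push in reverse so the first child is popped first.  */
      for (int i = blocks[bb].nkids - 1; i >= 0; i--)
        worklist[top++] = blocks[bb].kids[i];
    }
  return count;
}
```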
2021-09-09 11:16:58 +02:00
liuhongt
f77f3adebd AVX512FP16: Add testcase for vaddph/vsubph/vmulph/vdivph.
gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fp16-helper.h: New header file for
	FP16 runtime test.
	* gcc.target/i386/avx512fp16-vaddph-1a.c: New test.
	* gcc.target/i386/avx512fp16-vaddph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vdivph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vdivph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vmulph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vmulph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vsubph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vsubph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vaddph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vaddph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vdivph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vdivph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vmulph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vmulph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vsubph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vsubph-1b.c: Ditto.
2021-09-09 16:09:05 +08:00
liuhongt
bd7a34ef55 AVX512FP16: Add vaddph/vsubph/vdivph/vmulph.
gcc/ChangeLog:

	* config.gcc: Add avx512fp16vlintrin.h.
	* config/i386/avx512fp16intrin.h (_mm512_add_ph): New intrinsic.
	(_mm512_mask_add_ph): Likewise.
	(_mm512_maskz_add_ph): Likewise.
	(_mm512_sub_ph): Likewise.
	(_mm512_mask_sub_ph): Likewise.
	(_mm512_maskz_sub_ph): Likewise.
	(_mm512_mul_ph): Likewise.
	(_mm512_mask_mul_ph): Likewise.
	(_mm512_maskz_mul_ph): Likewise.
	(_mm512_div_ph): Likewise.
	(_mm512_mask_div_ph): Likewise.
	(_mm512_maskz_div_ph): Likewise.
	(_mm512_add_round_ph): Likewise.
	(_mm512_mask_add_round_ph): Likewise.
	(_mm512_maskz_add_round_ph): Likewise.
	(_mm512_sub_round_ph): Likewise.
	(_mm512_mask_sub_round_ph): Likewise.
	(_mm512_maskz_sub_round_ph): Likewise.
	(_mm512_mul_round_ph): Likewise.
	(_mm512_mask_mul_round_ph): Likewise.
	(_mm512_maskz_mul_round_ph): Likewise.
	(_mm512_div_round_ph): Likewise.
	(_mm512_mask_div_round_ph): Likewise.
	(_mm512_maskz_div_round_ph): Likewise.
	* config/i386/avx512fp16vlintrin.h: New header.
	* config/i386/i386-builtin-types.def (V16HF, V8HF, V32HF):
	Add new builtin types.
	* config/i386/i386-builtin.def: Add corresponding builtins.
	* config/i386/i386-expand.c
	(ix86_expand_args_builtin): Handle new builtin types.
	(ix86_expand_round_builtin): Likewise.
	* config/i386/immintrin.h: Include avx512fp16vlintrin.h
	* config/i386/sse.md (VFH): New mode_iterator.
	(VF2H): Likewise.
	(avx512fmaskmode): Add HF vector modes.
	(avx512fmaskhalfmode): Likewise.
	(<plusminus_insn><mode>3<mask_name><round_name>): Adjust for
	HF vector modes.
	(*<plusminus_insn><mode>3<mask_name><round_name>): Likewise.
	(mul<mode>3<mask_name><round_name>): Likewise.
	(*mul<mode>3<mask_name><round_name>): Likewise.
	(div<mode>3): Likewise.
	(<sse>_div<mode>3<mask_name><round_name>): Likewise.
	* config/i386/subst.md (SUBST_V): Add HF vector modes.
	(SUBST_A): Likewise.
	(round_mode512bit_condition): Adjust for V32HFmode.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx-1.c: Add -mavx512vl and test for new intrinsics.
	* gcc.target/i386/avx-2.c: Add -mavx512vl.
	* gcc.target/i386/avx512fp16-11a.c: New test.
	* gcc.target/i386/avx512fp16-11b.c: Ditto.
	* gcc.target/i386/avx512vlfp16-11a.c: Ditto.
	* gcc.target/i386/avx512vlfp16-11b.c: Ditto.
	* gcc.target/i386/sse-13.c: Add test for new builtins.
	* gcc.target/i386/sse-23.c: Ditto.
	* gcc.target/i386/sse-14.c: Add test for new intrinsics.
	* gcc.target/i386/sse-22.c: Ditto.
2021-09-09 16:08:56 +08:00
liuhongt
8f323c712e Optimize v4sf reduction.
gcc/ChangeLog:

	PR target/101059
	* config/i386/sse.md (reduc_plus_scal_<mode>): Split to ..
	(reduc_plus_scal_v4sf): .. this, New define_expand.
	(reduc_plus_scal_v2df): .. and this, New define_expand.

gcc/testsuite/ChangeLog:

	PR target/101059
	* gcc.target/i386/sse2-pr101059.c: New test.
	* gcc.target/i386/sse3-pr101059.c: New test.
2021-09-09 09:34:15 +08:00
liuhongt
60eec23b5e Optimize vec_extract for 256/512-bit vector when index exceeds the lower 128 bits.
-	vextracti32x8	$0x1, %zmm0, %ymm0
-	vmovd	%xmm0, %eax
+	valignd	$8, %zmm0, %zmm0, %zmm1
+	vmovd	%xmm1, %eax

-	vextracti32x8	$0x1, %zmm0, %ymm0
-	vextracti128	$0x1, %ymm0, %xmm0
-	vpextrd	$3, %xmm0, %eax
+	valignd	$15, %zmm0, %zmm0, %zmm1
+	vmovd	%xmm1, %eax

-	vextractf64x2	$0x1, %ymm0, %xmm0
+	valignq	$2, %ymm0, %ymm0, %ymm0

-	vextractf64x4	$0x1, %zmm0, %ymm0
-	vextractf64x2	$0x1, %ymm0, %xmm0
-	vunpckhpd	%xmm0, %xmm0, %xmm0
+	valignq	$7, %zmm0, %zmm0, %zmm0

gcc/ChangeLog:

	PR target/91103
	* config/i386/sse.md (*vec_extract<mode><ssescalarmodelower>_valign):
	New define_insn.

gcc/testsuite/ChangeLog:

	PR target/91103
	* gcc.target/i386/pr91103-1.c: New test.
	* gcc.target/i386/pr91103-2.c: New test.
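
Source of the following shape is the kind of code the first two assembly
diffs above correspond to: element 8 of a 512-bit integer vector is the
first element of the upper 256 bits, and element 15 is the last one.  This
is a sketch using GCC's generic vector extension, not the contents of the
new tests; the function names are hypothetical, and which instructions are
emitted depends on the -m options used.

```c
#include <assert.h>

/* Sixteen 32-bit ints, i.e. one 512-bit vector.  */
typedef int v16si __attribute__((vector_size(64)));

/* Index 8 lies just above the lower 256 bits.  */
int extract_elem8(const v16si *x)
{
  assert(x != 0);
  return (*x)[8];
}

/* Index 15 is the highest element.  */
int extract_elem15(const v16si *x)
{
  assert(x != 0);
  return (*x)[15];
}
```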
2021-09-09 09:33:40 +08:00
GCC Administrator
b6db7cd41c Daily bump. 2021-09-09 00:16:32 +00:00
Jonathan Wakely
3c64582372 c++: Fix docs on assignment of virtual bases [PR60318]
The description of behaviour is incorrect, the virtual base gets
assigned before entering the bodies of A::operator= and B::operator=,
not after.

The example is also ill-formed (passing a string literal to char*) and
undefined (missing return from Base::operator=).

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>

gcc/ChangeLog:

	PR c++/60318
	* doc/trouble.texi (Copy Assignment): Fix description of
	behaviour and fix code in example.
2021-09-08 22:34:16 +01:00
David Malcolm
e66b9f6779 analyzer: fix ICE when discarding result of realloc [PR102225]
gcc/analyzer/ChangeLog:
	PR analyzer/102225
	* analyzer.h (compat_types_p): New decl.
	* constraint-manager.cc
	(constraint_manager::get_or_add_equiv_class): Guard against NULL
	type when checking for pointer types.
	* region-model-impl-calls.cc (region_model::impl_call_realloc):
	Guard against NULL lhs type/region.  Guard against the size value
	not being of a compatible type for dynamic extents.
	* region-model.cc (compat_types_p): Make non-static.

gcc/testsuite/ChangeLog:
	PR analyzer/102225
	* gcc.dg/analyzer/realloc-1.c (test_10): New.
	* gcc.dg/analyzer/torture/pr102225.c: New test.
2021-09-08 14:37:19 -04:00
Richard Biener
716a583692 c++/102228 - make lookup_anon_field O(1)
For the testcase in PR101555 lookup_anon_field takes the majority
of parsing time followed by get_class_binding_direct/fields_linear_search
which is PR83309.  The situation with anon aggregates is particularly
dire when we need to build accesses to their members and the anon
aggregates are nested.  There for each such access we recursively
build sub-accesses to the anon aggregate FIELD_DECLs bottom-up,
DFS searching for them.  That's inefficient since, I believe,
there's a 1:1 relationship between anon aggregate types and the
FIELD_DECL used to place them.

The patch below does away with the search in lookup_anon_field and
instead records the single FIELD_DECL in the anon aggregate type's
lang-specific data, re-using the RTTI typeinfo_var field.  That
speeds up the compile of the testcase with -fsyntax-only from
about 4.5s to slightly less than 1s.

I tried to poke holes into the 1:1 relationship idea with my C++
knowledge but failed (which might not say much).  It also leaves
a hole for the case when the C++ FE itself duplicates such a type
and places it at a semantically different position.  I've tried
to poke holes into it with the duplication mechanism I understand
(templates) but failed.

2021-09-08  Richard Biener  <rguenther@suse.de>

	PR c++/102228
gcc/cp/
	* cp-tree.h (ANON_AGGR_TYPE_FIELD): New define.
	* decl.c (fixup_anonymous_aggr): Wipe RTTI info put in
	place on invalid code.
	* decl2.c (reset_type_linkage): Guard CLASSTYPE_TYPEINFO_VAR
	access.
	* module.cc (trees_in::read_class_def): Likewise.  Reconstruct
	ANON_AGGR_TYPE_FIELD.
	* semantics.c (finish_member_declaration): Populate
	ANON_AGGR_TYPE_FIELD for anon aggregate typed members.
	* typeck.c (lookup_anon_field): Remove DFS search and return
	ANON_AGGR_TYPE_FIELD directly.
2021-09-08 17:43:40 +02:00
Joseph Myers
d27d694151 testsuite: Allow .sdata in more cases in gcc.dg/array-quals-1.c
When testing for Nios II (gcc-testresults shows this for MIPS as
well), failures of gcc.dg/array-quals-1.c appear where a symbol was
found in .sdata rather than one of the expected sections.

FAIL: gcc.dg/array-quals-1.c scan-assembler-symbol-section symbol ^_?a$ (found a) has section ^\\.(const|rodata|srodata)|\\[RO\\] (found .sdata)
FAIL: gcc.dg/array-quals-1.c scan-assembler-symbol-section symbol ^_?b$ (found b) has section ^\\.(const|rodata|srodata)|\\[RO\\] (found .sdata)
FAIL: gcc.dg/array-quals-1.c scan-assembler-symbol-section symbol ^_?c$ (found c) has section ^\\.(const|rodata|srodata)|\\[RO\\] (found .sdata)
FAIL: gcc.dg/array-quals-1.c scan-assembler-symbol-section symbol ^_?d$ (found d) has section ^\\.(const|rodata|srodata)|\\[RO\\] (found .sdata)

Jakub's commit 0b34dbc0a2 allowed .sdata
for many variables in that test where use of .sdata caused a failure
on powerpc-linux.  I'm presuming the choice of which variables had
.sdata allowed was based only on the code generated for powerpc-linux,
not on any reason it would be wrong to allow it for the other
variables; thus, this patch adjusts the test to allow .sdata for some
more variables where that is needed on Nios II (and in one case where
it's not needed on Nios II, but the test results on gcc-testresults
suggest that it is needed on MIPS).

Tested with no regressions with cross to nios2-elf.

	* gcc.dg/array-quals-1.c: Allow .sdata section in more cases.
2021-09-08 15:38:18 +00:00
Joseph Myers
d081516ae1 testsuite: Use explicit -ftree-cselim in tests using -fdump-tree-cselim-details
When testing for Nios II (gcc-testresults shows this for various other
targets as well), tests scanning cselim dumps produce an UNRESOLVED
result because those dumps do not exist.

cselim is enabled conditionally by code in toplev.c:

  if (flag_tree_cselim == AUTODETECT_VALUE)
    {
      if (HAVE_conditional_move)
	flag_tree_cselim = 1;
      else
	flag_tree_cselim = 0;
    }

Add explicit -ftree-cselim to dg-options in the affected tests (as
already used by some other tests of cselim dumps) so that this dump
exists on all architectures.

Tested with no regressions with cross to nios2-elf, where this causes
the tests in question to PASS instead of being UNRESOLVED.

	* gcc.dg/tree-ssa/pr89430-1.c, gcc.dg/tree-ssa/pr89430-2.c,
	gcc.dg/tree-ssa/pr89430-3.c, gcc.dg/tree-ssa/pr89430-4.c,
	gcc.dg/tree-ssa/pr89430-5.c, gcc.dg/tree-ssa/pr89430-6.c,
	gcc.dg/tree-ssa/pr89430-7-comp-ref.c,
	gcc.dg/tree-ssa/pr89430-8-mem-ref-size.c,
	gcc.dg/tree-ssa/pr99473-1.c: Use -ftree-cselim.
2021-09-08 14:57:20 +00:00
Segher Boessenkool
86e6268cff rs6000: Fix ELFv2 r12 use in epilogue
We cannot use r12 here, it is already in use as the GEP (for sibling
calls).

2021-09-08  Segher Boessenkool  <segher@kernel.crashing.org>
	PR target/102107
	* config/rs6000/rs6000-logue.c (rs6000_emit_epilogue): For ELFv2 use
	r11 instead of r12 for restoring CR.
2021-09-08 13:27:56 +00:00
Jakub Jelinek
7485a52551 i386: Fix up xorsign for AVX [PR89984]
Thinking about it more this morning, while this patch fixes the problems
revealed in the testcase, the recent PR89984 change was buggy too, but
perhaps that can be fixed incrementally.  Because for AVX the new code
destructively modifies op1.  If that is different from dest, say on:
float
foo (float x, float y)
{
  return x * __builtin_copysignf (1.0f, y) + y;
}
then we get after RA:
(insn 8 7 9 2 (set (reg:SF 20 xmm0 [orig:82 _2 ] [82])
        (unspec:SF [
                (reg:SF 20 xmm0 [88])
                (reg:SF 21 xmm1 [89])
                (mem/u/c:V4SF (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0  S16 A128])
            ] UNSPEC_XORSIGN)) "hohoho.c":4:12 649 {xorsignsf3_1}
     (nil))
(insn 9 8 15 2 (set (reg:SF 20 xmm0 [87])
        (plus:SF (reg:SF 20 xmm0 [orig:82 _2 ] [82])
            (reg:SF 21 xmm1 [89]))) "hohoho.c":4:44 1021 {*fop_sf_comm}
     (nil))
but split the xorsign into:
        vandps  .LC0(%rip), %xmm1, %xmm1
        vxorps  %xmm0, %xmm1, %xmm0
and then the addition:
        vaddss  %xmm1, %xmm0, %xmm0
which means we miscompile it - instead of adding y in the end we add
__builtin_copysignf (0.0f, y).
So, wonder if we don't want instead in addition to the &Yv <- Yv, 0
alternative (enabled for both pre-AVX and AVX as in this patch) the
&Yv <- Yv, Yv where destination must be different from inputs and another
Yv <- Yv, Yv where it can be the same but then need a match_scratch
(with X for the other alternatives and =Yv for the last one).
That way we'd always have a safe register we can store the op1 & mask
value into, either the destination (in the first alternative known to
be equal to op1 which is needed for non-AVX but ok for AVX too), in the
second alternative known to be different from both inputs and in the third
which could be used for those
float bar (float x, float y) { return x * __builtin_copysignf (1.0f, y); }
cases where op1 is naturally xmm1 and dest == op0 naturally xmm0 we'd use
some other register like xmm2.

On Wed, Sep 08, 2021 at 05:23:40PM +0800, Hongtao Liu wrote:
> I'm curious why we need the  post_reload splitter @xorsign<mode>3_1
> for scalar mode, can't we just expand them into and/xor operations in
> the expander, just like vector modes did.

Following seems to work for all the testcases I've tried (and in some
generates better code than the post-reload splitter).

2021-09-08  Jakub Jelinek  <jakub@redhat.com>
	    liuhongt  <hongtao.liu@intel.com>

	PR target/89984
	* config/i386/i386.md (@xorsign<mode>3_1): Remove.
	* config/i386/i386-expand.c (ix86_expand_xorsign): Expand right away
	into AND with mask and XOR, using paradoxical subregs.
	(ix86_split_xorsign): Remove.
	* config/i386/i386-protos.h (ix86_split_xorsign): Remove.

	* gcc.target/i386/avx-pr102224.c: Fix up PR number.
	* gcc.dg/pr89984.c: New test.
	* gcc.target/i386/avx-pr89984.c: New test.
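
The "AND with mask and XOR" expansion can be pictured in scalar C as
below.  It is only a sketch, assuming IEEE 754 binary32 floats;
xorsign_bits and foo_bits are hypothetical names, and the real expander
works on RTL with paradoxical subregs.  The masked sign goes into its own
temporary, so y is still intact for the later addition: for x = 2 and
y = -3 the result is -5, whereas adding the masked value -0.0 in place of
y would give -2.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* x * copysignf (1.0f, y) computed as x XOR (y AND sign mask).  */
float xorsign_bits(float x, float y)
{
  const uint32_t sign_mask = UINT32_C(0x80000000);
  uint32_t xb, yb, sign, rb;
  float out;

  assert(sizeof(float) == sizeof(uint32_t));
  memcpy(&xb, &x, sizeof xb);
  memcpy(&yb, &y, sizeof yb);
  sign = yb & sign_mask;   /* separate temporary: y itself is untouched */
  rb = xb ^ sign;
  memcpy(&out, &rb, sizeof out);
  return out;
}

/* The function foo from the message above, written with the helper.  */
float foo_bits(float x, float y)
{
  return xorsign_bits(x, y) + y;
}
```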
2021-09-08 14:06:10 +02:00
liuhongt
6576ad5add Compile __{mul,div}hc3 into libgcc_s.so.1.
libgcc/ChangeLog:

	* config/i386/t-softfp: Compile __{mul,div}hc3 into
	libgcc_s.so.1.
2021-09-08 19:18:15 +08:00