OpenE2K/gcc - gcc - Expired Mentality Git

Author	SHA1	Message	Date
Richard Biener	47ee6e6fb9	Use the proper vectype The following uses the SLP node vectype rather than the vectype stored in the DR group. 2021-09-17 Richard Biener <rguenther@suse.de> * tree-vect-stmts.c (vectorizable_load): Use the vectype from the SLP node.	2021-09-20 12:30:49 +02:00
Tobias Burnus	0de4184bac	Fortran/OpenMP: unconstrained/reproducible ordered modifier gcc/fortran/ChangeLog: * gfortran.h (gfc_omp_clauses): Add order_unconstrained. * dump-parse-tree.c (show_omp_clauses): Dump it. * openmp.c (gfc_match_omp_clauses): Match unconstrained/reproducible modifiers to ordered(concurrent). (OMP_DISTRIBUTE_CLAUSES): Accept ordered clause. (resolve_omp_clauses): Reject ordered + order on same directive. * trans-openmp.c (gfc_trans_omp_clauses, gfc_split_omp_clauses): Pass on unconstrained modifier of ordered(concurrent). gcc/testsuite/ChangeLog: * gfortran.dg/gomp/order-5.f90: New test. * gfortran.dg/gomp/order-6.f90: New test. * gfortran.dg/gomp/order-7.f90: New test. * gfortran.dg/gomp/order-8.f90: New test. * gfortran.dg/gomp/order-9.f90: New test.	2021-09-20 12:13:31 +02:00
Richard Biener	24f99147b9	Avoid premature alignment setting in vect_duplicate_ssa_name_ptr_info This removes adjusting alignment based on the vectorized accesses and instead keeps what was set on the original access. The code generating the actual accesses makes sure to properly align the vectorized accesses based on the generated pointer already and the vectorizer's alignment is always based on the desired alignment of a vector type and thus will reset alignment to unknown this way for example when doing strided accesses. 2021-09-20 Richard Biener <rguenther@suse.de> * tree-vect-data-refs.c (vect_duplicate_ssa_name_ptr_info): Do not compute alignment of the vectorized access here.	2021-09-20 11:25:18 +02:00
Richard Biener	f55c8db019	vect alignment enhance TLC This properly marks the loop as for a runtime alias peel rather than (pointlessly) going through DR_MISALIGNMENT. 2021-09-20 Richard Biener <rguenther@suse.de> * tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Store -1 for runtime alias peeling iterations.	2021-09-20 11:25:18 +02:00
Richard Biener	10555529c6	Obsolete hppa[12]--hpux10* and hppa[12]--hpux11* This obsoletes the 32bit hppa-hpux configurations which only support STABS as debuginfo format. 2021-09-20 Richard Biener <rguenther@suse.de> gcc/ * config.gcc: Obsolete hppa[12]--hpux10* and hppa[12]--hpux11. contrib/ config-list.mk: --enable-obsolete for hppa2.0-hpux10.1 and hppa2.0-hpux11.9.	2021-09-20 11:25:18 +02:00
Christophe Lyon	9081759b7e	testsuite: Remove .exe suffix in prune_gcc_output When running the testsuite under Windows, we noticed failures in testcase which attempt to match compiler error messages containing the name of the executable. For instance, gcc.dg/analyzer/signal-4a.c tries to match 'cc1:' which obviously fails when the executable is called cc1.exe. This patch removes the .exe suffix from various toolchain executables to avoid this problem. 2021-09-08 Christophe Lyon <christophe.lyon@foss.st.com> Torbjörn SVENSSON <torbjorn.svensson@st.com> gcc/testsuite/ * lib/prune.exp (prune_gcc_output): Remove .exe suffix from toolchain executables names.	2021-09-20 10:26:30 +02:00
Thomas Schwinge	7d79c3ebc3	Don't record string concatenation data for 'RESERVED_LOCATION_P' 'RESERVED_LOCATION_P' means 'UNKNOWN_LOCATION' or 'BUILTINS_LOCATION'. We're using 'UNKNOWN_LOCATION' as a spare value for 'Empty', so should ascertain that we don't use it as a key additionally. Similarly for 'BUILTINS_LOCATION' that we'd later like to use as a spare value for 'Deleted'. As discussed in the source code comment added, for these we didn't have stable behavior anyway. Follow-up to r239175 (commit `88fa5555a3`) "On-demand locations within string-literals". gcc/ * input.c (string_concat_db::record_string_concatenation) (string_concat_db::get_string_concatenation): Skip for 'RESERVED_LOCATION_P'. gcc/testsuite/ * gcc.dg/plugin/diagnostic-test-string-literals-1.c: Adjust expected error diagnostics.	2021-09-20 10:12:47 +02:00
Richard Biener	f92901a508	tree-optimization/65206 - dependence analysis on mixed pointer/array This adds the capability to analyze the dependence of mixed pointer/array accesses. The example is from where using a masked load/store creates the pointer-based access when an otherwise unconditional access is array based. Other examples would include accesses to an array mixed with accesses from inlined helpers that work on pointers. The idea is quite simple and old - analyze the data-ref indices as if the reference was pointer-based. The following change does this by changing dr_analyze_indices to work on the indices sub-structure and storing an alternate indices substructure in each data reference. That alternate set of indices is analyzed lazily by initialize_data_dependence_relation when it fails to match-up the main set of indices of two data references. initialize_data_dependence_relation is refactored into a head and a tail worker and changed to work on one of the indices structures and thus away from using DR_* access macros which continue to reference the main indices substructure. There are quite some vectorization and loop distribution opportunities unleashed in SPEC CPU 2017, notably 520.omnetpp_r, 548.exchange2_r, 510.parest_r, 511.povray_r, 521.wrf_r, 526.blender_r, 527.cam4_r and 544.nab_r see amendments in what they report with -fopt-info-loop while the rest of the specrate set sees no changes there. Measuring runtime for the set where changes were reported reveals nothing off-noise besides 511.povray_r which seems to regress slightly for me (on a Zen2 machine with -Ofast -march=native). 2021-09-08 Richard Biener <rguenther@suse.de> PR tree-optimization/65206 * tree-data-ref.h (struct data_reference): Add alt_indices, order it last. * tree-data-ref.c (free_data_ref): Release alt_indices. (dr_analyze_indices): Work on struct indices and get DR_REF as tree. (create_data_ref): Adjust. (initialize_data_dependence_relation): Split into head and tail. 
When the base objects fail to match up try again with pointer-based analysis of indices. * tree-vectorizer.c (vec_info_shared::check_datarefs): Do not compare the lazily computed alternate set of indices. * gcc.dg/torture/20210916.c: New testcase. * gcc.dg/vect/pr65206.c: Likewise.	2021-09-20 08:51:07 +02:00
Iain Sandoe	abdf63d782	Driver: Fix bootstrap with DEFAULT_{ASSEMBLER,LINKER,DSYMUTIL}. The patch at r12-3662-g5fee8a0a9223d factored the code for printing the names of programs into a separate function. However the moved editions that print out the names of the assembler, linker (and dsymutil on Darwin) when those are specified at configure-time were not adjusted accordingly, leading to a bootstrap fail. Fixed by testing specifically for execute OK, since we know these are programs. Signed-off-by: Iain Sandoe <iain@sandoe.co.uk> gcc/ChangeLog: * gcc.c: Test for execute OK when we find the programs for assembler linker and dsymutil and those were specified at configure-time.	2021-09-20 07:46:19 +01:00
GCC Administrator	34fac9ef72	Daily bump.	2021-09-20 00:16:21 +00:00
Martin Sebor	825293da70	Correct a function pre/postcondition [PR102403]. Resolves: PR middle-end/102403 - ICE in init_from_control_deps, at gimple-predicate-analysis.cc:2364 gcc/ChangeLog: PR middle-end/102403 * gimple-predicate-analysis.cc (predicate::init_from_control_deps): Correct a function pre/postcondition. gcc/testsuite/ChangeLog: PR middle-end/102403 * gcc.dg/uninit-pr102403.c: New test. * gcc.dg/uninit-pr102403-c2.c: New test.	2021-09-19 17:23:19 -06:00
Martin Sebor	c3895ef466	Handle null cfun [PR102243]. Resolves: PR middle-end/102243 - ICE on placement new at global scope gcc/ChangeLog: PR middle-end/102243 * tree-ssa-strlen.c (get_range): Handle null cfun. gcc/testsuite/ChangeLog: PR middle-end/102243 * g++.dg/warn/Wplacement-new-size-10.C: New test.	2021-09-19 17:16:26 -06:00
Iain Sandoe	32731fa5b0	libgcc, Darwin: Remove unused symlinks. These were used on older systems to equate the FAT libgcc_s library to single-slice equivalents. Unused for any current system and never emitted by GCC. Signed-off-by: Iain Sandoe <iain@sandoe.co.uk> libgcc/ChangeLog: * config/t-slibgcc-darwin: Delete unused code.	2021-09-19 19:47:19 +01:00
Iain Sandoe	ea4e901fa3	libgcc, X86, Darwin: Handle symbols for HF cases. This reorganises the Darwin symbol vers files to include the generic ones at the top level; allowing for arch ports to override (via either exclusion or inclusion as needed). We add an X86-specific vers file containing the new HF symbols. Note that although Darwin does not use ELF-style symbol versioning - the parser that produces the map can consume it. Using the ELF-style description will help us know at which rev the symbols were introduced. Signed-off-by: Iain Sandoe <iain@sandoe.co.uk> libgcc/ChangeLog: * config/i386/t-darwin: Add in a vers file for X86-specific symbols. * config/t-darwin: Add the generic symbol maps here... * config/t-slibgcc-darwin: ... removing from here. * config/i386/libgcc-darwin.ver: New file.	2021-09-19 19:41:31 +01:00
Iain Sandoe	1297a40fb3	libgcc, X86: Exclude rules for libgcc2 __{div,mul}hc3. We want to override the libgcc2 generic version of these functions for X86. First exclude the original and then add in the replacements. Signed-off-by: Iain Sandoe <iain@sandoe.co.uk> libgcc/ChangeLog: * config/i386/t-softfp: Exclude libgcc2 versions of __divhc3 and __mulhc3.	2021-09-19 19:38:04 +01:00
Iain Sandoe	8738543878	Darwin, crts: Build Darwin10 unwinder shim as a library. We have a small unwinder shim that is only used for Darwin10 (and only then in quite specific cases). To avoid linking this code for every executable or DSO, we can present the crt as a convenience library (rather than a .o file). Signed-off-by: Iain Sandoe <iain@sandoe.co.uk> gcc/ChangeLog: * config/darwin.h (LINK_COMMAND_SPEC_A): Use Darwin10 unwinder shim as a convenience library. libgcc/ChangeLog: * config.host: Use convenience library for Darwin10 unwinder shim. * config/t-darwin: Build Darwin10 unwinder shim as a convenience library.	2021-09-19 19:35:00 +01:00
Jeff Law	f75b237254	[committed] Make test names unique for a couple of goacc tests gcc/testsuite * gfortran.dg/goacc/privatization-1-compute.f90: Make test names unique. * gfortran.dg/goacc/routine-external-level-of-parallelism-2.f: Likewise.	2021-09-19 13:31:32 -04:00
Andrew Pinski	7e4ada576f	Update the section on binutils version LTO usage requires binutils 2.35 or newer due to https://sourceware.org/PR25355. This adds a note in the prerequisites page about it. Ok? gcc/ChangeLog: * doc/install.texi: Add note that binutils 2.35 is required for LTO usage.	2021-09-19 17:29:37 +00:00
Andrew Pinski	68aace4458	Fix PR bootstrap/102389: --with-build-config=bootstrap-lto is broken The problem here is that the lto-plugin now requires an NM that works with LTO, so we need to pass down NM just like we do for ranlib and ar. OK? Bootstrapped and tested with --with-build-config=bootstrap-lto on aarch64-linux-gnu. Note you need to use binutils 2.35 or later too due to https://sourceware.org/PR25355 (I will submit another patch to improve the installation instructions too). config/ChangeLog: PR bootstrap/102389 * bootstrap-lto-lean.mk: Handle NM like RANLIB and AR. * bootstrap-lto.mk: Likewise.	2021-09-19 17:29:36 +00:00
Aldy Hernandez	08900f2889	Minor cleanups to forward threader. Every time we allocate a threading edge we push it onto the path in a distinct step. There's no need to do this in two steps, and avoiding this keeps us from exposing the internals of the registry. I also did some tiny cleanups in thread_across_edge, most importantly removing the bitmap in favor of an auto_bitmap. There are no functional changes. gcc/ChangeLog: * tree-ssa-threadbackward.c (back_threader_registry::register_path): Use push_edge. * tree-ssa-threadedge.c (jump_threader::thread_around_empty_blocks): Same. (jump_threader::thread_through_normal_block): Same. (jump_threader::thread_across_edge): Same. Also, use auto_bitmap. Tidy up code. * tree-ssa-threadupdate.c (jt_path_registry::allocate_thread_edge): Remove. (jt_path_registry::push_edge): New. (dump_jump_thread_path): Make static. * tree-ssa-threadupdate.h (allocate_thread_edge): Remove. (push_edge): New.	2021-09-19 18:54:43 +02:00
Iain Sandoe	124c354ad7	Jit, testsuite: Amend expect processing to tolerate more platforms. The current 'fixed_host_execute' implementation fails on Darwin platforms for a number of reasons: 1/ If the sub-process spawn fails (e.g. because of missing or malformed params); rather than reporting the fail output into the match stream, as indicated by the expect manual, it terminates the script. - We fix this by (a) checking that the executable is valid as well as existing (b) we put the spawn into a catch block and report a failure. 2/ There is no recovery path at all for a buffer-full case (and we do see buffer-full events with the default sizes). - Added by the patch here, however it is not as sophisticated as the methods used by dejagnu internally. Here we set the process to be "nowait" and then close the connection - with the intent that this will terminate the spawned process. 3/ The expect logic assumes that 'Totals:' is a valid indicator for the end of the spawned process output. This is not true even for the default dejagnu header (there are a number of additional reporting lines after). In addition to this, there are some tests that intentionally produce more output after the totals report (and there are tests that do not use that mechanism at all). The effect is that we might arrive at the "wait" for the spawned process to finish - but that process might not have completed all its output. For Darwin, at least that causes a deadlock between expect and the spawnee - the latter is doing a non-cancellable write and the former is waiting for the latter to terminate. For some reason this does not seem to affect Linux; perhaps the pty implementation allows the write(s) to proceed even though there is no reader. - This is fixed by modifying the loop termination condition to be either EOF (which will be the 'correct' condition) or a timeout which would represent an error either in the runtime or in the parsing of the output. 
As added precautions, we only try to wait if there is a correctly-spawned process, and we are also specific about which process we are waiting for. 4/ Darwin appears to have a bug in either the tcl or termios 'cooking' code that occasionally inserts an additional CR char into the stream - thus '\n' => '\r\r\n' instead of '\r\n'. The original program output is correct (it only contains a single \n) - the additional character is being inserted somewhere in the translations applied before the output reaches expect. The logic of this expect implementation does not tolerate single \r or \n characters (it will fail with a timeout or buffer-full if that occurs). - This is fixed by having a line-end match that is adjusted for Darwin. 5/ The default buffer size does seem to be too small in some cases, noting that GCC uses 10000 as the match buffer size and the default is 2000. - Fixed by increasing the size to 8192. 6/ There is a somewhat arbitrary dumping of output where we match ^$prefix\tSOMETHING... and then process the something. This essentially allows the match to start at any place in the buffer following any collection of non-line-end chars. - Fixed by amending the match for 'general' lines to accommodate these cases, and reporting such lines to the log. At least this should allow debugging of any cases where output that should be recognized is being dropped. Signed-off-by: Iain Sandoe <iain@sandoe.co.uk> gcc/testsuite/ChangeLog: * jit.dg/jit.exp (fixed_local_execute): Amend the match and exit conditions to cater for more platforms.	2021-09-19 17:02:43 +01:00
Aldy Hernandez	8d42a27d89	Make dump_ranger routines externally visible. There was an inline extern declaration for dump_ranger that was a bit of a hack. I've removed it in favor of an actual prototype. There are also some trivial changes to the dumping code in the path solver. gcc/ChangeLog: * gimple-range-path.cc (path_range_query::path_range_query): Add header. (path_range_query::dump): Remove extern declaration of dump_ranger. * gimple-range-trace.cc (dump_ranger): Add DEBUG_FUNCTION marker. * gimple-range-trace.h (dump_ranger): Add prototype.	2021-09-19 17:40:34 +02:00
John Ericson	5fee8a0a92	[PATCH] Factor out `find_a_program` helper around `find_a_file` gcc/ * gcc.c (find_a_program): New function, factored out of... (find_a_file): Here. (execute): Use find_a_program when looking for programs rather than find_a_file.	2021-09-19 11:08:32 -04:00
Matwey V. Kornilov	16f9776669	[PATCH] avr: Add atmega324pb MCU gcc/ * config/avr/avr-mcus.def: Add atmega324pb. * doc/avr-mmcu.texi: Corresponding changes.	2021-09-19 11:05:00 -04:00
Roger Sayle	e9e46864cd	PR middle-end/88173: More constant folding of NaN comparisons. This patch tackles PR middle-end/88173 where the order of operands in a comparison affects constant folding. As diagnosed by Jason Merrill, "match.pd handles these comparisons very differently". The history is that the middle end typically canonicalizes comparisons to place constants on the right, but when a comparison contains two constants we need to check/transform both constants, i.e. on both the left and the right. Hence the added lines below duplicate for @0 the same transform applied a few lines above for @1. Whilst preparing the testcase, I noticed that this transformation is incorrectly disabled with -fsignaling-nans even when both operands are known not to be signaling NaNs, so I've corrected that and added a second test case. Unfortunately, c-c++-common/pr57371-4.c then starts failing, as it doesn't distinguish QNaNs (which are quiet) from SNaNs (which signal), so this patch includes a minor tweak to the expected behaviour for QNaNs in that existing test. 2021-09-19 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog PR middle-end/88173 * match.pd (cmp @0 REAL_CST@1): When @0 is also REAL_CST, apply the same transformations as to @1. For comparisons against NaN, don't check HONOR_SNANS but confirm that neither operand is a signaling NaN. gcc/testsuite/ChangeLog PR middle-end/88173 * c-c++-common/pr57371-4.c: Tweak/correct test case for QNaNs. * g++.dg/pr88173-1.C: New test case. * g++.dg/pr88173-2.C: New test case.	2021-09-19 09:07:01 +01:00
Benjamin Peterson	69337e7495	[PATCH] Remove unused function make_unique_name. gcc/ * attribs.c (make_unique_name): Delete. * attribs.h (make_unique_name): Delete.	2021-09-19 00:08:21 -04:00
Andrew Pinski	767c098247	Fix middle-end/102395: reg_class having only NO_REGS and ALL_REGS. The simple fix is to just add to the assert that sclass and dclass are both greater than or equal to NO_REGS. NO_REGS is documented as the first register class so it should have the value of 0. gcc/ChangeLog: * lra-constraints.c (check_and_process_move): Assert that dclass and sclass are greater than or equal to NO_REGS.	2021-09-19 03:49:13 +00:00
GCC Administrator	cf74e7b57b	Daily bump.	2021-09-19 00:16:29 +00:00
Jakub Jelinek	e9d8fcabd0	openmp: Handle unconstrained and reproducible modifiers on order(concurrent) This patch adds handling for unconstrained and reproducible modifiers on order(concurrent) clause. For all static schedules (including auto and no schedule or dist_schedule clauses) I believe what we implement is reproducible, so the patch doesn't do much beyond recognizing those. Note, there is an OpenMP/spec issue that needs resolution on what should happen with the dynamic schedules (whether it should be an error to mix such clauses, or silently make it non-reproducible, and in which exact cases), so it might need some follow-up. Besides that, this patch allows order(concurrent) clause on the distribute construct which is something also added in OpenMP 5.1, and finally check the newly added restriction that at most one order clause can appear on a construct. The allowing of order clause on distribute has a side-effect that order(concurrent) copyin(thrpriv) is no longer allowed on combined/composite constructs with distribute parallel for{, simd} in it, previously the order applied only to for/simd and so a threadprivate var could be seen in the construct, but now it also applies to distribute and so on the parallel we shouldn't refer to a threadprivate var. 2021-09-18 Jakub Jelinek <jakub@redhat.com> gcc/ * tree.h (OMP_CLAUSE_ORDER_UNCONSTRAINED): Define. * tree-pretty-print.c (dump_omp_clause): Print unconstrained: for OMP_CLAUSE_ORDER_UNCONSTRAINED. gcc/c-family/ * c-omp.c (c_omp_split_clauses): Split order clause also to distribute construct. Copy over OMP_CLAUSE_ORDER_UNCONSTRAINED. gcc/c/ * c-parser.c (c_parser_omp_clause_order): Parse unconstrained and reproducible modifiers. (OMP_DISTRIBUTE_CLAUSE_MASK): Add order clause. gcc/cp/ * parser.c (cp_parser_omp_clause_order): Parse unconstrained and reproducible modifiers. (OMP_DISTRIBUTE_CLAUSE_MASK): Add order clause. 
gcc/testsuite/ * c-c++-common/gomp/order-1.c (f2): Add tests for distribute with order clause. (f3): Remove. * c-c++-common/gomp/order-2.c: Don't expect error for distribute with order clause. * c-c++-common/gomp/order-5.c: New test. * c-c++-common/gomp/order-6.c: New test. * c-c++-common/gomp/clause-dups-1.c (f1): Add tests for duplicated order clause. (f9): New function. * c-c++-common/gomp/clauses-1.c (baz, bar): Don't mix copyin and order(concurrent) clauses on the same composite construct combined with distribute, instead split it into two tests, one without copyin and one without order(concurrent). Add order(concurrent) clauses to {,{,target} teams} distribute. * g++.dg/gomp/attrs-1.C (baz, bar): Likewise. * g++.dg/gomp/attrs-2.C (baz, bar): Likewise.	2021-09-18 09:58:13 +02:00
liuhongt	e666a0a22a	Fix ICE in pass_rpad. Besides conversion instructions, pass_rpad also handles scalar sqrt/rsqrt/rcp/round instructions, while r12-3614 should only want to handle conversion instructions, so fix it. gcc/ChangeLog: * config/i386/i386-features.c (remove_partial_avx_dependency): Restrict TARGET_USE_VECTOR_FP_CONVERTS and TARGET_USE_VECTOR_CONVERTS to conversion instructions only.	2021-09-18 15:52:20 +08:00
Jakub Jelinek	e5597f2ad5	openmp: Allow private or firstprivate arguments to default clause even for C/C++ OpenMP 5.1 allows default(private) or default(firstprivate) even in C/C++, but it behaves the same way as in Fortran only for variables not declared at namespace or file scope. For the namespace/file scope variables it instead behaves as default(none). 2021-09-18 Jakub Jelinek <jakub@redhat.com> gcc/ * gimplify.c (omp_default_clause): For C/C++ default({,first}private), if file/namespace scope variable doesn't have predetermined sharing, treat it as if there was default(none). gcc/c/ * c-parser.c (c_parser_omp_clause_default): Handle private and firstprivate arguments, adjust diagnostics on unknown argument. gcc/cp/ * parser.c (cp_parser_omp_clause_default): Handle private and firstprivate arguments, adjust diagnostics on unknown argument. * cp-gimplify.c (cxx_omp_finish_clause): Handle OMP_CLAUSE_PRIVATE. gcc/testsuite/ * c-c++-common/gomp/default-2.c: New test. * c-c++-common/gomp/default-3.c: New test. * g++.dg/gomp/default-1.C: New test. libgomp/ * testsuite/libgomp.c++/default-1.C: New test. * testsuite/libgomp.c-c++-common/default-1.c: New test. * libgomp.texi (OpenMP 5.1): Mark "private and firstprivate argument to default clause in C and C++" as implemented.	2021-09-18 09:47:25 +02:00
liuhongt	d07c750cc6	AVX512FP16: Add testcase for scalar FMA instructions. gcc/testsuite/ChangeLog: * gcc.target/i386/avx512fp16-vfmaddXXXsh-1a.c: New test. * gcc.target/i386/avx512fp16-vfmaddXXXsh-1b.c: Ditto. * gcc.target/i386/avx512fp16-vfmsubXXXsh-1a.c: Ditto. * gcc.target/i386/avx512fp16-vfmsubXXXsh-1b.c: Ditto. * gcc.target/i386/avx512fp16-vfnmaddXXXsh-1a.c: Ditto. * gcc.target/i386/avx512fp16-vfnmaddXXXsh-1b.c: Ditto. * gcc.target/i386/avx512fp16-vfnmsubXXXsh-1a.c: Ditto. * gcc.target/i386/avx512fp16-vfnmsubXXXsh-1b.c: Ditto.	2021-09-18 15:00:12 +08:00
liuhongt	3c9de0a93e	AVX512FP16: Add scalar fma instructions. Add vfmadd[132,213,231]sh/vfnmadd[132,213,231]sh/ vfmsub[132,213,231]sh/vfnmsub[132,213,231]sh. gcc/ChangeLog: * config/i386/avx512fp16intrin.h (_mm_fmadd_sh): New intrinsic. (_mm_mask_fmadd_sh): Likewise. (_mm_mask3_fmadd_sh): Likewise. (_mm_maskz_fmadd_sh): Likewise. (_mm_fmadd_round_sh): Likewise. (_mm_mask_fmadd_round_sh): Likewise. (_mm_mask3_fmadd_round_sh): Likewise. (_mm_maskz_fmadd_round_sh): Likewise. (_mm_fnmadd_sh): Likewise. (_mm_mask_fnmadd_sh): Likewise. (_mm_mask3_fnmadd_sh): Likewise. (_mm_maskz_fnmadd_sh): Likewise. (_mm_fnmadd_round_sh): Likewise. (_mm_mask_fnmadd_round_sh): Likewise. (_mm_mask3_fnmadd_round_sh): Likewise. (_mm_maskz_fnmadd_round_sh): Likewise. (_mm_fmsub_sh): Likewise. (_mm_mask_fmsub_sh): Likewise. (_mm_mask3_fmsub_sh): Likewise. (_mm_maskz_fmsub_sh): Likewise. (_mm_fmsub_round_sh): Likewise. (_mm_mask_fmsub_round_sh): Likewise. (_mm_mask3_fmsub_round_sh): Likewise. (_mm_maskz_fmsub_round_sh): Likewise. (_mm_fnmsub_sh): Likewise. (_mm_mask_fnmsub_sh): Likewise. (_mm_mask3_fnmsub_sh): Likewise. (_mm_maskz_fnmsub_sh): Likewise. (_mm_fnmsub_round_sh): Likewise. (_mm_mask_fnmsub_round_sh): Likewise. (_mm_mask3_fnmsub_round_sh): Likewise. (_mm_maskz_fnmsub_round_sh): Likewise. * config/i386/i386-builtin-types.def (V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT): New builtin type. * config/i386/i386-builtin.def: Add new builtins. * config/i386/i386-expand.c: Handle new builtin type. * config/i386/sse.md (fmai_vmfmadd_<mode><round_name>): Adjust to support FP16. (fmai_vmfmsub_<mode><round_name>): Ditto. (fmai_vmfnmadd_<mode><round_name>): Ditto. (fmai_vmfnmsub_<mode><round_name>): Ditto. (fmai_fmadd_<mode>): Ditto. (fmai_fmsub_<mode>): Ditto. (fmai_fnmadd_<mode><round_name>): Ditto. (fmai_fnmsub_<mode><round_name>): Ditto. (avx512f_vmfmadd_<mode>_mask<round_name>): Ditto. (avx512f_vmfmadd_<mode>_mask3<round_name>): Ditto. (avx512f_vmfmadd_<mode>_maskz<round_expand_name>): Ditto. 
(avx512f_vmfmadd_<mode>_maskz_1<round_name>): Ditto. (avx512f_vmfmsub_<mode>_mask<round_name>): Ditto. (avx512f_vmfmsub_<mode>_mask3<round_name>): Ditto. (avx512f_vmfmsub_<mode>_maskz_1<round_name>): Ditto. (avx512f_vmfnmsub_<mode>_mask<round_name>): Ditto. (avx512f_vmfnmsub_<mode>_mask3<round_name>): Ditto. (avx512f_vmfnmsub_<mode>_mask<round_name>): Ditto. (avx512f_vmfnmadd_<mode>_mask<round_name>): Renamed to ... (avx512f_vmfnmadd_<mode>_mask<round_name>) ... this, and adjust to support FP16. (avx512f_vmfnmadd_<mode>_mask3<round_name>): Ditto. (avx512f_vmfnmadd_<mode>_maskz_1<round_name>): Ditto. (avx512f_vmfnmadd_<mode>_maskz<round_expand_name>): New expander. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c: Add test for new builtins. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-23.c: Ditto. * gcc.target/i386/sse-14.c: Add test for new intrinsics. * gcc.target/i386/sse-22.c: Ditto.	2021-09-18 15:00:12 +08:00
H.J. Lu	376d69f3f7	AVX512FP16: Enable FP16 mask load/store. gcc/ChangeLog: * config/i386/sse.md (avx512fmaskmodelower): Extend to support HF modes. (maskload<mode><avx512fmaskmodelower>): Ditto. (maskstore<mode><avx512fmaskmodelower>): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/avx512fp16-xorsign-1.c: New test.	2021-09-18 15:00:12 +08:00
liuhongt	ef6ab4abc4	AVX512FP16: Add testcase for fp16 bitwise operations. gcc/testsuite/ChangeLog: * gcc.target/i386/avx512fp16-neg-1a.c: New test. * gcc.target/i386/avx512fp16-neg-1b.c: Ditto. * gcc.target/i386/avx512fp16-scalar-bitwise-1a.c: Ditto. * gcc.target/i386/avx512fp16-scalar-bitwise-1b.c: Ditto. * gcc.target/i386/avx512fp16-vector-bitwise-1a.c: Ditto. * gcc.target/i386/avx512fp16-vector-bitwise-1b.c: Ditto. * gcc.target/i386/avx512fp16vl-neg-1a.c: Ditto. * gcc.target/i386/avx512fp16vl-neg-1b.c: Ditto.	2021-09-18 15:00:12 +08:00
H.J. Lu	75a97b59e1	AVX512FP16: Add scalar/vector bitwise operations, including 1. FP16 vector xor/ior/and/andnot/abs/neg 2. FP16 scalar abs/neg/copysign/xorsign gcc/ChangeLog: * config/i386/i386-expand.c (ix86_expand_fp_absneg_operator): Handle HFmode. (ix86_expand_copysign): Ditto. (ix86_expand_xorsign): Ditto. * config/i386/i386.c (ix86_build_const_vector): Handle HF vector modes. (ix86_build_signbit_mask): Ditto. (ix86_can_change_mode_class): Ditto. * config/i386/i386.md (SSEMODEF): Add HFmode. (ssevecmodef): Ditto. (<code>hf2): New define_expand. (<code>hf2_1): New define_insn_and_split. (copysign<mode>): Extend to support HFmode under AVX512FP16. (xorsign<mode>): Ditto. config/i386/sse.md (VFB): New mode iterator. (VFB_128_256): Ditto. (VFB_512): Ditto. (sseintvecmode2): Support HF vector mode. (<code><mode>2): Use new mode iterator. (<code><mode>2): Ditto. (copysign<mode>3): Ditto. (xorsign<mode>3): Ditto. (<code><mode>3<mask_name>): Ditto. (<code><mode>3<mask_name>): Ditto. (<sse>_andnot<mode>3<mask_name>): Adjust for HF vector mode. (<sse>_andnot<mode>3<mask_name>): Ditto. (<code><mode>3<mask_name>): Ditto. (*<code><mode>3<mask_name>): Ditto.	2021-09-18 15:00:12 +08:00
liuhongt	630a1249a0	AVX512FP16: Add testcase for fma instructions gcc/testsuite/ChangeLog: * gcc.target/i386/avx512fp16-vfmaddXXXph-1a.c: New test. * gcc.target/i386/avx512fp16-vfmaddXXXph-1b.c: Ditto. * gcc.target/i386/avx512fp16-vfmsubXXXph-1a.c: Ditto. * gcc.target/i386/avx512fp16-vfmsubXXXph-1b.c: Ditto. * gcc.target/i386/avx512fp16-vfnmaddXXXph-1a.c: Ditto. * gcc.target/i386/avx512fp16-vfnmaddXXXph-1b.c: Ditto. * gcc.target/i386/avx512fp16-vfnmsubXXXph-1a.c: Ditto. * gcc.target/i386/avx512fp16-vfnmsubXXXph-1b.c: Ditto. * gcc.target/i386/avx512fp16vl-vfmaddXXXph-1a.c: Ditto. * gcc.target/i386/avx512fp16vl-vfmaddXXXph-1b.c: Ditto. * gcc.target/i386/avx512fp16vl-vfmsubXXXph-1a.c: Ditto. * gcc.target/i386/avx512fp16vl-vfmsubXXXph-1b.c: Ditto. * gcc.target/i386/avx512fp16vl-vfnmaddXXXph-1a.c: Ditto. * gcc.target/i386/avx512fp16vl-vfnmaddXXXph-1b.c: Ditto. * gcc.target/i386/avx512fp16vl-vfnmsubXXXph-1a.c: Ditto. * gcc.target/i386/avx512fp16vl-vfnmsubXXXph-1b.c: Ditto.	2021-09-18 15:00:11 +08:00
liuhongt	ede1820d21	AVX512FP16: Add FP16 fma instructions. Add vfmadd[132,213,231]ph/vfnmadd[132,213,231]ph/vfmsub[132,213,231]ph/ vfnmsub[132,213,231]ph. gcc/ChangeLog: * config/i386/avx512fp16intrin.h (_mm512_mask_fmadd_ph): New intrinsic. (_mm512_mask3_fmadd_ph): Likewise. (_mm512_maskz_fmadd_ph): Likewise. (_mm512_fmadd_round_ph): Likewise. (_mm512_mask_fmadd_round_ph): Likewise. (_mm512_mask3_fmadd_round_ph): Likewise. (_mm512_maskz_fmadd_round_ph): Likewise. (_mm512_fnmadd_ph): Likewise. (_mm512_mask_fnmadd_ph): Likewise. (_mm512_mask3_fnmadd_ph): Likewise. (_mm512_maskz_fnmadd_ph): Likewise. (_mm512_fnmadd_round_ph): Likewise. (_mm512_mask_fnmadd_round_ph): Likewise. (_mm512_mask3_fnmadd_round_ph): Likewise. (_mm512_maskz_fnmadd_round_ph): Likewise. (_mm512_fmsub_ph): Likewise. (_mm512_mask_fmsub_ph): Likewise. (_mm512_mask3_fmsub_ph): Likewise. (_mm512_maskz_fmsub_ph): Likewise. (_mm512_fmsub_round_ph): Likewise. (_mm512_mask_fmsub_round_ph): Likewise. (_mm512_mask3_fmsub_round_ph): Likewise. (_mm512_maskz_fmsub_round_ph): Likewise. (_mm512_fnmsub_ph): Likewise. (_mm512_mask_fnmsub_ph): Likewise. (_mm512_mask3_fnmsub_ph): Likewise. (_mm512_maskz_fnmsub_ph): Likewise. (_mm512_fnmsub_round_ph): Likewise. (_mm512_mask_fnmsub_round_ph): Likewise. (_mm512_mask3_fnmsub_round_ph): Likewise. (_mm512_maskz_fnmsub_round_ph): Likewise. * config/i386/avx512fp16vlintrin.h (_mm256_fmadd_ph): New intrinsic. (_mm256_mask_fmadd_ph): Likewise. (_mm256_mask3_fmadd_ph): Likewise. (_mm256_maskz_fmadd_ph): Likewise. (_mm_fmadd_ph): Likewise. (_mm_mask_fmadd_ph): Likewise. (_mm_mask3_fmadd_ph): Likewise. (_mm_maskz_fmadd_ph): Likewise. (_mm256_fnmadd_ph): Likewise. (_mm256_mask_fnmadd_ph): Likewise. (_mm256_mask3_fnmadd_ph): Likewise. (_mm256_maskz_fnmadd_ph): Likewise. (_mm_fnmadd_ph): Likewise. (_mm_mask_fnmadd_ph): Likewise. (_mm_mask3_fnmadd_ph): Likewise. (_mm_maskz_fnmadd_ph): Likewise. (_mm256_fmsub_ph): Likewise. (_mm256_mask_fmsub_ph): Likewise. 
(_mm256_mask3_fmsub_ph): Likewise. (_mm256_maskz_fmsub_ph): Likewise. (_mm_fmsub_ph): Likewise. (_mm_mask_fmsub_ph): Likewise. (_mm_mask3_fmsub_ph): Likewise. (_mm_maskz_fmsub_ph): Likewise. (_mm256_fnmsub_ph): Likewise. (_mm256_mask_fnmsub_ph): Likewise. (_mm256_mask3_fnmsub_ph): Likewise. (_mm256_maskz_fnmsub_ph): Likewise. (_mm_fnmsub_ph): Likewise. (_mm_mask_fnmsub_ph): Likewise. (_mm_mask3_fnmsub_ph): Likewise. (_mm_maskz_fnmsub_ph): Likewise. * config/i386/i386-builtin.def: Add corresponding new builtins. * config/i386/sse.md (<avx512>_fmadd_<mode>_maskz<round_expand_name>): Adjust to support HF vector modes. (<sd_mask_codefor>fma_fmadd_<mode><sd_maskz_name><round_name>): Ditto. (<sd_mask_codefor>fma_fmadd_<mode><sd_maskz_name>_bcst_1): Ditto. (<sd_mask_codefor>fma_fmadd_<mode><sd_maskz_name>_bcst_2): Ditto. (<sd_mask_codefor>fma_fmadd_<mode><sd_maskz_name>_bcst_3): Ditto. (<avx512>_fmadd_<mode>_mask<round_name>): Ditto. (<avx512>_fmadd_<mode>_mask3<round_name>): Ditto. (<avx512>_fmsub_<mode>_maskz<round_expand_name>): Ditto. (<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name><round_name>): Ditto. (<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name>_bcst_1): Ditto. (<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name>_bcst_2): Ditto. (<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name>_bcst_3): Ditto. (<avx512>_fmsub_<mode>_mask<round_name>): Ditto. (<avx512>_fmsub_<mode>_mask3<round_name>): Ditto. (<sd_mask_codefor>fma_fnmadd_<mode><sd_maskz_name><round_name>): Ditto. (<sd_mask_codefor>fma_fnmadd_<mode><sd_maskz_name>_bcst_1): Ditto. (<sd_mask_codefor>fma_fnmadd_<mode><sd_maskz_name>_bcst_2): Ditto. (<sd_mask_codefor>fma_fnmadd_<mode><sd_maskz_name>_bcst_3): Ditto. (<avx512>_fnmadd_<mode>_mask<round_name>): Ditto. (<avx512>_fnmadd_<mode>_mask3<round_name>): Ditto. (<avx512>_fnmsub_<mode>_maskz<round_expand_name>): Ditto. (<sd_mask_codefor>fma_fnmsub_<mode><sd_maskz_name><round_name>): Ditto. (<sd_mask_codefor>fma_fnmsub_<mode><sd_maskz_name>_bcst_1): Ditto. 
(<sd_mask_codefor>fma_fnmsub_<mode><sd_maskz_name>_bcst_2): Ditto. (<sd_mask_codefor>fma_fnmsub_<mode><sd_maskz_name>_bcst_3): Ditto. (<avx512>_fnmsub_<mode>_mask<round_name>): Ditto. (<avx512>_fnmsub_<mode>_mask3<round_name>): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c: Add test for new builtins. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-23.c: Ditto. * gcc.target/i386/sse-14.c: Add test for new intrinsics. * gcc.target/i386/sse-22.c: Ditto.	2021-09-18 15:00:11 +08:00
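For readers unfamiliar with the four instruction families named in the commit above (vfmadd, vfmsub, vfnmadd, vfnmsub), the following is a minimal scalar sketch of what one lane computes. It is not taken from the patch: the `ref_*` function names are hypothetical, and it uses `float` with `std::fma` rather than half precision, so rounding will differ from the real FP16 instructions.

```cpp
#include <cmath>

// Scalar reference for a single lane of each packed operation.
// The ref_* names are hypothetical and only mirror the mnemonics.
// Assumption: float stands in for the half-precision element type.
inline float ref_fmadd(float a, float b, float c)  { return std::fma(a, b, c); }    //  (a * b) + c
inline float ref_fmsub(float a, float b, float c)  { return std::fma(a, b, -c); }   //  (a * b) - c
inline float ref_fnmadd(float a, float b, float c) { return std::fma(-a, b, c); }   // -(a * b) + c
inline float ref_fnmsub(float a, float b, float c) { return std::fma(-a, b, -c); }  // -(a * b) - c
```

The 132/213/231 suffixes in the mnemonics select which operands play the roles of multiplicands and addend; the arithmetic per lane is the same in each form.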
liuhongt	b6c24eab08	AVX512FP16: Add testcase for vfmaddsub[132,213,231]ph/vfmsubadd[132,213,231]ph. gcc/testsuite/ChangeLog: * gcc.target/i386/avx512fp16-vfmaddsubXXXph-1a.c: New test. * gcc.target/i386/avx512fp16-vfmaddsubXXXph-1b.c: Ditto. * gcc.target/i386/avx512fp16-vfmsubaddXXXph-1a.c: Ditto. * gcc.target/i386/avx512fp16-vfmsubaddXXXph-1b.c: Ditto. * gcc.target/i386/avx512fp16vl-vfmaddsubXXXph-1a.c: Ditto. * gcc.target/i386/avx512fp16vl-vfmaddsubXXXph-1b.c: Ditto. * gcc.target/i386/avx512fp16vl-vfmsubaddXXXph-1a.c: Ditto. * gcc.target/i386/avx512fp16vl-vfmsubaddXXXph-1b.c: Ditto.	2021-09-18 15:00:11 +08:00
liuhongt	1e6850841f	AVX512FP16: Add vfmaddsub[132,213,231]ph/vfmsubadd[132,213,231]ph. gcc/ChangeLog: * config/i386/avx512fp16intrin.h (_mm512_fmaddsub_ph): New intrinsic. (_mm512_mask_fmaddsub_ph): Likewise. (_mm512_mask3_fmaddsub_ph): Likewise. (_mm512_maskz_fmaddsub_ph): Likewise. (_mm512_fmaddsub_round_ph): Likewise. (_mm512_mask_fmaddsub_round_ph): Likewise. (_mm512_mask3_fmaddsub_round_ph): Likewise. (_mm512_maskz_fmaddsub_round_ph): Likewise. (_mm512_mask_fmsubadd_ph): Likewise. (_mm512_mask3_fmsubadd_ph): Likewise. (_mm512_maskz_fmsubadd_ph): Likewise. (_mm512_fmsubadd_round_ph): Likewise. (_mm512_mask_fmsubadd_round_ph): Likewise. (_mm512_mask3_fmsubadd_round_ph): Likewise. (_mm512_maskz_fmsubadd_round_ph): Likewise. * config/i386/avx512fp16vlintrin.h (_mm256_fmaddsub_ph): New intrinsic. (_mm256_mask_fmaddsub_ph): Likewise. (_mm256_mask3_fmaddsub_ph): Likewise. (_mm256_maskz_fmaddsub_ph): Likewise. (_mm_fmaddsub_ph): Likewise. (_mm_mask_fmaddsub_ph): Likewise. (_mm_mask3_fmaddsub_ph): Likewise. (_mm_maskz_fmaddsub_ph): Likewise. (_mm256_fmsubadd_ph): Likewise. (_mm256_mask_fmsubadd_ph): Likewise. (_mm256_mask3_fmsubadd_ph): Likewise. (_mm256_maskz_fmsubadd_ph): Likewise. (_mm_fmsubadd_ph): Likewise. (_mm_mask_fmsubadd_ph): Likewise. (_mm_mask3_fmsubadd_ph): Likewise. (_mm_maskz_fmsubadd_ph): Likewise. * config/i386/i386-builtin.def: Add corresponding new builtins. * config/i386/sse.md (VFH_SF_AVX512VL): New mode iterator. * (<avx512>_fmsubadd_<mode>_maskz<round_expand_name>): New expander. * (<avx512>_fmaddsub_<mode>_maskz<round_expand_name>): Use VFH_SF_AVX512VL. * (<sd_mask_codefor>fma_fmaddsub_<mode><sd_maskz_name><round_name>): Ditto. * (<avx512>_fmaddsub_<mode>_mask<round_name>): Ditto. * (<avx512>_fmaddsub_<mode>_mask3<round_name>): Ditto. * (<sd_mask_codefor>fma_fmsubadd_<mode><sd_maskz_name><round_name>): Ditto. * (<avx512>_fmsubadd_<mode>_mask<round_name>): Ditto. * (<avx512>_fmsubadd_<mode>_mask3<round_name>): Ditto. 
gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c: Add test for new builtins. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-23.c: Ditto. * gcc.target/i386/sse-14.c: Add test for new intrinsics. * gcc.target/i386/sse-22.c: Ditto.	2021-09-18 15:00:11 +08:00
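The alternating behaviour of the fmaddsub/fmsubadd families added in the commit above can be sketched per lane as follows. This is an illustrative scalar model, not code from the patch: the `ref_fmaddsub` and `ref_fmsubadd` names are hypothetical, `float` stands in for the half-precision element type, and masking and rounding control are not modelled.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// fmaddsub: even-indexed lanes compute (a * b) - c, odd-indexed lanes (a * b) + c.
// Hypothetical reference helper; float is an assumption, the real lanes are FP16.
template <std::size_t N>
std::array<float, N> ref_fmaddsub(const std::array<float, N>& a,
                                  const std::array<float, N>& b,
                                  const std::array<float, N>& c) {
    std::array<float, N> r{};
    for (std::size_t i = 0; i < N; ++i)
        r[i] = (i % 2 == 0) ? std::fma(a[i], b[i], -c[i])
                            : std::fma(a[i], b[i], c[i]);
    return r;
}

// fmsubadd: the opposite pattern, even lanes add and odd lanes subtract.
template <std::size_t N>
std::array<float, N> ref_fmsubadd(const std::array<float, N>& a,
                                  const std::array<float, N>& b,
                                  const std::array<float, N>& c) {
    std::array<float, N> r{};
    for (std::size_t i = 0; i < N; ++i)
        r[i] = (i % 2 == 0) ? std::fma(a[i], b[i], c[i])
                            : std::fma(a[i], b[i], -c[i]);
    return r;
}
```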
liuhongt	7afcb53423	Support embedded broadcast for AVX512FP16 instructions. gcc/ChangeLog: PR target/87767 * config/i386/i386.c (ix86_print_operand): Handle V8HF/V16HF/V32HFmode. * config/i386/i386.h (VALID_BCST_MODE_P): Add HFmode. * config/i386/sse.md (avx512bcst): Remove. gcc/testsuite/ChangeLog: * gcc.target/i386/avx512fp16-broadcast-1.c: New test. * gcc.target/i386/avx512fp16-broadcast-2.c: New test.	2021-09-18 13:03:07 +08:00
Jason Merrill	18b57c1d4a	c++: improve lookup of member-qualified names I've been working on the resolution of CWG1835 by P1787, which among many other things clarified that a name after -> or . is looked up first in the class of the object expression even if it's dependent. This patch does not make that change; this is a smaller change extracted from that work in progress to make the lookup in the object type work better in cases where unqualified lookup doesn't find anything. Basically, if we see "t.foo::" we know that looking up foo in t needs to find a type, so we build an implicit TYPENAME_TYPE for it. This also implements the change from P1787 to assume that a name followed by < in a type-only context names a template, since the less-than operator can't appear in a type context. This makes some of the lines in dtor11.C work. I introduce the predicate 'dependentish_scope_p' for the case where the current instantiation has dependent bases, so even though we can perform name lookup, we can't conclude that a lookup failure is conclusive. gcc/cp/ChangeLog: * cp-tree.h (dependentish_scope_p): Declare. * pt.c (dependentish_scope_p): New. * parser.c (cp_parser_lookup_name): Return a TYPENAME_TYPE for lookup of a type in a dependent object. (cp_parser_template_id): Handle TYPENAME_TYPE. (cp_parser_template_name): If we're looking for a type, a name followed by < names a template. gcc/testsuite/ChangeLog: * g++.dg/template/dtor5.C: Adjust expected error. * g++.dg/cpp23/lookup2.C: New test. * g++.dg/template/dtor11.C: New test.	2021-09-17 22:43:00 -04:00
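As a small illustration of the "t.foo::" form discussed in the commit above, the sketch below uses a member-qualified name on a dependent object. The types `Base` and `Derived` and the function `base_value` are hypothetical. To stay portable to compilers without this change, the sketch deliberately keeps `Base` visible to ordinary unqualified lookup as well; the commit concerns the harder case where the name can only be found in the object's class.

```cpp
// Hypothetical types for illustration only.
struct Base {
    int value = 42;
};

struct Derived : Base {
    int value = 7;  // hides Base::value
};

template <class T>
int base_value(T& t) {
    // `Base` appears after `.` and before `::`, so it has to name a type
    // (or namespace); the member selected is Base::value, not T::value.
    return t.Base::value;
}
```

Called with a `Derived` object, `base_value` reads the hidden `Base::value` member, while `d.value` still refers to `Derived::value`.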
Jason Merrill	8618f9e58c	c++: fix comment typo gcc/cp/ChangeLog: * cp-tree.h: Fix typo in LANG_FLAG list.	2021-09-17 22:43:00 -04:00
GCC Administrator	0a4cb43932	Daily bump.	2021-09-18 00:16:36 +00:00
Martin Sebor	94c12ffac2	Factor predicate analysis out of tree-ssa-uninit.c into its own module. gcc/ChangeLog: * Makefile.in (OBJS): Add gimple-predicate-analysis.o. * tree-ssa-uninit.c (max_phi_args): Move to gimple-predicate-analysis. (MASK_SET_BIT, MASK_TEST_BIT, MASK_EMPTY): Same. (check_defs): Add comment. (can_skip_redundant_opnd): Update comment. (compute_uninit_opnds_pos): Adjust to namespace change. (find_pdom): Move to gimple-predicate-analysis.cc. (find_dom): Same. (struct uninit_undef_val_t): New. (is_non_loop_exit_postdominating): Move to gimple-predicate-analysis.cc. (find_control_equiv_block): Same. (MAX_NUM_CHAINS, MAX_CHAIN_LEN, MAX_POSTDOM_CHECK): Same. (MAX_SWITCH_CASES): Same. (compute_control_dep_chain): Same. (find_uninit_use): Use predicate analyzer. (struct pred_info): Move to gimple-predicate-analysis. (convert_control_dep_chain_into_preds): Same. (find_predicates): Same. (collect_phi_def_edges): Same. (warn_uninitialized_phi): Use predicate analyzer. (find_def_preds): Move to gimple-predicate-analysis. (dump_pred_info): Same. (dump_pred_chain): Same. (dump_predicates): Same. (destroy_predicate_vecs): Remove. (execute_late_warn_uninitialized): New. (get_cmp_code): Move to gimple-predicate-analysis. (is_value_included_in): Same. (value_sat_pred_p): Same. (find_matching_predicate_in_rest_chains): Same. (is_use_properly_guarded): Same. (prune_uninit_phi_opnds): Same. (find_var_cmp_const): Same. (use_pred_not_overlap_with_undef_path_pred): Same. (pred_equal_p): Same. (is_neq_relop_p): Same. (is_neq_zero_form_p): Same. (pred_expr_equal_p): Same. (is_pred_expr_subset_of): Same. (is_pred_chain_subset_of): Same. (is_included_in): Same. (is_superset_of): Same. (pred_neg_p): Same. (simplify_pred): Same. (simplify_preds_2): Same. (simplify_preds_3): Same. (simplify_preds_4): Same. (simplify_preds): Same. (push_pred): Same. (push_to_worklist): Same. (get_pred_info_from_cmp): Same. (is_degenerated_phi): Same. (normalize_one_pred_1): Same. 
(normalize_one_pred): Same. (normalize_one_pred_chain): Same. (normalize_preds): Same. (can_one_predicate_be_invalidated_p): Same. (can_chain_union_be_invalidated_p): Same. (uninit_uses_cannot_happen): Same. (pass_late_warn_uninitialized::execute): Define. * gimple-predicate-analysis.cc: New file. * gimple-predicate-analysis.h: New file.	2021-09-17 15:39:13 -06:00
Harald Anlauf	51166eb2c5	Fortran - (large) arrays in the main shall be static gcc/fortran/ChangeLog: PR fortran/102366 * trans-decl.c (gfc_finish_var_decl): Disable the warning message for variables moved from stack to static storage if they are declared in the main, but allow the move to happen. gcc/testsuite/ChangeLog: PR fortran/102366 * gfortran.dg/pr102366.f90: New test.	2021-09-17 21:46:32 +02:00
Jonathan Wakely	42eff613d0	libstdc++: Add 'noexcept' to path::iterator members All path::iterator operations are non-throwing. Signed-off-by: Jonathan Wakely <jwakely@redhat.com> libstdc++-v3/ChangeLog: * include/bits/fs_path.h (path::iterator): Add noexcept to all member functions and friend functions. (distance): Add noexcept. (advance): Add noexcept and inline. * include/experimental/bits/fs_path.h (path::iterator): Add noexcept to all member functions.	2021-09-17 20:43:34 +01:00
Jonathan Wakely	1fa2c5a695	libstdc++: Fix last std::tuple constructor missing 'constexpr' [PR102270] Also rename the test so it actually runs. Signed-off-by: Jonathan Wakely <jwakely@redhat.com> libstdc++-v3/ChangeLog: PR libstdc++/102270 * include/std/tuple (_Tuple_impl): Add constexpr to constructor missed in previous patch. * testsuite/20_util/tuple/cons/102270.C: Moved to... * testsuite/20_util/tuple/cons/102270.cc: ...here. * testsuite/util/testsuite_allocator.h (SimpleAllocator): Add constexpr to constructor so it can be used for C++20 tests.	2021-09-17 20:43:34 +01:00
Julian Brown	2961ac45b9	openacc: Remove unnecessary barriers (gimple worker partitioning/broadcast) This is an optimisation for middle-end worker-partitioning support (used to support multiple workers on AMD GCN). At present, barriers may be emitted in cases where they aren't needed and cannot be optimised away. This patch stops the extraneous barriers from being emitted in the first place. One exception to the above (where the barrier is still needed) is for predicated blocks of code that perform a write to gang-private shared memory from one worker. We must execute a barrier before other workers read that shared memory location. gcc/ * config/gcn/gcn.c (gimple.h): Include. (gcn_fork_join): Emit barrier for worker-level joins. * omp-oacc-neuter-broadcast.cc (find_local_vars_to_propagate): Add writes_gang_private bitmap parameter. Set bit for blocks containing gang-private variable writes. (worker_single_simple): Don't emit barrier after predicated block. (worker_single_copy): Don't emit barrier if we're not broadcasting anything and the block contains no gang-private writes. (neuter_worker_single): Don't predicate blocks that only contain NOPs or internal marker functions. Pass has_gang_private_write argument to worker_single_copy. (oacc_do_neutering): Add writes_gang_private bitmap handling.	2021-09-17 21:04:30 +02:00
Julian Brown	2a3f9f6532	openacc: Shared memory layout optimisation This patch implements an algorithm to lay out local data-share (LDS) space. It currently works for AMD GCN. At the moment, LDS is used for three things:
  1. Gang-private variables
  2. Reduction temporaries (accumulators)
  3. Broadcasting for worker partitioning
After the patch is applied, (2) and (3) are placed at preallocated locations in LDS, and (1) continues to be handled by the backend (as it was before this patch). LDS now looks like this:
  +--------------+ (gang-private size + 1024, = 1536)
  | free space   |
  |     ...      |
  | - - - - - - -|
  | worker bcast |
  +--------------+
  | reductions   |
  +--------------+ <<< -mgang-private-size=<number> (def. 512)
  | gang-private |
  |     vars     |
  +--------------+ (32)
  | low LDS vars |
  +--------------+ LDS base
So, gang-private space is fixed at a constant amount at compile time (which can be increased with a command-line switch if necessary for some given code). The layout algorithm takes out a slice of the remainder of usable space for reduction vars, and uses the rest for worker partitioning. The partitioning algorithm works as follows.
  1. An "adjacency" set is built up for each basic block that might do a broadcast. This is calculated by starting at each such block, and doing a recursive DFS walk over successors to find the next block (or blocks) that also does a broadcast (dfs_broadcast_reachable_1).
  2. The adjacency set is inverted to get adjacent predecessor blocks also.
  3. Blocks that will perform a broadcast are sorted by size of that broadcast: the biggest blocks are handled first.
  4. A splay tree structure is used to calculate the spans of LDS memory that are already allocated by the blocks adjacent to this one (merge_ranges{,_1}).
  5. The current block's broadcast space is allocated from the first free span not allocated in the splay tree structure calculated above (first_fit_range). This seems to work quite nicely and efficiently with the splay tree structure.
  6. Continue with the next-biggest broadcast block until we're done.
In this way, "adjacent" broadcasts will not use the same piece of LDS memory. PR96334 "openacc: Unshare reduction temporaries for GCN" got merged in: The GCN backend uses tree nodes like MEM((__lds TYPE *) <constant>) for reduction temporaries. Unlike e.g. var decls and SSA names, these nodes cannot be shared during gimplification, but are so in some circumstances. This is detected when appropriate --enable-checking options are used. This patch unshares such nodes when they are reused more than once. gcc/ * config/gcn/gcn-protos.h (gcn_goacc_create_worker_broadcast_record): Update prototype. * config/gcn/gcn-tree.c (gcn_goacc_get_worker_red_decl): Use preallocated block of LDS memory. Do not cache/share decls for reduction temporaries between invocations. (gcn_goacc_reduction_teardown): Unshare VAR on second use. (gcn_goacc_create_worker_broadcast_record): Add OFFSET parameter and return temporary LDS space at that offset. Return pointer in "sender" case. * config/gcn/gcn.c (acc_lds_size, gang_private_hwm, lds_allocs): New global vars. (ACC_LDS_SIZE): Define as acc_lds_size. (gcn_init_machine_status): Don't initialise lds_allocated, lds_allocs, reduc_decls fields of machine function struct. (gcn_option_override): Handle default size for gang-private variables and -mgang-private-size option. (gcn_expand_prologue): Use LDS_SIZE instead of LDS_SIZE-1 when initialising M0_REG. (gcn_shared_mem_layout): New function. (gcn_print_lds_decl): Update comment. Use global lds_allocs map and gang_private_hwm variable. (TARGET_GOACC_SHARED_MEM_LAYOUT): Define target hook. * config/gcn/gcn.h (machine_function): Remove lds_allocated, lds_allocs, reduc_decls. Add reduction_base, reduction_limit. * config/gcn/gcn.opt (gang_private_size_opt): New global. (mgang-private-size=): New option. 
* doc/tm.texi.in (TARGET_GOACC_SHARED_MEM_LAYOUT): Place documentation hook. * doc/tm.texi: Regenerate. * omp-oacc-neuter-broadcast.cc (targhooks.h, diagnostic-core.h): Add includes. (build_sender_ref): Handle sender_decl being pointer. (worker_single_copy): Add PLACEMENT and ISOLATE_BROADCASTS parameters. Pass placement argument to create_worker_broadcast_record hook invocations. Handle sender_decl being pointer and isolate_broadcasts inserting extra barriers. (blk_offset_map_t): Add typedef. (neuter_worker_single): Add BLK_OFFSET_MAP parameter. Pass preallocated range to worker_single_copy call. (dfs_broadcast_reachable_1): New function. (idx_decl_pair_t, used_range_vec_t): New typedefs. (sort_size_descending): New function. (addr_range): New class. (splay_tree_compare_addr_range, splay_tree_free_key) (first_fit_range, merge_ranges_1, merge_ranges): New functions. (execute_omp_oacc_neuter_broadcast): Rename to... (oacc_do_neutering): ... this. Add BOUNDS_LO, BOUNDS_HI parameters. Arrange layout of shared memory for broadcast operations. (execute_omp_oacc_neuter_broadcast): New function. (pass_omp_oacc_neuter_broadcast::gate): Remove num_workers==1 handling from here. Enable pass for all OpenACC routines in order to call shared memory-layout hook. * target.def (create_worker_broadcast_record): Add OFFSET parameter. (shared_mem_layout): New hook. libgomp/ * testsuite/libgomp.oacc-c-c++-common/broadcast-many.c: Update.	2021-09-17 21:04:30 +02:00
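The first-fit allocation described in the commit message above (merge the spans already used by adjacent blocks, then take the first free span that is large enough) can be sketched roughly as follows. This is not the patch's code: it swaps the splay tree for a sorted `std::vector`, ignores alignment, and the names `Range`, `merge_used`, `first_fit` and `kNoFit` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Half-open address range [lo, hi). Hypothetical type for illustration.
struct Range {
    std::size_t lo;
    std::size_t hi;
};

// Sentinel returned when no free span is large enough (an assumption of this sketch).
const std::size_t kNoFit = std::numeric_limits<std::size_t>::max();

// Sort the used ranges and merge any that overlap or touch.
std::vector<Range> merge_used(std::vector<Range> used) {
    std::sort(used.begin(), used.end(),
              [](const Range& a, const Range& b) { return a.lo < b.lo; });
    std::vector<Range> merged;
    for (const Range& r : used) {
        if (!merged.empty() && r.lo <= merged.back().hi)
            merged.back().hi = std::max(merged.back().hi, r.hi);
        else
            merged.push_back(r);
    }
    return merged;
}

// Lowest offset in [bounds_lo, bounds_hi) where `size` bytes fit without
// overlapping any used range, or kNoFit if there is no such offset.
std::size_t first_fit(const std::vector<Range>& used, std::size_t size,
                      std::size_t bounds_lo, std::size_t bounds_hi) {
    assert(bounds_lo <= bounds_hi);
    std::size_t cursor = bounds_lo;
    for (const Range& r : merge_used(used)) {
        if (r.hi <= cursor) continue;   // range lies entirely below the cursor
        if (r.lo >= bounds_hi) break;   // range lies entirely above the bounds
        if (r.lo > cursor && r.lo - cursor >= size) return cursor;  // gap fits
        cursor = std::max(cursor, r.hi);
    }
    if (cursor <= bounds_hi && bounds_hi - cursor >= size) return cursor;
    return kNoFit;
}
```

With used ranges [0, 64), [128, 192) and [32, 96) inside bounds [0, 512), a 32-byte request lands at offset 96 (the gap between the merged [0, 96) and [128, 192)), while a 64-byte request has to go after the last used range, at 192.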

188143 Commits