OpenE2K/gcc - gcc - Expired Mentality Git

Author	SHA1	Message	Date
David Malcolm	13ad6d9f50	analyzer: fix missing check for uninit of return values When moving the -fanalyzer tests for -ftrivial-auto-var-init to the "torture" subdirectory of gcc.dg/analyzer I noticed that -fanalyzer wasn't always properly checking for initialization of return values. The issue was that some "return" handling was using region_model::copy_region to copy to the RESULT_DECL, and copy_region wasn't checking for poisoned svalues. This patch eliminates region_model::copy_region in favor of simply doing a get_rvalue/set_value pair, fixing the issue. gcc/analyzer/ChangeLog: * region-model.cc (region_model::on_return): Replace usage of copy_region with get_rvalue/set_value pair. (region_model::pop_frame): Likewise. (selftest::test_compound_assignment): Likewise. * region-model.h (region_model::copy_region): Delete decl. * region.cc (region_model::copy_region): Delete. gcc/testsuite/ChangeLog: * gcc.dg/analyzer/torture/ubsan-1.c: Add missing return stmts. * gcc.dg/analyzer/uninit-trivial-auto-var-init-pattern.c: Move to... * gcc.dg/analyzer/torture/uninit-trivial-auto-var-init-pattern.c: ...here. * gcc.dg/analyzer/uninit-trivial-auto-var-init-uninitialized.c: Move to... * gcc.dg/analyzer/torture/uninit-trivial-auto-var-init-uninitialized.c: ...here. * gcc.dg/analyzer/uninit-trivial-auto-var-init-zero.c: Move to... * gcc.dg/analyzer/torture/uninit-trivial-auto-var-init-zero.c: ...here. Signed-off-by: David Malcolm <dmalcolm@redhat.com>	2022-02-02 09:55:29 -05:00
David Malcolm	ea3e191595	analyzer: consolidate duplicate code in region::calc_offset gcc/analyzer/ChangeLog: * region.cc (region::calc_offset): Consolidate effectively identical cases. Signed-off-by: David Malcolm <dmalcolm@redhat.com>	2022-02-02 09:54:32 -05:00
David Malcolm	93e759fc18	analyzer: implement bit_range_region GCC 12 has gained -Wanalyzer-use-of-uninitialized-value, and I'm seeing various false positives from it due to region_model::get_lvalue not properly handling BIT_FIELD_REF, and falling back to using an UNKNOWN_REGION for them. This patch fixes these false positives by implementing a new bit_range_region region subclass for handling BIT_FIELD_REF. gcc/analyzer/ChangeLog: * analyzer.h (class bit_range_region): New forward decl. * region-model-manager.cc (region_model_manager::get_bit_range): New. (region_model_manager::log_stats): Handle m_bit_range_regions. * region-model.cc (region_model::get_lvalue_1): Handle BIT_FIELD_REF. * region-model.h (region_model_manager::get_bit_range): New decl. (region_model_manager::m_bit_range_regions): New field. * region.cc (region::get_base_region): Handle RK_BIT_RANGE. (region::base_region_p): Likewise. (region::calc_offset): Likewise. (bit_range_region::dump_to_pp): New. (bit_range_region::get_byte_size): New. (bit_range_region::get_bit_size): New. (bit_range_region::get_byte_size_sval): New. (bit_range_region::get_relative_concrete_offset): New. * region.h (enum region_kind): Add RK_BIT_RANGE. (region::dyn_cast_bit_range_region): New vfunc. (class bit_range_region): New. (is_a_helper <const bit_range_region >::test): New. (default_hash_traits<bit_range_region::key_t>): New. gcc/testsuite/ChangeLog: gcc.dg/analyzer/torture/uninit-bit-field-ref.c: New test. Signed-off-by: David Malcolm <dmalcolm@redhat.com>	2022-02-02 09:52:58 -05:00
David Malcolm	9b4eee5fd1	analyzer: stop -ftrivial-auto-var-init from suppressing uninit warnings [PR104270] GCC 12 has gained two features for dealing with uninitialized variables: (a) a new -Wanalyzer-use-of-uninitialized-value warning within -fanalyzer for interprocedural path-sensitive detection of uninit uses, and (b) a new -ftrivial-auto-var-init option for mitigating some uses of uninit variables. It turns out that using (b) was thwarting (a), as it led to -fanalyzer seeing calls to IFN_DEFERRED_INIT, which -fanalyzer wasn't special-casing, thus treating it as initializing the variables in question, and thus silencing -Wanalyzer-use-of-uninitialized-value on them. invoke.texi says: "GCC still considers an automatic variable that doesn't have an explicit initializer as uninitialized, @option{-Wuninitialized} will still report warning messages on such automatic variables." and thus -Wanalyzer-use-of-uninitialized-value ought to as well. This patch adds special-case handling to -fanalyzer for IFN_DEFERRED_INIT, so that -fanalyzer will warn on uninit uses of variables that are mitigated by -ftrivial-auto-var-init. gcc/analyzer/ChangeLog: PR analyzer/104270 * region-model.cc (region_model::on_call_pre): Handle IFN_DEFERRED_INIT. gcc/testsuite/ChangeLog: PR analyzer/104270 * gcc.dg/analyzer/uninit-trivial-auto-var-init-pattern.c: New test. * gcc.dg/analyzer/uninit-trivial-auto-var-init-uninitialized.c: New test. * gcc.dg/analyzer/uninit-trivial-auto-var-init-zero.c: New test. Signed-off-by: David Malcolm <dmalcolm@redhat.com>	2022-02-02 09:51:07 -05:00
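A minimal sketch of the kind of code this commit affects, in C. The function name `pick` and the build line in the comment are illustrative assumptions, not taken from the commit or its tests. Per the message above, the intent is that -fanalyzer can still report the uninitialized read even when -ftrivial-auto-var-init is in effect.

```c
/* Assumed build line (illustrative only):
     gcc -fanalyzer -ftrivial-auto-var-init=zero -c pick.c  */

/* Hypothetical example: `value` has no explicit initializer and is
   only assigned on the flag != 0 path.  */
int pick(int flag)
{
  int value;

  if (flag)
    value = 42;

  /* When flag == 0 this reads an uninitialized variable; this is the
     kind of use the analyzer is meant to keep reporting.  */
  return value;
}
```

Only a call such as `pick(1)` has defined behavior here; the `flag == 0` path is the deliberately buggy one.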
Bernd Kuhls	cac2f69cda	gcc: define _REENTRANT for OpenRISC when -pthread is passed The detection of pthread support fails on OpenRISC unless _REENTRANT is defined. Added the CPP_SPEC definition to correct this. gcc/ChangeLog: PR target/94372 * config/or1k/linux.h (CPP_SPEC): Define. Signed-off-by: Bernd Kuhls <bernd.kuhls@t-online.de>	2022-02-02 20:02:59 +09:00
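One way to observe the effect of such a CPP_SPEC change is to test the macro from C. The sketch below uses a hypothetical helper, `reentrant_defined`; it only reports what the preprocessor saw and assumes nothing about the or1k spec beyond what the message states.

```c
/* Hypothetical helper: returns 1 if the translation unit was
   preprocessed with _REENTRANT defined, 0 otherwise.  */
int reentrant_defined(void)
{
#ifdef _REENTRANT
  return 1;
#else
  return 0;
#endif
}
```

On a target whose CPP_SPEC maps -pthread to -D_REENTRANT, compiling this with -pthread would be expected to make the helper return 1.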
Tamar Christina	9f6f411f63	AArch32: use canonical ordering for complex mul, fma and fms After the first patch in the series this updates the optabs to expect the canonical sequence. gcc/ChangeLog: PR tree-optimization/102819 PR tree-optimization/103169 * config/arm/vec-common.md (cml<fcmac1><conj_op><mode>4): Use canonical order.	2022-02-02 10:52:17 +00:00
Tamar Christina	ab95fe61fe	AArch64: use canonical ordering for complex mul, fma and fms After the first patch in the series this updates the optabs to expect the canonical sequence. gcc/ChangeLog: PR tree-optimization/102819 PR tree-optimization/103169 * config/aarch64/aarch64-simd.md (cml<fcmac1><conj_op><mode>4): Use canonical order. * config/aarch64/aarch64-sve.md (cml<fcmac1><conj_op><mode>4): Likewise.	2022-02-02 10:51:38 +00:00
Tamar Christina	55d83cdf23	vect: Simplify and extend the complex numbers validation routines. This patch boosts the analysis for complex mul, fma and fms in order to ensure that it doesn't create an incorrect output. Essentially it adds an extra verification to check that the two nodes it's going to combine do the same operations on compatible values. The reason it needs to do this is that if one computation differs from the other then with the current implementation we have no way to deal with it since we have to remove the permute. When we can keep the permute around we can probably handle these by unrolling. While implementing this since I have to do the traversal anyway I took advantage of it by simplifying the code a bit. Previously we would determine whether something is a conjugate and then try to figure out which conjugate it is and then try to see if the permutes match what we expect. Now the code that does the traversal will detect this in one go and return to us whether the operation is something that can be combined and whether a conjugate is present. Secondly because it does this I can now simplify the checking code itself to essentially just try to apply fixed patterns to each operation. The patterns represent the order operations should appear in. For instance a complex MUL operation combines: Left 1 + Right 1 Left 2 + Right 2 with a permute on the nodes consisting of: { Even, Even } + { Odd, Odd } { Even, Odd } + { Odd, Even } By abstracting over these patterns the checking code becomes quite simple. As part of this I was checking the order of the operands which was left in "slp" order, i.e. the same order they showed up in during SLP, which means that the accumulator is first. However it looks like I didn't document this and the x86 optab was implemented assuming the same order as FMA, i.e. that the accumulator is last. I have thus changed the order to match that of FMA and FMS which corrects the x86 codegen and will update the Arm targets.
This has now also been documented. gcc/ChangeLog: PR tree-optimization/102819 PR tree-optimization/103169 * doc/md.texi: Update docs for cfms, cfma. * tree-data-ref.h (same_data_refs): Accept optional offset. * tree-vect-slp-patterns.cc (is_linear_load_p): Fix issue with repeating patterns. (vect_normalize_conj_loc): Remove. (is_eq_or_top): Change to take two nodes. (enum _conj_status, compatible_complex_nodes_p, vect_validate_multiplication): New. (class complex_add_pattern, complex_add_pattern::matches, complex_add_pattern::recognize, class complex_mul_pattern, complex_mul_pattern::recognize, class complex_fms_pattern, complex_fms_pattern::recognize, class complex_operations_pattern, complex_operations_pattern::recognize, addsub_pattern::recognize): Pass new cache. (complex_fms_pattern::matches, complex_mul_pattern::matches): Pass new cache and use new validation code. * tree-vect-slp.cc (vect_match_slp_patterns_2, vect_match_slp_patterns, vect_analyze_slp): Pass along cache. (compatible_calls_p): Expose. * tree-vectorizer.h (compatible_calls_p, slp_node_hash, slp_compat_nodes_map_t): New. (class vect_pattern): Update signatures include new cache. gcc/testsuite/ChangeLog: PR tree-optimization/102819 PR tree-optimization/103169 * g++.dg/vect/pr99149.cc: xfail for now. * gcc.dg/vect/complex/pr102819-1.c: New test. * gcc.dg/vect/complex/pr102819-2.c: New test. * gcc.dg/vect/complex/pr102819-3.c: New test. * gcc.dg/vect/complex/pr102819-4.c: New test. * gcc.dg/vect/complex/pr102819-5.c: New test. * gcc.dg/vect/complex/pr102819-6.c: New test. * gcc.dg/vect/complex/pr102819-7.c: New test. * gcc.dg/vect/complex/pr102819-8.c: New test. * gcc.dg/vect/complex/pr102819-9.c: New test. * gcc.dg/vect/complex/pr103169.c: New test.	2022-02-02 10:39:03 +00:00
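A small C sketch of the operand-order convention this message describes (accumulator last, as for FMA). The names `cfma_ref` and `cfma_loop` are hypothetical; this is a scalar reference for illustration only, not the optab definition from md.texi and not code from the patch.

```c
#include <complex.h>

/* Hypothetical scalar reference: multiply the first two operands and
   add the accumulator, which is passed last, mirroring FMA order.  */
float complex cfma_ref(float complex a, float complex b, float complex acc)
{
  return a * b + acc;
}

/* A loop of the general shape that complex-pattern matching in the
   vectorizer is concerned with; whether it is actually vectorized
   depends on the target and options.  */
void cfma_loop(float complex *restrict out,
               const float complex *restrict a,
               const float complex *restrict b,
               const float complex *restrict acc,
               int n)
{
  for (int i = 0; i < n; i++)
    out[i] = cfma_ref(a[i], b[i], acc[i]);
}
```

For example, with a = 1 + 2i, b = 3 + 4i and acc = 1 + i, the product is -5 + 10i and the result is -4 + 11i.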
Martin Sebor	756eabacfc	Declare std::array members with attribute const [PR101831]. Resolves: PR libstdc++/101831 - Spurious maybe-uninitialized warning on std::array::size libstdc++-v3/ChangeLog: PR libstdc++/101831 * include/std/array (begin): Declare const member function attribute const. (end, rbegin, rend, size, max_size, empty, data): Same. * testsuite/23_containers/array/capacity/empty.cc: Add test cases. * testsuite/23_containers/array/capacity/max_size.cc: Same. * testsuite/23_containers/array/capacity/size.cc: Same. * testsuite/23_containers/array/iterators/begin_end.cc: New test.	2022-02-01 17:21:49 -07:00
Hans-Peter Nilsson	07a6c52c4c	cris: Reload using special-regs before general-regs On code where reload has an effect (i.e. quite rarely, just enough to be noticeable), this change gets code quality back to the situation prior to "Remove CRIS v32 ACR artefacts". We had from IRA a pseudoregister marked to be reloaded from a union of all allocatable registers (here: SPEC_GENNONACR_REGS) but where the register-class corresponding to the constraint for the register-type alternative (here: GENERAL_REGS) was not a subset of that class: SPEC_GENNONACR_REGS (and GENNONACR_REGS) had a one-register "hole" for the ACR register, a register present in GENERAL_REGS. Code in reload.cc:find_reloads adds 4 to the cost of a register-type alternative that is neither a subset of the preferred register class nor vice versa and thus reload thinks it can't use. It would be preferable to look for a non-empty intersection of the two, and use that intersection for that alternative, something that can't be expressed because a register class can't be formed from a random register set. The effect was here that the GENERAL_REGS to/from memory alternatives ("r") had their cost raised such that the SPECIAL_REGS alternatives ("x") looked better. This happened to improve code quality just a little bit compared to GENERAL_REGS being chosen. Anyway, with the improved CRIS register-class topology, the subset-checking code no longer has the GENERAL_REGS-demoting effect. To get the same quality, we have to adjust the port such that SPECIAL_REGS are specifically preferred when possible and advisable, i.e. when there are at least two of those registers as for the CPU variant with multiplication (which happens to be the variant maintained for performance). For the move-pattern, the obvious method may seem to simply "curse" the constraints of some alternatives (by prepending one of the "?!^$" characters) but that method can't be used, because we want the effect to be conditional on the CPU variant.
It'd also be a shame to split the "movsi_internal<setcc><setnz><setnzvc>" into two CPU-variants (with different cursing). Iterators would help, but it still seems unwieldy. Instead, add copies of the GENERAL_REGS variants (to the SPECIAL_REGS alternatives) on the "other" side, and make use of the "enabled" attribute to activate just the desired order of alternatives. gcc: config/cris/cris.cc (cris_preferred_reload_class): Reject "eliminated" registers and small-enough constants unless reloaded into a class that is a subset of GENERAL_REGS. * config/cris/cris.md (attribute "cpu_variant"): New. (attribute "enabled"): Conditionalize on a matching attribute cpu_variant, if specified. ("*movsi_internal<setcc><setnz><setnzvc>"): For moves to and from memory, add cpu-variant-enabled variants for "r" alternatives on the far side of the "x" alternatives, preferring the "x" ones only for variants where MOF is present (in addition to SRP).	2022-02-02 01:20:06 +01:00
Hans-Peter Nilsson	9a7f14ef9b	cris: Don't discriminate against ALL_REGS in TARGET_REGISTER_MOVE_COST When the tightest class including both SPECIAL_REGS and GENERAL_REGS is ALL_REGS, artificially special-casing for either to or from, hits artificially hard. This gets the port back to the code quality before the previous patch ("cris: Remove CRIS v32 ACR artefacts") - except for _vfprintf_r and _vfiprintf_r in newlib (still .8 and .4% larger). gcc: * config/cris/cris.cc (cris_register_move_cost): Remove special pre-ira extra cost for ALL_REGS.	2022-02-02 01:20:05 +01:00
Hans-Peter Nilsson	27e35bc491	cris: Remove CRIS v32 ACR artefacts This is the change to which I alluded in r11-220 / `d0780379c1` as "causes extra register moves in libgcc". It has unfortunate side-effects due to the change in register-class topology. There's a slight improvement in coremark numbers (< 0.07%) though also an increase in total code size (< 0.7%) but looking at the individual changes in functions, it's all-over (-7..+7%). Looking specifically at functions that improved in speed, it's also both plus and minus in code sizes. It's unworkable to separate improvements from regressions for this case. I'll follow up with patches to restore the previous code quality, in both size and speed. gcc: * config/cris/constraints.md (define_register_constraint "b"): Now GENERAL_REGS. * config/cris/cris.md (CRIS_ACR_REGNUM): Remove. * config/cris/cris.h: (reg_class, REG_CLASS_NAMES) (REG_CLASS_CONTENTS): Remove ACR_REGS, SPEC_ACR_REGS, GENNONACR_REGS, and SPEC_GENNONACR_REGS. * config/cris/cris.cc (cris_preferred_reload_class): Don't mention ACR_REGS and return GENERAL_REGS instead of GENNONACR_REGS.	2022-02-02 01:20:04 +01:00
Hans-Peter Nilsson	a58401d2e6	cris: For expanded movsi, don't match operands we know will be reloaded In a session investigating unexpected fallout from a change, I noticed reload needs one operand being a register to make an informed decision. It can happen that there's just a constant and a memory operand, as in: (insn 668 667 42 104 (parallel [ (set (mem:SI (plus:SI (reg/v/f:SI 347 [ fs ]) (const_int 168 [0xa8])) \ [1 fs_126(D)->regs.cfa_how+0 S4 A8]) (const_int 2 [0x2])) (clobber (reg:CC 19 dccr)) ]) "<...>/gcc/libgcc/unwind-dw2.c":1121:21 22 {movsi_internal} (expr_list:REG_UNUSED (reg:CC 19 dccr) (nil))) This was helpfully created by combine. When this happens, reload can't check for costs and preferred register classes, (both operands will start with NO_REGS as the preferred class) and will default to the constraints order in the insn in reload. (Which also does its own temporary merge in find_reloads, but that's a different story.) Better don't match the simple cases. Beware that subregs have to be matched. I'm doing this just for word_mode (SI) for now, but may repeat this for the other valid modes as well. In particular, that goes for DImode as I see the expanded movdi does almost* this, but uses register_operand instead of REG_S_P (from cris.h). Using REG_S_P is the right choice here because register_operand also matches (subreg (mem ...) ...) until reload is done. By itself it's just a sub-0.1% performance win (coremark). Also removing a stale comment. gcc: * config/cris/cris.md ("*movsi_internal<setcc><setnz><setnzvc>"): Conditionalize on (sub-)register operands or operand 1 being 0.	2022-02-02 01:20:03 +01:00
Hans-Peter Nilsson	4c4d0af4c9	cris: Don't default to -mmul-bug-workaround This flips the default for the errata handling for an old version (TL;DR: workaround: no multiply instruction last on a cache-line). Newer versions of the CRIS cpu don't have that bug. While the impact of the workaround is very marginal (coremark: less than .05% larger, less than .0005% slower) it's an irritating pseudorandom factor when assessing the impact of other changes. Also, fix a wart requiring changes to more than TARGET_DEFAULT to flip the default. People building old kernels or operating systems to run on ETRAX 100 LX are advised to pass "-mmul-bug-workaround". gcc: * config/cris/cris.h (TARGET_DEFAULT): Don't include MASK_MUL_BUG. (MUL_BUG_ASM_DEFAULT): New macro. (MAYBE_AS_NO_MUL_BUG_ABORT): Define in terms of MUL_BUG_ASM_DEFAULT. * doc/invoke.texi (CRIS Options, -mmul-bug-workaround): Adjust accordingly.	2022-02-02 01:20:02 +01:00
GCC Administrator	ae7e4af964	Daily bump.	2022-02-02 00:17:16 +00:00
Jonathan Wakely	d98668eb06	libstdc++: Do not use dirent::d_type unconditionally These new tests should not use the d_type member unless it's actually present on the OS. libstdc++-v3/ChangeLog: * testsuite/27_io/filesystem/iterators/error_reporting.cc: Use autoconf macro to check whether d_type is present. * testsuite/experimental/filesystem/iterators/error_reporting.cc: Likewise.	2022-02-02 00:01:43 +00:00
Eugene Rozenfeld	c17975d81a	AutoFDO: don't set param_early_inliner_max_iterations to 10. param_early_inliner_max_iterations specifies the maximum number of nested indirect inlining iterations performed by early inliner. Normally, the default value is 1. For AutoFDO this parameter was also used as the number of iterations for its indirect call promotion loop and the default value was set to 10. While it makes sense to have 10 in the indirect call promotion loop (we want to make the IR match the profiled binary before actual annotation) there is no reason to have a special default value for the regular early inliner. This change removes the special AutoFDO default value setting for param_early_inliner_max_iterations while keeping 10 as the number of iterations for the AutoFDO indirect call promotion loop. This change improves a simple fibonacci benchmark in AutoFDO mode by 15% on x86_64-pc-linux-gnu. Tested on x86_64-pc-linux-gnu. gcc/ChangeLog: * auto-profile.cc (auto_profile): Hard-code the number of iterations (10). gcc/ChangeLog: * opts.cc (common_handle_option): Don't set param_early_inliner_max_iterations to 10 for AutoFDO.	2022-02-01 15:20:11 -08:00
Andrew Pinski	6bc732eba9	[COMMITTED] Change multiprecision.org to use https As reported at https://gcc.gnu.org/pipermail/gcc/2022-February/238216.html, multiprecision.org now uses https so this updates the documentation to use https instead of http. Committed as obvious. gcc/ChangeLog: * doc/install.texi:	2022-02-01 23:10:06 +00:00
Jonathan Wakely	2dc2f41728	libstdc++: Add more tests for filesystem directory iterators The PR 97731 test was added to verify a fix to the Filesystem TS code, but we should also have the same test to avoid similar regressions in the C++17 std::filesystem code. Also add tests for directory_options::follow_directory_symlink libstdc++-v3/ChangeLog: * testsuite/27_io/filesystem/iterators/97731.cc: New test. * testsuite/27_io/filesystem/iterators/recursive_directory_iterator.cc: Check follow_directory_symlink option. * testsuite/experimental/filesystem/iterators/recursive_directory_iterator.cc: Likewise.	2022-02-01 21:56:35 +00:00
Jonathan Wakely	ec09a5335f	libstdc++: Reset filesystem::recursive_directory_iterator on error The standard requires directory iterators to become equal to the end iterator value if they report an error. Some member functions of filesystem::recursive_directory_iterator fail to do that. libstdc++-v3/ChangeLog: * src/c++17/fs_dir.cc (recursive_directory_iterator::increment): Reset state to past-the-end iterator on error. (fs::recursive_directory_iterator::pop(error_code&)): Likewise. (fs::recursive_directory_iterator::pop()): Check _M_dirs before it might get reset. * src/filesystem/dir.cc (recursive_directory_iterator): Likewise, for the TS implementation. * testsuite/27_io/filesystem/iterators/error_reporting.cc: New test. * testsuite/experimental/filesystem/iterators/error_reporting.cc: New test.	2022-02-01 21:56:16 +00:00
Jonathan Wakely	90263a4830	libstdc++: Fix doxygen comment for filesystem::perms operators libstdc++-v3/ChangeLog: * include/bits/fs_fwd.h (filesystem::perms): Fix comment.	2022-02-01 21:53:15 +00:00
Jonathan Wakely	19b8946dbd	libstdc++: Improve config output for --enable-cstdio [PR104301] Currently we just print "checking for underlying I/O to use... stdio" unconditionally, whether configured to use stdio_pure or stdio_posix. We should make it clear that the user's configure option chose the right thing. libstdc++-v3/ChangeLog: PR libstdc++/104301 * acinclude.m4 (GLIBCXX_ENABLE_CSTDIO): Print different messages for stdio_pure and stdio_posix options. * configure: Regenerate.	2022-02-01 21:53:14 +00:00
Ilya Leoshkevich	8753b13a31	IBM Z: fix `section type conflict` with -mindirect-branch-table s390_code_end () puts indirect branch tables into separate sections and tries to switch back to wherever it was in the beginning by calling switch_to_section (current_function_section ()). First of all, this is unnecessary - the other backends don't do it. Furthermore, at this time there is no current function, but if the last processed function was cold, in_cold_section_p remains set. This causes targetm.asm_out.function_section () to call targetm.section_type_flags (), which in absence of current function decl classifies the section as SECTION_WRITE. This causes a section type conflict with the existing SECTION_CODE. gcc/ChangeLog: * config/s390/s390.cc (s390_code_end): Do not switch back to code section. gcc/testsuite/ChangeLog: * gcc.target/s390/nobp-section-type-conflict.c: New test.	2022-02-01 22:13:38 +01:00
Harald Anlauf	447047a8f9	Fortran: error recovery when simplifying EOSHIFT gcc/fortran/ChangeLog: PR fortran/104331 * simplify.cc (gfc_simplify_eoshift): Avoid NULL pointer dereference when shape is not set. gcc/testsuite/ChangeLog: PR fortran/104331 * gfortran.dg/eoshift_9.f90: New test.	2022-02-01 21:36:42 +01:00
Jakub Jelinek	95ac563540	libcpp: Fix up padding handling in funlike_invocation_p [PR104147] As mentioned in the PR, in some cases we preprocess incorrectly when we encounter an identifier which is defined as function-like macro, followed by at least 2 CPP_PADDING tokens and then some other identifier. On the following testcase, the problem is in the 3rd funlike_invocation_p, the tokens are CPP_NAME Y, CPP_PADDING (the pfile->avoid_paste shared token), CPP_PADDING (one created with padding_token, val.source is non-NULL and val.source->flags & PREV_WHITE is non-zero) and then another CPP_NAME. funlike_invocation_p remembers there was a padding token, but remembers the first one because of its condition, then the next token is the CPP_NAME, which is not CPP_OPEN_PAREN, so the CPP_NAME token is backed up, but as we can't easily back up more tokens, it pushes into a new context the padding token (the pfile->avoid_paste one). The net effect is that when Y is not defined as fun-like macro, we read Y, avoid_paste, padding_token, Y, while if Y is fun-like macro, we read Y, avoid_paste, avoid_paste, Y (the second avoid_paste is because that is how we handle end of a context). Now, for stringify_arg that is unfortunately a significant difference, which handles CPP_PADDING tokens with: if (token->type == CPP_PADDING) { if (source == NULL || (!(source->flags & PREV_WHITE) && token->val.source == NULL)) source = token->val.source; continue; } and later on /* Leading white space? */ if (dest - 1 != BUFF_FRONT (pfile->u_buff)) { if (source == NULL) source = token; if (source->flags & PREV_WHITE) *dest++ = ' '; } source = NULL; (and c-ppoutput.cc has similar code). So, when Y is not fun-like macro, ' ' is added because padding_token's val.source->flags & PREV_WHITE is non-zero, while when it is fun-like macro, we don't add ' ' in between, because source is NULL and so used from the next token (CPP_NAME Y), which doesn't have PREV_WHITE set.
Now, the funlike_invocation_p condition if (padding == NULL || (!(padding->flags & PREV_WHITE) && token->val.source == NULL)) padding = token; looks very similar to that in stringify_arg/c-ppoutput.cc, so I assume the intent was to do the same thing and pick the right padding. But there are significant differences. Both stringify_arg and c-ppoutput.cc don't remember the CPP_PADDING token, but its val.source instead, while in funlike_invocation_p we want to remember the padding token that has the significant information for stringify_arg/c-ppoutput.cc. So, IMHO we want to overwrite padding if: 1) padding == NULL (remember that there was any padding at all) 2) padding->val.source == NULL (this matches the source == NULL case in stringify_arg) 3) !(padding->val.source->flags & PREV_WHITE) && token->val.source == NULL (this matches the !(source->flags & PREV_WHITE) && token->val.source == NULL case in stringify_arg) 2022-02-01 Jakub Jelinek <jakub@redhat.com> PR preprocessor/104147 * macro.cc (funlike_invocation_p): For padding prefer a token with val.source non-NULL especially if it has PREV_WHITE set on val.source->flags. Add gcc_assert that CPP_PADDING tokens don't have PREV_WHITE set in flags. * c-c++-common/cpp/pr104147.c: New test.	2022-02-01 20:48:03 +01:00
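The three overwrite conditions listed in the message can be sketched as a standalone C predicate. Everything here is a simplified stand-in: `struct tok`, its `source` member (playing the role of val.source), the value chosen for `PREV_WHITE`, and the function name `should_replace_padding` are assumptions for illustration, not libcpp's actual types or the patched code.

```c
#include <stddef.h>

/* Stand-in for libcpp's PREV_WHITE flag bit; the value is assumed.  */
#define PREV_WHITE 1u

/* Simplified token: `source` plays the role of val.source on a
   CPP_PADDING token.  */
struct tok {
  unsigned flags;
  const struct tok *source;
};

/* Returns nonzero when `token` (a new padding token) should replace
   the currently remembered `padding`, following conditions 1-3.  */
int should_replace_padding(const struct tok *padding,
                           const struct tok *token)
{
  /* 1) No padding remembered yet.  */
  if (padding == NULL)
    return 1;

  /* 2) The remembered padding carries no source information.  */
  if (padding->source == NULL)
    return 1;

  /* 3) The remembered source has no PREV_WHITE and the new token has
        no source either.  */
  if (!(padding->source->flags & PREV_WHITE) && token->source == NULL)
    return 1;

  return 0;
}
```

In the scenario from the message, the first padding (the avoid_paste token, with no source) is remembered under rule 1, then replaced under rule 2 by the second padding whose source has PREV_WHITE set; after that, rule 3 no longer allows it to be overwritten.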
Jakub Jelinek	efc46b550f	libcpp: Avoid PREV_WHITE and other random content on CPP_PADDING tokens The funlike_invocation_p macro never triggered, the other asserts did on some tests, see below for a full list. This seems to be caused by #pragma/_Pragma handling. do_pragma does: pfile->directive_result.src_loc = pragma_token_virt_loc; pfile->directive_result.type = CPP_PRAGMA; pfile->directive_result.flags = pragma_token->flags; pfile->directive_result.val.pragma = p->u.ident; when it sees a pragma, while start_directive does: pfile->directive_result.type = CPP_PADDING; and so does _cpp_do__Pragma. Now, for #pragma lex.cc will just ignore directive_result if it has CPP_PADDING type: if (_cpp_handle_directive (pfile, result->flags & PREV_WHITE)) { if (pfile->directive_result.type == CPP_PADDING) continue; result = &pfile->directive_result; } but destringize_and_run does not: if (pfile->directive_result.type == CPP_PRAGMA) { ... } else { count = 1; toks = XNEW (cpp_token); toks[0] = pfile->directive_result; and from there it will copy type member of CPP_PADDING, but all the other members from the last CPP_PRAGMA before it. Small testcase for it with no option (at least no -fopenmp or -fopenmp-simd). #pragma GCC push_options #pragma GCC ignored "-Wformat" #pragma GCC pop_options void foo () { _Pragma ("omp simd") for (int i = 0; i < 64; i++) ; } Here is a patch that replaces those toks = XNEW (cpp_token); toks[0] = pfile->directive_result; lines with toks = &pfile->avoid_paste; 2022-02-01 Jakub Jelinek <jakub@redhat.com> * directives.cc (destringize_and_run): Push &pfile->avoid_paste instead of a copy of pfile->directive_result for the CPP_PADDING case.	2022-02-01 20:42:49 +01:00
Jakub Jelinek	fa882c3e3b	rs6000: Fix up PCH on powerpc* [PR104323] As mentioned in the PR and as can be seen on: --- gcc/testsuite/gcc.dg/pch/pr104323-1.c.jj 2022-02-01 13:06:00.163192414 +0100 +++ gcc/testsuite/gcc.dg/pch/pr104323-1.c 2022-02-01 13:13:41.226712735 +0100 @@ -0,0 +1,16 @@ +/* PR target/104323 */ +/* { dg-require-effective-target powerpc_altivec_ok } */ +/* { dg-options "-maltivec" } */ + +#include "pr104323-1.h" + +__vector int a1 = { 100, 200, 300, 400 }; +__vector int a2 = { 500, 600, 700, 800 }; +__vector int r; + +int +main () +{ + r = vec_add (a1, a2); + return 0; +} --- gcc/testsuite/gcc.dg/pch/pr104323-1.hs.jj 2022-02-01 13:06:03.180149978 +0100 +++ gcc/testsuite/gcc.dg/pch/pr104323-1.hs 2022-02-01 13:12:30.175706620 +0100 @@ -0,0 +1,5 @@ +/* PR target/104323 */ +/* { dg-require-effective-target powerpc_altivec_ok } */ +/* { dg-options "-maltivec" } */ + +#include <altivec.h> testcase which I'm not including into testsuite because for some reason the test fails on non-powerpc targets (is done even on those and fails because of missing altivec.h etc.), PCH is broken on powerpc*-*-* since the new builtin generator has been introduced. The generator contains or emits comments like: /* #### Cannot mark this as a GC root because only pointer types can be marked as GTY((user)) and be GC roots. All trees in here are kept alive by other globals, so not a big deal. Alternatively, we could change the enum fields to ints and cast them in and out to avoid requiring a GTY((user)) designation, but that seems unnecessarily gross. */ Having the fntypes stored in other GC roots can work fine for GC, ggc_collect will then always mark them and so they won't disappear from the tables, but it definitely doesn't work for PCH, which when the arrays with fntype members aren't GTY marked means on PCH write we create copies of those FUNCTION_TYPEs and store in .gch that the GC roots should be updated, but don't store that rs6000_builtin_info[?].fntype etc. should be updated.
When PCH is read again, the blob is read at some other address, GC roots are updated, rs6000_builtin_info[?].fntype contains garbage pointers (GC freed pointers with random data, or random unrelated types or other trees). The following patch fixes that. It stops any user markings because that is totally unnecessary, just skips fields we don't need to mark and adds GTY(()) to the 2 array variables. We can get rid of all those global vars for the fn types, they can be now automatic vars. With the patch we get { &rs6000_instance_info[0].fntype, 1 * (RS6000_INST_MAX), sizeof (rs6000_instance_info[0]), &gt_ggc_mx_tree_node, &gt_pch_nx_tree_node }, { &rs6000_builtin_info[0].fntype, 1 * (RS6000_BIF_MAX), sizeof (rs6000_builtin_info[0]), &gt_ggc_mx_tree_node, &gt_pch_nx_tree_node }, as the new roots which is exactly what we want and significantly more compact than countless { &uv2di_ftype_pudi_usi, 1, sizeof (uv2di_ftype_pudi_usi), &gt_ggc_mx_tree_node, &gt_pch_nx_tree_node }, { &uv2di_ftype_lg_puv2di, 1, sizeof (uv2di_ftype_lg_puv2di), &gt_ggc_mx_tree_node, &gt_pch_nx_tree_node }, { &uv2di_ftype_lg_pudi, 1, sizeof (uv2di_ftype_lg_pudi), &gt_ggc_mx_tree_node, &gt_pch_nx_tree_node }, { &uv2di_ftype_di_puv2di, 1, sizeof (uv2di_ftype_di_puv2di), &gt_ggc_mx_tree_node, &gt_pch_nx_tree_node }, cases (822 of these instead of just those 4 shown). 2022-02-01 Jakub Jelinek <jakub@redhat.com> PR target/104323 * config/rs6000/t-rs6000 (EXTRA_GTYPE_DEPS): Append rs6000-builtins.h rather than $(srcdir)/config/rs6000/rs6000-builtins.def. * config/rs6000/rs6000-gen-builtins.cc (write_decls): Don't use GTY((user)) for struct bifdata and struct ovlddata. Instead add GTY((skip(""))) to members with pointer and enum types that don't need to be tracked. Add GTY(()) to rs6000_builtin_info and rs6000_instance_info declarations. Don't emit gt_ggc_mx and gt_pch_nx declarations. (write_extern_fntype, write_fntype): Remove. 
(write_fntype_init): Emit the fntype vars as automatic vars instead of file scope ones. (write_header_file): Don't iterate with write_extern_fntype. (write_init_file): Don't iterate with write_fntype. Don't emit gt_ggc_mx and gt_pch_nx definitions.	2022-02-01 20:42:04 +01:00
Jason Merrill	8a37897862	c++: lambda in template default argument [PR103186] The problem with this testcase was that since my patch for PR97900 we weren't preserving DECL_UID identity for parameters of instantiations of templated functions, so using those parameters as the keys for the defarg_inst map broke. I think this was always fragile given the possibility of redeclarations, so instead of reverting that change let's switch to keying off the function. Memory use compiling stdc++.h is not noticeably different. PR c++/103186 gcc/cp/ChangeLog: * pt.cc (defarg_inst): Use tree_vec_map_cache_hasher. (defarg_insts_for): New. (tsubst_default_argument): Adjust. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/lambda/lambda-defarg10.C: New test.	2022-02-01 14:14:12 -05:00
Jason Merrill	b649071d4b	tree: move tree_vec_map_cache_hasher into header gcc/ChangeLog: * tree.h (struct tree_vec_map_cache_hasher): Move from... * tree.cc (struct tree_vec_map_cache_hasher): ...here.	2022-02-01 14:14:12 -05:00
Tom de Vries	f32f74c2e8	[nvptx] Add uniform_warp_check insn On a GT 1030, with driver version 470.94 and -mptx=3.1 I run into: ... FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims.c \ -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none \ -O2 execution test ... which minimizes to the same test-case as listed in commit "[nvptx] Update default ptx isa to 6.3". The problem is again that the first diverging branch is not handled as such in SASS, which causes problems with a subsequent shfl insn, but given that we have -mptx=3.1 we can't use the bar.warp.sync insn. Given that the default is now -mptx=6.3, and consequently -mptx=3.1 is of a lesser importance, implement the next best thing: abort when detecting non-convergence using this insn: ... { .reg.b32 act; vote.ballot.b32 act,1; .reg.pred uni; setp.eq.b32 uni,act,0xffffffff; @ !uni trap; @ !uni exit; } ... Interestingly, the effect of this is that rather than aborting, the test-case now passes. Tested on x86_64 with nvptx accelerator. gcc/ChangeLog: 2022-01-31 Tom de Vries <tdevries@suse.de> * config/nvptx/nvptx.cc (nvptx_single): Use nvptx_uniform_warp_check. * config/nvptx/nvptx.md (define_c_enum "unspecv"): Add UNSPECV_UNIFORM_WARP_CHECK. (define_insn "nvptx_uniform_warp_check"): New define_insn.	2022-02-01 19:29:01 +01:00
Tom de Vries	bba61d403d	[nvptx] Add bar.warp.sync On a GT 1030 (sm_61), with driver version 470.94 I run into: ... FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims.c \ -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none \ -O2 execution test ... which minimizes to the same test-case as listed in commit "[nvptx] Update default ptx isa to 6.3". The first divergent branch looks like: ... { .reg .u32 %x; mov.u32 %x,%tid.x; setp.ne.u32 %r59,%x,0; } @ %r59 bra $L15; mov.u64 %r48,%ar0; mov.u32 %r22,2; ld.u64 %r53,[%r48]; mov.u32 %r55,%r22; mov.u32 %r54,1; $L15: ... and when inspecting the generated SASS, the branch is not set up as a divergent branch, but instead as a regular branch. This causes us to execute a shfl.sync insn in divergent mode, which is likely to cause trouble given a remark in the ptx isa version 6.3, which mentions that for .target sm_6x or below, all threads must execute the same shfl.sync instruction in convergence. Fix this by placing a "bar.warp.sync 0xffffffff" at the desired convergence point (in the example above, after $L15). Tested on x86_64 with nvptx accelerator. gcc/ChangeLog: 2022-01-31 Tom de Vries <tdevries@suse.de> * config/nvptx/nvptx.cc (nvptx_single): Use nvptx_warpsync. * config/nvptx/nvptx.md (define_c_enum "unspecv"): Add UNSPECV_WARPSYNC. (define_insn "nvptx_warpsync"): New define_insn.	2022-02-01 19:28:57 +01:00
Tom de Vries	8ff0669f6d	[nvptx] Update default ptx isa to 6.3 With the following example, minimized from parallel-dims.c: ... int main (void) { int vectors_max = -1; #pragma acc parallel num_gangs (1) num_workers (1) copy (vectors_max) { for (int i = 0; i < 2; i++) for (int j = 0; j < 2; j++) #pragma acc loop vector reduction (max: vectors_max) for (int k = 0; k < 32; k++) vectors_max = k; } if (vectors_max != 31) __builtin_abort (); return 0; } ... I run into (T400, driver version 470.94): ... FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims.c \ -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 \ execution test ... The FAIL does not happen with GOMP_NVPTX_JIT=-O0. The problem seems to be that the shfl insns for the vector reduction are not executed uniformly by the warp. Enforcing this by using shfl.sync fixes the problem. Fix this by setting the ptx isa to 6.3 by default, which allows the use of shfl.sync. Tested on x86_64 with nvptx accelerator. gcc/ChangeLog: 2022-01-27 Tom de Vries <tdevries@suse.de> * config/nvptx/nvptx.opt (mptx): Set to PTX_VERSION_6_3 by default.	2022-02-01 19:28:52 +01:00
Tom de Vries	57f971f992	[nvptx] Update bar.sync for ptx isa 6.0 In ptx isa 6.0, a new barrier instruction was added, and bar.sync was redefined as barrier.sync.aligned. The aligned modifier indicates that all threads in a CTA will execute the same barrier instruction. This seems fine for a form "bar.sync 0". But a "bar.sync %rx,64" (as used for vector length > 32) may execute a different barrier depending on the value of %rx, so we can't assume it's aligned. Fix this by using "barrier.sync %rx,64" instead. Tested on x86_64 with nvptx accelerator. gcc/ChangeLog: 2022-01-27 Tom de Vries <tdevries@suse.de> * config/nvptx/nvptx-opts.h (enum ptx_version): Add PTX_VERSION_6_0. * config/nvptx/nvptx.h (TARGET_PTX_6_0): New macro. * config/nvptx/nvptx.md (define_insn "nvptx_barsync"): Use barrier insn for TARGET_PTX_6_0.	2022-02-01 19:28:48 +01:00
Tom de Vries	456de10c54	[nvptx] Handle nop in prevent_branch_around_nothing When running libgomp test-case reduction-7.c on an nvptx accelerator (T400, driver version 470.86) and GOMP_NVPTX_JIT=-O0, I run into: ... reduction-7.exe:reduction-7.c:312: v_p_2: \ Assertion `out[j * 32 + i] == (i + j) * 2' failed. FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/reduction-7.c \ -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none \ -O0 execution test ... During investigation I found ptx code like this: ... @ %r163 bra $L262; $L262: ... There's a known problem with executing this type of code, and a workaround is in place to address this: prevent_branch_around_nothing. The workaround does not trigger though because it doesn't handle the nop insn. Fix this by handling the nop insn in prevent_branch_around_nothing. Tested libgomp on x86_64 with nvptx accelerator. gcc/ChangeLog: 2022-01-27 Tom de Vries <tdevries@suse.de> PR target/100428 * config/nvptx/nvptx.cc (prevent_branch_around_nothing): Handle nop insn.	2022-02-01 19:28:39 +01:00
Tom de Vries	e0451f93d9	[nvptx] Add some support for .local atomics The ptx insn atom doesn't support local memory. In case of doing an atomic operation on local memory, we run into: ... operation not supported on global/shared address space ... This is the cuGetErrorString message for CUDA_ERROR_INVALID_ADDRESS_SPACE. The message is somewhat confusing given that actually the operation is not supported on local address space. Fix this by falling back on a non-atomic version when detecting a frame-related memory operand. This only solves some cases that are detected at compile-time. It does however fix the openacc private-atomic-* test-cases. Tested on x86_64 with nvptx accelerator. gcc/ChangeLog: 2022-01-27 Tom de Vries <tdevries@suse.de> * config/nvptx/nvptx.md (define_insn "atomic_compare_and_swap<mode>_1") (define_insn "atomic_exchange<mode>") (define_insn "atomic_fetch_add<mode>") (define_insn "atomic_fetch_addsf") (define_insn "atomic_fetch_<logic><mode>"): Output non-atomic version if memory operands is frame-relative. gcc/testsuite/ChangeLog: 2022-01-31 Tom de Vries <tdevries@suse.de> * gcc.target/nvptx/stack-atomics-run.c: New test. libgomp/ChangeLog: 2022-01-27 Tom de Vries <tdevries@suse.de> * testsuite/libgomp.oacc-c-c++-common/private-atomic-1.c: Remove PR83812 workaround. * testsuite/libgomp.oacc-fortran/private-atomic-1-vector.f90: Same. * testsuite/libgomp.oacc-fortran/private-atomic-1-worker.f90: Same.	2022-02-01 19:28:24 +01:00
Tom de Vries	ca902055d0	[nvptx] Fix reduction lock When I run the libgomp test-case reduction-cplx-dbl.c on an nvptx accelerator (T400, driver version 470.86), I run into: ... FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/reduction-cplx-dbl.c \ -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 \ execution test FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/reduction-cplx-dbl.c \ -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 \ execution test ... The problem is in this code generated for a gang reduction: ... $L39: atom.global.cas.b32 %r59, [__reduction_lock], 0, 1; setp.ne.u32 %r116, %r59, 0; @%r116 bra $L39; ld.f64 %r60, [%r44]; ld.f64 %r61, [%r44+8]; ld.f64 %r64, [%r44]; ld.f64 %r65, [%r44+8]; add.f64 %r117, %r64, %r22; add.f64 %r118, %r65, %r41; st.f64 [%r44], %r117; st.f64 [%r44+8], %r118; atom.global.cas.b32 %r119, [__reduction_lock], 1, 0; ... which is taking and releasing a lock, but missing the appropriate barriers to protect the loads and store inside the lock. Fix this by adding membar.gl barriers. Likewise, add membar.cta barriers if we protect shared memory loads and stores (even though the worker-partitioning part of the test-case is not failing). Tested on x86_64 with nvptx accelerator. gcc/ChangeLog: 2022-01-27 Tom de Vries <tdevries@suse.de> * config/nvptx/nvptx.cc (enum nvptx_builtins): Add NVPTX_BUILTIN_MEMBAR_GL and NVPTX_BUILTIN_MEMBAR_CTA. (VOID): New macro. (nvptx_init_builtins): Add MEMBAR_GL and MEMBAR_CTA. (nvptx_expand_builtin): Handle NVPTX_BUILTIN_MEMBAR_GL and NVPTX_BUILTIN_MEMBAR_CTA. (nvptx_lockfull_update): Add level parameter. Emit barriers. (nvptx_reduction_update, nvptx_goacc_reduction_fini): Update call to nvptx_lockfull_update. * config/nvptx/nvptx.md (define_c_enum "unspecv"): Add UNSPECV_MEMBAR_GL. (define_expand "nvptx_membar_gl"): New expand. (define_insn "*nvptx_membar_gl"): New insn.	2022-02-01 19:28:04 +01:00
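The locking pattern described in the commit above (take a lock with compare-and-swap, update the protected value, release the lock, with barriers around the protected loads and stores) can be sketched on the host with C11 atomics. This is only an analogue under stated assumptions, not the PTX that GCC emits: `reduction_lock` and `locked_complex_add` are hypothetical names, and the sequentially consistent fences stand in for the `membar.gl` barriers the commit mentions.

```c
#include <stdatomic.h>

/* Hypothetical host-side stand-in for the device-side __reduction_lock. */
atomic_int reduction_lock;

/* Add (re, im) to the complex value stored in acc[0], acc[1] under the lock. */
void locked_complex_add(double acc[2], double re, double im)
{
    int expected = 0;

    /* Take the lock: spin until the 0 -> 1 swap succeeds. */
    while (!atomic_compare_exchange_strong_explicit(&reduction_lock,
                                                    &expected, 1,
                                                    memory_order_relaxed,
                                                    memory_order_relaxed))
        expected = 0;  /* a failed CAS stored the current value in 'expected' */

    /* Barrier after taking the lock, so the loads below are not hoisted
       above it (assumed analogue of membar.gl). */
    atomic_thread_fence(memory_order_seq_cst);

    acc[0] += re;
    acc[1] += im;

    /* Barrier before releasing the lock, so the stores above are visible
       to the next lock holder. */
    atomic_thread_fence(memory_order_seq_cst);

    /* Release the lock. */
    atomic_store_explicit(&reduction_lock, 0, memory_order_relaxed);
}
```

Without the two fences, the relaxed lock operations alone would not order the accumulation relative to the lock, which is the kind of problem the commit describes.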
Thomas Rodgers	07a971b28c	Strengthen memory order for atomic<T>::wait/notify This matches the memory order in libc++. libstdc++-v3/ChangeLog: * include/bits/atomic_wait.h: Change memory order from Acquire/Release with relaxed loads to SeqCst+Release for accesses to the waiter's count.	2022-02-01 09:04:10 -08:00
Martin Liska	3ad29854f0	docs: remove --disable-stage1-checking from requirements As the minimal GCC version that can build the current master is 4.8, it does not make sense mentioning something for older versions. gcc/ChangeLog: * doc/install.texi: Remove option for GCC < 4.8.	2022-02-01 17:14:01 +01:00
Jakub Jelinek	e9bf6d6b0e	veclower: Fix up -fcompare-debug issue in expand_vector_comparison [PR104307] The following testcase fails -fcompare-debug, because expand_vector_comparison since r11-1786-g1ac9258cca8030745d3c0b8f63186f0adf0ebc27 sets vec_cond_expr_only when it sees some use other than VEC_COND_EXPR that uses the lhs in its condition. Obviously we should ignore debug stmts when doing so, e.g. by not pushing them to uses. That would be a 2 liner change, but while looking at it, I'm also worried about VEC_COND_EXPRs that would use the lhs in more than one operand, like VEC_COND_EXPR <lhs, lhs, something> or VEC_COND_EXPR <lhs, something, lhs> (sure, they ought to be folded, but what if they weren't). Because if something like that happens, then FOR_EACH_IMM_USE_FAST would push the same stmt multiple times and expand_vector_condition can return true even when it modifies it (for vector bool masking). And lastly, it seems quite wasteful to safe_push statements that will just cause vec_cond_expr_only = false; and break; in the second loop, both for cases like 1000 immediate non-VEC_COND_EXPR uses and for cases like 999 VEC_COND_EXPRs with lhs in cond followed by a single non-VEC_COND_EXPR use. So this patch only pushes VEC_COND_EXPRs there. 2022-02-01 Jakub Jelinek <jakub@redhat.com> PR middle-end/104307 * tree-vect-generic.cc (expand_vector_comparison): Don't push debug stmts to uses vector, just set vec_cond_expr_only to false for non-VEC_COND_EXPRs instead of pushing them into uses. Treat VEC_COND_EXPRs that use lhs not just in rhs1, but rhs2 or rhs3 too like non-VEC_COND_EXPRs. * gcc.target/i386/pr104307.c: New test.	2022-02-01 16:02:54 +01:00
Bill Schmidt	7e83607907	rs6000: Don't #ifdef "short" built-in names It was recently pointed out that we get anomalous behavior when using __attribute__((target)) to select a CPU. As an example, when building for -mcpu=power8 but using __attribute__((target("mcpu=power10")), it is legal to call __builtin_vec_mod, but not vec_mod, even though these are equivalent. This is because the equivalence is established with a #define that is guarded by #ifdef _ARCH_PWR10. This goofy behavior occurs with both the old builtins support and the new. One of the goals of the new builtins support was to make sure all appropriate interfaces are available using __attribute__((target)), so I failed in this respect. This patch corrects the problem by removing the ifdef. Note that in a few cases we use an ifdef in a way that can't be overridden by __attribute__((target)), and we need to keep those. For example, #ifdef __PPU__ is still appropriate. 2022-01-06 Bill Schmidt <wschmidt@linux.ibm.com> gcc/ * config/rs6000/rs6000-overload.def (VEC_ABSD): Remove #ifdef token. (VEC_BLENDV): Likewise. (VEC_BPERM): Likewise. (VEC_CFUGE): Likewise. (VEC_CIPHER_BE): Likewise. (VEC_CIPHERLAST_BE): Likewise. (VEC_CLRL): Likewise. (VEC_CLRR): Likewise. (VEC_CMPNEZ): Likewise. (VEC_CNTLZ): Likewise. (VEC_CNTLZM): Likewise. (VEC_CNTTZM): Likewise. (VEC_CNTLZ_LSBB): Likewise. (VEC_CNTM): Likewise. (VEC_CNTTZ): Likewise. (VEC_CNTTZ_LSBB): Likewise. (VEC_CONVERT_4F32_8F16): Likewise. (VEC_DIV): Likewise. (VEC_DIVE): Likewise. (VEC_EQV): Likewise. (VEC_EXPANDM): Likewise. (VEC_EXTRACT_FP_FROM_SHORTH): Likewise. (VEC_EXTRACT_FP_FROM_SHORTL): Likewise. (VEC_EXTRACTH): Likewise. (VEC_EXTRACTL): Likewise. (VEC_EXTRACTM): Likewise. (VEC_EXTRACT4B): Likewise. (VEC_EXTULX): Likewise. (VEC_EXTURX): Likewise. (VEC_FIRSTMATCHINDEX): Likewise. (VEC_FIRSTMACHOREOSINDEX): Likewise. (VEC_FIRSTMISMATCHINDEX): Likewise. (VEC_FIRSTMISMATCHOREOSINDEX): Likewise. (VEC_GB): Likewise. (VEC_GENBM): Likewise. (VEC_GENHM): Likewise. 
(VEC_GENWM): Likewise. (VEC_GENDM): Likewise. (VEC_GENQM): Likewise. (VEC_GENPCVM): Likewise. (VEC_GNB): Likewise. (VEC_INSERTH): Likewise. (VEC_INSERTL): Likewise. (VEC_INSERT4B): Likewise. (VEC_LXVL): Likewise. (VEC_MERGEE): Likewise. (VEC_MERGEO): Likewise. (VEC_MOD): Likewise. (VEC_MSUB): Likewise. (VEC_MULH): Likewise. (VEC_NAND): Likewise. (VEC_NCIPHER_BE): Likewise. (VEC_NCIPHERLAST_BE): Likewise. (VEC_NEARBYINT): Likewise. (VEC_NMADD): Likewise. (VEC_ORC): Likewise. (VEC_PDEP): Likewise. (VEC_PERMX): Likewise. (VEC_PEXT): Likewise. (VEC_POPCNT): Likewise. (VEC_PARITY_LSBB): Likewise. (VEC_REPLACE_ELT): Likewise. (VEC_REPLACE_UN): Likewise. (VEC_REVB): Likewise. (VEC_RINT): Likewise. (VEC_RLMI): Likewise. (VEC_RLNM): Likewise. (VEC_SBOX_BE): Likewise. (VEC_SIGNEXTI): Likewise. (VEC_SIGNEXTLL): Likewise. (VEC_SIGNEXTQ): Likewise. (VEC_SLDB): Likewise. (VEC_SLV): Likewise. (VEC_SPLATI): Likewise. (VEC_SPLATID): Likewise. (VEC_SPLATI_INS): Likewise. (VEC_SQRT): Likewise. (VEC_SRDB): Likewise. (VEC_SRV): Likewise. (VEC_STRIL): Likewise. (VEC_STRIL_P): Likewise. (VEC_STRIR): Likewise. (VEC_STRIR_P): Likewise. (VEC_STXVL): Likewise. (VEC_TERNARYLOGIC): Likewise. (VEC_TEST_LSBB_ALL_ONES): Likewise. (VEC_TEST_LSBB_ALL_ZEROS): Likewise. (VEC_VEE): Likewise. (VEC_VES): Likewise. (VEC_VIE): Likewise. (VEC_VPRTYB): Likewise. (VEC_VSCEEQ): Likewise. (VEC_VSCEGT): Likewise. (VEC_VSCELT): Likewise. (VEC_VSCEUO): Likewise. (VEC_VSEE): Likewise. (VEC_VSES): Likewise. (VEC_VSIE): Likewise. (VEC_VSTDC): Likewise. (VEC_VSTDCN): Likewise. (VEC_VTDC): Likewise. (VEC_XL): Likewise. (VEC_XL_BE): Likewise. (VEC_XL_LEN_R): Likewise. (VEC_XL_SEXT): Likewise. (VEC_XL_ZEXT): Likewise. (VEC_XST): Likewise. (VEC_XST_BE): Likewise. (VEC_XST_LEN_R): Likewise. (VEC_XST_TRUNC): Likewise. (VEC_XXPERMDI): Likewise. (VEC_XXSLDWI): Likewise. (VEC_TSTSFI_EQ_DD): Likewise. (VEC_TSTSFI_EQ_TD): Likewise. (VEC_TSTSFI_GT_DD): Likewise. (VEC_TSTSFI_GT_TD): Likewise. (VEC_TSTSFI_LT_DD): Likewise. 
(VEC_TSTSFI_LT_TD): Likewise. (VEC_TSTSFI_OV_DD): Likewise. (VEC_TSTSFI_OV_TD): Likewise. (VEC_VADDCUQ): Likewise. (VEC_VADDECUQ): Likewise. (VEC_VADDEUQM): Likewise. (VEC_VADDUDM): Likewise. (VEC_VADDUQM): Likewise. (VEC_VBPERMQ): Likewise. (VEC_VCLZB): Likewise. (VEC_VCLZD): Likewise. (VEC_VCLZH): Likewise. (VEC_VCLZW): Likewise. (VEC_VCTZB): Likewise. (VEC_VCTZD): Likewise. (VEC_VCTZH): Likewise. (VEC_VCTZW): Likewise. (VEC_VEEDP): Likewise. (VEC_VEESP): Likewise. (VEC_VESDP): Likewise. (VEC_VESSP): Likewise. (VEC_VIEDP): Likewise. (VEC_VIESP): Likewise. (VEC_VPKSDSS): Likewise. (VEC_VPKSDUS): Likewise. (VEC_VPKUDUM): Likewise. (VEC_VPKUDUS): Likewise. (VEC_VPOPCNT): Likewise. (VEC_VPOPCNTB): Likewise. (VEC_VPOPCNTD): Likewise. (VEC_VPOPCNTH): Likewise. (VEC_VPOPCNTW): Likewise. (VEC_VPRTYBD): Likewise. (VEC_VPRTYBQ): Likewise. (VEC_VPRTYBW): Likewise. (VEC_VRLD): Likewise. (VEC_VSLD): Likewise. (VEC_VSRAD): Likewise. (VEC_VSRD): Likewise. (VEC_VSTDCDP): Likewise. (VEC_VSTDCNDP): Likewise. (VEC_VSTDCNQP): Likewise. (VEC_VSTDCNSP): Likewise. (VEC_VSTDCQP): Likewise. (VEC_VSTDCSP): Likewise. (VEC_VSUBECUQ): Likewise. (VEC_VSUBEUQM): Likewise. (VEC_VSUBUDM): Likewise. (VEC_VSUBUQM): Likewise. (VEC_VTDCDP): Likewise. (VEC_VTDCSP): Likewise. (VEC_VUPKHSW): Likewise. (VEC_VUPKLSW): Likewise.	2022-02-01 08:55:48 -06:00
Andreas Krebbel	b9ebf6c330	PR101260 regcprop: Add mode change check for copy reg When propagating a multi-word register into an access with a smaller mode the can_change_mode backend hook is already consulted for the original register. This however is also required for the intermediate copy in copy_regno which might use a different register class. gcc/ChangeLog: PR rtl-optimization/101260 * regcprop.cc (maybe_mode_change): Invoke mode_change_ok also for copy_regno. gcc/testsuite/ChangeLog: PR rtl-optimization/101260 * gcc.target/s390/pr101260.c: New testcase.	2022-02-01 13:33:55 +01:00
Xi Ruoyao	34afa19d29	fold-const: do not fold NaN result from non-NaN operands [PR95115] These operations should raise an invalid operation exception at runtime. So they should not be folded during compilation unless -fno-trapping-math is used. gcc/ PR middle-end/95115 * fold-const.cc (const_binop): Do not fold NaN result from non-NaN operands. gcc/testsuite * gcc.dg/pr95115.c: New test.	2022-02-01 18:20:57 +08:00
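As a rough illustration of the class of operations the commit above is about, the following C sketch checks at run time whether an arithmetic operation yields NaN although neither operand is NaN. The helper name `nan_from_non_nan` is hypothetical, and the exact condition used in `const_binop` is not shown in the commit message, so this is not a copy of the GCC logic.

```c
#include <math.h>

/* Returns 1 if 'a op b' yields NaN although neither operand is NaN. */
int nan_from_non_nan(double a, char op, double b)
{
    /* volatile keeps the arithmetic at run time instead of letting the
       compiler evaluate it during compilation. */
    volatile double x = a, y = b;
    double r;

    switch (op) {
    case '+': r = x + y; break;
    case '-': r = x - y; break;
    case '*': r = x * y; break;
    case '/': r = x / y; break;
    default:  return 0;   /* unknown operator: nothing to report */
    }
    return !isnan(a) && !isnan(b) && isnan(r);
}
```

Typical inputs that trigger it are `0.0 / 0.0`, `INFINITY - INFINITY` and `0.0 * INFINITY`, while `1.0 / 0.0` gives an infinity rather than a NaN.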
Tom de Vries	d43fbc7d3f	[libgomp, testsuite] Fix insufficient resources in test-cases When running libgomp test-case broadcast-many.c on an nvptx accelerator (T400, driver version 470.86), I run into: ... libgomp: The Nvidia accelerator has insufficient resources to launch \ 'main$_omp_fn$0' with num_workers = 32 and vector_length = 32; \ recompile the program with 'num_workers = x and vector_length = y' on \ that offloaded region or '-fopenacc-dim=:x:y' where x * y <= 896. FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/broadcast-many.c \ -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none \ -O0 execution test ... The error does not occur when using GOMP_NVPTX_JIT=-O0. Fix this by using 896 / 32 == 28 workers for ACC_DEVICE_TYPE_nvidia. Likewise for some other test-cases. Tested libgomp on x86_64 with nvptx accelerator. libgomp/ChangeLog: 2022-01-27 Tom de Vries <tdevries@suse.de> * testsuite/libgomp.oacc-c-c++-common/broadcast-many.c: Reduce num_workers for nvidia accelerator to fix libgomp error 'insufficient resources'. * testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c: Same. * testsuite/libgomp.oacc-c-c++-common/reduction-7.c: Same.	2022-02-01 08:15:00 +01:00
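The worker count chosen in the commit above follows from the bound quoted in the libgomp message (x * y <= 896 with a vector length of 32). A minimal C sketch of that arithmetic, using a hypothetical helper name `max_num_workers` that does not exist in libgomp:

```c
/* Largest num_workers such that num_workers * vector_length <= limit.
   Assumes vector_length is nonzero. */
unsigned max_num_workers(unsigned limit, unsigned vector_length)
{
    return limit / vector_length;
}
```

With the limit from the message, `max_num_workers(896, 32)` gives the 28 workers the test-cases now use.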
Tom de Vries	be362d5e12	[libgomp, testsuite] Reduce recursion depth in declare_target-*.f90 When running the libgomp testsuite with GOMP_NVPTX_JIT=-O0 using an nvptx accelerator (Nvidia T400, 2GB), I run into: ... libgomp: cuCtxSynchronize error: unspecified launch failure \ (perhaps abort was called) libgomp: cuMemFree_v2 error: unspecified launch failure libgomp: device finalization failed FAIL: libgomp.fortran/examples-4/declare_target-1.f90 -O0 execution test ... The test-case contains: ... ! Reduced from 25 to 23, otherwise execution runs out of thread stack on ! Nvidia Titan V. if (fib (23) /= fib_wrapper (23)) stop 2 ... Fix this by reducing the fib/fib_wrapper argument from 23 to 22. Same for declare_target-2.f90. Tested on x86_64 with nvptx accelerator. libgomp/ChangeLog: 2022-01-27 Tom de Vries <tdevries@suse.de> * testsuite/libgomp.fortran/examples-4/declare_target-1.f90: Reduce recursion depth. * testsuite/libgomp.fortran/examples-4/declare_target-2.f90: Same.	2022-02-01 08:13:06 +01:00
Tom de Vries	2989516651	[ldist] Don't add lib calls with -fno-tree-loop-distribute-patterns As mentioned in PR56888 comment 21: ... -fno-tree-loop-distribute-patterns is the reliable way to not transform loops into library calls. ... However, since commit `6f966f0614` ("ldist: Recognize strlen and rawmemchr like loops") a strlen or rawmemchr library call may be introduced by ldist. This caused regressions in testcases gcc.c-torture/execute/builtins/strlen{,-2,-3}.c for nvptx. Fix this by not calling transform_reduction_loop from loop_distribution::execute for -fno-tree-loop-distribute-patterns. Tested regressed test-cases as well as gcc.dg/tree-ssa/ldist-*.c on nvptx. gcc/ChangeLog: 2022-01-31 Tom de Vries <tdevries@suse.de> * tree-loop-distribution.cc (generate_reduction_builtin_1): Check for -ftree-loop-distribute-patterns. (loop_distribution::execute): Don't call transform_reduction_loop for -fno-tree-loop-distribute-patterns. gcc/testsuite/ChangeLog: 2022-01-31 Tom de Vries <tdevries@suse.de> * gcc.dg/tree-ssa/ldist-strlen-4.c: New test.	2022-02-01 08:12:24 +01:00
GCC Administrator	1bb5266257	Daily bump.	2022-02-01 00:16:29 +00:00
Andrew Pinski	691924db0d	Fix comment for operand_compare::operand_equal_p. The OEP_* enums were moved to tree-core.h in r0-124973-g5e351e960763 but the comment was correct when it was added to fold-const.h in r10-4231-g7f4a8ee03d40. This fixes the reference to the OEP_* enum to reference tree-core. Committed as obvious after a bootstrap/test on x86_64-linux. gcc/ChangeLog: * fold-const.h (operand_compare::operand_equal_p): Fix comment about OEP_* flags.	2022-01-31 23:26:18 +00:00
Ed Smith-Rowland	43ee212764	MAINTAINERS: Update my email and add myself to the DCO list. ChangeLog: 2022-01-31 Ed Smith-Rowland <esmithrowland@gmail.com> * MAINTAINERS: Update my email and add myself to the DCO list.	2022-01-31 18:05:40 -05:00
Marek Polacek	874ad5d674	c++: ICE with auto[] and VLA [PR102414] Here we ICE in unify_array_domain when we're trying to deduce the type of an array, as in auto(*p)[i] = (int(*)[i])0; but unify_array_domain doesn't handle arbitrarily complex bounds. Another test is, e.g., auto (*b)[0/0] = &a; where the type of the array is <<< Unknown tree: template_type_parm >>>[0:(sizetype) ((ssizetype) (0 / 0) - 1)] It seems to me that we need not handle these. PR c++/102414 PR c++/101874 gcc/cp/ChangeLog: * decl.cc (create_array_type_for_decl): Use template_placeholder_p. Sorry on a variable-length array of auto. gcc/testsuite/ChangeLog: * g++.dg/cpp23/auto-array3.C: New test. * g++.dg/cpp23/auto-array4.C: New test.	2022-01-31 15:35:59 -05:00
Marek Polacek	b1a8b92f8f	c++: Reject union std::initializer_list [PR102434] Weird things are going to happen if you define your std::initializer_list as a union. In this case, we crash in output_constructor_regular_field. Let's not allow such a definition in the first place. PR c++/102434 gcc/cp/ChangeLog: * class.cc (finish_struct): Don't allow union initializer_list. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/initlist128.C: New test.	2022-01-31 15:35:20 -05:00


191409 Commits