OpenE2K/gcc - gcc - Expired Mentality Git

Commit Graph

Author	SHA1	Message	Date
Jakub Jelinek	eeb31391b7	combine: Fix find_split_point handling of constant store into ZERO_EXTRACT [PR93908] git is miscompiled on s390x-linux with -O2 -march=zEC12 -mtune=z13. I've managed to reduce it into the following testcase. The problem is that during combine we see the s->k = -1; bitfield store and change the SET_SRC from a pseudo into a constant: (set (zero_extract:DI (mem/j:HI (plus:DI (reg/v/f:DI 60 [ s ]) (const_int 10 [0xa])) [0 +0 S2 A16]) (const_int 2 [0x2]) (const_int 7 [0x7])) (const_int -1 [0xffffffffffffffff])) This on s390x with the above option isn't recognized as valid instruction, so find_split_point decides to handle it as IOR or IOR/AND. src is -1, mask is 3 and pos is 7. src != mask (this is also incorrect, we want to set all (both) bits in the bitfield), so we go for IOR/AND, but instead of trying mem = (mem & ~0x180) | ((-1 << 7) & 0x180) we actually try mem = (mem & ~0x180) | (-1 << 7) and that is further simplified into: mem = mem | (-1 << 7) aka mem = mem | 0xff80 which doesn't set just the 2-bit bitfield, but also many other bitfields that shouldn't be touched. We really should do: mem = mem | 0x180 instead. The problem is that we assume that no bits but those low len (2 here) will be set in the SET_SRC, but there is nothing that can prevent that, we just should ignore the other bits. The following patch fixes it by masking src with mask, this way already the src == mask test will DTRT, and as the code for or_mask uses gen_int_mode, if the most significant bit is set after shifting it left by pos, it will be properly sign-extended. 2020-02-25 Jakub Jelinek <jakub@redhat.com> PR rtl-optimization/93908 * combine.c (find_split_point): For store into ZERO_EXTRACT, and src with mask. * gcc.c-torture/execute/pr93908.c: New test.	2020-02-25 14:01:55 +01:00
Eric Botcazou	a6b74eaedf	Fix link failure with debug info in LTO mode This fixes a regression whereby the program fails to link with debug info in LTO mode because of an undefined reference to a symbol coming from the object files containing the early debug info. * dwarf2out.c (dwarf2out_size_function): Run in early-DWARF mode.	2020-02-25 12:44:19 +01:00
Roman Zhuykov	468664e1b7	doc: backport proper description of --enable-checking behavior This patch rewords the whole description to fix minor issues: - documents 'gimple' and 'types' checks, - clarifies what happens when option is used without '=list', - fixes inaccurate wording about release snapshots, - describes that release checks can only be disabled explicitly. Backport from master 2020-02-24 Roman Zhuykov <zhroma@ispras.ru> * doc/install.texi (--enable-checking): Properly document current behavior. (--enable-stage1-checking): Minor clarification about bootstrap.	2020-02-25 14:32:42 +03:00
Richard Sandiford	f9be6e10c9	vect: Fix offset calculation for -ve strides [PR93767] This PR is a regression caused by r256644, which added support for alias checks involving variable strides. One of the changes in that commit was to split the access size out of the segment length. The PR shows that I hadn't done that correctly for the handling of negative strides in vect_compile_time_alias. The old code was: const_length_a = (-wi::to_poly_wide (segment_length_a)).force_uhwi (); offset_a = (offset_a + vect_get_scalar_dr_size (a)) - const_length_a; where vect_get_scalar_dr_size (a) was cancelling out the subtraction of the access size inherent in "- const_length_a". Taking the access size out of the segment length meant that the addition was no longer needed/correct. 2020-02-24 Richard Sandiford <richard.sandiford@arm.com> gcc/ Backport from mainline 2020-02-19 Richard Sandiford <richard.sandiford@arm.com> PR tree-optimization/93767 * tree-vect-data-refs.c (vect_compile_time_alias): Remove the access-size bias from the offset calculations for negative strides. gcc/testsuite/ Backport from mainline 2020-02-19 Richard Sandiford <richard.sandiford@arm.com> PR tree-optimization/93767 * gcc.dg/vect/pr93767.c: New test.	2020-02-24 21:24:11 +00:00
Bernd Edlinger	8389fcc4c1	Avoid collect2 calling signal unsafe functions and/or unlink with uninitialized memory 2020-02-24 Bernd Edlinger <bernd.edlinger@hotmail.de> * collect2.c (tool_cleanup): Avoid calling not signal-safe functions. (maybe_run_lto_and_relink): Avoid possible signal handler access to uninitialized memory (lto_o_files).	2020-02-24 14:43:06 +01:00
Peter Bergner	066184a282	rs6000: Fix infinite loop building ghostscript and icu [PR93658] Fix rs6000_legitimate_address_p(), which erroneously marks a valid Altivec address as being invalid, which causes LRA's process_address() to go into an infinite loop spilling the same address over and over again. Include Mike's earlier commits that fix bugs this patch exposes. Backport from master 2020-02-20 Peter Bergner <bergner@linux.ibm.com> PR target/93658 * config/rs6000/rs6000.c (rs6000_legitimate_address_p): Handle VSX vector modes. * gcc.target/powerpc/pr93658.c: New test.	2020-02-23 18:31:56 -06:00
Michael Meissner	428a4feef8	Adjust how variable vector extraction is done. Backport from master 2020-02-03 Michael Meissner <meissner@linux.ibm.com> * config/rs6000/rs6000.c (get_vector_offset): New helper function to calculate the offset in memory from the start of a vector of a particular element. Add code to keep the element number in bounds if the element number is variable. (rs6000_adjust_vec_address): Move calculation of offset of the vector element to get_vector_offset. (rs6000_split_vec_extract_var): Do not do the initial AND of element here, move the code to get_vector_offset. Fix PR 93568 (thinko) Backport from master 2020-02-05 Michael Meissner <meissner@linux.ibm.com> PR target/93568 * config/rs6000/rs6000.c (get_vector_offset): Fix Q constraint assert to use MEM.	2020-02-23 18:31:56 -06:00
Michael Meissner	48558cdf49	Fix bad code of vector extract of PC-relative address with variable element #. Backport from master 2020-01-06 Michael Meissner <meissner@linux.ibm.com> * config/rs6000/vsx.md (vsx_extract_<mode>_var, VSX_D iterator): Use 'Q' for doing vector extract from memory. (vsx_extract_v4sf_var): Use 'Q' for doing vector extract from memory. (vsx_extract_<mode>_var, VSX_EXTRACT_I iterator): Use 'Q' for doing vector extract from memory. (vsx_extract_<mode>_<VS_scalar>mode_var): Use 'Q' for doing vector extract from memory.	2020-02-23 18:31:56 -06:00
John David Anglin	4ccda0308e	Fix handling of floating-point homogeneous aggregates. 2020-02-21 John David Anglin <danglin@gcc.gnu.org> * gcc/config/pa/pa.c (pa_function_value): Fix check for word and double-word size when handling aggregate return values. * gcc/config/pa/som.h (ASM_DECLARE_FUNCTION_NAME): Fix to indicate that homogeneous SFmode and DFmode aggregates are passed and returned in general registers.	2020-02-21 23:34:09 +00:00
Uros Bizjak	bd2537ed5d	i386: Fix vec_extractv2sf_1 and vec_extractv2si_1 shufps alternative [PR93828] shufps moves two of the four packed single-precision floating-point values from destination operand (first operand) into the low quadword of the destination operand. Match source operand to the destination. PR target/93828 * config/i386/mmx.md (vec_extractv2sf_1): Match source operand to destination operand for shufps alternative. (vec_extractv2si_1): Ditto.	2020-02-20 21:58:57 +01:00
H.J. Lu	f55bf4ddbf	i386: Skip ENDBR32 at the target function entry Skip ENDBR32 at the target function entry when initializing trampoline. Tested on Linux/x86-64 CET machine with and without -m32. gcc/ Backport from master PR target/93656 * config/i386/i386.c (ix86_trampoline_init): Skip ENDBR32 at the target function entry. gcc/testsuite/ Backport from master PR target/93656 * gcc.target/i386/pr93656.c: New test. (cherry picked from commit `1d69147af2`)	2020-02-20 03:05:27 -08:00
Richard Sandiford	2408b93a10	Check for bitwise identity when encoding VECTOR_CSTs [PR92768] This PR shows that we weren't checking for bitwise-identical values when trying to encode a VECTOR_CST, so -0.0 was treated the same as 0.0 for -fno-signed-zeros. The patch adds a new OEP flag to select that behaviour. 2020-02-18 Richard Sandiford <richard.sandiford@arm.com> gcc/ Backport from mainline 2019-12-05 Richard Sandiford <richard.sandiford@arm.com> PR middle-end/92768 * tree-core.h (OEP_BITWISE): New flag. * fold-const.c (operand_compare::operand_equal_p): Handle it. * tree-vector-builder.h (tree_vector_builder::equal_p): Pass it. gcc/testsuite/ PR middle-end/92768 * gcc.dg/pr92768.c: New test.	2020-02-18 12:26:03 +00:00
Richard Sandiford	52db14c1f7	Reject tail calls that read from an escaped RESULT_DECL [PR90313] In this PR we have two return paths from a function "map". The common code sets <result> to the value returned by one path, while the other path does: <retval> = map (&<retval>, ...); We treated this call as tail recursion, losing the copy semantics on the value returned by the recursive call. We'd correctly reject the same thing for variables: local = map (&local, ...); The problem is that RESULT_DECLs didn't get the same treatment. 2020-02-18 Richard Sandiford <richard.sandiford@arm.com> gcc/ Backport from mainline 2019-08-09 Richard Sandiford <richard.sandiford@arm.com> PR middle-end/90313 * tree-tailcall.c (find_tail_calls): Reject calls that might read from an escaped RESULT_DECL. gcc/testsuite/ PR middle-end/90313 * g++.dg/torture/pr90313.cc: New test.	2020-02-18 12:26:02 +00:00
Richard Sandiford	e129cd5283	predcom: Fix invalid store-store commoning [PR93434] predcom has the following code to stop one rogue load from interfering with other store-load opportunities: /* If A is read and B write or vice versa and there is unsuitable dependence, instead of merging both components into a component that will certainly not pass suitable_component_p, just put the read into bad component, perhaps at least the write together with all the other data refs in it's component will be optimizable. */ But when store-store commoning was added later, this had the effect of ignoring loads that occur between two candidate stores. There is code further up to handle loads and stores with unknown dependences: /* Don't do store elimination if there is any unknown dependence for any store data reference. */ if ((DR_IS_WRITE (dra) || DR_IS_WRITE (drb)) && (DDR_ARE_DEPENDENT (ddr) == chrec_dont_know || DDR_NUM_DIST_VECTS (ddr) == 0)) eliminate_store_p = false; But the store-load code above skips loads for *known* dependences if (a) the load has already been marked "bad" or (b) the data-ref machinery knows the dependence distance, but determine_offsets can't handle the combination. (a) happens to be the problem in the testcase, but a different sequence could have given (b) instead. We have writes to individual fields of a structure and reads from the whole structure. Since determine_offsets requires the types to be the same, it returns false for each such read/write combination. This patch records which components have had loads removed and prevents store-store commoning for them. It's a bit too pessimistic, since there shouldn't be a problem if a "bad" load dominates all stores in a component. But (a) we can't AFAIK use pcom_stmt_dominates_stmt_p here and (b) the handling for that case would probably need to be removed again if we handled more exotic cases in future. 
2020-02-18 Richard Sandiford <richard.sandiford@arm.com> gcc/ Backport from mainline 2020-01-28 Richard Sandiford <richard.sandiford@arm.com> PR tree-optimization/93434 * tree-predcom.c (split_data_refs_to_components): Record which components have had aliasing loads removed. Prevent store-store commoning for all such components. gcc/testsuite/ PR tree-optimization/93434 * gcc.c-torture/execute/pr93434.c: New test.	2020-02-18 08:52:01 +00:00
Richard Sandiford	84a4651717	Don't pass booleans as mask types to simd clones [PR92710] In this PR we assigned a vector mask type to the result of a comparison and then tried to pass that mask type to a simd clone, which expected a normal (non-mask) type instead. This patch simply punts on call arguments that have a mask type. A better fix would be to pattern-match the comparison to a COND_EXPR, like we would if the comparison was stored to memory, but doing that isn't gcc 9 or 10 material. Note that this doesn't affect x86_64-linux-gnu because the ABI promotes bool arguments to ints. 2020-02-18 Richard Sandiford <richard.sandiford@arm.com> gcc/ Backport from mainline 2019-11-29 Richard Sandiford <richard.sandiford@arm.com> PR tree-optimization/92710 * tree-vect-stmts.c (vectorizable_simd_clone_call): Reject vector mask arguments. gcc/testsuite/ PR tree-optimization/92710 * gcc.dg/vect/pr92710.c: New test.	2020-02-18 08:52:00 +00:00
Richard Sandiford	2d8ea3a0a6	Fix SLP downward group access classification [PR92420] This PR was caused by the SLP handling in get_group_load_store_type returning VMAT_CONTIGUOUS rather than VMAT_CONTIGUOUS_REVERSE for downward groups. A more elaborate fix would be to try to combine the reverse permutation into SLP_TREE_LOAD_PERMUTATION for loads, but that's really a follow-on optimisation and not backport material. It might also not necessarily be a win, if the target supports (say) reversing and odd/even swaps as independent permutes but doesn't recognise the combined form. 2020-02-18 Richard Sandiford <richard.sandiford@arm.com> gcc/ Backport from mainline 2019-11-11 Richard Sandiford <richard.sandiford@arm.com> PR tree-optimization/92420 * tree-vect-stmts.c (get_negative_load_store_type): Move further up file. (get_group_load_store_type): Use it for reversed SLP accesses. gcc/testsuite/ PR tree-optimization/92420 * gcc.dg/vect/pr92420.c: New test.	2020-02-18 08:51:59 +00:00
Prathamesh Kulkarni	65709f4b93	re PR target/90724 (ICE with __sync_bool_compare_and_swap with -march=armv8.2-a+sve) 2020-02-18 Richard Sandiford <richard.sandiford@arm.com> gcc/ Backport from mainline 2019-08-21 Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> PR target/90724 * config/aarch64/aarch64.c (aarch64_gen_compare_reg_maybe_ze): Force y in reg if it fails aarch64_plus_operand predicate.	2020-02-18 08:51:58 +00:00
liuhongt	0692bc0ca7	Add Changelog entries to relevant Changelog files for my last commit.	2020-02-17 08:55:17 +08:00
Uros Bizjak	bfa537a2ff	i386: Fix atan2l argument order [PR93743] PR target/93743 * config/i386/i386.md (atan2xf3): Swap operands 1 and 2. (atan2<mode>3): Update operand order in the call to gen_atan2xf3. testsuite/ChangeLog: PR target/93743 * gcc.target/i386/pr93743.c : New test.	2020-02-16 23:43:22 +01:00
Jakub Jelinek	4980553313	match.pd: Disallow side-effects in GENERIC for non-COND_EXPR to COND_EXPR simplifications [PR93744] As the following testcases show (the first one reported, last two found by code inspection), we need to disallow side-effects in simplifications that turn some unconditional expression into conditional one. From my little understanding of genmatch.c, it is able to automatically disallow side effects if the same operand is used multiple times in the match pattern, maybe if it is used multiple times in the replacement pattern, and if it is used in conditional contexts in the match pattern, could it be taught to handle this case too? If yes, perhaps just the first hunk could be usable for 8/9 backports (+ the testcases). 2020-02-15 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/93744 * match.pd (((m1 >/</>=/<= m2) * d -> (m1 >/</>=/<= m2) ? d : 0, A - ((A - B) & -(C cmp D)) -> (C cmp D) ? B : A, A + ((B - A) & -(C cmp D)) -> (C cmp D) ? B : A): For GENERIC, make sure @2 in the first and @1 in the other patterns has no side-effects. * gcc.c-torture/execute/pr93744-1.c: New test. * gcc.c-torture/execute/pr93744-2.c: New test. * gcc.c-torture/execute/pr93744-3.c: New test.	2020-02-15 13:22:10 +01:00
Eric Botcazou	c1379a1c64	Fix problematic TLS sequences for the Solaris linker This is an old thinko pertaining to the interaction between TLS sequences and delay slot filling: the compiler knows that it cannot put instructions with TLS relocations into delay slots with the original Sun TLS model, but it tests TARGET_SUN_TLS in this context, which depends only on the assembler. So if the compiler is configured with the GNU assembler and the Solaris linker, then TARGET_GNU_TLS is set instead and the limitation is not enforced. PR target/93704 * config/sparc/sparc.c (eligible_for_call_delay): Test HAVE_GNU_LD in conjunction with TARGET_GNU_TLS in early return.	2020-02-14 19:26:46 +01:00
Alexander Monakov	0f7b7aeb71	sel-sched: allow negative insn priority (PR 88879) PR rtl-optimization/88879 * sel-sched.c (sel_target_adjust_priority): Remove assert.	2020-02-14 16:55:27 +03:00
Richard Biener	b8c42b4d0a	middle-end/90648 fend off builtin calls with not enough arguments from match This adds guards to genmatch generated code before accessing call expression or stmt arguments that might be out of bounds when the user provided bogus prototypes for what we consider builtins. 2020-02-05 Richard Biener <rguenther@suse.de> PR middle-end/90648 * genmatch.c (dt_node::gen_kids_1): Emit number of argument checks before matching calls. * gcc.dg/pr90648.c: New testcase.	2020-02-14 14:35:26 +01:00
Richard Biener	b00c322804	tree-optimization/93381 fix integer offsetting in points-to analysis We were incorrectly assuming a merge operation is conservative enough for not explicitly handled operations but we also need to consider offsetting within fields when field-sensitive analysis applies. 2020-01-22 Richard Biener <rguenther@suse.de> PR tree-optimization/93381 * tree-ssa-structalias.c (find_func_aliases): Assume offsetting throughout, handle all conversions the same. * gcc.dg/torture/pr93381.c: New testcase.	2020-02-14 11:50:15 +01:00
Richard Biener	03d2b1d797	tree-optimization/93439 move clique bookkeeping to OMP expansion Autopar was doing clique bookkeeping too early when creating destination functions but then later introducing new cliques via versioning loops. The following moves the bookkeeping to the actual outlining process. 2020-02-14 Richard Biener <rguenther@suse.de> Backport from mainline 2020-01-28 Richard Biener <rguenther@suse.de> PR tree-optimization/93439 * tree-parloops.c (create_loop_fn): Move clique bookkeeping... * tree-cfg.c (move_sese_region_to_fn): ... here. (verify_types_in_gimple_reference): Verify used cliques are tracked. * gfortran.dg/graphite/pr93439.f90: New testcase.	2020-02-14 11:01:50 +01:00
Richard Biener	3bcda566d6	middle-end/93054 deal with undefs in call gimplification 2020-02-14 Richard Biener <rguenther@suse.de> Backport from mainline 2020-01-09 Richard Biener <rguenther@suse.de> PR middle-end/93054 * gimplify.c (gimplify_expr): Deal with NOP definitions. * gcc.dg/pr93054.c: New testcase.	2020-02-14 11:01:50 +01:00
Richard Biener	794bb8c2f5	debug/92763 keep DIEs that might be used in DW_TAG_inlined_subroutine We were pruning type-local subroutine DIEs if their context is unused despite us later needing those DIEs as abstract origins for inlines. The patch makes code already present for -fvar-tracking-assignments unconditional. 2020-02-14 Richard Biener <rguenther@suse.de> Backport from mainline 2020-01-20 Richard Biener <rguenther@suse.de> PR debug/92763 * dwarf2out.c (prune_unused_types): Unconditionally mark called function DIEs. * g++.dg/debug/pr92763.C: New testcase.	2020-02-14 11:01:50 +01:00
Richard Biener	4230afc0f4	tree-optimization/92704 fix ifcvt ICE with loops without stores 2020-02-14 Richard Biener <rguenther@suse.de> Backport from mainline 2019-11-29 Richard Biener <rguenther@suse.de> PR tree-optimization/92704 * tree-if-conv.c (combine_blocks): Deal with virtual PHIs in loops performing only loads. * gcc.dg/torture/pr92704.c: New testcase.	2020-02-14 11:01:49 +01:00
Richard Biener	c6480e01fc	middle-end/92674 delay purging EH edges when folding during inlining 2020-02-14 Richard Biener <rguenther@suse.de> Backport from mainline 2019-11-27 Richard Biener <rguenther@suse.de> PR middle-end/92674 * tree-inline.c (expand_call_inline): Delay purging EH/abnormal edges and instead record blocks in bitmap. (gimple_expand_calls_inline): Adjust. (fold_marked_statements): Delay EH cleanup until all folding is done. (optimize_inline_calls): Do EH/abnormal cleanup for calls after inlining finished.	2020-02-14 11:01:49 +01:00
Jakub Jelinek	08cf145f99	i386: Fix up _mm_mask_popcnt_epi [PR93696] As mentioned in the PR and as https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mask_popcnt_epi also documents, _mm_popcnt_epi intrinsics are consistent with all other unary AVX512* intrinsics regarding arguments, i.e. the _mm_whatever has just single argument (called a in the docs, and __A in the GCC headers), _mm_mask_whatever has 3 arguments (called src, k, a in the docs and __W, __U, __A in GCC headers) and _mm_maskz_whatever 2 arguments (called k, a in the docs and __U, __A in GCC headers). Unfortunately, whoever implemented the _mm_popcnt_epi* intrinsics got it wrong for the _mm_mask_popcnt_epi ones, calling the args __A, __U, __B and not passing them in the canonical order to the builtins, making it API incompatible with ICC as well as clang (tested on godbolt's clang 7/8/9/trunk and ICC 19.0.{0,1}, older clang/ICC don't understand those, so it isn't that it used to be broken even in other compilers and got changed afterwards). 2020-02-13 Jakub Jelinek <jakub@redhat.com> PR target/93696 * config/i386/avx512bitalgintrin.h (_mm512_mask_popcnt_epi8, _mm512_mask_popcnt_epi16, _mm256_mask_popcnt_epi8, _mm256_mask_popcnt_epi16, _mm_mask_popcnt_epi8, _mm_mask_popcnt_epi16): Rename __B argument to __A and __A to __W, pass __A to the builtin followed by __W instead of __A followed by __B. * config/i386/avx512vpopcntdqintrin.h (_mm512_mask_popcnt_epi32, _mm512_mask_popcnt_epi64): Likewise. * config/i386/avx512vpopcntdqvlintrin.h (_mm_mask_popcnt_epi32, _mm256_mask_popcnt_epi32, _mm_mask_popcnt_epi64, _mm256_mask_popcnt_epi64): Likewise. * gcc.target/i386/pr93696-1.c: New test. * gcc.target/i386/pr93696-2.c: New test. * gcc.target/i386/avx512bitalg-vpopcntw-1.c (TEST): Fix argument order of _mm_mask_popcnt_. * gcc.target/i386/avx512vpopcntdq-vpopcntq-1.c (TEST): Likewise. * gcc.target/i386/avx512vpopcntdq-vpopcntd-1.c (TEST): Likewise. 
* gcc.target/i386/avx512bitalg-vpopcntb-1.c (TEST): Likewise. * gcc.target/i386/avx512bitalg-vpopcntb.c (foo): Likewise. * gcc.target/i386/avx512bitalg-vpopcntbvl.c (foo): Likewise. * gcc.target/i386/avx512vpopcntdq-vpopcntd.c (foo): Likewise. * gcc.target/i386/avx512bitalg-vpopcntwvl.c (foo): Likewise. * gcc.target/i386/avx512bitalg-vpopcntw.c (foo): Likewise. * gcc.target/i386/avx512vpopcntdq-vpopcntq.c (foo): Likewise.	2020-02-13 21:50:15 +01:00
Jakub Jelinek	488a947b2d	i386: Fix kshift intrinsics [PR93673] As mentioned in the PR, the intrinsics allow counts from 0 to 255, but we actually reject values from 128 to 255. That is because QImode CONST_INTs can be only -128 to 127. Fixed by using const_0_to_255_operand and dropping the modes for the operands with those predicates (the IL actually contains the CONST_INT which has VOIDmode). 2020-02-13 Jakub Jelinek <jakub@redhat.com> PR target/93673 * config/i386/sse.md (k<code><mode>): Drop mode from last operand and use const_0_to_255_operand predicate instead of immediate_operand. (avx512dq_fpclass<mode><mask_scalar_merge_name>, avx512dq_vmfpclass<mode><mask_scalar_merge_name>, vgf2p8affineinvqb_<mode><mask_name>, vgf2p8affineqb_<mode><mask_name>): Drop mode from const_0_to_255_operand predicated operands. * gcc.target/i386/avx512f-pr93673.c: New test. * gcc.target/i386/avx512dq-pr93673.c: New test. * gcc.target/i386/avx512bw-pr93673.c: New test.	2020-02-13 21:49:35 +01:00
Jakub Jelinek	20ac13c895	i386: Fix up vec_extract_lo* patterns [PR93670] The VEXTRACT* insns have way too many different CPUID feature flags (ATT syntax) vextractf128 $imm, %ymm, %xmm/mem AVX vextracti128 $imm, %ymm, %xmm/mem AVX2 vextract{f,i}32x4 $imm, %ymm, %xmm/mem {k}{z} AVX512VL+AVX512F vextract{f,i}32x4 $imm, %zmm, %xmm/mem {k}{z} AVX512F vextract{f,i}64x2 $imm, %ymm, %xmm/mem {k}{z} AVX512VL+AVX512DQ vextract{f,i}64x2 $imm, %zmm, %xmm/mem {k}{z} AVX512DQ vextract{f,i}32x8 $imm, %zmm, %ymm/mem {k}{z} AVX512DQ vextract{f,i}64x4 $imm, %zmm, %ymm/mem {k}{z} AVX512F As the testcase shows and the patch too, we didn't get it right in all cases. The first hunk is about avx512vl_vextractf128v8s[if] incorrectly requiring TARGET_AVX512DQ. The corresponding insn is the first vextract{f,i}32x4 above, so it requires VL+F, and the builtins have it correct (TARGET_AVX512VL implies TARGET_AVX512F): BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_vextractf128v8sf, "__builtin_ia32_extractf32x4_256_mask", IX86_BUILTIN_EXTRACTF32X4_256, UNKNOWN, (int) V4SF_FTYPE_V8SF_INT_V4SF_UQI) BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_vextractf128v8si, "__builtin_ia32_extracti32x4_256_mask", IX86_BUILTIN_EXTRACTI32X4_256, UNKNOWN, (int) V4SI_FTYPE_V8SI_INT_V4SI_UQI) We only need TARGET_AVX512DQ for avx512vl_vextractf128v4d[if]. The second hunk is about vec_extract_lo_v16s[if]{,_mask}. These are using the vextract{f,i}32x8 insns (AVX512DQ above), but we weren't requiring that, but instead incorrectly && 1 for non-masked and && (64 == 64 && TARGET_AVX512VL) for masked insns. This is extraction from ZMM, so it doesn't need VL for anything. 
The hunk actually only requires TARGET_AVX512DQ when the insn is masked, if it is not masked, when TARGET_AVX512DQ isn't available we can use vextract{f,i}64x4 instead which is available already in TARGET_AVX512F and does the same thing, extracts the low 256 bits from 512 bits vector (often we split it into just nothing, but there are some special cases like when using xmm16+ when we can't without AVX512VL). The last hunk is about vec_extract_lo_v8s[if]{,_mask}. The non-_mask suffixed ones are ok already and just split into nothing (lowpart subreg). The masked ones were incorrectly requiring TARGET_AVX512VL and TARGET_AVX512DQ, when we only need TARGET_AVX512VL. 2020-02-12 Jakub Jelinek <jakub@redhat.com> PR target/93670 * config/i386/sse.md (VI48F_256_DQ): New mode iterator. (avx512vl_vextractf128<mode>): Use it instead of VI48F_256. Remove TARGET_AVX512DQ from condition. (vec_extract_lo_<mode><mask_name>): Use <mask_avx512dq_condition> instead of <mask_mode512bit_condition> in condition. If TARGET_AVX512DQ is false, emit vextract64x4 instead of vextract32x8. (vec_extract_lo_<mode><mask_name>): Drop <mask_avx512dq_condition> from condition. * gcc.target/i386/avx512vl-pr93670.c: New test.	2020-02-13 21:47:53 +01:00
Jakub Jelinek	b7cbce7a17	i386: Fix -mavx -mno-avx2 ICE with VEC_COND_EXPR [PR93637] As mentioned in the PR, for -mavx -mno-avx2 the backend does support vcondv4div4df and vcondv8siv8sf optabs (while generally 32-byte vectors aren't much supported in that case, it is performed using vandps/vandnps/vorps). The problem is that after the last generic vector lowering (where the VEC_COND_EXPR still compares two V4DF vectors and has two V4DI last operands and V4DI result and so is considered ok) fre4 folds the condition into constant, at which point the middle-end during expansion will try vcond_mask_optab and fall back to trying to expand it as the constant vector < 0 vcondv4div4di, but neither of them is supported for -mavx -mno-avx2 and thus we ICE. So, the options I see are either what the following patch does, also support vcond_mask_v4div4di and vcond_mask_v4siv4si already for TARGET_AVX, or require for vcondv4div4df and vcondv8siv8sf TARGET_AVX2 rather than current TARGET_AVX. 2020-02-10 Jakub Jelinek <jakub@redhat.com> PR target/93637 * config/i386/sse.md (VI_256_AVX2): New mode iterator. (vcond_mask_<mode><sseintvecmodelower>): Use it instead of VI_256. Change condition from TARGET_AVX2 to TARGET_AVX. * gcc.target/i386/avx-pr93637.c: New test.	2020-02-13 21:47:07 +01:00
Jakub Jelinek	a91e5d8897	i386: Make xmm16-xmm31 call used even in ms ABI [PR65782] On Tue, Feb 04, 2020 at 11:16:06AM +0100, Uros Bizjak wrote: > I guess that Comment #9 patch from the PR should be trivially correct, > but although it looks obvious, I don't want to propose the patch since > I have no means of testing it. I don't have means of testing it either. https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention?view=vs-2019 is quite explicit that [xyz]mm16-31 are call clobbered and only xmm6-15 (low 128-bits only) are call preserved. We are talking e.g. about /* { dg-options "-O2 -mabi=ms -mavx512vl" } */ typedef double V __attribute__((vector_size (16))); void foo (void); V bar (void); void baz (V); void qux (void) { V c; { register V a __asm ("xmm18"); V b = bar (); asm ("" : "=x" (a) : "0" (b)); c = a; } foo (); { register V d __asm ("xmm18"); V e; d = c; asm ("" : "=x" (e) : "0" (d)); baz (e); } } where according to the MSDN doc gcc incorrectly holds the c value in xmm18 register across the foo call; if foo is compiled by some Microsoft compiler (or LLVM), then it could clobber %xmm18. If all xmm18 occurrences are changed to say xmm15, then it is valid to hold the 128-bit value across the foo call (though, surprisingly, LLVM saves it into stack anyway). The other parts are I guess mainly about SEH. Consider e.g. 
void foo (void) { register double x __asm ("xmm14"); register double y __asm ("xmm18"); asm ("" : "=x" (x)); asm ("" : "=v" (y)); x += y; y += x; asm ("" : : "x" (x)); asm ("" : : "v" (y)); } looking at cross-compiler output, with -O2 -mavx512f this emits .file "abcdeq.c" .text .align 16 .globl foo .def foo; .scl 2; .type 32; .endef .seh_proc foo foo: subq $40, %rsp .seh_stackalloc 40 vmovaps %xmm14, (%rsp) .seh_savexmm %xmm14, 0 vmovaps %xmm18, 16(%rsp) .seh_savexmm %xmm18, 16 .seh_endprologue vaddsd %xmm18, %xmm14, %xmm14 vaddsd %xmm18, %xmm14, %xmm18 vmovaps (%rsp), %xmm14 vmovaps 16(%rsp), %xmm18 addq $40, %rsp ret .seh_endproc .ident "GCC: (GNU) 10.0.1 20200207 (experimental)" Does whatever assembler mingw64 uses even assemble this (I mean the .seh_savexmm %xmm16, 16 could be problematic)? I can find e.g. https://stackoverflow.com/questions/43152633/invalid-register-for-seh-savexmm-in-cygwin/43210527 which then links to https://gcc.gnu.org/PR65782 2020-02-08 Uroš Bizjak <ubizjak@gmail.com> Jakub Jelinek <jakub@redhat.com> PR target/65782 * config/i386/i386.h (CALL_USED_REGISTERS): Make xmm16-xmm31 call-used even in 64-bit ms-abi. * gcc.target/i386/pr65782.c: New test. Co-authored-by: Uroš Bizjak <ubizjak@gmail.com>	2020-02-13 21:46:13 +01:00
Jakub Jelinek	05fa0de35e	openmp: Fix handling of non-addressable shared scalars in parallel nested inside of target [PR93515] As the following testcase shows, we need to consider even target to be a construct that forces not to use copy in/out for shared on parallel inside of the target. E.g. for parallel nested inside another parallel or host teams, we already avoid copy in/out and we need to treat target the same. 2020-02-06 Jakub Jelinek <jakub@redhat.com> PR libgomp/93515 * omp-low.c (use_pointer_for_field): For nested constructs, also look for map clauses on target construct. (scan_omp_1_stmt) <case GIMPLE_OMP_TARGET>: Bump temporarily taskreg_nesting_level. * testsuite/libgomp.c-c++-common/pr93515.c: New test.	2020-02-13 21:45:15 +01:00
Jakub Jelinek	d3266b1311	openmp: Notice reduction decl in outer contexts after adding it to shared [PR93515] If we call omp_add_variable, following omp_notice_variable will already find it on that construct and not go through outer constructs, the following patch fixes that. Note, this still doesn't follow OpenMP 5.0 semantics on target combined with other constructs with reduction/lastprivate/linear clauses, will handle that for GCC11. 2020-02-06 Jakub Jelinek <jakub@redhat.com> PR libgomp/93515 * gimplify.c (gimplify_scan_omp_clauses) <do_notice>: If adding shared clause, call omp_notice_variable on outer context if any.	2020-02-13 21:44:14 +01:00
Jakub Jelinek	d42f9eaa3e	openmp: Avoid ICEs with declare simd; declare simd inbranch [PR93555] The testcases ICE because when processing the declare simd inbranch, we don't create the i == 0 clone as it already exists, which means clone_info->nargs is not adjusted, but we then rely on it being adjusted when trying other clones. 2020-02-05 Jakub Jelinek <jakub@redhat.com> PR middle-end/93555 * omp-simd-clone.c (expand_simd_clones): If simd_clone_mangle or simd_clone_create failed when i == 0, adjust clone->nargs by clone->inbranch. * c-c++-common/gomp/pr93555-1.c: New test. * c-c++-common/gomp/pr93555-2.c: New test. * gfortran.dg/gomp/pr93555.f90: New test.	2020-02-13 21:33:47 +01:00
Jakub Jelinek	329475795c	combine: Punt on out of range rotate counts [PR93505] What happens on this testcase is with the out of bounds rotate we get: Trying 13 -> 16: 13: r129:SI=r132:DI#0<-<0x20 REG_DEAD r132:DI 16: r123:DI=r129:SI<0 REG_DEAD r129:SI Successfully matched this instruction: (set (reg/v:DI 123 [ <retval> ]) (const_int 0 [0])) during combine. So, perhaps we could also change simplify-rtx.c to punt if it is out of bounds rather than trying to optimize anything. Or, but probably GCC11 material, if we decide that ROTATE/ROTATERT doesn't have out of bounds counts or introduce targetm.rotate_truncation_mask, we should truncate the argument instead of punting. Punting is better for backports though. 2020-01-30 Jakub Jelinek <jakub@redhat.com> PR middle-end/93505 * combine.c (simplify_comparison) <case ROTATE>: Punt on out of range rotate counts. * gcc.c-torture/compile/pr93505.c: New test.	2020-02-13 21:32:23 +01:00
Jakub Jelinek	764e831291	i386: Fix ix86_fold_builtin shift folding [PR93418] The following testcase is miscompiled, because the variable shift left operand, { -1, -1, -1, -1 } is represented as a VECTOR_CST with VECTOR_CST_NPATTERNS 1 and VECTOR_CST_NELTS_PER_PATTERN 1, so when we call builder.new_unary_operation, builder.encoded_nelts () will be just 1 and thus we encode the resulting vector as if all the elements were the same. For non-masked is_vshift, we could perhaps call builder.new_binary_operation (TREE_TYPE (args[0]), args[0], args[1], false), but then there are masked shifts, for non-is_vshift we could perhaps call it too but with args[2] instead of args[1], but there is no builder.new_ternary_operation. All this stuff is primarily for aarch64 anyway, on x86 we don't have any variable length vectors, and it is not a big deal to compute all elements and just let builder.finalize () find the most efficient VECTOR_CST representation of the vector. So, instead of doing too much, this just keeps using new_unary_operation only if only one VECTOR_CST is involved (i.e. non-masked shift by constant) and for the rest just compute all elts. 2020-01-28 Jakub Jelinek <jakub@redhat.com> PR target/93418 * config/i386/i386.c (ix86_fold_builtin) <do_shift>: If mask is not -1 or is_vshift is true, use new_vector with number of elts npatterns rather than new_unary_operation. * gcc.target/i386/avx2-pr93418.c: New test.	2020-02-13 21:27:53 +01:00
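The PR93418 failure mode can be sketched outside the compiler. The following C fragment is an illustration under stated assumptions, not the `ix86_fold_builtin` implementation: `fold_vshift_left` is a hypothetical name, and it models the AVX2 variable left shift, where a per-element count above 31 yields 0. It computes every element, which is what the fix falls back to when more than one vector constant is involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of folding a variable vector shift: each element of
   SRC is shifted left by the matching element of COUNT.  Counts of 32 or
   more produce 0, as the AVX2 variable shift does.  */
static void
fold_vshift_left (const uint32_t *src, const uint32_t *count,
                  uint32_t *dst, size_t nelts)
{
  assert (src != NULL && count != NULL && dst != NULL);
  for (size_t i = 0; i < nelts; i++)
    dst[i] = count[i] < 32 ? src[i] << count[i] : 0;
}
```

With `src = { -1, -1, -1, -1 }` the input needs only one encoded element, but the result generally does not: the per-element counts make the outputs differ, so reusing the input's one-element encoding for the result is wrong.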
Jakub Jelinek	3b2fbe3e72	postreload: Fix up postreload combine [PR93402] The following testcase is miscompiled, because the postreload pass changes: -(insn 14 13 23 2 (parallel [ - (set (reg:DI 1 dx [94]) - (plus:DI (reg:DI 1 dx [95]) - (reg:DI 5 di [92]))) - (clobber (reg:CC 17 flags)) - ]) "pr93402.c":8:30 186 {adddi_1} - (expr_list:REG_EQUAL (plus:DI (reg:DI 5 di [92]) - (const_int 111111111111 [0x19debd01c7])) - (nil))) -(insn 23 14 25 2 (set (reg:SI 0 ax) +(insn 23 13 25 2 (set (reg:SI 0 ax) (const_int 0 [0])) "pr93402.c":10:1 67 {movsi_internal} (nil)) (insn 25 23 26 2 (use (reg:SI 0 ax)) "pr93402.c":10:1 -1 (nil)) -(insn 26 25 35 2 (use (reg:DI 1 dx)) "pr93402.c":10:1 -1 +(insn 26 25 35 2 (use (plus:DI (reg:DI 1 dx [95]) + (reg:DI 5 di [92]))) "pr93402.c":10:1 -1 (nil)) A USE insn is not a normal insn and verify_changes called from apply_change_group is happy about any changes into it. The following patch avoids this optimization if we were to change the USE operand (this routine only changes a reg into (plus reg reg2)). 2020-01-23 Jakub Jelinek <jakub@redhat.com> PR rtl-optimization/93402 * postreload.c (reload_combine_recognize_pattern): Don't try to adjust USE insns. * gcc.c-torture/execute/pr93402.c: New test.	2020-02-13 21:21:43 +01:00
Tamar Christina	f6e9ae4da8	middle-end: Fix logical shift truncation (PR rtl-optimization/91838) (gcc-9 backport) This fixes a fall-out from a patch I had submitted two years ago which started allowing simplify-rtx to fold logical right shifts by offsets a followed by b into >> (a + b). However this can generate inefficient code when the resulting shift count ends up being the same as the size of the shift mode. This will create some undefined behavior on most platforms. This patch changes to code to truncate to 0 if the shift amount goes out of range. Before my older patch this used to happen in combine when it saw the two shifts. However since we combine them here combine never gets a chance to truncate them. The issue mostly affects GCC 8 and 9 since on 10 the back-end knows how to deal with this shift constant but it's better to do the right thing in simplify-rtx. Note that this doesn't take care of the Arithmetic shift where you could replace the constant with MODE_BITS (mode) - 1, but that's not a regression so punting it. gcc/ChangeLog: Backport from mainline 2020-01-31 Tamar Christina <tamar.christina@arm.com> PR rtl-optimization/91838 * simplify-rtx.c (simplify_binary_operation_1): Update LSHIFTRT case to truncate if allowed or reject combination. gcc/testsuite/ChangeLog: Backport from mainline 2020-01-31 Tamar Christina <tamar.christina@arm.com> Jakub Jelinek <jakub@redhat.com> PR rtl-optimization/91838 * g++.dg/opt/pr91838.C: New test.	2020-02-11 10:50:15 +00:00
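The truncation rule from the PR91838 message can be shown with a small C sketch. This is an assumption-laden illustration for a 32-bit mode, not the `simplify_binary_operation_1` code, and `combined_lshiftrt32` is a hypothetical name: two logical right shifts by a and b fold into one shift by a + b, except that a combined count reaching the mode width must fold to 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: compute (x >> a) >> b as a single logical right
   shift.  Each individual count must be in range; when the combined
   count reaches the 32-bit width, every bit has been shifted out, so
   the result is 0 rather than an out-of-range shift.  */
static uint32_t
combined_lshiftrt32 (uint32_t x, unsigned a, unsigned b)
{
  assert (a < 32 && b < 32);
  unsigned total = a + b;
  return total >= 32 ? 0 : x >> total;
}
```

The same shortcut does not apply to arithmetic shifts, where the message notes the constant would instead be replaced with MODE_BITS (mode) - 1.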
H.J. Lu	850c38f5f4	x86-64: Pass aggregates with only float/double in GPRs for MS_ABI MS_ABI requires passing aggregates with only float/double in integer registers as shown in the output from MSVC v19.10 at: https://godbolt.org/z/2NPygd This patch fixed: FAIL: libffi.bhaible/test-callback.c -W -Wall -Wno-psabi -DDGTEST=54 -Wno-unused-variable -Wno-unused-parameter -Wno-unused-but-set-variable -Wno-uninitialized -O0 -DABI_NUM=FFI_GNUW64 -DABI_ATTR=MSABI execution test FAIL: libffi.bhaible/test-callback.c -W -Wall -Wno-psabi -DDGTEST=54 -Wno-unused-variable -Wno-unused-parameter -Wno-unused-but-set-variable -Wno-uninitialized -O2 -DABI_NUM=FFI_GNUW64 -DABI_ATTR=MSABI execution test FAIL: libffi.bhaible/test-callback.c -W -Wall -Wno-psabi -DDGTEST=55 -Wno-unused-variable -Wno-unused-parameter -Wno-unused-but-set-variable -Wno-uninitialized -O0 -DABI_NUM=FFI_GNUW64 -DABI_ATTR=MSABI execution test FAIL: libffi.bhaible/test-callback.c -W -Wall -Wno-psabi -DDGTEST=55 -Wno-unused-variable -Wno-unused-parameter -Wno-unused-but-set-variable -Wno-uninitialized -O2 -DABI_NUM=FFI_GNUW64 -DABI_ATTR=MSABI execution test FAIL: libffi.bhaible/test-callback.c -W -Wall -Wno-psabi -DDGTEST=56 -Wno-unused-variable -Wno-unused-parameter -Wno-unused-but-set-variable -Wno-uninitialized -O0 -DABI_NUM=FFI_GNUW64 -DABI_ATTR=MSABI execution test FAIL: libffi.bhaible/test-callback.c -W -Wall -Wno-psabi -DDGTEST=56 -Wno-unused-variable -Wno-unused-parameter -Wno-unused-but-set-variable -Wno-uninitialized -O2 -DABI_NUM=FFI_GNUW64 -DABI_ATTR=MSABI execution test in libffi testsuite. gcc/ Backport from mainline PR target/85667 * config/i386/i386.c (function_arg_ms_64): Add a type argument. Don't return aggregates with only SFmode and DFmode in SSE register. (ix86_function_arg): Pass type to function_arg_ms_64. gcc/testsuite/ Backport from mainline PR target/85667 * gcc.target/i386/pr85667-10.c: New test. * gcc.target/i386/pr85667-7.c: Likewise. * gcc.target/i386/pr85667-8.c: Likewise. 
* gcc.target/i386/pr85667-9.c: Likewise. (cherry picked from commit `ea5ca698dc`)	2020-02-07 03:52:26 -08:00
John David Anglin	6957d3e4ee	Fix ICE in pa_elf_select_rtx_section. 2020-01-30 John David Anglin <danglin@gcc.gnu.org> * config/pa/pa.c (pa_elf_select_rtx_section): Place function pointers without a DECL in .data.rel.ro.local.	2020-01-30 07:29:35 -05:00
Kito Cheng	afb84a42ad	RISC-V: Disallow regrename if the TO register is never used before for interrupt functions gcc/ChangeLog PR target/93304 * config/riscv/riscv-protos.h (riscv_hard_regno_rename_ok): New. * config/riscv/riscv.c (riscv_hard_regno_rename_ok): New. * config/riscv/riscv.h (HARD_REGNO_RENAME_OK): Defined. gcc/testsuite/ChangeLog PR target/93304 * gcc.target/riscv/pr93304.c: New test.	2020-01-30 15:33:07 +08:00
Szabolcs Nagy	a1f8dca201	[AArch64] PR92424: Fix -fpatchable-function-entry=N,M with BTI This is a workaround that emits a BTI after the function label if that is followed by a patch area. We try to remove the BTI that follows the patch area (this may fail e.g. if the first instruction is a PACIASP). So before this commit -fpatchable-function-entry=3,1 with bti generates .section __patchable_function_entries .8byte .LPFE .text .LPFE: nop foo: nop nop bti c // or paciasp ... and after this commit .section __patchable_function_entries .8byte .LPFE .text .LPFE: nop foo: bti c nop nop // may be paciasp ... and with -fpatchable-function-entry=1 (M=0) the code now is foo: bti c .section __patchable_function_entries .8byte .LPFE .text .LPFE: nop // may be paciasp ... There is a new bti insn in the middle of the patchable area users need to be aware of unless M=0 (patch area is after the new bti) or M=N (patch area is before the label, no new bti). Note: bti is not added to all functions consistently (it can be turned off per function using a target attribute or the compiler may detect that the function is never called indirectly), so if bti is inserted in the middle of a patch area then user code needs to deal with detecting it. Tested on aarch64-none-linux-gnu. gcc/ChangeLog: PR target/92424 * config/aarch64/aarch64.c (aarch64_declare_function_name): Set cfun->machine->label_is_assembled. (aarch64_print_patchable_function_entry): New. (TARGET_ASM_PRINT_PATCHABLE_FUNCTION_ENTRY): Define. * config/aarch64/aarch64.h (struct machine_function): New field, label_is_assembled. gcc/testsuite/ChangeLog: PR target/92424 * gcc.target/aarch64/pr92424-2.c: New test. * gcc.target/aarch64/pr92424-3.c: New test.	2020-01-29 14:31:50 +00:00
Wilco Dijkstra	a708cb25d9	[AArch64] Fix shrinkwrapping interactions with atomics (PR92692) The separate shrinkwrapping pass may insert stores in the middle of atomics loops which can cause issues on some implementations. Avoid this by delaying splitting atomics patterns until after prolog/epilog generation. gcc/ PR target/92692 * config/aarch64/aarch64.c (aarch64_split_compare_and_swap) Add assert to ensure prolog has been emitted. (aarch64_split_atomic_op): Likewise. * config/aarch64/atomics.md (aarch64_compare_and_swap<mode>) Use epilogue_completed rather than reload_completed. (aarch64_atomic_exchange<mode>): Likewise. (aarch64_atomic_<atomic_optab><mode>): Likewise. (atomic_nand<mode>): Likewise. (aarch64_atomic_fetch_<atomic_optab><mode>): Likewise. (atomic_fetch_nand<mode>): Likewise. (aarch64_atomic_<atomic_optab>_fetch<mode>): Likewise. (atomic_nand_fetch<mode>): Likewise. (cherry picked from commit `e5e07b6818`)	2020-01-27 12:58:02 +00:00
Jakub Jelinek	f4a36c5017	Cherry-pick 15 bugfixes from mainline r10-6140-gd80f0a8dc9c2e5886bb79bddee2674e1d3f9d105 r10-6137-gc892d8f58f6fed46c343bdb6dd4d365f08f801b8 r10-6136-g44a9d801a7080d39658754ad603536da6cff2cd0 r10-6135-ga38979d9d7a4ab08336436052704028c56187618 r10-6118-gbd0a3e244d94ad4a5e41f01ebf285f0861cb4a03 r10-6104-g51e010b5f75c1fff06425a72702c1bf82a3ab053 r10-6041-gc60a18f8056facdcf370ce0e5f51550c9df5b539 r10-5954-gfbbc4c24fd7ba87e0c47cd965ae624afba6fa375 r10-5897-g91df4397a1404df65de6de23426294c50ab88bd2 r10-5829-ga0ab54de0ec3e0d48b2a681f7f78fe14bc4099eb r10-5723-g5a6e28b5bae7a236b35994d0f64fd902a574872c r10-5712-g4ea5d54b3c7175de045589f994fc94ed7e59d80d r10-5697-g2c8297996a7ab3496c5d2f798cdbe4cab749468e r10-5650-g7cd268ad6a6f71877744539d17ed53e752774bfa r10-5618-g6c7b84305a5e686644ee64bfd2d415f3f43fa85b	2020-01-22 20:18:03 +01:00
Jakub Jelinek	b6c7913402	aarch64: Fix aarch64_expand_subvti constant handling [PR93335] The two patterns that call aarch64_expand_subvti ensure that {low,high}_in1 is a register, while {low,high}_in2 can be a register or immediate. subdi3_compare1_imm uses the aarch64_plus_immediate predicate for its last two operands (the value and negated value), but aarch64_expand_subvti calls it whenever low_in2 is a CONST_INT, which leads to ICEs during vregs pass, as the emitted insn is not recognized as valid subdi3_compare1_imm. The following patch fixes that by only using subdi3_compare1_imm if it is ok to do so, and otherwise force the constant into register and use the non-immediate version - subdi3_compare1. Furthermore, previously the code was calling force_reg on high_in2 only if low_in2 is CONST_INT, on the (reasonable) assumption is that only if low_in2 is a CONST_INT, high_in2 can be non-REG, but with the above changes even in the else we might have CONST_INT and force_reg doesn't do anything if the operand is already a REG, so this patch calls it unconditionally. 2020-01-22 Jakub Jelinek <jakub@redhat.com> PR target/93335 * config/aarch64/aarch64.c (aarch64_expand_subvti): Only use gen_subdi3_compare1_imm if low_in2 satisfies aarch64_plus_immediate predicate, not whenever it is CONST_INT. Otherwise, force_reg it. Call force_reg on high_in2 unconditionally. * gcc.c-torture/compile/pr93335.c: New test.	2020-01-22 20:12:58 +01:00
Jakub Jelinek	d1c29dc8a3	i386: Fix up -fdollars-in-identifiers with identifiers starting with $ in -masm=att [PR91298] In AT&T syntax leading $ is special, so if we have identifiers that start with dollar, we usually fail to assemble it (or assemble incorrectly). As mentioned in the PR, what works is wrapping the identifiers inside of parens, like: movl $($a), %eax leaq ($a)(,%rdi,4), %rax movl ($a)(%rip), %eax movl ($a)+16(%rip), %eax .globl $a .type $a, @object .size $a, 72 $a: .string "$a" .quad ($a) (this is x86_64 -fno-pic -O2). In some places ($a) is not accepted, like as .globl operand, in .type, .size, so the patch overrides ASM_OUTPUT_SYMBOL_REF rather than e.g. ASM_OUTPUT_LABELREF. I didn't want to duplicate what assemble_name is doing (following transparent aliases), so split assemble_name into two parts; just mere looking at the first character of a name before calling assemble_name wouldn't be good enough, a transparent alias could lead from a name not starting with $ to one starting with it and vice versa. 2020-01-22 Jakub Jelinek <jakub@redhat.com> PR target/91298 * output.h (assemble_name_resolve): Declare. * varasm.c (assemble_name_resolve): New function. (assemble_name): Use it. * config/i386/i386.h (ASM_OUTPUT_SYMBOL_REF): Define. * gcc.target/i386/pr91298-1.c: New test. * gcc.target/i386/pr91298-2.c: New test.	2020-01-22 20:12:57 +01:00
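The parenthesizing rule from the PR91298 message can be sketched as a small formatting routine. This is a simplified illustration, not the `ASM_OUTPUT_SYMBOL_REF` definition from the patch: `format_att_symbol_ref` is a hypothetical helper, and it assumes the name has already been resolved through any transparent aliases, which the real patch handles via `assemble_name_resolve`.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write NAME into BUF as a symbol reference for
   AT&T syntax.  A name starting with '$' is wrapped in parentheses so
   the assembler does not read the '$' as an immediate prefix.  Returns
   what snprintf returns: the length the full output needs.  */
static int
format_att_symbol_ref (char *buf, size_t size, const char *name)
{
  assert (buf != NULL && name != NULL);
  return snprintf (buf, size, name[0] == '$' ? "(%s)" : "%s", name);
}
```

This only covers operand positions. As the message notes, directives such as .globl, .type and .size do not accept the parenthesized form, which is why the patch hooks symbol references rather than label output.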
Jakub Jelinek	51faa475c9	riscv: Fix up riscv_rtx_costs for RTL checking (PR target/93333) As mentioned in the PR, during combine rtx_costs can be called sometimes even on RTL that has not been validated yet and so can contain even operands that aren't valid in any instruction. 2020-01-21 Jakub Jelinek <jakub@redhat.com> PR target/93333 * config/riscv/riscv.c (riscv_rtx_costs) <case ZERO_EXTRACT>: Verify the last two operands are CONST_INT_P before using them as such. * gcc.c-torture/compile/pr93333.c: New test.	2020-01-22 20:12:57 +01:00

73851 Commits