setnbc[r] is like setbc[r], but it writes -1 instead of 1 to the GPR.
2020-05-07 Segher Boessenkool <segher@kernel.crashing.org>
* config/rs6000/rs6000.md (*setnbc_<un>signed_<GPR:mode>): New
define_insn.
(*setnbcr_<un>signed_<GPR:mode>): New define_insn.
(*neg_eq_<mode>): Avoid for TARGET_FUTURE; add missing && 1.
(*neg_ne_<mode>): Likewise.
New instructions setbc and setbcr. setbc sets a GPR to 1 if some
condition register bit is set, and 0 otherwise; setbcr does it the
other way around.
2020-05-07 Segher Boessenkool <segher@kernel.crashing.org>
* config/rs6000/rs6000.md (setbc_<un>signed_<GPR:mode>): New
define_insn.
(*setbcr_<un>signed_<GPR:mode>): Likewise.
(cstore<mode>4): Use setbc[r] if available.
(<code><GPR:mode><GPR2:mode>2_isel): Avoid for TARGET_FUTURE.
(eq<mode>3): Use setbc for TARGET_FUTURE.
(*eq<mode>3): Avoid for TARGET_FUTURE.
(ne<mode>3): Replace :P with :GPR; use setbc for TARGET_FUTURE;
else for non-Pmode, use gen_eq and gen_xor.
(*ne<mode>3): Avoid for TARGET_FUTURE.
(*eqsi3_ext<mode>): Avoid for TARGET_FUTURE; fix missing && 1.
* config/h8300/h8300.md: Move expanders and patterns into
files based on functionality.
* config/h8300/addsub.md: New file.
* config/h8300/bitfield.md: New file
* config/h8300/combiner.md: New file
* config/h8300/divmod.md: New file
* config/h8300/extensions.md: New file
* config/h8300/jumpcall.md: New file
* config/h8300/logical.md: New file
* config/h8300/movepush.md: New file
* config/h8300/multiply.md: New file
* config/h8300/other.md: New file
* config/h8300/proepi.md: New file
* config/h8300/shiftrotate.md: New file
* config/h8300/testcompare.md: New file
commit da1de1d91088ac506c1bed0fba9b0f04c5b8c876
* config/h8300/h8300.md (adds/subs splitters): Merge into single
splitter.
(negation expanders and patterns): Simplify and combine using
iterators.
(one_cmpl expanders and patterns): Likewise.
(tablejump, indirect_jump patterns ): Likewise.
(shift and rotate expanders and patterns): Likewise.
(absolute value expander and pattern): Drop expander, rename pattern
to just "abssf2"
(peephole2 patterns): Move into...
* config/h8300/peepholes.md: New file.
Some new algorithms need to use _GLIBCXX_STD_A to refer to the "normal"
version of the algorithm, to workaround the namespace dance done for
parallel mode.
PR libstdc++/94971 (partial)
* include/bits/ranges_algo.h (ranges::__sample_fn): Qualify
std::sample using macro to work in parallel mode.
(__sort_fn): Likewise for std::sort.
(ranges::__nth_element_fn): Likewise for std::nth_element.
* include/bits/stl_algobase.h (lexicographical_compare_three_way):
Likewise for std::__min_cmp.
* include/parallel/algobase.h (lexicographical_compare_three_way):
Add to namespace std::__parallel.
This is a correct fix for the incorrect cppcheck suggestion to make
these parameters const. In order to that, the dereference operators need
to be const. The conversions to the underlying iterator can be const
too.
PR c/92472
* include/parallel/multiway_merge.h (_GuardedIterator::operator*)
(_GuardedIterator::operator _RAIter, _UnguardedIterator::operator*)
(_UnguardedIterator::operator _RAIter): Add const qualifier.
(operator<(_GuardedIterator&, _GuardedIterator&)
(operator<=(_GuardedIterator&, _GuardedIterator&)
(operator<(_UnguardedIterator&, _UnguardedIterator&)
(operator<=(_UnguardedIterator&, _UnguardedIterator&): Change
parameters to const references.
When we have completely missing key information (e.g. the
coroutine_traits) or a partially transformed function body, we
need to try and balance returning useful information about
failures with the possibility that some part of the diagnostics
machinery or following code will not be able to handle the
state.
The PRs (and revised testcase) point to cases where that processing
has failed.
This revises the process to avoid special handling for the
ramp, and falls back on the same code used for regular function
fails.
There are test-cases (in addition to the ones for the PRs) that now
cover all early exit points [where the transforms are considered
to have failed in a manner that does not allow compilation to
continue].
gcc/cp/ChangeLog:
2020-05-07 Iain Sandoe <iain@sandoe.co.uk>
PR c++/94817
PR c++/94829
* coroutines.cc (morph_fn_to_coro): Set unformed outline
functions to error_mark_node. For early error returns suppress
warnings about missing ramp return values. Fix reinstatement
of the function body on pre-existing initial error.
* decl.c (finish_function): Use the normal error path for fails
in the ramp function, do not try to compile the helpers if the
transform fails.
gcc/testsuite/ChangeLog:
2020-05-07 Iain Sandoe <iain@sandoe.co.uk>
PR c++/94817
PR c++/94829
* g++.dg/coroutines/coro-missing-final-suspend.C: New test.
* g++.dg/coroutines/coro-missing-initial-suspend.C: New test.
* g++.dg/coroutines/coro-missing-promise-yield.C: Check for
continuation of compilation.
* g++.dg/coroutines/coro-missing-promise.C: Likewise.
* g++.dg/coroutines/coro-missing-ret-value.C: Likewise
* g++.dg/coroutines/coro-missing-ret-void.C: Likewise
* g++.dg/coroutines/coro-missing-ueh-3.C: Likewise
* g++.dg/coroutines/pr94817.C: New test.
* g++.dg/coroutines/pr94829.C: New test.
This PR points out that we don't detect long double -> double narrowing
when long double happens to have the same precision as double; on x86_64
this can be achieved by -mlong-double-64.
[dcl.init.list]#7.2 specifically says "from long double to double or float,
or from double to float", but check_narrowing only checks
TYPE_PRECISION (type) < TYPE_PRECISION (ftype)
so we need to handle the other cases too, e.g. by same_type_p as in
the following patch.
PR c++/94590 - Detect long double -> double narrowing.
* typeck2.c (check_narrowing): Detect long double -> double
narrowing even when double and long double have the same
precision. Make it handle conversions to float too.
* g++.dg/cpp0x/Wnarrowing18.C: New test.
This is an ICE on invalid, because we're specializing S::foo in the
wrong namespace. cp_parser_class_specifier_1 parses S::foo in M
and then it tries to push the nested-name-specifier of foo, which is
S. By that, we're breaking the assumption of push_inner_scope that
the pushed scope must be a scope nested inside current scope: current
scope is M, but the namespace context of S is N, and N is not nested
in M, so we fell into an infinite loop in push_inner_scope_r.
(cp_parser_class_head called check_specialization_namespace which already
gave a permerror.)
PR c++/94255
* parser.c (cp_parser_class_specifier_1): Check that the scope is
nested inside current scope before pushing it.
* g++.dg/template/spec41.C: New test.
* tree-ssa-operands.c (operands_scanner): New class.
(operands_bitmap_obstack): Remove.
(n_initialized): Remove.
(build_uses): Move to operands_scanner class.
(build_vuse): Same as above.
(build_vdef): Same as above.
(verify_ssa_operands): Same as above.
(finalize_ssa_uses): Same as above.
(cleanup_build_arrays): Same as above.
(finalize_ssa_stmt_operands): Same as above.
(start_ssa_stmt_operands): Same as above.
(append_use): Same as above.
(append_vdef): Same as above.
(add_virtual_operand): Same as above.
(add_stmt_operand): Same as above.
(get_mem_ref_operands): Same as above.
(get_tmr_operands): Same as above.
(maybe_add_call_vops): Same as above.
(get_asm_stmt_operands): Same as above.
(get_expr_operands): Same as above.
(parse_ssa_operands): Same as above.
(finalize_ssa_defs): Same as above.
(build_ssa_operands): Same as above, plus create a C-like wrapper.
(update_stmt_operands): Create an instance of operands_scanner.
This was approved in the Prague 2020 WG21 meeting so let's adjust the
comment. Since it's supposed to be a DR I think we should no longer
limit it to C++20.
P1957R2
* typeck2.c (check_narrowing): Consider T* to bool narrowing
in C++11 and up.
* g++.dg/cpp0x/initlist92.C: Don't expect an error in C++20 only.
externally_visible_p wasn't the correct predicate to use (even if it
worked), instead we should use DECL_EXTERNAL || TREE_PUBLIC.
2020-05-07 Richard Biener <rguenther@suse.de>
PR ipa/94947
* tree-ssa-structalias.c (refered_from_nonlocal_fn): Use
DECL_EXTERNAL || TREE_PUBLIC instead of externally_visible.
(refered_from_nonlocal_var): Likewise.
(ipa_pta_execute): Likewise.
I was looking at DR 296 and noticed that we say "nonstatic" instead of
"non-static", which is the version the standard uses. So this patch
fixes the spelling throughout the front end. Did not check e.g.
non-dependent or any other.
* decl.c (grok_op_properties): Fix spelling of non-static.
* typeck.c (build_class_member_access_expr): Likewise.
* g++.dg/other/operator1.C: Adjust expected message.
* g++.dg/overload/operator2.C: Likewise.
* g++.dg/template/error30.C: Likewise.
* g++.old-deja/g++.jason/operator.C: Likewise.
This extends DECL_GIMPLE_REG_P to all types so we can clear
TREE_ADDRESSABLE even for integers with partial defs, not just
complex and vector variables. To make that transition easier
the patch inverts DECL_GIMPLE_REG_P to DECL_NOT_GIMPLE_REG_P
since that makes the default the current state for all other
types besides complex and vectors.
For the testcase in PR94703 we're able to expand the partial
def'ed local integer to a register then, producing a single
movl rather than going through the stack.
On i?86 this execute FAILs gcc.dg/torture/pr71522.c because
we now expand a round-trip through a long double automatic var
to a register fld/fst which normalizes the value. For that
during RTL expansion we're looking for problematic punnings
of decls and avoid pseudos for those - I chose integer or
BLKmode accesses on decls with modes where precision doesn't
match bitsize which covers the XFmode case.
2020-05-07 Richard Biener <rguenther@suse.de>
PR middle-end/94703
* tree-core.h (tree_decl_common::gimple_reg_flag): Rename ...
(tree_decl_common::not_gimple_reg_flag): ... to this.
* tree.h (DECL_GIMPLE_REG_P): Rename ...
(DECL_NOT_GIMPLE_REG_P): ... to this.
* gimple-expr.c (copy_var_decl): Copy DECL_NOT_GIMPLE_REG_P.
(create_tmp_reg): Simplify.
(create_tmp_reg_fn): Likewise.
(is_gimple_reg): Check DECL_NOT_GIMPLE_REG_P for all regs.
* gimplify.c (create_tmp_from_val): Simplify.
(gimplify_bind_expr): Likewise.
(gimplify_compound_literal_expr): Likewise.
(gimplify_function_tree): Likewise.
(prepare_gimple_addressable): Set DECL_NOT_GIMPLE_REG_P.
* asan.c (create_odr_indicator): Do not clear DECL_GIMPLE_REG_P.
(asan_add_global): Copy it.
* cgraphunit.c (cgraph_node::expand_thunk): Force args
to be GIMPLE regs.
* function.c (gimplify_parameters): Copy
DECL_NOT_GIMPLE_REG_P.
* ipa-param-manipulation.c
(ipa_param_body_adjustments::common_initialization): Simplify.
(ipa_param_body_adjustments::reset_debug_stmts): Copy
DECL_NOT_GIMPLE_REG_P.
* omp-low.c (lower_omp_for_scan): Do not set DECL_GIMPLE_REG_P.
* sanopt.c (sanitize_rewrite_addressable_params): Likewise.
* tree-cfg.c (make_blocks_1): Simplify.
(verify_address): Do not verify DECL_GIMPLE_REG_P setting.
* tree-eh.c (lower_eh_constructs_2): Simplify.
* tree-inline.c (declare_return_variable): Adjust and
generalize.
(copy_decl_to_var): Copy DECL_NOT_GIMPLE_REG_P.
(copy_result_decl_to_var): Likewise.
* tree-into-ssa.c (pass_build_ssa::execute): Adjust comment.
* tree-nested.c (create_tmp_var_for): Simplify.
* tree-parloops.c (separate_decls_in_region_name): Copy
DECL_NOT_GIMPLE_REG_P.
* tree-sra.c (create_access_replacement): Adjust and
generalize partial def support.
* tree-ssa-forwprop.c (pass_forwprop::execute): Set
DECL_NOT_GIMPLE_REG_P on decls we introduce partial defs on.
* tree-ssa.c (maybe_optimize_var): Handle clearing of
TREE_ADDRESSABLE and setting/clearing DECL_NOT_GIMPLE_REG_P
independently.
* lto-streamer-out.c (hash_tree): Hash DECL_NOT_GIMPLE_REG_P.
* tree-streamer-out.c (pack_ts_decl_common_value_fields): Stream
DECL_NOT_GIMPLE_REG_P.
* tree-streamer-in.c (unpack_ts_decl_common_value_fields): Likewise.
* cfgexpand.c (avoid_type_punning_on_regs): New.
(discover_nonconstant_array_refs): Call
avoid_type_punning_on_regs to avoid unsupported mode punning.
lto/
* lto-common.c (compare_tree_sccs_1): Compare
DECL_NOT_GIMPLE_REG_P.
c/
* gimple-parser.c (c_parser_parse_ssa_name): Do not set
DECL_GIMPLE_REG_P.
cp/
* optimize.c (update_cloned_parm): Copy DECL_NOT_GIMPLE_REG_P.
* gcc.dg/tree-ssa/pr94703.c: New testcase.
The testcase in the current form doesn't FAIL without the patch on
x86_64-linux unless also testing with -m32; as that the 64-bit testing
on that target is probably way more common, and we can use also attributes
that FAIL without the patch with -m64, the following patch adjusts the
test, so that it FAILs without the patch for both -m64 and -m32 (but not
-mx32) and PASSes with the patch.
2020-05-07 Jakub Jelinek <jakub@redhat.com>
PR c++/94946
* g++.dg/ext/attr-parm-1.C: Enable the test also for lp64 x86, use
sysv_abi and ms_abi attributes in that case instead of fastcall and
no attribute.
If the second argument of __builtin_speculation_safe_value is
error_mark_node (or has such a type), we ICE during
useless_typ_conversion_p.
202-05-07 Jakub Jelinek <jakub@redhat.com>
PR c/94968
* c-common.c (speculation_safe_value_resolve_params): Return false if
error_operand_p (val2).
(resolve_overloaded_builtin) <case BUILT_IN_SPECULATION_SAFE_VALUE_N>:
Remove extraneous semicolon.
* gcc.dg/pr94968.c: New test.
The attached patch fixes a bootstrap failure on AArch32 introduced by
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=308bc496884706af4b3077171cbac684c7a6f7c6
This makes the declaration of arm_add_stmt_cost match the definition, and removes the redundant
class keyword from the definition.
2020-05-07 Alex Coplan <alex.coplan@arm.com>
* config/arm/arm.c (arm_add_stmt_cost): Fix declaration, remove class
from definition.
This rewrites store-motion to process candidates where we can
ensure order preserving separately and with no need to disambiguate
against all stores. Those candidates we cannot handle this way
are validated to be independent on all stores (w/o TBAA) and then
processed as "unordered" (all conditionally executed stores are so
as well).
This will necessary cause
FAIL: gcc.dg/graphite/pr80906.c scan-tree-dump graphite "isl AST to Gimple succeeded"
because the SM previously performed is not valid for exactly the PR57359
reason, we still perform SM of qc for the innermost loop but that's not enough.
There is still room for improvements because we still check some constraints
for the order preserving cases that are only necessary in the current
strict way for the unordered ones. Leaving that for the furture.
2020-05-07 Richard Biener <rguenther@suse.de>
PR tree-optimization/57359
* tree-ssa-loop-im.c (im_mem_ref::indep_loop): Remove.
(in_mem_ref::dep_loop): Repurpose.
(LOOP_DEP_BIT): Remove.
(enum dep_kind): New.
(enum dep_state): Likewise.
(record_loop_dependence): New function to populate the
dependence cache.
(query_loop_dependence): New function to query the dependence
cache.
(memory_accesses::refs_in_loop): Rename to ...
(memory_accesses::refs_loaded_in_loop): ... this and change to
only record loads.
(outermost_indep_loop): Adjust.
(mem_ref_alloc): Likewise.
(gather_mem_refs_stmt): Likewise.
(mem_refs_may_alias_p): Add tbaa_p parameter and pass it down.
(struct sm_aux): New.
(execute_sm): Split code generation on exits, record state
into new hash-map.
(enum sm_kind): New.
(execute_sm_exit): Exit code generation part.
(sm_seq_push_down): Helper for sm_seq_valid_bb performing
dependence checking on stores reached from exits.
(sm_seq_valid_bb): New function gathering SM stores on exits.
(hoist_memory_references): Re-implement.
(refs_independent_p): Add tbaa_p parameter and pass it down.
(record_dep_loop): Remove.
(ref_indep_loop_p_1): Fold into ...
(ref_indep_loop_p): ... this and generalize for three kinds
of dependence queries.
(can_sm_ref_p): Adjust according to hoist_memory_references
changes.
(store_motion_loop): Don't do anything if the set of SM
candidates is empty.
(tree_ssa_lim_initialize): Adjust.
(tree_ssa_lim_finalize): Likewise.
* gcc.dg/torture/pr57359-1.c: New testcase.
* gcc.dg/torture/pr57359-1.c: Likewise.
* gcc.dg/tree-ssa/ssa-lim-14.c: Likewise.
* gcc.dg/graphite/pr80906.c: XFAIL.
The -fgnat-encodings=minimal switch tells the compiler to generate mostly
pure DWARF for the GNAT compiler and it contains some bugs related to
discriminated record types with variant part.
* dwarf2out.c (add_data_member_location_attribute): Account for
the variant part offset in the computation of the data bit offset.
(add_bit_offset_attribute): Remove CTX parameter. Pass a new
context in the call to field_byte_offset.
(gen_field_die): Adjust call to add_bit_offset_attribute and
remove confusing assertion.
(analyze_variant_discr): Deal with boolean subtypes.
Essentially the same fix as for x86.
2020-05-07 Uroš Bizjak <ubizjak@gmail.com>
gcc/
* config/alpha/alpha.c (alpha_atomic_assign_expand_fenv): Use
TARGET_EXPR instead of MODIFY_EXPR for the first assignments to
fenv_var and new_fenv_var.
Here we ICE with -std=c++98 since the newly added call to uses_template_parms
(r10-6357): we hit
26530 gcc_assert (cxx_dialect >= cxx11
26531 || INTEGRAL_OR_ENUMERATION_TYPE_P (type));
and TYPE is a record type. The problem is that the argument to
value_dependent_expression_p does not satisfy potential_constant_expression
which it must, as the comment explains. I thought about fixing this in
uses_template_parms -- only call v_d_e_p if p_c_e is true, but in this
case we want to also suppress the warnings if we don't have a constant
expression. I couldn't simply check TREE_CONSTANT as in
compute_array_index_type_loc, because then we'd stop warning in the new
Wtype-limits3.C test.
Fixed by using type_dependent_expression_p_push instead. This means
that we won't suppress the warnings for value-dependent expressions that
aren't type-dependent, e.g. sizeof (T). This only seems to make a
difference for -Wdiv-by-zero, now tested in Wdiv-by-zero-3.C, where I
think it's reasonable to warn. It could make -Wtautological-compare
warn more, but that warning doesn't trigger when it gets constant arguments.
Wtype-limits4.C is a test reduced from poly-int.h and it tests a scenario
that was missing in our testsuite.
This patch also moves the warning_sentinels after the RECURs -- we mean
to use them for build_x_binary_op purposes only.
PR c++/94938
* pt.c (tsubst_copy_and_build): Call type_dependent_expression_p_push
instead of uses_template_parms. Move the warning_sentinels after the
RECURs.
* g++.dg/warn/Wdiv-by-zero-3.C: New test.
* g++.dg/warn/Wtype-limits4.C: New test.
* g++.dg/warn/template-2.C: New test.
* g++.old-deja/g++.pt/crash10.C: Add dg-warning.
Both array concat and array new expressions wrapped any temporaries
created into a BIND_EXPR. This does not work if an expression used to
construct the result requires scope destruction, which is represented by
a TARGET_EXPR with a clean-up, and a CLEANUP_POINT_EXPR at the
location where the temporaries logically go out of scope. The reason
for this not working is because the lowering of cleanup point
expressions does not traverse inside BIND_EXPRs to expand any gimple
cleanup expressions within.
The use of creating BIND_EXPR has been removed at both locations, and
replaced with a normal temporary variable that has initialization
delayed until its address is taken.
gcc/d/ChangeLog:
PR d/94970
* d-codegen.cc (force_target_expr): Move create_temporary_var
implementation inline here.
(create_temporary_var): Remove.
(maybe_temporary_var): Remove.
(bind_expr): Remove.
* d-convert.cc (d_array_convert): Use build_local_temp to generate
temporaries, and generate its assignment.
* d-tree.h (create_temporary_var): Remove.
(maybe_temporary_var): Remove.
(d_array_convert): Remove vars argument.
* expr.cc (ExprVisitor::visit (CatExp *)): Use build_local_temp to
generate temporaries, don't wrap them in a BIND_EXPR.
(ExprVisitor::visit (NewExp *)): Likewise.
gcc/testsuite/ChangeLog:
PR d/94970
* gdc.dg/pr94970.d: New test.
The following testcase gets a bogus warning during build_base_path,
when cp_build_indirect_ref* calls strict_aliasing_warning with a dependent
expression. IMHO calling get_alias_set etc. on dependent types feels wrong
to me, we should just defer the warnings in those cases until instantiation
and only handle the cases where neither type nor expr are dependent.
2020-05-06 Jakub Jelinek <jakub@redhat.com>
PR c++/94951
* typeck.c (cp_strict_aliasing_warning): New function.
(cp_build_indirect_ref_1, build_reinterpret_cast_1): Use
it instead of strict_aliasing_warning.
* g++.dg/warn/Wstrict-aliasing-bogus-tmpl.C: New test.
On the following testcase we ICE, because synthesize_method is called twice
on the same sfk_comparison method fndecl, the first time it works fine
because start_preparsed_function in that case sets both
current_function_decl and cfun, but second time it is called it only sets
the former and keeps cfun NULL, so we ICE when trying to store
current_function_returns_value.
I think it is just wrong to call synthesize_method multiple times, and most
synthesize_method callers avoid that by not calling it if DECL_INITIAL is
already set, so this patch does that too.
2020-05-06 Jakub Jelinek <jakub@redhat.com>
PR c++/94907
* method.c (defaulted_late_check): Don't call synthesize_method
on constexpr sfk_comparison if it has been called on it already.
* g++.dg/cpp2a/spaceship-synth8.C: New test.
Extend the overload so that it is used even when _GLIBCXX_DEBUG mode
is activated.
* include/bits/stl_algobase.h (struct _Bit_iterator): New declaration.
(std::__fill_a1(_Bit_iterator, _Bit_iterator, const bool&)): Likewise.
* include/bits/stl_bvector.h (__fill_bvector): Move outside
_GLIBCXX_STD_C namespace.
(fill(_Bit_iterator, _Bit_iterator, const bool&)): Likewise and rename
into...
(__fill_a1): ...this.
* testsuite/25_algorithms/fill/bvector/1.cc: New.
Introduce math_force_eval_div to use generic division to generate
INEXACT as well as INVALID and DIVZERO exceptions.
libgcc/ChangeLog:
* config/i386/sfp-exceptions.c (__math_force_eval): Remove.
(__math_force_eval_div): New define.
(__sfp_handle_exceptions): Use __math_force_eval_div to use
generic division to generate INVALID, DIVZERO and INEXACT
exceptions.
libatomic/ChangeLog:
* config/x86/fenv.c (__math_force_eval): Remove.
(__math_force_eval_div): New define.
(__atomic_deraiseexcept): Use __math_force_eval_div to use
generic division to generate INVALID, DIVZERO and INEXACT
exceptions.
libgfortran/ChangeLog:
* config/fpu-387.h (__math_force_eval): Remove.
(__math_force_eval_div): New define.
(local_feraiseexcept): Use __math_force_eval_div to use
generic division to generate INVALID, DIVZERO and INEXACT
exceptions.
(struct fenv): Define named struct instead of typedef.
Jason's fix for 90570 & 79585 was a bit overzealous. Dependent attribs should still
attach to a parameter decl.
* decl.c (grokdeclarator): Don't splice template attributes in
parm context -- they can apply to the parm.
This just strips out references to the draft standard
numbered n4849. The implementation is now intended to
be applicable to the expected final version.
gcc/cp/ChangeLog:
2020-05-05 Iain Sandoe <iain@sandoe.co.uk>
* coroutines.cc: Remove references to n4849 throughout.
The AVX512F documentation clearly states that in instructions where the
destination is a memory only merging-masking is possible, not zero-masking,
and the assembler enforces that.
The testcase in this patch fails to assemble because of
Error: unsupported masking for `vextracti32x8'
on
vextracti32x8 $0x0, %zmm1, -64(%rsp){%k1}{z}
For the vector extraction patterns, we apparently have 7 *_maskm patterns
that only accept memory destinations and rtx_equal_p merge-masking source
for it, 7 *<mask_name> corresponding patterns that allow memory destination
only for the non-masked cases (through <store_mask_constraint>), then 2
*<mask_name> patterns (lo ssehalf V16FI and lo ssehalf VI8F_256 ones) which
do allow memory destination even for masked cases and are the cause of the
testsuite failure, because we must not allow C constraint if the destination
is m, and finally one pair of patterns (separate * and *_mask, hi ssehalf
VI4F_256), which has another issue (for which I don't have a testcase
though), where if it would match zero-masking with register destination,
it wouldn't emit the needed {z} into assembly.
The attached patch fixes those 3 issues only, perhaps more suitable for
backporting.
But, even with that fixed, we are missing 3 further *_maskm patterns and
more importantly, I find the split into 3 separate patterns after subst,
*_maskm for masking with memory destination, *_mask for masking with
register destination and * for non-masking unnecessarily complex and harder
for reload, so the included patch below (non-attached) instead kills all
*_maskm patterns and splits the *<mask_name> patterns into * and *_mask
by hand instead of subst, where the *_mask ones make sure that with v
destination they use 0C, while with m destination they use 0 and as
condition enforce that either destination is not MEM, or rtx_equal_p between
the destination and corresponding merging-masking operand source.
If we had those 3 missing *_maskm patterns, this patch would actually result
in both shorter sse.md and shorter machine description after subst (e.g.
length of tmp-mddump.md), as we don't have them, the patch is actually 16
lines longer sse.md, but still shorter tmp-mddump.md.
2020-05-06 Jakub Jelinek <jakub@redhat.com>
PR target/93069
* config/i386/subst.md (store_mask_constraint, store_mask_predicate):
Remove.
(avx512dq_vextract<shuffletype>64x2_1_maskm,
avx512f_vextract<shuffletype>32x4_1_maskm,
vec_extract_lo_<mode>_maskm, vec_extract_hi_<mode>_maskm): Remove.
(<mask_codefor>avx512dq_vextract<shuffletype>64x2_1<mask_name>): Split
into ...
(*avx512dq_vextract<shuffletype>64x2_1,
avx512dq_vextract<shuffletype>64x2_1_mask): ... these new
define_insns. Even in the masked variant allow memory output but in
that case use 0 rather than 0C constraint on the source of masked-out
elts.
(<mask_codefor>avx512f_vextract<shuffletype>32x4_1<mask_name>): Split
into ...
(*avx512f_vextract<shuffletype>32x4_1,
avx512f_vextract<shuffletype>32x4_1_mask): ... these new define_insns.
Even in the masked variant allow memory output but in that case use
0 rather than 0C constraint on the source of masked-out elts.
(vec_extract_lo_<mode><mask_name>): Split into ...
(vec_extract_lo_<mode>, vec_extract_lo_<mode>_mask): ... these new
define_insns. Even in the masked variant allow memory output but in
that case use 0 rather than 0C constraint on the source of masked-out
elts.
(vec_extract_hi_<mode><mask_name>): Split into ...
(vec_extract_hi_<mode>, vec_extract_hi_<mode>_mask): ... these new
define_insns. Even in the masked variant allow memory output but in
that case use 0 rather than 0C constraint on the source of masked-out
elts.
gcc/ChangeLog:
PR c/94230
* common.opt: Add -flarge-source-files.
* doc/invoke.texi: Document it.
* toplev.c (process_options): set line_table->default_range_bits
to 0 when flag_large_source_files is true.
gcc/c-family/ChangeLog:
PR c/94230
* c-indentation.c (get_visual_column): Add a hint to use the new
-flarge-source-files option.
gcc/testsuite/ChangeLog:
PR c/94230
* gcc.dg/plugin/location-overflow-test-1.c (fn_1): New message to
provide hint to use the new -flarge-source-files option.
Use carry flag from addition to implement GEU/LTU compares
with negated operand, so e.g.
~x < y
compiles to:
addq %rsi, %rdi
setc %al
instead of:
notq %rdi
cmpq %rsi, %rdi
setb %al
PR target/94913
* config/i386/predicates.md (add_comparison_operator): New predicate.
* config/i386/i386.md (compare->add splitter): New splitters.
testsuite/ChangeLog:
PR target/94913
* gcc.target/i386/pr94913-1.c: New test.
* gcc.target/i386/pr94913-2.c: Ditto.
This version of the fix uses __getauxval instead of getauxval.
The whole thing is guarded simply on __gnu_linux__.
__getauxval was introduced in 2.16 but the aarch64 port was added in 2.17 so in practice I expect all aarch64 glibcs to support __getauxval.
Bootstrapped and tested on aarch64-none-linux-gnu.
Also tested on aarch64-none-elf.
2020-05-06 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
* config/aarch64/lse-init.c (init_have_lse_atomics): Use __getauxval
instead of getauxval.
(AT_HWCAP): Define.
(HWCAP_ATOMICS): Define.
Guard detection on __gnu_linux__.
This removes trivial instances of SLP_INSTANCE_GROUP_SIZE and refrains
from using a "SLP instance" which nowadays is just one of the possibly
many entries into the SLP graph.
2020-05-06 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (vect_transform_slp_perm_load): Adjust.
* tree-vect-data-refs.c (vect_slp_analyze_node_dependences):
Remove slp_instance parameter, just iterate over all scalar stmts.
(vect_slp_analyze_instance_dependence): Adjust and likewise.
* tree-vect-slp.c (vect_bb_slp_scalar_cost): Remove unused BB
parameter.
(vect_schedule_slp): Just iterate over all scalar stmts.
(vect_supported_load_permutation_p): Adjust.
(vect_transform_slp_perm_load): Remove slp_instance parameter,
instead use the number of lanes in the node as group size.
* tree-vect-stmts.c (vect_model_load_cost): Get vectorization
factor instead of slp_instance as parameter.
(vectorizable_load): Adjust.
aarch64_get_extension_string_for_isa_flags is declared in
"aarch64-protos.h", use that instead of re-declaring it improperly.
* config/aarch64/driver-aarch64.c: Include "aarch64-protos.h".
(aarch64_get_extension_string_for_isa_flags): Don't declare.