2018-05-18 Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
PR middle-end/85817
* ipa-pure-const.c (malloc_candidate_p): Remove the integer_zerop
check on retval and return false if all args to the phi are zero.
testsuite/
* gcc.dg/tree-ssa/pr83648.c: Change scan-tree-dump to
scan-tree-dump-not for h.
From-SVN: r260358
2018-05-18 Toon Moene <toon@moene.org>
* doc/invoke.texi: Move -floop-unroll-and-jam documentation
directly after that of -floop-interchange. Indicate that both
options are enabled by default when specifying -O3.
From-SVN: r260352
We have a deficiency in our vec_set family of patterns.
We don't support directly loading a vector lane using LD1 for V2DImode and all the vector floating-point modes.
We do handle it correctly for the other integer vector modes (V4SI, V8HI etc.) though.
The alternatives on the relevant floating-point patterns only allow a register-to-register INS instruction.
That means if we want to load a value into a vector lane we must first load it into a scalar register and then
perform an INS, which is wasteful.
There is also an explicit V2DI vec_set expander dangling around for no reason that I can see. It seems to do the
exact same thing as the other vec_set expanders, so this patch removes it and unifies all vec_set expansions
into a single "vec_set<mode>" define_expand using the catch-all VALL_F16 iterator.
With this patch we avoid loading values into scalar registers and then doing an explicit INS on them to move them
into the desired vector lanes. For example, for:
typedef float v4sf __attribute__ ((vector_size (16)));
typedef long long v2di __attribute__ ((vector_size (16)));

v2di
foo_v2di (long long *a, long long *b)
{
  v2di res = { *a, *b };
  return res;
}

v4sf
foo_v4sf (float *a, float *b, float *c, float *d)
{
  v4sf res = { *a, *b, *c, *d };
  return res;
}
we currently generate:
foo_v2di:
        ldr     d0, [x0]
        ldr     x0, [x1]
        ins     v0.d[1], x0
        ret
foo_v4sf:
        ldr     s0, [x0]
        ldr     s3, [x1]
        ldr     s2, [x2]
        ldr     s1, [x3]
        ins     v0.s[1], v3.s[0]
        ins     v0.s[2], v2.s[0]
        ins     v0.s[3], v1.s[0]
        ret
but with this patch we generate the much cleaner:
foo_v2di:
        ldr     d0, [x0]
        ld1     {v0.d}[1], [x1]
        ret
foo_v4sf:
        ldr     s0, [x0]
        ld1     {v0.s}[1], [x1]
        ld1     {v0.s}[2], [x2]
        ld1     {v0.s}[3], [x3]
        ret
* config/aarch64/aarch64-simd.md (vec_set<mode>): Use VALL_F16 mode
iterator. Delete separate integer-mode vec_set<mode> expander.
(aarch64_simd_vec_setv2di): Delete.
(vec_setv2di): Delete.
(aarch64_simd_vec_set<mode>): Delete all other patterns with that name.
Use VALL_F16 mode iterator. Add LD1 alternative and use vwcore for
the "w, r" alternative.
* gcc.target/aarch64/vect-init-ld1.c: New test.
From-SVN: r260351
2018-05-18 Martin Liska <mliska@suse.cz>
* passes.def: Add pass_lower_switch and pass_lower_switch_O0.
* tree-pass.h (make_pass_lower_switch_O0): New function.
* tree-switch-conversion.c (node_has_low_bound): Remove.
(node_has_high_bound): Likewise.
(node_is_bounded): Likewise.
(class pass_lower_switch): Make it a template type and create
two instances.
(pass_lower_switch::execute): Add template argument.
(make_pass_lower_switch): New function.
(make_pass_lower_switch_O0): New function.
(do_jump_if_equal): Remove.
(emit_case_nodes): Simplify to just handle all 3 cases and leave
all the hard work to tree optimization passes.
2018-05-18 Martin Liska <mliska@suse.cz>
* gcc.dg/tree-ssa/vrp104.c: Adjust dump file that is scanned.
* gcc.dg/tree-prof/update-loopch.c: Likewise.
From-SVN: r260350
There are four optabs for various forms of fused multiply-add:
fma, fms, fnma and fnms. Of these, only fma had a direct gimple
representation. For the other three we relied on special pattern-
matching during expand, although tree-ssa-math-opts.c did have
some code to try to second-guess what expand would do.
This patch removes the old FMA_EXPR representation of fma and
introduces four new internal functions, one for each optab.
IFN_FMA is tied to BUILT_IN_FMA* while the other three are
independent directly-mapped internal functions. It's then
possible to do the pattern-matching in match.pd, and
tree-ssa-math-opts.c (via folding) can select the exact
FMA-based operation.
The BRIG & HSA parts are a best guess, but seem relatively simple.
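For reference, the four operations compute the following, sketched here
as plain unfused C with illustrative function names (the internal
functions operate on gimple values, and the fused forms round only once):

double f_fma  (double a, double b, double c) { return  a * b + c; }  /* IFN_FMA  */
double f_fms  (double a, double b, double c) { return  a * b - c; }  /* IFN_FMS  */
double f_fnma (double a, double b, double c) { return -a * b + c; }  /* IFN_FNMA */
double f_fnms (double a, double b, double c) { return -a * b - c; }  /* IFN_FNMS */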
2018-05-18 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* doc/sourcebuild.texi (scalar_all_fma): Document.
* tree.def (FMA_EXPR): Delete.
* internal-fn.def (FMA, FMS, FNMA, FNMS): New internal functions.
* internal-fn.c (ternary_direct): New macro.
(expand_ternary_optab_fn): Likewise.
(direct_ternary_optab_supported_p): Likewise.
* Makefile.in (build/genmatch.o): Depend on case-cfn-macros.h.
* builtins.c (fold_builtin_fma): Delete.
(fold_builtin_3): Don't call it.
* cfgexpand.c (expand_debug_expr): Remove FMA_EXPR handling.
* expr.c (expand_expr_real_2): Likewise.
* fold-const.c (operand_equal_p): Likewise.
(fold_ternary_loc): Likewise.
* gimple-pretty-print.c (dump_ternary_rhs): Likewise.
* gimple.c (DEFTREECODE): Likewise.
* gimplify.c (gimplify_expr): Likewise.
* optabs-tree.c (optab_for_tree_code): Likewise.
* tree-cfg.c (verify_gimple_assign_ternary): Likewise.
* tree-eh.c (operation_could_trap_p): Likewise.
(stmt_could_throw_1_p): Likewise.
* tree-inline.c (estimate_operator_cost): Likewise.
* tree-pretty-print.c (dump_generic_node): Likewise.
(op_code_prio): Likewise.
* tree-ssa-loop-im.c (stmt_cost): Likewise.
* tree-ssa-operands.c (get_expr_operands): Likewise.
* tree.c (commutative_ternary_tree_code, add_expr): Likewise.
* fold-const-call.h (fold_fma): Delete.
* fold-const-call.c (fold_const_call_ssss): Handle CFN_FMS,
CFN_FNMA and CFN_FNMS.
(fold_fma): Delete.
* genmatch.c (combined_fn): New enum.
(commutative_ternary_tree_code): Remove FMA_EXPR handling.
(commutative_op): New function.
(commutate): Use it. Handle more than 2 operands.
(dt_operand::gen_gimple_expr): Use commutative_op.
(parser::parse_expr): Allow :c to be used with non-binary
operators if the commutative operand is known.
* gimple-ssa-backprop.c (backprop::process_builtin_call_use): Handle
CFN_FMS, CFN_FNMA and CFN_FNMS.
(backprop::process_assign_use): Remove FMA_EXPR handling.
* hsa-gen.c (gen_hsa_insns_for_operation_assignment): Likewise.
(gen_hsa_fma): New function.
(gen_hsa_insn_for_internal_fn_call): Use it for IFN_FMA, IFN_FMS,
IFN_FNMA and IFN_FNMS.
* match.pd: Add folds for IFN_FMS, IFN_FNMA and IFN_FNMS.
* gimple-fold.h (follow_all_ssa_edges): Declare.
* gimple-fold.c (follow_all_ssa_edges): New function.
* tree-ssa-math-opts.c (convert_mult_to_fma_1): Use the
gimple_build interface and use follow_all_ssa_edges to fold the result.
(convert_mult_to_fma): Use direct_internal_fn_supported_p
instead of checking for optabs directly.
* config/i386/i386.c (ix86_add_stmt_cost): Recognize FMAs as calls
rather than FMA_EXPRs.
* config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Create a
call to IFN_FMA instead of an FMA_EXPR.
gcc/brig/
* brigfrontend/brig-function.cc
(brig_function::get_builtin_for_hsa_opcode): Use BUILT_IN_FMA
for BRIG_OPCODE_FMA.
(brig_function::get_tree_code_for_hsa_opcode): Treat BUILT_IN_FMA
as a call.
gcc/c/
* gimple-parser.c (c_parser_gimple_postfix_expression): Remove
__FMA_EXPR handling.
gcc/cp/
* constexpr.c (cxx_eval_constant_expression): Remove FMA_EXPR handling.
(potential_constant_expression_1): Likewise.
gcc/testsuite/
* lib/target-supports.exp (check_effective_target_scalar_all_fma):
New proc.
* gcc.dg/fma-1.c: New test.
* gcc.dg/fma-2.c: Likewise.
* gcc.dg/fma-3.c: Likewise.
* gcc.dg/fma-4.c: Likewise.
* gcc.dg/fma-5.c: Likewise.
* gcc.dg/fma-6.c: Likewise.
* gcc.dg/fma-7.c: Likewise.
* gcc.dg/gimplefe-26.c: Use .FMA instead of __FMA and require
scalar_all_fma.
* gfortran.dg/reassoc_7.f: Pass -ffp-contract=off.
* gfortran.dg/reassoc_8.f: Likewise.
* gfortran.dg/reassoc_9.f: Likewise.
* gfortran.dg/reassoc_10.f: Likewise.
From-SVN: r260348
gcc/
* expr.c (do_tablejump): When converting the index to Pmode, if we have a
sign-extended promoted subreg, and the range does not have the sign bit
set, then do a sign extend.
* config/riscv/riscv.c (riscv_extend_comparands): In the unsigned QImode
test, check for a sign-extended subreg and/or constant operands, and
do a sign extend in that case.
gcc/testsuite/
* gcc.target/riscv/switch-qi.c: New.
* gcc.target/riscv/switch-si.c: New.
From-SVN: r260340
Because path.cc is compiled with -std=gnu++17, the static constexpr
data member is implicitly 'inline' and so no definition gets emitted
unless it gets used in that translation unit. Other translation units
built as C++11 or C++14 still require a namespace-scope definition of
the variable, so mark the definition as used.
PR libstdc++/85818
* src/filesystem/path.cc (path::preferred_separator): Add used
attribute.
* testsuite/experimental/filesystem/path/preferred_separator.cc: New.
From-SVN: r260326
PR libstdc++/85812
* libsupc++/cxxabi_init_exception.h (__cxa_free_exception): Declare.
* libsupc++/exception_ptr.h (make_exception_ptr) [__cpp_exceptions]:
Refactor to separate non-throwing and throwing implementations.
[__cpp_rtti && !_GLIBCXX_HAVE_CDTOR_CALLABI]: Deallocate the memory
if constructing the object throws.
From-SVN: r260323
This patch gets the gimple FE to parse calls to internal functions.
The only non-obvious thing was how the functions should be written
to avoid clashes with real function names. One option would be to
go the magic number of underscores route, but we already do that for
built-in functions, and it would be good to keep them visually
distinct. In the end I borrowed the local/internal label convention
from asm and used:
x = .SQRT (y);
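So a gimple FE test along these lines now parses (a sketch compiled
with -fgimple; the committed test is gcc.dg/gimplefe-28.c, whose exact
contents are not reproduced here):

double __GIMPLE
f (double y)
{
  double x;
  x = .SQRT (y);
  return x;
}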
2018-05-17 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* internal-fn.h (lookup_internal_fn): Declare.
* internal-fn.c (lookup_internal_fn): New function.
* gimple.c (gimple_build_call_from_tree): Handle calls to
internal functions.
* gimple-pretty-print.c (dump_gimple_call): Print "." before
internal function names.
* tree-pretty-print.c (dump_generic_node): Likewise.
* tree-ssa-scopedtables.c (expr_hash_elt::print): Likewise.
gcc/c/
* gimple-parser.c: Include internal-fn.h.
(c_parser_gimple_statement): Treat a leading CPP_DOT as a call.
(c_parser_gimple_call_internal): New function.
(c_parser_gimple_postfix_expression): Use it to handle CPP_DOT.
Fix typos in comment.
gcc/testsuite/
* gcc.dg/gimplefe-28.c: New test.
* gcc.dg/asan/use-after-scope-9.c: Adjust expected output for
internal function calls.
* gcc.dg/goacc/loop-processing-1.c: Likewise.
From-SVN: r260316
This patch makes the function versions of gimple_build and
gimple_simplify take combined_fns rather than built_in_codes,
so that they work with internal functions too. The old
gimple_build overloads were unused, so no existing callers needed
to be updated.
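Illustratively, a caller can now go through a combined_fn code (a sketch
only; the exact overloads are in gimple-fold.h, and "arg" stands for
some pre-existing tree operand):

  gimple_seq stmts = NULL;
  /* Build a call to sqrt through its combined_fn code, folding on the
     fly where possible.  */
  tree res = gimple_build (&stmts, CFN_SQRT, double_type_node, arg);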
2018-05-17 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* gimple-fold.h (gimple_build): Make the function forms take
combined_fn rather than built_in_function.
(gimple_simplify): Likewise.
* gimple-match-head.c (gimple_simplify): Likewise.
* gimple-fold.c (gimple_build): Likewise.
* tree-vect-loop.c (get_initial_def_for_reduction): Use gimple_build
rather than gimple_build_call_internal.
(get_initial_defs_for_reduction): Likewise.
(vect_create_epilog_for_reduction): Likewise.
(vectorizable_live_operation): Likewise.
From-SVN: r260315
2018-05-17 Martin Liska <mliska@suse.cz>
* gimple-ssa-sprintf.c (format_directive): Do not use a space
between 'G_' and '('.
2018-05-17 Martin Liska <mliska@suse.cz>
* c-warn.c (overflow_warning): Do not use a space
between 'G_' and '('.
2018-05-17 Martin Liska <mliska@suse.cz>
* gcc.dg/plugin/ggcplug.c (plugin_init): Do not use a space
between 'G_' and '('.
From-SVN: r260314
PR target/85323
* config/i386/i386.c (ix86_fold_builtin): Handle masked shifts
even if the mask is not all ones.
* gcc.target/i386/pr85323-7.c: New test.
* gcc.target/i386/pr85323-8.c: New test.
* gcc.target/i386/pr85323-9.c: New test.
From-SVN: r260313
* config/i386/avx512fintrin.h (_mm512_set_epi16, _mm512_set_epi8,
_mm512_setzero): New intrinsics.
* gcc.target/i386/avx512f-set-v32hi-1.c: New test.
* gcc.target/i386/avx512f-set-v32hi-2.c: New test.
* gcc.target/i386/avx512f-set-v32hi-3.c: New test.
* gcc.target/i386/avx512f-set-v32hi-4.c: New test.
* gcc.target/i386/avx512f-set-v32hi-5.c: New test.
* gcc.target/i386/avx512f-set-v64qi-1.c: New test.
* gcc.target/i386/avx512f-set-v64qi-2.c: New test.
* gcc.target/i386/avx512f-set-v64qi-3.c: New test.
* gcc.target/i386/avx512f-set-v64qi-4.c: New test.
* gcc.target/i386/avx512f-set-v64qi-5.c: New test.
* gcc.target/i386/avx512f-setzero-1.c: New test.
From-SVN: r260310
In the testcase in this patch we create an SLP vector with only two
elements. Our current vector initialisation code will first duplicate
the first element to both lanes, then overwrite the top lane with a new
value.
This duplication can be clunky and wasteful.
Better would be to use the fact that we will always be overwriting the
remaining bits, and simply move the first element to the correct place
(implicitly zeroing all other bits).
This reduces the generated code for this case, allows more efficient
addressing modes, and gives other second-order benefits for AArch64 code
which has been vectorized to V2DI mode.
Note that the change is generic enough to catch the case for any vector
mode, but is expected to be most useful for 2x64-bit vectorization.
Unfortunately, on its own, this would cause failures in
gcc.target/aarch64/load_v2vec_lanes_1.c and
gcc.target/aarch64/store_v2vec_lanes.c, which expect to see many more
vec_merge and vec_duplicate patterns for their simplifications to apply.
To fix this, we add a special case to the AArch64 code: if we are loading
from two memory addresses, use the load_pair_lanes patterns directly.
We also need a new pattern in simplify-rtx.c:simplify_ternary_operation
to catch:
(vec_merge:OUTER
   (vec_duplicate:OUTER x:INNER)
   (subreg:OUTER y:INNER 0)
   (const_int N))
And simplify it to:
(vec_concat:OUTER x:INNER y:INNER) or (vec_concat y x)
This is similar to the existing patterns which are tested in this
function, without requiring the second operand to also be a vec_duplicate.
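As a hedged sketch, the kind of two-element construction that benefits
looks like this (the committed test is gcc.target/aarch64/vect-slp-dup.c,
not this snippet):

typedef long long v2di __attribute__ ((vector_size (16)));

v2di
construct (long long a, long long b)
{
  /* Before: duplicate A into both lanes, then overwrite lane 1 with B.
     After: move A into lane 0, implicitly zeroing the other bits, then
     insert B into lane 1.  */
  v2di res = { a, b };
  return res;
}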
* config/aarch64/aarch64.c (aarch64_expand_vector_init): Modify
code generation for cases where splatting a value is not useful.
* simplify-rtx.c (simplify_ternary_operation): Simplify
vec_merge across a vec_duplicate and a paradoxical subreg forming a vector
mode to a vec_concat.
* gcc.target/aarch64/vect-slp-dup.c: New.
Co-Authored-By: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
From-SVN: r260309
2018-05-17 Richard Biener <rguenther@suse.de>
PR tree-optimization/85757
* tree-ssa-dse.c (dse_classify_store): Record a PHI def and
remove defs that only feed that PHI from further processing.
* gcc.dg/tree-ssa/ssa-dse-34.c: New testcase.
From-SVN: r260306
DWARF5 defines a small header for .debug_str_offsets. Since we only use
it for split dwarf .dwo files, we don't need to keep track of the actual
index offset in an attribute.
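For reference, the header is tiny; a C-struct sketch of the layout (the
struct name is illustrative only, and the real code emits assembly
directives; see DWARF5 section 7.26):

#include <stdint.h>

struct dwarf5_str_offsets_header
{
  uint32_t unit_length; /* Size of the contribution, excluding this field.  */
  uint16_t version;     /* 5.  */
  uint16_t padding;     /* Reserved, zero.  */
  /* The string offsets themselves follow.  */
};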
gcc/ChangeLog
* dwarf2out.c (count_index_strings): New function.
(output_indirect_strings): Call count_index_strings and generate
header for dwarf_version >= 5.
From-SVN: r260298
We already emit DWARF5 attributes and tables for indirect addresses
and string offsets, but still use GNU forms. Add a new helper function
dwarf_FORM () for emitting the right form.
Currently we only use the uleb128 forms, but DWARF5 also allows
1-, 2-, 3- and 4-byte forms (DW_FORM_strx[1234] and DW_FORM_addrx[1234]),
which might be more space-efficient.
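A plausible sketch of the mapping such a helper performs (not the
committed code; it assumes the dwarf_version global from dwarf2out.c
and the form enums from dwarf2.h):

static enum dwarf_form
dwarf_FORM (enum dwarf_form form)
{
  switch (form)
    {
    case DW_FORM_addrx:
      if (dwarf_version < 5)
        return DW_FORM_GNU_addr_index;
      break;
    case DW_FORM_strx:
      if (dwarf_version < 5)
        return DW_FORM_GNU_str_index;
      break;
    default:
      break;
    }
  return form;
}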
gcc/ChangeLog
* dwarf2out.c (dwarf_FORM): New function.
(set_indirect_string): Use dwarf_FORM.
(reset_indirect_string): Likewise.
(size_of_die): Likewise.
(value_format): Likewise.
(output_die): Likewise.
(add_skeleton_AT_string): Likewise.
(output_macinfo_op): Likewise.
(index_string): Likewise.
(output_index_string_offset): Likewise.
(output_index_string): Likewise.
From-SVN: r260297
gcc/ChangeLog:
2018-05-16 Carl Love <cel@us.ibm.com>
* config/rs6000/rs6000.md (prefetch): Generate ISA 2.06 instructions
dcbt and dcbtstt with TH=16 if operands[2] is 0 and the target is Power8 or newer.
From-SVN: r260296
gcc/testsuite/ChangeLog:
2018-05-16 Carl Love <cel@us.ibm.com>
* gcc.target/powerpc/vsx-vector-6-be.c: Remove file.
* gcc.target/powerpc/vsx-vector-6-be.p7.c: New test file.
* gcc.target/powerpc/vsx-vector-6-be.p8.c: New test file.
* gcc.target/powerpc/vsx-vector-6-le.c (dg-final): Update counts for
xvcmpeqdp., xvcmpgtdp., xvcmpgedp., xxlxor, xvrdpi.
From-SVN: r260294
This patch improves register allocation of fma by preferring to update the
accumulator register. This is done by adding fma insns with operand 1 as the
accumulator. The register allocator considers copy preferences only in operand
order, so if the first operand is dead, it has the highest chance of being
reused as the destination. As a result code using fma often has a better
register allocation. Performance of SPECFP2017 improves by over 0.5% on some
implementations, while having no effect on others. The fma code is more
readable too; in a simple example we now generate:
        fmadd   s16, s2, s1, s16
        fmadd   s7, s17, s16, s7
        fmadd   s6, s16, s7, s6
        fmadd   s5, s7, s6, s5
instead of:
        fmadd   s16, s16, s2, s1
        fmadd   s7, s7, s16, s6
        fmadd   s6, s6, s7, s5
        fmadd   s5, s5, s6, s4
gcc/
* config/aarch64/aarch64.md (fma<mode>4): Change into expand pattern.
(fnma<mode>4): Likewise.
(fms<mode>4): Likewise.
(fnms<mode>4): Likewise.
(aarch64_fma<mode>4): Rename insn, reorder accumulator operand.
(aarch64_fnma<mode>4): Likewise.
(aarch64_fms<mode>4): Likewise.
(aarch64_fnms<mode>4): Likewise.
(aarch64_fnmadd<mode>4): Likewise.
From-SVN: r260292
2018-05-16 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (struct stmt_info_for_cost): Add where member.
(dump_stmt_cost): Declare.
(add_stmt_cost): Dump cost we add.
(add_stmt_costs): New function.
(vect_model_simple_cost, vect_model_store_cost, vect_model_load_cost):
No longer exported.
(vect_analyze_stmt): Adjust prototype.
(vectorizable_condition): Likewise.
(vectorizable_live_operation): Likewise.
(vectorizable_reduction): Likewise.
(vectorizable_induction): Likewise.
* tree-vect-loop.c (vect_analyze_loop_operations): Create local
cost vector to pass to vectorizable_ and record afterwards.
(vect_model_reduction_cost): Take cost vector argument and adjust.
(vect_model_induction_cost): Likewise.
(vectorizable_reduction): Likewise.
(vectorizable_induction): Likewise.
(vectorizable_live_operation): Likewise.
* tree-vect-slp.c (vect_create_new_slp_node): Initialize
SLP_TREE_NUMBER_OF_VEC_STMTS.
(vect_analyze_slp_cost_1): Remove.
(vect_analyze_slp_cost): Likewise.
(vect_slp_analyze_node_operations): Take visited args and
a target cost vector. Avoid processing already visited stmt sets.
(vect_slp_analyze_operations): Use a local cost vector to gather
costs and register those of non-discarded instances.
(vect_bb_vectorization_profitable_p): Use add_stmt_costs.
(vect_schedule_slp_instance): Remove copying of
SLP_TREE_NUMBER_OF_VEC_STMTS. Instead assert that it is not
zero.
* tree-vect-stmts.c (record_stmt_cost): Remove path directly
adding cost. Record cost entry location.
(vect_prologue_cost_for_slp_op): Function to compute cost of
a constant or invariant generated for SLP vect in the prologue,
split out from vect_analyze_slp_cost_1.
(vect_model_simple_cost): Make static. Adjust for SLP costing.
(vect_model_promotion_demotion_cost): Likewise.
(vect_model_store_cost): Likewise, make static.
(vect_model_load_cost): Likewise.
(vectorizable_bswap): Add cost vector arg and adjust.
(vectorizable_call): Likewise.
(vectorizable_simd_clone_call): Likewise.
(vectorizable_conversion): Likewise.
(vectorizable_assignment): Likewise.
(vectorizable_shift): Likewise.
(vectorizable_operation): Likewise.
(vectorizable_store): Likewise.
(vectorizable_load): Likewise.
(vectorizable_condition): Likewise.
(vectorizable_comparison): Likewise.
(can_vectorize_live_stmts): Likewise.
(vect_analyze_stmt): Likewise.
(vect_transform_stmt): Adjust calls to vectorizable_*.
* tree-vectorizer.c: Include gimple-pretty-print.h.
(dump_stmt_cost): New function.
From-SVN: r260289
2018-05-16 Richard Biener <rguenther@suse.de>
* params.def (PARAM_DSE_MAX_ALIAS_QUERIES_PER_STORE): New param.
* doc/invoke.texi (dse-max-alias-queries-per-store): Document.
* tree-ssa-dse.c: Include tree-ssa-loop.h.
(check_name): New callback.
(dse_classify_store): Track cycles via a visited bitmap of PHI
defs and simplify handling of in-loop and across loop dead stores
and properly fail for loop-variant refs. Handle byte-tracking with
multiple defs. Use PARAM_DSE_MAX_ALIAS_QUERIES_PER_STORE for
limiting the walk.
* gcc.dg/tree-ssa/ssa-dse-32.c: New testcase.
* gcc.dg/tree-ssa/ssa-dse-33.c: Likewise.
* gcc.dg/uninit-pr81897-2.c: Use -fno-tree-dse.
From-SVN: r260288
The SLP unrolling factor is calculated by finding the smallest
scalar type for each SLP statement and taking the number of required
lanes from the vector versions of those scalar types. E.g. for an
int32->int64 conversion, it's the vector of int32s rather than the
vector of int64s that determines the unroll factor.
We rely on tree-vect-patterns.c to replace boolean operations like:
bool a, b, c;
a = b & c;
with integer operations of whatever the best size is in context.
E.g. if b and c are fed by comparisons of ints, a, b and c will become
the appropriate size for an int comparison. For most targets this means
that a, b and c will end up as int-sized themselves, but on targets like
SVE and AVX512 with packed vector booleans, they'll instead become a
small bitfield like :1, padded to a byte for memory purposes.
The SLP code would then take these scalar types and try to calculate
the vector type for them, causing the unroll factor to be much higher
than necessary.
This patch tries to make the SLP code use the same approach as the
loop vectorizer, by splitting out the code that calculates the
statement vector type and the vector type that should be used for
the number of units.
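As a hedged illustration of the kind of code affected (the committed
tests are the gcc.target/aarch64/sve/vcond_10*.c and vcond_11*.c files,
not this snippet):

void
f (int *x, int *y, int *z, double *out, int n)
{
  for (int i = 0; i < n; i += 2)
    {
      /* The bool results of the int comparisons are rewritten by the
         pattern code; on SVE/AVX512 they become narrow bitfield-like
         scalars, whose vector type used to inflate the SLP unroll
         factor.  */
      out[i] = ((x[i] < y[i]) & (x[i] < z[i])) ? 3.0 : 4.0;
      out[i + 1] = ((x[i + 1] < y[i + 1]) & (x[i + 1] < z[i + 1])) ? 3.0 : 4.0;
    }
}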
2018-05-16 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* tree-vectorizer.h (vect_get_vector_types_for_stmt): Declare.
(vect_get_mask_type_for_stmt): Likewise.
* tree-vect-slp.c (vect_two_operations_perm_ok_p): New function,
split out from...
(vect_build_slp_tree_1): ...here. Use vect_get_vector_types_for_stmt
to determine the statement's vector type and the vector type that
should be used for calculating nunits. Deal with cases in which
the type has to be deferred.
(vect_slp_analyze_node_operations): Use vect_get_vector_types_for_stmt
and vect_get_mask_type_for_stmt to calculate STMT_VINFO_VECTYPE.
* tree-vect-loop.c (vect_determine_vf_for_stmt_1)
(vect_determine_vf_for_stmt): New functions, split out from...
(vect_determine_vectorization_factor): ...here.
* tree-vect-stmts.c (vect_get_vector_types_for_stmt)
(vect_get_mask_type_for_stmt): New functions, split out from
vect_determine_vectorization_factor.
gcc/testsuite/
* gcc.target/aarch64/sve/vcond_10.c: New test.
* gcc.target/aarch64/sve/vcond_10_run.c: Likewise.
* gcc.target/aarch64/sve/vcond_11.c: Likewise.
* gcc.target/aarch64/sve/vcond_11_run.c: Likewise.
From-SVN: r260287