This patch fixes a bug in the bswap pass. In big-endian BIT_FIELD_REF uses
big-endian bit numbering so we need to adjust the bit position.
The existing version could potentially generate incorrect code however GCC
doesn't emit a BIT_FIELD_REF to access the low byte in a register, so the
symbolic number never matches in big-endian.
gcc/
* tree-ssa-math-opts.c (find_bswap_or_nop_1): Adjust bitnumbering
for big-endian BIT_FIELD_REF.
From-SVN: r237822
* Makefile.in: Don't cat ../stage_current if it does not exist.
c/
* Make-lang.in: Don't cat ../stage_current if it does not exist.
cp/
* Make-lang.in: Don't cat ../stage_current if it does not exist.
lto/
* Make-lang.in: Don't cat ../stage_current if it does not exist.
From-SVN: r237817
PR rtl-optimization/71673
* internal-fn.c (expand_arith_overflow_result_store): Use
OPTAB_LIB_WIDEN instead of OPTAB_DIRECT as last argument to
expand_simple_binop.
From-SVN: r237815
PR middle-end/66867
* builtins.c (expand_ifn_atomic_compare_exchange_into_call,
expand_ifn_atomic_compare_exchange): New functions.
* internal-fn.c (expand_ATOMIC_COMPARE_EXCHANGE): New function.
* tree.h (build_call_expr_internal_loc): Rename to ...
(build_call_expr_internal_loc_array): ... this. Fix up type of
last argument.
* internal-fn.def (ATOMIC_COMPARE_EXCHANGE): New internal fn.
* predict.c (expr_expected_value_1): Handle IMAGPART_EXPR of
ATOMIC_COMPARE_EXCHANGE result.
* builtins.h (expand_ifn_atomic_compare_exchange): New prototype.
* gimple-fold.h (optimize_atomic_compare_exchange_p,
fold_builtin_atomic_compare_exchange): New prototypes.
* gimple-fold.c (optimize_atomic_compare_exchange_p,
fold_builtin_atomic_compare_exchange): New functions..
* tree-ssa.c (execute_update_addresses_taken): If
optimize_atomic_compare_exchange_p, ignore &var in 2nd argument
of call when finding addressable vars, and if such var becomes
non-addressable, call fold_builtin_atomic_compare_exchange.
From-SVN: r237814
The splitter for ashdi3_extswsli_dot for cr0 with memory uses emit_insn
gen_ashdi3_extswsli_dot, which does not work because that emits a scratch,
while the splitter runs after reload so there should be a real register
instead. We can laboriously fix that up, or emit using
gen_ashdi3_extswsli_dot2 instead. This patch does the latter.
PR target/71670
* config/rs6000/rs6000.md (ashdi3_extswsli_dot): Use
gen_ashdi3_extswsli_dot2 instead of gen_ashdi3_extswsli_dot.
gcc/testsuite/
PR target/71670
* gcc.target/powerpc/pr71670.c: New testcase.
From-SVN: r237813
gcc/
PR target/71656
* config/rs6000/rs6000-cpus.def (ISA_3_0_MASKS_SERVER): Add
OPTION_MASK_P9_DFORM_VECTOR.
* config/rs6000/rs6000.c (rs6000_option_override_internal): Do not
disable -mpower9-dform-vector when using reload.
(quad_address_p): Remove 'gpr_p' argument and all associated code.
New 'strict' argument. Update all callers. Add strict addressing
support.
(rs6000_legitimate_offset_address_p): Remove call to
virtual_stack_registers_memory_p.
(rs6000_legitimize_reload_address): Add quad address support.
(rs6000_legitimate_address_p): Move call to quad_address_p above
call to virtual_stack_registers_memory_p. Adjust quad_address_p args
to account for new strict usage.
(rs6000_output_move_128bit): Adjust quad_address_p args to account
for new strict usage.
* config/rs6000/predicates.md (quad_memory_operand): Likewise.
gcc/testsuite/
PR target/71656
* gcc.target/powerpc/pr71656-1.c: New test.
* gcc.target/powerpc/pr71656-2.c: New test.
From-SVN: r237811
[gcc]
2016-06-27 Michael Meissner <meissner@linux.vnet.ibm.com>
* config/rs6000/vsx.md (UNSPEC_P9_MEMORY): New unspec to support
loading and storing byte/half-word values in the vector registers.
(vsx_sign_extend_hi_<mode>): Enable the generator function.
(p9_lxsi<wd>zx): New insns to load zero-extended bytes and
half-words on ISA 3.0 to the vector registers.
(p9_stxsi<wd>zx): New insns to store zero-extended bytes and
half-words on ISA 3.0 from the vector registers.
* config/rs6000/rs6000.md (FP_ISA3): New iterator to optimize
converting char/half-word items to floating point on ISA 3.0.
(float<QHI:mode><FP_ISA3:mode>2): On ISA 3.0 generate the lxsihzx
and lxsibzx instructions if we are converting an 8-bit or 16-bit
item from memory to floating point.
(float<QHI:mode><FP_ISA3:mode>2_internal): Likewise.
(floatuns<QHI:mode><FP_ISA3:mode>2): Likewise.
(floatuns<QHI:mode><FP_ISA3:mode>2_internal): Likewise.
(fix_trunc<SFDF:mode><QHI:mode>2): On ISA 3.0 generate the stxsihx
and stxsibx instructions to store floating point values converted
to 8 or 16-bit integers.
(fixuns_trunc<mode>si2): Likewise.
[gcc/testsuite]
2016-06-27 Michael Meissner <meissner@linux.vnet.ibm.com>
* gcc.target/powerpc/p9-fpcvt-1.c: New test to test ISA 3.0 load
byte/half-word to vector registers and store byte/half-word from
vector register instructions.
* gcc.target/powerpc/p9-fpcvt-2.c: Likewise.
From-SVN: r237806
* gcc.dg/predict-12.c: New testcase.
* predict.c: Include gimple-pretty-print.h
(predicted_by_loop_heuristics_p): Check also
PRED_LOOP_EXIT_WITH_RECURSION
(predict_loops): Find self recursive calls and use special purpose
predictors for them; dump log about decisions.
(pass_profile::execute): Dump info about #of iterations.
* predict.def (PRED_LOOP_EXIT_WITH_RECURSION,
(PRED_LOOP_GUARD_WITH_RECURSION): New predictors.
From-SVN: r237791
* config/pa/pa.c (pa_output_indirect_call): Rework to combine
output_asm_insn calls and shorten long lines. Output .CALL
argument descriptor using pa_output_arg_descriptor. Add various
inline $$dyncall and other optimizations.
(pa_attr_length_indirect_call): Adjust ordering and lengths.
From-SVN: r237790
PR tree-optimization/71631
* tree-ssa-reassoc.c (reassociate_bb): Pass true as last argument
to rewrite_expr_tree even if negate_result, move new_lhs var
declaration and initialization earlier, for powi_result set afterwards
new_lhs to lhs. For negate_result, use new_lhs instead of tmp
if new_lhs != lhs, and don't shadow gsi var.
* gcc.c-torture/execute/pr71631.c: New test.
From-SVN: r237782
[gcc]
2016-06-24 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
* config/rs6000/rs6000-builtin.def (BU_FLOAT128_2): New #define.
(BU_FLOAT128_1): Likewise.
(FABSQ): Likewise.
(COPYSIGNQ): Likewise.
(RS6000_BUILTIN_NANQ): Likewise.
(RS6000_BUILTIN_NANSQ): Likewise.
(RS6000_BUILTIN_INFQ): Likewise.
(RS6000_BUILTIN_HUGE_VALQ): Likewise.
* config/rs6000/rs6000.c (rs6000_fold_builtin): New prototype.
(TARGET_FOLD_BUILTIN): New #define.
(rs6000_builtin_mask_calculate): Add TARGET_FLOAT128 entry.
(rs6000_invalid_builtin): Add handling for RS6000_BTM_FLOAT128.
(rs6000_fold_builtin): New target hook implementation, handling
folding of 128-bit NaNs and infinities.
(rs6000_init_builtins): Initialize const_str_type_node; ensure all
entries are filled in to avoid problems during bootstrap
self-test; define builtins for 128-bit NaNs and infinities.
(rs6000_opt_mask): Add entry for float128.
* config/rs6000/rs6000.h (RS6000_BTM_FLOAT128): New #define.
(RS6000_BTM_COMMON): Include RS6000_BTM_FLOAT128.
(rs6000_builtin_type_index): Add RS6000_BTI_const_str.
(const_str_type_node): New #define.
* config/rs6000/rs6000.md (copysign<mode>3 for IEEE128): Convert
to a define_expand that dispatches to either copysign<mode>3_soft
or copysign<mode>3_hard.
(copysign<mode>3_hard): Rename from copysign<mode>3.
(copysign<mode>3_soft): New define_insn.
* doc/extend.texi: Document new builtins.
[gcc/testsuite]
2016-06-24 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
* gcc.target/powerpc/abs128-1.c: New.
* gcc.target/powerpc/copysign128-1.c: New.
* gcc.target/powerpc/inf128-1.c: New.
* gcc.target/powerpc/nan128-1.c: New.
From-SVN: r237774
PR tree-optimization/71647
* omp-low.c (lower_rec_input_clauses): Convert
omp_clause_aligned_alignment (c) to size_type_node for the
last argument of __builtin_assume_aligned.
* gcc.target/i386/pr71647.c: New test.
From-SVN: r237769
There are extensions to x86-64 psABI:
https://groups.google.com/forum/#!topic/x86-64-abi/de5_KnLHxtI
and i386 psABI:
https://groups.google.com/forum/#!topic/ia32-abi/awsRSvJOJfs
to call tls_get_addr via GOT. X86 assembler and linker in binutils 2.27
implemented
call *__tls_get_addr@GOTPCREL(%rip)
in 64-bit and
call *___tls_get_addr@GOT(%reg)
in 32-bit to access global and local thread loal variables in shared
library. We check if 32-bit x86 assembler and linker work with
call *___tls_get_addr@GOT(%reg)
as 32-bit and 64-bit assembler and linker are enabled togther.
In 32-bit, since any integer register except EAX, which is used to pass
parameter to ___tls_get_addr, and ESP, can be used as GOT base, a new
register class, TLS_GOTBASE_REGS, along with a new constraint, Yb, are
added. They are used to improve register allocation for 32-bit dynamic
TLS patterns.
gcc/
* configure.ac (calling ___tls_get_addr via GOT): New
assembler/linker check.
(HAVE_AS_IX86_TLS_GET_ADDR_GOT): New. Defined to 1 if 32-bit
assembler and linker supports calling ___tls_get_addr via GOT.
Otherise, defined to 0.
* config.in: Regenerated.
* configure: Likewise.
* config/i386/constraints.md (Yb): New constraint.
* config/i386/i386.h (reg_class): Add TLS_GOTBASE_REGS.
(REG_CLASS_NAMES): Likewise.
(REG_CLASS_CONTENTS): Likewise.
* config/i386/i386.md (*tls_global_dynamic_32_gnu): Replace
the b constraint with the Yb constraint. Call ___tls_get_addr
via GOT for GNU TLS with -fno-plt if HAVE_AS_IX86_TLS_GET_ADDR_GOT
is 1.
(*tls_local_dynamic_base_32_gnu): Likewise.
(*tls_global_dynamic_64_<mode>): Call _tls_get_addr via GOT for
GNU TLS with -fno-plt if HAVE_AS_IX86_TLS_GET_ADDR_GOT is 1.
(*tls_local_dynamic_base_64_<mode>): Likewise.
gcc/testsuite/
* gcc.target/i386/noplt-gd-1.c: New test.
* gcc.target/i386/noplt-gd-2.c: Likewise.
* gcc.target/i386/noplt-gd-3.c: Likewise.
* gcc.target/i386/noplt-ld-1.c: Likewise.
* gcc.target/i386/noplt-ld-2.c: Likewise.
* gcc.target/i386/noplt-ld-3.c: Likewise.
* lib/target-supports.exp
(check_effective_target_tls_get_addr_via_got): New.
From-SVN: r237765
* analyze_brprob.py: Parse and display average number
of loop iterations.
* cfgloop.c (flow_loop_dump): Dump average number of loop iterations.
* cfgloop.h: Change 'struct loop' to 'const struct loop' for a
few functions.
* cfgloopanal.c (expected_loop_iterations_unbounded): Set a new
argument to true if the expected number of iterations is
loop-based.
From-SVN: r237762
* configure.ac (HAVE_AS_GOTOF_IN_DATA): Use $as_ix86_gas_32_opt to
assemble for 32bit target.
(HAVE_AS_IX86_TLSGDPLT): Use $as_ix86_gas_32_opt to assemble
and $ld_ix86_gld_32_opt to link for 32bit target.
(HAVE_AS_IX86_TLSLDMPLT): Ditto.
* configure: Regenerate.
From-SVN: r237758
Since non-PIC noplt works on 32-bit x86 target now with assembler/linker
support, enable non-PIC noplt tests on 32-bit x86 target. main in
noplt-2.c and noplt-4.c are renamed to bar to avoid stack re-alignment
in main for 32-bit target, which disables tailcall optimization.
* gcc.target/i386/noplt-1.c: Don't disable for ia32. Scan for
ia32 if R_386_GOT32X relocation is supported.
* gcc.target/i386/noplt-3.c: Likewise.
* gcc.target/i386/noplt-2.c: Likewise.
(main): Renamed to ...
(bar): This.
* gcc.target/i386/noplt-4.c: Likewise.
(main): Renamed to ...
(bar): This.
* gcc.target/i386/pr67400-3.c: Don't disable for ia32.
* gcc.target/i386/pr67400-5.c: Likewise.
From-SVN: r237756
* call.c (magic_varargs_p): Return 3 for __builtin_*_overflow_p.
(build_over_call): For magic == 3, do no conversion only on 3rd
argument.
* c-c++-common/torture/builtin-arith-overflow-p-19.c: Run for C++ too.
* g++.dg/ext/builtin-arith-overflow-2.C: New test.
From-SVN: r237755
* internal-fn.c (expand_arith_set_overflow): New function.
(expand_addsub_overflow, expand_neg_overflow, expand_mul_overflow):
Use it.
(expand_arith_overflow_result_store): Likewise. Handle precision
smaller than mode precision.
* tree-vrp.c (extract_range_basic): For imag part, handle
properly signed 1-bit precision result.
* doc/extend.texi (__builtin_add_overflow): Document that last
argument can't be pointer to enumerated or boolean type.
(__builtin_add_overflow_p): Document that last argument can't
have enumerated or boolean type.
* c-common.c (check_builtin_function_arguments): Require last
argument of BUILT_IN_*_OVERFLOW_P to have INTEGER_TYPE type.
Adjust wording of diagnostics for BUILT_IN_*_OVERLFLOW
if the last argument is pointer to enumerated or boolean type.
* c-c++-common/builtin-arith-overflow-1.c (generic_wrong_type, f3,
f4): Adjust expected diagnostics.
* c-c++-common/torture/builtin-arith-overflow.h (TP): New macro.
(T): If OVFP is defined, redefine to TP.
* c-c++-common/torture/builtin-arith-overflow-12.c: Adjust comment.
* c-c++-common/torture/builtin-arith-overflow-p-1.c: New test.
* c-c++-common/torture/builtin-arith-overflow-p-2.c: New test.
* c-c++-common/torture/builtin-arith-overflow-p-3.c: New test.
* c-c++-common/torture/builtin-arith-overflow-p-4.c: New test.
* c-c++-common/torture/builtin-arith-overflow-p-5.c: New test.
* c-c++-common/torture/builtin-arith-overflow-p-6.c: New test.
* c-c++-common/torture/builtin-arith-overflow-p-7.c: New test.
* c-c++-common/torture/builtin-arith-overflow-p-8.c: New test.
* c-c++-common/torture/builtin-arith-overflow-p-9.c: New test.
* c-c++-common/torture/builtin-arith-overflow-p-10.c: New test.
* c-c++-common/torture/builtin-arith-overflow-p-11.c: New test.
* c-c++-common/torture/builtin-arith-overflow-p-12.c: New test.
* c-c++-common/torture/builtin-arith-overflow-p-13.c: New test.
* c-c++-common/torture/builtin-arith-overflow-p-14.c: New test.
* c-c++-common/torture/builtin-arith-overflow-p-15.c: New test.
* c-c++-common/torture/builtin-arith-overflow-p-16.c: New test.
* c-c++-common/torture/builtin-arith-overflow-p-17.c: New test.
* c-c++-common/torture/builtin-arith-overflow-p-18.c: New test.
* c-c++-common/torture/builtin-arith-overflow-p-19.c: New test.
* g++.dg/ext/builtin-arith-overflow-1.C: Pass 0 instead of C
as last argument to __builtin_add_overflow_p.
From-SVN: r237754
[gcc]
2016-06-23 Michael Meissner <meissner@linux.vnet.ibm.com>
Bill Schmidt <wschmidt@linux.vnet.ibm.com>
* config/rs6000/predicates.md (splat_input_operand): Rework.
Don't allow constants, since the insns that use this predicate
don't support constants. Constants are handled by other insns
that are created via combine. During and after register
allocation, only allow indexed or indirect addresses, and not
general addresses. Only allow modes supported by the hardware.
* config/rs6000/rs6000.c (xxsplitb_constant_p): Update usage
comment. Move check for using VSPLTIS<x> to a common location,
instead of doing it in two different places.
[gcc/testsuite]
2016-06-23 Michael Meissner <meissner@linux.vnet.ibm.com>
Bill Schmidt <wschmidt@linux.vnet.ibm.com>
* gcc.target/powerpc/p9-splat-5.c: New test.
Co-Authored-By: Bill Schmidt <wschmidt@linux.vnet.ibm.com>
From-SVN: r237743
* config/i386/driver-i386.c (host_detect_local_cpu): Set
PROCESSOR_PENTIUMPRO for signature_CENTAUR_ebx family >= 9.
<case PROCESSOR_PENTIMUMPRO>: Pass c7 or nehemiah for
signature_CENTAUR_ebx.
From-SVN: r237741