On all targets I managed to test (21) this results in better code. Only
alpha ends up with slightly bigger code.
* combine.c (combine_simplify_rtx): Don't make IF_THEN_ELSE RTL.
From-SVN: r271047
If a string([]byte) conversion is used immediately as a key for a
map read, we don't need to copy the backing store of the byte
slice, as mapaccess does not keep a reference to it.
The gc compiler does more than this: it also avoids the copy if
the map key is a composite literal that contains the conversion
as a field, like, T{ ... { ..., string(b), ... }, ... }. For now,
we just optimize the simple case, which is probably most common.
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/176197
* go.dg/mapstring.go: New test.
From-SVN: r271044
-mtpcs-leaf-frame causes an APCS-style backtrace frame to be created
on the stack. This should probably be deprecated, but it did reveal
an issue with the patch I committed previously to improve the code
generation when pushing high registers, in that
thumb_find_work_register had a different idea as to which registers
were available as scratch registers.
The new code actually does a better job of finding a viable work
register and doesn't rely so much on assumptions about the ABI, so it
seems better to adapt thumb_find_work_register to the new approach.
This way we can eliminate some rather crufty code.
gcc:
PR target/90405
* config/arm/arm.c (callee_saved_reg_p): Move before
thumb_find_work_register.
(thumb1_prologue_unused_call_clobbered_lo_regs): Move before
thumb_find_work_register. Only call df_get_live_out once.
(thumb1_epilogue_unused_call_clobbered_lo_regs): Likewise.
(thumb_find_work_register): Use
thumb1_prologue_unused_call_clobbered_lo_regs instead of ad hoc
algorithms to locate a spare call clobbered reg.
gcc/testsuite:
PR target/90405
* gcc.target/arm/pr90405.c: New test.
From-SVN: r271036
2019-05-09 Martin Liska <mliska@suse.cz>
* gimple-pretty-print.c (dump_binary_rhs): Dump MIN_EXPR
and MAX_EXPR in GIMPLE FE format.
2019-05-09 Martin Liska <mliska@suse.cz>
* gimple-parser.c (c_parser_gimple_statement): Support __MIN and
__MAX.
(c_parser_gimple_unary_expression): Parse also binary expression
__MIN and __MAX.
(c_parser_gimple_parentized_binary_expression): New function.
2019-05-09 Martin Liska <mliska@suse.cz>
* gcc.dg/gimplefe-39.c: New test.
From-SVN: r271035
2019-05-09 Richard Biener <rguenther@suse.de>
PR tree-optimization/90395
* tree-ssa-forwprop.c (pass_forwprop::execute): Do not
rewrite vector stores that throw internally.
* gcc.dg/torture/pr90395.c: New testcase.
From-SVN: r271031
The recent trunk r270914 for PR89221 "--enable-frame-pointer does not work as
intended" fixed a scripting defect in the x86 '--enable-frame-pointer'
handling.
This has the side effect that, for example, for '--target=i686-gnu' this is now
enabled by default: 'USE_IX86_FRAME_POINTER=1' is added to 'tm_defines'. Given
that it's highly unlikely that anyone would now suddenly want
'--enable-frame-pointer' as the default for any kind of GNU system, I'm
changing the default back for GNU systems, to match that of a 'target_os' of
'linux* | darwin[8912]*'.
gcc/
PR target/89221
* configure.ac (--enable-frame-pointer): Disable by default for
GNU systems.
* configure: Regenerate.
From-SVN: r271028
This patch makes a number of corrections to rs6000_register_move_cost,
adds a new register union class, GEN_OR_VSX_REGS, and adjusts insn
alternative costs to suit.
The patch initially just corrected register move cost when direct
moves are available, but that resulted in regressions. Inspection of
those regressions showed ALL_REGS being used as the register allocno
class, which isn't ideal. gcc/doc/tm.texi says: "You should define a
class for the union of two classes whenever some instruction allows
both classes". Thus, define GEN_OR_VSX_REGS for the register
allocator. (IRA wants to use the union of two register classes when
the costs of the classes are below memory cost, which happens more
often with the low direct move cost.)
As per https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89271#c11 we ought
to be returning the minimal cost for union classes. That can be done
by rs6000_register_move_cost testing for vsx first, where the number
of regs for a given mode might be smaller than the same mode in gprs,
and changing the LINK_OR_CTR_REGS case to exclude SPEC_OR_GEN_REGS and
NON_FLOAT_REGS.
I removed the VECTOR_MEM_VSX_P test since that leads to silly results
for scalar mode moves between altivec and float when TARGET_VSX. eg.
rs6000_register_move_cost:, ret=2, mode=DF, from=FLOAT_REGS, to=FLOAT_REGS
rs6000_register_move_cost:, ret=16, mode=DF, from=FLOAT_REGS, to=ALTIVEC_REGS
rs6000_register_move_cost:, ret=2, mode=DF, from=FLOAT_REGS, to=VSX_REGS
The patch also fixes wrong results for moves within and between any of
the non-gpr, non-vsx special reg classes. The comment about "moving
between two similar registers is just one instruction" is false. We
can't move lr to ctr directly, for example. I believe the intent of
the "reg_classes_intersect_p (to, from)" was to cover moves within
float or altivec, so I moved that test inside the code handling vsx,
and made sure the intersection wasn't anything besides vsx by masking
off everything else. Masking isn't strictly necessary at the moment,
but would be if we create a GEN_OR_ALTIVEC_REGS class some time in the
future.
TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS is needed for rs6000 in order
to fix the 20% cactus_adm spec regression when using GEN_OR_VSX_REGS
as an allocno class. It is similar to the aarch64 version but without
any selection by regno mode if the best class is a union class.
PR target/89271
* config/rs6000/rs6000.h (enum reg_class, REG_CLASS_NAMES),
(REG_CLASS_CONTENTS): Add GEN_OR_VSX_REGS class.
* config/rs6000/rs6000.c (rs6000_register_move_cost): Correct
cost for general <-> vsx when direct moves are available.
Cost union classes at minimal cost for any reg in the class.
Correct calculation for moves between vsx, float, and altivec.
Don't return a low cost for moves between special regs. Don't
use hard coded register numbers.
(TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS): Define.
(rs6000_ira_change_pseudo_allocno_class): New function.
* config/rs6000/rs6000.md (movsi_internal1, mov<mode>_internal),
(movdi_internal32, movdi_internal64): Remove '*' from vsx register
alternatives.
(movsi_internal1): Don't disparage vector alternatives.
(mov<mode>_internal): Likewise, excepting alternative that
will be split.
* config/rs6000/vsx.md (vsx_splat_<mode>_reg): Don't disparage
we <- b alternative.
From-SVN: r271022
If a string([]byte) conversion is used immediately in a string
comparison, we don't need to copy the backing store of the byte
slice, as the string comparison doesn't hold any reference to
it. Instead, just create a string header from the byte slice and
pass it for comparison.
A new type of expression, String_value_expression, is introduced,
for constructing string headers.
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/170894
* go.dg/cmpstring.go: New test.
From-SVN: r271021
2019-05-08 Thomas Koenig <tkoenig@gcc.gnu.org>
PR fortran/90351
PR fortran/90329
* gfortran.dg/dump-parse-tree.c: Include version.h.
(gfc_dump_external_c_prototypes): New function.
(get_c_type_name): Select "char" as a name for a simple char.
Adjust to handling external functions. Also handle complex.
(write_decl): Add argument bind_c. Adjust for dumping of external
procedures.
(write_proc): Likewise.
(write_interop_decl): Add bind_c argument to call of write_proc.
* gfortran.h: Add prototype for gfc_dump_external_c_prototypes.
* lang.opt: Add -fc-prototypes-external flag.
* parse.c (gfc_parse_file): Move dumping of BIND(C) prototypes.
Call gfc_dump_external_c_prototypes if option is set.
* invoke.texi: Document -fc-prototypes-external.
From-SVN: r271018
The builtin copy function is lowered to runtime functions
slicecopy, stringslicecopy, or typedslicecopy. The first two are
basically thin wrappers of memmove. Instead of making a runtime
call, we can just use __builtin_memmove. This gives the compiler
backend opportunities for further optimizations.
Move the lowering of builtin copy function to flatten phase for
the ease of rewriting.
Also do this optimization for the copy part of append(s1, s2...).
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/170005
From-SVN: r271017
PR c++/59813
PR tree-optimization/89060
* tree-ssa-live.h (live_vars_map): New typedef.
(compute_live_vars, live_vars_at_stmt, destroy_live_vars): Declare.
* tree-ssa-live.c: Include gimple-walk.h and cfganal.h.
(struct compute_live_vars_data): New type.
(compute_live_vars_visit, compute_live_vars_1, compute_live_vars,
live_vars_at_stmt, destroy_live_vars): New functions.
* tree-tailcall.c: Include tree-ssa-live.h.
(live_vars, live_vars_vec): New global variables.
(find_tail_calls): Perform variable life analysis before punting.
(tree_optimize_tail_calls_1): Clean up live_vars and live_vars_vec.
* tree-inline.h (struct copy_body_data): Add eh_landing_pad_dest
member.
* tree-inline.c (add_clobbers_to_eh_landing_pad): Remove BB argument.
Perform variable life analysis to select variables that really need
clobbers added.
(copy_edges_for_bb): Don't call add_clobbers_to_eh_landing_pad here,
instead set id->eh_landing_pad_dest and assert it is the same.
(copy_cfg_body): Call it here if id->eh_landing_pad_dest is non-NULL.
* gcc.dg/tree-ssa/pr89060.c: New test.
From-SVN: r271013
This patch fixes a problem with the thumb1 prologue code where the link
register could be unconditionally used as a scratch register even if the
return value was still live at the end of the prologue.
Additionally, the patch improves the code generated when we are not
using many low call-saved registers to make use of any unused call
clobbered registers to help with the saving of high registers that
cannot be pushed directly (quite rare in normal code as the register
allocator correctly prefers low registers).
2019-05-08 Mihail Ionescu <mihail.ionescu@arm.com>
Richard Earnshaw <rearnsha@arm.com>
gcc:
PR target/88167
* config/arm/arm.c (thumb1_prologue_unused_call_clobbered_lo_regs): New
function.
(thumb1_epilogue_unused_call_clobbered_lo_regs): New function.
(thumb1_compute_save_core_reg_mask): Don't force a spare work
register if both the epilogue and prologue can use call-clobbered
regs.
(thumb1_unexpanded_epilogue): Use
thumb1_epilogue_unused_call_clobbered_lo_regs. Reverse the logic for
picking temporaries for restoring high regs to match that of the
prologue where possible.
(thumb1_expand_prologue): Add any usable call-clobbered low registers to
the list of work registers. Detect if the return address is still live
at the end of the prologue and avoid using it for a work register if so.
If the return address is not live, add LR to the list of pushable regs
after the first pass.
gcc/testsuite:
PR target/88167
* gcc.target/arm/pr88167-1.c: New test.
* gcc.target/arm/pr88167-2.c: New test.
Co-Authored-By: Richard Earnshaw <rearnsha@arm.com>
From-SVN: r271012
PR tree-optimization/90078
* tree-ssa-loop-ivopts.c (INFTY): Increase value for infinite cost.
(struct comp_cost): Promote type of members to int64_t.
(infinite_cost): Don't set complexity in initialization.
(comp_cost::operator +,-,+=,-+,/=,*=): Assert when cost computation
overflows to infinite_cost.
(adjust_setup_cost): Promote type of parameter and cost computation
to int64_t.
(struct ainc_cost_data, struct iv_ca): Promote type of member to
int64_t.
(get_scaled_computation_cost_at, determine_iv_cost): Promote type of
cost computation to int64_t.
(determine_group_iv_costs, iv_ca_dump, find_optimal_iv_set): Use
int64_t's format specifier in dump.
gcc/testsuite
* g++.dg/tree-ssa/pr90078.C: New test.
From-SVN: r271008
PR tree-optimization/90240
* tree-ssa-loop-ivopts.c (get_scaled_computation_cost_at): Scale cost
with respect to scaling factor pre-computed for each basic block.
(try_improve_iv_set): Return bool if best_cost equals to iv_ca cost.
(find_optimal_iv_set_1): Free iv_ca set if it has infinite_cost.
(COST_SCALING_FACTOR_BOUND, determine_scaling_factor): New.
(tree_ssa_iv_optimize_loop): Call determine_scaling_factor. Extend
live range for array of loop's basic blocks. Cleanup aux field of
loop's basic blocks.
gcc/testsuite
* gfortran.dg/graphite/pr90240.f: New test.
From-SVN: r271007
PR tree-optimization/90356
* match.pd ((X +/- 0.0) +/- 0.0): Optimize into X +/- 0.0 if possible.
* gcc.dg/tree-ssa/pr90356-1.c: New test.
* gcc.dg/tree-ssa/pr90356-2.c: New test.
* gcc.dg/tree-ssa/pr90356-3.c: New test.
* gcc.dg/tree-ssa/pr90356-4.c: New test.
From-SVN: r271001
For a direct interface type T with a value method M, its pointer
type (*T)'s method table includes a stub method of M which takes
a (*T) as the receiver instead of a T. However, for the "typ"
field of the method table entry, we added another layer of
indirection, which makes it appear to take a **T, which is wrong.
This causes problems when using reflect.Type.Method to get the
method. This CL fixes the second, incorrect, indirection.
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/175837
From-SVN: r270999
Add a -fgo-debug-optimization option to emit optimization
diagnostics. This can be used for testing optimizations. Apply
this to the range clear optimizations of maps and arrays.
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/170002
gcc/go:
* lang.opt (-fgo-debug-optimization): New option.
* go-c.h (struct go_create_gogo_args): Add debug_optimization
field.
* go-lang.c (go_langhook_init): Set debug_optimization field.
* gccgo.texi (Invoking gccgo): Document -fgo-debug-optimization.
gcc/testsuite:
* go.dg/arrayclear.go: New test.
* go.dg/mapclear.go: New test.
From-SVN: r270993
This change ensures that std::common_type<> is a complete type (LWG
2408), and that std::common_type<T>, std::common_type<cv T1, cv T2>, and
std::common_type<T1, T2, R...> will use program-defined specializations
for std::common_type<T1, T2> (LWG 2465).
The implementation of common_type<T1, T2, R...> is changed to use
void_t, and the specializations for duration and time_point are modified
to also use void_t instead of depending on implementation details of
common_type.
PR libstdc++/89102
* doc/xml/manual/intro.xml: Document DR 2408 and 2465 changes.
* include/std/chrono (__duration_common_type_wrapper): Replace with ...
(__duration_common_type): New helper.
(common_type<chrono::duration<R1, P2>, chrono::duration<R2, P2>>): Use
__duration_common_type.
(__timepoint_common_type_wrapper): Replace with ...
(__timepoint_common_type): New helper.
(common_type<chrono::time_point<C, D2>, chrono::time_point<C, D2>>):
Use __time_point_common_type.
* include/std/type_traits (common_type<>): Define, as per LWG 2408.
(__common_type_impl): If either argument is transformed by decay,
use the common_type of the decayed types.
(__common_type_impl<_Tp, _Up, _Tp, _Up>): If the types are already
decayed, use __do_common_type_impl to get the common_type.
(common_type<_Tp>): Use common_type<_Tp, _Tp>.
(__do_member_type_wrapper, __member_type_wrapper)
(__expanded_common_type_wrapper): Remove.
(__common_type_pack, __common_type_fold): New helpers.
(common_type<_Tp, _Up, _Vp...>): Use new helpers instead of
__member_type_wrapper and __expanded_common_type_wrapper.
* testsuite/20_util/common_type/requirements/explicit_instantiation.cc:
Test zero-length template argument list.
* testsuite/20_util/common_type/requirements/sfinae_friendly_1.cc:
Test single argument cases and argument types that should decay.
* testsuite/20_util/common_type/requirements/sfinae_friendly_2.cc:
Adjust expected error.
* testsuite/20_util/duration/literals/range_neg.cc: Use zero for
dg-error lineno.
* testsuite/20_util/duration/requirements/typedefs_neg1.cc: Likewise.
* testsuite/20_util/duration/requirements/typedefs_neg2.cc: Likewise.
* testsuite/20_util/duration/requirements/typedefs_neg3.cc: Likewise.
From-SVN: r270987
When fixing 90171 it struck me as undesirable to have so many separate
functions that all needed to know about the definition of a usual
deallocation function. So this patch condenses them into one. I left
destroying_delete_p because it is used by other files as well.
* call.c (struct dealloc_info): New.
(usual_deallocation_fn_p): Take a dealloc_info*.
(aligned_deallocation_fn_p, sized_deallocation_fn_p): Remove.
(build_op_delete_call): Adjust.
From-SVN: r270986
* typeck.c (build_static_cast_1): Use cp_build_addr_expr.
For GCC 9 I fixed this bug with a patch to gimplify_cond_expr, but this
function was also doing the wrong thing.
Using build_address does not push the ADDR_EXPR down into the arms of a
COND_EXPR, which we need for proper handling of conversion of an lvalue ?:
to another reference type.
From-SVN: r270985
There are a few things left in the rs6000 port that are unused now
that we do not support old reload anymore. This removes those.
* config/rs6000/rs6000-protos.h (rs6000_legitimize_reload_address_ptr):
Delete declaration.
* config/rs6000/rs6000.c (rs6000_legitimize_reload_address): Delete.
(rs6000_debug_legitimize_reload_address): Delete.
(rs6000_legitimize_reload_address_ptr): Delete.
(rs6000_option_override_internal): Adjust.
(mem_operand_gpr): Adjust comment.
(legitimate_lo_sum_address_p): Ditto.
(rs6000_legitimize_reload_address): Delete.
(rs6000_debug_legitimize_reload_address): Delete.
* config/rs6000/rs6000.h (LEGITIMIZE_RELOAD_ADDRESS): Delete.
From-SVN: r270983
gcc/ChangeLog:
2019-05-07 Kelvin Nilsen <kelvin@gcc.gnu.org>
PR target/89765
* config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
In handling of ALTIVEC_BUILTIN_VEC_INSERT, use modular arithmetic
to compute vector element selector for both constant and variable
operands.
gcc/testsuite/ChangeLog:
2019-05-07 Kelvin Nilsen <kelvin@gcc.gnu.org>
PR target/89765
* gcc.target/powerpc/pr89765-mc.c: New test.
* gcc.target/powerpc/vsx-builtin-10c.c: New test.
* gcc.target/powerpc/vsx-builtin-10d.c: New test.
* gcc.target/powerpc/vsx-builtin-11c.c: New test.
* gcc.target/powerpc/vsx-builtin-11d.c: New test.
* gcc.target/powerpc/vsx-builtin-12c.c: New test.
* gcc.target/powerpc/vsx-builtin-12d.c: New test.
* gcc.target/powerpc/vsx-builtin-13c.c: New test.
* gcc.target/powerpc/vsx-builtin-13d.c: New test.
* gcc.target/powerpc/vsx-builtin-14c.c: New test.
* gcc.target/powerpc/vsx-builtin-14d.c: New test.
* gcc.target/powerpc/vsx-builtin-15c.c: New test.
* gcc.target/powerpc/vsx-builtin-15d.c: New test.
* gcc.target/powerpc/vsx-builtin-16c.c: New test.
* gcc.target/powerpc/vsx-builtin-16d.c: New test.
* gcc.target/powerpc/vsx-builtin-17c.c: New test.
* gcc.target/powerpc/vsx-builtin-17d.c: New test.
* gcc.target/powerpc/vsx-builtin-18c.c: New test.
* gcc.target/powerpc/vsx-builtin-18d.c: New test.
* gcc.target/powerpc/vsx-builtin-19c.c: New test.
* gcc.target/powerpc/vsx-builtin-19d.c: New test.
* gcc.target/powerpc/vsx-builtin-20c.c: New test.
* gcc.target/powerpc/vsx-builtin-20d.c: New test.
* gcc.target/powerpc/vsx-builtin-9c.c: New test.
* gcc.target/powerpc/vsx-builtin-9d.c: New test.
From-SVN: r270982
* config/i386/i386.md (cvt_mnemonic): New mode attribute.
(ashr<mode>3_cvt): Merge insn pattern from ashrsi3_cvt and
ashrdi3_cvt using SWI48 mode iterator.
From-SVN: r270981