2018-05-07 Edward Smith-Rowland <3dw4rd@verizon.net>
Moar PR libstdc++/80506
* include/bits/random.tcc (gamma_distribution::__generate_impl()):
Fix magic number used in loop condition.
From-SVN: r260001
The following patch adds an option to control software prefetching of memory
references with non-constant/unknown strides.
Currently we prefetch these references if the pass thinks there is benefit to
doing so. But, since this is all based on heuristics, it's not always the case
that we end up with better performance.
For Falkor there is also the problem of conflicts with the hardware prefetcher,
so we need to be more conservative in terms of what we issue software prefetch
hints for.
This also aligns GCC with what LLVM does for Falkor.
Similarly to the previous patch, the defaults guarantee no change in behavior
for other targets and architectures.
2018-05-07 Luis Machado <luis.machado@linaro.org>
gcc/
* config/aarch64/aarch64-protos.h (cpu_prefetch_tune)
<prefetch_dynamic_strides>: New const bool field.
* config/aarch64/aarch64.c (generic_prefetch_tune): Update to include
prefetch_dynamic_strides.
(exynosm1_prefetch_tune): Likewise.
(thunderxt88_prefetch_tune): Likewise.
(thunderx_prefetch_tune): Likewise.
(thunderx2t99_prefetch_tune): Likewise.
(qdf24xx_prefetch_tune): Likewise. Set prefetch_dynamic_strides to false.
(aarch64_override_options_internal): Update to set
PARAM_PREFETCH_DYNAMIC_STRIDES.
* doc/invoke.texi (prefetch-dynamic-strides): Document new option.
* params.def (PARAM_PREFETCH_DYNAMIC_STRIDES): New.
* params.h (PARAM_PREFETCH_DYNAMIC_STRIDES): Define.
* tree-ssa-loop-prefetch.c (should_issue_prefetch_p): Account for
prefetch-dynamic-strides setting.
From-SVN: r259996
This patch adds a new option to control the minimum stride, for a memory
reference, after which the loop prefetch pass may issue software prefetch
hints for. There are two motivations:
* Make the pass less aggressive, only issuing prefetch hints for bigger strides
that are more likely to benefit from prefetching. I've noticed a case in cpu2017
where we were issuing thousands of hints, for example.
* For processors that have a hardware prefetcher, like Falkor, it allows the
loop prefetch pass to defer prefetching of smaller (less than the threshold)
strides to the hardware prefetcher instead. This prevents conflicts between
the software prefetcher and the hardware prefetcher.
I've noticed considerable reduction in the number of prefetch hints and
slightly positive performance numbers. This aligns GCC and LLVM in terms of
prefetch behavior for Falkor.
The default settings should guarantee no changes for existing targets. Those
are free to tweak the settings as necessary.
2018-05-07 Luis Machado <luis.machado@linaro.org>
Introduce option to limit software prefetching to known constant
strides above a specific threshold with the goal of preventing
conflicts with a hardware prefetcher.
gcc/
* config/aarch64/aarch64-protos.h (cpu_prefetch_tune)
<minimum_stride>: New const int field.
* config/aarch64/aarch64.c (generic_prefetch_tune): Update to include
minimum_stride field.
(exynosm1_prefetch_tune): Likewise.
(thunderxt88_prefetch_tune): Likewise.
(thunderx_prefetch_tune): Likewise.
(thunderx2t99_prefetch_tune): Likewise.
(qdf24xx_prefetch_tune): Likewise. Set minimum_stride to 2048.
(aarch64_override_options_internal): Update to set
PARAM_PREFETCH_MINIMUM_STRIDE.
* doc/invoke.texi (prefetch-minimum-stride): Document new option.
* params.def (PARAM_PREFETCH_MINIMUM_STRIDE): New.
* params.h (PARAM_PREFETCH_MINIMUM_STRIDE): Define.
* tree-ssa-loop-prefetch.c (should_issue_prefetch_p): Return false if
stride is constant and is below the minimum stride threshold.
From-SVN: r259995
2018-05-07 Tom de Vries <tom@codesourcery.com>
PR testsuite/85677
* testsuite/lib/libgomp.exp (libgomp_init): Move inclusion of top-level
include directory in ALWAYS_CFLAGS out of $blddir != "" condition.
From-SVN: r259992
PR c++/85659
* cfgexpand.c (expand_asm_stmt): Don't create a temporary if
the type is addressable. Don't force op into register if it has
BLKmode.
* g++.dg/ext/asm14.C: New test.
* g++.dg/ext/asm15.C: New test.
* g++.dg/ext/asm16.C: New test.
From-SVN: r259981
2018-05-06 Andrew Sadek <andrew.sadek.se@gmail.com>
* gcc.target/microblaze/others/picdtr.c: Add test for
-fPIE -mpic-data-is-text-relative.
From-SVN: r259975
gcc/
PR other/77609
* varasm.c (default_section_type_flags): Set SECTION_NOTYPE for
any section for which we don't know a specific type it should have,
regardless of name. Previously this was done only for the exact
names ".init_array", ".fini_array", and ".preinit_array".
(default_elf_asm_named_section): Add comment about
relationship with default_section_type_flags and SECTION_NOTYPE.
(get_section): Don't consider it a type conflict if one side has
SECTION_NOTYPE and the other doesn't, as long as neither has the
SECTION_BSS et al used in the default_section_type_flags logic.
From-SVN: r259969
2018-05-05 Paolo Carlini <paolo.carlini@oracle.com>
* cvt.c (ocp_convert): Early handle the special case of a
null_ptr_cst_p expr converted to a NULLPTR_TYPE_P type.
From-SVN: r259966
Add flag -fassume-phsa that is on by default. If -fno-assume-phsa
is given, these optimizations are disabled.
With this flag, gccbrig can generate GENERIC that assumes we are
targeting a phsa-runtime based implementation, which allows us
to expose the work-item context accesses to retrieve WI IDs etc.
which helps optimizers.
First optimization that takes advantage of this is to get rid of
the setworkitemid calls whenever we have non-inlined calls that
use IDs internally.
Other optimizations added in this commit:
- expand absoluteid to similar level of simplicity as workitemid.
At the moment absoluteid is the best indexing ID to end up with
WG vectorization.
- propagate ID variables closer to their uses. This is mainly
to avoid known useless casts, which confuse at least scalar
evolution analysis.
- use signed long long for storing IDs. Unsigned integers have
defined wraparound semantics, which confuse at least scalar
evolution analysis, leading to unvectorizable WI loops.
- also refactor some BRIG function generation helpers to brig_function.
- no point in having the wi-loop as a for-loop. It's really
a do...while and SCEV can analyze it just fine still.
- add consts to ptrs etc. in BRIG builtin defs.
Improves optimization opportunities.
- add qualifiers to generated function parameters.
Const and restrict on the hidden local/private pointers,
the arg buffer and the context pointer help some optimizations.
From-SVN: r259957
HSA assumes all program scope HSAIL symbols can be queried from
the host runtime API, thus cannot be removed by the IPA.
Getting some inlining happening in the finalized binary required:
* explicitly marking the 'prog' scope functions and the launcher
function "externally_visible" to avoid the inliner removing it
* also the host_def ptr is set to externally visible, otherwise
IPA assumes it's never set
* adding the 'inline' keyword to functions to enable inlining,
otherwise GCC defaults to replaceable functions (one can link
over the previous one) which cannot be inlined
* replacing all calls to declarations with calls to definitions to
enable the inliner to find the definition
* to fix missing hidden argument types in the generated functions.
These were ignored silently until GCC started to be able to
inline calls to such functions.
* do not gimplify before fixing the call targets. Otherwise the
calls get detached and the definitions are not found. The reason
why this happens is not clear, but gimplifying only after call
target decl->def conversion fixes this.
From-SVN: r259943
We didn't preserve additional space for the alloca frame pointers that
are needed to be saved in the alloca space.
Fixes libgomp.c++/target-6.C execution test.
From-SVN: r259942
The ELFv1 ABI says: "Single precision floating point values are mapped
to the second word in a single doubleword" and also "Floating point
registers f1 through f13 are used consecutively to pass up to 13
floating point values, one member aggregates passed by value
containing a floating point value, and to pass complex floating point
values".
libffi wasn't expecting float args in the second word, and wasn't
passing one member aggregates in fp registers. This patch fixes those
problems, making use of the existing ELFv2 homogeneous aggregate
support since a one element fp struct is a special case of an
homogeneous aggregate.
I've also set a flag when returning pointers that might be used one
day. This is just a tidy since the ppc64 assembly support code
currently doesn't test FLAG_RETURNS_64BITS for integer types..
* src/powerpc/ffi_linux64.c (discover_homogeneous_aggregate):
Compile for ELFv1 too, handling single element aggregates.
(ffi_prep_cif_linux64_core): Call discover_homogeneous_aggregate
for ELFv1. Set FLAG_RETURNS_64BITS for FFI_TYPE_POINTER return.
(ffi_prep_args64): Call discover_homogeneous_aggregate for ELFv1,
and handle single element structs containing float or double
as if the element wasn't wrapped in a struct. Store floats in
second word of doubleword slot when big-endian.
(ffi_closure_helper_LINUX64): Similarly.
From-SVN: r259934
2018-05-04 Richard Biener <rguenther@suse.de>
PR middle-end/85627
* tree-complex.c (update_complex_assignment): We are always in SSA form.
(expand_complex_div_wide): Likewise.
(expand_complex_operations_1): Likewise.
(expand_complex_libcall): Preserve EH info of the original stmt.
(tree_lower_complex): Handle removed blocks.
* tree.c (build_common_builtin_nodes): Do not set ECF_NOTRHOW
on complex multiplication and division libcall builtins.
* g++.dg/torture/pr85627.C: New testcase.
From-SVN: r259923
2018-05-04 Richard Biener <rguenther@suse.de>
PR middle-end/85574
* fold-const.c (negate_expr_p): Restrict negation of operand
zero of a division to when we know that can happen without
overflow.
(fold_negate_expr_1): Likewise.
* gcc.dg/torture/pr85574.c: New testcase.
* gcc.dg/torture/pr57656.c: Use dg-additional-options.
From-SVN: r259922
PR libstdc++/85466
* real.h (real_nextafter): Declare.
* real.c (real_nextafter): New function.
* fold-const-call.c (fold_const_nextafter): New function.
(fold_const_call_sss): Call it for CASE_CFN_NEXTAFTER and
CASE_CFN_NEXTTOWARD.
(fold_const_call_1): For CASE_CFN_NEXTTOWARD call fold_const_call_sss
even when arg1_mode is different from arg0_mode.
* gcc.dg/nextafter-1.c: New test.
* gcc.dg/nextafter-2.c: New test.
* gcc.dg/nextafter-3.c: New test.
* gcc.dg/nextafter-4.c: New test.
From-SVN: r259921
In https://golang.org/cl/111097 the gc version of cmd/go was updated
to include some gofrontend-specific changes. The gofrontend code
already has different versions of those changes; this CL makes the
gofrontend match the upstream code.
Reviewed-on: https://go-review.googlesource.com/111099
From-SVN: r259918
Following a recent change for PR 82644 the non-standard hypergeomtric
functions are not defined by <cmath> when __STRICT_ANSI__ is defined
(e.g. for -std=c++17, or -std=c++14 -D__STDCPP_WANT_MATH_SPEC_FUNCS__).
That caused errors in <tr1/cmath> because the using-declarations for
tr1::hyperg et al are invalid in strict modes.
The solution is to define the TR1 hypergeometric functions inline in
<tr1/cmath> if __STRICT_ANSI__ is defined.
PR libstdc++/82644
* include/tr1/cmath [__STRICT_ANSI__] (hypergf, hypergl, hyperg): Use
inline definitions instead of using-declarations.
[__STRICT_ANSI__] (conf_hypergf, conf_hypergl, conf_hyperg): Likewise.
* testsuite/tr1/5_numerical_facilities/special_functions/
07_conf_hyperg/compile_cxx17.cc: New.
* testsuite/tr1/5_numerical_facilities/special_functions/
17_hyperg/compile_cxx17.cc: New.
From-SVN: r259912
On 32-bit targets any values over 4GB would wrap and produce the wrong
result.
PR libstdc++/85632 use uintmax_t for arithmetic
* src/filesystem/ops.cc (experimental::filesystem::space): Perform
arithmetic in result type.
* src/filesystem/std-ops.cc (filesystem::space): Likewise.
* testsuite/27_io/filesystem/operations/space.cc: Check total capacity
is greater than free space.
* testsuite/experimental/filesystem/operations/space.cc: New.
From-SVN: r259901