* gimplify.c (gimplify_scan_omp_clauses): Don't sorry_at on lastprivate
conditional on combined for simd.
* omp-low.c (struct omp_context): Add combined_into_simd_safelen0
member.
(lower_rec_input_clauses): For gimple_omp_for_combined_into_p max_vf 1
constructs, don't remove lastprivate_conditional_map, but instead set
ctx->combined_into_simd_safelen0 and adjust hash_map, so that it points
to parent construct temporaries.
(lower_lastprivate_clauses): Handle ctx->combined_into_simd_safelen0
like !ctx->lastprivate_conditional_map.
(lower_omp_1) <case GIMPLE_ASSIGN>: If up->combined_into_simd_safelen0,
use up->outer context instead of up.
* omp-expand.c (expand_omp_for_generic): Perform cond_var bump even if
gimple_omp_for_combined_p.
(expand_omp_for_static_nochunk): Likewise.
(expand_omp_for_static_chunk): Add forgotten cond_var bump that was
probably moved over into expand_omp_for_generic rather than being copied
there.
gcc/cp/
* cp-tree.h (CP_OMP_CLAUSE_INFO): Allow for any clauses up to _condvar_
instead of only up to linear.
gcc/testsuite/
* c-c++-common/gomp/lastprivate-conditional-2.c (foo): Don't expect
a sorry_at on any of the clauses.
libgomp/
* testsuite/libgomp.c-c++-common/lastprivate-conditional-7.c: New test.
* testsuite/libgomp.c-c++-common/lastprivate-conditional-8.c: New test.
* testsuite/libgomp.c-c++-common/lastprivate-conditional-9.c: New test.
* testsuite/libgomp.c-c++-common/lastprivate-conditional-10.c: New test.
From-SVN: r271907
2019-06-04 Richard Biener <rguenther@suse.de>
PR tree-optimization/90738
Revert
2019-06-03 Richard Biener <rguenther@suse.de>
* tree-ssa-sccvn.c (ao_ref_init_from_vn_reference): Get original
full reference tree and record in ref->ref.
(vn_reference_lookup_3): Pass in original ref to
ao_ref_init_from_vn_reference.
(vn_reference_lookup): Likewise.
* tree-ssa-sccvn.h (ao_ref_init_from_vn_reference): Adjust prototype.
* tree-ssa-alias.c (nonoverlapping_component_refs_of_decl_p):
Handle non-decl bases in the original reference.
* gcc.dg/tree-ssa/alias-access-path-1.c: Scan fre1.
* gcc.dg/torture/pr90738.c: New testcase.
From-SVN: r271902
2019-06-04 Martin Liska <mliska@suse.cz>
* ipa-icf.c (sem_item_optimizer::add_item_to_class): Count
number of references.
(sem_item_optimizer::do_congruence_step):
(sem_item_optimizer::worklist_push): Dump how references
a class has.
(sem_item_optimizer::worklist_pop): Use heap.
(sem_item_optimizer::process_cong_reduction): Likewise.
* ipa-icf.h: Use fibonacci_heap insteam of std::list.
From-SVN: r271901
2019-06-04 Martin Liska <mliska@suse.cz>
* ipa-icf.h (struct sem_usage_pair_hash): New.
(sem_usage_pair_hash::hash): Likewise.
(sem_usage_pair_hash::equal): Likewise.
(struct sem_usage_hash): Likewise.
* ipa-icf.c (sem_item::sem_item): Initialize
referenced_by_count.
(sem_item::add_reference): Register a reference
in ref_map and not in target->usages.
(sem_item::setup): Remove initialization of
dead vectors.
(sem_item::~sem_item): Remove usage of dead vectors.
(sem_item::dump): Remove dump of references.
(sem_item_optimizer::sem_item_optimizer): Initialize
m_references.
(sem_item_optimizer::read_section): Remove useless
dump.
(sem_item_optimizer::parse_funcs_and_vars): Likewise here.
(sem_item_optimizer::build_graph): Pass m_references
to ::add_reference.
(sem_item_optimizer::verify_classes): Remove usage of dead
vectors.
(sem_item_optimizer::traverse_congruence_split): Return true
when a class is split.
(sem_item_optimizer::do_congruence_step_for_index): Use
hash_map for look up of (sem_item *, index). That brings
significant speed up.
(sem_item_optimizer::do_congruence_step): Return true
when a split is done.
(congruence_class::is_class_used): Use referenced_by_count.
2019-06-04 Martin Liska <mliska@suse.cz>
* c-c++-common/goacc/acc-icf.c: Change scanned pattern.
* gfortran.dg/goacc/pr78027.f90: Likewise.
From-SVN: r271900
Currently, the compiler already generates common symbols for type
descriptors, so the type descriptors are unique. However, when a
type is created through reflection, it is not deduplicated with
compiler-generated types. As a consequence, we cannot assume type
descriptors are unique, and cannot use pointer equality to
compare them. Also, when constructing a reflect.Type, it has to
go through a canonicalization map, which introduces overhead to
reflect.TypeOf, and lock contentions in concurrent programs.
In order for the reflect package to deduplicate types with
compiler-created types, we register all the compiler-created type
descriptors at startup time. The reflect package, when it needs
to create a type, looks up the registry of compiler-created types
before creates a new one. There is no lock contention since the
registry is read-only after initialization.
This lets us get rid of the canonicalization map, and also makes
it possible to compare type descriptors with pointer equality.
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/179598
From-SVN: r271894
When the runtime collects a stack trace to associate it with some
profiling event (mem alloc, mutex, etc) there is a skip count passed
to runtime.Callers (or equivalent) to skip some known count of frames
in order to get to the "interesting" frame corresponding to the
profile event. Now that the profiling mechanism uses lazy fixup (when
removing compiler artifacts like thunks, morestack calls etc), we also
need to move the frame skipping logic after the fixup, so as to insure
that the skip count isn't thrown off by these artifacts.
Fixesgolang/go#32290.
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/179740
From-SVN: r271892
This requires tracking all references to unexported variables, so that
we can make them global symbols in the object file, and can export
them so that other compilations can see the right definition for their
own inline bodies.
This introduces a syntax for referencing names defined in other
packages: a <pNN> prefix, where NN is the package index. This will
need to be added to gccgoimporter, but I didn't do it yet since it
isn't yet possible to create an object for which gccgoimporter will
see a <pNN> prefix.
This increases the number of inlinable functions in the standard
library from 181 to 215, adding functions like context.Background.
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/177920
From-SVN: r271891
The "wg" constraint is used for the floating point side on mfpgpr
instructions. Those instructions do not exist on any relevant
hardware. This patch deletes the constraint and the insns using it.
* config/rs6000/constraints.md (define_register_constraint "wg"):
Delete.
* config/rs6000/rs6000.h (enum r6000_reg_class_enum): Delete
RS6000_CONSTRAINT_wg.
* config/rs6000/rs6000.c (rs6000_debug_reg_global): Adjust.
(rs6000_init_hard_regno_mode_ok): Adjust.
* config/rs6000/rs6000.md (*mov<mode>_softfloat32, *movdi_internal64):
Delete "wg" alternatives.
* doc/md.texi (Machine Constraints): Adjust.
From-SVN: r271888
Improve the fix for PR64242. Various optimizations can change a memory
reference into a frame access. Given there are multiple virtual frame pointers
which may be replaced by multiple hard frame pointers, there are no checks for
writes to the various frame pointers. So updates to a frame pointer tends to
generate incorrect code. Improve the previous fix to also add clobbers of
several frame pointers and add a scheduling barrier. This should work in most
cases until GCC supports a generic "don't optimize across this instruction"
feature.
Bootstrap OK. Testcase passes on AArch64 and x86-64. Inspected x86, Arm,
Thumb-1 and Thumb-2 assembler which looks correct.
gcc/
PR middle-end/64242
* builtins.c (expand_builtin_longjmp): Add frame clobbers and schedule
block.
(expand_builtin_nonlocal_goto): Likewise.
testsuite/
PR middle-end/64242
* gcc.c-torture/execute/pr64242.c: Update test.
From-SVN: r271870
A dynamic linker with lazy binding support may need to handle vector PCS
function symbols specially, so an ELF symbol table marking was
introduced for such symbols.
Function symbol references and definitions that follow the vector PCS
are marked in the generated assembly with .variant_pcs and then the
STO_AARCH64_VARIANT_PCS st_other flag is set on the symbol in the object
file. The marking is propagated to the dynamic symbol table by the
static linker so a dynamic linker can handle such symbols specially.
For this to work, the assembler, the static linker and the dynamic
linker has to be updated on a system. Old assembler does not support
the new .variant_pcs directive, so a toolchain with old binutils won't
be able to compile code that references vector PCS symbols.
gcc/ChangeLog:
* config/aarch64/aarch64-protos.h (aarch64_asm_output_alias): Declare.
(aarch64_asm_output_external): Declare.
* config/aarch64/aarch64.c (aarch64_asm_output_variant_pcs): New.
(aarch64_declare_function_name): Call aarch64_asm_output_variant_pcs.
(aarch64_asm_output_alias): New.
(aarch64_asm_output_external): New.
* config/aarch64/aarch64.h (ASM_OUTPUT_DEF_FROM_DECLS): Define.
(ASM_OUTPUT_EXTERNAL): Define.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/pcs_attribute-2.c: New test.
* gcc.target/aarch64/torture/simd-abi-4.c: Check .variant_pcs support.
* lib/target-supports.exp (check_effective_target_aarch64_variant_pcs):
New.
From-SVN: r271869
In previous standards it is undefined for a container and its allocator
to have a different value_type. Libstdc++ has traditionally allowed it
as an extension, automatically rebinding the allocator to the
container's value_type. Since GCC 8.1 that extension has been disabled
for C++11 and later when __STRICT_ANSI__ is defined (i.e. for
-std=c++11, -std=c++14, -std=c++17 and -std=c++2a).
Since the acceptance of P1463R1 into the C++2a draft an incorrect
allocator::value_type now requires a diagnostic. This patch implements
that by enabling the static_assert for -std=gnu++2a as well.
* doc/xml/manual/status_cxx2020.xml: Document P1463R1 status.
* include/bits/forward_list.h [__cplusplus > 201703]: Enable
allocator::value_type assertion for C++2a.
* include/bits/hashtable.h: Likewise.
* include/bits/stl_deque.h: Likewise.
* include/bits/stl_list.h: Likewise.
* include/bits/stl_map.h: Likewise.
* include/bits/stl_multimap.h: Likewise.
* include/bits/stl_multiset.h: Likewise.
* include/bits/stl_set.h: Likewise.
* include/bits/stl_vector.h: Likewise.
* testsuite/23_containers/deque/48101-3_neg.cc: New test.
* testsuite/23_containers/forward_list/48101-3_neg.cc: New test.
* testsuite/23_containers/list/48101-3_neg.cc: New test.
* testsuite/23_containers/map/48101-3_neg.cc: New test.
* testsuite/23_containers/multimap/48101-3_neg.cc: New test.
* testsuite/23_containers/multiset/48101-3_neg.cc: New test.
* testsuite/23_containers/set/48101-3_neg.cc: New test.
* testsuite/23_containers/unordered_map/48101-3_neg.cc: New test.
* testsuite/23_containers/unordered_multimap/48101-3_neg.cc: New test.
* testsuite/23_containers/unordered_multiset/48101-3_neg.cc: New test.
* testsuite/23_containers/unordered_set/48101-3_neg.cc: New test.
* testsuite/23_containers/vector/48101-3_neg.cc: New test.
From-SVN: r271866
Fix the alignment option parser to always allow up to 4 alignments.
Now -falign-functions=16:8:8:8 no longer reports an error.
gcc/
PR driver/90684
* opts.c (parse_and_check_align_values): Allow 4 alignment values.
M gcc/ChangeLog
M gcc/opts.c
From-SVN: r271864
Wilco pointed out that when the Dot Product instructions are available we can use them
to generate an even more efficient expansion for the [us]sadv16qi optab.
Instead of the current:
uabdl2 v0.8h, v1.16b, v2.16b
uabal v0.8h, v1.8b, v2.8b
uadalp v3.4s, v0.8h
we can generate:
(1) mov v4.16b, 1
(2) uabd v0.16b, v1.16b, v2.16b
(3) udot v3.4s, v0.16b, v4.16b
Instruction (1) can be CSEd across multiple such expansions and even hoisted outside of loops,
so when this sequence appears frequently back-to-back (like in x264_r) we essentially only have 2 instructions
per sum. Also, the UDOT instruction does the byte-to-word accumulation in one step, which allows us to use
the much simpler UABD instruction before it.
This makes it a shorter and lower-latency sequence overall for targets that support it.
* config/aarch64/iterators.md (MAX_OPP): New code attr.
* config/aarch64/aarch64-simd.md (*aarch64_<su>abd<mode>_3): Rename to...
(aarch64_<su>abd<mode>_3): ... This.
(<sur>sadv16qi): Add TARGET_DOTPROD expansion.
* gcc.target/aarch64/ssadv16qi.c: Add +nodotprod to pragma.
* gcc.target/aarch64/usadv16qi.c: Likewise.
* gcc.target/aarch64/ssadv16qi-dotprod.c: New test.
* gcc.target/aarch64/usadv16qi-dotprod.c: Likewise.
From-SVN: r271863
2019-06-03 Richard Biener <rguenther@suse.de>
* tree-ssa-sccvn.c (ao_ref_init_from_vn_reference): Get original
full reference tree and record in ref->ref.
(vn_reference_lookup_3): Pass in original ref to
ao_ref_init_from_vn_reference.
(vn_reference_lookup): Likewise.
* tree-ssa-sccvn.h (ao_ref_init_from_vn_reference): Adjust prototype.
* tree-ssa-alias.c (nonoverlapping_component_refs_of_decl_p):
Handle non-decl bases in the original reference.
* gcc.dg/tree-ssa/alias-access-path-1.c: Scan fre1.
From-SVN: r271860
2019-06-03 Martin Liska <mliska@suse.cz>
* fold-const.c (operand_equal_p): Fix typo as compare_tree_int
returns 0 when operands are equal.
From-SVN: r271859
2019-06-03 Richard Biener <rguenther@suse.de>
PR tree-optimization/90716
* tree-loop-distribution.c (destroy_loop): Process blocks in
correct order.
* gcc.dg/guality/pr90716.c: New testcase.
From-SVN: r271858
This patch fixes bug 90681. It was caused by trying to SLP vectorize a non
groupped load. We've fixed it by tweaking a bit the implementation: mark
masked loads as not vectorizable, but support them as an special case. Then
the detect them in the test for normal non-groupped loads that was already
there.
From-SVN: r271856
2019-06-02 Thomas Koenig <tkoenig@gcc.gnu.org>
PR fortran/90539
* trans-expr.c (gfc_conv_subref_array_arg): If the size of the
expression can be determined to be one, treat it as contiguous.
Set likelyhood of presence of an actual argument according to
PRED_FORTRAN_ABSENT_DUMMY and likelyhood of being contiguous
according to PRED_FORTRAN_CONTIGUOUS.
2019-06-02 Thomas Koenig <tkoenig@gcc.gnu.org>
PR fortran/90539
* predict.def (PRED_FORTRAN_CONTIGUOUS): New predictor.
2019-06-02 Thomas Koenig <tkoenig@gcc.gnu.org>
PR fortran/90539
* gfortran.dg/internal_pack_24.f90: New test.
From-SVN: r271844
We don't have support for -mcmodel={medium, large, kernel} so don't
expect tests for those things to work.
For now mark them as xfail where possible and skip where that isn't.
These changes will be logged onto the PR and therefore can be backed
out when the facility is implemented.
gcc/testsuite/ChangeLog:
2019-06-01 Iain Sandoe <iain@sandoe.co.uk>
PR target/90698
* gcc.target/i386/pr49866.c: XFAIL for Darwin.
* gcc.target/i386/pr63538.c: Likewise.
* gcc.target/i386/pr61599-1.c: Skip for Darwin.
From-SVN: r271839
* alias.c: Include ipa-utils.h.
(get_alias_set): Try to complete ODR type via ODR type hash lookup.
* ipa-devirt.c (prevailing_odr_type): New.
* ipa-utils.h (previaling_odr_type): Declare.
* g++.dg/lto/alias-1_0.C: New testcase.
* g++.dg/lto/alias-1_1.C: New testcase.
From-SVN: r271837
NOTE_INSN_DELETED_LABEL is used to mark what used to be a 'code_label',
but was not used for other purposes than taking its address which cannot
be used as target for indirect jumps.
Tested on Linux/x86-64 with -fcf-protection.
For x86-64 libc.so on glibc master branch (commit f43b8dd55588c3),
Before: 2961 endbr64
After: 2943 endbr64
gcc/
PR target/89355
* config/i386/i386-features.c (rest_of_insert_endbranch): Remove
NOTE_INSN_DELETED_LABEL check.
gcc/testsuite/
PR target/89355
* gcc.target/i386/cet-label-3.c: New test.
* gcc.target/i386/cet-label-4.c: Likewise.
* gcc.target/i386/cet-label-5.c: Likewise.
Co-Authored-By: Hongtao Liu <hongtao.liu@intel.com>
From-SVN: r271828