gcc/ada/
* exp_ch4.adb (Expand_N_Type_Conversion): Handle the case of
applying an invariant check for a conversion to a class-wide
type whose root type has a type invariant, when the conversion
appears within the immediate scope of the type and the
expression is of a specific tagged type.
* sem_ch3.adb (Is_Private_Primitive): New function to determine
whether a primitive subprogram is a private operation.
(Check_Abstract_Overriding): Enforce the restriction imposed by
AI12-0042 of requiring overriding of an inherited nonabstract
private operation when the ancestor has a class-wide type
invariant and the ancestor's private operation is visible.
(Derive_Subprogram): Set Requires_Overriding on a subprogram
inherited from a visible private operation of an ancestor to
which a Type_Invariant'Class expression applies.
The latest Solaris 11.4/x86 update uncovered a libsanitizer bug that
caused one test to FAIL for 32-bit:
+FAIL: c-c++-common/asan/null-deref-1.c -O0 output pattern test
+FAIL: c-c++-common/asan/null-deref-1.c -O1 output pattern test
+FAIL: c-c++-common/asan/null-deref-1.c -O2 output pattern test
+FAIL: c-c++-common/asan/null-deref-1.c -O2 -flto output pattern test
+FAIL: c-c++-common/asan/null-deref-1.c -O2 -flto -flto-partition=none
output pattern test
+FAIL: c-c++-common/asan/null-deref-1.c -O3 -g output pattern test
+FAIL: c-c++-common/asan/null-deref-1.c -Os output pattern test
I've identified the problem and the fix has just landed in upstream
llvm-project:
https://reviews.llvm.org/D83664
Tested on i386-pc-solaris2.11 and x86_64-pc-linux-gnu.
libsanitizer:
* sanitizer_common/sanitizer_linux.cpp: Cherry-pick llvm-project
revision f0e9b76c3500496f8f3ea7abe6f4bf801e3b41e7.
The CMPXCHG instruction sets the ZF flag if the values in the destination
operand and the EAX register are equal; otherwise the ZF flag is cleared and
the value from the destination operand is loaded into EAX. The following assembly:
movl %esi, %eax
lock cmpxchgl %edx, (%rdi)
cmpl %esi, %eax
sete %al
can be optimized by removing the unneeded comparison, since a set ZF flag
signals that no update to EAX happened.
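As an illustration, here is a sketch of source code that can give rise to
such a sequence. The helper name cas_succeeded is hypothetical, and the exact
instructions emitted depend on the GCC version and options used.

```c
#include <stdbool.h>

/* Hypothetical helper: returns true if *p held `expected` and was
   replaced by `desired`.  On x86, GCC typically expands the builtin
   to LOCK CMPXCHG.  Success is already reflected in ZF, so comparing
   the returned old value against `expected` again is redundant.  */
bool
cas_succeeded (int *p, int expected, int desired)
{
  return __sync_val_compare_and_swap (p, expected, desired) == expected;
}
```

The result of the `==` is what the SETE in the sequence above materializes.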
2020-07-15 Uroš Bizjak <ubizjak@gmail.com>
gcc/ChangeLog:
PR target/95355
* config/i386/sync.md
(peephole2 to remove unneeded compare after CMPXCHG): New pattern.
gcc/testsuite/ChangeLog:
PR target/95355
* gcc.target/i386/pr96189.c: New test.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/alloc-1.F90: Use c_size_t to
avoid conversion on 32bit systems from 32bit to 64bit due
to -fdefault-integer-8.
I've missed
+FAIL: libgomp.c/loop-21.c execution test
during testing of the recent patch. The problem is that, for the number of
iterations computation, it doesn't matter whether we compute
min_inner_iterations as (m2 * first + n2 + (adjusted step) + m1 * first + n1) / step
or (m2 * last + n2 + (adjusted step) + m1 * last + n1) / step, provided that
in the second case we use as factor (m1 - m2) * ostep / step rather than
(m2 - m1) * ostep / step. For the logical to actual iterator values computation,
however, it does matter. In my hand-written C implementations of all the cases
(outer vs. inner loop with increasing vs. decreasing iterator) I'm using the same
computation, and it worked well in all the pseudo-random iterator testing done with them.
It also means min_inner_iterations is misnamed, because it is not really
the minimum number of inner iterations; whether the first or the last outer
iteration results in the smaller or larger value can (sometimes) only be
determined at runtime.
So this patch also renames it to first_inner_iterations.
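To illustrate why the old name was misleading, here is a minimal sketch that
is not the omp-general.c code. It assumes an inner loop of the form
for (j = m1 * i + n1; j < m2 * i + n2; j += step) with step > 0, and the
function name inner_iterations is hypothetical.

```c
/* Sketch only: number of iterations of the inner loop
     for (j = m1 * i + n1; j < m2 * i + n2; j += step)
   for a given outer iterator value i, assuming step > 0.  */
long
inner_iterations (long m1, long n1, long m2, long n2, long step, long i)
{
  long lo = m1 * i + n1;
  long hi = m2 * i + n2;
  if (hi <= lo)
    return 0;
  /* Round up, since a partial final stride still runs once.  */
  return (hi - lo + step - 1) / step;
}
```

With m1 = 0, n1 = 0, m2 = 1, n2 = 0 the count grows with i, so the first
outer iteration has the fewest inner iterations. With m1 = 1, n1 = 0, m2 = 0,
n2 = 4 it shrinks, so the first outer iteration has the most.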
2020-07-15 Jakub Jelinek <jakub@redhat.com>
PR libgomp/96198
* omp-general.h (struct omp_for_data): Rename min_inner_iterations
member to first_inner_iterations, adjust comment.
* omp-general.c (omp_extract_for_data): Adjust for the above change.
Always use n1first and n2first to compute it, rather than depending
on single_nonrect_cond_code. Similarly, always compute factor
as (m2 - m1) * outer_step / inner_step rather than sometimes m1 - m2
depending on single_nonrect_cond_code.
* omp-expand.c (expand_omp_for_init_vars): Rename min_inner_iterations
to first_inner_iterations and min_inner_iterationsd to
first_inner_iterationsd.
cp_parser_declaration copies tokens to local variables, before inspecting
(some of) their fields. There's no need. Just point at them in the token
buffer -- they don't move. Also, we never look at the second token if the
first is EOF, so no need for some kind of dummy value in that case.
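The idea can be sketched in plain C as follows. This is not the parser.c
code; the token type, the enumerators and the function name are all
hypothetical stand-ins.

```c
/* Hypothetical token representation, for illustration only.  */
enum tok_kind { TOK_EOF, TOK_KEYWORD, TOK_NAME };

struct token
{
  enum tok_kind kind;
  int value;
};

/* Inspect the first two tokens in place via pointers into the buffer
   instead of copying them into locals.  The second token is only
   looked at when the first is not EOF, so no dummy value is needed.  */
int
starts_with_keyword_then_name (const struct token *buf)
{
  const struct token *first = &buf[0];
  if (first->kind == TOK_EOF)
    return 0;

  const struct token *second = &buf[1];
  return first->kind == TOK_KEYWORD && second->kind == TOK_NAME;
}
```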
gcc/cp/
* parser.c (cp_parser_declaration): Avoid copying tokens.
(cp_parser_block_declaration): RAII token pointer.
gcc/ada/
* exp_aggr.adb (Flatten): Adjust description.
(Convert_To_Positional): Remove obsolete ??? comment and use
Compile_Time_Known_Value in the final test.
gcc/ada/
* par-ch4.adb (P_Iterated_Component_Association): Extended to
recognize the similar Iterated_Element_Association. This node
is only generated when an explicit Key_Expression is given.
Otherwise the distinction between the two iterated forms is done
during semantic analysis.
* sinfo.ads: New node N_Iterated_Element_Association, for
Ada202x container aggregates. New field Key_Expression.
* sinfo.adb: Subprograms for new node and new field.
* sem_aggr.adb (Resolve_Iterated_Component_Association): Handle
the case where the Iteration_Scheme is an
Iterator_Specification.
* exp_aggr.adb (Expand_Iterated_Component): Handle a component
with an Iterated_Component_Association, generate proper loop
using given Iterator_Specification.
* exp_util.adb (Insert_Actions): Handle new node as other
aggregate components.
* sem.adb, sprint.adb: Handle new node.
* tbuild.adb (Make_Implicit_Loop_Statement): Properly handle a
loop with an Iterator_Specification.
gcc/ada/
* einfo.ads (Delayed Freezing and Elaboration): Adjust description.
* freeze.adb (Freeze_Object_Declaration): Likewise.
* sem_ch3.adb (Delayed_Aspect_Present): Likewise. Do not return
true for Alignment.
* sem_ch13.adb (Analyze_Aspect_Specifications): Do not always delay
for Alignment. Moreover, for Alignment and various Size aspects,
do not delay if the expression is an attribute whose prefix is the
Standard package.
gcc/ada/
* exp_spark.adb (Expand_SPARK_Delta_Or_Update): Apply scalar
range checks against the base type of an index type, not against
the index type itself.
gcc/ada/
* einfo.ads (Delayed Freezing and Elaboration): Minor tweaks.
Document the discrepancy between the aspect and the non-aspect
cases for alignment settings in object declarations.
gcc/ada/
* exp_ch3.adb (Freeze_Type): Remove warning in expander,
replaced by a corresponding error in sem_ch13.adb. Replace
RTE_Available by RTU_Loaded to avoid adding unnecessary
dependencies.
* sem_ch13.adb (Associate_Storage_Pool): New procedure.
(Analyze_Attribute_Definition_Clause
[Attribute_Simple_Storage_Pool| Attribute_Storage_Pool]): Call
Associate_Storage_Pool to add proper legality checks on
subpools.
gcc/ada/
* sem_ch3.adb (Delayed_Aspect_Present): Fix oversight in loop.
* freeze.adb (Freeze_Object_Declaration): Use Declaration_Node
instead of Parent for the sake of consistency.
gcc/ada/
* doc/gnat_ugn/about_this_guide.rst: Remove old section and
update for Ada 202x.
* doc/gnat_ugn/getting_started_with_gnat.rst: Add a system
requirements section. Remove obsolete section and minimally
reword the getting started section.
* gnat_ugn.texi: Regenerate.
gcc/ada/
* exp_ch5.adb (Expand_Assign_Array): Use short-circuit operator
(style).
* sem_res.adb (Resolve_Indexed_Component): Fix style in comment.
* sem_util.adb (Is_Effectively_Volatile_Object): Handle slices
just like indexed components; handle qualified expressions and
type conversions like in Is_OK_Volatile_Context.
(Is_OK_Volatile_Context): Handle qualified expressions just like
type conversions.
gcc/ada/
* sem_prag.adb (Atomic_Components): Simplify with Ekind_In.
(Complex_Representation): Fix type of E_Id, which, just like for
pragma Atomic_Components, will hold an N_Identifier node, not
an entity.
* sem_util.adb (Is_Effectively_Volatile): Refactor to avoid
unnecessary computation.
gcc/ada/
* exp_ch6.adb: Add a comma and fix a typo (machinary =>
machinery) in comment.
* exp_aggr.adb: Reformat, fix capitalization, and add a couple
of commas in a comment. Adjust columns in several code
fragments.
* sem_aggr.adb: Reformat and add a comma in a comment.
gcc/ada/
* sem_aggr.adb (Resolve_Iterated_Component_Association): New
procedure, internal to Resolve_Container_Aggregate, to complete
semantic analysis of Iterated_Component_Associations.
* exp_aggr.adb (Expand_Iterated_Component): New procedure,
internal to Expand_Container_Aggregate, to expand the construct
into an implicit loop that performs individual insertions into
the target aggregate.
gcc/ada/
* exp_ch6.adb (Make_Build_In_Place_Call_Allocator): Normalize
the associated node for internally generated objects to be like
their SOAAT counterparts.
Parser error recovery can get confused by the tokens within a deferred
pragma, as it treats those as regular tokens. This adjusts the recovery
so that the pragma is treated as a unit. Also, the preprocessor now
ensures that we never have an EOF token inside a pragma -- the pragma
is always closed first.
gcc/cp/
* parser.c (cp_parser_skip_to_closing_parenthesis_1): Deal with
meeting a deferred pragma.
(cp_parser_skip_to_end_of_statement): Likewise.
(cp_parser_skip_to_end_of_block_or_statement): Likewise.
(cp_parser_skip_to_pragma_eol): We should never meet EOF.
(cp_parser_omp_declare_simd): Likewise.
(cp_parser_omp_declare_reduction, cp_parser_oacc_routine)
(pragma_lex): Likewise.
gcc/testsuite/
* g++.dg/parse/pragma-recovery.C: New.
As the Fortran PR 95837 has been fixed, the test can now be added.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/struct-elem-map-1.f90: Remove unused
variables; add character(kind=4) tests; update TODO comment.
The _mm512_{,mask_}cmp_p[ds]_mask and also _mm_{,mask_}cmp_s[ds]_mask
intrinsics have an argument which must have a constant passed to it
and so use an inline version only for ifdef __OPTIMIZE__ and have
a #define for -O0. But the _mm512_{,mask_}cmp*_p[ds]_mask intrinsics
don't need a constant argument, they are essentially the first
set with the constant added to them implicitly based on the comparison
name, and so there is no #define version for them (correctly).
But their inline versions are defined in between the first and s[ds]
set and so inside of ifdef __OPTIMIZE__, which means that with -O0
they aren't defined at all.
This patch fixes that by moving those after the
#ifdef __OPTIMIZE__ ... #else (using #define) ... #endif block.
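The header layout after the change can be sketched as below. This is a
simplified stand-in rather than the avx512fintrin.h code: the sketch_* names
and the integer "builtin" are hypothetical, and only the placement relative
to the __OPTIMIZE__ guard is the point.

```c
/* Hypothetical predicates and a stand-in for a builtin whose last
   argument must be a compile-time constant in the real header.  */
enum { SKETCH_CMP_EQ = 0, SKETCH_CMP_LT = 1 };

static inline int
sketch_builtin_cmp (int a, int b, int pred)
{
  return pred == SKETCH_CMP_EQ ? a == b : a < b;
}

/* Generic form: inline function when optimizing, macro at -O0 so the
   predicate still reaches the builtin as a constant.  */
#ifdef __OPTIMIZE__
static inline int
sketch_cmp (int a, int b, const int pred)
{
  return sketch_builtin_cmp (a, b, pred);
}
#else
#define sketch_cmp(a, b, pred) sketch_builtin_cmp ((a), (b), (pred))
#endif

/* Fixed-predicate forms: they supply the constant themselves, so they
   live after the #endif and are defined at every optimization level.  */
static inline int
sketch_cmpeq (int a, int b)
{
  return sketch_builtin_cmp (a, b, SKETCH_CMP_EQ);
}

static inline int
sketch_cmplt (int a, int b)
{
  return sketch_builtin_cmp (a, b, SKETCH_CMP_LT);
}
```

Had the last two functions been placed before the #else, they would only
exist when __OPTIMIZE__ is defined, which is the bug described above.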
2020-07-15 Jakub Jelinek <jakub@redhat.com>
PR target/96174
* config/i386/avx512fintrin.h (_mm512_cmpeq_pd_mask,
_mm512_mask_cmpeq_pd_mask, _mm512_cmplt_pd_mask,
_mm512_mask_cmplt_pd_mask, _mm512_cmple_pd_mask,
_mm512_mask_cmple_pd_mask, _mm512_cmpunord_pd_mask,
_mm512_mask_cmpunord_pd_mask, _mm512_cmpneq_pd_mask,
_mm512_mask_cmpneq_pd_mask, _mm512_cmpnlt_pd_mask,
_mm512_mask_cmpnlt_pd_mask, _mm512_cmpnle_pd_mask,
_mm512_mask_cmpnle_pd_mask, _mm512_cmpord_pd_mask,
_mm512_mask_cmpord_pd_mask, _mm512_cmpeq_ps_mask,
_mm512_mask_cmpeq_ps_mask, _mm512_cmplt_ps_mask,
_mm512_mask_cmplt_ps_mask, _mm512_cmple_ps_mask,
_mm512_mask_cmple_ps_mask, _mm512_cmpunord_ps_mask,
_mm512_mask_cmpunord_ps_mask, _mm512_cmpneq_ps_mask,
_mm512_mask_cmpneq_ps_mask, _mm512_cmpnlt_ps_mask,
_mm512_mask_cmpnlt_ps_mask, _mm512_cmpnle_ps_mask,
_mm512_mask_cmpnle_ps_mask, _mm512_cmpord_ps_mask,
_mm512_mask_cmpord_ps_mask): Move outside of __OPTIMIZE__ guarded
section.
* gcc.target/i386/avx512f-vcmppd-3.c: New test.
* gcc.target/i386/avx512f-vcmpps-3.c: New test.
As mentioned in the PR, we generate a useless movzbl insn before lock cmpxchg.
The problem is that the builtin for the char/short cases has the arguments
promoted to int and combine gives up, because the instructions have
MEM_VOLATILE_P arguments and recog in that case doesn't recognize anything
when volatile_ok is false, and nothing afterwards optimizes the
(reg:SI a) = (zero_extend:SI (reg:QI a))
... (subreg:QI (reg:SI a) 0) ...
The following patch fixes it at expansion time; we already have a function
that is meant to undo the promotion, so this just adds the very common case
to that.
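For reference, a minimal sketch of the kind of source involved. This is not
the testcase from the PR, and the function name cas_byte is hypothetical.
Whether a zero extension is emitted depends on the GCC version and options.

```c
/* Hypothetical example of a byte-sized compare-and-swap.  The
   unsigned char arguments are promoted to int for the builtin call;
   the expander then has to undo that promotion to operate on the
   narrow mode.  */
unsigned char
cas_byte (unsigned char *p, unsigned char expected, unsigned char desired)
{
  return __sync_val_compare_and_swap (p, expected, desired);
}
```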
2020-07-15 Jakub Jelinek <jakub@redhat.com>
PR target/96176
* builtins.c: Include gimple-ssa.h, tree-ssa-live.h and
tree-outof-ssa.h.
(expand_expr_force_mode): If exp is a SSA_NAME with different mode
from MODE and get_gimple_for_ssa_name is a cast from MODE, use the
cast's rhs.
* gcc.target/i386/pr96176.c: New test.
For very small loops (< 6 insns), it is fine to unroll 4 times so that
they run faster, with less latency and better cache usage. Examples of
such loops:
while (i) a[--i] = NULL;
while (p < e) *d++ = *p++;
With this patch, we see some performance improvement on some
workloads (e.g. SPEC2017).
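To make "unroll 4 times" concrete, here is a hand-written sketch of the copy
loop and an equivalent version unrolled by four. It only illustrates the
transformation; the code the compiler actually generates will differ, and
the function names are hypothetical.

```c
/* The small copy loop as written in the source.  */
void
copy_plain (int *d, const int *p, const int *e)
{
  while (p < e)
    *d++ = *p++;
}

/* The same loop unrolled by four by hand: four copies per trip,
   followed by a remainder loop for the last 0..3 elements.  */
void
copy_unrolled4 (int *d, const int *p, const int *e)
{
  while (e - p >= 4)
    {
      d[0] = p[0];
      d[1] = p[1];
      d[2] = p[2];
      d[3] = p[3];
      d += 4;
      p += 4;
    }
  while (p < e)
    *d++ = *p++;
}
```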
2020-07-13 Jiufu Guo <guojiufu@cn.ibm.com>
* config/rs6000/rs6000.c (rs6000_loop_unroll_adjust): Refine hook.
Fixed in r224162. That came without a test so adding this one.
Previously, we issued a bogus "too few arguments to function" error.
gcc/testsuite/ChangeLog:
PR c++/59978
* g++.dg/cpp0x/vt-59978.C: New test.
Replace glibc specific __glibc_unlikely with __builtin_expect.
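A minimal sketch of this kind of replacement, assuming the usual definition
of the glibc macro as __builtin_expect with an expected value of 0. The
macro name sketch_unlikely and the function is_empty are hypothetical and
are not taken from the testcase.

```c
/* Portable spelling of the branch hint: works with any libc, since
   __builtin_expect is provided by the compiler itself.  */
#define sketch_unlikely(cond) __builtin_expect ((cond), 0)

/* Hypothetical user of the hint.  */
int
is_empty (const char *s)
{
  if (sketch_unlikely (s[0] == '\0'))
    return 1;
  return 0;
}
```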
PR target/95443
* gcc.target/i386/pr95443-1.c (simple_strstr): Replace
__glibc_unlikely with __builtin_expect.
convert_like issues errors about bad_p conversions at the beginning
of the function, but in the ck_ref_bind case, it only issues them
after we've called convert_like on the next conversion.
This doesn't work as expected since r10-7096 because when we see
a conversion from/to class type in a template, we return early, thereby
missing the error, and a bad_p conversion goes by undetected. That
made the attached test compile even though it should not.
I had thought that I could just move the ck_ref_bind/bad_p errors
above to the rest of them, but that regressed diagnostics because
expr then wasn't converted yet by the nested convert_like_real call.
So, for bad_p conversions, do the normal processing, but still return
the IMPLICIT_CONV_EXPR to avoid introducing trees that the template
processing can't handle well. This I achieved by adding a wrapper
function.
gcc/cp/ChangeLog:
PR c++/95789
PR c++/96104
PR c++/96179
* call.c (convert_like_real_1): Renamed from convert_like_real.
(convert_like_real): New wrapper for convert_like_real_1.
gcc/testsuite/ChangeLog:
PR c++/95789
PR c++/96104
PR c++/96179
* g++.dg/conversion/ref4.C: New test.
* g++.dg/conversion/ref5.C: New test.
* g++.dg/conversion/ref6.C: New test.
movsi_from_sf uses rldimi instruction, which will cause the compiler to ICE
in 32 bit mode. This patch limits the recently added pattern and call to
TARGET_POWERPC64.
2020-07-14 David Edelsohn <dje.gcc@gmail.com>
gcc/ChangeLog
* config/rs6000/rs6000.md (rotldi3_insert_sf): Add TARGET_POWERPC64
condition.
* config/rs6000/rs6000.c (rs6000_expand_vector_init): Add
TARGET_POWERPC64 requirement to TARGET_P8_VECTOR case.
The handling of PCH is a little tricky, because we have to deal with it before
allocating memory. I found the layering somewhat confusing. This patch
reorganizes that, so that the stopping of PCH is done in exactly one place,
and the ordering of lexer creation relative to that is much clearer.
I also changed the error message about multiple source files, as with C++20,
'modules' means something rather specific.
Other than the error message changes, no functional changes.
gcc/cp/
* parser.c (cp_lexer_alloc): Do not deal with PCH here.
(cp_lexer_new_main): Deal with PCH here. Store the tokens directly
into the buffer.
(cp_lexer_new_from_tokens): Assert last token isn't purged either.
(cp_lexer_get_preprocessor_token): Change first arg to flags, adjust.
(cp_parser_new): Pass the lexer in, don't create it here.
(cp_parser_translation_unit): Initialize access checks here.
(cp_parser_initial_pragma): First token is provided by caller,
don't deal with PCH stopping here. Adjust error message.
(c_parse_file): Adjust, change error message to avoid C++20 module
confusion.
The version of nvprof in CUDA 9.0 causes a hang when used to profile an
OpenACC program. This is because it calls acc_get_device_type from
a callback called during device initialization, which then attempts
to acquire acc_device_lock while it is already taken, resulting in
deadlock. This works around the issue by returning acc_device_none
from acc_get_device_type without attempting to acquire the lock when
initialization has not completed yet.
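The workaround can be sketched with plain pthreads as below. This is not the
oacc-init.c code: the sketch_* names, the enumerators and the recording
callback are hypothetical, and only the idea of tracking the initializing
thread is taken from the description above.

```c
#include <pthread.h>

enum sketch_device { SKETCH_DEVICE_NONE, SKETCH_DEVICE_GPU };
enum sketch_state { UNINITIALIZED, INITIALIZING, INITIALIZED };

static pthread_mutex_t device_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t init_state_lock = PTHREAD_MUTEX_INITIALIZER;
static enum sketch_state init_state = UNINITIALIZED;
static pthread_t init_thread;

/* True if the calling thread is the one currently running init.  */
static int
self_initializing_p (void)
{
  pthread_mutex_lock (&init_state_lock);
  int res = (init_state == INITIALIZING
             && pthread_equal (init_thread, pthread_self ()));
  pthread_mutex_unlock (&init_state_lock);
  return res;
}

enum sketch_device
sketch_get_device_type (void)
{
  /* Called from inside init: taking device_lock again would deadlock,
     so report "none" without touching it.  */
  if (self_initializing_p ())
    return SKETCH_DEVICE_NONE;

  pthread_mutex_lock (&device_lock);
  enum sketch_device d = SKETCH_DEVICE_GPU;
  pthread_mutex_unlock (&device_lock);
  return d;
}

/* Initialization runs with device_lock held and may invoke a callback
   (standing in for the profiler) before it completes.  */
void
sketch_init (void (*callback) (void))
{
  pthread_mutex_lock (&device_lock);

  pthread_mutex_lock (&init_state_lock);
  init_state = INITIALIZING;
  init_thread = pthread_self ();
  pthread_mutex_unlock (&init_state_lock);

  if (callback)
    callback ();

  pthread_mutex_lock (&init_state_lock);
  init_state = INITIALIZED;
  pthread_mutex_unlock (&init_state_lock);

  pthread_mutex_unlock (&device_lock);
}

/* Hypothetical callback that records what it sees during init.  */
static enum sketch_device seen_during_init = SKETCH_DEVICE_GPU;

static void
sketch_profiler_callback (void)
{
  seen_during_init = sketch_get_device_type ();
}
```

Calling sketch_init (sketch_profiler_callback) completes instead of hanging,
and the callback observes SKETCH_DEVICE_NONE. Other threads still block on
device_lock as before, since only the initializing thread takes the early
return.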
2020-07-14 Tom de Vries <tom@codesourcery.com>
Cesar Philippidis <cesar@codesourcery.com>
Thomas Schwinge <thomas@codesourcery.com>
Kwok Cheung Yeung <kcy@codesourcery.com>
libgomp/
* oacc-init.c (acc_init_state_lock, acc_init_state, acc_init_thread):
New variables.
(acc_init_1): Set acc_init_thread to pthread_self (). Set
acc_init_state to initializing at the start, and to initialized at the
end.
(self_initializing_p): New function.
(acc_get_device_type): Return acc_device_none if called by thread that
is currently executing acc_init_1.
* libgomp.texi (acc_get_device_type): Update documentation.
(Implementation Status and Implementation-Defined Behavior): Likewise.
* testsuite/libgomp.oacc-c-c++-common/acc_prof-init-2.c: New.