Whilst working on a backend patch, I noticed that the middle-end's
RTL optimizers weren't simplifying a truncation of a paradoxical
subreg extension, though they do transform closely related (more
complex) expressions. The main (first) part of this patch
implements this simplification, reusing much of the logic already
in place.
I briefly considered suggesting that it's difficult to provide a new
testcase for this change, but then realized the reviewer's response
would be that this type of transformation should be self-tested
in simplify-rtx, so this patch adds a bunch of tests that integer
extensions and truncations are simplified as expected. No good
deed goes unpunished and I was equally surprised to see that we
don't currently simplify/check/defend (zero_extend:SI (reg:SI)),
i.e. useless no-op extensions to the same mode. So I've added
some logic to simplify (or more accurately prevent us generating
dubious RTL for) those.
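As a rough illustration in plain C (this is not the RTL self-test code
added to simplify-rtx, and the helper names are hypothetical), the
identities being tested can be modelled with fixed-width integer casts:
extending a value and then truncating back to the original width should
give the original value.

```c
#include <stdint.h>

/* Models (truncate:QI (zero_extend:SI (reg:QI))): widening and then
   narrowing back to the original width returns the original value.  */
uint8_t trunc_of_zext (uint8_t x)
{
  uint32_t wide = (uint32_t) x;   /* zero_extend QI -> SI */
  return (uint8_t) wide;          /* truncate SI -> QI */
}

/* Models (truncate:QI (sign_extend:SI (reg:QI))).  */
int8_t trunc_of_sext (int8_t x)
{
  int32_t wide = (int32_t) x;     /* sign_extend QI -> SI */
  return (int8_t) wide;           /* truncate SI -> QI */
}

/* Returns 1 if both round trips are the identity for every 8-bit value.  */
int ext_trunc_roundtrip_ok (void)
{
  for (int v = 0; v < 256; v++)
    {
      int8_t s = (int8_t) (v - 128);
      if (trunc_of_zext ((uint8_t) v) != (uint8_t) v)
        return 0;
      if (trunc_of_sext (s) != s)
        return 0;
    }
  return 1;
}
```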
2021-08-23 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* simplify-rtx.c (simplify_truncation): Generalize simplification
of (truncate:A (subreg:B X)).
(simplify_unary_operation_1) [FLOAT_TRUNCATE, FLOAT_EXTEND,
SIGN_EXTEND, ZERO_EXTEND]: Handle cases where the operand
already has the desired machine mode.
(test_scalar_int_ops): Add tests that useless extensions and
truncations are optimized away.
(test_scalar_int_ext_ops): New self-test function to confirm
that truncations of extensions are correctly simplified.
(test_scalar_int_ext_ops2): New self-test function to check
truncations of truncations, extensions of extensions, and
truncations of extensions.
(test_scalar_ops): Call the above two functions with a
representative sampling of integer machine modes.
This short patch teaches fold that it is "safe" to change the sign
of a left shift, to reduce the number of type conversions in gimple.
As an example:
unsigned int foo(unsigned int i) {
return (int)i << 8;
}
is currently optimized to:
unsigned int foo (unsigned int i)
{
int i.0_1;
int _2;
unsigned int _4;
<bb 2> [local count: 1073741824]:
i.0_1 = (int) i_3(D);
_2 = i.0_1 << 8;
_4 = (unsigned int) _2;
return _4;
}
with this patch, this now becomes:
unsigned int foo (unsigned int i)
{
unsigned int _2;
<bb 2> [local count: 1073741824]:
_2 = i_1(D) << 8;
return _2;
}
which generates exactly the same assembly language. Aside from the
reduced memory usage, the real benefit is that no-op conversions tend
to interfere with many folding optimizations. For example,
unsigned int bar(unsigned char i) {
return (i ^ (i<<16)) | (i<<8);
}
currently gets (tangled in conversions and) optimized to:
unsigned int bar (unsigned char i)
{
unsigned int _1;
unsigned int _2;
int _3;
int _4;
unsigned int _6;
unsigned int _8;
<bb 2> [local count: 1073741824]:
_1 = (unsigned int) i_5(D);
_2 = _1 * 65537;
_3 = (int) i_5(D);
_4 = _3 << 8;
_8 = (unsigned int) _4;
_6 = _2 | _8;
return _6;
}
but with this patch, bar now optimizes down to:
unsigned int bar(unsigned char i)
{
unsigned int _1;
unsigned int _4;
<bb 2> [local count: 1073741824]:
_1 = (unsigned int) i_3(D);
_4 = _1 * 65793;
return _4;
}
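The multiplier in the optimized dump can be checked by hand: i, i<<8
and i<<16 occupy disjoint bytes, so the XOR and OR behave like
additions and the result is i * (1 + 256 + 65536) = i * 65793. A small
exhaustive check, assuming a 32-bit int (the checking function is only
an illustration, not part of the new test cases):

```c
/* The function from the example above.  */
unsigned int bar (unsigned char i)
{
  return (i ^ (i << 16)) | (i << 8);
}

/* Returns 1 if bar (i) == i * 65793 for all 256 possible inputs.  */
int bar_matches_multiply (void)
{
  for (unsigned int v = 0; v < 256; v++)
    if (bar ((unsigned char) v) != v * 65793u)
      return 0;
  return 1;
}
```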
2021-08-23 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* match.pd (shift transformations): Change the sign of an
LSHIFT_EXPR if it reduces the number of explicit conversions.
gcc/testsuite/ChangeLog
* gcc.dg/fold-convlshift-1.c: New test case.
* gcc.dg/fold-convlshift-2.c: New test case.
The following patch recognizes in the bswap pass (only there for now,
haven't done it for store merging pass yet) code sequences that can
be handled by (int32) __builtin_bswap64 (arg), i.e. where we have
0x05060708 n->n with 64-bit non-memory argument (if it is memory, we
can just load the 32-bit at 4 bytes into the address and n->n would
be 0x01020304; and only 64 -> 32 bit, because 64 -> 16 bit or 32 -> 16 bit
would mean only two bytes in the result and probably not worth it),
and furthermore the case where we have in the 0x0102030405060708 etc.
numbers some bytes 0 (i.e. known to contain zeros rather than source bytes),
as long as we have at least two original bytes in the right
positions (and no unknown bytes). This can be handled by
__builtin_bswap64 (arg) & 0xff0000ffffff00ffULL etc.
The latter change is the reason why counting the bswap messages doesn't work
too well in optimize-bswap* tests anymore: as the pass iterates from the end
of a basic block towards its start, it will often match both the bswap at the
end and some of the earlier bswaps with some masks (not a problem generally,
we'll just DCE it away whenever possible). The pass right now doesn't
handle __builtin_bswap* calls in the pattern matching (which is the reason
why it operates backwards), but it uses FOR_EACH_BB_FN (bb, fun) order
of handling blocks and matched sequences can span multiple blocks, so I was
worried about cases like:
void bar (unsigned long long);
unsigned long long
foo (unsigned long long value, int x)
{
unsigned long long tmp = (((value & 0x00000000000000ffull) << 56)
| ((value & 0x000000000000ff00ull) << 40)
| ((value & 0x00000000ff000000ull) << 8));
if (x)
bar (tmp);
return (tmp
| ((value & 0x000000ff00000000ull) >> 8)
| ((value & 0x0000ff0000000000ull) >> 24)
| ((value & 0x0000000000ff0000ull) << 24)
| ((value & 0x00ff000000000000ull) >> 40)
| ((value & 0xff00000000000000ull) >> 56));
}
but it seems we handle even that fine, while bb2 ending in GIMPLE_COND
is processed first, we recognize there a __builtin_bswap64 (value) & mask1,
in the last bb we recognize tmp | (__builtin_bswap64 (value) & mask2) and
PRE optimizes that into t = __builtin_bswap64 (value); tmp = t & mask1;
in the first bb and return t; in the last one.
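The two newly recognized shapes can be sketched in plain C. This is
only an illustration: bswap64_ref is a portable stand-in for
__builtin_bswap64, the function names are hypothetical, and the mask
value 0xffff00ff00000000 is derived here by hand from the tmp
expression in foo above rather than taken from the patch.

```c
#include <stdint.h>

/* Portable byte swap, standing in for __builtin_bswap64.  */
uint64_t bswap64_ref (uint64_t v)
{
  uint64_t r = 0;
  for (int i = 0; i < 8; i++)
    {
      r = (r << 8) | (v & 0xff);
      v >>= 8;
    }
  return r;
}

/* Hand-written sequence equivalent to (uint32_t) bswap64 (v): the four
   result bytes are source bytes 8, 7, 6, 5 (1 = least significant),
   i.e. the 0x05060708 pattern.  */
uint32_t swap_high_half (uint64_t v)
{
  return (uint32_t) ((v >> 56)
                     | ((v >> 40) & 0x0000ff00ull)
                     | ((v >> 24) & 0x00ff0000ull)
                     | ((v >> 8) & 0xff000000ull));
}

/* The "tmp" expression from foo: three source bytes moved to their
   byte-swapped positions, all other result bytes zero.  */
uint64_t partial_swap (uint64_t v)
{
  return ((v & 0x00000000000000ffull) << 56)
         | ((v & 0x000000000000ff00ull) << 40)
         | ((v & 0x00000000ff000000ull) << 8);
}

/* Returns 1 if both sequences match a full byte swap plus cast/mask.  */
int bswap_identities_hold (uint64_t v)
{
  return swap_high_half (v) == (uint32_t) bswap64_ref (v)
         && partial_swap (v) == (bswap64_ref (v) & 0xffff00ff00000000ull);
}
```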
2021-08-23 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/86723
* gimple-ssa-store-merging.c (find_bswap_or_nop_finalize): Add
cast64_to_32 argument, set *cast64_to_32 to false, unless n is
non-memory permutation of 64-bit src which only has bytes of
0 or [5..8] and n->range is 4.
(find_bswap_or_nop): Add cast64_to_32 and mask arguments, adjust
find_bswap_or_nop_finalize caller, support bswap with some bytes
zeroed, as long as at least two bytes are not zeroed.
(bswap_replace): Add mask argument and handle masking of bswap
result.
(maybe_optimize_vector_constructor): Adjust find_bswap_or_nop
caller, punt if cast64_to_32 or mask is not all ones.
(pass_optimize_bswap::execute): Adjust find_bswap_or_nop_finalize
caller, for now punt if cast64_to_32.
* gcc.dg/pr86723.c: New test.
* gcc.target/i386/pr86723.c: New test.
* gcc.dg/optimize-bswapdi-1.c: Use -fdump-tree-optimized instead of
-fdump-tree-bswap and scan for number of __builtin_bswap64 calls.
* gcc.dg/optimize-bswapdi-2.c: Likewise.
* gcc.dg/optimize-bswapsi-1.c: Use -fdump-tree-optimized instead of
-fdump-tree-bswap and scan for number of __builtin_bswap32 calls.
* gcc.dg/optimize-bswapsi-5.c: Likewise.
* gcc.dg/optimize-bswapsi-3.c: Likewise. Expect one __builtin_bswap32
call instead of zero.
This replicates tree-eh.c in_array_bound_p into VN's
vn_reference_may_trap to fix hoisting of a possibly trapping
ARRAY_REF across a call that might not return.
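The idea of the check can be sketched in plain C. The struct and
function names below are hypothetical stand-ins, not the tree-level
code in tree-ssa-sccvn.c: a constant index is compared against the
array domain, and anything outside it is treated as possibly trapping,
so it must not be hoisted above a call that might not return.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an array domain [low, high].  */
struct array_domain
{
  long low;
  long high;
};

/* A constant index is known to be in bounds only if it lies within
   the domain.  */
bool index_in_domain (long idx, struct array_domain d)
{
  return idx >= d.low && idx <= d.high;
}

/* An access that is not known to be in bounds has to be assumed to
   possibly trap.  */
bool array_ref_may_trap (long idx, struct array_domain d)
{
  return !index_in_domain (idx, d);
}
```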
2021-08-23 Richard Biener <rguenther@suse.de>
PR tree-optimization/79334
* tree-ssa-sccvn.c (copy_reference_ops_from_ref): Record
a type also for COMPONENT_REFs.
(vn_reference_may_trap): Check ARRAY_REF with constant index
against the array domain.
* gcc.dg/torture/pr79334-0.c: New testcase.
* gcc.dg/torture/pr79334-1.c: Likewise.
The following patch emits DW_AT_location for global register variables
already during early dwarf, since usually late_global_decl hook isn't even
called for those, as nothing needs to be emitted for them.
2021-08-23 Jakub Jelinek <jakub@redhat.com>
PR debug/101905
* dwarf2out.c (gen_variable_die): Add DW_AT_location for global
register variables already during early_dwarf if possible.
* gcc.dg/guality/pr101905.c: New test.
This is a followup to Srinath's recent patch: the newly added test is
failing e.g. on arm-linux-gnueabihf without R/M profile multilibs.
It is also failing on arm-eabi with R/M profile multilibs if the
execution engine does not support v8.1-M instructions.
The patch avoids this by adding check_effective_target_FUNC_multilib
in target-supports.exp which effectively checks whether the target
supports linking and execution, like what is already done for other
ARM effective targets. pr100856.c is updated to use it instead of
arm_v8_1m_main_cde_mve_ok (which makes the testcase a bit of a
duplicate with check_effective_target_FUNC_multilib).
In addition, I noticed that requiring MVE does not seem necessary and
this enables the test to pass even when targeting a CPU without MVE:
since the test does not involve actual CDE instructions, it can pass
on other architecture versions. For instance, when requiring MVE, we
have to use cortex-m55 under QEMU for the test to pass because the
memset() that comes from v8.1-m.main+mve multilib uses LOB
instructions (DLS) (memset is used during startup). Keeping
arm_v8_1m_main_cde_mve_ok would mean we would enable the test provided
we have the right multilibs, causing a runtime error if the simulator
does not support LOB instructions (e.g. when targeting cortex-m7).
I do not update sourcebuild.texi since the CDE effective targets are
already collectively documented.
Finally, the patch fixes two typos in comments.
2021-07-15 Christophe Lyon <christophe.lyon@foss.st.com>
PR target/100856
gcc/
* config/arm/arm.opt: Fix typo.
* config/arm/t-rmprofile: Fix typo.
gcc/testsuite/
* gcc.target/arm/acle/pr100856.c: Use arm_v8m_main_cde_multilib
and arm_v8m_main_cde.
* lib/target-supports.exp: Add
check_effective_target_FUNC_multilib for ARM CDE.
With strict: modifier on these clauses, the standard is explicit about
how many iterations (and which) each generated task of taskloop directive
should contain. For num_tasks it actually matches what we were already
implementing, but for grainsize it does not (and even violates the old
rule - without strict it requires that the number of iterations (unspecified
which exactly) handled by each generated task is >= grainsize argument and
< 2 * grainsize argument, with strict: it requires that each generated
task handles exactly == grainsize argument iterations, except for the
generated task handling the last iteration which can handle <= grainsize
iterations).
The following patch implements it for C and C++.
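To make the strict grainsize rule concrete, here is a minimal sketch of
the resulting partitioning for a loop written with, e.g.,
#pragma omp taskloop grainsize(strict: 4). It is not the code in
libgomp's GOMP_taskloop; the function names are hypothetical and g is
assumed to be non-zero.

```c
/* Number of generated tasks for n iterations with grainsize(strict: g):
   every task but the last gets exactly g iterations.  */
unsigned long strict_grainsize_num_tasks (unsigned long n, unsigned long g)
{
  return n == 0 ? 0 : (n + g - 1) / g;
}

/* Iterations handled by task t (0-based); 0 if there is no such task.  */
unsigned long strict_grainsize_task_iters (unsigned long n, unsigned long g,
                                           unsigned long t)
{
  unsigned long ntasks = strict_grainsize_num_tasks (n, g);
  if (t >= ntasks)
    return 0;
  if (t + 1 < ntasks)
    return g;                     /* exactly g iterations */
  return n - g * (ntasks - 1);    /* last task: between 1 and g */
}
```

For 10 iterations and a grainsize of 4 this gives tasks of 4, 4 and 2
iterations; the final task of 2 would not be allowed by the non-strict
rule, which requires at least grainsize iterations per task.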
2021-08-23 Jakub Jelinek <jakub@redhat.com>
gcc/
* tree.h (OMP_CLAUSE_GRAINSIZE_STRICT): Define.
(OMP_CLAUSE_NUM_TASKS_STRICT): Define.
* tree-pretty-print.c (dump_omp_clause) <case OMP_CLAUSE_GRAINSIZE,
case OMP_CLAUSE_NUM_TASKS>: Print strict: modifier.
* omp-expand.c (expand_task_call): Use GOMP_TASK_FLAG_STRICT in iflags
if either grainsize or num_tasks clause has the strict modifier.
gcc/c/
* c-parser.c (c_parser_omp_clause_num_tasks,
c_parser_omp_clause_grainsize): Parse the optional strict: modifier.
gcc/cp/
* parser.c (cp_parser_omp_clause_num_tasks,
cp_parser_omp_clause_grainsize): Parse the optional strict: modifier.
include/
* gomp-constants.h (GOMP_TASK_FLAG_STRICT): Define.
libgomp/
* taskloop.c (GOMP_taskloop): Handle GOMP_TASK_FLAG_STRICT.
* testsuite/libgomp.c-c++-common/taskloop-4.c (main): Fix up comment.
* testsuite/libgomp.c-c++-common/taskloop-5.c: New test.
When -mloongson-mmi is enabled, SHIFT_COUNT_TRUNCATED is turned off.
This causes an untruncated immediate shift amount to be output into the asm,
and the GNU assembler refuses to assemble it.
Truncate the immediate shift amount when outputting the asm instruction to
make GAS happy again.
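The truncation itself amounts to reducing the count modulo the element
width. A minimal sketch, assuming the element width is a power of two;
the helper name is hypothetical and this is not the body of
mips_msa_output_shift_immediate:

```c
/* Reduce an immediate shift count modulo the element width in bits.
   elem_bits is assumed to be a power of two (8, 16, 32 or 64).  */
unsigned int truncate_shift_count (unsigned int count, unsigned int elem_bits)
{
  return count & (elem_bits - 1);
}
```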
gcc/
PR target/101922
* config/mips/mips-protos.h (mips_msa_output_shift_immediate):
Declare.
* config/mips/mips.c (mips_msa_output_shift_immediate): New
function.
* config/mips/mips-msa.md (vashl<mode>3, vashr<mode>3,
vlshr<mode>3): Call it.
gcc/testsuite/
PR target/101922
* gcc.target/mips/pr101922.c: New test.
2021-08-22 Jonathan Yong <10walls@gmail.com>
gcc/testsuite/ChangeLog:
* gcc.c-torture/execute/gcc_tmpnam.h: Fix tmpnam case on Windows
where it can return a filename with "\" to indicate current
directory.
* gcc.c-torture/execute/fprintf-2.c: Use wrapper.
* gcc.c-torture/execute/printf-2.c: Use wrapper.
* gcc.c-torture/execute/user-printf.c: Use wrapper.
Signed-off-by: Jonathan Yong <10walls@gmail.com>
2021-08-22 Martin Uecker <muecker@gwdg.de>
gcc/c/
PR c/98397
* c-typeck.c (comp_target_types): Change pedwarn to pedwarn_c11
for pointers to arrays with qualifiers.
(build_conditional_expr): For C23 don't lose qualifiers for pointers
to arrays when the other pointer is a void pointer. Update warnings.
(convert_for_assignment): Update warnings for C2X when converting from
void* with qualifiers to a pointer to array with the same qualifiers.
gcc/testsuite/
PR c/98397
* gcc.dg/c11-qual-1.c: New test.
* gcc.dg/c2x-qual-1.c: New test.
* gcc.dg/c2x-qual-2.c: New test.
* gcc.dg/c2x-qual-3.c: New test.
* gcc.dg/c2x-qual-4.c: New test.
* gcc.dg/c2x-qual-5.c: New test.
* gcc.dg/c2x-qual-6.c: New test.
* gcc.dg/c2x-qual-7.c: New test.
* gcc.dg/pointer-array-quals-1.c: Remove unnecessary flag.
* gcc.dg/pointer-array-quals-2.c: Remove unnecessary flag.
... and add a minimum amount of offloading testing.
(Leaving aside that 'fwrite' to 'stderr' probably wouldn't work anyway) the
'fwrite' calls in 'libgomp/error.c:GOMP_warning', 'libgomp/error.c:GOMP_error'
drag in 'isatty', which isn't provided by my nvptx newlib build at present, so
we get, for example:
[...]
FAIL: libgomp.c/../libgomp.c-c++-common/declare_target-1.c (test for excess errors)
Excess errors:
unresolved symbol isatty
mkoffload: fatal error: [...]/build-gcc/./gcc/x86_64-pc-linux-gnu-accel-nvptx-none-gcc returned 1 exit status
[...]
..., and many more.
Fix up for recent commit 0d973c0a0d
"openmp: Implement the error directive".
libgomp/
* config/nvptx/error.c (fwrite, exit): Override, too.
* testsuite/libgomp.c-c++-common/error-1.c: Add a minimum amount
of offloading testing.
* testsuite/libgomp.fortran/error-1.f90: Likewise.
Since 'Remove obsolete IRIX 6.5 support' [1] we only use
gp-relative jump-tables for PIC code. We can fall back to
default behaviour for asm_function_rodata_section.
[1] https://gcc.gnu.org/ml/libstdc++/2012-03/msg00067.html
2018-06-04 Dragan Mladjenovic <dragan.mladjenovic@rt-rk.com>
gcc/
* config/mips/mips.c (mips_function_rodata_section,
TARGET_ASM_FUNCTION_RODATA_SECTION): Removed.
Special-casing checks for in-tree gas features has been unnecessary since
r100007, which made configure-gcc depend on all-gas and thus made the
alternate code path in gcc_GAS_CHECK_FEATURE for in-tree gas
redundant.
Along the way this fixes PR 91602, which is caused by an incorrect guess
about the presence of leb128 support on RISC-V.
The first patch removes the alternate code path in gcc_GAS_CHECK_FEATURE
and related code, the rest are further cleanups. Patches 2 and 3 in the
series make no functional changes, thus configure is unchanged.
gcc/ChangeLog:
PR target/91602
* acinclude.m4 (_gcc_COMPUTE_GAS_VERSION, _gcc_GAS_VERSION_GTE_IFELSE)
(gcc_GAS_VERSION_GTE_IFELSE): Remove.
(gcc_GAS_CHECK_FEATURE): Do not handle in-tree case specially.
* configure.ac: Remove gcc_cv_gas_major_version, gcc_cv_gas_minor_version.
Remove remaining checks for in-tree assembler.
* configure: Regenerate.
gcc/
* config/h8300/h8300.c (shift_alg_hi): Improve arithmetic shift right
by 15 bits for H8/300H and H8/S. Improve logical shifts by 12
bits for H8/S.
(shift_alg_si): Improve arithmetic right shift by 28-30 bits for
H8/300H. Improve arithmetic shift right by 15 bits for H8/S.
Improve logical shifts by 27 bits for H8/S.
(get_shift_alg): Corresponding changes.
(h8300_option_override): Revert to loops for -Os when profitable.
Tests that depend on filesystem permissions FAIL if run on Windows or as
root. Add a helper function to detect those cases, so the tests can skip
those checks gracefully.
Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:
PR libstdc++/90787
* testsuite/27_io/filesystem/iterators/directory_iterator.cc:
Use new __gnu_test::permissions_are_testable() function.
* testsuite/27_io/filesystem/iterators/recursive_directory_iterator.cc:
Likewise.
* testsuite/27_io/filesystem/operations/exists.cc: Likewise.
* testsuite/27_io/filesystem/operations/is_empty.cc: Likewise.
* testsuite/27_io/filesystem/operations/remove.cc: Likewise.
* testsuite/27_io/filesystem/operations/remove_all.cc: Likewise.
* testsuite/27_io/filesystem/operations/status.cc: Likewise.
* testsuite/27_io/filesystem/operations/symlink_status.cc:
Likewise.
* testsuite/27_io/filesystem/operations/temp_directory_path.cc:
Likewise.
* testsuite/experimental/filesystem/iterators/directory_iterator.cc:
Likewise.
* testsuite/experimental/filesystem/iterators/recursive_directory_iterator.cc:
Likewise.
* testsuite/experimental/filesystem/operations/exists.cc:
Likewise.
* testsuite/experimental/filesystem/operations/is_empty.cc:
Likewise.
* testsuite/experimental/filesystem/operations/remove.cc:
Likewise.
* testsuite/experimental/filesystem/operations/remove_all.cc:
Likewise.
* testsuite/experimental/filesystem/operations/temp_directory_path.cc:
Likewise.
* testsuite/util/testsuite_fs.h (__gnu_test::permissions_are_testable):
New function to guess whether testing permissions will work.
This patch adds support for the 'll' (long long)
and 'w' (HOST_WIDE_INT) length modifiers to the
Fortran FE diagnostic functions (gfc_error, gfc_warning, ...).
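For comparison, 'll' has the same meaning here as in standard C printf.
The sketch below only uses the standard library (format_ll is a
hypothetical name); the 'w' modifier is internal to GCC's diagnostic
format strings and has no printf equivalent, so it is not shown.

```c
#include <stdio.h>
#include <string.h>

/* Format a long long and an unsigned long long with the 'll' length
   modifier, as "%lld" and "%llu" do in standard printf.  */
int format_ll (char *buf, size_t size, long long d, unsigned long long u)
{
  return snprintf (buf, size, "%lld %llu", d, u);
}
```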
gcc/c-family/ChangeLog:
* c-format.c (gcc_gfc_length_specs): Add 'll' and 'w'.
(gcc_gfc_char_table): Add T9L_LL and T9L_ULL to
"di" and "u", respectively; fill with BADLEN to match
size of 'types'.
(get_init_dynamic_hwi): Split off from ...
(init_dynamic_diag_info): ... here. Call it.
(init_dynamic_gfc_info): Call it.
gcc/fortran/ChangeLog:
* error.c
(error_uinteger): Take 'long long unsigned' instead
of 'long unsigned' as argument.
(error_integer): Take 'long long' instead of 'long'.
(error_hwuint, error_hwint): New.
(error_print): Update to handle 'll' and 'w'
length modifiers.
* simplify.c (substring_has_constant_len): Use '%wd'
in gfc_error.
This uses the group_id computed to ensure DRs in different BBs do
not get merged into a DR group. To achieve this we seed the
group from the BB index when group_ids are not computed and we
make sure to bump the group_id when advancing to the next BB for
BB SLP analysis.
This paves the way for relaxing the grouping for BB vectorization
by adjusting its group_id computation.
2021-08-20 Richard Biener <rguenther@suse.de>
* tree-vect-data-refs.c (dr_group_sort_cmp): Do not compare
BBs.
(vect_analyze_data_ref_accesses): Likewise. Assign the BB
index as group_id when dataref_groups were not computed.
* tree-vect-slp.c (vect_slp_bbs): Bump current_group when
we advance to the next BB.
This patch implements the error directive. Depending on clauses it is either
a compile time diagnostic (in that case diagnosed right away) or a runtime
diagnostic (a libgomp API call that diagnoses at runtime), either fatal
or a warning (error or warning at compile time, fatal error vs. error at
runtime), and either has no message or a user supplied message (somewhat
like e.g. the deprecated attribute). The directive is a stand-alone
directive when diagnosed at runtime, and a utility directive (thus it
disappears from the IL as if it wasn't there, and is parsed like the
nothing directive) at compile time.
There are some clarifications in the works ATM, so this patch doesn't yet
require that for compile time diagnostics the user message must be a constant
string literal, there are uncertainties on what exactly is valid argument
of message clause (whether just const char * type, convertible to const char *,
qualified/unqualified const char * or char * or what else) and what to do
in templates. Currently even in templates it is diagnosed right away for
compile time diagnostics, if we'll need to substitute it, we'd need to queue
something into the IL, have pt.c handle it and diagnose only later.
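A short usage sketch of the runtime form, using the OpenMP 5.1 clause
syntax (at, severity, message). The function checked_divide is a
hypothetical example, not one of the new tests. When built without
-fopenmp, or by a compiler that does not know the directive, the pragma
is simply ignored.

```c
/* Divide num by den, storing the quotient in *out.  Returns 1 on
   success and 0 if den is zero.  */
int checked_divide (int num, int den, int *out)
{
  if (den == 0)
    {
      /* Runtime, non-fatal diagnostic: with OpenMP enabled this is
         expected to become a libgomp call that reports the message.  */
      #pragma omp error at(execution) severity(warning) message("division by zero skipped")
      return 0;
    }
  *out = num / den;
  return 1;
}
```

Without an at clause the directive is diagnosed at compile time, and
without a severity clause it is fatal.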
2021-08-20 Jakub Jelinek <jakub@redhat.com>
gcc/
* omp-builtins.def (BUILT_IN_GOMP_WARNING, BUILT_IN_GOMP_ERROR): New
builtins.
gcc/c-family/
* c-pragma.h (enum pragma_kind): Add PRAGMA_OMP_ERROR.
* c-pragma.c (omp_pragmas): Add error directive.
* c-omp.c (omp_directives): Uncomment error directive entry.
gcc/c/
* c-parser.c (c_parser_omp_error): New function.
(c_parser_pragma): Handle PRAGMA_OMP_ERROR.
gcc/cp/
* parser.c (cp_parser_handle_statement_omp_attributes): Determine if
PRAGMA_OMP_ERROR directive is C_OMP_DIR_STANDALONE.
(cp_parser_omp_error): New function.
(cp_parser_pragma): Handle PRAGMA_OMP_ERROR.
gcc/fortran/
* types.def (BT_FN_VOID_CONST_PTR_SIZE): New DEF_FUNCTION_TYPE_2.
* f95-lang.c (ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST): Define.
gcc/testsuite/
* c-c++-common/gomp/error-1.c: New test.
* c-c++-common/gomp/error-2.c: New test.
* c-c++-common/gomp/error-3.c: New test.
* g++.dg/gomp/attrs-1.C (bar): Add error directive test.
* g++.dg/gomp/attrs-2.C (bar): Add error directive test.
* g++.dg/gomp/attrs-13.C: New test.
* g++.dg/gomp/error-1.C: New test.
libgomp/
* libgomp.map (GOMP_5.1): Add GOMP_error and GOMP_warning.
* libgomp_g.h (GOMP_warning, GOMP_error): Declare.
* error.c (GOMP_warning, GOMP_error): New functions.
* testsuite/libgomp.c-c++-common/error-1.c: New test.
While working on error directive, I've noticed a few spots in OpenMP
parsing where we consume and don't diagnose superfluous commas at the end
(either of depend sink arguments or at the end of requires pragma).
2021-08-20 Jakub Jelinek <jakub@redhat.com>
gcc/c/
* c-parser.c (c_parser_omp_clause_depend_sink): Reject spurious
comma at the end of list.
(c_parser_omp_requires): Likewise.
gcc/cp/
* parser.c (cp_parser_omp_clause_depend_sink): Reject spurious
comma at the end of list. Don't parse closing paren here...
(cp_parser_omp_clause_depend): ... but here instead.
gcc/testsuite/
* c-c++-common/gomp/sink-5.c: New test.
* c-c++-common/gomp/requires-3.c: Add test for spurious comma
at the end of pragma line.
PR gcov-profile/89961
gcc/ChangeLog:
* gcov.c (make_gcov_file_name): Rewrite using std::string.
(mangle_name): Simplify, do not use the second argument.
(strip_extention): New function.
(get_md5sum): Likewise.
(get_gcov_intermediate_filename): Handle properly -p and -x
options.
(output_gcov_file): Use string type.
(generate_results): Likewise.
(md5sum_to_hex): Remove.
I noticed that the xx built-in functions (xxspltiw, xxspltidp, xxsplti32dx,
xxeval, xxblend, and xxpermx) were all defined in altivec.md. However, since
the XX instructions can take both traditional floating point and Altivec
registers, these built-in functions should be in vsx.md.
This patch just moves the insns from altivec.md to vsx.md.
I also moved the VM3 mode iterator and VM3_char mode attribute from altivec.md
to vsx.md, since the only uses of these were for the XXBLEND insns.
2021-08-20 Michael Meissner <meissner@linux.ibm.com>
gcc/
* config/rs6000/altivec.md (UNSPEC_XXEVAL): Move to vsx.md.
(UNSPEC_XXSPLTIW): Move to vsx.md.
(UNSPEC_XXSPLTID): Move to vsx.md.
(UNSPEC_XXSPLTI32DX): Move to vsx.md.
(UNSPEC_XXBLEND): Move to vsx.md.
(UNSPEC_XXPERMX): Move to vsx.md.
(VM3): Move to vsx.md.
(VM3_char): Move to vsx.md.
(xxspltiw_v4si): Move to vsx.md.
(xxspltiw_v4sf): Move to vsx.md.
(xxspltiw_v4sf_inst): Move to vsx.md.
(xxspltidp_v2df): Move to vsx.md.
(xxspltidp_v2df_inst): Move to vsx.md.
(xxsplti32dx_v4si_inst): Move to vsx.md.
(xxsplti32dx_v4sf): Move to vsx.md.
(xxsplti32dx_v4sf_inst): Move to vsx.md.
(xxblend_<mode>): Move to vsx.md.
(xxpermx): Move to vsx.md.
(xxpermx_inst): Move to vsx.md.
* config/rs6000/vsx.md (UNSPEC_XXEVAL): Move from altivec.md.
(UNSPEC_XXSPLTIW): Move from altivec.md.
(UNSPEC_XXSPLTID): Move from altivec.md.
(UNSPEC_XXSPLTI32DX): Move from altivec.md.
(UNSPEC_XXBLEND): Move from altivec.md.
(UNSPEC_XXPERMX): Move from altivec.md.
(VM3): Move from altivec.md.
(VM3_char): Move from altivec.md.
(xxspltiw_v4si): Move from altivec.md.
(xxspltiw_v4sf): Move from altivec.md.
(xxspltiw_v4sf_inst): Move from altivec.md.
(xxspltidp_v2df): Move from altivec.md.
(xxspltidp_v2df_inst): Move from altivec.md.
(xxsplti32dx_v4si_inst): Move from altivec.md.
(xxsplti32dx_v4sf): Move from altivec.md.
(xxsplti32dx_v4sf_inst): Move from altivec.md.
(xxblend_<mode>): Move from altivec.md.
(xxpermx): Move from altivec.md.
(xxpermx_inst): Move from altivec.md.
An issue with a backend patch I've been investigating has revealed
a missed optimization opportunity during GCC's vector lowering pass.
An unrecognized insn for "(set (reg:SI) (not:SI (const_int 0)))"
revealed that not only was my expander not expecting a NOT with
a constant operand, but also that veclower was producing the
dubious tree expression ~0.
The attached patch replaces a call to gimple_build_assign with a
call to either gimplify_build1 or gimplify_build2 depending upon
whether the operation takes one or two operands. The net effect
is that where GCC previously produced the following optimized
gimple for testsuite/c-c++-common/Wunused-var-16.c (notice the ~0
and the "& 0"):
void foo ()
{
V x;
V y;
vector(16) unsigned char _1;
unsigned char _7;
unsigned char _8;
y_2 = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
x_3 = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
_7 = ~0;
_1 = {_7, _7, _7, _7, _7, _7, _7, _7, _7, _7, _7, _7, _7, _7, _7, _7};
_8 = 0 & _7;
y_4 = {_8, _8, _8, _8, _8, _8, _8, _8, _8, _8, _8, _8, _8, _8, _8, _8};
v = y_4;
return;
}
With this patch we now generate:
void foo ()
{
V x;
V y;
vector(16) unsigned char _1;
y_2 = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
x_3 = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
_1 = { 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255 };
y_4 = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
v = y_4;
return;
}
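For reference, a source function of roughly the following shape would
produce dumps like the ones above. It is reconstructed from the gimple
and is an assumption, not necessarily identical to the test file; it
relies on the GCC vector_size extension, and v_is_all_zero is a
hypothetical checker added for illustration.

```c
typedef unsigned char V __attribute__ ((vector_size (16)));

V v;

void foo (void)
{
  V x = { 0 };
  V y = { 0 };
  y &= ~x;   /* ~x is 255 in every element; y & ~x is still all zeros */
  v = y;
}

/* Returns 1 if every element of v is zero after calling foo.  */
int v_is_all_zero (void)
{
  foo ();
  for (int i = 0; i < 16; i++)
    if (v[i] != 0)
      return 0;
  return 1;
}
```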
2021-08-20 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* tree-vect-generic.c (expand_vector_operations_1): Use either
gimplify_build1 or gimplify_build2 instead of gimple_build_assign
when constructing scalar splat expressions.
gcc/testsuite/ChangeLog
* c-c++-common/Wunused-var-16.c: Add an extra check that ~0
is optimized away.
PR101849 shows we ICE on a test case when we pass a non __vector_pair *
pointer to the __builtin_vsx_lxvp and __builtin_vsx_stxvp built-ins
that is cast to __vector_pair *. The problem is that when we expand
the built-in, the cast has already been removed from gimple and we are
only given the base pointer. The solution used here (which fixes the ICE)
is to catch this case and convert the pointer to a __vector_pair * pointer
when expanding the built-in.
2021-08-19 Peter Bergner <bergner@linux.ibm.com>
gcc/
PR target/101849
* config/rs6000/rs6000-call.c (rs6000_gimple_fold_mma_builtin): Cast
pointer to __vector_pair *.
gcc/testsuite/
PR target/101849
* gcc.target/powerpc/pr101849.c: New test.
It is intended that the default for the NeXT runtime at ABI 2 is to
check for nil message receivers. This patch updates the default to match
the documented behaviour and the behaviour of the system tools.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
gcc/objc/ChangeLog:
* objc-next-runtime-abi-02.c (objc_next_runtime_abi_02_init):
Default receiver nilchecks on.
This adds my new SHOW_HEADERFILE option, and removes some obsolete
options.
Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:
* doc/doxygen/user.cfg.in: Update to Doxygen 1.9.2