In r262961 I only updated the out-of-line copy of ceil_log2. This patch
applies the same change to the other (inline) one.
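
The message does not spell out the change itself. As a rough sketch only,
assuming the intent is for ceil_log2 to return 0 for an input of 0, and
using hypothetical stand-in names (floor_log2_sketch, ceil_log2_sketch)
rather than the real hwint.h definitions, the behaviour could look like:

```c
/* Hypothetical stand-ins for the hwint helpers; the _sketch suffix marks
   them as illustrations, not GCC's actual definitions.  */

/* Position of the highest set bit of X, or -1 when X is 0.  */
static inline int
floor_log2_sketch (unsigned long long x)
{
  int n = -1;
  while (x != 0)
    {
      x >>= 1;
      n++;
    }
  return n;
}

/* Smallest N such that (1 << N) >= X.  Assumption: 0 maps to 0.  */
static inline int
ceil_log2_sketch (unsigned long long x)
{
  return x == 0 ? 0 : floor_log2_sketch (x - 1) + 1;
}
```

Keeping the inline and out-of-line copies textually identical is what makes
a later "resync" like this one a mechanical change.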
2018-07-30 Richard Sandiford <richard.sandiford@arm.com>
gcc/
PR tree-optimization/86506
* hwint.h (ceil_log2): Resync with hwint.c implementation.
From-SVN: r263064
The idea behind the rclass loop in spill_hard_reg_in_range() seems to
be: find a hard_regno, which in general conflicts with reload regno,
but does not do so between `from` and `to`, and then do the live range
splitting based on this information. To check the absence of conflicts,
we make use of insn_bitmap, which does not contain insns which clobber
the hard_regno.
gcc/ChangeLog:
2018-07-30 Ilya Leoshkevich <iii@linux.ibm.com>
PR target/86547
* lra-constraints.c (spill_hard_reg_in_range): When selecting the
hard_regno, make sure no insn between `from` and `to` clobbers it.
From-SVN: r263063
Currently parallel-loop-1.c fails at -O0 on a Quadro M1200, because one of the
kernel launch configurations exceeds the resources available in the device, due
to the default dimensions chosen by the runtime.
This patch fixes that by taking the per-function max_threads_per_block into
account when using the default dimensions.
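
A minimal sketch of the kind of clamping described above. The struct and
function names are hypothetical, and the exact arithmetic used in
nvptx_exec is not given in this message; the sketch only assumes that
workers * vectors must not exceed the per-function limit:

```c
/* Hypothetical pair of launch dimensions.  */
struct launch_dims_sketch
{
  int workers;
  int vectors;
};

static inline int
min_int_sketch (int a, int b)
{
  return a < b ? a : b;
}

static inline int
max_int_sketch (int a, int b)
{
  return a > b ? a : b;
}

/* Shrink the default dimensions so that workers * vectors fits within
   MAX_THREADS_PER_BLOCK, never going below 1 in either dimension.  */
static inline struct launch_dims_sketch
clamp_default_dims_sketch (int workers, int vectors,
                           int max_threads_per_block)
{
  struct launch_dims_sketch d;
  d.vectors = max_int_sketch (1, min_int_sketch (vectors,
                                                 max_threads_per_block));
  d.workers = max_int_sketch (1, min_int_sketch (workers,
                                                 max_threads_per_block
                                                 / d.vectors));
  return d;
}
```

The vector dimension is clamped first here, so the worker count absorbs most
of the reduction; that ordering is a design choice of the sketch.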
2018-07-30 Tom de Vries <tdevries@suse.de>
* plugin/plugin-nvptx.c (MIN, MAX): Redefine.
(nvptx_exec): Ensure worker and vector default dims don't exceed
targ_fn->max_threads_per_block.
From-SVN: r263062
The default dimensions are calculated using per-device properties, but
initialized once and used on all devices.
This patch fixes this problem by introducing per-device default dimensions.
2018-07-30 Tom de Vries <tdevries@suse.de>
* plugin/plugin-nvptx.c (struct ptx_device): Add default_dims field.
(nvptx_open_device): Init default_dims for device.
(nvptx_exec): Use default_dims from device.
From-SVN: r263061
Currently, if the user doesn't specify the number of workers for an OpenACC
region, the compiler hardcodes it to a default value.
This patch removes this functionality, such that the libgomp runtime can decide
on a default value.
2018-07-30 Cesar Philippidis <cesar@codesourcery.com>
Tom de Vries <tdevries@suse.de>
* config/nvptx/nvptx.c (PTX_GANG_DEFAULT): Rename to ...
(PTX_DEFAULT_RUNTIME_DIM): ... this.
(nvptx_goacc_validate_dims): Set default worker and gang dims to
PTX_DEFAULT_RUNTIME_DIM.
(nvptx_dim_limit): Ignore GOMP_DIM_WORKER.
Co-Authored-By: Tom de Vries <tdevries@suse.de>
From-SVN: r263060
This makes it easier to compare cp_printer with gcc_cxxdiag_char_table
in c-format.c.
No functional change intended.
gcc/cp/ChangeLog:
* error.c (cp_printer): In the leading comment, move "%H" and "%I"
into alphabetical order, and add missing "%G" and "%K". Within
the switch statement, move cases 'G', 'H', 'I' and 'K' so that the
cases are in alphabetical order.
From-SVN: r263046
2018-07-27 Michael Meissner <meissner@linux.ibm.com>
* config/rs6000/constraints.md (wG constraint): Delete, no longer
used.
* config/rs6000/predicates.md (p9_fusion_reg_operand): Rename
predicate to reflect toc fusion has been deleted.
(toc_fusion_mem_raw): Delete, no longer used.
(toc_fusion_mem_wrapped): Likewise.
* config/rs6000/rs6000-cpus.def (POWERPC_MASKS): Delete toc
fusion mask bit.
* config/rs6000/rs6000-protos.h (fusion_wrap_memory_address):
Delete, no longer used.
* config/rs6000/rs6000.c (struct rs6000_reg_addr): Delete fields
meant to be used for toc fusion.
(rs6000_debug_print_mode): Delete toc fusion debugging.
(rs6000_debug_reg_global): Likewise.
(rs6000_init_hard_regno_mode_ok): Delete setting up fields for toc
fusion and secondary reload support that were never used.
(rs6000_option_override_internal): Delete TOC fusion, that was only
partially defined, and it did not work unless you also used the
-mcmodel= switch.
(rs6000_legitimate_address_p): Delete TOC fusion support.
(rs6000_opt_masks): Likewise.
(fusion_wrap_memory_address): Delete function, no longer used.
(fusion_split_address): Delete TOC fusion support.
* config/rs6000/rs6000.h (TARGET_TOC_FUSION_INT): Delete, no
longer used with toc fusion being deleted.
(TARGET_TOC_FUSION_FP): Likewise.
* config/rs6000/rs6000.md (UNSPEC_FUSION_ADDIS): Delete TOC fusion
UNSPEC.
(toc fusion splitter): Delete TOC fusion support.
(toc_fusionload_<mode>): Likewise.
(toc_fusionload_di): Likewise.
(fusion_gpr_load_<mode>): Delete generator function, this insn no
longer needs to be named. Rename predicate to delete TOC fusion.
(fusion_gpr_<P:mode>_<GPR_FUSION:mode>_load): Likewise.
(fusion_gpr_<P:mode>_<GPR_FUSION:mode>_store): Likewise.
(fusion_vsx_<P:mode>_<GPR_FUSION:mode>_load): Likewise.
(fusion_vsx_<P:mode>_<GPR_FUSION:mode>_store): Likewise.
(p9 fusion peephole2s): Rename predicate to delete TOC fusion.
From-SVN: r263039
When writing stack frames to the pprof CPU profile machinery, it is
very important to ensure that the frames emitted do not contain any
frames corresponding to artifacts of the profiling process itself
(signal handlers, sigprof, etc). This patch changes runtime.sigprof to
strip out those frames from the raw stack generated by
"runtime.callers".
Fixes golang/go#26595.
Reviewed-on: https://go-review.googlesource.com/126175
From-SVN: r263035
gcc/ChangeLog:
2018-07-27 Kelvin Nilsen <kelvin@gcc.gnu.org>
* doc/extend.texi (Basic PowerPC Built-in Functions Available on
ISA 2.05): Replace __uint128_t with __uint128 and __int128_t with
__int128 in built-in function prototypes.
(PowerPC AltiVec Built-in Functions on ISA 2.07): Likewise.
(PowerPC AltiVec Built-in Functions on ISA 3.0): Likewise.
From-SVN: r263033
gcc/ChangeLog:
PR tree-optimization/86696
* tree-ssa-strlen.c (get_min_string_length): Handle all integer
types, including enums.
(handle_char_store): Be prepared for the above function to fail.
gcc/testsuite/ChangeLog:
PR tree-optimization/86696
* gcc.dg/pr86696.C: New test.
From-SVN: r263032
The CET kernel has been changed to place a restore token on the shadow stack
for signal handlers to enhance security. This is usually transparent to user
programs, since the kernel pops the restore token when the signal handler
returns. But when an exception is thrown from a signal handler, we now
need to remove _Unwind_Frames_Increment to pop the restore token
from the shadow stack. Otherwise, we get
FAIL: g++.dg/torture/pr85334.C -O0 execution test
FAIL: g++.dg/torture/pr85334.C -O1 execution test
FAIL: g++.dg/torture/pr85334.C -O2 execution test
FAIL: g++.dg/torture/pr85334.C -O3 -g execution test
FAIL: g++.dg/torture/pr85334.C -Os execution test
FAIL: g++.dg/torture/pr85334.C -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
PR libgcc/85334
* config/i386/shadow-stack-unwind.h (_Unwind_Frames_Increment):
Removed.
From-SVN: r263030
PR testsuite/86660
* testsuite/libgomp.c++/for-15.C (results): Include it in
omp declare target region.
(main): Use map (always, tofrom: results) instead of
map (tofrom: results).
From-SVN: r263011
PR middle-end/86660
* omp-low.c (scan_sharing_clauses): Don't ignore map clauses for
declare target to variables if they have always,{to,from,tofrom} map
kinds.
* testsuite/libgomp.c/pr86660.c: New test.
From-SVN: r263010
Cherry-pick compiler-rt revision 337603:
When shadow stack from Intel CET is enabled, the first instruction of all
indirect branch targets must be a special instruction, ENDBR.
lib/asan/asan_interceptors.cc has
...
int res = REAL(swapcontext)(oucp, ucp);
...
REAL(swapcontext) is a function pointer to swapcontext in libc. Since
swapcontext may return via indirect branch on x86 when shadow stack is
enabled, as in this case,
int res = REAL(swapcontext)(oucp, ucp);
^^^^^^^^^^^^^^^^^^^^^^^^^^^^ This function may be
returned via an indirect branch.
Here the compiler must insert ENDBR after the call, like
call *bar(%rip)
endbr64
I opened an LLVM bug:
https://bugs.llvm.org/show_bug.cgi?id=38207
to add the indirect_return attribute so that it can be used to tell the
compiler to insert ENDBR after the REAL(swapcontext) call. We mark
REAL(swapcontext) with the indirect_return attribute if it is available.
This fixed:
https://bugs.llvm.org/show_bug.cgi?id=38249
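
As an illustration only, marking a function pointer "if the attribute is
available" might be sketched as below. The macro INDIRECT_RETURN_SKETCH,
the stand-in function and the wrapper are hypothetical names, and the
restriction to x86 targets is an assumption of the sketch, not something
stated in this message:

```c
/* Treat the attribute as unavailable on compilers that lack
   __has_attribute.  */
#ifndef __has_attribute
# define __has_attribute(x) 0
#endif

#if __has_attribute (__indirect_return__) \
    && (defined (__x86_64__) || defined (__i386__))
# define INDIRECT_RETURN_SKETCH __attribute__ ((__indirect_return__))
#else
# define INDIRECT_RETURN_SKETCH
#endif

/* Hypothetical stand-in for the real swapcontext; it only returns 0.  */
static int
fake_swapcontext_sketch (void *oucp, const void *ucp)
{
  (void) oucp;
  (void) ucp;
  return 0;
}

/* Call through a pointer that carries the attribute when available, so
   the compiler knows the callee may return via an indirect branch.  */
static inline int
call_swap_sketch (void *oucp, const void *ucp)
{
  int (*real_swap) (void *, const void *) INDIRECT_RETURN_SKETCH
    = fake_swapcontext_sketch;
  return real_swap (oucp, ucp);
}
```

On compilers or targets without the attribute the macro expands to nothing,
so the call compiles unchanged.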
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D49608
PR target/86560
* asan/asan_interceptors.cc (swapcontext): Cherry-pick
compiler-rt revision 337603.
* sanitizer_common/sanitizer_internal_defs.h (__has_attribute):
Likewise.
From-SVN: r263009
The throw_allocator extension depends on <tr1/random> which depends on
_GLIBCXX_USE_C99_STDINT_TR1.
The Transactional Memory support uses fixed-width integer types from
<stdint.h>.
* include/ext/throw_allocator.h [!_GLIBCXX_USE_C99_STDINT_TR1]
(random_condition, throw_value_random, throw_allocator_random)
(std::hash<throw_value_random>): Do not define when <tr1/random> is
not usable.
* src/c++11/cow-stdexcept.cc [!_GLIBCXX_USE_C99_STDINT_TR1]: Do not
define transactional memory support when <stdint.h> is not usable.
From-SVN: r263004
std::__detail::__clp2 used uint_fast32_t and uint_fast64_t without
checking _GLIBCXX_USE_C99_STDINT_TR1 which was a potential bug. A
simpler implementation based on the new std::__ceil2 code performs
better and doesn't depend on <stdint.h> types.
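
For readers unfamiliar with the operation: __clp2 rounds a value up to the
next power of two. The following is a portable sketch that uses only
size_t, with a hypothetical name; it is not the libstdc++ implementation.
The handling of inputs below 2 and of overflow is an assumption of the
sketch:

```c
#include <stddef.h>

/* Smallest power of two that is >= N.  Assumptions of this sketch:
   0 and 1 are returned unchanged, and 0 is returned when the result
   would not fit in size_t.  */
static inline size_t
clp2_sketch (size_t n)
{
  size_t p = 1;

  if (n < 2)
    return n;
  /* Double P until it reaches N; P wraps to 0 on overflow, which
     terminates the loop.  */
  while (p != 0 && p < n)
    p <<= 1;
  return p;
}
```

A loop like this is the slow, obvious form; a count-leading-zeros builtin
gives the same result in constant time where one is available.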
std::align and other C++11 functions in <memory> were unnecessarily
missing when _GLIBCXX_USE_C99_STDINT_TR1 was not defined.
* include/bits/hashtable_policy.h (__detail::__clp2): Use faster
implementation that doesn't depend on <stdint.h> types.
* include/std/memory (align) [!_GLIBCXX_USE_C99_STDINT_TR1]: Use
std::size_t when std::uintptr_t is not usable.
[!_GLIBCXX_USE_C99_STDINT_TR1] (pointer_safety, declare_reachable)
(undeclare_reachable, declare_no_pointers, undeclare_no_pointers):
Define independent of _GLIBCXX_USE_C99_STDINT_TR1.
From-SVN: r263003
The char16_t and char32_t types are automatically defined by the
compiler and do not depend on support in <stdint.h>. The char_traits
specializations depend on uint_leastNN_t but can be made to work anyway
by using the predefined macros, or as a last resort make_unsigned.
* include/bits/basic_string.h [!_GLIBCXX_USE_C99_STDINT_TR1]
(hash<u16string>, hash<u32string>): Remove dependency on
_GLIBCXX_USE_C99_STDINT_TR1.
* include/bits/char_traits.h [!_GLIBCXX_USE_C99_STDINT_TR1]
(char_traits<char16_t>, char_traits<char32_t>): Remove dependency on
_GLIBCXX_USE_C99_STDINT_TR1. Use __UINT_LEAST16_TYPE__ and
__UINT_LEAST32_TYPE__ or make_unsigned when <stdint.h> is not usable.
* include/bits/codecvt.h [!_GLIBCXX_USE_C99_STDINT_TR1]
(codecvt<char16_t, char, mbstate_t>)
(codecvt<char32_t, char, mbstate_t>)
(codecvt_byname<char16_t, char, mbstate_t>)
(codecvt_byname<char32_t, char, mbstate_t>): Remove dependency
on _GLIBCXX_USE_C99_STDINT_TR1.
* include/bits/locale_facets.h [!_GLIBCXX_USE_C99_STDINT_TR1]
(_GLIBCXX_NUM_UNICODE_FACETS): Likewise.
* include/bits/stringfwd.h [!_GLIBCXX_USE_C99_STDINT_TR1]
(char_traits<char16_t>, char_traits<char32_t>)
(basic_string<char16_t>, basic_string<char32_t>): Remove dependency
on _GLIBCXX_USE_C99_STDINT_TR1.
* include/experimental/string_view [!_GLIBCXX_USE_C99_STDINT_TR1]
(u16string_view, u32string_view, hash<u16string_view>)
(hash<u32string_view>, operator""sv(const char16_t, size_t))
(operator""sv(const char32_t, size_t)): Likewise.
* include/ext/vstring.h [!_GLIBCXX_USE_C99_STDINT_TR1]
(hash<__u16vstring>, hash<__u32vstring>): Likewise.
* include/ext/vstring_fwd.h [!_GLIBCXX_USE_C99_STDINT_TR1]
(__u16vstring, __u16sso_string, __u16rc_string, __u32vstring)
(__u32sso_string, __u32rc_string): Likewise.
* include/std/codecvt [!_GLIBCXX_USE_C99_STDINT_TR1] (codecvt_mode)
(codecvt_utf8, codecvt_utf16, codecvt_utf8_utf16): Likewise.
* include/std/string_view [!_GLIBCXX_USE_C99_STDINT_TR1]
(u16string_view, u32string_view, hash<u16string_view>)
(hash<u32string_view>, operator""sv(const char16_t, size_t))
(operator""sv(const char32_t, size_t)): Likewise.
* src/c++11/codecvt.cc: Likewise.
* src/c++98/locale_init.cc: Likewise.
* src/c++98/localename.cc: Likewise.
From-SVN: r263002
2018-07-26 Martin Liska <mliska@suse.cz>
PR lto/86548
* lto-wrapper.c: Add linker_output as prefix
for ltrans_output_file.
2018-07-26 Martin Liska <mliska@suse.cz>
PR lto/86548
* libiberty.h (make_temp_file_with_prefix): New function.
2018-07-26 Martin Liska <mliska@suse.cz>
PR lto/86548
* make-temp-file.c (TEMP_FILE): Remove leading 'cc'.
(make_temp_file): Call make_temp_file_with_prefix with
first argument set to NULL.
(make_temp_file_with_prefix): Support also prefix.
From-SVN: r262999
Currently, when a kernel is launched with too many workers, it results in a CUDA
launch failure. This is triggered, for instance, by parallel-loop-1.c at -O0 on a
Quadro M1200.
This patch detects this situation, and errors out with a hint on how to fix it.
Build and reg-tested on x86_64 with nvptx accelerator.
2018-07-26 Cesar Philippidis <cesar@codesourcery.com>
Tom de Vries <tdevries@suse.de>
* plugin/plugin-nvptx.c (nvptx_exec): Error if the hardware doesn't have
sufficient resources to launch a kernel, and give a hint on how to fix
it.
Co-Authored-By: Tom de Vries <tdevries@suse.de>
From-SVN: r262997
Move sampling of device properties from nvptx_exec to nvptx_open, and assume
the sampling always succeeds. This simplifies the default dimension
initialization code in nvptx_open.
2018-07-26 Cesar Philippidis <cesar@codesourcery.com>
Tom de Vries <tdevries@suse.de>
* plugin/plugin-nvptx.c (struct ptx_device): Add warp_size,
max_threads_per_block and max_threads_per_multiprocessor fields.
(nvptx_open_device): Initialize new fields.
(nvptx_exec): Use num_sms, and new fields.
Co-Authored-By: Tom de Vries <tdevries@suse.de>
From-SVN: r262996
The current code in reg_nonzero_bits_for_combine allows using the
reg_stat info when last_set_mode is a different integer mode. This is
completely wrong for non-pseudos. For example, as in the PR, a value
in a DImode hard register is set by eight writes to its constituent
QImode parts. The value written to the DImode is not the same as that
written to the lowest-numbered QImode!
PR rtl-optimization/85805
* combine.c (reg_nonzero_bits_for_combine): Only use the last set
value for hard registers if that was written in the same mode.
From-SVN: r262994
2018-07-26 Martin Liska <mliska@suse.cz>
PR gcov-profile/86536
* gcov.c (format_gcov): Use printf format %.*f directly
and do not handle special values.
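
For reference, the "%.*f" conversion mentioned in the gcov.c entry above
takes the number of decimals as an int argument that precedes the value. A
small sketch with a hypothetical helper name, not the actual format_gcov
code:

```c
#include <stdio.h>

/* Write VALUE into BUF with DECIMALS digits after the decimal point.
   Returns what snprintf returns: the length the full output needs.  */
static inline int
format_fixed_sketch (char *buf, size_t size, double value, int decimals)
{
  return snprintf (buf, size, "%.*f", decimals, value);
}
```

For example, format_fixed_sketch (buf, sizeof buf, 99.75, 2) stores "99.75"
in buf, and passing 0 for decimals with the value 50.0 stores "50".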
2018-07-26 Martin Liska <mliska@suse.cz>
PR gcov-profile/86536
* gcc.misc-tests/gcov-pr86536.c: New test.
From-SVN: r262991
In testcase lib-12.f90, all acc_async_test calls are placed in a location
where they are not guaranteed to succeed, which explains why there's an xfail
for the lower optimization levels.
This patch fixes the problem by moving the acc_async_test calls to the correct
locations.
Reg-tested on x86_64 with nvptx accelerator.
2018-07-26 Tom de Vries <tdevries@suse.de>
* testsuite/libgomp.oacc-fortran/lib-12.f90: Move acc_async_test calls
to correct locations. Remove xfail.
From-SVN: r262990
The purpose of the lib-13.f90 test-case is to test acc_wait_all_async. The
test indeed calls acc_wait_all_async, but then subsequently calls
acc_wait_all, so the acc_wait_all_async functionality is not tested.
Furthermore, all acc_async_test calls are placed in a location where they are
not guaranteed to succeed, which explains why there's an xfail for the lower
optimization levels.
This patch fixes the problems by replacing acc_wait_all with an acc_wait on
the async id used for the acc_wait_all_async call, and moving the
acc_async_test calls to the correct locations.
Reg-tested on x86_64 with nvptx accelerator.
2018-07-26 Tom de Vries <tdevries@suse.de>
* testsuite/libgomp.oacc-fortran/lib-13.f90: Replace acc_wait_all with
acc_wait. Move acc_async_test calls to correct locations. Remove
xfail.
From-SVN: r262989
PR libstdc++/86676
* testsuite/20_util/monotonic_buffer_resource/release.cc: Allow for
buffer being misaligned and so returned pointer not being at start.
From-SVN: r262980