1546 Commits

Author SHA1 Message Date
liuhongt
d61ce6ab04 Adjust testcase for O2 vectorization enabling
This issue was observed in the rs6000-specific PR102658 as well.

I've looked into it a bit; it's caused by the "conditional store replacement", which
was originally disabled when vectorization is off, as the code below shows.

  /* If either vectorization or if-conversion is disabled then do
     not sink any stores.  */
  if (param_max_stores_to_sink == 0
      || (!flag_tree_loop_vectorize && !flag_tree_slp_vectorize)
      || !flag_tree_loop_if_convert)
    return false;

The new change makes the innermost loop look like

for (int c1 = 0; c1 <= 1499; c1 += 1) {
  if (c1 <= 500) {
     S_10(c0, c1);
  } else {
      S_9(c0, c1);
  }
  S_11(c0, c1);
}

and cannot be split as:

for (int c1 = 0; c1 <= 500; c1 += 1)
  S_10(c0, c1);

for (int c1 = 501; c1 <= 1499; c1 += 1)
  S_9(c0, c1);

So instead of disabling vectorization, could we just disable this cs replacement
with parameter "--param max-stores-to-sink=0"?

I tested this proposal on ppc64le, it should work as well.
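
For readers unfamiliar with the transformation, the following is a minimal C sketch of what sinking the stores does to a loop of this shape. All names (demo_*) are hypothetical, and the mapping of S_9, S_10 and S_11 onto individual statements is an assumption made for illustration; the two functions are hand-written stand-ins, not compiler output.

```c
#include <assert.h>

#define DEMO_N 1500

/* Before: each branch performs its own store, so the loop could be
   split at c1 == 500 into two branch-free loops.  */
void
demo_stores_in_branches (int *a)
{
  for (int c1 = 0; c1 < DEMO_N; c1++)
    {
      if (c1 <= 500)
        a[c1] = 2;      /* assumed to play the role of S_10 */
      else
        a[c1] = 5;      /* assumed to play the role of S_9 */
    }
}

/* After: the branches only select a value, and one common store
   follows the join, tying both halves of the loop together.  */
void
demo_store_sunk (int *a)
{
  for (int c1 = 0; c1 < DEMO_N; c1++)
    {
      int tmp;
      if (c1 <= 500)
        tmp = 2;
      else
        tmp = 5;
      a[c1] = tmp;      /* assumed to play the role of S_11 */
    }
}

/* Both forms compute the same array contents; returns 1 if so.  */
int
demo_same_result (void)
{
  static int x[DEMO_N], y[DEMO_N];
  demo_stores_in_branches (x);
  demo_store_sunk (y);
  for (int i = 0; i < DEMO_N; i++)
    if (x[i] != y[i])
      return 0;
  assert (x[500] == 2 && x[501] == 5);
  return 1;
}
```

The results are identical; only the shape of the loop body differs, and that shape is what the test depends on.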

2021-10-11  Kewen Lin  <linkw@linux.ibm.com>

libgomp/ChangeLog:

	* testsuite/libgomp.graphite/force-parallel-8.c: Add --param max-stores-to-sink=0.
2021-10-12 15:24:12 +08:00
GCC Administrator
732d763847 Daily bump. 2021-10-12 00:17:02 +00:00
Marcel Vollweiler
f70977936a libgomp: Add tests for omp_atv_serialized and deprecate omp_atv_sequential.
The variable omp_atv_sequential was replaced by omp_atv_serialized in OpenMP
5.1. This was already implemented by Jakub (C/C++, commit ea82325afe) and
Tobias (Fortran, commit fff15bad1a).

This patch adds two tests to check if omp_atv_serialized is available (one test
for C/C++ and one for Fortran). Besides that omp_atv_sequential is marked as
deprecated in C/C++ and Fortran for OpenMP 5.1.
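
As an illustration of the deprecation pattern, here is a small sketch that keeps an old enumerator as a deprecated alias of a new one. The enum and its names (demo_*) and the value 5 are hypothetical and are not copied from omp.h; the sketch assumes a compiler that accepts attributes on enumerators (recent GCC or Clang).

```c
#include <assert.h>

/* Hypothetical trait values; only the aliasing pattern matters.  */
enum demo_alloctrait_value
{
  demo_atv_serialized = 5,
  /* Old spelling: same value, but using it draws a deprecation warning.  */
  demo_atv_sequential __attribute__ ((deprecated)) = demo_atv_serialized
};

/* Returns 1 if the old and the new spelling denote the same value.  */
int
demo_old_and_new_agree (void)
{
  /* Silence the warning locally; real users would switch spellings.  */
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
  int old_value = demo_atv_sequential;
#pragma GCC diagnostic pop
  assert (old_value == 5);
  return old_value == demo_atv_serialized;
}
```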

libgomp/ChangeLog:

	* allocator.c (omp_init_allocator): Replace omp_atv_sequential with
	omp_atv_serialized.
	* omp.h.in: Add deprecated flag for omp_atv_sequential.
	* omp_lib.f90.in: Add deprecated flag for omp_atv_sequential.
	* testsuite/libgomp.c-c++-common/alloc-10.c: New test.
	* testsuite/libgomp.fortran/alloc-12.f90: New test.
2021-10-11 04:34:51 -07:00
Jakub Jelinek
07dd3bcda1 openmp: Add omp_set_num_teams, omp_get_max_teams, omp_[gs]et_teams_thread_limit
OpenMP 5.1 adds env vars and functions to set and query new ICVs used
as fallback if thread_limit or num_teams clauses aren't specified on
teams construct.

The following patch implements those, though further work will be needed:
1) OpenMP 5.1 also changed the num_teams clause, so that it can specify
   both lower and upper limit for how many teams should be created and
   changed the meaning when only one expression is provided, instead of
   num_teams(expr) in 5.0 meaning num_teams(1:expr) in 5.1, it now means
   num_teams(expr:expr), i.e. while previously we could create 1 to expr
   teams, in 5.1 we have some low limit by default equal to the single
   expression provided and may not create fewer teams.
   For host teams (which we don't currently implement efficiently for
   NUMA hosts) we trivially satisfy it now by always honoring what the
   user asked for, but for the offloading teams I think we'll need to
   rethink the APIs; currently teams construct is just a call that returns
   and possibly lowers the number of teams; and whenever possible we try
   to evaluate num_teams/thread_limit already on the target construct
   and the GOMP_teams call just sets the number of teams to the minimum
   of provided and requested teams; for some cases e.g. where target
   is not combined with teams and num_teams expression calls some functions
   etc., we need to call those functions in the target region and so it is
   late to figure number of teams, but also hw could just limit what it
   is willing to create; in that case I'm afraid we need to run the target
   body multiple times and arrange for omp_get_team_num () returning the
   right values
2) we need to finally implement the NUMA handling for GOMP_teams_reg
3) I now realize I haven't added some testcase coverage, will do that
   incrementally
4) libgomp.texi needs updates for these new APIs, but also others like
   the allocator

2021-10-11  Jakub Jelinek  <jakub@redhat.com>

gcc/
	* omp-low.c (omp_runtime_api_call): Handle omp_get_max_teams,
	omp_[sg]et_teams_thread_limit and omp_set_num_teams.
libgomp/
	* omp.h.in (omp_set_num_teams, omp_get_max_teams,
	omp_set_teams_thread_limit, omp_get_teams_thread_limit): Declare.
	* omp_lib.f90.in (omp_set_num_teams, omp_get_max_teams,
	omp_set_teams_thread_limit, omp_get_teams_thread_limit): Declare.
	* omp_lib.h.in (omp_set_num_teams, omp_get_max_teams,
	omp_set_teams_thread_limit, omp_get_teams_thread_limit): Declare.
	* libgomp.h (gomp_nteams_var, gomp_teams_thread_limit_var): Declare.
	* libgomp.map (OMP_5.1): Export omp_get_max_teams{,_},
	omp_get_teams_thread_limit{,_}, omp_set_num_teams{,_,_8_} and
	omp_set_teams_thread_limit{,_,_8_}.
	* icv.c (omp_set_num_teams, omp_get_max_teams,
	omp_set_teams_thread_limit, omp_get_teams_thread_limit): New
	functions.
	* env.c (gomp_nteams_var, gomp_teams_thread_limit_var): Define.
	(omp_display_env): Print OMP_NUM_TEAMS and OMP_TEAMS_THREAD_LIMIT.
	(initialize_env): Handle OMP_NUM_TEAMS and OMP_TEAMS_THREAD_LIMIT env
	vars.
	* teams.c (GOMP_teams_reg): If thread_limit is not specified, use
	gomp_teams_thread_limit_var as fallback if not zero.  If num_teams
	is not specified, use gomp_nteams_var.
	* fortran.c (omp_set_num_teams, omp_get_max_teams,
	omp_set_teams_thread_limit, omp_get_teams_thread_limit): Add
	ialias_redirect.
	(omp_set_num_teams_, omp_set_num_teams_8_, omp_get_max_teams_,
	omp_set_teams_thread_limit_, omp_set_teams_thread_limit_8_,
	omp_get_teams_thread_limit_): New functions.
2021-10-11 12:20:22 +02:00
GCC Administrator
c9db17b880 Daily bump. 2021-10-10 00:16:19 +00:00
liuhongt
b4e81f6dd4 Adjust more testcases for O2 vectorization enabling.
libgomp/ChangeLog:

	* testsuite/libgomp.c++/scan-10.C: Add option -fvect-cost-model=cheap.
	* testsuite/libgomp.c++/scan-11.C: Ditto.
	* testsuite/libgomp.c++/scan-12.C: Ditto.
	* testsuite/libgomp.c++/scan-13.C: Ditto.
	* testsuite/libgomp.c++/scan-14.C: Ditto.
	* testsuite/libgomp.c++/scan-15.C: Ditto.
	* testsuite/libgomp.c++/scan-16.C: Ditto.
	* testsuite/libgomp.c++/scan-9.C: Ditto.
	* testsuite/libgomp.c-c++-common/lastprivate-conditional-7.c: Ditto.
	* testsuite/libgomp.c-c++-common/lastprivate-conditional-8.c: Ditto.
	* testsuite/libgomp.c/scan-11.c: Ditto.
	* testsuite/libgomp.c/scan-12.c: Ditto.
	* testsuite/libgomp.c/scan-13.c: Ditto.
	* testsuite/libgomp.c/scan-14.c: Ditto.
	* testsuite/libgomp.c/scan-15.c: Ditto.
	* testsuite/libgomp.c/scan-16.c: Ditto.
	* testsuite/libgomp.c/scan-17.c: Ditto.
	* testsuite/libgomp.c/scan-18.c: Ditto.
	* testsuite/libgomp.c/scan-19.c: Ditto.
	* testsuite/libgomp.c/scan-20.c: Ditto.
	* testsuite/libgomp.c/scan-21.c: Ditto.
	* testsuite/libgomp.c/scan-22.c: Ditto.

gcc/testsuite/ChangeLog:

	* g++.dg/tree-ssa/pr94403.C: Add -fno-tree-vectorize
	* gcc.dg/optimize-bswapsi-5.c: Ditto.
	* gcc.dg/optimize-bswapsi-6.c: Ditto.
	* gcc.dg/Warray-bounds-51.c: Add additional option
	-mtune=generic for target x86/i?86
	* gcc.dg/Wstringop-overflow-14.c: Ditto.
2021-10-09 16:28:11 +08:00
Jakub Jelinek
875124eb08 openmp: Add support for OpenMP 5.1 structured-block-sequences
Related to this is the addition of structured-block-sequence in OpenMP 5.1,
which doesn't change anything for Fortran, but for C/C++ allows multiple
statements instead of just one possibly compound around the separating
directives (section and scan).
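
A minimal sketch of what this permits in C: several statements between section directives without an enclosing compound statement. The function name is hypothetical. The code assumes a compiler with this support when built with -fopenmp; older compilers are expected to reject it, and without -fopenmp the pragmas are ignored and the code runs sequentially with the same result.

```c
#include <assert.h>

int
demo_sections (void)
{
  int a = 0, b = 0;
#pragma omp parallel sections shared(a, b)
  {
#pragma omp section
    /* First section: two statements, no extra braces around them.  */
    a = 1;
    a += 10;
#pragma omp section
    /* Second section: again a sequence of statements.  */
    b = 2;
    b += 20;
  }
  /* Each section touches only its own variable, so there is no race.  */
  assert (a == 11 && b == 22);
  return a + b;
}
```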

I've also made some updates to the OpenMP 5.1 support list in libgomp.texi.

2021-10-09  Jakub Jelinek  <jakub@redhat.com>

gcc/c/
	* c-parser.c (c_parser_omp_structured_block_sequence): New function.
	(c_parser_omp_scan_loop_body): Use it.
	(c_parser_omp_sections_scope): Likewise.
gcc/cp/
	* parser.c (cp_parser_omp_structured_block): Remove disallow_omp_attrs
	argument.
	(cp_parser_omp_structured_block_sequence): New function.
	(cp_parser_omp_scan_loop_body): Use it.
	(cp_parser_omp_sections_scope): Likewise.
gcc/testsuite/
	* c-c++-common/gomp/sections1.c (foo): Don't expect errors on
	multiple statements in between section directive(s).  Add testcases
	for invalid no statements in between section directive(s).
	* gcc.dg/gomp/sections-2.c (foo): Don't expect errors on
	multiple statements in between section directive(s).
	* g++.dg/gomp/sections-2.C (foo): Likewise.
	* g++.dg/gomp/attrs-6.C (foo): Add testcases for multiple
	statements in between section directive(s).
	(bar): Add testcases for multiple statements in between scan
	directive.
	* g++.dg/gomp/attrs-7.C (bar): Adjust expected error recovery.
libgomp/
	* libgomp.texi (OpenMP 5.1): Mention implemented support for
	structured block sequences in C/C++.  Mention support for
	unconstrained/reproducible modifiers on order clause.
	Mention partial (C/C++ only) support of extensions to atomics
	construct.  Mention partial (C/C++ on clause only) support of
	align/allocator modifiers on allocate clause.
2021-10-09 10:14:36 +02:00
GCC Administrator
e3e07b8955 Daily bump. 2021-10-03 00:16:17 +00:00
Tobias Burnus
703d8a4d39 Add libgomp.fortran/order-reproducible-*.f90
libgomp/ChangeLog:

	* testsuite/libgomp.fortran/order-reproducible-1.f90: New test
	based on libgomp.c-c++-common/order-reproducible-1.c.
	* testsuite/libgomp.fortran/order-reproducible-2.f90: Likewise.
	* testsuite/libgomp.fortran/my-usleep.c: New test.
2021-10-02 11:29:35 +02:00
GCC Administrator
9d116bcc55 Daily bump. 2021-10-02 00:16:31 +00:00
Tobias Burnus
2a93d18da3 Add/update libgomp.fortran/alloc-*.f90
libgomp/ChangeLog:

	* testsuite/libgomp.fortran/alloc-10.f90: Fix alignment check.
	* testsuite/libgomp.fortran/alloc-7.f90: Fix array access.
	* testsuite/libgomp.fortran/alloc-8.f90: Likewise.
	* testsuite/libgomp.fortran/alloc-11.f90: New test for omp_realloc,
	based on libgomp.c-c++-common/alloc-9.c.
2021-10-01 20:03:25 +02:00
Jakub Jelinek
e705b8533a openmp: Differentiate between order(concurrent) and order(reproducible:concurrent)
While OpenMP 5.1 implies order(concurrent) is the same thing as
order(reproducible:concurrent), this is going to change in OpenMP 5.2, where
essentially order(concurrent) means nothing is stated on whether it is
reproducible or unconstrained (and is determined by other means, e.g. for/do
with schedule static or runtime with static being selected is implicitly
reproducible, distribute with dist_schedule static is implicitly reproducible,
loop is implicitly reproducible) and when the modifier is specified explicitly,
it overrides the implicit behavior either way.
And, when order(reproducible:concurrent) is used with e.g. schedule(dynamic)
or some other schedule that is by definition not reproducible, it is
implementation's duty to ensure it is reproducible, either by remembering how
it scheduled some loop and then replaying the same schedule when seeing loops
with the same directive/schedule/number of iterations, or by overriding the
schedule to some reproducible one.

This patch doesn't implement the 5.2 wording just yet, but in the FEs
differentiates between the 3 states - no explicit modifier, explicit reproducible
or explicit unconstrained, so that the middle-end can easily switch any time.
Instead it follows the 5.1 wording where both order(concurrent) (implicit or
explicit) and order(reproducible:concurrent) imply reproducibility.
And, it implements the easier method, when for/do should be reproducible, it
just chooses static schedule.  order(concurrent) implies no OpenMP APIs in the
loop body nor threadprivate vars, so the exact scheduling isn't (easily at least)
observable.
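
For reference, a loop using the explicit modifier together with a schedule that is not reproducible by itself might look like the sketch below. The function name and array size are hypothetical. It assumes a compiler that accepts the reproducible: modifier when built with -fopenmp; without -fopenmp the pragma is ignored and the loop runs sequentially.

```c
#include <assert.h>

#define DEMO_N 64

/* Returns the number of wrong elements; 0 is expected.  */
int
demo_order_reproducible (void)
{
  int a[DEMO_N];
  int mismatches = 0;

  /* The body calls no OpenMP API and uses no threadprivate data, as
     order(concurrent) requires; iterations are independent.  */
#pragma omp parallel for order(reproducible: concurrent) schedule(dynamic)
  for (int i = 0; i < DEMO_N; i++)
    a[i] = 3 * i;

  for (int i = 0; i < DEMO_N; i++)
    if (a[i] != 3 * i)
      mismatches++;
  assert (mismatches == 0);
  return mismatches;
}
```

The result is the same under any schedule; the modifier only constrains how iterations are assigned to threads.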

2021-10-01  Jakub Jelinek  <jakub@redhat.com>

gcc/
	* tree.h (OMP_CLAUSE_ORDER_REPRODUCIBLE): Define.
	* tree-pretty-print.c (dump_omp_clause) <case OMP_CLAUSE_ORDER>: Print
	reproducible: for OMP_CLAUSE_ORDER_REPRODUCIBLE.
	* omp-general.c (omp_extract_for_data): If OMP_CLAUSE_ORDER is seen
	without OMP_CLAUSE_ORDER_UNCONSTRAINED, overwrite sched_kind to
	OMP_CLAUSE_SCHEDULE_STATIC.
gcc/c-family/
	* c-omp.c (c_omp_split_clauses): Also copy
	OMP_CLAUSE_ORDER_REPRODUCIBLE.
gcc/c/
	* c-parser.c (c_parser_omp_clause_order): Set
	OMP_CLAUSE_ORDER_REPRODUCIBLE for explicit reproducible: modifier.
gcc/cp/
	* parser.c (cp_parser_omp_clause_order): Set
	OMP_CLAUSE_ORDER_REPRODUCIBLE for explicit reproducible: modifier.
gcc/fortran/
	* gfortran.h (gfc_omp_clauses): Add order_reproducible bitfield.
	* dump-parse-tree.c (show_omp_clauses): Print REPRODUCIBLE: for it.
	* openmp.c (gfc_match_omp_clauses): Set order_reproducible for
	explicit reproducible: modifier.
	* trans-openmp.c (gfc_trans_omp_clauses): Set
	OMP_CLAUSE_ORDER_REPRODUCIBLE for order_reproducible.
	(gfc_split_omp_clauses): Also copy order_reproducible.
gcc/testsuite/
	* gfortran.dg/gomp/order-5.f90: Adjust scan-tree-dump-times regexps.
libgomp/
	* testsuite/libgomp.c-c++-common/order-reproducible-1.c: New test.
	* testsuite/libgomp.c-c++-common/order-reproducible-2.c: New test.
2021-10-01 10:45:48 +02:00
Jakub Jelinek
3749c3aff6 openmp: Avoid PLT relocations for omp_* symbols in libgomp
This patch avoids the following relocations:
readelf -Wr libgomp.so.1.0.0 | grep omp_
00000000000470e0  0000020700000007 R_X86_64_JUMP_SLOT     000000000001d9d0 omp_fulfill_event@@OMP_5.0.1 + 0
0000000000047170  000000b800000007 R_X86_64_JUMP_SLOT     000000000000e760 omp_display_env@@OMP_5.1 + 0
00000000000471e0  000000e800000007 R_X86_64_JUMP_SLOT     000000000000f910 omp_get_initial_device@@OMP_4.5 + 0
0000000000047280  0000019500000007 R_X86_64_JUMP_SLOT     0000000000015940 omp_get_active_level@@OMP_3.0 + 0
00000000000472c8  0000020d00000007 R_X86_64_JUMP_SLOT     0000000000035210 omp_get_team_num@@OMP_4.0 + 0
00000000000472f0  0000014700000007 R_X86_64_JUMP_SLOT     0000000000035200 omp_get_num_teams@@OMP_4.0 + 0
by using ialias{,_call,_redirect} macros as needed.
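
The underlying technique can be sketched with plain GCC attributes. The names (demo_*) are hypothetical and this is not the definition of the ialias macros; it only shows a hidden alias that internal callers can use instead of the exported symbol. It assumes an ELF target, since the alias attribute is not available everywhere.

```c
#include <assert.h>

/* Exported entry point.  Inside a shared library, calls to it by
   this name may go through the PLT because it can be interposed.  */
int
demo_get_level (void)
{
  return 3;
}

/* Hidden alias bound to the same code.  Internal callers use this
   name, so the call can be resolved within the library itself.  */
extern __typeof (demo_get_level) demo_ialias_get_level
  __attribute__ ((alias ("demo_get_level"), visibility ("hidden")));

int
demo_internal_caller (void)
{
  /* Both names refer to the same function body.  */
  assert (demo_ialias_get_level () == demo_get_level ());
  return demo_ialias_get_level () + 1;
}
```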

We still have many acc_* PLT relocations, could somebody please fix those?
readelf -Wr libgomp.so.1.0.0 | grep acc_
0000000000046fb8  000001ed00000006 R_X86_64_GLOB_DAT      0000000000036350 acc_prof_unregister@@OACC_2.5.1 + 0
0000000000046fd8  000000a400000006 R_X86_64_GLOB_DAT      0000000000035f30 acc_prof_register@@OACC_2.5.1 + 0
0000000000046fe0  000001d100000006 R_X86_64_GLOB_DAT      0000000000035ee0 acc_prof_lookup@@OACC_2.5.1 + 0
0000000000047058  000001dd00000007 R_X86_64_JUMP_SLOT     0000000000031f40 acc_create_async@@OACC_2.5 + 0
0000000000047068  0000011500000007 R_X86_64_JUMP_SLOT     000000000002fc60 acc_get_property@@OACC_2.6 + 0
0000000000047070  000001fb00000007 R_X86_64_JUMP_SLOT     0000000000032ce0 acc_wait_all@@OACC_2.0 + 0
0000000000047080  0000006500000007 R_X86_64_JUMP_SLOT     000000000002f990 acc_on_device@@OACC_2.0 + 0
0000000000047088  000000ae00000007 R_X86_64_JUMP_SLOT     0000000000032140 acc_attach_async@@OACC_2.6 + 0
0000000000047090  0000021900000007 R_X86_64_JUMP_SLOT     000000000002f550 acc_get_device_type@@OACC_2.0 + 0
0000000000047098  000001cb00000007 R_X86_64_JUMP_SLOT     0000000000032090 acc_copyout_finalize@@OACC_2.5 + 0
00000000000470a8  0000005200000007 R_X86_64_JUMP_SLOT     0000000000031f80 acc_copyin@@OACC_2.0 + 0
00000000000470b8  000001ad00000007 R_X86_64_JUMP_SLOT     0000000000032030 acc_delete_finalize@@OACC_2.5 + 0
00000000000470e8  0000010900000007 R_X86_64_JUMP_SLOT     0000000000031f00 acc_create@@OACC_2.0 + 0
00000000000470f8  0000005900000007 R_X86_64_JUMP_SLOT     0000000000032b70 acc_wait_async@@OACC_2.0 + 0
0000000000047110  0000013100000007 R_X86_64_JUMP_SLOT     0000000000032860 acc_async_test@@OACC_2.0 + 0
0000000000047118  000001ff00000007 R_X86_64_JUMP_SLOT     000000000002f720 acc_get_device_num@@OACC_2.0 + 0
0000000000047128  0000019100000007 R_X86_64_JUMP_SLOT     0000000000032020 acc_delete_async@@OACC_2.5 + 0
0000000000047130  000001d200000007 R_X86_64_JUMP_SLOT     000000000002efa0 acc_shutdown@@OACC_2.0 + 0
0000000000047150  000000d000000007 R_X86_64_JUMP_SLOT     0000000000031f00 acc_present_or_create@@OACC_2.0 + 0
0000000000047188  0000019200000007 R_X86_64_JUMP_SLOT     0000000000031910 acc_is_present@@OACC_2.0 + 0
0000000000047190  000001aa00000007 R_X86_64_JUMP_SLOT     000000000002fca0 acc_get_property_string@@OACC_2.6 + 0
00000000000471d0  000001bf00000007 R_X86_64_JUMP_SLOT     0000000000032120 acc_update_self_async@@OACC_2.5 + 0
0000000000047200  0000020500000007 R_X86_64_JUMP_SLOT     0000000000032e00 acc_wait_all_async@@OACC_2.0 + 0
0000000000047208  000000a600000007 R_X86_64_JUMP_SLOT     0000000000031790 acc_deviceptr@@OACC_2.0 + 0
0000000000047218  0000007500000007 R_X86_64_JUMP_SLOT     0000000000032000 acc_delete@@OACC_2.0 + 0
0000000000047238  000001e900000007 R_X86_64_JUMP_SLOT     000000000002f3a0 acc_set_device_type@@OACC_2.0 + 0
0000000000047240  000001f600000007 R_X86_64_JUMP_SLOT     000000000002ef20 acc_init@@OACC_2.0 + 0
0000000000047248  0000018800000007 R_X86_64_JUMP_SLOT     0000000000032060 acc_copyout@@OACC_2.0 + 0
0000000000047258  0000021f00000007 R_X86_64_JUMP_SLOT     0000000000032a80 acc_wait@@OACC_2.0 + 0
0000000000047270  000001bc00000007 R_X86_64_JUMP_SLOT     0000000000032100 acc_update_self@@OACC_2.0 + 0
0000000000047288  0000011400000007 R_X86_64_JUMP_SLOT     0000000000032080 acc_copyout_async@@OACC_2.5 + 0
0000000000047290  0000013d00000007 R_X86_64_JUMP_SLOT     000000000002f850 acc_set_device_num@@OACC_2.0 + 0
00000000000472a8  000000c500000007 R_X86_64_JUMP_SLOT     00000000000320e0 acc_update_device_async@@OACC_2.5 + 0
00000000000472c0  0000014600000007 R_X86_64_JUMP_SLOT     0000000000031fc0 acc_copyin_async@@OACC_2.5 + 0
00000000000472f8  0000006a00000007 R_X86_64_JUMP_SLOT     000000000002f310 acc_get_num_devices@@OACC_2.0 + 0
0000000000047350  0000021700000007 R_X86_64_JUMP_SLOT     0000000000031f80 acc_present_or_copyin@@OACC_2.0 + 0
0000000000047360  0000020900000007 R_X86_64_JUMP_SLOT     00000000000320c0 acc_update_device@@OACC_2.0 + 0
0000000000047380  0000008400000007 R_X86_64_JUMP_SLOT     0000000000032950 acc_async_test_all@@OACC_2.0 + 0

2021-10-01  Jakub Jelinek  <jakub@redhat.com>

	* affinity-fmt.c (omp_get_team_num, omp_get_num_teams): Add
	ialias_redirect.
	* env.c (handle_omp_display_env): Use ialias_call.
	* icv-device.c: Move ialias right below each function.
	(omp_get_device_num): Use ialias_call.
	* fortran.c (omp_fulfill_event): Add ialias_redirect.
	* icv.c (omp_get_active_level): Add ialias_redirect.
2021-10-01 10:42:07 +02:00
Jakub Jelinek
998e434f8f openmp: Add alloc_align attribute to omp_aligned_*alloc and testcase for omp_realloc
This patch adds alloc_align attribute to omp_aligned_{,c}alloc so that if
the first argument is constant, GCC can assume requested alignment.
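
A minimal sketch of the attribute on a hypothetical wrapper (demo_aligned_alloc is not a libgomp function; it forwards to C11 aligned_alloc). The attribute argument is the 1-based position of the parameter that carries the alignment.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical wrapper; parameter 1 is the alignment of the result.  */
__attribute__ ((alloc_align (1), malloc))
void *
demo_aligned_alloc (size_t alignment, size_t size)
{
  /* C11 aligned_alloc wants size to be a multiple of alignment.  */
  size_t rounded = (size + alignment - 1) / alignment * alignment;
  return aligned_alloc (alignment, rounded);
}

int
demo_is_aligned (const void *p, size_t alignment)
{
  return ((uintptr_t) p % alignment) == 0;
}

void
demo_alloc_align_check (void)
{
  void *p = demo_aligned_alloc (64, 100);
  assert (p != NULL);
  /* With a constant first argument the compiler may assume this.  */
  assert (demo_is_aligned (p, 64));
  free (p);
}
```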

Additionally, it adds testsuite coverage for omp_realloc, which I didn't
manage to write for yesterday's patch.

2021-10-01  Jakub Jelinek  <jakub@redhat.com>

	* omp.h.in (omp_aligned_alloc, omp_aligned_calloc): Add
	__alloc_align__ (1) attribute.
	* testsuite/libgomp.c-c++-common/alloc-9.c: New test.
2021-10-01 10:32:10 +02:00
GCC Administrator
2467998373 Daily bump. 2021-10-01 00:16:27 +00:00
Tobias Burnus
ef37ddf477 libgomp.fortran/alloc-*.f90: Add missing dg-prune-output
libgomp/
	* testsuite/libgomp.fortran/alloc-7.f90: Add dg-prune-output
	for -fintrinsic-modules-path= warning of the C compiler.
	* testsuite/libgomp.fortran/alloc-9.f90: Likewise.
	* testsuite/libgomp.fortran/alloc-10.f90: Likewise.
2021-09-30 14:44:06 +02:00
Tobias Burnus
70de20db23 openmp: Add omp_aligned_{,c}alloc and omp_{c,re}alloc for Fortran
gcc/ChangeLog:

	* omp-low.c (omp_runtime_api_call): Add omp_aligned_{,c}alloc and
	omp_{c,re}alloc, fix omp_alloc/omp_free.

libgomp/ChangeLog:

	* libgomp.texi (OpenMP 5.1): Set implementation status to Y for
	omp_aligned_{,c}alloc and omp_{c,re}alloc routines.
	* omp_lib.f90.in (omp_aligned_alloc, omp_aligned_calloc, omp_calloc,
	omp_realloc): Add.
	* omp_lib.h.in (omp_aligned_alloc, omp_aligned_calloc, omp_calloc,
	omp_realloc): Add.
	* testsuite/libgomp.fortran/alloc-10.f90: New test.
	* testsuite/libgomp.fortran/alloc-6.f90: New test.
	* testsuite/libgomp.fortran/alloc-7.c: New test.
	* testsuite/libgomp.fortran/alloc-7.f90: New test.
	* testsuite/libgomp.fortran/alloc-8.f90: New test.
	* testsuite/libgomp.fortran/alloc-9.f90: New test.
2021-09-30 14:26:46 +02:00
Jakub Jelinek
b38a4bd102 openmp: Add omp_aligned_{,c}alloc and omp_{c,re}alloc
This patch adds new OpenMP 5.1 allocator entrypoints and in addition to that
fixes an omp_alloc bug which is hard to test for - if the first allocator
fails but has a larger alignment trait and has a fallback allocator, either
the default behavior or a user fallback, then the extra alignment will be used
even in the fallback allocation, rather than just starting with whatever
alignment has been requested (in GOMP_alloc or the minimum one in omp_alloc).
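
The alignment part of the fix can be modelled as follows. This is a heavily simplified sketch with hypothetical types and names, not the allocator.c code: it only tracks which alignment each attempt would use and shows that a retry with the fallback starts again from the originally requested alignment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified allocator description.  */
struct demo_allocator
{
  size_t alignment;                        /* alignment trait */
  int always_fails;                        /* force the failure path */
  const struct demo_allocator *fallback;   /* NULL: no fallback */
};

/* Returns the alignment finally used, or 0 if every attempt failed.  */
size_t
demo_alignment_used (size_t alignment, const struct demo_allocator *a)
{
  while (a != NULL)
    {
      /* Per-attempt value; the caller's request stays untouched.  */
      size_t new_alignment = alignment;
      if (a->alignment > new_alignment)
        new_alignment = a->alignment;
      if (!a->always_fails)
        return new_alignment;
      a = a->fallback;   /* the retry starts again from ALIGNMENT */
    }
  return 0;
}

void
demo_fallback_check (void)
{
  const struct demo_allocator fallback = { 8, 0, NULL };
  const struct demo_allocator first = { 64, 1, &fallback };
  /* The failed allocator's 64-byte trait must not leak into the retry.  */
  assert (demo_alignment_used (16, &first) == 16);
}
```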

Jonathan's comment on IRC this morning made me realize that I should add
alloc_align attributes to 2 of the prototypes and I still need to add testsuite
coverage for omp_realloc, will do that in a follow-up.

2021-09-30  Jakub Jelinek  <jakub@redhat.com>

	* omp.h.in (omp_aligned_alloc, omp_calloc, omp_aligned_calloc,
	omp_realloc): New prototypes.
	(omp_alloc): Move after omp_free prototype, add __malloc__ (omp_free)
	attribute.
	* allocator.c: Include string.h.
	(omp_aligned_alloc): No longer static, add ialias.  Add new_alignment
	variable and use it instead of alignment so that when retrying the old
	alignment is used again.  Don't retry if new alignment is the same
	as old alignment, unless allocator had pool size.
	(omp_alloc, GOMP_alloc, GOMP_free): Use ialias_call.
	(omp_aligned_calloc, omp_calloc, omp_realloc): New functions.
	* libgomp.map (OMP_5.0.2): Export omp_aligned_alloc, omp_calloc,
	omp_aligned_calloc and omp_realloc.
	* testsuite/libgomp.c-c++-common/alloc-4.c (main): Add
	omp_aligned_alloc, omp_calloc and omp_aligned_calloc tests.
	* testsuite/libgomp.c-c++-common/alloc-5.c: New test.
	* testsuite/libgomp.c-c++-common/alloc-6.c: New test.
	* testsuite/libgomp.c-c++-common/alloc-7.c: New test.
	* testsuite/libgomp.c-c++-common/alloc-8.c: New test.
2021-09-30 09:30:18 +02:00
GCC Administrator
fd1334791e Daily bump. 2021-09-29 00:16:26 +00:00
Tobias Burnus
1f0a57bd54 libgomp: Only check for 2*sizeof(void*) int type with Fortran [PR96661]
The depend type is a struct with two pointer members for C/C++ - but for
Fortran OpenMP requires an integer type with kind = omp_depend_kind. Thus,
libgomp's configure checks that an integer type/kind with size 2*sizeof(void*)
is available. However, this integer type/kind is not needed when building without
Fortran support. Thus, only check this when Fortran is enabled.
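
The size requirement can be illustrated with a hypothetical two-pointer struct; demo_depend is not the libgomp type, only a stand-in with the layout described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C-side view of a depend object: two pointer members.  */
struct demo_depend
{
  void *first;
  void *second;
};

/* A Fortran integer kind standing in for it needs this many bytes.  */
_Static_assert (sizeof (struct demo_depend) == 2 * sizeof (void *),
                "depend object occupies two pointers");

size_t
demo_depend_kind_bytes (void)
{
  assert (sizeof (struct demo_depend) % sizeof (void *) == 0);
  return sizeof (struct demo_depend);
}
```

On a typical 64-bit target this is 16 bytes, which is why the check only matters when a Fortran compiler is part of the build.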

libgomp/
	PR libgomp/96661
	* configure.ac: Only check for int-type = 2*size_t support when
	building with Fortran support.
	* configure: Regenerate.
2021-09-28 15:15:47 +02:00
Thomas Schwinge
a43ae03a05 Further test case adjustment re "Fortran: Fix assumed-size to assumed-rank passing"
Fix-up for recent commit 00f6de9c69
"Fortran: Fix assumed-size to assumed-rank passing [PR94070]",
and commit da1f6391b7
"libgomp.oacc-fortran/privatized-ref-2.f90: Fix dg-note".

Due to use of '#if !ACC_MEM_SHARED' conditionals in
'libgomp.oacc-fortran/if-1.f90', 'target { !  openacc_host_selected }'
needs some special care (ignoring the pre-existing mismatch of
'ACC_MEM_SHARED' vs. 'openacc_host_selected').

As seen with GCN offloading, we need to revert to another bit of the
original code in 'libgomp.oacc-fortran/privatized-ref-2.f90'.

	libgomp/
	* testsuite/libgomp.oacc-fortran/if-1.f90: Adjust.
	* testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Likewise.
2021-09-28 14:18:21 +02:00
GCC Administrator
cf966403d9 Daily bump. 2021-09-28 00:16:21 +00:00
Aldy Hernandez
0288527f47 Replace VRP threader with a hybrid forward threader.
This patch implements the new hybrid forward threader and replaces the
embedded VRP threader with it.

With all the pieces that have gone in, the implementation of the hybrid
threader is straightforward: convert the current state into
SSA imports that the solver will understand, and let the path solver
precompute ranges and relations for the path.  After this setup is done,
we can use the range_query API to solve gimple statements in the threader.
The forward threader is now engine agnostic so there are no changes to
the threader per se.
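
As general background, and not as output of this pass, the kind of opportunity a jump threader looks for can be sketched in C. Both functions are hypothetical and hand-written; the second shows the control flow after the redundant second test has been bypassed on each incoming path.

```c
#include <assert.h>

/* Before: the second test is fully determined by the path taken
   through the first one.  */
int
demo_before_threading (int x)
{
  int flag;
  if (x > 10)
    flag = 1;
  else
    flag = 0;
  if (flag)
    return x * 2;
  return x + 1;
}

/* After: each path jumps straight to its known successor.  */
int
demo_after_threading (int x)
{
  if (x > 10)
    return x * 2;
  return x + 1;
}

void
demo_threading_check (void)
{
  for (int x = -5; x <= 30; x++)
    assert (demo_before_threading (x) == demo_after_threading (x));
}
```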

I have put the hybrid bits in tree-ssa-threadedge.*, instead of VRP,
because they will also be used in the evrp removal of the DOM/threader,
which is my next task.

Most of the patch is actually test changes.  I have gone through every
single one and verified that we're correct.  Most were trivial dump
file name changes, but others required going through the IL and
certifying that the different IL was expected.

For example, in pr59597.c, we have one less thread because the
ASSERT_EXPR was getting in the way, and making it seem like things were
not crossing loops.  The hybrid threader sees the correct representation
of the IL, and avoids threading this one case.

The final numbers are a 12.16% improvement in jump threads immediately
after VRP, and a 0.82% improvement in overall jump threads.  The
performance drop is 0.6% (plus the 1.43% hit from moving the embedded
threader into its own pass).  As I've said, I'd prefer to keep the
threader in its own pass, but if this is an issue, we can address this
with a shared ranger when VRP is replaced with an evrp instance
(upcoming).

Note that these numbers are slightly different from what I originally
posted.  A few correctness tweaks, plus restricting loop threads, made
the difference.  That being said, I was aiming for par.  A 12% gain is
just gravy ;-).  When we merge the threaders, we should see even better
numbers-- and we'll have the benefit of an entire release stress testing
the solver.

As I mentioned in my introductory note, paths ending in MEM_REF
conditional are missing.  In reality, this didn't make a difference, as
it was so rare.  However, as a follow-up, I will distill a test and add
a suitable PR to keep us honest.

There is a one-line change to libgomp/team.c silencing a new used
uninitialized warning.  As my previous work with the threaders has
shown, warnings flare up after each improvement to jump threading.  I
expect this to be no different.  I've promised Jakub to investigate
fully, so I will analyze and add the appropriate PR for the warning
experts.

Oh yeah, the new pass dump is called vrp-threader[12] to match each
VRP[12] pass.  However, there's no reason for it to either be named
vrp-threader, or for it to live in tree-vrp.c.

Tested on x86-64 Linux.

OK?

p.s. "Did I say 5 weeks?  My bad, I meant 5 months."

gcc/ChangeLog:

	* passes.def (pass_vrp_threader): New.
	* tree-pass.h (make_pass_vrp_threader): Add make_pass_vrp_threader.
	* tree-ssa-threadedge.c (hybrid_jt_state::register_equivs_stmt): New.
	(hybrid_jt_simplifier::hybrid_jt_simplifier): New.
	(hybrid_jt_simplifier::simplify): New.
	(hybrid_jt_simplifier::compute_ranges_from_state): New.
	* tree-ssa-threadedge.h (class hybrid_jt_state): New.
	(class hybrid_jt_simplifier): New.
	* tree-vrp.c (execute_vrp): Remove ASSERT_EXPR based jump
	threader.
	(class hybrid_threader): New.
	(hybrid_threader::hybrid_threader): New.
	(hybrid_threader::~hybrid_threader): New.
	(hybrid_threader::before_dom_children): New.
	(hybrid_threader::after_dom_children): New.
	(execute_vrp_threader): New.
	(class pass_vrp_threader): New.
	(make_pass_vrp_threader): New.

libgomp/ChangeLog:

	* team.c: Initialize start_data.
	* testsuite/libgomp.graphite/force-parallel-4.c: Adjust.
	* testsuite/libgomp.graphite/force-parallel-8.c: Adjust.

gcc/testsuite/ChangeLog:

	* gcc.dg/torture/pr55107.c: Adjust.
	* gcc.dg/tree-ssa/phi_on_compare-1.c: Adjust.
	* gcc.dg/tree-ssa/phi_on_compare-2.c: Adjust.
	* gcc.dg/tree-ssa/phi_on_compare-3.c: Adjust.
	* gcc.dg/tree-ssa/phi_on_compare-4.c: Adjust.
	* gcc.dg/tree-ssa/pr21559.c: Adjust.
	* gcc.dg/tree-ssa/pr59597.c: Adjust.
	* gcc.dg/tree-ssa/pr61839_1.c: Adjust.
	* gcc.dg/tree-ssa/pr61839_3.c: Adjust.
	* gcc.dg/tree-ssa/pr71437.c: Adjust.
	* gcc.dg/tree-ssa/ssa-dom-thread-11.c: Adjust.
	* gcc.dg/tree-ssa/ssa-dom-thread-16.c: Adjust.
	* gcc.dg/tree-ssa/ssa-dom-thread-18.c: Adjust.
	* gcc.dg/tree-ssa/ssa-dom-thread-2a.c: Adjust.
	* gcc.dg/tree-ssa/ssa-dom-thread-4.c: Adjust.
	* gcc.dg/tree-ssa/ssa-thread-14.c: Adjust.
	* gcc.dg/tree-ssa/ssa-vrp-thread-1.c: Adjust.
	* gcc.dg/tree-ssa/vrp106.c: Adjust.
	* gcc.dg/tree-ssa/vrp55.c: Adjust.
2021-09-27 17:39:51 +02:00
Tobias Burnus
da1f6391b7 libgomp.oacc-fortran/privatized-ref-2.f90: Fix dg-note
In my last commit, r12-3897-g00f6de9c69119594f7dad3bd525937c94c8200d0,
which inlined array-size code, I had to update the expected output.  However,
in doing so, I accidentally (copy'n'paste) changed dg-note into dg-message.

libgomp/
	* testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Change
	dg-message back to dg-note.
2021-09-27 14:33:39 +02:00
Tobias Burnus
00f6de9c69 Fortran: Fix assumed-size to assumed-rank passing [PR94070]
This code inlines the size0 and size1 libgfortran calls; the former is still
used by libgfortran itself (and by old code). Besides permitting more
optimizations, it also permits better handling of assumed-rank dummies: If the
dummy argument is a nonpointer/nonallocatable, an assumed-size actual arg is
represented by having ubound == -1 for the last dimension. However, for
allocatable/pointers, this value can also exist. Hence, the dummy arg attr
has to be honored.

For that reason, when calling an assumed-rank procedure with nonpointer,
nonallocatable dummy arguments, the bounds have to be updated to avoid
the case ubound == -1 for the last dimension.

	PR fortran/94070

gcc/fortran/ChangeLog:

	* trans-array.c (gfc_tree_array_size): New function to
	find size inline (whole array or one dimension).
	(array_parameter_size): Use it, take stmt_block as arg.
	(gfc_conv_array_parameter): Update call.
	* trans-array.h (gfc_tree_array_size): Add prototype.
	* trans-decl.c (gfor_fndecl_size0, gfor_fndecl_size1): Remove
	these global vars.
	(gfc_build_intrinsic_function_decls): Remove their initialization.
	* trans-expr.c (gfc_conv_procedure_call): Update
	bounds of pointer/allocatable actual args to nonallocatable/nonpointer
	dummies to be one based.
	* trans-intrinsic.c (gfc_conv_intrinsic_shape): Fix case for
	assumed rank with allocatable/pointer dummy.
	(gfc_conv_intrinsic_size): Update to use inline function.
	* trans.h (gfor_fndecl_size0, gfor_fndecl_size1): Remove var decl.

libgfortran/ChangeLog:

	* intrinsics/size.c (size0, size1): Comment that now not
	used by newer compiler code.

libgomp/ChangeLog:

	* testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Update
	expected dg-note output.

gcc/testsuite/ChangeLog:

	* gfortran.dg/c-interop/cf-out-descriptor-6.f90: Remove xfail.
	* gfortran.dg/c-interop/size.f90: Remove xfail.
	* gfortran.dg/intrinsic_size_3.f90: Update scan-tree-dump-times.
	* gfortran.dg/transpose_optimization_2.f90: Likewise.
	* gfortran.dg/size_optional_dim_1.f90: Add scan-tree-dump-not.
	* gfortran.dg/assumed_rank_22.f90: New test.
	* gfortran.dg/assumed_rank_22_aux.c: New test.
2021-09-27 14:04:54 +02:00
GCC Administrator
e4777439fc Daily bump. 2021-09-23 00:16:29 +00:00
Tobias Burnus
83aac69883 Fortran: Improve -Wmissing-include-dirs warnings [PR55534]
It turned out that enabling -Wmissing-include-dirs for libcpp output
too many warnings, at least when run with -B and similar options during the
GCC build, and warned about internal include dirs like finclude, which are
unlikely to be relevant to a real-world user.
This patch now only warns for -I and -J by default but permits getting the
full warnings, including the libcpp ones, with -Wmissing-include-dirs. It
additionally documents this in the manual.

With that change, the -Wno-missing-include-dirs option could be removed
from libgfortran's configure and from libgomp's testsuite ALWAYS_CFLAGS.
This reverts those bits of the previous
commit r12-3722-g417ea5c02cef7f000e66d1af22b066c2c1cda047.

Additionally, it turned out that all callers of load_file called exit
explicitly, except for the main file via gfc_init -> gfc_new_file. The
latter also output a fatal file-not-found error, such that two errors
were printed. Now exit is called in line with the other users of
load_file.

Finally, when compiling with "nonexisting/file.f90", a warning that
"nonexisting" does not exist as an include path was printed before the
file-not-found error. Now the directory in which the physical file
is located is added silently, relying on the file-not-found diagnostic for
such cases.

	PR fortran/55534
gcc/ChangeLog:

	* doc/invoke.texi (-Wno-missing-include-dirs.): Document Fortran
	behavior.

gcc/fortran/ChangeLog:

	* cpp.c (gfc_cpp_register_include_paths, gfc_cpp_post_options):
	Add new bool verbose_missing_dir_warn argument.
	* cpp.h (gfc_cpp_post_options): Update prototype.
	* f95-lang.c (gfc_init): Remove duplicated file-not found diag.
	* gfortran.h (gfc_check_include_dirs): Takes bool
	verbose_missing_dir_warn arg.
	(gfc_new_file): Returns now void.
	* options.c (gfc_post_options): Update to warn for -I and -J,
	only, by default but for all when user requested.
	* scanner.c (gfc_do_check_include_dir):
	(gfc_do_check_include_dirs, gfc_check_include_dirs): Take bool
	verbose warn arg and update to avoid printing the same message
	twice or never.
	(load_file): Fix indent.
	(gfc_new_file): Return void and exit when load_file failed
	as all other load_file users do.

libgfortran/ChangeLog:

	* configure.ac (AM_FCFLAGS): Revert r12-3722 by removing
	-Wno-missing-include-dirs.
	* configure: Regenerate.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/fortran.exp (ALWAYS_CFLAGS): Revert
	r12-3722 by removing -Wno-missing-include-dirs.
	* testsuite/libgomp.oacc-fortran/fortran.exp (ALWAYS_CFLAGS): Likewise.

gcc/testsuite/ChangeLog:

	* gfortran.dg/include_14.f90: Add -J testcase and update dg-output.
	* gfortran.dg/include_15.f90: Likewise.
	* gfortran.dg/include_16.f90: Likewise.
	* gfortran.dg/include_17.f90: Likewise.
	* gfortran.dg/include_18.f90: Likewise.
	* gfortran.dg/include_19.f90: Likewise.
2021-09-22 20:58:35 +02:00
Jakub Jelinek
059b819e3c openmp: Add support for allocator and align modifiers on allocate clauses
As the allocate-2.c testcase shows, this change isn't 100% backwards compatible:
one could have allocator and/or align functions that return an OpenMP allocator
handle; previously those functions would be called, whereas now those names are
treated as keywords for the modifiers.  But it allows specifying extra alignment
requirements for the allocations.
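
A minimal sketch of the new syntax, assuming a compiler that includes this
patch and is run with -fopenmp.  The names fill_and_sum and scratch are
illustrative only and are not taken from the testcases.  The align modifier
merely requests 64-byte alignment for each thread's private copy; the sketch
does not verify that alignment.

```c
#ifdef _OPENMP
#include <omp.h>   /* provides omp_default_mem_alloc when built with -fopenmp */
#endif

/* Fill a private scratch array and return the sum of its elements.
   Each thread's copy of SCRATCH is requested from omp_default_mem_alloc
   with 64-byte alignment via the allocator and align modifiers.  */
int
fill_and_sum (int n)
{
  int scratch[16];
  int total = 0;

#pragma omp parallel private (scratch) shared (total) \
        allocate (allocator (omp_default_mem_alloc), align (64): scratch)
  {
    /* Only one thread does the work, so TOTAL is written exactly once.  */
#pragma omp single
    {
      for (int i = 0; i < 16; i++)
        scratch[i] = n + i;
      for (int i = 0; i < 16; i++)
        total += scratch[i];
    }
  }
  return total;
}
```

Without -fopenmp the pragmas are ignored and the function computes the same
sum from an ordinary stack array.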

2021-09-22  Jakub Jelinek  <jakub@redhat.com>

gcc/
	* tree.h (OMP_CLAUSE_ALLOCATE_ALIGN): Define.
	* tree.c (omp_clause_num_ops): Change number of OMP_CLAUSE_ALLOCATE
	arguments from 2 to 3.
	* tree-pretty-print.c (dump_omp_clause): Print allocator() around
	allocate clause allocator and print align if present.
	* omp-low.c (scan_sharing_clauses): Force allocate_map entry even
	for omp_default_mem_alloc if align modifier is present.  If align
	modifier is present, use TREE_LIST to encode both allocator and
	align.
	(lower_private_allocate, lower_rec_input_clauses, create_task_copyfn):
	Handle align modifier on allocator clause if present.
gcc/c-family/
	* c-omp.c (c_omp_split_clauses): Copy over OMP_CLAUSE_ALLOCATE_ALIGN.
gcc/c/
	* c-parser.c (c_parser_omp_clause_allocate): Parse allocate clause
	modifiers.
gcc/cp/
	* parser.c (cp_parser_omp_clause_allocate): Parse allocate clause
	modifiers.
	* semantics.c (finish_omp_clauses) <OMP_CLAUSE_ALLOCATE>: Perform
	semantic analysis of OMP_CLAUSE_ALLOCATE_ALIGN.
	* pt.c (tsubst_omp_clauses) <case OMP_CLAUSE_ALLOCATE>: Handle
	also OMP_CLAUSE_ALLOCATE_ALIGN.
gcc/testsuite/
	* c-c++-common/gomp/allocate-6.c: New test.
	* c-c++-common/gomp/allocate-7.c: New test.
	* g++.dg/gomp/allocate-4.C: New test.
libgomp/
	* testsuite/libgomp.c-c++-common/allocate-2.c: New test.
	* testsuite/libgomp.c-c++-common/allocate-3.c: New test.
2021-09-22 09:29:13 +02:00
GCC Administrator
2c41dd82e2 Daily bump. 2021-09-22 00:16:28 +00:00
Tobias Burnus
417ea5c02c Fortran: Fix -Wno-missing-include-dirs handling [PR55534]
gcc/fortran/ChangeLog:

	PR fortran/55534
	* cpp.c: Define GCC_C_COMMON_C for #include "options.h" to make
	cpp_reason_option_codes available.
	(gfc_cpp_register_include_paths): Make static, set pfile's
	warn_missing_include_dirs and move before caller.
	(gfc_cpp_init_cb): New, cb code moved from ...
	(gfc_cpp_init_0): ... here.
	(gfc_cpp_post_options): Call gfc_cpp_init_cb.
	(cb_cpp_diagnostic_cpp_option): New. As implemented in c-family
	to match CppReason flags to -W... names.
	(cb_cpp_diagnostic): Use it to replace single special case.
	* cpp.h (gfc_cpp_register_include_paths): Remove as now static.
	* gfortran.h (gfc_check_include_dirs): New prototype.
	(gfc_add_include_path): Add new bool arg.
	* options.c (gfc_init_options): Don't set -Wmissing-include-dirs.
	(gfc_post_options): Set it here after commandline processing. Call
	gfc_add_include_path with defer_warn=false.
	(gfc_handle_option): Call it with defer_warn=true.
	* scanner.c (gfc_do_check_include_dir, gfc_do_check_include_dirs,
	gfc_check_include_dirs): New. Diagnostic moved from ...
	(add_path_to_list): ... here, which came before cmdline processing.
	Take additional bool defer_warn argument.
	(gfc_add_include_path): Take additional defer_warn arg.
	* scanner.h (struct gfc_directorylist): Reorder for alignment issues,
	add new 'bool warn'.

libgfortran/ChangeLog:
	PR fortran/55534
	* configure.ac (AM_FCFLAGS): Add -Wno-missing-include-dirs.
	* configure: Regenerate.

libgomp/ChangeLog:
	PR fortran/55534
	* testsuite/libgomp.fortran/fortran.exp: Add -Wno-missing-include-dirs
	to ALWAYS_CFLAGS.
	* testsuite/libgomp.oacc-fortran/fortran.exp: Likewise.

gcc/testsuite/ChangeLog:
	* gfortran.dg/include_6.f90: Change dg-error to
	dg-warning and update pattern.
	* gfortran.dg/include_14.f90: New test.
	* gfortran.dg/include_15.f90: New test.
	* gfortran.dg/include_16.f90: New test.
	* gfortran.dg/include_17.f90: New test.
	* gfortran.dg/include_18.f90: New test.
	* gfortran.dg/include_19.f90: New test.
	* gfortran.dg/include_20.f90: New test.
	* gfortran.dg/include_21.f90: New test.
2021-09-21 08:28:30 +02:00
GCC Administrator
cf74e7b57b Daily bump. 2021-09-19 00:16:29 +00:00
Jakub Jelinek
e5597f2ad5 openmp: Allow private or firstprivate arguments to default clause even for C/C++
OpenMP 5.1 allows default(private) or default(firstprivate) even in C/C++,
but it behaves the same way as in Fortran only for variables not declared at
namespace or file scope.  For the namespace/file scope variables it instead
behaves as default(none).
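
The scoping rule can be sketched as follows, assuming a compiler with this
patch and -fopenmp.  The names seen, local and observe are hypothetical.
The block-scope variable becomes firstprivate implicitly, while the
file-scope variable has to be listed in an explicit data-sharing clause
because default(none) rules apply to it.

```c
/* File scope: default(firstprivate) treats this like default(none),
   so it must appear in an explicit clause below.  */
static int seen;

int
observe (int v)
{
  /* Block scope: implicitly firstprivate inside the region, so every
     thread starts with its own copy holding v * 2.  */
  int local = v * 2;

#pragma omp parallel default (firstprivate) shared (seen)
  {
    /* One thread publishes its private copy through the shared variable.  */
#pragma omp single
    seen = local;
  }
  return seen;
}
```

Dropping shared (seen) from the directive would be expected to produce the
same kind of diagnostic as under default(none).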

2021-09-18  Jakub Jelinek  <jakub@redhat.com>

gcc/
	* gimplify.c (omp_default_clause): For C/C++ default({,first}private),
	if file/namespace scope variable doesn't have predetermined sharing,
	treat it as if there was default(none).
gcc/c/
	* c-parser.c (c_parser_omp_clause_default): Handle private and
	firstprivate arguments, adjust diagnostics on unknown argument.
gcc/cp/
	* parser.c (cp_parser_omp_clause_default): Handle private and
	firstprivate arguments, adjust diagnostics on unknown argument.
	* cp-gimplify.c (cxx_omp_finish_clause): Handle OMP_CLAUSE_PRIVATE.
gcc/testsuite/
	* c-c++-common/gomp/default-2.c: New test.
	* c-c++-common/gomp/default-3.c: New test.
	* g++.dg/gomp/default-1.C: New test.
libgomp/
	* testsuite/libgomp.c++/default-1.C: New test.
	* testsuite/libgomp.c-c++-common/default-1.c: New test.
	* libgomp.texi (OpenMP 5.1): Mark "private and firstprivate argument
	to default clause in C and C++" as implemented.
2021-09-18 09:47:25 +02:00
GCC Administrator
0a4cb43932 Daily bump. 2021-09-18 00:16:36 +00:00
Julian Brown
2a3f9f6532 openacc: Shared memory layout optimisation
This patch implements an algorithm to lay out local data-share (LDS)
space.  It currently works for AMD GCN.  At the moment, LDS is used for
three things:

  1. Gang-private variables
  2. Reduction temporaries (accumulators)
  3. Broadcasting for worker partitioning

After the patch is applied, (2) and (3) are placed at preallocated
locations in LDS, and (1) continues to be handled by the backend (as it
was before this patch). LDS now looks like this:

  +--------------+ (gang-private size + 1024, = 1536)
  | free space   |
  |    ...       |
  | - - - - - - -|
  | worker bcast |
  +--------------+
  | reductions   |
  +--------------+ <<< -mgang-private-size=<number> (def. 512)
  | gang-private |
  |    vars      |
  +--------------+ (32)
  | low LDS vars |
  +--------------+ LDS base

So, gang-private space is fixed at a constant amount at compile time
(which can be increased with a command-line switch if necessary
for some given code). The layout algorithm takes out a slice of the
remainder of usable space for reduction vars, and uses the rest for
worker partitioning.

The partitioning algorithm works as follows.

 1. An "adjacency" set is built up for each basic block that might
    do a broadcast. This is calculated by starting at each such block,
    and doing a recursive DFS walk over successors to find the next
    block (or blocks) that *also* does a broadcast
    (dfs_broadcast_reachable_1).

 2. The adjacency set is inverted to get adjacent predecessor blocks also.

 3. Blocks that will perform a broadcast are sorted by size of that
    broadcast: the biggest blocks are handled first.

 4. A splay tree structure is used to calculate the spans of LDS memory
    that are already allocated by the blocks adjacent to this one
    (merge_ranges{,_1}).

 5. The current block's broadcast space is allocated from the first free
    span not allocated in the splay tree structure calculated above
    (first_fit_range). This seems to work quite nicely and efficiently
    with the splay tree structure.

 6. Continue with the next-biggest broadcast block until we're done.

In this way, "adjacent" broadcasts will not use the same piece of
LDS memory.
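
The first-fit search in step 5 can be sketched as follows.  This is not the
code from omp-oacc-neuter-broadcast.cc: the patch keeps the used spans in a
splay tree, whereas this sketch sorts a plain array instead, and struct range
and the first_fit signature are assumptions made for illustration.

```c
#include <stdlib.h>

/* Half-open byte range [lo, hi) of LDS already taken by an adjacent block.  */
struct range
{
  unsigned lo;
  unsigned hi;
};

/* qsort comparator: order ranges by ascending start offset.  */
static int
cmp_lo (const void *a, const void *b)
{
  const struct range *ra = a;
  const struct range *rb = b;
  return (ra->lo > rb->lo) - (ra->lo < rb->lo);
}

/* Return the lowest offset in [bounds_lo, bounds_hi) at which SIZE bytes
   fit without overlapping any of the N ranges in USED, or -1 if there is
   no such offset.  USED is sorted in place.  */
long
first_fit (struct range *used, size_t n, unsigned bounds_lo,
           unsigned bounds_hi, unsigned size)
{
  unsigned cursor = bounds_lo;

  qsort (used, n, sizeof *used, cmp_lo);
  for (size_t i = 0; i < n; i++)
    {
      /* Stop at the first gap before a used range that is large enough.  */
      if (used[i].lo > cursor && used[i].lo - cursor >= size)
        break;
      /* Otherwise skip past this range; overlapping ranges merge here.  */
      if (used[i].hi > cursor)
        cursor = used[i].hi;
    }

  if (cursor > bounds_hi || bounds_hi - cursor < size)
    return -1;
  return (long) cursor;
}
```

Calling this once per broadcast block, biggest block first, with USED set to
the spans already given to that block's neighbours, mirrors steps 3 to 6.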

PR96334 "openacc: Unshare reduction temporaries for GCN" got merged in:

The GCN backend uses tree nodes like MEM((__lds TYPE *) <constant>)
for reduction temporaries. Unlike e.g. var decls and SSA names, these
nodes cannot be shared during gimplification, but are so in some
circumstances. This is detected when appropriate --enable-checking
options are used. This patch unshares such nodes when they are reused
more than once.

gcc/
	* config/gcn/gcn-protos.h
	(gcn_goacc_create_worker_broadcast_record): Update prototype.
	* config/gcn/gcn-tree.c (gcn_goacc_get_worker_red_decl): Use
	preallocated block of LDS memory.  Do not cache/share decls for
	reduction temporaries between invocations.
	(gcn_goacc_reduction_teardown): Unshare VAR on second use.
	(gcn_goacc_create_worker_broadcast_record): Add OFFSET parameter
	and return temporary LDS space at that offset.  Return pointer in
	"sender" case.
	* config/gcn/gcn.c (acc_lds_size, gang_private_hwm, lds_allocs):
	New global vars.
	(ACC_LDS_SIZE): Define as acc_lds_size.
	(gcn_init_machine_status): Don't initialise lds_allocated,
	lds_allocs, reduc_decls fields of machine function struct.
	(gcn_option_override): Handle default size for gang-private
	variables and -mgang-private-size option.
	(gcn_expand_prologue): Use LDS_SIZE instead of LDS_SIZE-1 when
	initialising M0_REG.
	(gcn_shared_mem_layout): New function.
	(gcn_print_lds_decl): Update comment. Use global lds_allocs map and
	gang_private_hwm variable.
	(TARGET_GOACC_SHARED_MEM_LAYOUT): Define target hook.
	* config/gcn/gcn.h (machine_function): Remove lds_allocated,
	lds_allocs, reduc_decls. Add reduction_base, reduction_limit.
	* config/gcn/gcn.opt (gang_private_size_opt): New global.
	(mgang-private-size=): New option.
	* doc/tm.texi.in (TARGET_GOACC_SHARED_MEM_LAYOUT): Place
	documentation hook.
	* doc/tm.texi: Regenerate.
	* omp-oacc-neuter-broadcast.cc (targhooks.h, diagnostic-core.h):
	Add includes.
	(build_sender_ref): Handle sender_decl being pointer.
	(worker_single_copy): Add PLACEMENT and ISOLATE_BROADCASTS
	parameters.  Pass placement argument to
	create_worker_broadcast_record hook invocations.  Handle
	sender_decl being pointer and isolate_broadcasts inserting extra
	barriers.
	(blk_offset_map_t): Add typedef.
	(neuter_worker_single): Add BLK_OFFSET_MAP parameter.  Pass
	preallocated range to worker_single_copy call.
	(dfs_broadcast_reachable_1): New function.
	(idx_decl_pair_t, used_range_vec_t): New typedefs.
	(sort_size_descending): New function.
	(addr_range): New class.
	(splay_tree_compare_addr_range, splay_tree_free_key)
	(first_fit_range, merge_ranges_1, merge_ranges): New functions.
	(execute_omp_oacc_neuter_broadcast): Rename to...
	(oacc_do_neutering): ... this.  Add BOUNDS_LO, BOUNDS_HI
	parameters.  Arrange layout of shared memory for broadcast
	operations.
	(execute_omp_oacc_neuter_broadcast): New function.
	(pass_omp_oacc_neuter_broadcast::gate): Remove num_workers==1
	handling from here.  Enable pass for all OpenACC routines in order
	to call shared memory-layout hook.
	* target.def (create_worker_broadcast_record): Add OFFSET
	parameter.
	(shared_mem_layout): New hook.
libgomp/
	* testsuite/libgomp.oacc-c-c++-common/broadcast-many.c: Update.
2021-09-17 21:04:30 +02:00
Julian Brown
8251f90e87 Add 'libgomp.oacc-c-c++-common/broadcast-many.c'
libgomp/
	* testsuite/libgomp.oacc-c-c++-common/broadcast-many.c: New test.
2021-09-17 21:04:29 +02:00
Jakub Jelinek
4a7842bb99 libgomp: Spelling error fix in OpenMP 5.1 conformance section
Fix spelling of OpenMP directive declare variant.

2021-09-17  Jakub Jelinek  <jakub@redhat.com>

	* libgomp.texi (OpenMP 5.1): Spelling fix,
	declare variante -> declare variant.
2021-09-17 12:34:27 +02:00
Jakub Jelinek
3a2bcffac6 openmp: Add support for OpenMP 5.1 atomics for C++
Besides the C++ FE changes, I've noticed that the C FE didn't reject
  #pragma omp atomic capture compare
  { v = x; x = y; }
and other forms of atomic swap, this patch fixes that too.  And the
c-family/ routine needed quite a few changes so that the new code
in it works fine with both FEs.

2021-09-17  Jakub Jelinek  <jakub@redhat.com>

gcc/c-family/
	* c-omp.c (c_finish_omp_atomic): Avoid creating
	TARGET_EXPR if test is true, use create_tmp_var_raw instead of
	create_tmp_var and add a zero initializer to TARGET_EXPRs that
	had NULL initializer.  When omitting operands after v = x,
	use type of v rather than type of x.  Fix type of vtmp
	TARGET_EXPR.
gcc/c/
	* c-parser.c (c_parser_omp_atomic): Reject atomic swap if capture
	is true.
gcc/cp/
	* cp-tree.h (finish_omp_atomic): Add r and weak arguments.
	* parser.c (cp_parser_omp_atomic): Update function comment for
	OpenMP 5.1 atomics, parse OpenMP 5.1 atomics and fail, compare and
	weak clauses.
	* semantics.c (finish_omp_atomic): Add r and weak arguments, handle
	them, handle COND_EXPRs.
	* pt.c (tsubst_expr): Adjust for COND_EXPR forms that
	finish_omp_atomic can now produce.
gcc/testsuite/
	* c-c++-common/gomp/atomic-18.c: Expect same diagnostics in C++ as in
	C.
	* c-c++-common/gomp/atomic-25.c: Drop c effective target.
	* c-c++-common/gomp/atomic-26.c: Likewise.
	* c-c++-common/gomp/atomic-27.c: Likewise.
	* c-c++-common/gomp/atomic-28.c: Likewise.
	* c-c++-common/gomp/atomic-29.c: Likewise.
	* c-c++-common/gomp/atomic-30.c: Likewise.  Adjust expected diagnostics
	for C++ when it differs from C.
	(foo): Change return type from double to void.
	* g++.dg/gomp/atomic-5.C: Adjust expected diagnostics wording.
	* g++.dg/gomp/atomic-20.C: New test.
libgomp/
	* testsuite/libgomp.c-c++-common/atomic-19.c: Drop c effective target.
	Use /* */ comments instead of //.
	* testsuite/libgomp.c-c++-common/atomic-20.c: Likewise.
	* testsuite/libgomp.c-c++-common/atomic-21.c: Likewise.
	* testsuite/libgomp.c++/atomic-16.C: New test.
	* testsuite/libgomp.c++/atomic-17.C: New test.
2021-09-17 11:28:31 +02:00
GCC Administrator
a26206ec7b Daily bump. 2021-09-11 00:16:27 +00:00
Jakub Jelinek
8122fbff77 openmp: Implement OpenMP 5.1 atomics, so far for C only
This patch implements OpenMP 5.1 atomics (with clarifications from upcoming 5.2).
The most important changes are that it is now possible to write min/max atomics
(for C/C++; for Fortran this was already possible) and, more importantly,
compare and exchange in various forms.
Also, acq_rel is now allowed on read/write and acq_rel/acquire are allowed on
update, and there are new compare, weak and fail clauses.
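
Two of the new forms can be sketched as follows, assuming a compiler with
this patch and -fopenmp.  The names peak, slot, record_max and try_claim are
hypothetical and do not come from the testsuite.

```c
static int peak;   /* running maximum shared by all threads */
static int slot;   /* 0 while free, otherwise the id of the owner */

/* Atomic max: store V into PEAK only if it is larger.  */
void
record_max (int v)
{
#pragma omp atomic compare
  if (peak < v) { peak = v; }
}

/* Atomic compare and exchange: claim SLOT for ID if it is still free.
   The capture clause returns the value SLOT had before the update, so
   the caller can tell whether its own store happened.  */
int
try_claim (int id)
{
  int old;

#pragma omp atomic compare capture
  { old = slot; if (slot == 0) { slot = id; } }

  return old == 0;
}
```

Both functions give the same results when called sequentially; the atomic
directives matter only once several threads call them concurrently.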

2021-09-10  Jakub Jelinek  <jakub@redhat.com>

gcc/
	* tree-core.h (enum omp_memory_order): Add OMP_MEMORY_ORDER_MASK,
	OMP_FAIL_MEMORY_ORDER_UNSPECIFIED, OMP_FAIL_MEMORY_ORDER_RELAXED,
	OMP_FAIL_MEMORY_ORDER_ACQUIRE, OMP_FAIL_MEMORY_ORDER_RELEASE,
	OMP_FAIL_MEMORY_ORDER_ACQ_REL, OMP_FAIL_MEMORY_ORDER_SEQ_CST and
	OMP_FAIL_MEMORY_ORDER_MASK enumerators.
	(OMP_FAIL_MEMORY_ORDER_SHIFT): Define.
	* gimple-pretty-print.c (dump_gimple_omp_atomic_load,
	dump_gimple_omp_atomic_store): Print [weak] for weak atomic
	load/store.
	* gimple.h (enum gf_mask): Change GF_OMP_ATOMIC_MEMORY_ORDER
	to 6-bit mask, adjust GF_OMP_ATOMIC_NEED_VALUE value and add
	GF_OMP_ATOMIC_WEAK.
	(gimple_omp_atomic_weak_p, gimple_omp_atomic_set_weak): New inline
	functions.
	* tree.h (OMP_ATOMIC_WEAK): Define.
	* tree-pretty-print.c (dump_omp_atomic_memory_order): Adjust for
	fail memory order being encoded in the same enum and also print
	fail clause if present.
	(dump_generic_node): Print weak clause if OMP_ATOMIC_WEAK.
	* gimplify.c (goa_stabilize_expr): Add target_expr and rhs arguments,
	handle pre_p == NULL case as a test mode that only returns value
	but doesn't change gimplify nor change anything otherwise, adjust
	recursive calls, add MODIFY_EXPR, ADDR_EXPR, COND_EXPR, TARGET_EXPR
	and CALL_EXPR handling, adjust COMPOUND_EXPR handling for
	__builtin_clear_padding calls, for !rhs gimplify as lvalue rather
	than rvalue.
	(gimplify_omp_atomic): Adjust goa_stabilize_expr caller.  Handle
	COND_EXPR rhs.  Set weak flag on gimple load/store for
	OMP_ATOMIC_WEAK.
	* omp-expand.c (omp_memory_order_to_fail_memmodel): New function.
	(omp_memory_order_to_memmodel): Adjust for fail clause encoded
	in the same enum.
	(expand_omp_atomic_cas): New function.
	(expand_omp_atomic_pipeline): Use omp_memory_order_to_fail_memmodel
	function.
	(expand_omp_atomic): Attempt to optimize atomic compare and exchange
	using expand_omp_atomic_cas.
gcc/c-family/
	* c-common.h (c_finish_omp_atomic): Add r and weak arguments.
	* c-omp.c: Include gimple-fold.h.
	(c_finish_omp_atomic): Add r and weak arguments.  Add support for
	OpenMP 5.1 atomics.
gcc/c/
	* c-parser.c (c_parser_conditional_expression): If omp_atomic_lhs and
	cond.value is >, < or == with omp_atomic_lhs as one of the operands,
	don't call build_conditional_expr, instead build a COND_EXPR directly.
	(c_parser_binary_expression): Avoid calling parser_build_binary_op
	if omp_atomic_lhs even in more cases for >, < or ==.
	(c_parser_omp_atomic): Update function comment for OpenMP 5.1 atomics,
	parse OpenMP 5.1 atomics and fail, compare and weak clauses, allow
	acq_rel on atomic read/write and acq_rel/acquire clauses on update.
	* c-typeck.c (build_binary_op): For flag_openmp only handle
	MIN_EXPR/MAX_EXPR.
gcc/cp/
	* parser.c (cp_parser_omp_atomic): Allow acq_rel on atomic read/write
	and acq_rel/acquire clauses on update.
	* semantics.c (finish_omp_atomic): Adjust c_finish_omp_atomic caller.
gcc/testsuite/
	* c-c++-common/gomp/atomic-17.c (foo): Add tests for atomic read,
	write or update with acq_rel clause and atomic update with acquire clause.
	* c-c++-common/gomp/atomic-18.c (foo): Adjust expected diagnostics
	wording, remove tests moved to atomic-17.c.
	* c-c++-common/gomp/atomic-21.c: Expect only 2 omp atomic release and
	2 omp atomic acq_rel directives instead of 4 omp atomic release.
	* c-c++-common/gomp/atomic-25.c: New test.
	* c-c++-common/gomp/atomic-26.c: New test.
	* c-c++-common/gomp/atomic-27.c: New test.
	* c-c++-common/gomp/atomic-28.c: New test.
	* c-c++-common/gomp/atomic-29.c: New test.
	* c-c++-common/gomp/atomic-30.c: New test.
	* c-c++-common/goacc-gomp/atomic.c: Expect 1 omp atomic release and
	1 omp atomic_acq_rel instead of 2 omp atomic release directives.
	* gcc.dg/gomp/atomic-5.c: Adjust expected error diagnostic wording.
	* g++.dg/gomp/atomic-18.C: Expect 4 omp atomic release and
	1 omp atomic_acq_rel instead of 5 omp atomic release directives.
libgomp/
	* testsuite/libgomp.c-c++-common/atomic-19.c: New test.
	* testsuite/libgomp.c-c++-common/atomic-20.c: New test.
	* testsuite/libgomp.c-c++-common/atomic-21.c: New test.
2021-09-10 20:41:33 +02:00
GCC Administrator
b2748138c0 Daily bump. 2021-09-08 00:16:23 +00:00
Tobias Burnus
ff7bc505b1 libgomp.texi: Extend OpenMP 5.0 Implementation Status
libgomp/
	* libgomp.texi (OpenMP Implementation Status): Extend
	OpenMP 5.0 section.
	(OpenACC Profiling Interface): Fix typo.
2021-09-07 18:30:25 +02:00
Tobias Burnus
cff72ef4e2 libgomp.texi: Add OpenMP Implementation Status
libgomp/
	* libgomp.texi (Enabling OpenMP): Refer to OMP spec in general
	not to 4.5; link to new section.
	(OpenMP Implementation Status): New.
2021-09-07 11:01:38 +02:00
GCC Administrator
9f99555f29 Daily bump. 2021-09-07 00:16:34 +00:00
Thomas Schwinge
086bb917d6 'libgomp.c/target-43.c': '-latomic' for nvptx offloading
... to avoid a regression with recent
commit 090f0d78f1
"openmp: Improve expand_omp_atomic_pipeline":

    unresolved symbol __atomic_compare_exchange_1
    collect2: error: ld returned 1 exit status
    mkoffload: fatal error: [...]/gcc/x86_64-pc-linux-gnu-accel-nvptx-none-gcc returned 1 exit status

	libgomp/
	* testsuite/libgomp.c/target-43.c: '-latomic' for nvptx offloading.
2021-09-06 11:51:13 +02:00
GCC Administrator
7b7395409c Daily bump. 2021-09-04 00:16:38 +00:00
Tobias Burnus
4ce90454c2 libgomp.*/error-1.{c,f90}: Fix dg-output newline pattern
libgomp/ChangeLog:

	* testsuite/libgomp.c-c++-common/error-1.c: Use \r\n not \n\r in
	dg-output.
	* testsuite/libgomp.fortran/error-1.f90: Likewise.
2021-09-03 15:27:00 +02:00
GCC Administrator
38b19c5b08 Daily bump. 2021-08-24 00:17:00 +00:00
Thomas Schwinge
29c355f76c Add 'libgomp.c/address-space-1.c'
Intel MIC (emulated) offloading execution failure remains to be analyzed.

	libgomp/
	* testsuite/libgomp.c/address-space-1.c: New file.

Co-authored-by: Jakub Jelinek <jakub@redhat.com>
2021-08-23 17:46:08 +02:00
Thomas Schwinge
bb75b22aba Allow matching Intel MIC in OpenMP 'declare variant'
..., and use that to improve XFAILing for Intel MIC offloading execution
instead of compilation in 'libgomp.c-c++-common/target-45.c',
'libgomp.fortran/target10.f90'.

	gcc/
	* config/i386/i386-options.c (ix86_omp_device_kind_arch_isa)
	<omp_device_arch> [ACCEL_COMPILER]: Match "intel_mic".
	* config/i386/t-omp-device (omp-device-properties-i386) <arch>:
	Add "intel_mic".
	libgomp/
	* testsuite/lib/libgomp.exp
	(check_effective_target_offload_target_intelmic): Remove 'proc'.
	(check_effective_target_offload_device_intel_mic): New 'proc'.
	* testsuite/libgomp.c-c++-common/on_device_arch.h
	(device_arch_intel_mic, on_device_arch_intel_mic): New.
	* testsuite/libgomp.c-c++-common/target-45.c: Use that for
	'dg-xfail-run-if'.
	* testsuite/libgomp.fortran/target10.f90: Likewise.
2021-08-23 17:45:40 +02:00
Tobias Burnus
d4de7e32ef Fortran/OpenMP: strict modifier on grainsize/num_tasks
This patch adds support for the 'strict' modifier on grainsize/num_tasks
clauses, an OpenMP 5.1 feature supported in C/C++ since commit
r12-3066-g3bc75533d1f87f0617be6c1af98804f9127ec637.

gcc/fortran/ChangeLog:

	* dump-parse-tree.c (show_omp_clauses): Handle 'strict' modifier
	on grainsize/num_tasks.
	* gfortran.h (gfc_omp_clauses): Add grainsize_strict
	and num_tasks_strict.
	* trans-openmp.c (gfc_trans_omp_clauses, gfc_split_omp_clauses):
	Handle 'strict' modifier on grainsize/num_tasks.
	* openmp.c (gfc_match_omp_clauses): Likewise.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/taskloop-4-a.f90: New test.
	* testsuite/libgomp.fortran/taskloop-4.f90: New test.
	* testsuite/libgomp.fortran/taskloop-5-a.f90: New test.
	* testsuite/libgomp.fortran/taskloop-5.f90: New test.
2021-08-23 15:15:30 +02:00