1599 Commits

Author SHA1 Message Date
GCC Administrator
6b1695f4a0 Daily bump. 2021-11-17 00:16:29 +00:00
Jakub Jelinek
9ceaf0fee3 libgomp: Mark thread_limit clause to target construct as implemented
After the Fortran changes we can mark it as implemented...

2021-11-16  Jakub Jelinek  <jakub@redhat.com>

	* libgomp.texi (OpenMP 5.1): Mark thread_limit clause to target
	construct as implemented.
2021-11-16 10:21:56 +01:00
GCC Administrator
e2b57363fc Daily bump. 2021-11-16 00:16:31 +00:00
Tobias Burnus
82ec4cb3c4 Fortran: openmp: Add support for thread_limit clause on target
gcc/fortran/ChangeLog:

	* openmp.c (OMP_TARGET_CLAUSES): Add thread_limit.
	* trans-openmp.c (gfc_split_omp_clauses): Add thread_limit also to
	teams.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/thread-limit-1.f90: New test.
2021-11-15 15:44:11 +01:00
Jakub Jelinek
aea7238683 openmp: Add support for thread_limit clause on target
OpenMP 5.1 says that thread_limit clause can also appear on target,
and similarly to teams should affect the thread-limit-var ICV.
On combined target teams, the clause goes to both.

We actually passed thread_limit internally on target already before,
but only used it for gcn/ptx offloading to hint how many threads should be
created and for ptx didn't set thread_limit_var in that case.
Similarly for host fallback.
Also, I found that we weren't copying the args array that contains encoded
thread_limit and num_teams clause for target (etc.) for async target.

2021-11-15  Jakub Jelinek  <jakub@redhat.com>

gcc/
	* gimplify.c (optimize_target_teams): Only add OMP_CLAUSE_THREAD_LIMIT
	to OMP_TARGET_CLAUSES if it isn't there already.
gcc/c-family/
	* c-omp.c (c_omp_split_clauses) <case OMP_CLAUSE_THREAD_LIMIT>:
	Duplicate to both OMP_TARGET and OMP_TEAMS.
gcc/c/
	* c-parser.c (OMP_TARGET_CLAUSE_MASK): Add
	PRAGMA_OMP_CLAUSE_THREAD_LIMIT.
gcc/cp/
	* parser.c (OMP_TARGET_CLAUSE_MASK): Add
	PRAGMA_OMP_CLAUSE_THREAD_LIMIT.
libgomp/
	* task.c (gomp_create_target_task): Copy args array as well.
	* target.c (gomp_target_fallback): Add args argument.
	Set gomp_icv (true)->thread_limit_var if thread_limit is present.
	(GOMP_target): Adjust gomp_target_fallback caller.
	(GOMP_target_ext): Likewise.
	(gomp_target_task_fn): Likewise.
	* config/nvptx/team.c (gomp_nvptx_main): Set
	gomp_global_icv.thread_limit_var.
	* testsuite/libgomp.c-c++-common/thread-limit-1.c: New test.
2021-11-15 13:20:53 +01:00
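As a rough illustration of the clause this commit enables, the sketch below puts thread_limit directly on a target construct and reads the resulting limit back. The function name target_thread_limit and the serial stand-in for omp_get_thread_limit are assumptions made for this example only; they are not taken from the new thread-limit-1.c test.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-in (assumption) so the sketch still builds without -fopenmp. */
static int omp_get_thread_limit(void) { return 1; }
#endif

/* Hypothetical helper: returns the thread limit seen inside the target region. */
int target_thread_limit(void)
{
  int limit = 0;
  /* OpenMP 5.1: thread_limit may appear on target, not only on teams. */
  #pragma omp target thread_limit(4) map(from: limit)
  {
    limit = omp_get_thread_limit();
  }
  assert(limit >= 1);
  return limit;
}
```

With a compiler that accepts the clause on target, the returned value should not exceed 4; built without OpenMP, the pragma is ignored and the stand-in reports a single thread.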
Jakub Jelinek
9fa72756d9 libgomp, nvptx: Honor OpenMP 5.1 num_teams lower bound
Here is a PTX implementation of what I was talking about, that for
num_teams_upper 0 or whenever num_teams_lower <= num_blocks, the current
implementation is fine but if the user explicitly asks for more
teams than we can provide in hardware, we need to stop assuming that
omp_get_team_num () is equal to the hw team id, but instead need to use some
team specific memory (it is .shared for PTX), or if none is
provided, array indexed by the hw team id and run some teams serially within
the same hw thread.

2021-11-15  Jakub Jelinek  <jakub@redhat.com>

	* config/nvptx/team.c (__gomp_team_num): Define as
	__attribute__((shared)) var.
	(gomp_nvptx_main): Initialize __gomp_team_num to 0.
	* config/nvptx/target.c (__gomp_team_num): Declare as
	extern __attribute__((shared)) var.
	(GOMP_teams4): Use __gomp_team_num as the team number instead of
	%ctaid.x.  If first, initialize it to %ctaid.x.  If num_teams_lower
	is bigger than num_blocks, use num_teams_lower teams and arrange for
	bumping of __gomp_team_num if !first and returning false once we run
	out of teams.
	* config/nvptx/teams.c (__gomp_team_num): Declare as
	extern __attribute__((shared)) var.
	(omp_get_team_num): Return __gomp_team_num value instead of %ctaid.x.
2021-11-15 09:20:52 +01:00
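The scheme described in this commit can be modelled on the host in plain C. The sketch below is only a simulation under stated assumptions: struct hw_block, teams_step and run_league are hypothetical names, the blocks are visited one after another rather than concurrently, and none of this is the libgomp code. It shows how each hardware block starts at its own id and then strides by the number of blocks until the requested team count is exhausted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-block state; the commit keeps the team number
   in a .shared variable on PTX.  */
struct hw_block { unsigned team_num; };

/* Modelled on the description of GOMP_teams4: on the first call the
   team number is the hardware block id, on later calls it is bumped
   by the number of blocks.  Returns false once teams run out.  */
static bool teams_step(struct hw_block *blk, unsigned block_id,
                       unsigned num_blocks, unsigned num_teams, bool first)
{
  if (first)
    blk->team_num = block_id;
  else
    blk->team_num += num_blocks;
  return blk->team_num < num_teams;
}

/* Runs num_teams teams on num_blocks simulated blocks.  owner[t] records
   which block executed team t; the return value is the number of team
   bodies executed.  */
unsigned run_league(unsigned num_blocks, unsigned num_teams, unsigned *owner)
{
  assert(num_blocks > 0);
  unsigned executed = 0;
  for (unsigned b = 0; b < num_blocks; b++)
    {
      struct hw_block blk;
      bool first = true;
      while (teams_step(&blk, b, num_blocks, num_teams, first))
        {
          owner[blk.team_num] = b;  /* stands in for the teams body */
          executed++;
          first = false;
        }
    }
  return executed;
}
```

For example, with 2 blocks and 5 requested teams, block 0 runs teams 0, 2 and 4 while block 1 runs teams 1 and 3, so every team number is executed exactly once.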
Jakub Jelinek
d294459720 libgomp: Add a testcase for omp_get_num_teams inside of target inside of host teams
This is https://github.com/OpenMP/spec/issues/3183
There is an agreement that we should return 1 team inside of target,
even if that target is inside of host teams.  We were doing that
when offloading and not during host fallback, r12-5151 should fix that
even for host fallback.

2021-11-15  Jakub Jelinek  <jakub@redhat.com>

	* testsuite/libgomp.c/teams-5.c: New test.
2021-11-15 08:58:39 +01:00
GCC Administrator
af2852b9dc Daily bump. 2021-11-13 00:16:39 +00:00
Jakub Jelinek
f49c7a4fb2 libgomp: Unbreak gcn offload build
My recent libgomp change apparently broke libgomp build for gcn offloading.
The problem is that gcn, unlike nvptx, doesn't override teams.c source file
and the patch I've committed assumed all the non-LIBGOMP_USE_PTHREADS targets
do not use it.  My understanding is that gcn included omp_get_num_teams
and omp_get_team_num definitions in both icv-device.o and teams.o,
with the definitions only in the former working correctly.

This patch brings gcn into sync with how nvptx does it, that teams.c
is overridden, provides a dummy GOMP_teams_reg and omp_get_{num_teams,team_num}
definitions and icv-device.c doesn't provide those.

2021-11-12  Jakub Jelinek  <jakub@redhat.com>

	PR target/103201
	* config/gcn/icv-device.c (omp_get_num_teams, omp_get_team_num): Move
	to ...
	* config/gcn/teams.c: ... here.  New file.
2021-11-12 16:11:02 +01:00
Chung-Lin Tang
b7e2048063 openmp: Relax handling of implicit map vs. existing device mappings
This patch implements relaxing the requirements when a map with the implicit
attribute encounters an overlapping existing map. As the OpenMP 5.0 spec
describes on page 320, lines 18-27 (and 5.1 spec, page 352, lines 13-22):

"If a single contiguous part of the original storage of a list item with an
 implicit data-mapping attribute has corresponding storage in the device data
 environment prior to a task encountering the construct that is associated with
 the map clause, only that part of the original storage will have corresponding
 storage in the device data environment as a result of the map clause."

2021-11-12  Chung-Lin Tang  <cltang@codesourcery.com>

include/ChangeLog:

	* gomp-constants.h (GOMP_MAP_FLAG_SPECIAL_3): Define special bit macro.
	(GOMP_MAP_IMPLICIT): New special map kind bits value.
	(GOMP_MAP_FLAG_SPECIAL_BITS): Define helper mask for whole set of
	special map kind bits.
	(GOMP_MAP_IMPLICIT_P): New predicate macro for implicit map kinds.

gcc/ChangeLog:

	* tree.h (OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P): New access macro for
	'implicit' bit, using 'base.deprecated_flag' field of tree_node.
	* tree-pretty-print.c (dump_omp_clause): Add support for printing
	implicit attribute in tree dumping.
	* gimplify.c (gimplify_adjust_omp_clauses_1):
	Set OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P to 1 if map clause is implicitly
	created.
	(gimplify_adjust_omp_clauses): Adjust place of adding implicitly created
	clauses, from simple append, to starting of list, after non-map clauses.
	* omp-low.c (lower_omp_target): Add GOMP_MAP_IMPLICIT bits into kind
	values passed to libgomp for implicit maps.

gcc/testsuite/ChangeLog:

	* c-c++-common/gomp/target-implicit-map-1.c: New test.
	* c-c++-common/goacc/combined-reduction.c: Adjust scan test pattern.
	* c-c++-common/goacc/firstprivate-mappings-1.c: Likewise.
	* c-c++-common/goacc/mdc-1.c: Likewise.
	* g++.dg/goacc/firstprivate-mappings-1.C: Likewise.

libgomp/ChangeLog:

	* target.c (gomp_map_vars_existing): Add 'bool implicit' parameter, add
	implicit map handling to allow a "superset" existing map as valid case.
	(get_kind): Adjust to filter out GOMP_MAP_IMPLICIT bits in return value.
	(get_implicit): New function to extract implicit status.
	(gomp_map_fields_existing): Adjust arguments in calls to
	gomp_map_vars_existing, and add uses of get_implicit.
	(gomp_map_vars_internal): Likewise.
	* testsuite/libgomp.c-c++-common/target-implicit-map-1.c: New test.
2021-11-12 20:29:48 +08:00
Jakub Jelinek
7d6da11fce openmp: Honor OpenMP 5.1 num_teams lower bound
The following patch implements what I've been talking about earlier,
honor that for explicit num_teams clause we create at least the
lower-bound (if not specified, upper-bound) teams in the league.
For host fallback, it still means we only have one thread doing all the
teams, sequentially one after another.
For PTX and GCN, I think the new teams-2.c test and maybe teams-4.c too
will or might fail.
For these offloads, I think it is ok to remove symbols no longer used
from libgomp.a.
If num_teams_lower is bigger than the provided num_blocks or num_workgroups,
we should arrange for gomp_num_teams_var to be num_teams_lower - 1,
stop using the %ctaid.x or __builtin_gcn_dim_pos (0) for omp_get_team_num ()
and instead use for it some .shared var that GOMP_teams4 initializes to
%ctaid.x or __builtin_gcn_dim_pos (0) when first and for !first
increment that by num_blocks or num_workgroups each time and only
return false when we are above num_teams_lower.
Any help with actually implementing this for the 2 architectures highly
appreciated.

2021-11-12  Jakub Jelinek  <jakub@redhat.com>

gcc/
	* omp-builtins.def (BUILT_IN_GOMP_TEAMS): Remove.
	(BUILT_IN_GOMP_TEAMS4): New.
	* builtin-types.def (BT_FN_VOID_UINT_UINT): Remove.
	(BT_FN_BOOL_UINT_UINT_UINT_BOOL): New.
	* omp-low.c (lower_omp_teams): Use GOMP_teams4 instead of
	GOMP_teams, pass to it also num_teams lower-bound expression
	or a dup of upper-bound if it is missing and a flag whether
	it is the first call or not.
gcc/fortran/
	* types.def (BT_FN_VOID_UINT_UINT): Remove.
	(BT_FN_BOOL_UINT_UINT_UINT_BOOL): New.
libgomp/
	* libgomp_g.h (GOMP_teams4): Declare.
	* libgomp.map (GOMP_5.1): Export GOMP_teams4.
	* target.c (GOMP_teams4): New function.
	* config/nvptx/target.c (GOMP_teams): Remove.
	(GOMP_teams4): New function.
	* config/gcn/target.c (GOMP_teams): Remove.
	(GOMP_teams4): New function.
	* testsuite/libgomp.c/teams-4.c (main): Expect exactly 2
	teams instead of <= 2.
	* testsuite/libgomp.c-c++-common/teams-2.c: New test.
2021-11-12 12:41:22 +01:00
GCC Administrator
b39265d4fe Daily bump. 2021-11-12 00:16:32 +00:00
Tobias Burnus
407eaad25f Fortran/openmp: Add support for 2 argument num_teams clause
Fortran part to commit r12-5146-g48d7327f2aaf65

gcc/fortran/ChangeLog:

	* gfortran.h (struct gfc_omp_clauses): Rename num_teams to
	num_teams_upper, add num_teams_lower.
	* dump-parse-tree.c (show_omp_clauses): Update to handle
	lower-bound num_teams clause.
	* frontend-passes.c (gfc_code_walker): Likewise.
	* openmp.c (gfc_free_omp_clauses, gfc_match_omp_clauses,
	resolve_omp_clauses): Likewise.
	* trans-openmp.c (gfc_trans_omp_clauses, gfc_split_omp_clauses,
	gfc_trans_omp_target): Likewise.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/teams-1.f90: New test.
2021-11-11 17:27:00 +01:00
Jakub Jelinek
fa4fcb111a libgomp: Use TLS storage for omp_get_num_teams()/omp_get_team_num() values
When thinking about GOMP_teams3, I've realized that using global variables
for the values returned by omp_get_num_teams()/omp_get_team_num() calls
is incorrect even with our right now dumb way of implementing host teams.
The problems are two, one is if host teams is used from multiple pthread_create
created threads - the spec says that host teams can't be nested inside of
explicit parallel or other teams constructs, but with pthread_create the
standard says obviously nothing about it.  Another more important thing
is host fallback, right now we don't do anything for omp_get_num_teams()
or omp_get_team_num() which was fine before host teams was introduced and
the 5.1 requirement that num_teams clause specifies minimum of teams, but
with the global vars it means inside of target teams num_teams (2) we happily
return omp_get_num_teams() == 4 if the target teams is inside of host teams
with num_teams(4).  With target fallback being invoked from parallel
regions global vars simply can't work right on the host.

So, this patch moves them to struct gomp_thread and propagates those for
parallel to child threads.  For host fallback, the implicit zeroing of
*thr results in us returning omp_get_num_teams () == 1 and
omp_get_team_num () == 0 which is fine for target teams without num_teams
clause, for target teams with num_teams clause something to work on and
for target without teams nested in it I've asked on omp-lang what should
be done.

2021-11-11  Jakub Jelinek  <jakub@redhat.com>

	* libgomp.h (struct gomp_thread): Add num_teams and team_num members.
	* team.c (struct gomp_thread_start_data): Likewise.
	(gomp_thread_start): Initialize thr->num_teams and thr->team_num.
	(gomp_team_start): Initialize start_data->num_teams and
	start_data->team_num.  Update nthr->num_teams and nthr->team_num.
	* teams.c (gomp_num_teams, gomp_team_num): Remove.
	(GOMP_teams_reg): Set and restore thr->num_teams and thr->team_num
	instead of gomp_num_teams and gomp_team_num.
	(omp_get_num_teams): Use thr->num_teams + 1 instead of gomp_num_teams.
	(omp_get_team_num): Use thr->team_num instead of gomp_team_num.
	* testsuite/libgomp.c/teams-4.c: New test.
2021-11-11 13:57:31 +01:00
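One detail of this change is easy to miss: because omp_get_num_teams uses thr->num_teams + 1, a zero-initialized thread structure already yields one team and team number 0. The sketch below illustrates that encoding with a hypothetical struct and helper names; it is a simplified model, not the layout of struct gomp_thread.

```c
#include <assert.h>

/* Hypothetical per-thread state.  num_teams is stored as (count - 1)
   so that all-zero state means "one team, team number 0".  */
struct thread_state_sketch
{
  unsigned num_teams;
  unsigned team_num;
};

unsigned sketch_get_num_teams(const struct thread_state_sketch *thr)
{
  return thr->num_teams + 1;
}

unsigned sketch_get_team_num(const struct thread_state_sketch *thr)
{
  return thr->team_num;
}

/* Records that this thread now executes team `team` out of `count`.  */
void sketch_enter_team(struct thread_state_sketch *thr,
                       unsigned count, unsigned team)
{
  assert(count > 0 && team < count);
  thr->num_teams = count - 1;
  thr->team_num = team;
}
```

Keeping the values in per-thread state rather than globals means two threads started with pthread_create, or a host fallback target region inside host teams, each see their own values.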
Jakub Jelinek
48d7327f2a openmp: Add support for 2 argument num_teams clause
In OpenMP 5.1, the num_teams clause can accept either one expression as before,
but in that case its meaning has changed: rather than creating <= expression
teams, it now creates == expression teams.  Or it accepts two expressions
separated by :, meaning that the first is the lower bound and the second the
upper bound on how many teams should be created.  The other ways to set the
number of teams are upper bounds with a lower bound of 1.

The following patch does parsing of this for C/C++.  For host teams, we
actually don't need to do anything further right now, we always create
(pretend to create) exactly the requested number of teams, so we can just
evaluate and throw away the lower bound for now.
For teams nested in target, we don't guarantee that though and further
work will be needed.
In particular, omplower now turns the teams part of:
struct S { S (); S (const S &); ~S (); int s; };
void bar (S &, S &);
int baz ();
_Pragma ("omp declare target to (baz)");

void
foo (void)
{
  S a, b;
  #pragma omp target private (a) map (b)
  {
    #pragma omp teams firstprivate (b) num_teams (baz ())
    {
      bar (a, b);
    }
  }
}
into:
  retval.0 = baz ();
  retval.1 = retval.0;
  {
    unsigned int retval.3;
    struct S * D.2549;
    struct S b;

    retval.3 = (unsigned int) retval.1;
    D.2549 = .omp_data_i->b;
    S::S (&b, D.2549);
    #pragma omp teams num_teams(retval.1) firstprivate(b) shared(a)
    __builtin_GOMP_teams (retval.3, 0);
    {
      bar (&a, &b);
    }
    S::~S (&b);
    #pragma omp return(nowait)
  }
IMHO we want a new API, say GOMP_teams3 which will take 3 arguments
instead of 2 (the lower and upper bounds from num_teams and thread_limit)
and will return a bool whether it should do the teams body or not.
And, we should add right before outermost {} above
while (__builtin_GOMP_teams3 ((unsigned) retval.1, (unsigned) retval.1, 0))
and remove the __builtin_GOMP_teams call.  The current function performs
exit equivalent (at least on NVPTX) which seems bad because that means
the destructors of e.g. private variables on target aren't invoked, and
at the current placement neither destructors of the already constructed
privatized variables in teams.
I'll do this next on the compiler side, but I'm afraid I'll need help
with the nvptx and amdgcn implementations.  E.g. for nvptx, we won't be
able to use %ctaid.x .  I think ideal would be to use a .shared
integer variable for the omp_get_team_num value, but I don't have any
experience with that, are .shared variables zero initialized by default,
or do they have random value at start?  PTX docs say they aren't initializable.

2021-11-11  Jakub Jelinek  <jakub@redhat.com>

gcc/
	* tree.h (OMP_CLAUSE_NUM_TEAMS_EXPR): Rename to ...
	(OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR): ... this.
	(OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR): Define.
	* tree.c (omp_clause_num_ops): Increase num ops for
	OMP_CLAUSE_NUM_TEAMS to 2.
	* tree-pretty-print.c (dump_omp_clause): Print optional lower bound
	for OMP_CLAUSE_NUM_TEAMS.
	* gimplify.c (gimplify_scan_omp_clauses): Gimplify
	OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR if non-NULL.
	(optimize_target_teams): Use OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR instead
	of OMP_CLAUSE_NUM_TEAMS_EXPR.  Handle OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR.
	* omp-low.c (lower_omp_teams): Use OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR
	instead of OMP_CLAUSE_NUM_TEAMS_EXPR.
	* omp-expand.c (expand_teams_call, get_target_arguments): Likewise.
gcc/c/
	* c-parser.c (c_parser_omp_clause_num_teams): Parse optional
	lower-bound and store it into OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR.
	Use OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR instead of
	OMP_CLAUSE_NUM_TEAMS_EXPR.
	(c_parser_omp_target): For OMP_CLAUSE_NUM_TEAMS evaluate before
	combined target teams even lower-bound expression.
gcc/cp/
	* parser.c (cp_parser_omp_clause_num_teams): Parse optional
	lower-bound and store it into OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR.
	Use OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR instead of
	OMP_CLAUSE_NUM_TEAMS_EXPR.
	(cp_parser_omp_target): For OMP_CLAUSE_NUM_TEAMS evaluate before
	combined target teams even lower-bound expression.
	* semantics.c (finish_omp_clauses): Handle
	OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR of OMP_CLAUSE_NUM_TEAMS clause.
	* pt.c (tsubst_omp_clauses): Likewise.
	(tsubst_expr): For OMP_CLAUSE_NUM_TEAMS evaluate before
	combined target teams even lower-bound expression.
gcc/fortran/
	* trans-openmp.c (gfc_trans_omp_clauses): Use
	OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR instead of OMP_CLAUSE_NUM_TEAMS_EXPR.
gcc/testsuite/
	* c-c++-common/gomp/clauses-1.c (bar): Supply lower-bound expression
	to half of the num_teams clauses.
	* c-c++-common/gomp/num-teams-1.c: New test.
	* c-c++-common/gomp/num-teams-2.c: New test.
	* g++.dg/gomp/attrs-1.C (bar): Supply lower-bound expression
	to half of the num_teams clauses.
	* g++.dg/gomp/attrs-2.C (bar): Likewise.
	* g++.dg/gomp/num-teams-1.C: New test.
	* g++.dg/gomp/num-teams-2.C: New test.
libgomp/
	* testsuite/libgomp.c-c++-common/teams-1.c: New test.
2021-11-11 09:42:47 +01:00
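A minimal sketch of the two-argument form on a host teams construct follows. The function name count_teams and the serial stand-ins for the two API calls are assumptions for this example; it is not one of the new num-teams tests.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins (assumption) so the sketch still builds without -fopenmp. */
static int omp_get_num_teams(void) { return 1; }
static int omp_get_team_num(void) { return 0; }
#endif

/* Hypothetical helper: asks for between 2 and 4 teams and reports how
   many were created.  */
int count_teams(void)
{
  int n = 0;
  /* lower-bound : upper-bound, as accepted since OpenMP 5.1 */
  #pragma omp teams num_teams(2:4)
  {
    if (omp_get_team_num() == 0)
      n = omp_get_num_teams();  /* only team 0 writes the shared result */
  }
  assert(n >= 1);
  return n;
}
```

With a compiler that implements the 5.1 clause, the result should lie between 2 and 4; built without OpenMP, the pragma is ignored and the function returns 1.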
GCC Administrator
c9b1334eec Daily bump. 2021-11-10 00:16:28 +00:00
Thomas Schwinge
00c9ce13a6 Restore 'GOMP_OPENACC_DIM' environment variable parsing
... that got broken by recent commit c057ed9c52
"openmp: Fix up strtoul and strtoull uses in libgomp", resulting in spurious
FAILs for tests specifying 'dg-set-target-env-var "GOMP_OPENACC_DIM" "[...]"'.

	libgomp/
	* env.c (parse_gomp_openacc_dim): Restore parsing.
2021-11-09 16:51:57 +01:00
GCC Administrator
0ef944629a Daily bump. 2021-10-31 00:16:24 +00:00
Tobias Burnus
948d461954 OpenMP: Add strictly nested API call check [PR102972]
The teams construct only permits omp_get_num_teams and omp_get_team_num
as API call in strictly nested regions - check for it.

Additionally, for Fortran, using DECL_NAME does not show the mangled
name, hence, DECL_ASSEMBLER_NAME had to be used too.

Finally, 'target device(ancestor:1)' wrongly rejected non-API calls
as well.

	PR middle-end/102972
gcc/ChangeLog:

	* omp-low.c (omp_runtime_api_call): Use DECL_ASSEMBLER_NAME to get
	internal Fortran name; new permit_num_teams arg to permit
	omp_get_num_teams and omp_get_team_num.
	(scan_omp_1_stmt): Update call to it, add missing call for
	reverse offload, and check for strictly nested API calls in teams.

gcc/testsuite/ChangeLog:

	* c-c++-common/gomp/target-device-ancestor-3.c: Add non-API
	routine test.
	* gfortran.dg/gomp/order-6.f90: Add missing bind(C).
	* c-c++-common/gomp/teams-3.c: New test.
	* gfortran.dg/gomp/teams-3.f90: New test.
	* gfortran.dg/gomp/teams-4.f90: New test.

libgomp/ChangeLog:
	* testsuite/libgomp.c-c++-common/icv-3.c: Nest API calls inside
	parallel construct.
	* testsuite/libgomp.c-c++-common/icv-4.c: Likewise.
	* testsuite/libgomp.c/target-3.c: Likewise.
	* testsuite/libgomp.c/target-5.c: Likewise.
	* testsuite/libgomp.c/target-6.c: Likewise.
	* testsuite/libgomp.c/target-teams-1.c: Likewise.
	* testsuite/libgomp.c/teams-1.c: Likewise.
	* testsuite/libgomp.c/thread-limit-2.c: Likewise.
	* testsuite/libgomp.c/thread-limit-3.c: Likewise.
	* testsuite/libgomp.c/thread-limit-4.c: Likewise.
	* testsuite/libgomp.c/thread-limit-5.c: Likewise.
	* testsuite/libgomp.fortran/icv-3.f90: Likewise.
	* testsuite/libgomp.fortran/icv-4.f90: Likewise.
	* testsuite/libgomp.fortran/teams1.f90: Likewise.
2021-10-30 23:45:32 +02:00
GCC Administrator
4c61300f2b Daily bump. 2021-10-30 00:16:25 +00:00
Aldy Hernandez
4b3a325f07 Remove VRP threader passes in exchange for better threading pre-VRP.
This patch upgrades the pre-VRP threading passes to fully resolving
backward threaders, and removes the post-VRP threading passes altogether.
With it, we reduce the number of threaders in our pipeline from 9 to 7.

This will leave DOM as the only forward threader client.  When the ranger
can handle floats, we should be able to upgrade the pre-DOM threaders to
fully resolving threaders and kill the embedded DOM threader.

The numbers are as follows:

	prev: # threads in backward + vrp-threaders = 92624
	now:  # threads in backward threaders = 94275
	Gain: +1.78%

	prev: # total threads: 189495
	now:  # total threads: 193714
	Gain: +2.22%

	The numbers are not as great as my initial proposal, but I've
	recently pushed all the work that got us to this point ;-).

And... the compilation improves by 1.32%!

There's a regression on uninit-pred-7_a.c that I've yet to look at.  I
want to make sure it's not a missing thread.  If it is, I'll create a PR
and own it.

Also, the tree-ssa/phi_on_compare-*.c tests have all regressed.  This
seems to be some special case the forward threader handles that the
backward threader does not (edge_forwards_cmp_to_conditional_jump*).
I haven't dug deep to see if this is solvable within our
infrastructure, but a cursory look shows that even though the VRP
threader threads this, the *.optimized dump ends with more conditional
jumps than without the optimization.  I'd like to punt on this for
now, because DOM actually catches this through its lone use of the
forward threader (I've adjusted the tests).  However, we will need to
address this sooner or later, if indeed it's still improving the final
assembly.

gcc/ChangeLog:

	* passes.def: Replace the pass_thread_jumps before VRP* with
	pass_thread_jumps_full.  Remove all pass_vrp_threader instances.
	* tree-ssa-threadbackward.c (pass_data_thread_jumps_full):
	Remove hyphen from "thread-full" name.

libgomp/ChangeLog:

	* testsuite/libgomp.graphite/force-parallel-4.c: Adjust for threading changes.
	* testsuite/libgomp.graphite/force-parallel-8.c: Same.

gcc/testsuite/ChangeLog:

	* gcc.dg/loop-unswitch-2.c: Adjust for threading changes.
	* gcc.dg/old-style-asm-1.c: Same.
	* gcc.dg/tree-ssa/phi_on_compare-1.c: Same.
	* gcc.dg/tree-ssa/phi_on_compare-2.c: Same.
	* gcc.dg/tree-ssa/phi_on_compare-3.c: Same.
	* gcc.dg/tree-ssa/phi_on_compare-4.c: Same.
	* gcc.dg/tree-ssa/pr20701.c: Same.
	* gcc.dg/tree-ssa/pr21001.c: Same.
	* gcc.dg/tree-ssa/pr21294.c: Same.
	* gcc.dg/tree-ssa/pr21417.c: Same.
	* gcc.dg/tree-ssa/pr21559.c: Same.
	* gcc.dg/tree-ssa/pr21563.c: Same.
	* gcc.dg/tree-ssa/pr49039.c: Same.
	* gcc.dg/tree-ssa/pr59597.c: Same.
	* gcc.dg/tree-ssa/pr61839_1.c: Same.
	* gcc.dg/tree-ssa/pr61839_3.c: Same.
	* gcc.dg/tree-ssa/pr66752-3.c: Same.
	* gcc.dg/tree-ssa/pr68198.c: Same.
	* gcc.dg/tree-ssa/pr77445-2.c: Same.
	* gcc.dg/tree-ssa/pr77445.c: Same.
	* gcc.dg/tree-ssa/ranger-threader-1.c: Same.
	* gcc.dg/tree-ssa/ranger-threader-2.c: Same.
	* gcc.dg/tree-ssa/ranger-threader-4.c: Same.
	* gcc.dg/tree-ssa/ssa-dom-thread-1.c: Same.
	* gcc.dg/tree-ssa/ssa-dom-thread-11.c: Same.
	* gcc.dg/tree-ssa/ssa-dom-thread-12.c: Same.
	* gcc.dg/tree-ssa/ssa-dom-thread-14.c: Same.
	* gcc.dg/tree-ssa/ssa-dom-thread-16.c: Same.
	* gcc.dg/tree-ssa/ssa-dom-thread-2b.c: Same.
	* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Same.
	* gcc.dg/tree-ssa/ssa-thread-14.c: Same.
	* gcc.dg/tree-ssa/ssa-thread-backedge.c: Same.
	* gcc.dg/tree-ssa/ssa-vrp-thread-1.c: Same.
	* gcc.dg/tree-ssa/vrp02.c: Same.
	* gcc.dg/tree-ssa/vrp03.c: Same.
	* gcc.dg/tree-ssa/vrp05.c: Same.
	* gcc.dg/tree-ssa/vrp06.c: Same.
	* gcc.dg/tree-ssa/vrp07.c: Same.
	* gcc.dg/tree-ssa/vrp08.c: Same.
	* gcc.dg/tree-ssa/vrp09.c: Same.
	* gcc.dg/tree-ssa/vrp33.c: Same.
	* gcc.dg/uninit-pred-9_b.c: Same.
	* gcc.dg/uninit-pred-7_a.c: xfail.
2021-10-29 17:57:27 +02:00
GCC Administrator
04a2cf3fd6 Daily bump. 2021-10-28 00:16:39 +00:00
Jakub Jelinek
eef8114906 openmp: Document that non-rect loops are not supported in Fortran yet
I've found we claim to support non-rectangular loops, but don't actually
support those in Fortran, as can be seen on:
  integer i, j
  !$omp parallel do collapse(2)
  do i = 0, 10
    do j = 0, i
    end do
  end do
end
To support this, the Fortran FE needs to allow the valid forms of
non-rectangular loops and disallow others, so mainly it needs its
updated version of c-omp.c c_omp_check_loop_iv etc., plus for non-rectangular
lb or ub expressions emit a TREE_VEC instead of a normal expression as the
C/C++ FEs do, plus testsuite coverage.

2021-10-27  Jakub Jelinek  <jakub@redhat.com>

	* libgomp.texi (OpenMP 5.0): Mention that Non-rectangular loop nests
	aren't implemented for Fortran yet.
2021-10-27 09:24:46 +02:00
Jakub Jelinek
2084b5f42a openmp: Allow non-rectangular loops with pointer iterators
This patch handles pointer iterators for non-rectangular loops.  They are
more limited than integral iterators of non-rectangular loops, in particular
only var-outer, var-outer + a2, a2 + var-outer or var-outer - a2 can appear
in lb or ub where a2 is some integral loop invariant expression, so no e.g.
multiplication etc.

2021-10-27  Jakub Jelinek  <jakub@redhat.com>

gcc/
	* omp-expand.c (expand_omp_for_init_counts): Handle non-rectangular
	iterators with pointer types.
	(expand_omp_for_init_vars, extract_omp_for_update_vars): Likewise.
gcc/c-family/
	* c-omp.c (c_omp_check_loop_iv_r): Don't clear 3rd bit for
	POINTER_PLUS_EXPR.
	(c_omp_check_nonrect_loop_iv): Handle POINTER_PLUS_EXPR.
	(c_omp_check_loop_iv): Set kind even if the iterator is non-integral.
gcc/testsuite/
	* c-c++-common/gomp/loop-8.c: New test.
	* c-c++-common/gomp/loop-9.c: New test.
libgomp/
	* testsuite/libgomp.c/loop-26.c: New test.
	* testsuite/libgomp.c/loop-27.c: New test.
2021-10-27 09:22:07 +02:00
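To make the permitted forms concrete, here is a small sketch of a collapsed non-rectangular loop nest whose iterators are pointers. The inner lower bound is the plain outer iterator (the var-outer form). The function name count_upper_pairs is an assumption for illustration and is not taken from the new loop-8.c or loop-26.c tests.

```c
#include <assert.h>

/* Hypothetical helper: counts the (p, q) pairs with p <= q over an
   array of n elements, i.e. n * (n + 1) / 2 iterations.  */
long count_upper_pairs(int *a, int n)
{
  assert(n >= 0);
  long pairs = 0;
  #pragma omp parallel for collapse(2) reduction(+:pairs)
  for (int *p = a; p < a + n; p++)
    for (int *q = p; q < a + n; q++)  /* lower bound is the outer iterator */
      pairs += 1;
  return pairs;
}
```

Without OpenMP the pragma is ignored and the loops run serially with the same result, which makes the iteration count easy to check against the closed form.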
GCC Administrator
b621508d6f Daily bump. 2021-10-26 00:16:26 +00:00
Tobias Burnus
72dc270be7 libgomp.oacc-c-c++-common/loop-gwv-2.c: Use __builtin_alloca
Some systems do not have <alloca.h> but provide alloca differently, e.g.
via stdlib.h. Do it like other testcases do and use __builtin_alloca.

libgomp/ChangeLog:

	PR testsuite/102910
	* testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c: Use __builtin_alloca
	instead of #include <alloca.h> + alloca.
2021-10-25 20:48:38 +02:00
GCC Administrator
ae5c540662 Daily bump. 2021-10-22 00:16:31 +00:00
Chung-Lin Tang
2e4659199e openmp: Fortran strictly-structured blocks support
This implements strictly-structured blocks support for Fortran, as specified in
OpenMP 5.2. This now allows using a Fortran BLOCK construct as the body of most
OpenMP constructs, with a "!$omp end ..." ending directive optional for that
form.

gcc/fortran/ChangeLog:

	* decl.c (gfc_match_end): Add COMP_OMP_STRICTLY_STRUCTURED_BLOCK case
	together with COMP_BLOCK.
	* parse.c (parse_omp_structured_block): Change return type to
	'gfc_statement', add handling for strictly-structured block case, adjust
	recursive calls to parse_omp_structured_block.
	(parse_executable): Adjust calls to parse_omp_structured_block.
	* parse.h (enum gfc_compile_state): Add
	COMP_OMP_STRICTLY_STRUCTURED_BLOCK.
	* trans-openmp.c (gfc_trans_omp_workshare): Add EXEC_BLOCK case
	handling.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/cancel-1.f90: Adjust testcase.
	* gfortran.dg/gomp/nesting-3.f90: Adjust testcase.
	* gfortran.dg/gomp/strictly-structured-block-1.f90: New test.
	* gfortran.dg/gomp/strictly-structured-block-2.f90: New test.
	* gfortran.dg/gomp/strictly-structured-block-3.f90: New test.

libgomp/ChangeLog:

	* libgomp.texi (Support of strictly structured blocks in Fortran):
	Adjust to 'Y'.
	* testsuite/libgomp.fortran/task-reduction-16.f90: Adjust testcase.
2021-10-21 14:57:25 +08:00
GCC Administrator
674dda6be0 Daily bump. 2021-10-21 00:16:29 +00:00
Chung-Lin Tang
d98626bf45 openmp: in_reduction support for Fortran
This patch implements support for the in_reduction clause for Fortran.
It also includes more completion of the taskgroup construct inside the
Fortran front-end, thus allowing task_reduction to work for task and
target constructs.

gcc/fortran/ChangeLog:

	* openmp.c (gfc_match_omp_clause_reduction): Add 'openmp_target' default
	false parameter. Add 'always,tofrom' map for OMP_LIST_IN_REDUCTION case.
	(gfc_match_omp_clauses): Add 'openmp_target' default false parameter,
	adjust call to gfc_match_omp_clause_reduction.
	(match_omp): Adjust call to gfc_match_omp_clauses.
	* trans-openmp.c (gfc_trans_omp_taskgroup): Add call to
	gfc_match_omp_clause, create and return block.

gcc/ChangeLog:

	* omp-low.c (omp_copy_decl_2): For !ctx, use record_vars to add new copy
	as local variable.
	(scan_sharing_clauses): Place copy of OMP_CLAUSE_IN_REDUCTION decl in
	ctx->outer instead of ctx.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/reduction4.f90: Adjust 'omp target in_reduction' scan
	pattern.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/target-in-reduction-1.f90: New test.
	* testsuite/libgomp.fortran/target-in-reduction-2.f90: New test.
2021-10-20 23:25:02 +08:00
Jakub Jelinek
c7abdf46fb openmp: Fix up struct gomp_work_share handling [PR102838]
If GOMP_HAVE_EFFICIENT_ALIGNED_ALLOC is not defined, the intent was to
treat the split of the structure between first cacheline (64 bytes)
as mostly write-once, use afterwards and second cacheline as rw just
as an optimization.  But as has been reported, with vectorization enabled
at -O2 it can now result in aligned vector 16-byte or larger stores.
When not having posix_memalign/aligned_alloc/memalign or other similar API,
alloc.c emulates it but it needs to allocate extra memory for the dynamic
realignment.
So, for the GOMP_HAVE_EFFICIENT_ALIGNED_ALLOC not defined case, this patch
stops using aligned (64) attribute in the middle of the structure and instead
inserts padding that puts the second half of the structure at offset 64 bytes.

And when GOMP_HAVE_EFFICIENT_ALIGNED_ALLOC is defined, usually it was allocated
as aligned, but for the orphaned case it could still be allocated just with
gomp_malloc without guaranteed proper alignment.

2021-10-20  Jakub Jelinek  <jakub@redhat.com>

	PR libgomp/102838
	* libgomp.h (struct gomp_work_share_1st_cacheline): New type.
	(struct gomp_work_share): Only use aligned(64) attribute if
	GOMP_HAVE_EFFICIENT_ALIGNED_ALLOC is defined, otherwise just
	add padding before lock to ensure lock is at offset 64 bytes
	into the structure.
	(gomp_workshare_struct_check1, gomp_workshare_struct_check2):
	New poor man's static assertions.
	* work.c (gomp_work_share_start): Use gomp_aligned_alloc instead of
	gomp_malloc if GOMP_HAVE_EFFICIENT_ALIGNED_ALLOC.
2021-10-20 09:34:51 +02:00
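The padding approach for the case without efficient aligned allocation can be sketched as follows. All type and field names here (ws_first_line, ws_sketch, lock, and so on) are hypothetical and much smaller than the real struct gomp_work_share; the point is only the explicit padding plus a compile-time check in the "poor man's static assertion" style, where a false condition produces a negative array size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the mostly write-once first half.  */
struct ws_first_line
{
  long chunk_size;
  long end;
  long incr;
};

struct ws_sketch
{
  struct ws_first_line head;
  /* Explicit padding instead of an aligned(64) attribute in the
     middle of the structure.  */
  char pad[64 - sizeof(struct ws_first_line)];
  int lock;           /* start of the read-write second half */
  unsigned threads_completed;
};

/* Compile-time check: the array size is -1, and the build fails,
   if lock is not at offset 64.  */
extern char ws_sketch_check[offsetof(struct ws_sketch, lock) == 64 ? 1 : -1];

size_t ws_lock_offset(void)
{
  size_t off = offsetof(struct ws_sketch, lock);
  assert(off == 64);  /* runtime mirror of the compile-time check */
  return off;
}
```

Because the struct itself no longer demands 64-byte alignment, the compiler has no grounds to emit aligned vector stores that a plain malloc-style allocation could not satisfy, while the second half still starts 64 bytes in.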
Aldy Hernandez
d8edfadfc7 Disallow loop rotation and loop header crossing in jump threaders.
There is a lot of fall-out from this patch, as there were many threading
tests that assumed the restrictions introduced by this patch were valid.
Some tests have merely shifted the threading to after loop
optimizations, but others ended up with no threading opportunities at
all.  Surprisingly some tests ended up with more total threads.  It was
a crapshoot all around.

On a positive note, there are 6 tests that no longer XFAIL, and one
guality test which now passes.

I felt a bit queasy about such a fundamental change wrt threading, so I
ran it through my callgrind test harness (.ii files from a bootstrap).
There was no change in overall compilation, DOM, or the VRP threaders.

However, there was a slight increase of 1.63% in the backward threader.
I'm pretty sure we could reduce this if we incorporated the restrictions
into their profitability code.  This way we could stop the search when
we ran into one of these restrictions.  Not sure it's worth it at this
point.

Tested on x86-64 Linux.

Co-authored-by: Richard Biener <rguenther@suse.de>

gcc/ChangeLog:

	* tree-ssa-threadupdate.c (cancel_thread): Dump threading reason
	on the same line as the threading cancellation.
	(jt_path_registry::cancel_invalid_paths): Avoid rotating loops.
	Avoid threading through loop headers where the path remains in the
	loop.

libgomp/ChangeLog:

	* testsuite/libgomp.graphite/force-parallel-5.c: Remove xfail.

gcc/testsuite/ChangeLog:

	* gcc.dg/Warray-bounds-87.c: Remove xfail.
	* gcc.dg/analyzer/pr94851-2.c: Remove xfail.
	* gcc.dg/graphite/pr69728.c: Remove xfail.
	* gcc.dg/graphite/scop-dsyr2k.c: Remove xfail.
	* gcc.dg/graphite/scop-dsyrk.c: Remove xfail.
	* gcc.dg/shrink-wrap-loop.c: Remove xfail.
	* gcc.dg/loop-8.c: Adjust for new threading restrictions.
	* gcc.dg/tree-ssa/ifc-20040816-1.c: Same.
	* gcc.dg/tree-ssa/pr21559.c: Same.
	* gcc.dg/tree-ssa/pr59597.c: Same.
	* gcc.dg/tree-ssa/pr71437.c: Same.
	* gcc.dg/tree-ssa/pr77445-2.c: Same.
	* gcc.dg/tree-ssa/ssa-dom-thread-4.c: Same.
	* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Same.
	* gcc.dg/vect/bb-slp-16.c: Same.
	* gcc.dg/tree-ssa/ssa-dom-thread-6.c: Remove.
	* gcc.dg/tree-ssa/ssa-dom-thread-18.c: Remove.
	* gcc.dg/tree-ssa/ssa-dom-thread-2a.c: Remove.
	* gcc.dg/tree-ssa/ssa-thread-invalid.c: New test.
2021-10-20 07:07:35 +02:00
GCC Administrator
ce4d1f632f Daily bump. 2021-10-19 00:16:23 +00:00
Jakub Jelinek
3adcf7e104 openmp: Fix handling of numa_domains(1)
If numa-domains is used with num-places count, sometimes the function
could create more places than requested and crash.  This depended on the
content of /sys/devices/system/node/online file, e.g. if the file
contains
0-1,16-17
and all NUMA nodes contain at least one CPU in the cpuset of the program,
then numa_domains(2) or numa_domains(4) (or 5+) work fine while
numa_domains(1) or numa_domains(3) misbehave.  I.e. the function was able
to stop after reaching the limit at the ',' separators (or trivially at the
end), but not within the ranges.
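
The effect of the extra loop condition can be sketched as follows.  This is
a simplified illustration, not the libgomp code: count_places is a
hypothetical helper that only counts places, and it skips the check whether
each NUMA node has a CPU in the program's cpuset.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: walk a node list such as "0-1,16-17" and return
   how many places would be created, never exceeding COUNT.  */
unsigned long
count_places (const char *list, unsigned long count)
{
  unsigned long len = 0;
  const char *p = list;

  assert (list != NULL);
  while (*p && len < count)
    {
      char *end;
      unsigned long nfirst, nlast;

      errno = 0;
      nfirst = strtoul (p, &end, 10);
      if (errno || end == p)
        break;
      p = end;
      nlast = nfirst;
      if (*p == '-')
        {
          errno = 0;
          nlast = strtoul (p + 1, &end, 10);
          if (errno || end == p + 1 || nlast < nfirst)
            break;
          p = end;
        }
      /* Without the `len < count' test this loop would run to the end
         of the range even after the requested number was reached.  */
      for (; nfirst <= nlast && len < count; nfirst++)
        len++;
      if (*p == ',')
        p++;
    }
  return len;
}
```

For the list 0-1,16-17 a limit of 1 or 3 falls in the middle of a range, so
those are the cases where only the inner test stops the loop in time.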

2021-10-18  Jakub Jelinek  <jakub@redhat.com>

	* config/linux/affinity.c (gomp_affinity_init_numa_domains): Add
	&& gomp_places_list_len < count after nfirst <= nlast loop condition.
2021-10-18 15:00:46 +02:00
Tobias Burnus
64f9623765 Fortran: Fix Bind(C) Array-Descriptor Conversion
gfortran internally uses a different array descriptor ("gfc") than the one
Fortran 2018, alias TS 29113, defines for C interoperability via
ISO_Fortran_binding.h ("CFI").  Hence, when calling a C function
from Fortran, it has to be converted in the caller - and if a
BIND(C) procedure is written in Fortran, the CFI argument has
to be converted to gfc in order to work with the rest of the FE
code and the library calls.

Before this patch, part was handled in the FE generated code and
other parts in libgfortran.  With this patch, all code is generated by the
FE and CFI is defined as a proper type - visible in the debugger and to
the middle end - avoiding both alias issues and missed optimization
issues.

This patch also fixes issues like: intent(out) deallocation in
the bind(C) callee, using the CFI descriptor also for allocatable
and pointer scalars and for len=* character strings.
For 'select rank', it also optimizes the code and avoids accessing
uninitialized memory if the dummy argument is allocatable/a pointer.
It additionally rejects passing a descriptorless type(*) to an
assumed-rank dummy argument. [F2018:C711]

	PR fortran/102086
	PR fortran/92189
	PR fortran/92621
	PR fortran/101308
	PR fortran/101309
	PR fortran/101635
	PR fortran/92482

gcc/fortran/ChangeLog:

	* decl.c (gfc_verify_c_interop_param): Remove 'sorry' for
	scalar allocatable/pointer and len=*.
	* expr.c (is_CFI_desc): Return true for those.
	* gfortran.h (CFI_type_kind_shift, CFI_type_mask,
	CFI_type_from_type_kind, CFI_VERSION, CFI_MAX_RANK,
	CFI_attribute_pointer, CFI_attribute_allocatable,
	CFI_attribute_other, CFI_type_Integer, CFI_type_Logical,
	CFI_type_Real, CFI_type_Complex, CFI_type_Character,
	CFI_type_ucs4_char, CFI_type_struct, CFI_type_cptr,
	CFI_type_cfunptr, CFI_type_other): New #define.
	* trans-array.c (CFI_FIELD_BASE_ADDR, CFI_FIELD_ELEM_LEN,
	CFI_FIELD_VERSION, CFI_FIELD_RANK, CFI_FIELD_ATTRIBUTE,
	CFI_FIELD_TYPE, CFI_FIELD_DIM, CFI_DIM_FIELD_LOWER_BOUND,
	CFI_DIM_FIELD_EXTENT, CFI_DIM_FIELD_SM,
	gfc_get_cfi_descriptor_field, gfc_get_cfi_desc_base_addr,
	gfc_get_cfi_desc_elem_len, gfc_get_cfi_desc_version,
	gfc_get_cfi_desc_rank, gfc_get_cfi_desc_type,
	gfc_get_cfi_desc_attribute, gfc_get_cfi_dim_item,
	gfc_get_cfi_dim_lbound, gfc_get_cfi_dim_extent, gfc_get_cfi_dim_sm):
	New define/functions to access the CFI array descriptor.
	(gfc_conv_descriptor_type): New function for the GFC descriptor.
	(gfc_get_array_span): Handle expr of CFI descriptors and
	assumed-type descriptors.
	(gfc_trans_array_bounds): Remove 'static'.
	(gfc_conv_expr_descriptor): For assumed type, use the dtype of
	the actual argument.
	(structure_alloc_comps): Remove ' ' inside tabs.
	* trans-array.h (gfc_trans_array_bounds, gfc_conv_descriptor_type,
	gfc_get_cfi_desc_base_addr, gfc_get_cfi_desc_elem_len,
	gfc_get_cfi_desc_version, gfc_get_cfi_desc_rank,
	gfc_get_cfi_desc_type, gfc_get_cfi_desc_attribute,
	gfc_get_cfi_dim_lbound, gfc_get_cfi_dim_extent, gfc_get_cfi_dim_sm):
	New prototypes.
	* trans-decl.c (gfor_fndecl_cfi_to_gfc, gfor_fndecl_gfc_to_cfi):
	Remove global vars.
	(gfc_build_builtin_function_decls): Remove their initialization.
	(gfc_get_symbol_decl, create_function_arglist,
	gfc_trans_deferred_vars): Update for CFI.
	(convert_CFI_desc): Remove and replace by ...
	(gfc_conv_cfi_to_gfc): ... this function
	(gfc_generate_function_code): Call it; create local GFC var for CFI.
	* trans-expr.c (gfc_maybe_dereference_var): Handle CFI.
	(gfc_conv_subref_array_arg): Handle the if-noncontiguous-only copy in
	when the result should be a descriptor.
	(gfc_conv_gfc_desc_to_cfi_desc): Completely rewritten.
	(gfc_conv_procedure_call): CFI fixes.
	* trans-openmp.c (gfc_omp_is_optional_argument,
	gfc_omp_check_optional_argument): Handle optional
	CFI.
	* trans-stmt.c (gfc_trans_select_rank_cases): Cleanup, avoid invalid
	code for allocatable/pointer dummies, which cannot be assumed size.
	* trans-types.c (gfc_cfi_descriptor_base): New global var.
	(gfc_get_dtype_rank_type): Skip rank init for rank < 0.
	(gfc_sym_type): Handle CFI dummies.
	(gfc_get_function_type): Update call.
	(gfc_get_cfi_dim_type, gfc_get_cfi_type): New.
	* trans-types.h (gfc_sym_type): Update prototype.
	(gfc_get_cfi_type): New prototype.
	* trans.c (gfc_trans_runtime_check): Make conditions more consistent
	to avoid '<logical> AND_THEN <long int>' in conditions.
	* trans.h (gfor_fndecl_cfi_to_gfc, gfor_fndecl_gfc_to_cfi): Remove
	global-var declaration.

libgfortran/ChangeLog:

	* ISO_Fortran_binding.h (CFI_type_cfunptr): Make unique type again.
	* runtime/ISO_Fortran_binding.c (cfi_desc_to_gfc_desc,
	gfc_desc_to_cfi_desc): Add comment that those are no longer called
	by new code.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/optional-bind-c.f90: New test.

gcc/testsuite/ChangeLog:

	* gfortran.dg/ISO_Fortran_binding_4.f90: Extend testcase.
	* gfortran.dg/PR100914.f90: Remove xfail.
	* gfortran.dg/PR100915.c: Expect CFI_type_cfunptr.
	* gfortran.dg/PR100915.f90: Handle CFI_type_cfunptr != CFI_type_cptr.
	* gfortran.dg/PR93963.f90: Extend select-rank tests.
	* gfortran.dg/bind-c-intent-out.f90: Change to dg-do run,
	update scan-dump.
	* gfortran.dg/bind_c_array_params_2.f90: Update/extend scan-dump.
	* gfortran.dg/bind_c_char_10.f90: Update scan-dump.
	* gfortran.dg/bind_c_char_8.f90: Remove dg-error "sorry".
	* gfortran.dg/c-interop/allocatable-dummy.f90: Remove xfail.
	* gfortran.dg/c-interop/c1255-1.f90: Likewise.
	* gfortran.dg/c-interop/c407c-1.f90: Update dg-error.
	* gfortran.dg/c-interop/cf-descriptor-5.f90: Remove xfail.
	* gfortran.dg/c-interop/cf-out-descriptor-3.f90: Likewise.
	* gfortran.dg/c-interop/cf-out-descriptor-4.f90: Likewise.
	* gfortran.dg/c-interop/cf-out-descriptor-5.f90: Likewise.
	* gfortran.dg/c-interop/contiguous-2.f90: Likewise.
	* gfortran.dg/c-interop/contiguous-3.f90: Likewise.
	* gfortran.dg/c-interop/deferred-character-1.f90: Likewise.
	* gfortran.dg/c-interop/deferred-character-2.f90: Likewise.
	* gfortran.dg/c-interop/fc-descriptor-3.f90: Likewise.
	* gfortran.dg/c-interop/fc-descriptor-5.f90: Likewise.
	* gfortran.dg/c-interop/fc-descriptor-6.f90: Likewise.
	* gfortran.dg/c-interop/fc-out-descriptor-3.f90: Likewise.
	* gfortran.dg/c-interop/fc-out-descriptor-4.f90: Likewise.
	* gfortran.dg/c-interop/fc-out-descriptor-5.f90: Likewise.
	* gfortran.dg/c-interop/fc-out-descriptor-6.f90: Likewise.
	* gfortran.dg/c-interop/ff-descriptor-5.f90: Likewise.
	* gfortran.dg/c-interop/ff-descriptor-6.f90: Likewise.
	* gfortran.dg/c-interop/fc-descriptor-7.f90: Remove xfail + extend.
	* gfortran.dg/c-interop/fc-descriptor-7-c.c: Update for changes.
	* gfortran.dg/c-interop/shape.f90: Add implicit none.
	* gfortran.dg/c-interop/typecodes-array-char-c.c: Add kind=4 char.
	* gfortran.dg/c-interop/typecodes-array-char.f90: Likewise.
	* gfortran.dg/c-interop/typecodes-array-float128.f90: Remove xfail.
	* gfortran.dg/c-interop/typecodes-scalar-basic.f90: Likewise.
	* gfortran.dg/c-interop/typecodes-scalar-float128.f90: Likewise.
	* gfortran.dg/c-interop/typecodes-scalar-int128.f90: Likewise.
	* gfortran.dg/c-interop/typecodes-scalar-longdouble.f90: Likewise.
	* gfortran.dg/iso_c_binding_char_1.f90: Remove dg-error "sorry".
	* gfortran.dg/pr93792.f90: Turn XFAIL into PASS.
	* gfortran.dg/ISO_Fortran_binding_19.f90: New test.
	* gfortran.dg/assumed_type_12.f90: New test.
	* gfortran.dg/assumed_type_13.c: New test.
	* gfortran.dg/assumed_type_13.f90: New test.
	* gfortran.dg/bind-c-char-descr.f90: New test.
	* gfortran.dg/bind-c-contiguous-1.c: New test.
	* gfortran.dg/bind-c-contiguous-1.f90: New test.
	* gfortran.dg/bind-c-contiguous-2.f90: New test.
	* gfortran.dg/bind-c-contiguous-3.c: New test.
	* gfortran.dg/bind-c-contiguous-3.f90: New test.
	* gfortran.dg/bind-c-contiguous-4.c: New test.
	* gfortran.dg/bind-c-contiguous-4.f90: New test.
	* gfortran.dg/bind-c-contiguous-5.c: New test.
	* gfortran.dg/bind-c-contiguous-5.f90: New test.
2021-10-18 10:29:30 +02:00
GCC Administrator
93d183a5ff Daily bump. 2021-10-16 00:16:27 +00:00
Jakub Jelinek
a10794eafb openmp: Improve testsuite/libgomp.c/affinity-1.c testcase
I've noticed that while I have added hopefully sufficient test coverage
for the case where one uses simple number or !number as p-interval,
I haven't added any coverage for number:len:stride or number:len.

This patch adds that.

2021-10-15  Jakub Jelinek  <jakub@redhat.com>

	* testsuite/libgomp.c/affinity-1.c (struct places): Change name field
	type from char [50] to const char *.
	(places_array): Add a testcase for simplified syntax place followed
	by length or length and stride.
2021-10-15 17:19:54 +02:00
Jakub Jelinek
4a0fed0c0c openmp: Handle OpenMP 5.1 simplified OMP_PLACES syntax
In addition to adding ll_caches and numa_domains abstract names
to OMP_PLACES syntax, OpenMP 5.1 also added one syntax simplification:
https://github.com/OpenMP/spec/issues/2080
https://github.com/OpenMP/spec/pull/2081
in particular that in the grammar place non-terminal is now
not only { res-list } but also res (i.e. a non-negative integer),
which stands as a shortcut for { res }.
So, one can specify OMP_PLACES=0,4,8,12 with the meaning
OMP_PLACES={0},{4},{8},{12} or OMP_PLACES=0:4 instead of OMP_PLACES={0}:4
or OMP_PLACES={0},{1},{2},{3} etc.

This patch implements that.
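
The shortcut can be pictured as a textual rewrite.  The sketch below is an
illustration under that assumption only: normalize_places is a hypothetical
helper, not how parse_one_place in env.c works, and it does no validation of
the input.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: rewrite each bare `res' place in IN as `{res}',
   copying everything else through.  Returns 0 on success, -1 if OUT
   (of size OUTSZ) is too small.  */
int
normalize_places (const char *in, char *out, size_t outsz)
{
  size_t n = 0;
  int depth = 0;	/* Nesting level inside { ... }.  */
  int at_place = 1;	/* Nonzero if the next token starts a place.  */

  assert (in != NULL && out != NULL);
  if (outsz == 0)
    return -1;
#define PUT(c) \
  do { if (n + 1 >= outsz) return -1; out[n++] = (c); } while (0)
  while (*in)
    {
      if (at_place && depth == 0 && isdigit ((unsigned char) *in))
        {
          /* A bare number where a place is expected: wrap it.  */
          PUT ('{');
          while (isdigit ((unsigned char) *in))
            PUT (*in++);
          PUT ('}');
          at_place = 0;
          continue;
        }
      if (*in == '{')
        depth++;
      else if (*in == '}' && depth > 0)
        depth--;
      /* A new place starts after a top-level ',' or a leading '!';
         the numbers after ':' are length and stride, not places.  */
      if (!isspace ((unsigned char) *in))
        at_place = depth == 0 && strchr (",!", *in) != NULL;
      PUT (*in++);
    }
#undef PUT
  out[n] = '\0';
  return 0;
}
```

Numbers following a ':' are left alone because they denote length and
stride, so 0:4 becomes {0}:4 rather than {0}:{4}.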

2021-10-15  Jakub Jelinek  <jakub@redhat.com>

	* env.c (parse_one_place): Handle non-negative-number the same
	as { non-negative-number }.  Reject even !number:1 and
	!number:1:stride or !place:1 or !place:1:stride instead of just
	length other than 1.
	* libgomp.texi (OpenMP 5.1): Document OMP_PLACES syntax extensions
	and OMP_NUM_TEAMS/OMP_TEAMS_THREAD_LIMIT and
	omp_{set_num,get_max}_teams/omp_{s,g}et_teams_thread_limit features
	as implemented.
	* testsuite/libgomp.c/affinity-1.c: Add a test for the 5.1 place
	simplified syntax.
2021-10-15 16:35:57 +02:00
Jakub Jelinek
c057ed9c52 openmp: Fix up strtoul and strtoull uses in libgomp
Yesterday when working on numa_domains, I've noticed because of a bug
in my patch a hang on a large NUMA machine.  I've fixed the bug, but
also discovered that the hang was a result of making wrong assumptions
about strtoul/strtoull.  All the uses were for portability setting
errno = 0 before the calls and treating non-zero errno after the call
as invalid input, but for the case where there are no valid digits at
all strtoul may set errno to EINVAL, but doesn't have to and with
glibc doesn't do that.  So, this patch goes through all the strtoul calls
and next to errno != 0 checks adds also endptr == startptr check.
Haven't done it in places where we immediately reject strtoul returning 0
the same as we reject errno != 0, because strtoul must return 0 in the
case where it sets endptr to the start pointer.  In some spots the code
was using errno = 0; x = strtoul (p, &p, 10); if (errno) { /*invalid*/ }
and those spots had to be changed to
errno = 0; x = strtoul (p, &end, 10); if (errno || end == p) { /*invalid*/ }
p = end;
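
Wrapped into a helper, the corrected pattern might look like the sketch
below.  parse_ulong is a hypothetical name used for illustration; libgomp
open-codes the check at each call site rather than using such a function.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: parse an unsigned decimal number at *PP.  On
   success store it in *VALUE, advance *PP past the digits and return
   true.  On failure leave *PP and *VALUE untouched and return false.  */
bool
parse_ulong (const char **pp, unsigned long *value)
{
  char *end;
  unsigned long x;

  assert (pp != NULL && *pp != NULL && value != NULL);
  errno = 0;
  x = strtoul (*pp, &end, 10);
  /* errno catches overflow (ERANGE); end == *pp catches the case with
     no valid digits, for which errno is not required to be set.  */
  if (errno || end == *pp)
    return false;
  *value = x;
  *pp = end;
  return true;
}
```

Using a separate end pointer keeps the start pointer intact, so the
comparison that detects "no digits consumed" remains possible.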

2021-10-15  Jakub Jelinek  <jakub@redhat.com>

	* env.c (parse_schedule): For strtoul or strtoull calls which don't
	clearly reject return value 0 as invalid handle the case where end
	pointer is the same as first argument as invalid.
	(parse_unsigned_long_1): Likewise.
	(parse_one_place): Likewise.
	(parse_places_var): Likewise.
	(parse_stacksize): Likewise.
	(parse_spincount): Likewise.
	(parse_affinity): Likewise.
	(parse_gomp_openacc_dim): Likewise.  Avoid strict aliasing violation.
	Make code valid C89.
	* config/linux/affinity.c (gomp_affinity_find_last_cache_level):
	For strtoul calls which don't clearly reject return value 0 as
	invalid handle the case where end pointer is the same as first
	argument as invalid.
	(gomp_affinity_init_level_1): Likewise.
	(gomp_affinity_init_numa_domains): Likewise.
	* config/rtems/proc.c (parse_thread_pools): Likewise.
2021-10-15 16:28:34 +02:00
Jakub Jelinek
4764049dd6 openmp: Fix up handling of OMP_PLACES=threads(1)
When writing the places-*.c tests, I've noticed that we mishandle threads
abstract name with specified num-places if num-places isn't a multiple of
number of hw threads in a core.  It then happily ignores the maximum count
and overwrites for the remaining hw threads in a core further places that
haven't been allocated.

2021-10-15  Jakub Jelinek  <jakub@redhat.com>

	* config/linux/affinity.c (gomp_affinity_init_level_1): For level 1
	after creating count places clean up and return immediately.
	* testsuite/libgomp.c/places-6.c: New test.
	* testsuite/libgomp.c/places-7.c: New test.
	* testsuite/libgomp.c/places-8.c: New test.
	* testsuite/libgomp.c/places-9.c: New test.
	* testsuite/libgomp.c/places-10.c: New test.
2021-10-15 16:25:25 +02:00
Jakub Jelinek
e7ce32c783 openmp: Add support for OMP_PLACES=numa_domains
This adds support for numa_domains abstract name in OMP_PLACES, also new
in OpenMP 5.1.

A way to test this is
OMP_PLACES=numa_domains OMP_DISPLAY_ENV=true LD_PRELOAD=.libs/libgomp.so.1 /bin/true
and see what it prints on the OMP_PLACES line.
For non-NUMA machines it should print a single place that covers all CPUs,
for a NUMA machine one place for each NUMA node with the corresponding CPUs.

2021-10-15  Jakub Jelinek  <jakub@redhat.com>

	* env.c (parse_places_var): Handle numa_domains as level 5.
	* config/linux/affinity.c (gomp_affinity_init_numa_domains): New
	function.
	(gomp_affinity_init_level): Use it instead of
	gomp_affinity_init_level_1 for level == 5.
	* testsuite/libgomp.c/places-5.c: New test.
2021-10-15 12:16:50 +02:00
Jakub Jelinek
5809be05a2 openmp: Add support for OMP_PLACES=ll_caches
This patch implements support for ll_caches abstract name in OMP_PLACES,
which stands for places where logical cpus in each place share the last
level cache.

This seems to work fine for me on x86 and kernel sources show that it is
in common code, but on some machines on CompileFarm the files I'm using,
i.e.
/sys/devices/system/cpu/cpuN/cache/indexN/level
/sys/devices/system/cpu/cpuN/cache/indexN/shared_cpu_list
don't exist.  Is that because they have too old a kernel and newer kernels
are fine or should I implement some fallback methods (which)?
E.g. on gcc112.fsffrance.org I see just shared_cpu_map and not shared_cpu_list
(with shared_cpu_map being harder to parse) and on another box I didn't even
see the cache subdirectories.

A way to test this is
OMP_PLACES=ll_caches OMP_DISPLAY_ENV=true LD_PRELOAD=.libs/libgomp.so.1 /bin/true
and see what it prints on the OMP_PLACES line.

2021-10-15  Jakub Jelinek  <jakub@redhat.com>

	* env.c (parse_places_var): Handle ll_caches as level 4.
	* config/linux/affinity.c (gomp_affinity_find_last_cache_level): New
	function.
	(gomp_affinity_init_level_1): Handle level 4 as logical cpus sharing
	last level cache.
	(gomp_affinity_init_level): Likewise.
	* testsuite/libgomp.c/places-1.c: New test.
	* testsuite/libgomp.c/places-2.c: New test.
	* testsuite/libgomp.c/places-3.c: New test.
	* testsuite/libgomp.c/places-4.c: New test.
2021-10-15 12:06:51 +02:00
GCC Administrator
5d5885c99c Daily bump. 2021-10-15 00:17:02 +00:00
Kwok Cheung Yeung
2c4666fb06 openmp: Mark declare variant directive in documentation as supported in Fortran
2021-10-14  Kwok Cheung Yeung  <kcy@codesourcery.com>

libgomp/
	* libgomp.texi (OpenMP 5.0): Update entry for declare variant
	directive.
2021-10-14 09:35:33 -07:00
Kwok Cheung Yeung
724ee5a009 openmp, fortran: Add support for OpenMP declare variant directive in Fortran
2021-10-14  Kwok Cheung Yeung  <kcy@codesourcery.com>

gcc/c-family/

	* c-omp.c (c_omp_check_context_selector): Rename to
	omp_check_context_selector and move to omp-general.c.
	(c_omp_mark_declare_variant): Rename to omp_mark_declare_variant and
	move to omp-general.c.

gcc/c/

	* c-parser.c (c_finish_omp_declare_variant): Change call from
	c_omp_check_context_selector to omp_check_context_selector. Change
	call from c_omp_mark_declare_variant to omp_mark_declare_variant.

gcc/cp/

	* decl.c (omp_declare_variant_finalize_one): Change call from
	c_omp_mark_declare_variant to omp_mark_declare_variant.
	* parser.c (cp_finish_omp_declare_variant): Change call from
	c_omp_check_context_selector to omp_check_context_selector.

gcc/fortran/

	* gfortran.h (enum gfc_statement): Add ST_OMP_DECLARE_VARIANT.
	(enum gfc_omp_trait_property_kind): New.
	(struct gfc_omp_trait_property): New.
	(gfc_get_omp_trait_property): New macro.
	(struct gfc_omp_selector): New.
	(gfc_get_omp_selector): New macro.
	(struct gfc_omp_set_selector): New.
	(gfc_get_omp_set_selector): New macro.
	(struct gfc_omp_declare_variant): New.
	(gfc_get_omp_declare_variant): New macro.
	(struct gfc_namespace): Add omp_declare_variant field.
	(gfc_free_omp_declare_variant_list): New prototype.
	* match.h (gfc_match_omp_declare_variant): New prototype.
	* openmp.c (gfc_free_omp_trait_property_list): New.
	(gfc_free_omp_selector_list): New.
	(gfc_free_omp_set_selector_list): New.
	(gfc_free_omp_declare_variant_list): New.
	(gfc_match_omp_clauses): Add extra optional argument.  Handle end of
	clauses for context selectors.
	(omp_construct_selectors, omp_device_selectors,
	omp_implementation_selectors, omp_user_selectors): New.
	(gfc_match_omp_context_selector): New.
	(gfc_match_omp_context_selector_specification): New.
	(gfc_match_omp_declare_variant): New.
	* parse.c: Include tree-core.h and omp-general.h.
	(decode_omp_directive): Handle 'declare variant'.
	(case_omp_decl): Include ST_OMP_DECLARE_VARIANT.
	(gfc_ascii_statement): Handle ST_OMP_DECLARE_VARIANT.
	(gfc_parse_file): Initialize omp_requires_mask.
	* symbol.c (gfc_free_namespace): Call
	gfc_free_omp_declare_variant_list.
	* trans-decl.c (gfc_get_extern_function_decl): Call
	gfc_trans_omp_declare_variant.
	(gfc_create_function_decl): Call gfc_trans_omp_declare_variant.
	* trans-openmp.c (gfc_trans_omp_declare_variant): New.
	* trans-stmt.h (gfc_trans_omp_declare_variant): New prototype.

gcc/

	* omp-general.c (omp_check_context_selector):  Move from c-omp.c.
	(omp_mark_declare_variant): Move from c-omp.c.
	(omp_context_name_list_prop): Update for Fortran strings.
	* omp-general.h (omp_check_context_selector): New prototype.
	(omp_mark_declare_variant): New prototype.

gcc/testsuite/

	* gfortran.dg/gomp/declare-variant-1.f90: New test.
	* gfortran.dg/gomp/declare-variant-10.f90: New test.
	* gfortran.dg/gomp/declare-variant-11.f90: New test.
	* gfortran.dg/gomp/declare-variant-12.f90: New test.
	* gfortran.dg/gomp/declare-variant-13.f90: New test.
	* gfortran.dg/gomp/declare-variant-14.f90: New test.
	* gfortran.dg/gomp/declare-variant-15.f90: New test.
	* gfortran.dg/gomp/declare-variant-16.f90: New test.
	* gfortran.dg/gomp/declare-variant-17.f90: New test.
	* gfortran.dg/gomp/declare-variant-18.f90: New test.
	* gfortran.dg/gomp/declare-variant-19.f90: New test.
	* gfortran.dg/gomp/declare-variant-2.f90: New test.
	* gfortran.dg/gomp/declare-variant-2a.f90: New test.
	* gfortran.dg/gomp/declare-variant-3.f90: New test.
	* gfortran.dg/gomp/declare-variant-4.f90: New test.
	* gfortran.dg/gomp/declare-variant-5.f90: New test.
	* gfortran.dg/gomp/declare-variant-6.f90: New test.
	* gfortran.dg/gomp/declare-variant-7.f90: New test.
	* gfortran.dg/gomp/declare-variant-8.f90: New test.
	* gfortran.dg/gomp/declare-variant-9.f90: New test.

libgomp/

	* testsuite/libgomp.fortran/declare-variant-1.f90: New test.
2021-10-14 09:16:36 -07:00
GCC Administrator
52055987fb Daily bump. 2021-10-13 00:16:22 +00:00
Julian Brown
ccfcf08e66 libgomp: Release device lock on cbuf error path
This patch releases the device lock on a sanity-checking error path in
transfer combining (cbuf) handling in libgomp:target.c.  This shouldn't
happen when handling well-formed mapping clauses, but erroneous clauses
can currently cause a hang if the condition triggers.

2021-10-12  Julian Brown  <julian@codesourcery.com>

libgomp/
	* target.c (gomp_copy_host2dev): Release device lock on cbuf
	error path.
2021-10-12 06:50:26 -07:00
Tobias Burnus
f5a538e164 Fortran version of libgomp.c-c++-common/icv-{3,4}.c
This adds the Fortran testsuite coverage of
omp_{get_max,set_num}_teams and omp_{s,g}et_teams_thread_limit

libgomp/
	* testsuite/libgomp.fortran/icv-3.f90: New.
	* testsuite/libgomp.fortran/icv-4.f90: New.
2021-10-12 10:54:18 +02:00
Jakub Jelinek
4096bf82a0 openmp: Add documentation for omp_{get_max,set_num}_teams and omp_{s,g}et_teams_thread_limit
This patch adds documentation for these new OpenMP 5.1 APIs as well as
two new environment variables - OMP_NUM_TEAMS and OMP_TEAMS_THREAD_LIMIT.

2021-10-12  Jakub Jelinek  <jakub@redhat.com>

	* libgomp.texi (omp_get_max_teams, omp_get_teams_thread_limit,
	omp_set_num_teams, omp_set_teams_thread_limit, OMP_NUM_TEAMS,
	OMP_TEAMS_THREAD_LIMIT): Document.
2021-10-12 09:35:43 +02:00
Jakub Jelinek
de7fa7063e openmp: Fix up warnings on libgomp.info build
When building libgomp documentation, I see
makeinfo --split-size=5000000  -I ../../../libgomp/../gcc/doc/include -I ../../../libgomp -o libgomp.info ../../../libgomp/libgomp.texi
../../../libgomp/libgomp.texi:503: warning: node next `omp_get_default_device' in menu `omp_get_device_num' and in sectioning `omp_get_dynamic' differ
../../../libgomp/libgomp.texi:528: warning: node prev `omp_get_dynamic' in menu `omp_get_device_num' and in sectioning `omp_get_default_device' differ
../../../libgomp/libgomp.texi:560: warning: node next `omp_get_initial_device' in menu `omp_get_level' and in sectioning `omp_get_device_num' differ
../../../libgomp/libgomp.texi:587: warning: node next `omp_get_device_num' in menu `omp_get_dynamic' and in sectioning `omp_get_level' differ
../../../libgomp/libgomp.texi:587: warning: node prev `omp_get_device_num' in menu `omp_get_default_device' and in sectioning `omp_get_initial_device' differ
../../../libgomp/libgomp.texi:615: warning: node prev `omp_get_level' in menu `omp_get_initial_device' and in sectioning `omp_get_device_num' differ
warnings.  This patch fixes those.

2021-10-12  Jakub Jelinek  <jakub@redhat.com>

	* libgomp.texi (omp_get_device_num): Move @node before omp_get_dynamic
	to avoid makeinfo warnings.
2021-10-12 09:34:38 +02:00