loop.texi: New file.
* doc/loop.texi: New file. * doc/gccint.texi: Reference loop.texi. * Makefile.in (loop.texi): Add. From-SVN: r117277
This commit is contained in:
parent
c6c621d415
commit
01e0ef5af0
@ -1,3 +1,9 @@
|
||||
2006-09-28 Zdenek Dvorak <dvorakz@suse.cz>
|
||||
|
||||
* doc/loop.texi: New file.
|
||||
* doc/gccint.texi: Reference loop.texi.
|
||||
* Makefile.in (loop.texi): Add.
|
||||
|
||||
2006-09-27 Geoffrey Keating <geoffk@apple.com>
|
||||
|
||||
* config/darwin.h (ENABLE_EXECUTE_STACK): New, use getpagesize not
|
||||
|
@ -3380,7 +3380,8 @@ TEXI_GCCINT_FILES = gccint.texi gcc-common.texi gcc-vers.texi \
|
||||
rtl.texi md.texi tm.texi hostconfig.texi fragments.texi \
|
||||
configfiles.texi collect2.texi headerdirs.texi funding.texi \
|
||||
gnu.texi gpl.texi fdl.texi contrib.texi languages.texi \
|
||||
sourcebuild.texi gty.texi libgcc.texi cfg.texi tree-ssa.texi
|
||||
sourcebuild.texi gty.texi libgcc.texi cfg.texi tree-ssa.texi \
|
||||
loop.texi
|
||||
|
||||
TEXI_GCCINSTALL_FILES = install.texi install-old.texi fdl.texi
|
||||
|
||||
|
@ -111,6 +111,7 @@ Additional tutorial information is linked to from
|
||||
* RTL:: The intermediate representation that most passes work on.
|
||||
* Control Flow:: Maintaining and manipulating the control flow graph.
|
||||
* Tree SSA:: Analysis and optimization of the tree representation.
|
||||
* Loop Representation:: Analysis and representation of loops
|
||||
* Machine Desc:: How to write machine description instruction patterns.
|
||||
* Target Macros:: How to write the machine description C macros and functions.
|
||||
* Host Config:: Writing the @file{xm-@var{machine}.h} file.
|
||||
@ -141,6 +142,7 @@ Additional tutorial information is linked to from
|
||||
@include passes.texi
|
||||
@include c-tree.texi
|
||||
@include tree-ssa.texi
|
||||
@include loop.texi
|
||||
@include rtl.texi
|
||||
@include cfg.texi
|
||||
@include md.texi
|
||||
|
547
gcc/doc/loop.texi
Normal file
547
gcc/doc/loop.texi
Normal file
@ -0,0 +1,547 @@
|
||||
@c Copyright (c) 2006 Free Software Foundation, Inc.
|
||||
@c Free Software Foundation, Inc.
|
||||
@c This is part of the GCC manual.
|
||||
@c For copying conditions, see the file gcc.texi.
|
||||
|
||||
@c ---------------------------------------------------------------------
|
||||
@c Loop Representation
|
||||
@c ---------------------------------------------------------------------
|
||||
|
||||
@node Loop Representation
|
||||
@chapter Analysis and Representation of Loops
|
||||
|
||||
GCC provides extensive infrastructure for work with natural loops, i.e.,
|
||||
strongly connected components of CFG with only one entry block. This
|
||||
chapter describes representation of loops in GCC, both on GIMPLE and in
|
||||
RTL, as well as the interfaces to loop-related analyses (induction
|
||||
variable analysis and number of iterations analysis).
|
||||
|
||||
@menu
|
||||
* Loop representation:: Representation and analysis of loops.
|
||||
* Loop querying:: Getting information about loops.
|
||||
* Loop manipulation:: Loop manipulation functions.
|
||||
* LCSSA:: Loop-closed SSA form.
|
||||
* Scalar evolutions:: Induction variables on GIMPLE.
|
||||
* loop-iv:: Induction variables on RTL.
|
||||
* Number of iterations:: Number of iterations analysis.
|
||||
* Dependency analysis:: Data dependency analysis.
|
||||
* Lambda:: Linear loop transformations framework.
|
||||
@end menu
|
||||
|
||||
@node Loop representation
|
||||
@section Loop representation
|
||||
@cindex Loop representation
|
||||
@cindex Loop analysis
|
||||
|
||||
This chapter describes the representation of loops in GCC, and functions
|
||||
that can be used to build, modify and analyze this representation. Most
|
||||
of the interfaces and data structures are declared in @file{cfgloop.h}.
|
||||
At the moment, loop structures are analyzed and this information is
|
||||
updated only by the optimization passes that deal with loops, but some
|
||||
efforts are being made to make it available throughout most of the
|
||||
optimization passes.
|
||||
|
||||
In general, a natural loop has one entry block (header) and possibly
|
||||
several back edges (latches) leading to the header from the inside of
|
||||
the loop. Loops with several latches may appear if several loops share
|
||||
a single header, or if there is a branching in the middle of the loop.
|
||||
The representation of loops in GCC however allows only loops with a
|
||||
single latch. During loop analysis, headers of such loops are split and
|
||||
forwarder blocks are created in order to disambiguate their structures.
|
||||
A heuristic based on profile information is used to determine whether
|
||||
the latches correspond to sub-loops or to control flow in a single loop.
|
||||
This means that the analysis sometimes changes the CFG, and if you run
|
||||
it in the middle of an optimization pass, you must be able to deal with
|
||||
the new blocks.
|
||||
|
||||
Body of the loop is the set of blocks that are dominated by its header,
|
||||
and reachable from its latch against the direction of edges in CFG. The
|
||||
loops are organized in a containment hierarchy (tree) such that all the
|
||||
loops immediately contained inside loop L are the children of L in the
|
||||
tree. This tree is represented by the @code{struct loops} structure.
|
||||
The root of this tree is a fake loop that contains all blocks in the
|
||||
function. Each of the loops is represented in a @code{struct loop}
|
||||
structure. Each loop is assigned an index (@code{num} field of the
|
||||
@code{struct loop} structure), and the pointer to the loop is stored in
|
||||
the corresponding field of the @code{parray} field of the loops
|
||||
structure. Index of a sub-loop is always greater than the index of its
|
||||
super-loop. The indices do not have to be continuous, there may be
|
||||
empty (@code{NULL}) entries in the @code{parray} created by deleting
|
||||
loops. The index of a loop never changes. The first unused index is
|
||||
stored in the @code{num} field of the loops structure.
|
||||
|
||||
Each basic block contains the reference to the innermost loop it belongs
|
||||
to (@code{loop_father}). For this reason, it is only possible to have
|
||||
one @code{struct loops} structure initialized at the same time for each
|
||||
CFG. It is recommended to use the global variable @code{current_loops}
|
||||
to contain the @code{struct loops} structure, especially if the loop
|
||||
structures are updated throughout several passes. Many of the loop
|
||||
manipulation functions assume that dominance information is up-to-date.
|
||||
|
||||
The loops are analyzed through @code{loop_optimizer_init} function. The
|
||||
argument of this function is a set of flags represented in an integer
|
||||
bitmask. These flags specify what other properties of the loop
|
||||
structures should be calculated/enforced and preserved later:
|
||||
|
||||
@itemize
|
||||
@item @code{LOOPS_HAVE_PREHEADERS}: Forwarder blocks are created in such
|
||||
a way that each loop has only one entry edge, and additionally, the
|
||||
source block of this entry edge has only one successor. This creates a
|
||||
natural place where the code can be moved out of the loop, and ensures
|
||||
that the entry edge of the loop leads from its immediate super-loop.
|
||||
@item @code{LOOPS_HAVE_SIMPLE_LATCHES}: Forwarder blocks are created to
|
||||
force the latch block of each loop to have only one successor. This
|
||||
ensures that the latch of the loop does not belong to any of its
|
||||
sub-loops, and makes manipulation with the loops significantly easier.
|
||||
Most of the loop manipulation functions assume that the loops are in
|
||||
this shape. Note that with this flag, the ``normal'' loop without any
|
||||
control flow inside and with one exit consists of two basic blocks.
|
||||
@item @code{LOOPS_HAVE_MARKED_IRREDUCIBLE_REGIONS}: Basic blocks and
|
||||
edges in the strongly connected components that are not natural loops
|
||||
(have more than one entry block) are marked with
|
||||
@code{BB_IRREDUCIBLE_LOOP} and @code{EDGE_IRREDUCIBLE_LOOP} flags. The
|
||||
flag is not set for blocks and edges that belong to natural loops that
|
||||
are in such an irreducible region (but it is set for the entry and exit
|
||||
edges of such a loop, if they lead to/from this region).
|
||||
@item @code{LOOPS_HAVE_MARKED_SINGLE_EXITS}: If a loop has exactly one
|
||||
exit edge, this edge is stored in @code{single_exit} field of the loop
|
||||
structure. @code{NULL} is stored there otherwise.
|
||||
@end itemize
|
||||
|
||||
These properties may also be computed/enforced later, using functions
|
||||
@code{create_preheaders}, @code{force_single_succ_latches},
|
||||
@code{mark_irreducible_loops} and @code{mark_single_exit_loops}.
|
||||
|
||||
The memory occupied by the loops structures should be freed with
|
||||
@code{loop_optimizer_finalize} function.
|
||||
|
||||
The CFG manipulation functions in general do not update loop structures.
|
||||
Specialized versions that additionally do so are provided for the most
|
||||
common tasks. On GIMPLE, @code{cleanup_tree_cfg_loop} function can be
|
||||
used to cleanup CFG while updating the loops structures if
|
||||
@code{current_loops} is set.
|
||||
|
||||
@node Loop querying
|
||||
@section Loop querying
|
||||
@cindex Loop querying
|
||||
|
||||
The functions to query the information about loops are declared in
|
||||
@file{cfgloop.h}. Some of the information can be taken directly from
|
||||
the structures. @code{loop_father} field of each basic block contains
|
||||
the innermost loop to that the block belongs. The most useful fields of
|
||||
loop structure (that are kept up-to-date at all times) are:
|
||||
|
||||
@itemize
|
||||
@item @code{header}, @code{latch}: Header and latch basic blocks of the
|
||||
loop.
|
||||
@item @code{num_nodes}: Number of basic blocks in the loop (including
|
||||
the basic blocks of the sub-loops).
|
||||
@item @code{depth}: The depth of the loop in the loops tree, i.e., the
|
||||
number of super-loops of the loop.
|
||||
@item @code{outer}, @code{inner}, @code{next}: The super-loop, the first
|
||||
sub-loop, and the sibling of the loop in the loops tree.
|
||||
@item @code{single_exit}: The exit edge of the loop, if the loop has
|
||||
exactly one exit and the loops were analyzed with
|
||||
LOOPS_HAVE_MARKED_SINGLE_EXITS.
|
||||
@end itemize
|
||||
|
||||
There are other fields in the loop structures, many of them used only by
|
||||
some of the passes, or not updated during CFG changes; in general, they
|
||||
should not be accessed directly.
|
||||
|
||||
The most important functions to query loop structures are:
|
||||
|
||||
@itemize
|
||||
@item @code{flow_loops_dump}: Dumps the information about loops to a
|
||||
file.
|
||||
@item @code{verify_loop_structure}: Checks consistency of the loop
|
||||
structures.
|
||||
@item @code{loop_latch_edge}: Returns the latch edge of a loop.
|
||||
@item @code{loop_preheader_edge}: If loops have preheaders, returns
|
||||
the preheader edge of a loop.
|
||||
@item @code{flow_loop_nested_p}: Tests whether loop is a sub-loop of
|
||||
another loop.
|
||||
@item @code{flow_bb_inside_loop_p}: Tests whether a basic block belongs
|
||||
to a loop (including its sub-loops).
|
||||
@item @code{find_common_loop}: Finds the common super-loop of two loops.
|
||||
@item @code{superloop_at_depth}: Returns the super-loop of a loop with
|
||||
the given depth.
|
||||
@item @code{tree_num_loop_insns}, @code{num_loop_insns}: Estimates the
|
||||
number of insns in the loop, on GIMPLE and on RTL.
|
||||
@item @code{loop_exit_edge_p}: Tests whether edge is an exit from a
|
||||
loop.
|
||||
@item @code{mark_loop_exit_edges}: Marks all exit edges of all loops
|
||||
with @code{EDGE_LOOP_EXIT} flag.
|
||||
@item @code{get_loop_body}, @code{get_loop_body_in_dom_order},
|
||||
@code{get_loop_body_in_bfs_order}: Enumerates the basic blocks in the
|
||||
loop in depth-first search order in reversed CFG, ordered by dominance
|
||||
relation, and breath-first search order, respectively.
|
||||
@item @code{get_loop_exit_edges}: Enumerates the exit edges of a loop.
|
||||
@item @code{just_once_each_iteration_p}: Returns true if the basic block
|
||||
is executed exactly once during each iteration of a loop (that is, it
|
||||
does not belong to a sub-loop, and it dominates the latch of the loop).
|
||||
@end itemize
|
||||
|
||||
@node Loop manipulation
|
||||
@section Loop manipulation
|
||||
@cindex Loop manipulation
|
||||
|
||||
The loops tree can be manipulated using the following functions:
|
||||
|
||||
@itemize
|
||||
@item @code{flow_loop_tree_node_add}: Adds a node to the tree.
|
||||
@item @code{flow_loop_tree_node_remove}: Removes a node from the tree.
|
||||
@item @code{add_bb_to_loop}: Adds a basic block to a loop.
|
||||
@item @code{remove_bb_from_loops}: Removes a basic block from loops.
|
||||
@end itemize
|
||||
|
||||
The specialized versions of several low-level CFG functions that also
|
||||
update loop structures are provided:
|
||||
|
||||
@itemize
|
||||
@item @code{loop_split_edge_with}: Splits an edge, and places a
|
||||
specified RTL code on it. On GIMPLE, the function can still be used,
|
||||
but the code must be NULL.
|
||||
@item @code{bsi_insert_on_edge_immediate_loop}: Inserts code on edge,
|
||||
splitting it if necessary. Only works on GIMPLE.
|
||||
@item @code{remove_path}: Removes an edge and all blocks it dominates.
|
||||
@item @code{loop_commit_inserts}: Commits insertions scheduled on edges,
|
||||
and sets loops for the new blocks. This function can only be used on
|
||||
GIMPLE.
|
||||
@item @code{split_loop_exit_edge}: Splits exit edge of the loop,
|
||||
ensuring that PHI node arguments remain in the loop (this ensures that
|
||||
loop-closed SSA form is preserved). Only useful on GIMPLE.
|
||||
@end itemize
|
||||
|
||||
Finally, there are some higher-level loop transformations implemented.
|
||||
While some of them are written so that they should work on non-innermost
|
||||
loops, they are mostly untested in that case, and at the moment, they
|
||||
are only reliable for the innermost loops:
|
||||
|
||||
@itemize
|
||||
@item @code{create_iv}: Creates a new induction variable. Only works on
|
||||
GIMPLE. @code{standard_iv_increment_position} can be used to find a
|
||||
suitable place for the iv increment.
|
||||
@item @code{duplicate_loop_to_header_edge},
|
||||
@code{tree_duplicate_loop_to_header_edge}: These functions (on RTL and
|
||||
on GIMPLE) duplicate the body of the loop prescribed number of times on
|
||||
one of the edges entering loop header, thus performing either loop
|
||||
unrolling or loop peeling. @code{can_duplicate_loop_p}
|
||||
(@code{can_unroll_loop_p} on GIMPLE) must be true for the duplicated
|
||||
loop.
|
||||
@item @code{loop_version}, @code{tree_ssa_loop_version}: These function
|
||||
create a copy of a loop, and a branch before them that selects one of
|
||||
them depending on the prescribed condition. This is useful for
|
||||
optimizations that need to verify some assumptions in runtime (one of
|
||||
the copies of the loop is usually left unchanged, while the other one is
|
||||
transformed in some way).
|
||||
@item @code{tree_unroll_loop}: Unrolls the loop, including peeling the
|
||||
extra iterations to make the number of iterations divisible by unroll
|
||||
factor, updating the exit condition, and removing the exits that now
|
||||
cannot be taken. Works only on GIMPLE.
|
||||
@end itemize
|
||||
|
||||
@node LCSSA
|
||||
@section Loop-closed SSA form
|
||||
@cindex LCSSA
|
||||
@cindex Loop-closed SSA form
|
||||
|
||||
Throughout the loop optimizations on tree level, one extra condition is
|
||||
enforced on the SSA form: No SSA name is used outside of the loop in
|
||||
that it is defined. The SSA form satisfying this condition is called
|
||||
``loop-closed SSA form'' -- LCSSA. To enforce LCSSA, PHI nodes must be
|
||||
created at the exits of the loops for the SSA names that are used
|
||||
outside of them. Only the real operands (not virtual SSA names) are
|
||||
held in LCSSA, in order to save memory.
|
||||
|
||||
There are various benefits of LCSSA:
|
||||
|
||||
@itemize
|
||||
@item Many optimizations (value range analysis, final value
|
||||
replacement) are interested in the values that are defined in the loop
|
||||
and used outside of it, i.e., exactly those for that we create new PHI
|
||||
nodes.
|
||||
@item In induction variable analysis, it is not necessary to specify the
|
||||
loop in that the analysis should be performed -- the scalar evolution
|
||||
analysis always returns the results with respect to the loop in that the
|
||||
SSA name is defined.
|
||||
@item It makes updating of SSA form during loop transformations simpler.
|
||||
Without LCSSA, operations like loop unrolling may force creation of PHI
|
||||
nodes arbitrarily far from the loop, while in LCSSA, the SSA form can be
|
||||
updated locally. However, since we only keep real operands in LCSSA, we
|
||||
cannot use this advantage (we could have local updating of real
|
||||
operands, but it is not much more efficient than to use generic SSA form
|
||||
updating for it as well; the amount of changes to SSA is the same).
|
||||
@end itemize
|
||||
|
||||
However, it also means LCSSA must be updated. This is usually
|
||||
straightforward, unless you create a new value in loop and use it
|
||||
outside, or unless you manipulate loop exit edges (functions are
|
||||
provided to make these manipulations simple).
|
||||
@code{rewrite_into_loop_closed_ssa} is used to rewrite SSA form to
|
||||
LCSSA, and @code{verify_loop_closed_ssa} to check that the invariant of
|
||||
LCSSA is preserved.
|
||||
|
||||
@node Scalar evolutions
|
||||
@section Scalar evolutions
|
||||
@cindex Scalar evolutions
|
||||
@cindex IV analysis on GIMPLE
|
||||
|
||||
Scalar evolutions (SCEV) are used to represent results of induction
|
||||
variable analysis on GIMPLE. They enable us to represent variables with
|
||||
complicated behavior in a simple and consistent way (we only use it to
|
||||
express values of polynomial induction variables, but it is possible to
|
||||
extend it). The interfaces to SCEV analysis are declared in
|
||||
@file{tree-scalar-evolution.h}. To use scalar evolutions analysis,
|
||||
@code{scev_initialize} must be used. To stop using SCEV,
|
||||
@code{scev_finalize} should be used. SCEV analysis caches results in
|
||||
order to save time and memory. This cache however is made invalid by
|
||||
most of the loop transformations, including removal of code. If such a
|
||||
transformation is performed, @code{scev_reset} must be called to clean
|
||||
the caches.
|
||||
|
||||
Given an SSA name, its behavior in loops can be analyzed using the
|
||||
@code{analyze_scalar_evolution} function. The returned SCEV however
|
||||
does not have to be fully analyzed and it may contain references to
|
||||
other SSA names defined in the loop. To resolve these (potentially
|
||||
recursive) references, @code{instantiate_parameters} or
|
||||
@code{resolve_mixers} functions must be used.
|
||||
@code{instantiate_parameters} is useful when you use the results of SCEV
|
||||
only for some analysis, and when you work with whole nest of loops at
|
||||
once. It will try replacing all SSA names by their SCEV in all loops,
|
||||
including the super-loops of the current loop, thus providing a complete
|
||||
information about the behavior of the variable in the loop nest.
|
||||
@code{resolve_mixers} is useful if you work with only one loop at a
|
||||
time, and if you possibly need to create code based on the value of the
|
||||
induction variable. It will only resolve the SSA names defined in the
|
||||
current loop, leaving the SSA names defined outside unchanged, even if
|
||||
their evolution in the outer loops is known.
|
||||
|
||||
The SCEV is a normal tree expression, except for the fact that it may
|
||||
contain several special tree nodes. One of them is
|
||||
@code{SCEV_NOT_KNOWN}, used for SSA names whose value cannot be
|
||||
expressed. The other one is @code{POLYNOMIAL_CHREC}. Polynomial chrec
|
||||
has three arguments -- base, step and loop (both base and step may
|
||||
contain further polynomial chrecs). Type of the expression and of base
|
||||
and step must be the same. A variable has evolution
|
||||
@code{POLYNOMIAL_CHREC(base, step, loop)} if it is (in the specified
|
||||
loop) equivalent to @code{x_1} in the following example
|
||||
|
||||
@smallexample
|
||||
while (...)
|
||||
@{
|
||||
x_1 = phi (base, x_2);
|
||||
x_2 = x_1 + step;
|
||||
@}
|
||||
@end smallexample
|
||||
|
||||
Note that this includes the language restrictions on the operations.
|
||||
For example, if we compile C code and @code{x} has signed type, then the
|
||||
overflow in addition would cause undefined behavior, and we may assume
|
||||
that this does not happen. Hence, the value with this SCEV cannot
|
||||
overflow (which restricts the number of iterations of such a loop).
|
||||
|
||||
In many cases, one wants to restrict the attention just to affine
|
||||
induction variables. In this case, the extra expressive power of SCEV
|
||||
is not useful, and may complicate the optimizations. In this case,
|
||||
@code{simple_iv} function may be used to analyze a value -- the result
|
||||
is a loop-invariant base and step.
|
||||
|
||||
@node loop-iv
|
||||
@section IV analysis on RTL
|
||||
@cindex IV analysis on RTL
|
||||
|
||||
The induction variable on RTL is simple and only allows analysis of
|
||||
affine induction variables, and only in one loop at once. The interface
|
||||
is declared in @file{cfgloop.h}. Before analyzing induction variables
|
||||
in a loop L, @code{iv_analysis_loop_init} function must be called on L.
|
||||
After the analysis (possibly calling @code{iv_analysis_loop_init} for
|
||||
several loops) is finished, @code{iv_analysis_done} should be called.
|
||||
The following functions can be used to access the results of the
|
||||
analysis:
|
||||
|
||||
@itemize
|
||||
@item @code{iv_analyze}: Analyzes a single register used in the given
|
||||
insn. If no use of the register in this insn is found, the following
|
||||
insns are scanned, so that this function can be called on the insn
|
||||
returned by get_condition.
|
||||
@item @code{iv_analyze_result}: Analyzes result of the assignment in the
|
||||
given insn.
|
||||
@item @code{iv_analyze_expr}: Analyzes a more complicated expression.
|
||||
All its operands are analyzed by @code{iv_analyze}, and hence they must
|
||||
be used in the specified insn or one of the following insns.
|
||||
@end itemize
|
||||
|
||||
The description of the induction variable is provided in @code{struct
|
||||
rtx_iv}. In order to handle subregs, the representation is a bit
|
||||
complicated; if the value of the @code{extend} field is not
|
||||
@code{UNKNOWN}, the value of the induction variable in the i-th
|
||||
iteration is
|
||||
|
||||
@smallexample
|
||||
delta + mult * extend_@{extend_mode@} (subreg_@{mode@} (base + i * step)),
|
||||
@end smallexample
|
||||
|
||||
with the following exception: if @code{first_special} is true, then the
|
||||
value in the first iteration (when @code{i} is zero) is @code{delta +
|
||||
mult * base}. However, if @code{extend} is equal to @code{UNKNOWN},
|
||||
then @code{first_special} must be false, @code{delta} 0, @code{mult} 1
|
||||
and the value in the i-th iteration is
|
||||
|
||||
@smallexample
|
||||
subreg_@{mode@} (base + i * step)
|
||||
@end smallexample
|
||||
|
||||
The function @code{get_iv_value} can be used to perform these
|
||||
calculations.
|
||||
|
||||
@node Number of iterations
|
||||
@section Number of iterations analysis
|
||||
@cindex Number of iterations analysis
|
||||
|
||||
Both on GIMPLE and on RTL, there are functions available to determine
|
||||
the number of iterations of a loop, with a similar interface. In many
|
||||
cases, it is not possible to determine number of iterations
|
||||
unconditionally -- the determined number is correct only if some
|
||||
assumptions are satisfied. The analysis tries to verify these
|
||||
conditions using the information contained in the program; if it fails,
|
||||
the conditions are returned together with the result. The following
|
||||
information and conditions are provided by the analysis:
|
||||
|
||||
@itemize
|
||||
@item @code{assumptions}: If this condition is false, the rest of
|
||||
the information is invalid.
|
||||
@item @code{noloop_assumptions} on RTL, @code{may_be_zero} on GIMPLE: If
|
||||
this condition is true, the loop exits in the first iteration.
|
||||
@item @code{infinite}: If this condition is true, the loop is infinite.
|
||||
This condition is only available on RTL. On GIMPLE, conditions for
|
||||
finiteness of the loop are included in @code{assumptions}.
|
||||
@item @code{niter_expr} on RTL, @code{niter} on GIMPLE: The expression
|
||||
that gives number of iterations. The number of iterations is defined as
|
||||
the number of executions of the loop latch.
|
||||
@end itemize
|
||||
|
||||
Both on GIMPLE and on RTL, it necessary for the induction variable
|
||||
analysis framework to be initialized (SCEV on GIMPLE, loop-iv on RTL).
|
||||
On GIMPLE, the results are stored to @code{struct tree_niter_desc}
|
||||
structure. Number of iterations before the loop is exited through a
|
||||
given exit can be determined using @code{number_of_iterations_exit}
|
||||
function. On RTL, the results are returned in @code{struct niter_desc}
|
||||
structure. The corresponding function is named
|
||||
@code{check_simple_exit}. There are also functions that pass through
|
||||
all the exits of a loop and try to find one with easy to determine
|
||||
number of iterations -- @code{find_loop_niter} on GIMPLE and
|
||||
@code{find_simple_exit} on RTL. Finally, there are functions that
|
||||
provide the same information, but additionally cache it, so that
|
||||
repeated calls to number of iterations are not so costly --
|
||||
@code{number_of_iterations_in_loop} on GIMPLE and
|
||||
@code{get_simple_loop_desc} on RTL.
|
||||
|
||||
Note that some of these functions may behave slightly differently than
|
||||
others -- some of them return only the expression for the number of
|
||||
iterations, and fail if there are some assumptions. The function
|
||||
@code{number_of_iterations_in_loop} works only for single-exit loops,
|
||||
and it returns the value for number of iterations higher by one with
|
||||
respect to all other functions (i.e., it returns number of executions of
|
||||
the exit statement, not of the loop latch).
|
||||
|
||||
@node Dependency analysis
|
||||
@section Data Dependency Analysis
|
||||
@cindex Data Dependency Analysis
|
||||
|
||||
The code for the data dependence analysis can be found in
|
||||
@file{tree-data-ref.c} and its interface and data structures are
|
||||
described in @file{tree-data-ref.h}. The function that computes the
|
||||
data dependences for all the array and pointer references for a given
|
||||
loop is @code{compute_data_dependences_for_loop}. This function is
|
||||
currently used by the linear loop transform and the vectorization
|
||||
passes. Before calling this function, one has to allocate two vectors:
|
||||
a first vector will contain the set of data references that are
|
||||
contained in the analyzed loop body, and the second vector will contain
|
||||
the dependence relations between the data references. Thus if the
|
||||
vector of data references is of size @code{n}, the vector containing the
|
||||
dependence relations will contain @code{n*n} elements. However if the
|
||||
analyzed loop contains side effects, such as calls that potentially can
|
||||
interfere with the data references in the current analyzed loop, the
|
||||
analysis stops while scanning the loop body for data references, and
|
||||
inserts a single @code{chrec_dont_know} in the dependence relation
|
||||
array.
|
||||
|
||||
The data references are discovered in a particular order during the
|
||||
scanning of the loop body: the loop body is analyzed in execution order,
|
||||
and the data references of each statement are pushed at the end of the
|
||||
data reference array. Two data references syntactically occur in the
|
||||
program in the same order as in the array of data references. This
|
||||
syntactic order is important in some classical data dependence tests,
|
||||
and mapping this order to the elements of this array avoids costly
|
||||
queries to the loop body representation.
|
||||
|
||||
The structure describing the relation between two data references is
|
||||
@code{data_dependence_relation} and the shorter name for a pointer to
|
||||
such a structure is @code{ddr_p}. This structure contains:
|
||||
|
||||
@itemize
|
||||
@item a pointer to each data reference,
|
||||
@item a tree node @code{are_dependent} that is set to @code{chrec_known}
|
||||
if the analysis has proved that there is no dependence between these two
|
||||
data references, @code{chrec_dont_know} if the analysis was not able to
|
||||
determine any useful result and potentially there could exist a
|
||||
dependence between these data references, and @code{are_dependent} is
|
||||
set to @code{NULL_TREE} if there exist a dependence relation between the
|
||||
data references, and the description of this dependence relation is
|
||||
given in the @code{subscripts}, @code{dir_vects}, and @code{dist_vects}
|
||||
arrays,
|
||||
@item a boolean that determines whether the dependence relation can be
|
||||
represented by a classical distance vector,
|
||||
@item an array @code{subscripts} that contains a description of each
|
||||
subscript of the data references. Given two array accesses a
|
||||
subscript is the tuple composed of the access functions for a given
|
||||
dimension. For example, given @code{A[f1][f2][f3]} and
|
||||
@code{B[g1][g2][g3]}, there are three subscripts: @code{(f1, g1), (f2,
|
||||
g2), (f3, g3)}.
|
||||
@item two arrays @code{dir_vects} and @code{dist_vects} that contain
|
||||
classical representations of the data dependences under the form of
|
||||
direction and distance dependence vectors,
|
||||
@item an array of loops @code{loop_nest} that contains the loops to
|
||||
which the distance and direction vectors refer to.
|
||||
@end itemize
|
||||
|
||||
Several functions for pretty printing the information extracted by the
|
||||
data dependence analysis are available: @code{dump_ddrs} prints with a
|
||||
maximum verbosity the details of a data dependence relations array,
|
||||
@code{dump_dist_dir_vectors} prints only the classical distance and
|
||||
direction vectors for a data dependence relations array, and
|
||||
@code{dump_data_references} prints the details of the data references
|
||||
contained in a data reference array.
|
||||
|
||||
@node Lambda
|
||||
@section Linear loop transformations framework
|
||||
@cindex Linear loop transformations framework
|
||||
|
||||
Lambda is a framework that allows transformations of loops using
|
||||
non-singular matrix based transformations of the iteration space and
|
||||
loop bounds. This allows compositions of skewing, scaling, interchange,
|
||||
and reversal transformations. These transformations are often used to
|
||||
improve cache behavior or remove inner loop dependencies to allow
|
||||
parallelization and vectorization to take place.
|
||||
|
||||
To perform these transformations, Lambda requires that the loopnest be
|
||||
converted into a internal form that can be matrix transformed easily.
|
||||
To do this conversion, the function
|
||||
@code{gcc_loopnest_to_lambda_loopnest} is provided. If the loop cannot
|
||||
be transformed using lambda, this function will return NULL.
|
||||
|
||||
Once a @code{lambda_loopnest} is obtained from the conversion function,
|
||||
it can be transformed by using @code{lambda_loopnest_transform}, which
|
||||
takes a transformation matrix to apply. Note that it is up to the
|
||||
caller to verify that the transformation matrix is legal to apply to the
|
||||
loop (dependence respecting, etc). Lambda simply applies whatever
|
||||
matrix it is told to provide. It can be extended to make legal matrices
|
||||
out of any non-singular matrix, but this is not currently implemented.
|
||||
Legality of a matrix for a given loopnest can be verified using
|
||||
@code{lambda_transform_legal_p}.
|
||||
|
||||
Given a transformed loopnest, conversion back into gcc IR is done by
|
||||
@code{lambda_loopnest_to_gcc_loopnest}. This function will modify the
|
||||
loops so that they match the transformed loopnest.
|
||||
|
Loading…
Reference in New Issue
Block a user