This chapter provides a description of the task classes which are used during the build phase.
=== Task execution
==== Main actors
The build context is only used to create the tasks and to return lists of tasks that may be executed in parallel. The scheduling is delegated to a task producer which lets task consumers to execute the tasks. The task producer keeps a record of the build state such as the amount of tasks processed or the errors.
image::tasks_actors{PIC}["Actors processing the tasks"{backend@docbook:,width=150:},align="center"]
// To reduce the build time, it is interesting to take advantage of the hardware (multiple cpu cores) or of the environment (distributed builds).
The amount of consumers is determined from the number of processors, or may be set manually by using the '-j' option:
[source,shishell]
------------------
$ waf -j3
------------------
==== Build groups
The task producer iterates over lists of tasks returned by the build context. Although the tasks from a list may be executed in parallel by the consumer threads, all the tasks from one list must be consumed before processing another list of tasks. The build ends when there are no more tasks to process.
These lists of tasks are called _build groups_ and may be accessed from the build scripts. Let's demonstrate this behaviour on an example:
// tasks_groups
[source,python]
---------------
def build(ctx):
for i in range(8):
ctx(rule='cp ${SRC} ${TGT}', source='wscript', target='wscript_a_%d' % i,
color='YELLOW', name='tasks a')
ctx(rule='cp ${SRC} ${TGT}', source='wscript_a_%d' % i, target='wscript_b_%d' % i,
color='GREEN', name='tasks b')
for i in range(8)
ctx(rule='cp ${SRC} ${TGT}', source='wscript', target='wscript_c_%d' % i,
color='BLUE', name='tasks c')
ctx(rule='cp ${SRC} ${TGT}', source='wscript_c_%d' % i, target='wscript_d_%d' % i,
color='PINK', name='tasks d')
---------------
Each green task must be executed after one yellow task and each pink task must be executed after one blue task. Because there is only one group by default, the parallel execution will be similar to the following:
<1> All task generators create their tasks before the build starts (default behaviour)
<2> Groups are processed sequentially: all tasks from previous groups are executed before the task generators from the next group are processed
<3> Combination of the two previous behaviours: task generators created by tasks in the next groups may create tasks
Build groups can be used for <<build_compiler_first,building a compiler to generate more source files>> to process.
==== The Producer-consumer system
In most python interpreters, a global interpreter lock prevents parallelization by more than one cpu core at a time. Therefore, it makes sense to restrict the task scheduling on a single task producer, and to let the threads access only the task execution.
The communication between producer and consumers is based on two queues _ready_ and _out_. The producer adds the tasks to _ready_ and reads them back from _out_. The consumers obtain the tasks from _ready_ and give them back to the producer into _out_ after executing 'task.run'.
The producer uses the an internal list named _outstanding_ to iterate over the tasks and to decide which ones to put in the queue _ready_. The tasks that cannot be processed are temporarily output in the list _frozen_ to avoid looping endlessly over the tasks waiting for others.
The following illustrates the relationship between the task producers and consumers as performed during the build:
A state is assigned to each task (_task.hasrun = state_) to keep track of the execution. The possible values are the following:
[options="header", cols="1,1,6"]
|=================
|State | Numeric value | Description
|NOT_RUN | 0 | The task has not been processed yet
|MISSING | 1 | The task outputs are missing
|CRASHED | 2 | The task method 'run' returned a non-0 value
|EXCEPTION| 3 | An exception occured in the Task method 'run'
|SKIPPED | 8 | The task was skipped (it was up-to-date)
|SUCCESS | 9 | The execution was successful
|=================
To decide to execute a task or not, the producer uses the value returned by the task method 'runnable_status'. The possible return values are the following:
[options="header", cols="1,6"]
|=================
|Code | Description
| ASK_LATER | The task may depend on other tasks which have not finished to run (not ready)
| SKIP_ME | The task does not have to be executed, it is up-to-date
| RUN_ME | The task is ready to be executed
|=================
The following diagram represents the interaction between the main task methods and the states and status:
The method _set_run_after_ is used to declare ordering constraints between tasks:
[source,python]
---------------
task1.set_run_after(task2)
---------------
The tasks to wait for are stored in the attribute _run_after_. They are used by the method _runnable_status_ to yield the status 'ASK_LATER' when a task has not run yet. This is merely for the build order and not for forcing a rebuild if one of the previous tasks is executed.
==== Computed constraints
===== Attribute after/before
The attributes _before_ and _after_ are used to declare ordering constraints between tasks:
[source,python]
---------------
from waflib.Task import TaskBase
class task_test_a(TaskBase):
before = ['task_test_b']
class task_test_b(TaskBase):
after = ['task_test_a']
---------------
===== ext_in/ext_out
Another way to force the order is by declaring lists of abstract symbols on the class attributes. This way the classes are not named explicitly, for example:
[source,python]
---------------
from waflib.Task import TaskBase
class task_test_a(TaskBase):
ext_in = ['.h']
class task_test_b(TaskBase):
ext_out = ['.h']
---------------
The 'extensions' ext_in and ext_out do not mean that the tasks have to produce files with such extensions, but are mere symbols for use as precedence constraints.
===== Order extraction
Before feeding the tasks to the producer-consumer system, a constraint extraction is performed on the tasks having input and output files. The attributes _run_after_ are initialized with the tasks to wait for.
The two functions called on lists of tasks are:
. _waflib.Task.set_precedence_constraints_: extract the build order from the task classes attributes ext_in/ext_out/before/after
. _waflib.Task.set_file_constraints_: extract the constraints from the tasks having input and output files
==== Weak order constraints
Tasks that are known to take a lot of time may be launched first to improve the build times. The general problem of finding an optimal order for launching tasks in parallel and with constraints is called http://en.wikipedia.org/wiki/Job-shop_problem[Job Shop]. In practice this problem can often be reduced to a critical path problem (approximation).
The following pictures illustrate the difference in scheduling a build with different independent tasks, in which a slow task is clearly identified, and launched first:
A function is used to reorder the tasks from a group before they are passed to the producer. We will replace it to reorder the long task in first position:
The direct instances of 'waflib.Task.TaskBase' are very limited and cannot be used to track file changes. The subclass 'waflib.Task.Task' provides the necessary features for the most common builds in which source files are used to produce target files.
The dependency tracking is based on the use of hashes of the dependencies called *task signatures*. The signature is computed from various dependencies source, such as input files and configuration set values.
The following diagram describes how 'waflib.Task.Task' instances are processed:
The following data is used in the signature computation:
. Explicit dependencies: _input nodes_ and dependencies set explicitly by using _bld.depends_on_
. Implicit dependencies: dependencies searched by a scanner method (the method _scan_)
. Values: configuration set values such as compilation flags
==== Explicit dependencies
===== Input and output nodes
The task objects do not directly depend on other tasks. Other tasks may exist or not, and be executed or nodes. Rather, the input and output nodes hold themselves signatures values, which come from different sources:
. Nodes for build files usually inherit the signature of the task that generated the file
. Nodes from elsewhere have a signature computed automatically from the file contents (hash)
===== Global dependencies on other nodes
The tasks may be informed that some files may depend on other files transitively without listing them in the inputs. This is achieved by the method _add_manual_dependency_ from the build context:
Some tasks can be created dynamically after the build has started, so the dependencies cannot be known in advance. Task subclasses can provide a method named _scan_ to obtain additional nodes implicitly. In the following example, the _copy_ task provides a scanner method to depend on the wscript file found next to the input file.
<2> The return value is a tuple containing a list of nodes to depend on and serializable data for custom uses
<3> Override the method runnable_status to add some logging
<4> Obtain a reference to the build context associated to this task
<5> The nodes returned by the scanner method are stored in the map *bld.node_deps*
<6> The custom data returned by the scanner method is stored in the map *bld.raw_deps*
<7> Create a task manually (encapsulation by task generators will be described in the next chapters)
[source,shishell]
---------------
$ waf
→ calling the scanner method <1>
nodes: [/tmp/tasks_scan/wscript]
custom data: 55.51
[1/1] copy: a.in -> build/b.out
'build' finished successfully (0.021s)
$ waf <2>
nodes: [/tmp/tasks_scan/wscript]
custom data: 1280561555.512006
'build' finished successfully (0.005s)
$ echo " " >> wscript <3>
$ waf
→ calling the scanner method
nodes: [/tmp/tasks_scan/wscript]
custom data: 64.31
[1/1] copy: a.in -> build/b.out
'build' finished successfully (0.022s)
---------------
<1> The scanner method is always called on a clean build
<2> The scanner method is not called when nothing has changed, although the data returned is retrieved
<3> When a dependency changes, the scanner method is executed once again (the custom data has changed)
WARNING: If the build order is incorrect, the method _scan_ may fail to find dependent nodes (missing nodes) or the signature calculation may throw an exception (missing signature for dependent nodes).
==== Values
The habitual use of command-line parameters such as compilation flags lead to the creation of _dependencies on values_, and more specifically the configuration set values. The Task class attribute 'vars' is used to control what values can enter in the signature calculation. In the following example, the task created has no inputs and no outputs nodes, and only depends on the values.
It is assumed that 'run_str' represents a command-line, and that the variables in _$\{}_ such as 'COPYFLAGS' represent variables to add to the dependencies. A metaclass processes 'run_str' to obtain the method 'run' (called to execute the task) and the variables in the attribute 'vars' (merged with existing variables). The function created is displayed in the following output:
[source,shishell]
---------------
$ waf --zones=action
13:36:49 action def f(tsk):
env = tsk.env
gen = tsk.generator
bld = gen.bld
wd = getattr(tsk, 'cwd', None)
def to_list(xx):
if isinstance(xx, str): return [xx]
return xx
lst = []
lst.extend(to_list(env['COPY']))
lst.extend(to_list(env['COPYFLAGS']))
lst.extend([a.path_from(bld.bldnode) for a in tsk.inputs])
lst.extend([a.path_from(bld.bldnode) for a in tsk.outputs])
lst = [x for x in lst if x]
return tsk.exec_command(lst, cwd=wd, env=env.env or None)
[1/1] copy: wscript -> build/b.out
['COPY', 'COPYFLAGS']
'build' finished successfully (0.007s)
---------------
All subclasses of 'waflib.Task.TaskBase' are stored on the module attribute 'waflib.Task.classes'. Therefore, the 'copy' task can be accessed by using:
[source,python]
---------------
from waflib import Task
cls = Task.classes['copy']
---------------
==== Scriptlet expressions
Although the 'run_str' is aimed at configuration set variables, a few special cases are provided for convenience:
. If the value starts by *env*, *gen*, *bld* or *tsk*, a method call will be made
. If the value starts by SRC[n] or TGT[n], a method call to the input/output node _n_ will be made
. SRC represents the list of task inputs seen from the root of the build directory
. TGT represents the list of task outputs seen from the root of the build directory
Here are a few examples:
[source,python]
---------------
${SRC[0].parent.abspath()} <1>
${bld.root.abspath()} <2>
${tsk.uid()} <3>
${CPPPATH_ST:INCPATHS} <4>
---------------
<1> Absolute path of the parent folder of the task first source file
<2> File system root
<3> Print the task unique identifier
<4> Perform a map replacement equivalent to _[env.CPPPATH_ST % x for x in env.INCPATHS]_
==== Direct class modifications
===== Always execute
The function 'waflib.Task.always_run' is used to force a task to be executed whenever a build is performed. It sets a method 'runnable_status' that always return _RUN_ME_.