waf/docs/book/tasks.txt


== Task processing

This chapter provides a description of the task classes which are used during the build phase.

=== Task execution

==== Main actors

The build context is only used to create the tasks and to return lists of tasks that may be executed in parallel. The scheduling is delegated to a task producer which lets task consumers to execute the tasks. The task producer keeps a record of the build state such as the amount of tasks processed or the errors.

image::tasks_actors{PIC}["Actors processing the tasks"{backend@docbook:,width=250:},align="center"]

// To reduce the build time, it is interesting to take advantage of the hardware (multiple cpu cores) or of the environment (distributed builds).
The amount of consumers is determined from the number of processors, or may be set manually by using the '-j' option:

[source,shishell]
------------------
$ waf -j3
------------------

==== Build groups

The task producer iterates over lists of tasks returned by the build context. Although the tasks from a list may be executed in parallel by the consumer threads, all the tasks from one list must be consumed before processing another list of tasks. The build ends when there are no more tasks to process.

These lists of tasks are called _build groups_ and may be accessed from the build scripts. Let's demonstrate this behaviour on an example:

// tasks_groups
[source,python]
---------------
def build(ctx):
    for i in range(8):
        ctx(rule='cp ${SRC} ${TGT}', source='wscript', target='wscript_a_%d' % i,
            color='YELLOW', name='tasks a')
        ctx(rule='cp ${SRC} ${TGT}', source='wscript_a_%d' % i, target='wscript_b_%d' % i,
            color='GREEN', name='tasks b')
    for i in range(8)
        ctx(rule='cp ${SRC} ${TGT}', source='wscript', target='wscript_c_%d' % i,
            color='BLUE', name='tasks c')
        ctx(rule='cp ${SRC} ${TGT}', source='wscript_c_%d' % i, target='wscript_d_%d' % i,
            color='PINK', name='tasks d')
---------------

Each green task must be executed after one yellow task and each pink task must be executed after one blue task. Because there is only one group by default, the parallel execution will be similar to the following:

image::tasks_nogroup{PIC}["One build group"{backend@docbook:,width=440:},align="center"]

We will now modify the example to add one more build group.

[source,python]
---------------
def build(ctx):
    for i in range(8):
        ctx(rule='cp ${SRC} ${TGT}', source='wscript', target='wscript_a_%d' % i,
            color='YELLOW', name='tasks a')
        ctx(rule='cp ${SRC} ${TGT}', source='wscript_a_%d' % i, target='wscript_b_%d' % i,
            color='GREEN', name='tasks b')
    ctx.add_group()
    for i in range(8):
        ctx(rule='cp ${SRC} ${TGT}', source='wscript', target='wscript_c_%d' % i,
            color='BLUE', name='tasks c')
        ctx(rule='cp ${SRC} ${TGT}', source='wscript_c_%d' % i, target='wscript_d_%d' % i,
            color='PINK', name='tasks d')
---------------

Now a separator will appear between the group of yellow and green tasks and the group of blue and violet taks:

image::tasks_twogroups{PIC}["Two build groups"{backend@docbook:,width=440:},align="center"]

The tasks and tasks generator are added implicitely to the current group. By giving a name to the groups, it is easy to control what goes where:

// tasks_groups2
[source,python]
---------------
def build(ctx):

    ctx.add_group('group1')
    ctx.add_group('group2')

    for i in range(8):
        ctx.set_group('group1')
        ctx(rule='cp ${SRC} ${TGT}', source='wscript', target='wscript_a_%d' % i,
            color='YELLOW', name='tasks a')
        ctx(rule='cp ${SRC} ${TGT}', source='wscript_a_%d' % i, target='wscript_b_%d' % i,
            color='GREEN', name='tasks b')

        ctx.set_group('group2')
        ctx(rule='cp ${SRC} ${TGT}', source='wscript', target='wscript_c_%d' % i,
            color='BLUE', name='tasks c')
        ctx(rule='cp ${SRC} ${TGT}', source='wscript_c_%d' % i, target='wscript_d_%d' % i,
            color='PINK', name='tasks d')
---------------

In the previous examples, all task generators from all build groups are processed before the build
actually starts. This default is provided to ensure that the task count is as accurate as possible.
Here is how to tune the build groups:

[source,python]
---------------
def build(ctx):
	from waflib.Build import POST_LAZY, POST_AT_ONCE
	ctx.post_mode = POST_AT_ONCE <1>
	#ctx.post_mode = POST_LAZY <2>
---------------

<1> All task generators create their tasks before the build starts (default behaviour)
<2> Groups are processed sequentially: all tasks from previous groups are executed before the task generators from the next group are processed

Build groups can be used for <<build_compiler_first,building a compiler to generate more source files>> to process.

==== The Producer-consumer system

In most python interpreters, a global interpreter lock prevents parallelization by more than one cpu core at a time. Therefore, it makes sense to restrict the task scheduling on a single task producer, and to let the threads access only the task execution.

The communication between producer and consumers is based on two queues _ready_ and _out_. The producer adds the tasks to _ready_ and reads them back from _out_. The consumers obtain the tasks from _ready_ and give them back to the producer into _out_ after executing 'task.run'.

The producer uses the an internal list named _outstanding_ to iterate over the tasks and to decide which ones to put in the queue _ready_. The tasks that cannot be processed are temporarily output in the list _frozen_ to avoid looping endlessly over the tasks waiting for others.

The following illustrates the relationship between the task producers and consumers as performed during the build:

image::prodcons{PIC}["Parallel execution"{backend@docbook:,width=900:},align="center"]

==== Task states and status

A state is assigned to each task (_task.hasrun = state_) to keep track of the execution. The possible values are the following:

[options="header", cols="1,1,6"]
|=================
|State    | Numeric value | Description
|NOT_RUN  | 0 | The task has not been processed yet
|MISSING  | 1 | The task outputs are missing
|CRASHED  | 2 | The task method 'run' returned a non-0 value
|EXCEPTION| 3 | An exception occured in the Task method 'run'
|SKIPPED  | 8 | The task was skipped (it was up-to-date)
|SUCCESS  | 9 | The execution was successful
|=================

To decide to execute a task or not, the producer uses the value returned by the task method 'runnable_status'. The possible return values are the following:

[options="header", cols="1,6"]
|=================
|Code    | Description
| ASK_LATER | The task may depend on other tasks which have not finished to run (not ready)
| SKIP_ME   | The task does not have to be executed, it is up-to-date
| RUN_ME    | The task is ready to be executed
|=================

The following diagram represents the interaction between the main task methods and the states and status:

image::task_run{PIC}["Task states"{backend@docbook:,width=610:},align="center"]

=== Build order constraints

==== The method set_run_after

The method _set_run_after_ is used to declare ordering constraints between tasks:

[source,python]
---------------
task1.set_run_after(task2)
---------------

The tasks to wait for are stored in the attribute _run_after_. They are used by the method _runnable_status_ to yield the status 'ASK_LATER' when a task has not run yet. This is merely for the build order and not for forcing a rebuild if one of the previous tasks is executed.

==== Computed constraints

===== Attribute after/before

The attributes _before_ and _after_ are used to declare ordering constraints between tasks:

[source,python]
---------------
from waflib.Task import Task
class task_test_a(Task):
    before = ['task_test_b']
class task_test_b(Task):
    after  = ['task_test_a']
---------------

===== ext_in/ext_out

Another way to force the order is by declaring lists of abstract symbols on the class attributes. This way the classes are not named explicitly, for example:

[source,python]
---------------
from waflib.Task import Task
class task_test_a(Task):
    ext_in  = ['.h']
class task_test_b(Task):
    ext_out = ['.h']
---------------

The 'extensions' ext_in and ext_out do not mean that the tasks have to produce files with such extensions, but are mere symbols for use as precedence constraints.

===== Order extraction

Before feeding the tasks to the producer-consumer system, a constraint extraction is performed on the tasks having input and output files. The attributes _run_after_ are initialized with the tasks to wait for.

The two functions called on lists of tasks are:

. _waflib.Task.set_precedence_constraints_: extract the build order from the task classes attributes ext_in/ext_out/before/after
. _waflib.Task.set_file_constraints_: extract the constraints from the tasks having input and output files

==== Weak order constraints

Tasks that are known to take a lot of time may be launched first to improve the build times. The general problem of finding an optimal order for launching tasks in parallel and with constraints is called http://en.wikipedia.org/wiki/Job-shop_problem[Job Shop]. In practice this problem can often be reduced to a critical path problem (approximation).

The following pictures illustrate the difference in scheduling a build with different independent tasks, in which a slow task is clearly identified, and launched first:

[source,python]
---------------
def build(ctx):
    for x in range(5):
        ctx(rule='sleep 1', color='GREEN', name='short task')
    ctx(rule='sleep 5', color='RED', name='long task')
---------------

image::tasks_nosort{PIC}["No particular order"{backend@docbook:,width=440:},align="center"]

A function is used to reorder the tasks from a group before they are passed to the producer. We will replace it to reorder the long task in first position:

// tasks_weak
[source,python]
---------------
from waflib import Task
old = Task.set_file_constraints
def meth(lst):
    lst.sort(cmp=lambda x, y: cmp(x.__class__.__name__, y.__class__.__name__)) <1>
    old(lst) <2>
Task.set_file_constraints = meth <3>
---------------

<1> Set the long task in first position
<2> Execute the original code
<3> Replace the method

Here is a representation of the effect:

image::tasks_sort{PIC}["Slowest task first"{backend@docbook:,width=440:},align="center"]

=== Dependencies

==== Task signatures

The direct instances of 'waflib.Task.TaskBase' are very limited and cannot be used to track file changes. The subclass 'waflib.Task.Task' provides the necessary features for the most common builds in which source files are used to produce target files.

The dependency tracking is based on the use of hashes of the dependencies called *task signatures*. The signature is computed from various dependencies source, such as input files and configuration set values.

The following diagram describes how 'waflib.Task.Task' instances are processed:

image::task_signature{PIC}["Signatures"{backend@docbook:,height=580:},align="center"]

The following data is used in the signature computation:

. Explicit dependencies: _input nodes_ and dependencies set explicitly by using _bld.depends_on_
. Implicit dependencies: dependencies searched by a scanner method (the method _scan_)
. Values: configuration set values such as compilation flags

==== Explicit dependencies

===== Input and output nodes

The task objects do not directly depend on other tasks. Other tasks may exist or not, and be executed or nodes. Rather, the input and output nodes hold themselves signatures values, which come from different sources:

. Nodes for build files usually inherit the signature of the task that generated the file
. Nodes from elsewhere have a signature computed automatically from the file contents (hash)

===== Global dependencies on other nodes

The tasks may be informed that some files may depend on other files transitively without listing them in the inputs. This is achieved by the method _add_manual_dependency_ from the build context:

// tasks_manual_deps
[source,python]
---------------
def configure(ctx):
    pass

def build(ctx):
    ctx(rule='cp ${SRC} ${TGT}', source='wscript', target='somecopy')
    ctx.add_manual_dependency(
        ctx.path.find_node('wscript'),
        ctx.path.find_node('testfile'))
---------------

The file _somecopy_ will be rebuilt whenever _wscript_ or _testfile_ change, even by one character:

[source,shishell]
---------------
$ waf build
Waf: Entering directory `/tmp/tasks_manual_deps/build'
[1/1] somecopy: wscript -> build/somecopy
Waf: Leaving directory `/tmp/tasks_manual_deps/build'
'build' finished successfully (0.034s)

$ waf
Waf: Entering directory `/tmp/tasks_manual_deps/build'
Waf: Leaving directory `/tmp/tasks_manual_deps/build'
'build' finished successfully (0.006s)

$ echo " " >> testfile

$ waf
Waf: Entering directory `/tmp/tasks_manual_deps/build'
[1/1] somecopy: wscript -> build/somecopy
Waf: Leaving directory `/tmp/tasks_manual_deps/build'
'build' finished successfully (0.022s)
---------------

==== Implicit dependencies (scanner methods)

Some tasks can be created dynamically after the build has started, so the dependencies cannot be known in advance. Task subclasses can provide a method named _scan_ to obtain additional nodes implicitly. In the following example, the _copy_ task provides a scanner method to depend on the wscript file found next to the input file.

// tasks_scan
[source,python]
---------------
import time
from waflib.Task import Task
class copy(Task):

    def run(self):
        return self.exec_command('cp %s %s' % (self.inputs[0].abspath(), self.outputs[0].abspath()))

    def scan(self): <1>
        print('→ calling the scanner method')
        node = self.inputs[0].parent.find_resource('wscript')
        return ([node], time.time()) <2>

    def runnable_status(self):
        ret = super(copy, self).runnable_status() <3>
        bld = self.generator.bld <4>
        print('nodes:       %r' % bld.node_deps[self.uid()]) <5>
        print('custom data: %r' % bld.raw_deps[self.uid()]) <6>
        return ret

def configure(ctx):
    pass

def build(ctx):
    tsk = copy(env=ctx.env) <7>
    tsk.set_inputs(ctx.path.find_resource('a.in'))
    tsk.set_outputs(ctx.path.find_or_declare('b.out'))
    ctx.add_to_group(tsk)
---------------

<1> A scanner method
<2> The return value is a tuple containing a list of nodes to depend on and serializable data for custom uses
<3> Override the method runnable_status to add some logging
<4> Obtain a reference to the build context associated to this task
<5> The nodes returned by the scanner method are stored in the map *bld.node_deps*
<6> The custom data returned by the scanner method is stored in the map *bld.raw_deps*
<7> Create a task manually (encapsulation by task generators will be described in the next chapters)

[source,shishell]
---------------
$ waf
→ calling the scanner method <1>
nodes:  [/tmp/tasks_scan/wscript]
custom data: 55.51
[1/1] copy: a.in -> build/b.out
'build' finished successfully (0.021s)

$ waf <2>
nodes:  [/tmp/tasks_scan/wscript]
custom data: 1280561555.512006
'build' finished successfully (0.005s)

$ echo " " >> wscript <3>

$ waf
→ calling the scanner method
nodes:  [/tmp/tasks_scan/wscript]
custom data: 64.31
[1/1] copy: a.in -> build/b.out
'build' finished successfully (0.022s)
---------------

<1> The scanner method is always called on a clean build
<2> The scanner method is not called when nothing has changed, although the data returned is retrieved
<3> When a dependency changes, the scanner method is executed once again (the custom data has changed)

WARNING: If the build order is incorrect, the method _scan_ may fail to find dependent nodes (missing nodes) or the signature calculation may throw an exception (missing signature for dependent nodes).

==== Values

The habitual use of command-line parameters such as compilation flags lead to the creation of _dependencies on values_, and more specifically the configuration set values. The Task class attribute 'vars' is used to control what values can enter in the signature calculation. In the following example, the task created has no inputs and no outputs nodes, and only depends on the values.

// tasks_values
[source,python]
---------------
from waflib.Task import Task
class foo(Task): <1>
	vars = ['FLAGS'] <2>
	def run(self):
		print('the flags are %r' % self.env.FLAGS) <3>

def options(ctx):
	ctx.add_option('--flags', default='-f', dest='flags', type='string')

def build(ctx):
	ctx.env.FLAGS = ctx.options.flags <4>
	tsk = foo(env=ctx.env)
	ctx.add_to_group(tsk)

def configure(ctx):
	pass
---------------

<1> Create a task class named _foo_
<2> The task instances will be executed whenever 'self.env.FLAGS' changes
<3> Print the value for debugging purposes
<4> Read the value from the command-line

The execution will produce the following output:

[source,shishell]
---------------
$ waf --flags abcdef
[1/1] foo:
the flags are 'abcdef' <1>
'build' finished successfully (0.006s)

$ waf --flags abcdef <2>
'build' finished successfully (0.004s)

$ waf --flags abc
[1/1] foo: <3>
the flags are 'abc'
'build' finished successfully (0.006s)
---------------

<1> The task is executed on the first run
<2> The dependencies have not changed, so the task is not executed
<3> The flags have changed so the task is executed

=== Task tuning

==== Class access

When a task provides an attribute named _run_str_ as in the following example:

// tasks_values2
[source,python]
---------------
def configure(ctx):
	ctx.env.COPY      = '/bin/cp'
	ctx.env.COPYFLAGS = ['-f']

def build(ctx):
	from waflib.Task import Task
	class copy(Task):
		run_str = '${COPY} ${COPYFLAGS} ${SRC} ${TGT}'
	print(copy.vars)

	tsk = copy(env=ctx.env)
	tsk.set_inputs(ctx.path.find_resource('wscript'))
	tsk.set_outputs(ctx.path.find_or_declare('b.out'))
	ctx.add_to_group(tsk)
---------------

It is assumed that 'run_str' represents a command-line, and that the variables in _$\{}_ such as 'COPYFLAGS' represent variables to add to the dependencies. A metaclass processes 'run_str' to obtain the method 'run' (called to execute the task) and the variables in the attribute 'vars' (merged with existing variables). The function created is displayed in the following output:

[source,shishell]
---------------
$ waf --zones=action
13:36:49 action   def f(tsk):
	env = tsk.env
	gen = tsk.generator
	bld = gen.bld
	wd = getattr(tsk, 'cwd', None)
	def to_list(xx):
		if isinstance(xx, str): return [xx]
		return xx
	lst = []
	lst.extend(to_list(env['COPY']))
	lst.extend(to_list(env['COPYFLAGS']))
	lst.extend([a.path_from(bld.bldnode) for a in tsk.inputs])
	lst.extend([a.path_from(bld.bldnode) for a in tsk.outputs])
	lst = [x for x in lst if x]
	return tsk.exec_command(lst, cwd=wd, env=env.env or None)
[1/1] copy: wscript -> build/b.out
['COPY', 'COPYFLAGS']
'build' finished successfully (0.007s)
---------------

All subclasses of 'waflib.Task.TaskBase' are stored on the module attribute 'waflib.Task.classes'. Therefore, the 'copy' task can be accessed by using:

[source,python]
---------------
from waflib import Task
cls = Task.classes['copy']
---------------

==== Scriptlet expressions

Although the 'run_str' is aimed at configuration set variables, a few special cases are provided for convenience:

. If the value starts by *env*, *gen*, *bld* or *tsk*, a method call will be made
. If the value starts by SRC[n] or TGT[n], a method call to the input/output node _n_ will be made
. SRC represents the list of task inputs seen from the root of the build directory
. TGT represents the list of task outputs seen from the root of the build directory

Here are a few examples:

[source,python]
---------------
${SRC[0].parent.abspath()} <1>
${bld.root.abspath()} <2>
${tsk.uid()} <3>
${CPPPATH_ST:INCPATHS} <4>
---------------

<1> Absolute path of the parent folder of the task first source file
<2> File system root
<3> Print the task unique identifier
<4> Perform a map replacement equivalent to _[env.CPPPATH_ST % x for x in env.INCPATHS]_

==== Command chains

In order to improve script portability 'run_str' can also be a list of command strings or functions. All sub-commands must complete for the task to succeed.

The following example shows how to chain three commands to copy `wscript` to `wscript.out`. The output file will be removed first if it already exists.
The copy is then performed. When successful, the output file permissions are changed to turn it into an executable.

// tasks_chains
[source,python]
---------------
def build(ctx):
	import os
	from waflib import Utils, Task

	def chmod_fun(task): <1>
		for x in task.outputs:
			os.chmod(x.abspath(), Utils.O755)
		return 0

	def remove_fun(task):
		for x in task.outputs:
			try:
				os.remove(x.abspath())
			except OSError:
				if os.path.exists(x.abspath()):
					raise ValueError('Cannot remove %r' % x) <2>
		return 0

	class complex_copy(Task.Task):
		run_str = (remove_fun, "${CP} ${SRC} ${TGT}", chmod_fun) <3>

	ctx.env.CP = '/bin/cp' <4>
	tsk = complex_copy(env=ctx.env.derive())
	tsk.set_inputs(ctx.path.find_resource('wscript'))
	tsk.set_outputs(ctx.path.find_or_declare('wscript.out'))
	ctx.add_to_group(tsk)
---------------

<1> The function arguments is the same as that of 'run' methods. Function commands return 0 or None in case of success.
<2> Function commands can raise exceptions such as OSError, ValueError or WafError
<3> Several functions or strings can be chained together; the build will run the commands in the order specified
<4> Rebuilds will occur when variables used in command strings are modified, and when command strings or function definitions change

NOTE: Chaining is also possible through subclassing.

==== Direct class modifications

===== Always execute

The function 'waflib.Task.always_run' is used to force a task to be executed whenever a build is performed. It sets a method 'runnable_status' that always return _RUN_ME_.

// task_always
[source,python]
---------------
def configure(ctx):
    pass

def build(ctx):
    from waflib import Task
    class copy(Task.Task):
        run_str = 'cp ${SRC} ${TGT}'
    copy = waflib.Task.always_run(copy)

    tsk = copy(env=ctx.env)
    tsk.set_inputs(ctx.path.find_resource('wscript'))
    tsk.set_outputs(ctx.path.find_or_declare('b.out'))
    ctx.add_to_group(tsk)
---------------

For convenience, rule-based task generators can declare the *always* attribute to achieve the same results:

[source,python]
---------------
def build(ctx):
    ctx(
        rule   = 'echo hello',
        always = True
    )
---------------

===== File hashes and dependencies

Nodes created by tasks during the build inherit the signature of the task that created them.
Tasks consuming such nodes as inputs will be executed whenever the first tasks are executed.
This is usually a desirable behaviour, as the tasks will propagate the dependencies in a transitive manner.

In a few contexts though, there can be an excess of downstream rebuilds even if the output files content have not changed.
This will also cause build files in the source directory to be rebuild whenever a new build is initiated (files in the source directory are hashed).
The function 'waflib.Task.update_outputs' is used to enable file hashes in task classes, it is used in the same way as 'waflib.Task.always_run'.

For convenience, rule-based task generators can provide the *update_outputs* attribute to simplify the declaration:

[source,python]
---------------
def build(ctx):
    ctx(
        rule           = 'touch ${TGT}',
        source         = 'wscript',
        target         = ctx.path.make_node('wscript2'),
        update_outputs = True
    )
    ctx(
        rule           = 'cp ${SRC} ${TGT}',
        source         = ctx.path.make_node('wscript2'),
        target         = 'wscript3'
    )
---------------

In this example, the file *wscript2* is created in the source directory.
The *update_outputs* keyword is therefore necessary to prevent unnecessary rebuilds.
Additionally, *wscript3* is only rebuilt when the contents of *wscript2* change.