<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
< html xmlns = "http://www.w3.org/1999/xhtml" > < head > < title > Design< / title > < meta name = "generator" content = "DocBook XSL-NS Stylesheets V1.76.1" / > < meta name = "keywords" content = " C++ , library , parallel " / > < meta name = "keywords" content = " ISO C++ , library " / > < meta name = "keywords" content = " ISO C++ , runtime , library " / > < link rel = "home" href = "../index.html" title = "The GNU C++ Library" / > < link rel = "up" href = "parallel_mode.html" title = "Chapter 18. Parallel Mode" / > < link rel = "prev" href = "bk01pt03ch18s03.html" title = "Using" / > < link rel = "next" href = "bk01pt03ch18s05.html" title = "Testing" / > < / head > < body > < div class = "navheader" > < table width = "100%" summary = "Navigation header" > < tr > < th colspan = "3" align = "center" > Design< / th > < / tr > < tr > < td align = "left" > < a accesskey = "p" href = "bk01pt03ch18s03.html" > Prev< / a > < / td > < th width = "60%" align = "center" > Chapter 18. Parallel Mode< / th > < td align = "right" > < a accesskey = "n" href = "bk01pt03ch18s05.html" > Next< / a > < / td > < / tr > < / table > < hr / > < / div > < div class = "section" title = "Design" > < div class = "titlepage" > < div > < div > < h2 class = "title" > < a id = "manual.ext.parallel_mode.design" / > Design< / h2 > < / div > < / div > < / div > < p >
< / p > < div class = "section" title = "Interface Basics" > < div class = "titlepage" > < div > < div > < h3 class = "title" > < a id = "parallel_mode.design.intro" / > Interface Basics< / h3 > < / div > < / div > < / div > < p >
All parallel algorithms are intended to have signatures that are
equivalent to those of the ISO C++ algorithms they replace. For instance, the
< code class = "function" > std::adjacent_find< / code > function is declared as:
< / p > < pre class = "programlisting" >
namespace std
{
  template<typename _FIter>
    _FIter
    adjacent_find(_FIter, _FIter);
}
< / pre > < p >
This means that the parallel version should have an equivalent
declaration. Indeed, this is the case:
< / p > < pre class = "programlisting" >
namespace std
{
  namespace __parallel
  {
    template<typename _FIter>
      _FIter
      adjacent_find(_FIter, _FIter);

    ...
  }
}
< / pre > < p > But why the ellipsis?
< / p > < p > The ellipsis in the example above represents additional overloads
required for the parallel version of the function. These additional
overloads are used to dispatch calls from the ISO C++ function
signature to the appropriate parallel function (or sequential
function, if no parallel functions are deemed worthy), based on either
compile-time or run-time conditions.
< / p > < p > The available signature options are specific to each
algorithm or class of algorithms.< / p > < p > In general, the overloads for the parallel algorithms look like this:
< / p > < div class = "itemizedlist" > < ul class = "itemizedlist" > < li class = "listitem" > < p > ISO C++ signature< / p > < / li > < li class = "listitem" > < p > ISO C++ signature + sequential_tag argument< / p > < / li > < li class = "listitem" > < p > ISO C++ signature + algorithm-specific tag type
(several signatures)< / p > < / li > < / ul > < / div > < p > Please note that the implementation may use additional functions
(designated with the < code class = "code" > _switch< / code > suffix) to dispatch from the
ISO C++ signature to the correct parallel version. Also, some of the
algorithms do not support run-time conditions, so the last
overload is missing for them.
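< / p > < p >
The overload set described above can be pictured with a minimal, self-contained sketch. Everything in namespace < code class = "code" > dispatch_demo< / code > is a hypothetical stand-in written for illustration, not a libstdc++ declaration: the functions return a label naming the path taken instead of an iterator, and the size-based run-time condition and its value are assumptions.
< / p >

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical stand-ins for illustration; not libstdc++ declarations.
namespace dispatch_demo
{
  struct sequential_tag { };  // request the sequential variant
  struct parallel_tag { };    // request the parallel variant

  enum path { sequential_path, parallel_path };

  // Assumed run-time condition: inputs below this size stay sequential.
  const std::size_t minimal_n = 1000;

  // ISO C++ signature + sequential_tag argument.
  template<typename FIter>
    path
    adjacent_find(FIter, FIter, sequential_tag)
    { return sequential_path; }

  // ISO C++ signature + algorithm-specific tag type.
  template<typename FIter>
    path
    adjacent_find(FIter, FIter, parallel_tag)
    { return parallel_path; }

  // Helper with the _switch suffix: chooses a variant at run time.
  template<typename FIter>
    path
    adjacent_find_switch(FIter first, FIter last)
    {
      const std::size_t n
        = static_cast<std::size_t>(std::distance(first, last));
      if (n >= minimal_n)
        return dispatch_demo::adjacent_find(first, last, parallel_tag());
      return dispatch_demo::adjacent_find(first, last, sequential_tag());
    }

  // Plain ISO C++ signature: forwards to the helper.
  template<typename FIter>
    path
    adjacent_find(FIter first, FIter last)
    { return dispatch_demo::adjacent_find_switch(first, last); }

  // Usage: each entry point reports which path it selected.
  inline void
  example()
  {
    std::vector<int> few(10), many(5000);
    assert(dispatch_demo::adjacent_find(few.begin(), few.end())
           == sequential_path);
    assert(dispatch_demo::adjacent_find(many.begin(), many.end())
           == parallel_path);
    assert(dispatch_demo::adjacent_find(many.begin(), many.end(),
                                        sequential_tag())
           == sequential_path);
  }
}
```

< p >
In this sketch the two-argument call is the only one that consults the run-time condition. Passing a tag selects a variant through overload resolution, so that choice is already fixed when the code is compiled.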
< / p > < / div > < div class = "section" title = "Configuration and Tuning" > < div class = "titlepage" > < div > < div > < h3 class = "title" > < a id = "parallel_mode.design.tuning" / > Configuration and Tuning< / h3 > < / div > < / div > < / div > < div class = "section" title = "Setting up the OpenMP Environment" > < div class = "titlepage" > < div > < div > < h4 class = "title" > < a id = "parallel_mode.design.tuning.omp" / > Setting up the OpenMP Environment< / h4 > < / div > < / div > < / div > < p >
Several aspects of the overall runtime environment can be manipulated
by standard OpenMP function calls.
< / p > < p >
To specify the number of threads to be used for the algorithms globally,
use the function < code class = "function" > omp_set_num_threads< / code > . An example:
< / p > < pre class = "programlisting" >
#include <stdlib.h>
#include <omp.h>

int main()
{
  // Explicitly set number of threads.
  const int threads_wanted = 20;
  omp_set_dynamic(false);
  omp_set_num_threads(threads_wanted);

  // Call parallel mode algorithms.

  return 0;
}
< / pre > < p >
Some algorithms allow the number of threads to be set for a particular call
by augmenting the algorithm variant tag.
See the next section for further information.
< / p > < p >
Other parts of the runtime environment that can be manipulated include
nested parallelism (< code class = "function" > omp_set_nested< / code > ), schedule kind
(< code class = "function" > omp_set_schedule< / code > ), and others. See the OpenMP
documentation for more information.
< / p > < / div > < div class = "section" title = "Compile Time Switches" > < div class = "titlepage" > < div > < div > < h4 class = "title" > < a id = "parallel_mode.design.tuning.compile" / > Compile Time Switches< / h4 > < / div > < / div > < / div > < p >
To force an algorithm to execute sequentially, even though parallelism
is switched on in general via the macro < code class = "constant" > _GLIBCXX_PARALLEL< / code > ,
add < code class = "classname" > __gnu_parallel::sequential_tag()< / code > to the end
of the algorithm's argument list.
< / p > < p >
Like so:
< / p > < pre class = "programlisting" >
std::sort(v.begin(), v.end(), __gnu_parallel::sequential_tag());
< / pre > < p >
Some parallel algorithm variants can be excluded from compilation by
preprocessor defines. See the doxygen documentation on
< code class = "code" > compiletime_settings.h< / code > and < code class = "code" > features.h< / code > for details.
< / p > < p >
For some algorithms, the desired variant can be chosen at compile-time by
appending a tag object. The available options are specific to the particular
algorithm (class).
< / p > < p >
For the "embarrassingly parallel" algorithms, there is only one "tag object
type", the enum < code class = "code" > _Parallelism< / code > .
It takes one of the following values:
< code class = "code" > __gnu_parallel::parallel_tag< / code > ,
< code class = "code" > __gnu_parallel::balanced_tag< / code > ,
< code class = "code" > __gnu_parallel::unbalanced_tag< / code > ,
< code class = "code" > __gnu_parallel::omp_loop_tag< / code > , and
< code class = "code" > __gnu_parallel::omp_loop_static_tag< / code > .
Since these are all values of one type, the actual parallelization strategy is chosen at run-time.
(Choosing the variants at compile-time will come soon.)
< / p > < p >
For the algorithms that follow, the general choices are
< code class = "code" > __gnu_parallel::parallel_tag< / code > and
< code class = "code" > __gnu_parallel::default_parallel_tag< / code > , in addition to
< code class = "code" > __gnu_parallel::sequential_tag< / code > .
< code class = "code" > __gnu_parallel::default_parallel_tag< / code > chooses the default
algorithm at compile-time, as does omitting the tag.
< code class = "code" > __gnu_parallel::parallel_tag< / code > postpones the decision to run-time
(see next section).
For all tags, the number of threads desired for this call can optionally be
passed to the respective tag's constructor.
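< / p > < p >
As a rough illustration of a tag that carries a per-call thread count, the following sketch uses a hypothetical < code class = "code" > tag_demo::parallel_tag< / code > class. It is not the libstdc++ tag class: the member names, the convention that 0 means "no request", and the global default of 8 are all assumptions made for the example.
< / p >

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration; not the libstdc++ tag classes.
namespace tag_demo
{
  // A tag that can carry a per-call thread count; 0 means "no request".
  class parallel_tag
  {
    unsigned num_threads_;

  public:
    parallel_tag() : num_threads_(0) { }
    explicit parallel_tag(unsigned n) : num_threads_(n) { }

    // Thread count for this call: the request, or else the global default.
    unsigned
    threads_or(unsigned global_default) const
    { return num_threads_ != 0 ? num_threads_ : global_default; }
  };

  // An algorithm overload would read the count from the tag it was given.
  inline unsigned
  threads_used(const parallel_tag& tag, unsigned global_default)
  { return tag.threads_or(global_default); }

  // Usage: a default-constructed tag defers to the global setting.
  inline void
  example()
  {
    const unsigned global_default = 8;  // assumed global setting
    assert(threads_used(parallel_tag(), global_default) == 8);
    assert(threads_used(parallel_tag(2), global_default) == 2);
  }
}
```

< p >
The design point is that the tag object is an ordinary argument, so a per-call request travels with the call and leaves the global setting untouched.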
< / p > < p >
The < code class = "code" > multiway_merge< / code > algorithm comes with the additional choices
< code class = "code" > __gnu_parallel::exact_tag< / code > and
< code class = "code" > __gnu_parallel::sampling_tag< / code > .
Exact and sampling are the two available splitting strategies.
< / p > < p >
For the < code class = "code" > sort< / code > and < code class = "code" > stable_sort< / code > algorithms, there are
several additional choices, namely
< code class = "code" > __gnu_parallel::multiway_mergesort_tag< / code > ,
< code class = "code" > __gnu_parallel::multiway_mergesort_exact_tag< / code > ,
< code class = "code" > __gnu_parallel::multiway_mergesort_sampling_tag< / code > ,
< code class = "code" > __gnu_parallel::quicksort_tag< / code > , and
< code class = "code" > __gnu_parallel::balanced_quicksort_tag< / code > .
Multiway mergesort comes with the two splitting strategies for multi-way
merging. The quicksort options cannot be used for < code class = "code" > stable_sort< / code > .
< / p > < / div > < div class = "section" title = "Run Time Settings and Defaults" > < div class = "titlepage" > < div > < div > < h4 class = "title" > < a id = "parallel_mode.design.tuning.settings" / > Run Time Settings and Defaults< / h4 > < / div > < / div > < / div > < p >
The default parallelization strategy, the choice of specific algorithm
strategy, the minimum threshold limits for individual parallel
algorithms, and aspects of the underlying hardware can be specified as
desired via manipulation
of < code class = "classname" > __gnu_parallel::_Settings< / code > member data.
< / p > < p >
First off, the choice of parallelization strategy: serial, parallel,
or heuristically deduced. This corresponds
to < code class = "code" > __gnu_parallel::_Settings::algorithm_strategy< / code > and is a
value of enum < span class = "type" > __gnu_parallel::_AlgorithmStrategy< / span >
type. Choices
include: < span class = "type" > heuristic< / span > , < span class = "type" > force_sequential< / span > ,
and < span class = "type" > force_parallel< / span > . The default is < span class = "type" > heuristic< / span > .
< / p > < p >
Next, the sub-choices for algorithm variant, if not fixed at compile-time.
Specific algorithms like < code class = "function" > find< / code > or < code class = "function" > sort< / code >
can be implemented in multiple ways: when this is the case,
a < code class = "classname" > __gnu_parallel::_Settings< / code > member exists to
pick the default strategy. For
example, < code class = "code" > __gnu_parallel::_Settings::sort_algorithm< / code > can
take any value of
enum < span class = "type" > __gnu_parallel::_SortAlgorithm< / span > : < span class = "type" > MWMS< / span > , < span class = "type" > QS< / span > ,
or < span class = "type" > QS_BALANCED< / span > .
< / p > < p >
The minimal threshold for algorithm parallelization can be set in the
same way. Parallelism always incurs some overhead, so it is
not helpful to parallelize operations on very small sets of
data. Because of this, measures are taken to avoid parallelizing below
a certain, pre-determined threshold. For each algorithm, a minimum
problem size is encoded as a variable in the
active < code class = "classname" > __gnu_parallel::_Settings< / code > object. These
threshold variables follow this naming scheme:
< code class = "code" > __gnu_parallel::_Settings::[algorithm]_minimal_n< / code > . So,
for < code class = "function" > fill< / code > , the threshold variable
is < code class = "code" > __gnu_parallel::_Settings::fill_minimal_n< / code > .
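< / p > < p >
How the strategy setting and a per-algorithm threshold might combine can be sketched as follows. This is a simplified model under stated assumptions, not the libstdc++ implementation: the < code class = "code" > settings_demo< / code > types are hypothetical, only the member names echo the text, and the default threshold of 1000 is invented for the example.
< / p >

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model for illustration; not the libstdc++ implementation.
namespace settings_demo
{
  enum algorithm_strategy_t { heuristic, force_sequential, force_parallel };

  struct settings
  {
    algorithm_strategy_t algorithm_strategy;
    std::size_t fill_minimal_n;  // assumed threshold for fill

    settings() : algorithm_strategy(heuristic), fill_minimal_n(1000) { }
  };

  // Returns true when a fill over n elements should take the parallel path.
  inline bool
  fill_runs_in_parallel(const settings& s, std::size_t n)
  {
    if (s.algorithm_strategy == force_sequential)
      return false;
    if (s.algorithm_strategy == force_parallel)
      return true;
    return n >= s.fill_minimal_n;  // heuristic: compare against the threshold
  }

  // Usage: the threshold only matters under the heuristic strategy.
  inline void
  example()
  {
    settings s;
    assert(!fill_runs_in_parallel(s, 10));
    assert(fill_runs_in_parallel(s, 5000));

    s.algorithm_strategy = force_parallel;
    assert(fill_runs_in_parallel(s, 10));

    s.algorithm_strategy = force_sequential;
    assert(!fill_runs_in_parallel(s, 5000));
  }
}
```

< p >
In this model the two forcing strategies bypass the threshold entirely, while the heuristic strategy compares the problem size against it.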
< / p > < p >
Finally, hardware details like L1/L2 cache size can be hardwired
via < code class = "code" > __gnu_parallel::_Settings::L1_cache_size< / code > and friends.
< / p > < p >
All these configuration variables can be changed by the user, if
desired.
There exists one global instance of the class < code class = "classname" > _Settings< / code > ,
i.e. it is a singleton. It can be read and written by calling
< code class = "code" > __gnu_parallel::_Settings::get< / code > and
< code class = "code" > __gnu_parallel::_Settings::set< / code > , respectively.
Please note that the first call returns a const object, so direct manipulation
is forbidden.
See < a class = "link" href = "http://gcc.gnu.org/onlinedocs/libstdc++/latest-doxygen/a01005.html" >
< code class = "filename" > settings.h< / code > < / a >
for complete details.
< / p > < p >
A small example of tuning the default:
< / p > < pre class = "programlisting" >
#include <parallel/algorithm>
#include <parallel/settings.h>

int main()
{
  __gnu_parallel::_Settings s;
  s.algorithm_strategy = __gnu_parallel::force_parallel;
  __gnu_parallel::_Settings::set(s);

  // Do work... all algorithms will be parallelized, always.

  return 0;
}
< / pre > < / div > < / div > < div class = "section" title = "Implementation Namespaces" > < div class = "titlepage" > < div > < div > < h3 class = "title" > < a id = "parallel_mode.design.impl" / > Implementation Namespaces< / h3 > < / div > < / div > < / div > < p > One namespace contains versions of code that are always
explicitly sequential:
< code class = "code" > __gnu_serial< / code > .
< / p > < p > Two namespaces contain the parallel mode:
< code class = "code" > std::__parallel< / code > and < code class = "code" > __gnu_parallel< / code > .
< / p > < p > Parallel implementations of standard components, including
template helpers to select parallelism, are defined in < code class = "code" > namespace
std::__parallel< / code > . For instance, < code class = "function" > std::transform< / code > from < code class = "filename" > algorithm< / code > has a parallel counterpart in
< code class = "function" > std::__parallel::transform< / code > from < code class = "filename" > parallel/algorithm< / code > . In addition, these parallel
implementations are injected into < code class = "code" > namespace
__gnu_parallel< / code > with using declarations.
< / p > < p > Support and general infrastructure are in < code class = "code" > namespace
__gnu_parallel< / code > .
< / p > < p > More information, and an organized index of types and functions
related to the parallel mode on a per-namespace basis, can be found in
the generated source documentation.
< / p > < / div > < / div > < div class = "navfooter" > < hr / > < table width = "100%" summary = "Navigation footer" > < tr > < td align = "left" > < a accesskey = "p" href = "bk01pt03ch18s03.html" > Prev< / a > < / td > < td align = "center" > < a accesskey = "u" href = "parallel_mode.html" > Up< / a > < / td > < td align = "right" > < a accesskey = "n" href = "bk01pt03ch18s05.html" > Next< / a > < / td > < / tr > < tr > < td align = "left" valign = "top" > Using < / td > < td align = "center" > < a accesskey = "h" href = "../index.html" > Home< / a > < / td > < td align = "right" valign = "top" > Testing< / td > < / tr > < / table > < / div > < / body > < / html >