The latest version of this document is always available at http://gcc.gnu.org/onlinedocs/libstdc++/parallel_mode.html.
The libstdc++ parallel mode is an experimental parallel implementation of many algorithms of the C++ Standard Library.
Several of the standard algorithms, for instance std::sort, are made parallel using OpenMP annotations. These parallel mode constructs can be invoked by explicit source declaration or by compiling existing sources with a specific compiler flag.
The libstdc++ parallel mode performs parallelization of algorithms, function objects, classes, and functions in the C++ Standard.
To use the libstdc++ parallel mode, compile your application with the compiler flags -D_GLIBCXX_PARALLEL -fopenmp. This will link in libgomp, the GNU OpenMP implementation, whose presence is mandatory. In addition, hardware capable of atomic operations is mandatory. Activating these atomic operations may require explicit compiler flags on some targets (like sparc and x86), such as -march=i686, -march=native, or -mcpu=v9.
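As a minimal sketch of this drop-in usage, the following code calls only a standard algorithm. The helper name sorted_copy and the file name example.cc are hypothetical. Built normally, the call runs the sequential std::sort; built with the flags named above, the same source is expected to use the parallel mode instead.

```cpp
// Sequential build:     g++ example.cc
// Parallel mode build:  g++ -D_GLIBCXX_PARALLEL -fopenmp example.cc
// (example.cc is a hypothetical file name; the flags come from the text.)
#include <algorithm>
#include <vector>

// Hypothetical helper: returns a sorted copy of its argument.
// The source is identical in both builds; only the flags differ.
std::vector<int> sorted_copy(std::vector<int> values)
{
  std::sort(values.begin(), values.end());
  return values;
}
```

The result is the same in either build; only the execution strategy may differ.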
Note that defining _GLIBCXX_PARALLEL may change the sizes and behavior of standard templates such as std::search. Therefore, code compiled with parallel mode can only be linked with code compiled without parallel mode if no instantiation of a container is passed between the two translation units. Parallel mode functionality has distinct linkage and cannot be confused with normal mode symbols.
The following library components from <numeric> are included in the parallel mode:
std::accumulate
std::adjacent_difference
std::inner_product
std::partial_sum
The following library components from <algorithm> are included in the parallel mode:
std::adjacent_find
std::count
std::count_if
std::equal
std::find
std::find_if
std::find_first_of
std::for_each
std::generate
std::generate_n
std::lexicographical_compare
std::mismatch
std::search
std::search_n
std::transform
std::replace
std::replace_if
std::max_element
std::merge
std::min_element
std::nth_element
std::partial_sort
std::partition
std::random_shuffle
std::set_union
std::set_intersection
std::set_symmetric_difference
std::set_difference
std::sort
std::stable_sort
std::unique_copy
The following library components from <set> and <map> are included in the parallel mode:
std::(multi_)map/set<T>::(multi_)map/set(Iterator begin, Iterator end) (bulk construction)
std::(multi_)map/set<T>::insert(Iterator begin, Iterator end) (bulk insertion)
When it is not feasible to recompile your entire application, or only specific algorithms need to be parallel-aware, individual parallel algorithms can be made available explicitly. These parallel algorithms are functionally equivalent to the standard drop-in algorithms used in parallel mode, but they are available in a separate namespace as GNU extensions and may be used in programs compiled with either release mode or parallel mode. The following table provides the names and headers of the parallel algorithms:
Algorithm | Header | Parallel algorithm | Parallel header |
---|---|---|---|
std::accumulate | <numeric> | __gnu_parallel::accumulate | <parallel/numeric> |
std::adjacent_difference | <numeric> | __gnu_parallel::adjacent_difference | <parallel/numeric> |
std::inner_product | <numeric> | __gnu_parallel::inner_product | <parallel/numeric> |
std::partial_sum | <numeric> | __gnu_parallel::partial_sum | <parallel/numeric> |
std::adjacent_find | <algorithm> | __gnu_parallel::adjacent_find | <parallel/algorithm> |
std::count | <algorithm> | __gnu_parallel::count | <parallel/algorithm> |
std::count_if | <algorithm> | __gnu_parallel::count_if | <parallel/algorithm> |
std::equal | <algorithm> | __gnu_parallel::equal | <parallel/algorithm> |
std::find | <algorithm> | __gnu_parallel::find | <parallel/algorithm> |
std::find_if | <algorithm> | __gnu_parallel::find_if | <parallel/algorithm> |
std::find_first_of | <algorithm> | __gnu_parallel::find_first_of | <parallel/algorithm> |
std::for_each | <algorithm> | __gnu_parallel::for_each | <parallel/algorithm> |
std::generate | <algorithm> | __gnu_parallel::generate | <parallel/algorithm> |
std::generate_n | <algorithm> | __gnu_parallel::generate_n | <parallel/algorithm> |
std::lexicographical_compare | <algorithm> | __gnu_parallel::lexicographical_compare | <parallel/algorithm> |
std::mismatch | <algorithm> | __gnu_parallel::mismatch | <parallel/algorithm> |
std::search | <algorithm> | __gnu_parallel::search | <parallel/algorithm> |
std::search_n | <algorithm> | __gnu_parallel::search_n | <parallel/algorithm> |
std::transform | <algorithm> | __gnu_parallel::transform | <parallel/algorithm> |
std::replace | <algorithm> | __gnu_parallel::replace | <parallel/algorithm> |
std::replace_if | <algorithm> | __gnu_parallel::replace_if | <parallel/algorithm> |
std::max_element | <algorithm> | __gnu_parallel::max_element | <parallel/algorithm> |
std::merge | <algorithm> | __gnu_parallel::merge | <parallel/algorithm> |
std::min_element | <algorithm> | __gnu_parallel::min_element | <parallel/algorithm> |
std::nth_element | <algorithm> | __gnu_parallel::nth_element | <parallel/algorithm> |
std::partial_sort | <algorithm> | __gnu_parallel::partial_sort | <parallel/algorithm> |
std::partition | <algorithm> | __gnu_parallel::partition | <parallel/algorithm> |
std::random_shuffle | <algorithm> | __gnu_parallel::random_shuffle | <parallel/algorithm> |
std::set_union | <algorithm> | __gnu_parallel::set_union | <parallel/algorithm> |
std::set_intersection | <algorithm> | __gnu_parallel::set_intersection | <parallel/algorithm> |
std::set_symmetric_difference | <algorithm> | __gnu_parallel::set_symmetric_difference | <parallel/algorithm> |
std::set_difference | <algorithm> | __gnu_parallel::set_difference | <parallel/algorithm> |
std::sort | <algorithm> | __gnu_parallel::sort | <parallel/algorithm> |
std::stable_sort | <algorithm> | __gnu_parallel::stable_sort | <parallel/algorithm> |
std::unique_copy | <algorithm> | __gnu_parallel::unique_copy | <parallel/algorithm> |
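One row of the table can be exercised as follows. This is a sketch under assumptions: it treats the predefined macro _OPENMP as the sign that -fopenmp is in effect and __GLIBCXX__ as the sign that libstdc++ is in use, and it falls back to std::sort otherwise. The helper name sort_explicit and the macro SKETCH_HAVE_PARALLEL are hypothetical.

```cpp
#include <algorithm>
#include <vector>

// Assumption: _OPENMP is defined when compiling with -fopenmp, and
// __GLIBCXX__ identifies libstdc++. Without both, use std::sort only.
#if defined(_OPENMP) && defined(__GLIBCXX__)
#  include <parallel/algorithm>
#  define SKETCH_HAVE_PARALLEL 1   // hypothetical macro for this sketch
#else
#  define SKETCH_HAVE_PARALLEL 0
#endif

// Hypothetical helper: sorts in place, calling the explicit parallel
// algorithm from the table when it is available.
void sort_explicit(std::vector<int>& values)
{
#if SKETCH_HAVE_PARALLEL
  __gnu_parallel::sort(values.begin(), values.end());
#else
  std::sort(values.begin(), values.end());
#endif
}
```

Because the parallel algorithm is named explicitly, -D_GLIBCXX_PARALLEL is not needed for this translation unit; -fopenmp still is.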
The parallel mode STL algorithms are currently not exception-safe, i.e., user-defined functors must not throw exceptions.
Since the current GCC OpenMP implementation does not support OpenMP parallel regions in concurrent threads, it is not possible to call parallel STL algorithms in concurrent threads either. It might work with other compilers, though.
Some algorithm variants can be enabled, disabled, or selected at compile time. See <compiletime_settings.h> and <features.h> for details.
To specify the number of threads to be used for an algorithm, use omp_set_num_threads.
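A minimal sketch of setting the thread count before an algorithm call is shown below. The helper name sort_with_threads is hypothetical, and the guard assumes that _OPENMP is defined only when OpenMP is enabled; without OpenMP the request is simply ignored.

```cpp
#include <algorithm>
#include <vector>
#ifdef _OPENMP
#  include <omp.h>   // declares omp_set_num_threads
#endif

// Hypothetical helper: requests a thread count, then sorts in place.
void sort_with_threads(std::vector<int>& values, int thread_count)
{
#ifdef _OPENMP
  omp_set_num_threads(thread_count);  // applies to later parallel regions
#else
  (void)thread_count;                 // no OpenMP: nothing to configure
#endif
  // Runs in parallel only if this file is compiled in parallel mode.
  std::sort(values.begin(), values.end());
}
```

The setting is global to the OpenMP runtime, so it also affects other parallel regions started afterwards.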
To force a function to execute sequentially, even though parallelism is switched on in general, add __gnu_parallel::sequential_tag() to the end of the argument list.
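The following sketch passes the tag as the last argument. It names the algorithm through __gnu_parallel rather than std so that it also compiles outside parallel mode; that choice, the helper name sort_sequentially, and the _OPENMP / __GLIBCXX__ guard are assumptions of this example.

```cpp
#include <algorithm>
#include <vector>
#if defined(_OPENMP) && defined(__GLIBCXX__)
#  include <parallel/algorithm>
#endif

// Hypothetical helper: always sorts sequentially.
void sort_sequentially(std::vector<int>& values)
{
#if defined(_OPENMP) && defined(__GLIBCXX__)
  // The trailing tag selects the sequential overload.
  __gnu_parallel::sort(values.begin(), values.end(),
                       __gnu_parallel::sequential_tag());
#else
  // Fallback when the parallel headers are not usable.
  std::sort(values.begin(), values.end());
#endif
}
```

In a translation unit compiled with -D_GLIBCXX_PARALLEL, the same tag can be appended to the std::sort call directly, as the text describes.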
Parallelism always incurs some overhead, so it is not helpful to parallelize operations on very small sets of data. To avoid parallelizing work that is not worth it, a minimum problem size can be stated for each algorithm, usually using the variable __gnu_parallel::Settings::[algorithm]_minimal_n. Please see <settings.h> for details.
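The idea of a minimum problem size can be illustrated with a hand-written threshold. This sketch does not use the library's settings variables and is not the library's internal mechanism; the helper name sort_above_threshold and its minimal_n parameter are hypothetical, as is the _OPENMP / __GLIBCXX__ guard.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>
#if defined(_OPENMP) && defined(__GLIBCXX__)
#  include <parallel/algorithm>
#endif

// Hypothetical helper: sorts in place and returns true if the parallel
// path was taken. Inputs smaller than minimal_n are sorted sequentially.
bool sort_above_threshold(std::vector<int>& values, std::size_t minimal_n)
{
#if defined(_OPENMP) && defined(__GLIBCXX__)
  if (values.size() >= minimal_n) {
    __gnu_parallel::sort(values.begin(), values.end());
    return true;
  }
#else
  (void)minimal_n;  // no parallel path available in this build
#endif
  std::sort(values.begin(), values.end());
  return false;
}
```

With a threshold larger than the input, the call stays sequential in every build, which is the behavior a minimum problem size is meant to provide.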
All parallel algorithms are intended to have signatures that are equivalent to the ISO C++ algorithms replaced. For instance, the std::adjacent_find function is declared as:
    namespace std
    {
      template<typename _FIter>
        _FIter
        adjacent_find(_FIter, _FIter);
    }
This means that there should be something equivalent for the parallel version. Indeed, this is the case:
    namespace std
    {
      namespace __parallel
      {
        template<typename _FIter>
          _FIter
          adjacent_find(_FIter, _FIter);
        ...
      }
    }
But why the ellipsis?
The ellipsis in the example above represents additional overloads required for the parallel version of the function. These additional overloads are used to dispatch calls from the ISO C++ function signature to the appropriate parallel function (or sequential function, if no parallel functions are deemed worthy), based on either compile-time or run-time conditions.
Compile-time conditions are referred to as "embarrassingly parallel," and are denoted with the appropriate dispatch object, i.e., one of __gnu_parallel::sequential_tag, __gnu_parallel::parallel_tag, __gnu_parallel::balanced_tag, __gnu_parallel::unbalanced_tag, __gnu_parallel::omp_loop_tag, or __gnu_parallel::omp_loop_static_tag.
Run-time conditions depend on the hardware being used, the number of threads available, etc., and are denoted by the use of the enum __gnu_parallel::parallelism. Values of this enum include __gnu_parallel::sequential, __gnu_parallel::parallel_unbalanced, __gnu_parallel::parallel_balanced, __gnu_parallel::parallel_omp_loop, __gnu_parallel::parallel_omp_loop_static, and __gnu_parallel::parallel_taskqueue.
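Putting all this together, the general view of overloads for the parallel algorithms looks like this:
ISO C++ signature
ISO C++ signature + sequential_tag argument
ISO C++ signature + parallelism argument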
Please note that the implementation may use additional functions (designated with the _switch suffix) to dispatch from the ISO C++ signature to the correct parallel version. Also, some of the algorithms do not have support for run-time conditions, so the last overload is missing for them.
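The dispatch scheme described above can be sketched with a small stand-alone overload set. Everything in namespace sketch is hypothetical and only mirrors the shape of the overloads; it is not the libstdc++ implementation, and the "parallel" branch simply calls the sequential algorithm. The last_path helper exists only so that the chosen path can be observed.

```cpp
#include <algorithm>
#include <vector>

namespace sketch {  // hypothetical namespace, not part of libstdc++

struct sequential_tag {};                            // compile-time choice
enum parallelism { sequential, parallel_balanced };  // run-time choice

// Records which path ran last, purely so the dispatch can be observed.
enum path { none, sequential_path, parallel_path };
inline path& last_path() { static path p = none; return p; }

// ISO C++ signature + sequential_tag argument: always sequential.
template<typename FIter>
FIter adjacent_find(FIter first, FIter last, sequential_tag)
{
  last_path() = sequential_path;
  return std::adjacent_find(first, last);
}

// ISO C++ signature + parallelism argument: decided at run time.
template<typename FIter>
FIter adjacent_find(FIter first, FIter last, parallelism how)
{
  if (how == sequential)
    return sketch::adjacent_find(first, last, sequential_tag());
  last_path() = parallel_path;  // a real version would split the range
  return std::adjacent_find(first, last);
}

// ISO C++ signature: forwards with a default run-time choice.
template<typename FIter>
FIter adjacent_find(FIter first, FIter last)
{
  return sketch::adjacent_find(first, last, parallel_balanced);
}

}  // namespace sketch
```

A caller that uses the plain two-iterator form reaches the run-time overload, while a caller that appends sketch::sequential_tag() selects the sequential overload at compile time.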
One namespace contains versions of code that are explicitly sequential: __gnu_serial.
Two namespaces contain the parallel mode: std::__parallel and __gnu_parallel.
Parallel implementations of standard components, including template helpers to select parallelism, are defined in namespace std::__parallel. For instance, std::transform from <algorithm> has a parallel counterpart in std::__parallel::transform from <parallel/algorithm>. In addition, these parallel implementations are injected into namespace __gnu_parallel with using declarations.
Support and general infrastructure is in namespace __gnu_parallel.
More information, and an organized index of types and functions related to the parallel mode on a per-namespace basis, can be found in the generated source documentation.
Both the normal conformance and regression tests and the supplemental performance tests work.
To run the conformance and regression tests with the parallel mode active:
make check-parallel
The log and summary files for conformance testing are in the testsuite/parallel directory.
To run the performance tests with the parallel mode active:
make check-performance-parallel
The result file for performance testing is in the testsuite directory, in the file libstdc++_performance.sum. In addition, the policy-based containers have their own visualizations, which have more software dependencies than the usual bare-bones text file, and can be generated by using the make doc-performance rule in the testsuite's Makefile.
Johannes Singler, Peter Sanders, Felix Putze. The Multi-Core Standard Template Library. Euro-Par 2007: Parallel Processing. (LNCS 4641)
Leonor Frias, Johannes Singler. Parallelization of Bulk Operations for STL Dictionaries. Workshop on Highly Parallel Processing on a Chip (HPPC) 2007. (LNCS)
See license.html for copying conditions. Comments and suggestions are welcome, and may be sent to the libstdc++ mailing list.