The libstdc++ parallel mode

The latest version of this document is always available at http://gcc.gnu.org/onlinedocs/libstdc++/parallel_mode.html.



The libstdc++ parallel mode is an experimental parallel implementation of many algorithms of the C++ Standard Library.

Several of the standard algorithms, for instance std::sort, are made parallel using OpenMP annotations. These parallel mode constructs can be invoked by explicit source declaration or by compiling existing sources with a specific compiler flag.

The libstdc++ parallel mode

The libstdc++ parallel mode performs parallelization of algorithms, function objects, classes, and functions in the C++ Standard.

Using the libstdc++ parallel mode

To use the libstdc++ parallel mode, compile your application with the compiler flags -D_GLIBCXX_PARALLEL and -fopenmp. This links in libgomp, the GNU OpenMP implementation, whose presence is mandatory. In addition, hardware capable of atomic operations is mandatory. Actually activating these atomic operations may require explicit compiler flags on some targets (like sparc and x86), such as -march=i686, -march=native or -mcpu=v9.
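
For example, an ordinary program needs no source changes; recompiling it with the flags above is enough to parallelize its standard algorithm calls. A minimal sketch (the file name in the comment is illustrative):

// Compile with:  g++ -D_GLIBCXX_PARALLEL -fopenmp example.cc
#include <algorithm>
#include <cstdlib>
#include <vector>

int main()
{
  std::vector<int> v(1000000);
  for (std::size_t i = 0; i < v.size(); ++i)
    v[i] = std::rand();

  // With -D_GLIBCXX_PARALLEL this call dispatches to the parallel
  // sort; without the flag it is the ordinary sequential std::sort.
  std::sort(v.begin(), v.end());
  return 0;
}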

Note that the _GLIBCXX_PARALLEL define may change the sizes and behavior of standard components such as std::search, and therefore code compiled with parallel mode can only be linked with code compiled without parallel mode if no instantiation of a container is passed between the two translation units. Parallel mode functionality has distinct linkage, and cannot be confused with normal mode symbols.

Parallel replacements are provided for the library components from the includes <numeric> and <algorithm> enumerated in the table below, as well as for the bulk insertion operations of the containers in the includes <set> and <map>.

Using the parallel algorithms without parallel mode

When it is not feasible to recompile your entire application, or when only specific algorithms need to be parallel-aware, individual parallel algorithms can be made available explicitly. These parallel algorithms are functionally equivalent to the standard drop-in algorithms used in parallel mode, but they are available in a separate namespace as GNU extensions and may be used in programs compiled in either release mode or parallel mode. The following table provides the names and headers of the parallel algorithms; a usage sketch follows the table:

Algorithm Header Parallel algorithm Parallel header
std::accumulate <numeric> __gnu_parallel::accumulate <parallel/numeric>
std::adjacent_difference <numeric> __gnu_parallel::adjacent_difference <parallel/numeric>
std::inner_product <numeric> __gnu_parallel::inner_product <parallel/numeric>
std::partial_sum <numeric> __gnu_parallel::partial_sum <parallel/numeric>
std::adjacent_find <algorithm> __gnu_parallel::adjacent_find <parallel/algorithm>
std::count <algorithm> __gnu_parallel::count <parallel/algorithm>
std::count_if <algorithm> __gnu_parallel::count_if <parallel/algorithm>
std::equal <algorithm> __gnu_parallel::equal <parallel/algorithm>
std::find <algorithm> __gnu_parallel::find <parallel/algorithm>
std::find_if <algorithm> __gnu_parallel::find_if <parallel/algorithm>
std::find_first_of <algorithm> __gnu_parallel::find_first_of <parallel/algorithm>
std::for_each <algorithm> __gnu_parallel::for_each <parallel/algorithm>
std::generate <algorithm> __gnu_parallel::generate <parallel/algorithm>
std::generate_n <algorithm> __gnu_parallel::generate_n <parallel/algorithm>
std::lexicographical_compare <algorithm> __gnu_parallel::lexicographical_compare <parallel/algorithm>
std::mismatch <algorithm> __gnu_parallel::mismatch <parallel/algorithm>
std::search <algorithm> __gnu_parallel::search <parallel/algorithm>
std::search_n <algorithm> __gnu_parallel::search_n <parallel/algorithm>
std::transform <algorithm> __gnu_parallel::transform <parallel/algorithm>
std::replace <algorithm> __gnu_parallel::replace <parallel/algorithm>
std::replace_if <algorithm> __gnu_parallel::replace_if <parallel/algorithm>
std::max_element <algorithm> __gnu_parallel::max_element <parallel/algorithm>
std::merge <algorithm> __gnu_parallel::merge <parallel/algorithm>
std::min_element <algorithm> __gnu_parallel::min_element <parallel/algorithm>
std::nth_element <algorithm> __gnu_parallel::nth_element <parallel/algorithm>
std::partial_sort <algorithm> __gnu_parallel::partial_sort <parallel/algorithm>
std::partition <algorithm> __gnu_parallel::partition <parallel/algorithm>
std::random_shuffle <algorithm> __gnu_parallel::random_shuffle <parallel/algorithm>
std::set_union <algorithm> __gnu_parallel::set_union <parallel/algorithm>
std::set_intersection <algorithm> __gnu_parallel::set_intersection <parallel/algorithm>
std::set_symmetric_difference <algorithm> __gnu_parallel::set_symmetric_difference <parallel/algorithm>
std::set_difference <algorithm> __gnu_parallel::set_difference <parallel/algorithm>
std::sort <algorithm> __gnu_parallel::sort <parallel/algorithm>
std::stable_sort <algorithm> __gnu_parallel::stable_sort <parallel/algorithm>
std::unique_copy <algorithm> __gnu_parallel::unique_copy <parallel/algorithm>
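
As promised above, a usage sketch: because __gnu_parallel::sort is called explicitly, the program may be compiled without -D_GLIBCXX_PARALLEL, though -fopenmp is still required so that libgomp is linked in:

// Compile with:  g++ -fopenmp example.cc
#include <parallel/algorithm>
#include <vector>

int main()
{
  std::vector<int> v(1000000, 1);

  // Explicitly-parallel sort; all other standard algorithm calls in
  // the program keep their usual sequential implementations.
  __gnu_parallel::sort(v.begin(), v.end());
  return 0;
}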

Parallel mode semantics

The parallel mode STL algorithms are currently not exception-safe, i.e., user-defined functors must not throw exceptions.

Since the current GCC OpenMP implementation does not support OpenMP parallel regions in concurrent threads, it is not possible to call parallel STL algorithms in concurrent threads, either. It might work with other compilers, though.

Configuration and Tuning

Some algorithm variants can be enabled/disabled/selected at compile-time. See <compiletime_settings.h> and <features.h> for details.

To specify the number of threads to be used for an algorithm, use omp_set_num_threads. To force a function to execute sequentially, even though parallelism is switched on in general, add __gnu_parallel::sequential_tag() to the end of the argument list.
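
For instance, a minimal sketch, assuming compilation with -D_GLIBCXX_PARALLEL and -fopenmp:

#include <algorithm>
#include <omp.h>
#include <vector>

int main()
{
  std::vector<int> v(1000000, 1);

  omp_set_num_threads(4);          // use at most four threads

  std::sort(v.begin(), v.end());   // may run in parallel

  // Force this particular call to execute sequentially:
  std::sort(v.begin(), v.end(), __gnu_parallel::sequential_tag());
  return 0;
}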

Parallelism always incurs some overhead, so it is not helpful to parallelize operations on very small data sets. Measures are therefore taken to avoid parallelizing operations that are not worth it. For each algorithm, a minimum problem size can be set, usually via the variable __gnu_parallel::Settings::[algorithm]_minimal_n. Please see <settings.h> for details.
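
For instance, following the [algorithm]_minimal_n pattern just named, one might raise the threshold for sort as below. The member name sort_minimal_n is an assumption derived from that pattern, and the exact Settings interface may differ between releases:

#include <parallel/settings.h>

void tune()
{
  // Do not parallelize sorts of fewer than 10000 elements.
  // (sort_minimal_n is assumed from the [algorithm]_minimal_n pattern.)
  __gnu_parallel::Settings::sort_minimal_n = 10000;
}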

Interface basics and general design

All parallel algorithms are intended to have signatures equivalent to those of the ISO C++ algorithms they replace. For instance, the std::adjacent_find function is declared as:

namespace std
{
  template<typename _FIter>
    _FIter
    adjacent_find(_FIter, _FIter);
}
This means that there should be something equivalent for the parallel version. Indeed, this is the case:
namespace std
{
  namespace __parallel
  {
    template<typename _FIter>
      _FIter
      adjacent_find(_FIter, _FIter);

    ...
  }
}

But... why the ellipses?

The ellipses in the example above represent additional overloads required for the parallel version of the function. These additional overloads are used to dispatch calls from the ISO C++ function signature to the appropriate parallel function (or sequential function, if no parallel functions are deemed worthy), based on either compile-time or run-time conditions.

Compile-time conditions are referred to as "embarrassingly parallel," and are denoted with the appropriate dispatch object, i.e., one of __gnu_parallel::sequential_tag, __gnu_parallel::parallel_tag, __gnu_parallel::balanced_tag, __gnu_parallel::unbalanced_tag, __gnu_parallel::omp_loop_tag, or __gnu_parallel::omp_loop_static_tag.

Run-time conditions depend on the hardware being used, the number of threads available, etc., and are denoted by the use of the enum __gnu_parallel::parallelism. Values of this enum include __gnu_parallel::sequential, __gnu_parallel::parallel_unbalanced, __gnu_parallel::parallel_balanced, __gnu_parallel::parallel_omp_loop, __gnu_parallel::parallel_omp_loop_static, and __gnu_parallel::parallel_taskqueue.

Putting all this together, the general view of overloads for the parallel algorithms looks like this: the plain ISO C++ signature; the same signature with a dispatch tag object appended; and, for algorithms that support run-time conditions, the same signature with a parallelism enum value appended.
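
Continuing the std::adjacent_find example, the overload set can be sketched as follows. This is a sketch assembled from the dispatch mechanisms described above, not an exhaustive listing; only the sequential_tag member of the tag family is shown:

namespace std
{
  namespace __parallel
  {
    // The plain ISO C++ signature.
    template<typename _FIter>
      _FIter
      adjacent_find(_FIter, _FIter);

    // Compile-time dispatch: a tag object such as
    // __gnu_parallel::sequential_tag selects a variant directly.
    template<typename _FIter>
      _FIter
      adjacent_find(_FIter, _FIter, __gnu_parallel::sequential_tag);

    // Run-time dispatch: a __gnu_parallel::parallelism value is
    // examined at the call.
    template<typename _FIter>
      _FIter
      adjacent_find(_FIter, _FIter, __gnu_parallel::parallelism);
  }
}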

Please note that the implementation may use additional functions (designated with the _switch suffix) to dispatch from the ISO C++ signature to the correct parallel version. Also, some of the algorithms do not have support for run-time conditions, so the last overload is missing for them.

Relevant namespaces

One namespace contains versions of code that are explicitly sequential: __gnu_sequential.

Two namespaces contain the parallel mode: std::__parallel and __gnu_parallel.

Parallel implementations of standard components, including template helpers to select parallelism, are defined in namespace std::__parallel. For instance, std::transform from <algorithm> has a parallel counterpart in std::__parallel::transform from <parallel/algorithm>. In addition, these parallel implementations are injected into namespace __gnu_parallel with using declarations.

Support and general infrastructure is in namespace __gnu_parallel.

More information, and an organized index of types and functions related to the parallel mode on a per-namespace basis, can be found in the generated source documentation.

Testing

Both the normal conformance and regression tests and the supplemental performance tests work.

To run the conformance and regression tests with the parallel mode active,

make check-parallel

The log and summary files for conformance testing are in the testsuite/parallel directory.

To run the performance tests with the parallel mode active,

make check-performance-parallel

The result file for performance testing is in the testsuite directory, in the file libstdc++_performance.sum. In addition, the policy-based containers have their own visualizations, which have additional software dependencies beyond the usual bare-bones text file, and which can be generated by using the make doc-performance rule in the testsuite's Makefile.


References / Further Reading

Johannes Singler, Peter Sanders, Felix Putze. The Multi-Core Standard Template Library. Euro-Par 2007: Parallel Processing. (LNCS 4641)

Leonor Frias, Johannes Singler. Parallelization of Bulk Operations for STL Dictionaries. Workshop on Highly Parallel Processing on a Chip (HPPC) 2007. (LNCS)

