<chapter xmlns="http://docbook.org/ns/docbook" version="5.0"
	 xml:id="manual.ext.parallel_mode" xreflabel="Parallel Mode">
<?dbhtml filename="parallel_mode.html"?>

<info><title>Parallel Mode</title>
  <keywordset>
    <keyword>C++</keyword>
    <keyword>library</keyword>
    <keyword>parallel</keyword>
  </keywordset>
</info>

<para> The libstdc++ parallel mode is an experimental parallel
implementation of many algorithms of the C++ Standard Library.
</para>

<para>
Several of the standard algorithms, for instance
<function>std::sort</function>, are made parallel using OpenMP
annotations. These parallel mode constructs can be invoked by
explicit source declaration or by compiling existing sources with a
specific compiler flag.
</para>

<section xml:id="manual.ext.parallel_mode.intro" xreflabel="Intro"><info><title>Intro</title></info>

<para>The following library components in the header
<filename class="headerfile">numeric</filename> are included in the parallel mode:</para>
<itemizedlist>
  <listitem><para><function>std::accumulate</function></para></listitem>
  <listitem><para><function>std::adjacent_difference</function></para></listitem>
  <listitem><para><function>std::inner_product</function></para></listitem>
  <listitem><para><function>std::partial_sum</function></para></listitem>
</itemizedlist>

<para>The following library components in the header
<filename class="headerfile">algorithm</filename> are included in the parallel mode:</para>
<itemizedlist>
  <listitem><para><function>std::adjacent_find</function></para></listitem>
  <listitem><para><function>std::count</function></para></listitem>
  <listitem><para><function>std::count_if</function></para></listitem>
  <listitem><para><function>std::equal</function></para></listitem>
  <listitem><para><function>std::find</function></para></listitem>
  <listitem><para><function>std::find_if</function></para></listitem>
  <listitem><para><function>std::find_first_of</function></para></listitem>
  <listitem><para><function>std::for_each</function></para></listitem>
  <listitem><para><function>std::generate</function></para></listitem>
  <listitem><para><function>std::generate_n</function></para></listitem>
  <listitem><para><function>std::lexicographical_compare</function></para></listitem>
  <listitem><para><function>std::mismatch</function></para></listitem>
  <listitem><para><function>std::search</function></para></listitem>
  <listitem><para><function>std::search_n</function></para></listitem>
  <listitem><para><function>std::transform</function></para></listitem>
  <listitem><para><function>std::replace</function></para></listitem>
  <listitem><para><function>std::replace_if</function></para></listitem>
  <listitem><para><function>std::max_element</function></para></listitem>
  <listitem><para><function>std::merge</function></para></listitem>
  <listitem><para><function>std::min_element</function></para></listitem>
  <listitem><para><function>std::nth_element</function></para></listitem>
  <listitem><para><function>std::partial_sort</function></para></listitem>
  <listitem><para><function>std::partition</function></para></listitem>
  <listitem><para><function>std::random_shuffle</function></para></listitem>
  <listitem><para><function>std::set_union</function></para></listitem>
  <listitem><para><function>std::set_intersection</function></para></listitem>
  <listitem><para><function>std::set_symmetric_difference</function></para></listitem>
  <listitem><para><function>std::set_difference</function></para></listitem>
  <listitem><para><function>std::sort</function></para></listitem>
  <listitem><para><function>std::stable_sort</function></para></listitem>
  <listitem><para><function>std::unique_copy</function></para></listitem>
</itemizedlist>

</section>

<section xml:id="manual.ext.parallel_mode.semantics" xreflabel="Semantics"><info><title>Semantics</title></info>

<para> The parallel mode STL algorithms are currently not exception-safe,
i.e. user-defined functors must not throw exceptions.
In addition, the order of execution is not guaranteed for some functions.
User-defined functors should therefore not have side effects that
could conflict when they are invoked concurrently.
</para>

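<para>
As a minimal sketch of these constraints, the functor below keeps no
shared state and is declared <code>noexcept</code>, and the reduction is
left to the algorithm instead of a functor that updates a shared
variable. The names <code>Square</code>, <code>squares</code> and
<code>sum_of_squares</code> are illustrative only and are not part of
libstdc++. The code uses only standard algorithms, so it should behave
the same with or without the parallel mode.
</para>

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Pure functor: no shared state and no exceptions. The noexcept
// specifier documents the no-throw requirement of the parallel mode.
struct Square
{
  int operator()(int x) const noexcept { return x * x; }
};

// Each output element depends only on the matching input element,
// so the order in which elements are processed does not matter.
std::vector<int> squares(const std::vector<int>& in)
{
  std::vector<int> out(in.size());
  std::transform(in.begin(), in.end(), out.begin(), Square());
  return out;
}

// The sum is taken from the algorithm's return value rather than
// accumulated in a variable shared between functor invocations.
long sum_of_squares(const std::vector<int>& in)
{
  const std::vector<int> sq = squares(in);
  return std::accumulate(sq.begin(), sq.end(), 0L);
}
```

<para>
For the input 1, 2, 3, <code>squares</code> returns 1, 4, 9 and
<code>sum_of_squares</code> returns 14.
</para>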
<para> Since the current GCC OpenMP implementation does not support
OpenMP parallel regions in concurrent threads,
it is not possible to call parallel STL algorithms in
concurrent threads, either.
It might work with other compilers, though.</para>

</section>

<section xml:id="manual.ext.parallel_mode.using" xreflabel="Using"><info><title>Using</title></info>

<section xml:id="parallel_mode.using.prereq_flags"><info><title>Prerequisite Compiler Flags</title></info>

<para>
  Any use of parallel functionality requires additional compiler
  and runtime support, in particular support for OpenMP. Adding this support is
  not difficult: just compile your application with the compiler
  flag <literal>-fopenmp</literal>. This will link
  in <code>libgomp</code>, the GNU
  OpenMP <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://gcc.gnu.org/onlinedocs/libgomp">implementation</link>,
  whose presence is mandatory.
</para>

<para>
  In addition, hardware that supports atomic operations and a compiler
  capable of producing atomic operations are mandatory: GCC defaults to no
  support for atomic operations on some common hardware
  architectures. Activating atomic operations may require explicit
  compiler flags on some targets (like sparc and x86), such
  as <literal>-march=i686</literal>,
  <literal>-march=native</literal> or <literal>-mcpu=v9</literal>. See
  the GCC manual for more information.
</para>

</section>

<section xml:id="parallel_mode.using.parallel_mode"><info><title>Using Parallel Mode</title></info>

<para>
  To use the libstdc++ parallel mode, compile your application with
  the prerequisite flags as detailed above, and in addition
  add <constant>-D_GLIBCXX_PARALLEL</constant>. This will convert all
  use of the standard (sequential) algorithms to the appropriate parallel
  equivalents. Please note that this doesn't necessarily mean that
  everything will end up being executed in a parallel manner, but
  rather that the heuristics and settings coded into the parallel
  versions will be used to determine if all, some, or no algorithms
  will be executed using parallel variants.
</para>

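<para>
The following sketch shows what "existing sources" means in practice:
the code calls only <function>std::sort</function> and contains nothing
specific to the parallel mode. Built with something like
<code>g++ -fopenmp -D_GLIBCXX_PARALLEL example.cc</code> (the file name
is a placeholder), the same call would be routed to the parallel
implementation. The helper names <code>built_in_parallel_mode</code> and
<code>sorted_copy</code> are hypothetical and exist only for this
illustration.
</para>

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Reports whether this translation unit was compiled with
// -D_GLIBCXX_PARALLEL (hypothetical helper, for illustration only).
bool built_in_parallel_mode()
{
#ifdef _GLIBCXX_PARALLEL
  return true;
#else
  return false;
#endif
}

// An ordinary standard-library call. The source is identical in both
// modes; only the compiler flags decide which implementation is used.
std::vector<int> sorted_copy(std::vector<int> v)
{
  std::sort(v.begin(), v.end());
  assert(std::is_sorted(v.begin(), v.end()));
  return v;
}
```

<para>
Whether a given call actually runs in parallel still depends on the
heuristics and settings described in this chapter, so a small input
such as the one a unit test would use may well be sorted sequentially.
</para>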
<para>Note that the <constant>_GLIBCXX_PARALLEL</constant> define may change the
  sizes and behavior of standard templates such as
  <function>std::search</function>, and therefore one can only link code
  compiled with parallel mode and code compiled without parallel mode
  if no instantiation of a container is passed between the two
  translation units. Parallel mode functionality has distinct linkage,
  and cannot be confused with normal mode symbols.
</para>
</section>

<section xml:id="parallel_mode.using.specific"><info><title>Using Specific Parallel Components</title></info>

<para>When it is not feasible to recompile your entire application, or
  only specific algorithms need to be parallel-aware, individual
  parallel algorithms can be made available explicitly. These
  parallel algorithms are functionally equivalent to the standard
  drop-in algorithms used in parallel mode, but they are available in
  a separate namespace as GNU extensions and may be used in programs
  compiled in either release mode or parallel mode.
</para>

<para>An example of using a parallel version
of <function>std::sort</function>, but no other parallel algorithms, is:
</para>

<programlisting>
#include &lt;vector&gt;
#include &lt;parallel/algorithm&gt;

int main()
{
  std::vector&lt;int&gt; v(100);

  // ...

  // Explicitly force a call to parallel sort.
  __gnu_parallel::sort(v.begin(), v.end());
  return 0;
}
</programlisting>

<para>
Then compile this code with the prerequisite compiler flags
(<literal>-fopenmp</literal> and any necessary architecture-specific
flags for atomic operations).
</para>

<para> The following table provides the names and headers of all the
  parallel algorithms that can be used in a similar manner:
</para>

<table frame="all">
<title>Parallel Algorithms</title>

<tgroup cols="4" align="left" colsep="1" rowsep="1">
<colspec colname="c1"/>
<colspec colname="c2"/>
<colspec colname="c3"/>
<colspec colname="c4"/>

<thead>
  <row>
    <entry>Algorithm</entry>
    <entry>Header</entry>
    <entry>Parallel algorithm</entry>
    <entry>Parallel header</entry>
  </row>
</thead>

<tbody>
  <row>
    <entry><function>std::accumulate</function></entry>
    <entry><filename class="headerfile">numeric</filename></entry>
    <entry><function>__gnu_parallel::accumulate</function></entry>
    <entry><filename class="headerfile">parallel/numeric</filename></entry>
  </row>
  <row>
    <entry><function>std::adjacent_difference</function></entry>
    <entry><filename class="headerfile">numeric</filename></entry>
    <entry><function>__gnu_parallel::adjacent_difference</function></entry>
    <entry><filename class="headerfile">parallel/numeric</filename></entry>
  </row>
  <row>
    <entry><function>std::inner_product</function></entry>
    <entry><filename class="headerfile">numeric</filename></entry>
    <entry><function>__gnu_parallel::inner_product</function></entry>
    <entry><filename class="headerfile">parallel/numeric</filename></entry>
  </row>
  <row>
    <entry><function>std::partial_sum</function></entry>
    <entry><filename class="headerfile">numeric</filename></entry>
    <entry><function>__gnu_parallel::partial_sum</function></entry>
    <entry><filename class="headerfile">parallel/numeric</filename></entry>
  </row>
  <row>
    <entry><function>std::adjacent_find</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::adjacent_find</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::count</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::count</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::count_if</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::count_if</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::equal</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::equal</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::find</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::find</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::find_if</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::find_if</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::find_first_of</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::find_first_of</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::for_each</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::for_each</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::generate</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::generate</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::generate_n</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::generate_n</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::lexicographical_compare</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::lexicographical_compare</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::mismatch</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::mismatch</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::search</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::search</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::search_n</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::search_n</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::transform</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::transform</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::replace</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::replace</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::replace_if</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::replace_if</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::max_element</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::max_element</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::merge</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::merge</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::min_element</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::min_element</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::nth_element</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::nth_element</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::partial_sort</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::partial_sort</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::partition</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::partition</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::random_shuffle</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::random_shuffle</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::set_union</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::set_union</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::set_intersection</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::set_intersection</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::set_symmetric_difference</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::set_symmetric_difference</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::set_difference</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::set_difference</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::sort</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::sort</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::stable_sort</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::stable_sort</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::unique_copy</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::unique_copy</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>
</tbody>
</tgroup>
</table>

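<para>
As a further sketch, the first row of the table can be used in the same
way as the <function>__gnu_parallel::sort</function> example. The
preprocessor guard and the names <code>SKETCH_USE_PARALLEL</code> and
<code>total</code> are assumptions of this example, not part of
libstdc++: the guard falls back to
<function>std::accumulate</function> when the code is not built with
libstdc++ and <literal>-fopenmp</literal>, so that the snippet compiles
either way.
</para>

```cpp
#include <numeric>
#include <vector>

// Assumption of this sketch: the parallel headers are only usable with
// libstdc++ and with OpenMP enabled (-fopenmp). Otherwise fall back.
#if defined(__GLIBCXX__) && defined(_OPENMP)
# include <parallel/numeric>
# define SKETCH_USE_PARALLEL 1
#else
# define SKETCH_USE_PARALLEL 0
#endif

// Sums the elements, calling the explicitly parallel variant from
// <parallel/numeric> when it is available.
long total(const std::vector<long>& v)
{
#if SKETCH_USE_PARALLEL
  return __gnu_parallel::accumulate(v.begin(), v.end(), 0L);
#else
  return std::accumulate(v.begin(), v.end(), 0L);
#endif
}
```

<para>
Both branches take the same arguments, which reflects the intent that
the parallel algorithms mirror the signatures of the standard ones.
</para>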
</section>

</section>

<section xml:id="manual.ext.parallel_mode.design" xreflabel="Design"><info><title>Design</title></info>

  <para>
  </para>
<section xml:id="parallel_mode.design.intro" xreflabel="Intro"><info><title>Interface Basics</title></info>

<para>
All parallel algorithms are intended to have signatures that are
equivalent to those of the ISO C++ algorithms they replace. For instance, the
<function>std::adjacent_find</function> function is declared as:
</para>
<programlisting>
namespace std
{
  template&lt;typename _FIter&gt;
    _FIter
    adjacent_find(_FIter, _FIter);
}
</programlisting>

<para>
This means that there should be an equivalent declaration for the parallel
version. Indeed, this is the case:
</para>

<programlisting>
namespace std
{
  namespace __parallel
  {
    template&lt;typename _FIter&gt;
      _FIter
      adjacent_find(_FIter, _FIter);

    ...
  }
}
</programlisting>

<para>But why the ellipsis?
</para>

<para> The ellipsis in the example above represents additional overloads
required for the parallel version of the function. These additional
overloads are used to dispatch calls from the ISO C++ function
signature to the appropriate parallel function (or sequential
function, if no parallel function is deemed worthwhile), based on either
compile-time or run-time conditions.
</para>

<para> The available signature options are specific to each
algorithm or class of algorithms.</para>

<para> The general set of overloads for the parallel algorithms looks like this:
</para>
<itemizedlist>
  <listitem><para>ISO C++ signature</para></listitem>
  <listitem><para>ISO C++ signature + sequential_tag argument</para></listitem>
  <listitem><para>ISO C++ signature + algorithm-specific tag type
    (several signatures)</para></listitem>
</itemizedlist>

<para> Please note that the implementation may use additional functions
(designated with the <code>_switch</code> suffix) to dispatch from the
ISO C++ signature to the correct parallel version. Also, some of the
algorithms do not have support for run-time conditions, so the last
overload is missing for them.
</para>

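<para>
The overload scheme described above can be sketched with a toy version
of <function>adjacent_find</function>. Everything in namespace
<code>sketch</code> is hypothetical and written for this illustration:
it is not the libstdc++ implementation, the "parallel" variant is a
stub that reuses the sequential code, and the threshold
<code>minimal_n</code> is an invented value. The sketch only shows how
the plain ISO C++ signature can forward to a tagged overload based on a
run-time condition.
</para>

```cpp
#include <cstddef>
#include <iterator>

namespace sketch  // hypothetical namespace, not part of libstdc++
{
  // Tag type that requests the sequential variant.
  struct sequential_tag { };

  // ISO C++ signature + sequential_tag: always runs sequentially.
  template<typename FIter>
    FIter
    adjacent_find(FIter first, FIter last, sequential_tag)
    {
      if (first == last)
        return last;
      FIter next = first;
      while (++next != last)
        {
          if (*first == *next)
            return first;
          first = next;
        }
      return last;
    }

  // Stand-in for a parallel variant. A real one would split the range
  // among threads; this stub simply reuses the sequential code.
  template<typename FIter>
    FIter
    adjacent_find_parallel_stub(FIter first, FIter last)
    { return sketch::adjacent_find(first, last, sequential_tag()); }

  // Invented threshold below which parallelism is not attempted.
  const std::ptrdiff_t minimal_n = 1000;

  // ISO C++ signature: dispatches on a run-time condition.
  template<typename FIter>
    FIter
    adjacent_find(FIter first, FIter last)
    {
      if (std::distance(first, last) < minimal_n)
        return sketch::adjacent_find(first, last, sequential_tag());
      return adjacent_find_parallel_stub(first, last);
    }
}
```

<para>
A caller that writes <code>sketch::adjacent_find(first, last)</code>
gets the dispatching overload, while appending
<code>sketch::sequential_tag()</code> selects the sequential code
directly, which mirrors the first two entries of the list above.
</para>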
</section>

<section xml:id="parallel_mode.design.tuning" xreflabel="Tuning"><info><title>Configuration and Tuning</title></info>

<section xml:id="parallel_mode.design.tuning.omp" xreflabel="OpenMP Environment"><info><title>Setting up the OpenMP Environment</title></info>

<para>
Several aspects of the overall runtime environment can be manipulated
by standard OpenMP function calls.
</para>

<para>
To specify the number of threads to be used for the algorithms globally,
use the function <function>omp_set_num_threads</function>. An example:
</para>

<programlisting>
#include &lt;stdlib.h&gt;
#include &lt;omp.h&gt;

int main()
{
  // Explicitly set number of threads.
  const int threads_wanted = 20;
  omp_set_dynamic(false);
  omp_set_num_threads(threads_wanted);

  // Call parallel mode algorithms.

  return 0;
}
</programlisting>

<para>
 Some algorithms allow the number of threads to be set for a particular call,
 by augmenting the algorithm variant.
 See the next section for further information.
</para>

<para>
Other parts of the runtime environment that can be manipulated include
nested parallelism (<function>omp_set_nested</function>), schedule kind
(<function>omp_set_schedule</function>), and others. See the OpenMP
documentation for more information.
</para>

</section>

<section xml:id="parallel_mode.design.tuning.compile" xreflabel="Compile Switches"><info><title>Compile Time Switches</title></info>

<para>
To force an algorithm to execute sequentially, even though parallelism
is switched on in general via the macro <constant>_GLIBCXX_PARALLEL</constant>,
add <classname>__gnu_parallel::sequential_tag()</classname> to the end
of the algorithm's argument list.
</para>

<para>
Like so:
</para>

<programlisting>
std::sort(v.begin(), v.end(), __gnu_parallel::sequential_tag());
</programlisting>

<para>
Some parallel algorithm variants can be excluded from compilation by
preprocessor defines. See the doxygen documentation on
<code>compiletime_settings.h</code> and <code>features.h</code> for details.
</para>

<para>
For some algorithms, the desired variant can be chosen at compile-time by
appending a tag object. The available options are specific to the particular
algorithm (class).
</para>

<para>
For the "embarrassingly parallel" algorithms, there is only one "tag object
type", the enum <code>_Parallelism</code>.
It takes one of the following values:
<code>__gnu_parallel::parallel_tag</code>,
<code>__gnu_parallel::balanced_tag</code>,
<code>__gnu_parallel::unbalanced_tag</code>,
<code>__gnu_parallel::omp_loop_tag</code>,
<code>__gnu_parallel::omp_loop_static_tag</code>.
This means that the actual parallelization strategy is chosen at run-time.
(Choosing the variants at compile-time will come soon.)
</para>

<para>
For the algorithms described below, the general choices are
<code>__gnu_parallel::parallel_tag</code> and
<code>__gnu_parallel::default_parallel_tag</code>, in addition to
<code>__gnu_parallel::sequential_tag</code>.
<code>__gnu_parallel::default_parallel_tag</code> chooses the default
algorithm at compile-time, as does omitting the tag.
<code>__gnu_parallel::parallel_tag</code> postpones the decision to run-time
(see next section).
For all tags, the number of threads desired for this call can optionally be
passed to the respective tag's constructor.
</para>

<para>
The <code>multiway_merge</code> algorithm comes with two additional choices,
<code>__gnu_parallel::exact_tag</code> and
<code>__gnu_parallel::sampling_tag</code>.
Exact and sampling are the two available splitting strategies.
</para>

<para>
For the <code>sort</code> and <code>stable_sort</code> algorithms, there are
several additional choices, namely
<code>__gnu_parallel::multiway_mergesort_tag</code>,
<code>__gnu_parallel::multiway_mergesort_exact_tag</code>,
<code>__gnu_parallel::multiway_mergesort_sampling_tag</code>,
<code>__gnu_parallel::quicksort_tag</code>, and
<code>__gnu_parallel::balanced_quicksort_tag</code>.
Multiway mergesort comes with the two splitting strategies for multi-way
merging. The quicksort options cannot be used for <code>stable_sort</code>.
</para>

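<para>
A hedged sketch of how such tags might be passed, assuming the tag
overloads of <function>__gnu_parallel::sort</function> are available as
described above. The guard macro <code>SKETCH_USE_PARALLEL</code> and
the function names are inventions of this example; when the code is not
built with libstdc++ and <literal>-fopenmp</literal>, both functions
fall back to <function>std::sort</function> so that the snippet still
compiles.
</para>

```cpp
#include <algorithm>
#include <vector>

// Assumption of this sketch: the parallel headers are only usable with
// libstdc++ and with OpenMP enabled (-fopenmp). Otherwise fall back.
#if defined(__GLIBCXX__) && defined(_OPENMP)
# include <parallel/algorithm>
# define SKETCH_USE_PARALLEL 1
#else
# define SKETCH_USE_PARALLEL 0
#endif

// Requests multiway mergesort with exact splitting, and passes the
// desired number of threads (2) for this call to the tag's constructor.
std::vector<int> sort_multiway_exact(std::vector<int> v)
{
#if SKETCH_USE_PARALLEL
  __gnu_parallel::sort(v.begin(), v.end(),
                       __gnu_parallel::multiway_mergesort_exact_tag(2));
#else
  std::sort(v.begin(), v.end());
#endif
  return v;
}

// Forces the sequential variant through the explicitly parallel
// entry point by appending sequential_tag.
std::vector<int> sort_sequential(std::vector<int> v)
{
#if SKETCH_USE_PARALLEL
  __gnu_parallel::sort(v.begin(), v.end(), __gnu_parallel::sequential_tag());
#else
  std::sort(v.begin(), v.end());
#endif
  return v;
}
```

<para>
Both functions return the same sorted sequence; the tag only selects
how the work is carried out.
</para>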
</section>

<section xml:id="parallel_mode.design.tuning.settings" xreflabel="_Settings"><info><title>Run Time Settings and Defaults</title></info>

<para>
The default parallelization strategy, the choice of specific algorithm
strategy, the minimum threshold limits for individual parallel
algorithms, and aspects of the underlying hardware can be specified as
desired via manipulation
of <classname>__gnu_parallel::_Settings</classname> member data.
</para>

<para>
First off, the choice of parallelization strategy: serial, parallel,
or heuristically deduced. This corresponds
to <code>__gnu_parallel::_Settings::algorithm_strategy</code> and is a
value of enum <type>__gnu_parallel::_AlgorithmStrategy</type>
type. Choices
include: <type>heuristic</type>, <type>force_sequential</type>,
and <type>force_parallel</type>. The default is <type>heuristic</type>.
</para>

<para>
Next, the sub-choices for algorithm variant, if not fixed at compile-time.
Specific algorithms like <function>find</function> or <function>sort</function>
can be implemented in multiple ways: when this is the case,
a <classname>__gnu_parallel::_Settings</classname> member exists to
pick the default strategy. For
example, <code>__gnu_parallel::_Settings::sort_algorithm</code> can
take any value of
enum <type>__gnu_parallel::_SortAlgorithm</type>: <type>MWMS</type>, <type>QS</type>,
or <type>QS_BALANCED</type>.
</para>

<para>
Likewise for setting the minimal threshold for algorithm
parallelization. Parallelism always incurs some overhead. Thus, it is
not helpful to parallelize operations on very small sets of
data. Because of this, measures are taken to avoid parallelizing below
a certain, pre-determined threshold. For each algorithm, a minimum
problem size is encoded as a variable in the
active <classname>__gnu_parallel::_Settings</classname> object. This
threshold variable uses the following naming scheme:
<code>__gnu_parallel::_Settings::[algorithm]_minimal_n</code>. So,
for <function>fill</function>, the threshold variable
is <code>__gnu_parallel::_Settings::fill_minimal_n</code>.
</para>

<para>
Finally, hardware details like L1/L2 cache size can be hardwired
via <code>__gnu_parallel::_Settings::L1_cache_size</code> and friends.
</para>

<para>
</para>

<para>
All these configuration variables can be changed by the user, if
desired.
There exists one global instance of the class <classname>_Settings</classname>,
i.e. it is a singleton. It can be read and written by calling
<code>__gnu_parallel::_Settings::get</code> and
<code>__gnu_parallel::_Settings::set</code>, respectively.
Please note that the first call returns a const object, so direct manipulation
is forbidden.
See <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://gcc.gnu.org/onlinedocs/libstdc++/latest-doxygen/a01005.html">
  <filename class="headerfile">settings.h</filename></link>
for complete details.
</para>

<para>
A small example of tuning the default:
</para>

<programlisting>
#include &lt;parallel/algorithm&gt;
#include &lt;parallel/settings.h&gt;

int main()
{
  __gnu_parallel::_Settings s;
  s.algorithm_strategy = __gnu_parallel::force_parallel;
  __gnu_parallel::_Settings::set(s);

  // Do work... all algorithms will be parallelized, always.

  return 0;
}
</programlisting>

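<para>
The other members described in this section can be adjusted in the same
way. The sketch below copies the current global settings, changes the
default sort variant and its threshold, and installs the result. It
assumes the member names given above (<code>sort_algorithm</code> and
<code>sort_minimal_n</code>, the latter following the
<code>[algorithm]_minimal_n</code> naming scheme). The guard macro
<code>SKETCH_USE_PARALLEL</code> and the function
<code>prefer_balanced_quicksort</code> are inventions of this example,
and the function does nothing when the code is not built with libstdc++
and <literal>-fopenmp</literal>.
</para>

```cpp
#include <cstddef>

// Assumption of this sketch: the parallel headers are only usable with
// libstdc++ and with OpenMP enabled (-fopenmp). Otherwise do nothing.
#if defined(__GLIBCXX__) && defined(_OPENMP)
# include <parallel/algorithm>
# include <parallel/settings.h>
# define SKETCH_USE_PARALLEL 1
#else
# define SKETCH_USE_PARALLEL 0
#endif

// Starts from the current global settings, changes the default sort
// variant and its minimum problem size, and installs the result.
// Returns false when the parallel mode is unavailable and nothing
// was changed.
bool prefer_balanced_quicksort(std::size_t minimal_n)
{
#if SKETCH_USE_PARALLEL
  // get() returns a const object, so work on a copy.
  __gnu_parallel::_Settings s = __gnu_parallel::_Settings::get();
  s.sort_algorithm = __gnu_parallel::QS_BALANCED;
  s.sort_minimal_n = minimal_n;
  __gnu_parallel::_Settings::set(s);
  return true;
#else
  (void) minimal_n;  // unused in the fallback
  return false;
#endif
}
```

<para>
Copying the object returned by <code>get</code> keeps every setting
that is not explicitly changed, whereas a default-constructed
<classname>_Settings</classname> object starts from the built-in
defaults.
</para>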
</section>

</section>

<section xml:id="parallel_mode.design.impl" xreflabel="Impl"><info><title>Implementation Namespaces</title></info>

<para> One namespace contains versions of code that are always
explicitly sequential:
<code>__gnu_serial</code>.
</para>

<para> Two namespaces contain the parallel mode:
<code>std::__parallel</code> and <code>__gnu_parallel</code>.
</para>

<para> Parallel implementations of standard components, including
template helpers to select parallelism, are defined in <code>namespace
std::__parallel</code>. For instance, <function>std::transform</function> from <filename class="headerfile">algorithm</filename> has a parallel counterpart in
<function>std::__parallel::transform</function> from <filename class="headerfile">parallel/algorithm</filename>. In addition, these parallel
implementations are injected into <code>namespace
__gnu_parallel</code> with using declarations.
</para>

<para> Support and general infrastructure are in <code>namespace
__gnu_parallel</code>.
</para>

<para> More information, and an organized index of types and functions
related to the parallel mode on a per-namespace basis, can be found in
the generated source documentation.
</para>

</section>

</section>

<section xml:id="manual.ext.parallel_mode.test" xreflabel="Testing"><info><title>Testing</title></info>

  <para>
    Both the normal conformance and regression tests and the
    supplemental performance tests work.
  </para>

  <para>
    To run the conformance and regression tests with the parallel mode
    active, run:
  </para>

  <screen>
  <userinput>make check-parallel</userinput>
  </screen>

  <para>
    The log and summary files for conformance testing are in the
    <filename class="directory">testsuite/parallel</filename> directory.
  </para>

  <para>
    To run the performance tests with the parallel mode active, run:
  </para>

  <screen>
  <userinput>make check-performance-parallel</userinput>
  </screen>

  <para>
    The result file for performance testing is in the
    <filename class="directory">testsuite</filename> directory, in the file
    <filename>libstdc++_performance.sum</filename>. In addition, the
    policy-based containers have their own visualizations, which have
    more software dependencies than the usual bare-bones text
    file, and can be generated by using the <code>make
    doc-performance</code> rule in the testsuite's Makefile.
  </para>
</section>

<bibliography xml:id="parallel_mode.biblio"><info><title>Bibliography</title></info>

  <biblioentry>
    <citetitle>
      Parallelization of Bulk Operations for STL Dictionaries
    </citetitle>

    <author><personname><firstname>Johannes</firstname><surname>Singler</surname></personname></author>
    <author><personname><firstname>Leonor</firstname><surname>Frias</surname></personname></author>

    <copyright>
      <year>2007</year>
      <holder/>
    </copyright>

    <publisher>
      <publishername>
	Workshop on Highly Parallel Processing on a Chip (HPPC) 2007. (LNCS)
      </publishername>
    </publisher>
  </biblioentry>

  <biblioentry>
    <citetitle>
      The Multi-Core Standard Template Library
    </citetitle>

    <author><personname><firstname>Johannes</firstname><surname>Singler</surname></personname></author>
    <author><personname><firstname>Peter</firstname><surname>Sanders</surname></personname></author>
    <author><personname><firstname>Felix</firstname><surname>Putze</surname></personname></author>

    <copyright>
      <year>2007</year>
      <holder/>
    </copyright>

    <publisher>
      <publishername>
	 Euro-Par 2007: Parallel Processing. (LNCS 4641)
      </publishername>
    </publisher>
  </biblioentry>

</bibliography>

</chapter>