diff options
Diffstat (limited to 'libstdc++-v3/doc/html/manual/bk01pt03ch18s04.html')
-rw-r--r-- | libstdc++-v3/doc/html/manual/bk01pt03ch18s04.html | 213 |
1 files changed, 213 insertions, 0 deletions
diff --git a/libstdc++-v3/doc/html/manual/bk01pt03ch18s04.html b/libstdc++-v3/doc/html/manual/bk01pt03ch18s04.html new file mode 100644 index 000000000..91c951d38 --- /dev/null +++ b/libstdc++-v3/doc/html/manual/bk01pt03ch18s04.html @@ -0,0 +1,213 @@ +<?xml version="1.0" encoding="UTF-8" standalone="no"?> +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"> +<html xmlns="http://www.w3.org/1999/xhtml"><head><title>Design</title><meta name="generator" content="DocBook XSL-NS Stylesheets V1.76.1"/><meta name="keywords" content=" C++ , library , parallel "/><meta name="keywords" content=" ISO C++ , library "/><link rel="home" href="../spine.html" title="The GNU C++ Library"/><link rel="up" href="parallel_mode.html" title="Chapter 18. Parallel Mode"/><link rel="prev" href="bk01pt03ch18s03.html" title="Using"/><link rel="next" href="bk01pt03ch18s05.html" title="Testing"/></head><body><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">Design</th></tr><tr><td align="left"><a accesskey="p" href="bk01pt03ch18s03.html">Prev</a> </td><th width="60%" align="center">Chapter 18. Parallel Mode</th><td align="right"> <a accesskey="n" href="bk01pt03ch18s05.html">Next</a></td></tr></table><hr/></div><div class="section" title="Design"><div class="titlepage"><div><div><h2 class="title"><a id="manual.ext.parallel_mode.design"/>Design</h2></div></div></div><p> + </p><div class="section" title="Interface Basics"><div class="titlepage"><div><div><h3 class="title"><a id="parallel_mode.design.intro"/>Interface Basics</h3></div></div></div><p> +All parallel algorithms are intended to have signatures that are +equivalent to the ISO C++ algorithms replaced. For instance, the +<code class="function">std::adjacent_find</code> function is declared as: +</p><pre class="programlisting"> +namespace std +{ + template<typename _FIter> + _FIter + adjacent_find(_FIter, _FIter); +} +</pre><p> +Which means that there should be something equivalent for the parallel +version. Indeed, this is the case: +</p><pre class="programlisting"> +namespace std +{ + namespace __parallel + { + template<typename _FIter> + _FIter + adjacent_find(_FIter, _FIter); + + ... + } +} +</pre><p>But.... why the ellipses? +</p><p> The ellipses in the example above represent additional overloads +required for the parallel version of the function. These additional +overloads are used to dispatch calls from the ISO C++ function +signature to the appropriate parallel function (or sequential +function, if no parallel functions are deemed worthy), based on either +compile-time or run-time conditions. +</p><p> The available signature options are specific for the different +algorithms/algorithm classes.</p><p> The general view of overloads for the parallel algorithms look like this: +</p><div class="itemizedlist"><ul class="itemizedlist"><li class="listitem"><p>ISO C++ signature</p></li><li class="listitem"><p>ISO C++ signature + sequential_tag argument</p></li><li class="listitem"><p>ISO C++ signature + algorithm-specific tag type + (several signatures)</p></li></ul></div><p> Please note that the implementation may use additional functions +(designated with the <code class="code">_switch</code> suffix) to dispatch from the +ISO C++ signature to the correct parallel version. Also, some of the +algorithms do not have support for run-time conditions, so the last +overload is therefore missing. +</p></div><div class="section" title="Configuration and Tuning"><div class="titlepage"><div><div><h3 class="title"><a id="parallel_mode.design.tuning"/>Configuration and Tuning</h3></div></div></div><div class="section" title="Setting up the OpenMP Environment"><div class="titlepage"><div><div><h4 class="title"><a id="parallel_mode.design.tuning.omp"/>Setting up the OpenMP Environment</h4></div></div></div><p> +Several aspects of the overall runtime environment can be manipulated +by standard OpenMP function calls. +</p><p> +To specify the number of threads to be used for the algorithms globally, +use the function <code class="function">omp_set_num_threads</code>. An example: +</p><pre class="programlisting"> +#include <stdlib.h> +#include <omp.h> + +int main() +{ + // Explicitly set number of threads. + const int threads_wanted = 20; + omp_set_dynamic(false); + omp_set_num_threads(threads_wanted); + + // Call parallel mode algorithms. + + return 0; +} +</pre><p> + Some algorithms allow the number of threads being set for a particular call, + by augmenting the algorithm variant. + See the next section for further information. +</p><p> +Other parts of the runtime environment able to be manipulated include +nested parallelism (<code class="function">omp_set_nested</code>), schedule kind +(<code class="function">omp_set_schedule</code>), and others. See the OpenMP +documentation for more information. +</p></div><div class="section" title="Compile Time Switches"><div class="titlepage"><div><div><h4 class="title"><a id="parallel_mode.design.tuning.compile"/>Compile Time Switches</h4></div></div></div><p> +To force an algorithm to execute sequentially, even though parallelism +is switched on in general via the macro <code class="constant">_GLIBCXX_PARALLEL</code>, +add <code class="classname">__gnu_parallel::sequential_tag()</code> to the end +of the algorithm's argument list. +</p><p> +Like so: +</p><pre class="programlisting"> +std::sort(v.begin(), v.end(), __gnu_parallel::sequential_tag()); +</pre><p> +Some parallel algorithm variants can be excluded from compilation by +preprocessor defines. See the doxygen documentation on +<code class="code">compiletime_settings.h</code> and <code class="code">features.h</code> for details. +</p><p> +For some algorithms, the desired variant can be chosen at compile-time by +appending a tag object. The available options are specific to the particular +algorithm (class). +</p><p> +For the "embarrassingly parallel" algorithms, there is only one "tag object +type", the enum _Parallelism. +It takes one of the following values, +<code class="code">__gnu_parallel::parallel_tag</code>, +<code class="code">__gnu_parallel::balanced_tag</code>, +<code class="code">__gnu_parallel::unbalanced_tag</code>, +<code class="code">__gnu_parallel::omp_loop_tag</code>, +<code class="code">__gnu_parallel::omp_loop_static_tag</code>. +This means that the actual parallelization strategy is chosen at run-time. +(Choosing the variants at compile-time will come soon.) +</p><p> +For the following algorithms in general, we have +<code class="code">__gnu_parallel::parallel_tag</code> and +<code class="code">__gnu_parallel::default_parallel_tag</code>, in addition to +<code class="code">__gnu_parallel::sequential_tag</code>. +<code class="code">__gnu_parallel::default_parallel_tag</code> chooses the default +algorithm at compiletime, as does omitting the tag. +<code class="code">__gnu_parallel::parallel_tag</code> postpones the decision to runtime +(see next section). +For all tags, the number of threads desired for this call can optionally be +passed to the respective tag's constructor. +</p><p> +The <code class="code">multiway_merge</code> algorithm comes with the additional choices, +<code class="code">__gnu_parallel::exact_tag</code> and +<code class="code">__gnu_parallel::sampling_tag</code>. +Exact and sampling are the two available splitting strategies. +</p><p> +For the <code class="code">sort</code> and <code class="code">stable_sort</code> algorithms, there are +several additional choices, namely +<code class="code">__gnu_parallel::multiway_mergesort_tag</code>, +<code class="code">__gnu_parallel::multiway_mergesort_exact_tag</code>, +<code class="code">__gnu_parallel::multiway_mergesort_sampling_tag</code>, +<code class="code">__gnu_parallel::quicksort_tag</code>, and +<code class="code">__gnu_parallel::balanced_quicksort_tag</code>. +Multiway mergesort comes with the two splitting strategies for multi-way +merging. The quicksort options cannot be used for <code class="code">stable_sort</code>. +</p></div><div class="section" title="Run Time Settings and Defaults"><div class="titlepage"><div><div><h4 class="title"><a id="parallel_mode.design.tuning.settings"/>Run Time Settings and Defaults</h4></div></div></div><p> +The default parallelization strategy, the choice of specific algorithm +strategy, the minimum threshold limits for individual parallel +algorithms, and aspects of the underlying hardware can be specified as +desired via manipulation +of <code class="classname">__gnu_parallel::_Settings</code> member data. +</p><p> +First off, the choice of parallelization strategy: serial, parallel, +or heuristically deduced. This corresponds +to <code class="code">__gnu_parallel::_Settings::algorithm_strategy</code> and is a +value of enum <span class="type">__gnu_parallel::_AlgorithmStrategy</span> +type. Choices +include: <span class="type">heuristic</span>, <span class="type">force_sequential</span>, +and <span class="type">force_parallel</span>. The default is <span class="type">heuristic</span>. +</p><p> +Next, the sub-choices for algorithm variant, if not fixed at compile-time. +Specific algorithms like <code class="function">find</code> or <code class="function">sort</code> +can be implemented in multiple ways: when this is the case, +a <code class="classname">__gnu_parallel::_Settings</code> member exists to +pick the default strategy. For +example, <code class="code">__gnu_parallel::_Settings::sort_algorithm</code> can +have any values of +enum <span class="type">__gnu_parallel::_SortAlgorithm</span>: <span class="type">MWMS</span>, <span class="type">QS</span>, +or <span class="type">QS_BALANCED</span>. +</p><p> +Likewise for setting the minimal threshold for algorithm +parallelization. Parallelism always incurs some overhead. Thus, it is +not helpful to parallelize operations on very small sets of +data. Because of this, measures are taken to avoid parallelizing below +a certain, pre-determined threshold. For each algorithm, a minimum +problem size is encoded as a variable in the +active <code class="classname">__gnu_parallel::_Settings</code> object. This +threshold variable follows the following naming scheme: +<code class="code">__gnu_parallel::_Settings::[algorithm]_minimal_n</code>. So, +for <code class="function">fill</code>, the threshold variable +is <code class="code">__gnu_parallel::_Settings::fill_minimal_n</code>, +</p><p> +Finally, hardware details like L1/L2 cache size can be hardwired +via <code class="code">__gnu_parallel::_Settings::L1_cache_size</code> and friends. +</p><p> +</p><p> +All these configuration variables can be changed by the user, if +desired. +There exists one global instance of the class <code class="classname">_Settings</code>, +i. e. it is a singleton. It can be read and written by calling +<code class="code">__gnu_parallel::_Settings::get</code> and +<code class="code">__gnu_parallel::_Settings::set</code>, respectively. +Please note that the first call return a const object, so direct manipulation +is forbidden. +See <a class="link" href="http://gcc.gnu.org/onlinedocs/libstdc++/latest-doxygen/a01005.html"> + <code class="filename">settings.h</code></a> +for complete details. +</p><p> +A small example of tuning the default: +</p><pre class="programlisting"> +#include <parallel/algorithm> +#include <parallel/settings.h> + +int main() +{ + __gnu_parallel::_Settings s; + s.algorithm_strategy = __gnu_parallel::force_parallel; + __gnu_parallel::_Settings::set(s); + + // Do work... all algorithms will be parallelized, always. + + return 0; +} +</pre></div></div><div class="section" title="Implementation Namespaces"><div class="titlepage"><div><div><h3 class="title"><a id="parallel_mode.design.impl"/>Implementation Namespaces</h3></div></div></div><p> One namespace contain versions of code that are always +explicitly sequential: +<code class="code">__gnu_serial</code>. +</p><p> Two namespaces contain the parallel mode: +<code class="code">std::__parallel</code> and <code class="code">__gnu_parallel</code>. +</p><p> Parallel implementations of standard components, including +template helpers to select parallelism, are defined in <code class="code">namespace +std::__parallel</code>. For instance, <code class="function">std::transform</code> from <code class="filename">algorithm</code> has a parallel counterpart in +<code class="function">std::__parallel::transform</code> from <code class="filename">parallel/algorithm</code>. In addition, these parallel +implementations are injected into <code class="code">namespace +__gnu_parallel</code> with using declarations. +</p><p> Support and general infrastructure is in <code class="code">namespace +__gnu_parallel</code>. +</p><p> More information, and an organized index of types and functions +related to the parallel mode on a per-namespace basis, can be found in +the generated source documentation. +</p></div></div><div class="navfooter"><hr/><table width="100%" summary="Navigation footer"><tr><td align="left"><a accesskey="p" href="bk01pt03ch18s03.html">Prev</a> </td><td align="center"><a accesskey="u" href="parallel_mode.html">Up</a></td><td align="right"> <a accesskey="n" href="bk01pt03ch18s05.html">Next</a></td></tr><tr><td align="left" valign="top">Using </td><td align="center"><a accesskey="h" href="../spine.html">Home</a></td><td align="right" valign="top"> Testing</td></tr></table></div></body></html> |