Design

Design
Prev	Chapter 18. Parallel Mode	Next

Interface Basics

+All parallel algorithms are intended to have signatures that are +equivalent to the ISO C++ algorithms replaced. For instance, the +std::adjacent_find function is declared as: +

+namespace std
+{
+  template<typename _FIter>
+    _FIter
+    adjacent_find(_FIter, _FIter);
+}
+

+Which means that there should be something equivalent for the parallel +version. Indeed, this is the case: +

+namespace std
+{
+  namespace __parallel
+  {
+    template<typename _FIter>
+      _FIter
+      adjacent_find(_FIter, _FIter);
+
+    ...
+  }
+}
+

But.... why the ellipses? +

The ellipses in the example above represent additional overloads +required for the parallel version of the function. These additional +overloads are used to dispatch calls from the ISO C++ function +signature to the appropriate parallel function (or sequential +function, if no parallel functions are deemed worthy), based on either +compile-time or run-time conditions. +

The available signature options are specific for the different +algorithms/algorithm classes.

The general view of overloads for the parallel algorithms look like this: +

ISO C++ signature
ISO C++ signature + sequential_tag argument
ISO C++ signature + algorithm-specific tag type + (several signatures)

Please note that the implementation may use additional functions +(designated with the _switch suffix) to dispatch from the +ISO C++ signature to the correct parallel version. Also, some of the +algorithms do not have support for run-time conditions, so the last +overload is therefore missing. +

Configuration and Tuning

Setting up the OpenMP Environment

+Several aspects of the overall runtime environment can be manipulated +by standard OpenMP function calls. +

+To specify the number of threads to be used for the algorithms globally, +use the function omp_set_num_threads. An example: +

+#include <stdlib.h>
+#include <omp.h>
+
+int main()
+{
+  // Explicitly set number of threads.
+  const int threads_wanted = 20;
+  omp_set_dynamic(false);
+  omp_set_num_threads(threads_wanted);
+
+  // Call parallel mode algorithms.
+
+  return 0;
+}
+

+ Some algorithms allow the number of threads being set for a particular call, + by augmenting the algorithm variant. + See the next section for further information. +

+Other parts of the runtime environment able to be manipulated include +nested parallelism (omp_set_nested), schedule kind +(omp_set_schedule), and others. See the OpenMP +documentation for more information. +

Compile Time Switches

+To force an algorithm to execute sequentially, even though parallelism +is switched on in general via the macro _GLIBCXX_PARALLEL, +add __gnu_parallel::sequential_tag() to the end +of the algorithm's argument list. +

+Like so: +

+std::sort(v.begin(), v.end(), __gnu_parallel::sequential_tag());
+

+Some parallel algorithm variants can be excluded from compilation by +preprocessor defines. See the doxygen documentation on +compiletime_settings.h and features.h for details. +

+For some algorithms, the desired variant can be chosen at compile-time by +appending a tag object. The available options are specific to the particular +algorithm (class). +

+For the "embarrassingly parallel" algorithms, there is only one "tag object +type", the enum _Parallelism. +It takes one of the following values, +__gnu_parallel::parallel_tag, +__gnu_parallel::balanced_tag, +__gnu_parallel::unbalanced_tag, +__gnu_parallel::omp_loop_tag, +__gnu_parallel::omp_loop_static_tag. +This means that the actual parallelization strategy is chosen at run-time. +(Choosing the variants at compile-time will come soon.) +

+For the following algorithms in general, we have +__gnu_parallel::parallel_tag and +__gnu_parallel::default_parallel_tag, in addition to +__gnu_parallel::sequential_tag. +__gnu_parallel::default_parallel_tag chooses the default +algorithm at compiletime, as does omitting the tag. +__gnu_parallel::parallel_tag postpones the decision to runtime +(see next section). +For all tags, the number of threads desired for this call can optionally be +passed to the respective tag's constructor. +

+The multiway_merge algorithm comes with the additional choices, +__gnu_parallel::exact_tag and +__gnu_parallel::sampling_tag. +Exact and sampling are the two available splitting strategies. +

+For the sort and stable_sort algorithms, there are +several additional choices, namely +__gnu_parallel::multiway_mergesort_tag, +__gnu_parallel::multiway_mergesort_exact_tag, +__gnu_parallel::multiway_mergesort_sampling_tag, +__gnu_parallel::quicksort_tag, and +__gnu_parallel::balanced_quicksort_tag. +Multiway mergesort comes with the two splitting strategies for multi-way +merging. The quicksort options cannot be used for stable_sort. +

Run Time Settings and Defaults

+The default parallelization strategy, the choice of specific algorithm +strategy, the minimum threshold limits for individual parallel +algorithms, and aspects of the underlying hardware can be specified as +desired via manipulation +of __gnu_parallel::_Settings member data. +

+First off, the choice of parallelization strategy: serial, parallel, +or heuristically deduced. This corresponds +to __gnu_parallel::_Settings::algorithm_strategy and is a +value of enum __gnu_parallel::_AlgorithmStrategy +type. Choices +include: heuristic, force_sequential, +and force_parallel. The default is heuristic. +

+Next, the sub-choices for algorithm variant, if not fixed at compile-time. +Specific algorithms like find or sort +can be implemented in multiple ways: when this is the case, +a __gnu_parallel::_Settings member exists to +pick the default strategy. For +example, __gnu_parallel::_Settings::sort_algorithm can +have any values of +enum __gnu_parallel::_SortAlgorithm: MWMS, QS, +or QS_BALANCED. +

+Likewise for setting the minimal threshold for algorithm +parallelization. Parallelism always incurs some overhead. Thus, it is +not helpful to parallelize operations on very small sets of +data. Because of this, measures are taken to avoid parallelizing below +a certain, pre-determined threshold. For each algorithm, a minimum +problem size is encoded as a variable in the +active __gnu_parallel::_Settings object. This +threshold variable follows the following naming scheme: +__gnu_parallel::_Settings::[algorithm]_minimal_n. So, +for fill, the threshold variable +is __gnu_parallel::_Settings::fill_minimal_n, +

+Finally, hardware details like L1/L2 cache size can be hardwired +via __gnu_parallel::_Settings::L1_cache_size and friends. +

+All these configuration variables can be changed by the user, if +desired. +There exists one global instance of the class _Settings, +i. e. it is a singleton. It can be read and written by calling +__gnu_parallel::_Settings::get and +__gnu_parallel::_Settings::set, respectively. +Please note that the first call return a const object, so direct manipulation +is forbidden. +See + settings.h +for complete details. +

+A small example of tuning the default: +

+#include <parallel/algorithm>
+#include <parallel/settings.h>
+
+int main()
+{
+  __gnu_parallel::_Settings s;
+  s.algorithm_strategy = __gnu_parallel::force_parallel;
+  __gnu_parallel::_Settings::set(s);
+
+  // Do work... all algorithms will be parallelized, always.
+
+  return 0;
+}
+

Implementation Namespaces

One namespace contain versions of code that are always +explicitly sequential: +__gnu_serial. +

Two namespaces contain the parallel mode: +std::__parallel and __gnu_parallel. +

Parallel implementations of standard components, including +template helpers to select parallelism, are defined in namespace +std::__parallel. For instance, std::transform from algorithm has a parallel counterpart in +std::__parallel::transform from parallel/algorithm. In addition, these parallel +implementations are injected into namespace +__gnu_parallel with using declarations. +

Support and general infrastructure is in namespace +__gnu_parallel. +

More information, and an organized index of types and functions +related to the parallel mode on a per-namespace basis, can be found in +the generated source documentation. +

Prev	Up	Next
Using	Home	Testing