1<chapter xmlns="http://docbook.org/ns/docbook" version="5.0" 
2	 xml:id="manual.ext.parallel_mode" xreflabel="Parallel Mode">
3<?dbhtml filename="parallel_mode.html"?>
4
5<info><title>Parallel Mode</title>
6  <keywordset>
7    <keyword>C++</keyword>
8    <keyword>library</keyword>
9    <keyword>parallel</keyword>
10  </keywordset>
11</info>
12
13
14
15<para> The libstdc++ parallel mode is an experimental parallel
16implementation of many algorithms of the C++ Standard Library.
17</para>
18
19<para>
20Several of the standard algorithms, for instance
21<function>std::sort</function>, are made parallel using OpenMP
22annotations. These parallel mode constructs can be invoked by
23explicit source declaration or by compiling existing sources with a
24specific compiler flag.
25</para>
26
27<note>
28  <para>
29    The parallel mode has not been kept up to date with recent C++ standards
30    and so it only conforms to the C++03 requirements.
31    That means that move-only predicates may not work with parallel mode
32    algorithms, and for C++20 most of the algorithms cannot be used in
33    <code>constexpr</code> functions.
34  </para>
35  <para>
36    For C++17 and above there are new overloads of the standard algorithms
37    which take an execution policy argument. You should consider using those
38    instead of the non-standard parallel mode extensions.
39  </para>
40</note>
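
<para>
As a minimal sketch of the standard alternative mentioned in the note
above, the following uses the C++17 execution policy overload
of <function>std::sort</function>. The helper
name <code>sort_with_policy</code> is illustrative only. The example
assumes compilation with <literal>-std=c++17</literal> or later; depending
on how the library was built, the parallel policies may also require
linking against an external threading library such as Intel TBB
(<literal>-ltbb</literal>), which is an assumption about the build
environment rather than something this chapter specifies.
</para>

<programlisting>
#include &lt;algorithm&gt;
#include &lt;execution&gt;
#include &lt;vector&gt;

// Illustrative helper: sort in place using the standard parallel policy.
void sort_with_policy(std::vector&lt;int&gt;&amp; v)
{
  // Standard C++17 overload: the first argument is an execution policy.
  std::sort(std::execution::par, v.begin(), v.end());
}

int main()
{
  std::vector&lt;int&gt; v = {5, 3, 9, 1, 7};
  sort_with_policy(v);
  return std::is_sorted(v.begin(), v.end()) ? 0 : 1;
}
</programlisting>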
41
42<section xml:id="manual.ext.parallel_mode.intro" xreflabel="Intro"><info><title>Intro</title></info>
43  
44
<para>The following library components from the header
<filename class="headerfile">numeric</filename> are included in the parallel mode:</para>
47<itemizedlist>
48  <listitem><para><function>std::accumulate</function></para></listitem>
49  <listitem><para><function>std::adjacent_difference</function></para></listitem>
50  <listitem><para><function>std::inner_product</function></para></listitem>
51  <listitem><para><function>std::partial_sum</function></para></listitem>
52</itemizedlist>
53
<para>The following library components from the header
<filename class="headerfile">algorithm</filename> are included in the parallel mode:</para>
56<itemizedlist>
57  <listitem><para><function>std::adjacent_find</function></para></listitem>
58  <listitem><para><function>std::count</function></para></listitem>
59  <listitem><para><function>std::count_if</function></para></listitem>
60  <listitem><para><function>std::equal</function></para></listitem>
61  <listitem><para><function>std::find</function></para></listitem>
62  <listitem><para><function>std::find_if</function></para></listitem>
63  <listitem><para><function>std::find_first_of</function></para></listitem>
64  <listitem><para><function>std::for_each</function></para></listitem>
65  <listitem><para><function>std::generate</function></para></listitem>
66  <listitem><para><function>std::generate_n</function></para></listitem>
67  <listitem><para><function>std::lexicographical_compare</function></para></listitem>
68  <listitem><para><function>std::mismatch</function></para></listitem>
69  <listitem><para><function>std::search</function></para></listitem>
70  <listitem><para><function>std::search_n</function></para></listitem>
71  <listitem><para><function>std::transform</function></para></listitem>
72  <listitem><para><function>std::replace</function></para></listitem>
73  <listitem><para><function>std::replace_if</function></para></listitem>
74  <listitem><para><function>std::max_element</function></para></listitem>
75  <listitem><para><function>std::merge</function></para></listitem>
76  <listitem><para><function>std::min_element</function></para></listitem>
77  <listitem><para><function>std::nth_element</function></para></listitem>
78  <listitem><para><function>std::partial_sort</function></para></listitem>
79  <listitem><para><function>std::partition</function></para></listitem>
80  <listitem><para><function>std::random_shuffle</function></para></listitem>
81  <listitem><para><function>std::set_union</function></para></listitem>
82  <listitem><para><function>std::set_intersection</function></para></listitem>
83  <listitem><para><function>std::set_symmetric_difference</function></para></listitem>
84  <listitem><para><function>std::set_difference</function></para></listitem>
85  <listitem><para><function>std::sort</function></para></listitem>
86  <listitem><para><function>std::stable_sort</function></para></listitem>
87  <listitem><para><function>std::unique_copy</function></para></listitem>
88</itemizedlist>
89
90</section>
91
92<section xml:id="manual.ext.parallel_mode.semantics" xreflabel="Semantics"><info><title>Semantics</title></info>
93<?dbhtml filename="parallel_mode_semantics.html"?>
94  
95
<para> The parallel mode STL algorithms are currently not exception-safe,
i.e. user-defined functors must not throw exceptions.
Also, the order of execution is not guaranteed for some functions.
Therefore, user-defined functors should not have side effects that
could conflict when the functor is invoked concurrently.
</para>
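
<para>
To illustrate the rule about side effects, here is a small sketch of a
functor that is safe to run from several threads. The
names <code>is_even</code> and <code>count_even</code> are illustrative
only. The predicate reads nothing but its argument and does not throw,
so the library is free to evaluate it in any order.
</para>

<programlisting>
#include &lt;algorithm&gt;
#include &lt;cstddef&gt;
#include &lt;vector&gt;

// Stateless predicate: reads only its argument and does not throw.
struct is_even
{
  bool operator()(int x) const { return x % 2 == 0; }
};

std::ptrdiff_t count_even(const std::vector&lt;int&gt;&amp; v)
{
  // The algorithm itself combines the partial counts. By contrast, a
  // functor that incremented a shared counter from std::for_each would
  // race when invoked by several threads at once.
  return std::count_if(v.begin(), v.end(), is_even());
}

int main()
{
  std::vector&lt;int&gt; v;
  for (int i = 0; i &lt; 10; ++i)
    v.push_back(i);

  // The values 0..9 contain five even numbers.
  return count_even(v) == 5 ? 0 : 1;
}
</programlisting>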
101
<para> Since the current GCC OpenMP implementation does not support
OpenMP parallel regions in concurrent threads,
it is not possible to call parallel STL algorithms in
concurrent threads, either.
It might work with other compilers, though.</para>
107
108</section>
109
110<section xml:id="manual.ext.parallel_mode.using" xreflabel="Using"><info><title>Using</title></info>
111<?dbhtml filename="parallel_mode_using.html"?>
112  
113
114<section xml:id="parallel_mode.using.prereq_flags"><info><title>Prerequisite Compiler Flags</title></info>
115  
116
117<para>
118  Any use of parallel functionality requires additional compiler
119  and runtime support, in particular support for OpenMP. Adding this support is
120  not difficult: just compile your application with the compiler
121  flag <literal>-fopenmp</literal>. This will link
122  in <code>libgomp</code>, the
123  <link xmlns:xlink="http://www.w3.org/1999/xlink"
124    xlink:href="http://gcc.gnu.org/onlinedocs/libgomp/">GNU Offloading and
125    Multi Processing Runtime Library</link>,
126  whose presence is mandatory.
127</para>
128
129<para>
130In addition, hardware that supports atomic operations and a compiler
131  capable of producing atomic operations is mandatory: GCC defaults to no
132  support for atomic operations on some common hardware
133  architectures. Activating atomic operations may require explicit
134  compiler flags on some targets (like sparc and x86), such
135  as <literal>-march=i686</literal>,
136  <literal>-march=native</literal> or <literal>-mcpu=v9</literal>. See
137  the GCC manual for more information.
138</para>
139
140</section>
141
142<section xml:id="parallel_mode.using.parallel_mode"><info><title>Using Parallel Mode</title></info>
143  
144
145<para>
146  To use the libstdc++ parallel mode, compile your application with
147  the prerequisite flags as detailed above, and in addition
148  add <constant>-D_GLIBCXX_PARALLEL</constant>. This will convert all
149  use of the standard (sequential) algorithms to the appropriate parallel
150  equivalents. Please note that this doesn't necessarily mean that
151  everything will end up being executed in a parallel manner, but
152  rather that the heuristics and settings coded into the parallel
153  versions will be used to determine if all, some, or no algorithms
154  will be executed using parallel variants.
155</para>
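
<para>
As a sketch of what this means in practice, the program below contains
only standard calls. Built normally, it sorts sequentially; built with
the flags described above, the same call is routed to the parallel mode
implementation, which then decides at run time whether to parallelize.
The function name <code>make_sorted</code> and the file
name <filename>example.cc</filename> are arbitrary.
</para>

<programlisting>
#include &lt;algorithm&gt;
#include &lt;cstdio&gt;
#include &lt;vector&gt;

// Fill a vector with a descending sequence n, n-1, ..., 1 and sort it.
std::vector&lt;int&gt; make_sorted(int n)
{
  std::vector&lt;int&gt; v;
  for (int i = n; i &gt; 0; --i)
    v.push_back(i);

  // Plain standard call; no parallel mode names appear in the source.
  std::sort(v.begin(), v.end());
  return v;
}

int main()
{
  std::vector&lt;int&gt; v = make_sorted(1000000);

#ifdef _GLIBCXX_PARALLEL
  std::puts("built with parallel mode");
#else
  std::puts("built without parallel mode");
#endif

  return v.front() == 1 ? 0 : 1;
}
</programlisting>

<para>
A possible command line for the parallel build, assuming GCC
as <command>g++</command>:
</para>

<screen>
<userinput>g++ -fopenmp -D_GLIBCXX_PARALLEL example.cc</userinput>
</screen>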
156
<para>Note that the <constant>_GLIBCXX_PARALLEL</constant> define may change the
  sizes and behavior of standard templates, including function templates such as
  <function>std::search</function>. Therefore, code compiled with parallel mode
  can only be linked with code compiled without parallel mode
  if no instantiation of a container is passed between the two
  translation units. Parallel mode functionality has distinct linkage,
  and cannot be confused with normal mode symbols.
</para>
165</section>
166
167<section xml:id="parallel_mode.using.specific"><info><title>Using Specific Parallel Components</title></info>
168  
169
<para>When it is not feasible to recompile your entire application, or
  only specific algorithms need to be parallel-aware, individual
  parallel algorithms can be made available explicitly. These
  parallel algorithms are functionally equivalent to the standard
  drop-in algorithms used in parallel mode, but they are available in
  a separate namespace as GNU extensions and may be used in programs
  compiled in either release mode or parallel mode.
</para>
178
179
180<para>An example of using a parallel version
181of <function>std::sort</function>, but no other parallel algorithms, is:
182</para>
183
<programlisting>
#include &lt;vector&gt;
#include &lt;parallel/algorithm&gt;

int main()
{
  std::vector&lt;int&gt; v(100);

  // ...

  // Explicitly force a call to parallel sort.
  __gnu_parallel::sort(v.begin(), v.end());
  return 0;
}
</programlisting>
199
<para>
Then compile this code with the prerequisite compiler flags
(<literal>-fopenmp</literal> and any necessary architecture-specific
flags for atomic operations).
</para>
205
206<para> The following table provides the names and headers of all the
207  parallel algorithms that can be used in a similar manner:
208</para>
209
210<table frame="all" xml:id="table.parallel_algos">
211<title>Parallel Algorithms</title>
212
213<tgroup cols="4" align="left" colsep="1" rowsep="1">
214<colspec colname="c1"/>
215<colspec colname="c2"/>
216<colspec colname="c3"/>
217<colspec colname="c4"/>
218
219<thead>
220  <row>
221    <entry>Algorithm</entry>
222    <entry>Header</entry>
223    <entry>Parallel algorithm</entry>
224    <entry>Parallel header</entry>
225  </row>
226</thead>
227
228<tbody>
229  <row>
230    <entry><function>std::accumulate</function></entry>
231    <entry><filename class="headerfile">numeric</filename></entry>
232    <entry><function>__gnu_parallel::accumulate</function></entry>
233    <entry><filename class="headerfile">parallel/numeric</filename></entry>
234  </row>
235  <row>
236    <entry><function>std::adjacent_difference</function></entry>
237    <entry><filename class="headerfile">numeric</filename></entry>
238    <entry><function>__gnu_parallel::adjacent_difference</function></entry>
239    <entry><filename class="headerfile">parallel/numeric</filename></entry>
240  </row>
241  <row>
242    <entry><function>std::inner_product</function></entry>
243    <entry><filename class="headerfile">numeric</filename></entry>
244    <entry><function>__gnu_parallel::inner_product</function></entry>
245    <entry><filename class="headerfile">parallel/numeric</filename></entry>
246  </row>
247  <row>
248    <entry><function>std::partial_sum</function></entry>
249    <entry><filename class="headerfile">numeric</filename></entry>
250    <entry><function>__gnu_parallel::partial_sum</function></entry>
251    <entry><filename class="headerfile">parallel/numeric</filename></entry>
252  </row>
253  <row>
254    <entry><function>std::adjacent_find</function></entry>
255    <entry><filename class="headerfile">algorithm</filename></entry>
256    <entry><function>__gnu_parallel::adjacent_find</function></entry>
257    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
258  </row>
259
260  <row>
261    <entry><function>std::count</function></entry>
262    <entry><filename class="headerfile">algorithm</filename></entry>
263    <entry><function>__gnu_parallel::count</function></entry>
264    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
265  </row>
266
267  <row>
268    <entry><function>std::count_if</function></entry>
269    <entry><filename class="headerfile">algorithm</filename></entry>
270    <entry><function>__gnu_parallel::count_if</function></entry>
271    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
272  </row>
273
274  <row>
275    <entry><function>std::equal</function></entry>
276    <entry><filename class="headerfile">algorithm</filename></entry>
277    <entry><function>__gnu_parallel::equal</function></entry>
278    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
279  </row>
280
281  <row>
282    <entry><function>std::find</function></entry>
283    <entry><filename class="headerfile">algorithm</filename></entry>
284    <entry><function>__gnu_parallel::find</function></entry>
285    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
286  </row>
287
288  <row>
289    <entry><function>std::find_if</function></entry>
290    <entry><filename class="headerfile">algorithm</filename></entry>
291    <entry><function>__gnu_parallel::find_if</function></entry>
292    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
293  </row>
294
295  <row>
296    <entry><function>std::find_first_of</function></entry>
297    <entry><filename class="headerfile">algorithm</filename></entry>
298    <entry><function>__gnu_parallel::find_first_of</function></entry>
299    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
300  </row>
301
302  <row>
303    <entry><function>std::for_each</function></entry>
304    <entry><filename class="headerfile">algorithm</filename></entry>
305    <entry><function>__gnu_parallel::for_each</function></entry>
306    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
307  </row>
308
309  <row>
310    <entry><function>std::generate</function></entry>
311    <entry><filename class="headerfile">algorithm</filename></entry>
312    <entry><function>__gnu_parallel::generate</function></entry>
313    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
314  </row>
315
316  <row>
317    <entry><function>std::generate_n</function></entry>
318    <entry><filename class="headerfile">algorithm</filename></entry>
319    <entry><function>__gnu_parallel::generate_n</function></entry>
320    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
321  </row>
322
323  <row>
324    <entry><function>std::lexicographical_compare</function></entry>
325    <entry><filename class="headerfile">algorithm</filename></entry>
326    <entry><function>__gnu_parallel::lexicographical_compare</function></entry>
327    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
328  </row>
329
330  <row>
331    <entry><function>std::mismatch</function></entry>
332    <entry><filename class="headerfile">algorithm</filename></entry>
333    <entry><function>__gnu_parallel::mismatch</function></entry>
334    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
335  </row>
336
337  <row>
338    <entry><function>std::search</function></entry>
339    <entry><filename class="headerfile">algorithm</filename></entry>
340    <entry><function>__gnu_parallel::search</function></entry>
341    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
342  </row>
343
344  <row>
345    <entry><function>std::search_n</function></entry>
346    <entry><filename class="headerfile">algorithm</filename></entry>
347    <entry><function>__gnu_parallel::search_n</function></entry>
348    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
349  </row>
350
351  <row>
352    <entry><function>std::transform</function></entry>
353    <entry><filename class="headerfile">algorithm</filename></entry>
354    <entry><function>__gnu_parallel::transform</function></entry>
355    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
356  </row>
357
358  <row>
359    <entry><function>std::replace</function></entry>
360    <entry><filename class="headerfile">algorithm</filename></entry>
361    <entry><function>__gnu_parallel::replace</function></entry>
362    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
363  </row>
364
365  <row>
366    <entry><function>std::replace_if</function></entry>
367    <entry><filename class="headerfile">algorithm</filename></entry>
368    <entry><function>__gnu_parallel::replace_if</function></entry>
369    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
370  </row>
371
372  <row>
373    <entry><function>std::max_element</function></entry>
374    <entry><filename class="headerfile">algorithm</filename></entry>
375    <entry><function>__gnu_parallel::max_element</function></entry>
376    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
377  </row>
378
379  <row>
380    <entry><function>std::merge</function></entry>
381    <entry><filename class="headerfile">algorithm</filename></entry>
382    <entry><function>__gnu_parallel::merge</function></entry>
383    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
384  </row>
385
386  <row>
387    <entry><function>std::min_element</function></entry>
388    <entry><filename class="headerfile">algorithm</filename></entry>
389    <entry><function>__gnu_parallel::min_element</function></entry>
390    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
391  </row>
392
393  <row>
394    <entry><function>std::nth_element</function></entry>
395    <entry><filename class="headerfile">algorithm</filename></entry>
396    <entry><function>__gnu_parallel::nth_element</function></entry>
397    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
398  </row>
399
400  <row>
401    <entry><function>std::partial_sort</function></entry>
402    <entry><filename class="headerfile">algorithm</filename></entry>
403    <entry><function>__gnu_parallel::partial_sort</function></entry>
404    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
405  </row>
406
407  <row>
408    <entry><function>std::partition</function></entry>
409    <entry><filename class="headerfile">algorithm</filename></entry>
410    <entry><function>__gnu_parallel::partition</function></entry>
411    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
412  </row>
413
414  <row>
415    <entry><function>std::random_shuffle</function></entry>
416    <entry><filename class="headerfile">algorithm</filename></entry>
417    <entry><function>__gnu_parallel::random_shuffle</function></entry>
418    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
419  </row>
420
421  <row>
422    <entry><function>std::set_union</function></entry>
423    <entry><filename class="headerfile">algorithm</filename></entry>
424    <entry><function>__gnu_parallel::set_union</function></entry>
425    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
426  </row>
427
428  <row>
429    <entry><function>std::set_intersection</function></entry>
430    <entry><filename class="headerfile">algorithm</filename></entry>
431    <entry><function>__gnu_parallel::set_intersection</function></entry>
432    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
433  </row>
434
435  <row>
436    <entry><function>std::set_symmetric_difference</function></entry>
437    <entry><filename class="headerfile">algorithm</filename></entry>
438    <entry><function>__gnu_parallel::set_symmetric_difference</function></entry>
439    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
440  </row>
441
442  <row>
443    <entry><function>std::set_difference</function></entry>
444    <entry><filename class="headerfile">algorithm</filename></entry>
445    <entry><function>__gnu_parallel::set_difference</function></entry>
446    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
447  </row>
448
449  <row>
450    <entry><function>std::sort</function></entry>
451    <entry><filename class="headerfile">algorithm</filename></entry>
452    <entry><function>__gnu_parallel::sort</function></entry>
453    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
454  </row>
455
456  <row>
457    <entry><function>std::stable_sort</function></entry>
458    <entry><filename class="headerfile">algorithm</filename></entry>
459    <entry><function>__gnu_parallel::stable_sort</function></entry>
460    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
461  </row>
462
463  <row>
464    <entry><function>std::unique_copy</function></entry>
465    <entry><filename class="headerfile">algorithm</filename></entry>
466    <entry><function>__gnu_parallel::unique_copy</function></entry>
467    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
468  </row>
469</tbody>
470</tgroup>
471</table>
472
473</section>
474
475</section>
476
477<section xml:id="manual.ext.parallel_mode.design" xreflabel="Design"><info><title>Design</title></info>
478<?dbhtml filename="parallel_mode_design.html"?>
479  
482<section xml:id="parallel_mode.design.intro" xreflabel="Intro"><info><title>Interface Basics</title></info>
483  
484
485<para>
486All parallel algorithms are intended to have signatures that are
487equivalent to the ISO C++ algorithms replaced. For instance, the
488<function>std::adjacent_find</function> function is declared as:
489</para>
<programlisting>
namespace std
{
  template&lt;typename _FIter&gt;
    _FIter
    adjacent_find(_FIter, _FIter);
}
</programlisting>
498
<para>
This means that there should be an equivalent declaration for the
parallel version. Indeed, this is the case:
</para>
503
<programlisting>
namespace std
{
  namespace __parallel
  {
    template&lt;typename _FIter&gt;
      _FIter
      adjacent_find(_FIter, _FIter);

    ...
  }
}
</programlisting>
517
<para>But why the ellipsis?
</para>

<para> The ellipsis in the example above represents additional overloads
required for the parallel version of the function. These additional
overloads are used to dispatch calls from the ISO C++ function
signature to the appropriate parallel function (or sequential
function, if no parallel functions are deemed worthy), based on either
compile-time or run-time conditions.
</para>
528
<para> The available signature options are specific to the different
algorithms/algorithm classes.</para>

<para> The general set of overloads for the parallel algorithms looks like this:
</para>
534<itemizedlist>
535   <listitem><para>ISO C++ signature</para></listitem>
536   <listitem><para>ISO C++ signature + sequential_tag argument</para></listitem>
537   <listitem><para>ISO C++ signature + algorithm-specific tag type
538    (several signatures)</para></listitem>
539</itemizedlist>
540
<para> Please note that the implementation may use additional functions
(designated with the <code>_switch</code> suffix) to dispatch from the
ISO C++ signature to the correct parallel version. Also, some of the
algorithms do not support run-time conditions, so for them the last
overload is missing.
</para>
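
<para>
The overload scheme described above can be sketched with ordinary tag
dispatch. The code below is not the libstdc++ implementation: the
namespace <code>sketch</code>, its tag types, and the size threshold are
all hypothetical, and the "parallel" overload simply forwards to the
sequential algorithm. It only shows how a plain ISO signature can select
between tagged overloads.
</para>

<programlisting>
#include &lt;algorithm&gt;
#include &lt;iterator&gt;
#include &lt;vector&gt;

namespace sketch  // hypothetical namespace, not part of libstdc++
{
  struct sequential_tag { };
  struct parallel_tag { };

  // ISO signature plus sequential_tag: always run the sequential code.
  template&lt;typename FIter&gt;
    FIter
    adjacent_find(FIter first, FIter last, sequential_tag)
    { return std::adjacent_find(first, last); }

  // ISO signature plus a parallel tag. A real implementation would split
  // the range among threads; this stand-in just forwards.
  template&lt;typename FIter&gt;
    FIter
    adjacent_find(FIter first, FIter last, parallel_tag)
    { return std::adjacent_find(first, last); }

  // Plain ISO signature: pick an overload. The threshold of 1000
  // elements is made up for illustration.
  template&lt;typename FIter&gt;
    FIter
    adjacent_find(FIter first, FIter last)
    {
      if (std::distance(first, last) &lt; 1000)
        return sketch::adjacent_find(first, last, sequential_tag());
      return sketch::adjacent_find(first, last, parallel_tag());
    }
}

int main()
{
  std::vector&lt;int&gt; v;
  v.push_back(1);
  v.push_back(2);
  v.push_back(2);
  v.push_back(3);

  // The first pair of equal neighbours starts at index 1.
  std::vector&lt;int&gt;::iterator it = sketch::adjacent_find(v.begin(), v.end());
  return (it - v.begin()) == 1 ? 0 : 1;
}
</programlisting>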
547
548
549</section>
550
551<section xml:id="parallel_mode.design.tuning" xreflabel="Tuning"><info><title>Configuration and Tuning</title></info>
552  
553
554
555<section xml:id="parallel_mode.design.tuning.omp" xreflabel="OpenMP Environment"><info><title>Setting up the OpenMP Environment</title></info>
556  
557
558<para>
559Several aspects of the overall runtime environment can be manipulated
560by standard OpenMP function calls.
561</para>
562
563<para>
564To specify the number of threads to be used for the algorithms globally,
565use the function <function>omp_set_num_threads</function>. An example:
566</para>
567
<programlisting>
#include &lt;stdlib.h&gt;
#include &lt;omp.h&gt;

int main()
{
  // Explicitly set number of threads.
  const int threads_wanted = 20;
  // Disable dynamic adjustment so the requested count is not reduced.
  omp_set_dynamic(false);
  omp_set_num_threads(threads_wanted);

  // Call parallel mode algorithms.

  return 0;
}
</programlisting>
584
<para>
 Some algorithms allow the number of threads to be set for a particular call,
 by augmenting the algorithm variant.
 See the next section for further information.
</para>
590
<para>
Other parts of the runtime environment that can be manipulated include
nested parallelism (<function>omp_set_nested</function>), schedule kind
(<function>omp_set_schedule</function>), and others. See the OpenMP
documentation for more information.
</para>
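
<para>
A brief sketch of such calls follows. It uses only standard OpenMP
runtime routines and must be compiled with <literal>-fopenmp</literal>.
The helper name <code>configure_openmp</code> and the chunk size are
illustrative. Note that <function>omp_set_nested</function> is
deprecated as of OpenMP 5.0, so the sketch
uses <function>omp_set_max_active_levels</function> instead. Whether a
given parallel mode algorithm is affected by these settings is not
specified here.
</para>

<programlisting>
#include &lt;omp.h&gt;
#include &lt;cstdio&gt;

// Illustrative helper: adjust a few global OpenMP runtime settings.
void configure_openmp()
{
  // For loops declared with schedule(runtime): hand out work
  // dynamically, 64 iterations at a time (arbitrary chunk size).
  omp_set_schedule(omp_sched_dynamic, 64);

  // Allow at most one level of active parallel regions, i.e. no nesting.
  omp_set_max_active_levels(1);
}

int main()
{
  configure_openmp();
  std::printf("max threads: %d\n", omp_get_max_threads());
  return 0;
}
</programlisting>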
597
598</section>
599
600<section xml:id="parallel_mode.design.tuning.compile" xreflabel="Compile Switches"><info><title>Compile Time Switches</title></info>
601  
602
603<para>
604To force an algorithm to execute sequentially, even though parallelism
605is switched on in general via the macro <constant>_GLIBCXX_PARALLEL</constant>,
606add <classname>__gnu_parallel::sequential_tag()</classname> to the end
607of the algorithm's argument list.
608</para>
609
610<para>
611Like so:
612</para>
613
<programlisting>
std::sort(v.begin(), v.end(), __gnu_parallel::sequential_tag());
</programlisting>
617
618<para>
619Some parallel algorithm variants can be excluded from compilation by
620preprocessor defines. See the doxygen documentation on
621<code>compiletime_settings.h</code> and <code>features.h</code> for details.
622</para>
623
624<para>
625For some algorithms, the desired variant can be chosen at compile-time by
626appending a tag object. The available options are specific to the particular
627algorithm (class).
628</para>
629
<para>
For the "embarrassingly parallel" algorithms, there is only one "tag object
type", the enum <code>_Parallelism</code>.
It takes one of the following values:
<code>__gnu_parallel::parallel_tag</code>,
<code>__gnu_parallel::balanced_tag</code>,
<code>__gnu_parallel::unbalanced_tag</code>,
<code>__gnu_parallel::omp_loop_tag</code>,
<code>__gnu_parallel::omp_loop_static_tag</code>.
This means that the actual parallelization strategy is chosen at run time.
(Choosing the variants at compile time is not yet supported.)
</para>
642
<para>
For the following algorithms in general, we have
<code>__gnu_parallel::parallel_tag</code> and
<code>__gnu_parallel::default_parallel_tag</code>, in addition to
<code>__gnu_parallel::sequential_tag</code>.
<code>__gnu_parallel::default_parallel_tag</code> chooses the default
algorithm at compile time, as does omitting the tag.
<code>__gnu_parallel::parallel_tag</code> postpones the decision to run time
(see next section).
For all tags, the number of threads desired for this call can optionally be
passed to the respective tag's constructor.
</para>
655
<para>
The <code>multiway_merge</code> algorithm comes with two additional choices,
<code>__gnu_parallel::exact_tag</code> and
<code>__gnu_parallel::sampling_tag</code>.
Exact and sampling are the two available splitting strategies.
</para>
662
663<para>
664For the <code>sort</code> and <code>stable_sort</code> algorithms, there are
665several additional choices, namely
666<code>__gnu_parallel::multiway_mergesort_tag</code>,
667<code>__gnu_parallel::multiway_mergesort_exact_tag</code>,
668<code>__gnu_parallel::multiway_mergesort_sampling_tag</code>,
669<code>__gnu_parallel::quicksort_tag</code>, and
670<code>__gnu_parallel::balanced_quicksort_tag</code>.
671Multiway mergesort comes with the two splitting strategies for multi-way
672merging. The quicksort options cannot be used for <code>stable_sort</code>.
673</para>
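
<para>
The tags described in this section might be used as in the following
sketch, which sorts three copies of the same data with different tags
and checks that the results agree. The helper
name <code>variants_agree</code> and the thread count of four are
illustrative. The sketch assumes that the tag overloads are reachable
through namespace <code>__gnu_parallel</code> and that the code is
compiled with <literal>-fopenmp</literal>.
</para>

<programlisting>
#include &lt;vector&gt;
#include &lt;parallel/algorithm&gt;

// Sort three copies of n, n-1, ..., 1, each with a different tag, and
// report whether all three results are identical.
bool variants_agree(int n)
{
  std::vector&lt;int&gt; a(n);
  for (int i = 0; i &lt; n; ++i)
    a[i] = n - i;
  std::vector&lt;int&gt; b(a), c(a);

  // Force the sequential implementation for this call only.
  __gnu_parallel::sort(a.begin(), a.end(),
                       __gnu_parallel::sequential_tag());

  // Default parallel algorithm, requesting four threads for this call.
  __gnu_parallel::sort(b.begin(), b.end(),
                       __gnu_parallel::default_parallel_tag(4));

  // Select multiway mergesort explicitly.
  __gnu_parallel::sort(c.begin(), c.end(),
                       __gnu_parallel::multiway_mergesort_tag());

  if (a != b)
    return false;
  return b == c;
}

int main()
{
  return variants_agree(100000) ? 0 : 1;
}
</programlisting>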
674
675</section>
676
677<section xml:id="parallel_mode.design.tuning.settings" xreflabel="_Settings"><info><title>Run Time Settings and Defaults</title></info>
678  
679
680<para>
681The default parallelization strategy, the choice of specific algorithm
682strategy, the minimum threshold limits for individual parallel
683algorithms, and aspects of the underlying hardware can be specified as
684desired via manipulation
685of <classname>__gnu_parallel::_Settings</classname> member data.
686</para>
687
688<para>
689First off, the choice of parallelization strategy: serial, parallel,
690or heuristically deduced. This corresponds
691to <code>__gnu_parallel::_Settings::algorithm_strategy</code> and is a
692value of enum <type>__gnu_parallel::_AlgorithmStrategy</type>
693type. Choices
694include: <type>heuristic</type>, <type>force_sequential</type>,
695and <type>force_parallel</type>. The default is <type>heuristic</type>.
696</para>
697
698
699<para>
700Next, the sub-choices for algorithm variant, if not fixed at compile-time.
701Specific algorithms like <function>find</function> or <function>sort</function>
702can be implemented in multiple ways: when this is the case,
703a <classname>__gnu_parallel::_Settings</classname> member exists to
704pick the default strategy. For
705example, <code>__gnu_parallel::_Settings::sort_algorithm</code> can
706have any values of
707enum <type>__gnu_parallel::_SortAlgorithm</type>: <type>MWMS</type>, <type>QS</type>,
708or <type>QS_BALANCED</type>.
709</para>
710
<para>
Likewise for setting the minimal threshold for algorithm
parallelization.  Parallelism always incurs some overhead. Thus, it is
not helpful to parallelize operations on very small sets of
data. Because of this, measures are taken to avoid parallelizing below
a certain, pre-determined threshold. For each algorithm, a minimum
problem size is encoded as a variable in the
active <classname>__gnu_parallel::_Settings</classname> object.  This
threshold variable uses the naming scheme
<code>__gnu_parallel::_Settings::[algorithm]_minimal_n</code>.  So,
for <function>fill</function>, the threshold variable
is <code>__gnu_parallel::_Settings::fill_minimal_n</code>.
</para>
724
725<para>
726Finally, hardware details like L1/L2 cache size can be hardwired
727via <code>__gnu_parallel::_Settings::L1_cache_size</code> and friends.
728</para>
729
732
<para>
All these configuration variables can be changed by the user, if
desired.
There exists one global instance of the class <classname>_Settings</classname>,
i.e. it is a singleton. It can be read and written by calling
<code>__gnu_parallel::_Settings::get</code> and
<code>__gnu_parallel::_Settings::set</code>, respectively.
Please note that the first call returns a const object, so direct manipulation
is forbidden.
See <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://gcc.gnu.org/onlinedocs/libstdc++/latest-doxygen/index.html">
  <filename class="headerfile">&lt;parallel/settings.h&gt;</filename></link>
for complete details.
</para>
746
747<para>
748A small example of tuning the default:
749</para>
750
<programlisting>
#include &lt;parallel/algorithm&gt;
#include &lt;parallel/settings.h&gt;

int main()
{
  // Start from a default-constructed settings object and change one field.
  __gnu_parallel::_Settings s;
  s.algorithm_strategy = __gnu_parallel::force_parallel;
  // Install it as the global settings instance.
  __gnu_parallel::_Settings::set(s);

  // Do work... all algorithms will be parallelized, always.

  return 0;
}
</programlisting>
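
<para>
A further sketch, under the same assumptions, adjusts a threshold and
the default sort variant while keeping the other current settings. It
copies the object returned by <code>_Settings::get</code>, since that
call returns a const object, and installs the modified copy
with <code>_Settings::set</code>. The helper
name <code>configure_sort_defaults</code> and the threshold of 50000
are illustrative, and the code must be compiled
with <literal>-fopenmp</literal>.
</para>

<programlisting>
#include &lt;vector&gt;
#include &lt;parallel/algorithm&gt;
#include &lt;parallel/settings.h&gt;

// Illustrative helper: change two sort-related defaults globally.
void configure_sort_defaults()
{
  // get() returns a const reference, so modify a copy.
  __gnu_parallel::_Settings s = __gnu_parallel::_Settings::get();

  // Parallelize sort only for inputs of at least 50000 elements
  // (an arbitrary value chosen for illustration).
  s.sort_minimal_n = 50000;

  // Use balanced quicksort when no tag is given at the call site.
  s.sort_algorithm = __gnu_parallel::QS_BALANCED;

  __gnu_parallel::_Settings::set(s);
}

int main()
{
  configure_sort_defaults();

  std::vector&lt;int&gt; v(100000);
  for (int i = 0; i &lt; 100000; ++i)
    v[i] = 100000 - i;

  __gnu_parallel::sort(v.begin(), v.end());
  return v.front() == 1 ? 0 : 1;
}
</programlisting>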
766
767</section>
768
769</section>
770
771<section xml:id="parallel_mode.design.impl" xreflabel="Impl"><info><title>Implementation Namespaces</title></info>
772  
773
<para> One namespace contains versions of code that are always
explicitly sequential:
<code>__gnu_serial</code>.
</para>
778
779<para> Two namespaces contain the parallel mode:
780<code>std::__parallel</code> and <code>__gnu_parallel</code>.
781</para>
782
783<para> Parallel implementations of standard components, including
784template helpers to select parallelism, are defined in <code>namespace
785std::__parallel</code>. For instance, <function>std::transform</function> from <filename class="headerfile">algorithm</filename> has a parallel counterpart in
786<function>std::__parallel::transform</function> from <filename class="headerfile">parallel/algorithm</filename>. In addition, these parallel
787implementations are injected into <code>namespace
788__gnu_parallel</code> with using declarations.
789</para>
790
<para> Support and general infrastructure are in <code>namespace
__gnu_parallel</code>.
</para>
794
795<para> More information, and an organized index of types and functions
796related to the parallel mode on a per-namespace basis, can be found in
797the generated source documentation.
798</para>
799
800</section>
801
802</section>
803
804<section xml:id="manual.ext.parallel_mode.test" xreflabel="Testing"><info><title>Testing</title></info>
805<?dbhtml filename="parallel_mode_test.html"?>
806  
807
  <para>
    Both the normal conformance and regression tests and the
    supplemental performance tests can be run with the parallel mode
    active.
  </para>
812
813  <para>
814    To run the conformance and regression tests with the parallel mode
815    active,
816  </para>
817
818  <screen>
819  <userinput>make check-parallel</userinput>
820  </screen>
821
822  <para>
823    The log and summary files for conformance testing are in the
824    <filename class="directory">testsuite/parallel</filename> directory.
825  </para>
826
827  <para>
828    To run the performance tests with the parallel mode active,
829  </para>
830
831  <screen>
832  <userinput>make check-performance-parallel</userinput>
833  </screen>
834
  <para>
    The results of performance testing are in the
    <filename class="directory">testsuite</filename> directory, in the file
    <filename>libstdc++_performance.sum</filename>. In addition, the
    policy-based containers have their own visualizations, which have
    more software dependencies than the usual bare-bones text
    file, and can be generated by using the <code>make
    doc-performance</code> rule in the testsuite's Makefile.
</para>
844</section>
845
846<bibliography xml:id="parallel_mode.biblio"><info><title>Bibliography</title></info>
847
848
849  <biblioentry>
850    <citetitle>
851      Parallelization of Bulk Operations for STL Dictionaries
852    </citetitle>
853
854    <author><personname><firstname>Johannes</firstname><surname>Singler</surname></personname></author>
855    <author><personname><firstname>Leonor</firstname><surname>Frias</surname></personname></author>
856
857    <copyright>
858      <year>2007</year>
859      <holder/>
860    </copyright>
861
862    <publisher>
863      <publishername>
864	Workshop on Highly Parallel Processing on a Chip (HPPC) 2007. (LNCS)
865      </publishername>
866    </publisher>
867  </biblioentry>
868
869  <biblioentry>
870    <citetitle>
871      The Multi-Core Standard Template Library
872    </citetitle>
873
874    <author><personname><firstname>Johannes</firstname><surname>Singler</surname></personname></author>
875    <author><personname><firstname>Peter</firstname><surname>Sanders</surname></personname></author>
876    <author><personname><firstname>Felix</firstname><surname>Putze</surname></personname></author>
877
878    <copyright>
879      <year>2007</year>
880      <holder/>
881    </copyright>
882
883    <publisher>
884      <publishername>
885	 Euro-Par 2007: Parallel Processing. (LNCS 4641)
886      </publishername>
887    </publisher>
888  </biblioentry>
889
890</bibliography>
891
892</chapter>
893