<!-- parallel_mode.xml revision 1.1.1.3 -->
<chapter xmlns="http://docbook.org/ns/docbook" version="5.0"
	 xml:id="manual.ext.parallel_mode" xreflabel="Parallel Mode">
<?dbhtml filename="parallel_mode.html"?>

<info><title>Parallel Mode</title>
  <keywordset>
    <keyword>C++</keyword>
    <keyword>library</keyword>
    <keyword>parallel</keyword>
  </keywordset>
</info>



<para> The libstdc++ parallel mode is an experimental parallel
implementation of many algorithms of the C++ Standard Library.
</para>

<para>
Several of the standard algorithms, for instance
<function>std::sort</function>, are made parallel using OpenMP
annotations. These parallel mode constructs can be invoked by
explicit source declaration or by compiling existing sources with a
specific compiler flag.
</para>


<section xml:id="manual.ext.parallel_mode.intro" xreflabel="Intro"><info><title>Intro</title></info>


<para>The following library components from the header
<filename class="headerfile">numeric</filename> are included in the parallel mode:</para>
<itemizedlist>
  <listitem><para><function>std::accumulate</function></para></listitem>
  <listitem><para><function>std::adjacent_difference</function></para></listitem>
  <listitem><para><function>std::inner_product</function></para></listitem>
  <listitem><para><function>std::partial_sum</function></para></listitem>
</itemizedlist>

<para>The following library components from the header
<filename class="headerfile">algorithm</filename> are included in the parallel mode:</para>
<itemizedlist>
  <listitem><para><function>std::adjacent_find</function></para></listitem>
  <listitem><para><function>std::count</function></para></listitem>
  <listitem><para><function>std::count_if</function></para></listitem>
  <listitem><para><function>std::equal</function></para></listitem>
  <listitem><para><function>std::find</function></para></listitem>
  <listitem><para><function>std::find_if</function></para></listitem>
  <listitem><para><function>std::find_first_of</function></para></listitem>
  <listitem><para><function>std::for_each</function></para></listitem>
  <listitem><para><function>std::generate</function></para></listitem>
  <listitem><para><function>std::generate_n</function></para></listitem>
  <listitem><para><function>std::lexicographical_compare</function></para></listitem>
  <listitem><para><function>std::mismatch</function></para></listitem>
  <listitem><para><function>std::search</function></para></listitem>
  <listitem><para><function>std::search_n</function></para></listitem>
  <listitem><para><function>std::transform</function></para></listitem>
  <listitem><para><function>std::replace</function></para></listitem>
  <listitem><para><function>std::replace_if</function></para></listitem>
  <listitem><para><function>std::max_element</function></para></listitem>
  <listitem><para><function>std::merge</function></para></listitem>
  <listitem><para><function>std::min_element</function></para></listitem>
  <listitem><para><function>std::nth_element</function></para></listitem>
  <listitem><para><function>std::partial_sort</function></para></listitem>
  <listitem><para><function>std::partition</function></para></listitem>
  <listitem><para><function>std::random_shuffle</function></para></listitem>
  <listitem><para><function>std::set_union</function></para></listitem>
  <listitem><para><function>std::set_intersection</function></para></listitem>
  <listitem><para><function>std::set_symmetric_difference</function></para></listitem>
  <listitem><para><function>std::set_difference</function></para></listitem>
  <listitem><para><function>std::sort</function></para></listitem>
  <listitem><para><function>std::stable_sort</function></para></listitem>
  <listitem><para><function>std::unique_copy</function></para></listitem>
</itemizedlist>

</section>

<section xml:id="manual.ext.parallel_mode.semantics" xreflabel="Semantics"><info><title>Semantics</title></info>
<?dbhtml filename="parallel_mode_semantics.html"?>


<para> The parallel mode STL algorithms are currently not exception-safe,
i.e. user-defined functors must not throw exceptions.
Also, the order of execution is not guaranteed for some functions.
Therefore, user-defined functors should not have any concurrent side effects.
</para>
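
<para>
As a minimal sketch of the restriction above, the following contrasts a
functor that mutates shared state with one that is free of side
effects. The names <code>racy_sum</code>, <code>square</code>, and
<code>sum_of_squares</code> are hypothetical and chosen only for
illustration; the code uses nothing beyond the standard headers.
</para>

<programlisting><![CDATA[
```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Problematic under parallel mode: every call writes to the same
// variable, so concurrent invocations would race on `total`.
struct racy_sum
{
  long& total;
  explicit racy_sum(long& t) : total(t) { }
  void operator()(int x) const { total += x; }
};

// Fine under parallel mode: reads its argument, returns a value,
// touches no shared state, and does not throw.
struct square
{
  long operator()(int x) const { return static_cast<long>(x) * x; }
};

// Sum of squares written with side-effect-free functors only, so the
// result does not depend on the order in which elements are visited.
long sum_of_squares(const std::vector<int>& v)
{
  std::vector<long> squares(v.size());
  std::transform(v.begin(), v.end(), squares.begin(), square());
  return std::accumulate(squares.begin(), squares.end(), 0L);
}
```
]]></programlisting>

<para>
Passing <code>racy_sum</code> to <function>std::for_each</function> is
harmless in sequential code, but it is the kind of functor that should
be avoided once the call may be executed by several threads.
</para>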

<para> Since the current GCC OpenMP implementation does not support
OpenMP parallel regions in concurrent threads,
it is not possible to call parallel STL algorithms in
concurrent threads, either.
It might work with other compilers, though.</para>

</section>

<section xml:id="manual.ext.parallel_mode.using" xreflabel="Using"><info><title>Using</title></info>
<?dbhtml filename="parallel_mode_using.html"?>


<section xml:id="parallel_mode.using.prereq_flags"><info><title>Prerequisite Compiler Flags</title></info>


<para>
  Any use of parallel functionality requires additional compiler
  and runtime support, in particular support for OpenMP. Adding this support is
  not difficult: just compile your application with the compiler
  flag <literal>-fopenmp</literal>. This will link
  in <code>libgomp</code>, the
  <link xmlns:xlink="http://www.w3.org/1999/xlink"
    xlink:href="http://gcc.gnu.org/onlinedocs/libgomp/">GNU Offloading and
    Multi Processing Runtime Library</link>,
  whose presence is mandatory.
</para>
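
<para>
  Whether OpenMP support was actually enabled for a translation unit
  can be checked through the <code>_OPENMP</code> macro, which the
  OpenMP specification requires to be defined when OpenMP is enabled
  (for GCC, by <literal>-fopenmp</literal>). The following is only a
  sketch; the function names are hypothetical.
</para>

<programlisting><![CDATA[
```cpp
// Returns the value of _OPENMP (a date of the form yyyymm identifying
// the supported OpenMP specification), or 0 if this translation unit
// was compiled without OpenMP support.
long openmp_version()
{
#ifdef _OPENMP
  return _OPENMP;
#else
  return 0;
#endif
}

// True if OpenMP was enabled when this translation unit was compiled.
bool openmp_enabled()
{
  return openmp_version() != 0;
}
```
]]></programlisting>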

<para>
In addition, hardware that supports atomic operations and a compiler
  capable of producing atomic operations are mandatory: GCC defaults to no
  support for atomic operations on some common hardware
  architectures. Activating atomic operations may require explicit
  compiler flags on some targets (like sparc and x86), such
  as <literal>-march=i686</literal>,
  <literal>-march=native</literal> or <literal>-mcpu=v9</literal>. See
  the GCC manual for more information.
</para>

</section>

<section xml:id="parallel_mode.using.parallel_mode"><info><title>Using Parallel Mode</title></info>


<para>
  To use the libstdc++ parallel mode, compile your application with
  the prerequisite flags as detailed above, and in addition
  add <constant>-D_GLIBCXX_PARALLEL</constant>. This will convert all
  use of the standard (sequential) algorithms to the appropriate parallel
  equivalents. Please note that this doesn't necessarily mean that
  everything will end up being executed in a parallel manner, but
  rather that the heuristics and settings coded into the parallel
  versions will be used to determine if all, some, or no algorithms
  will be executed using parallel variants.
</para>
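
<para>
  As a minimal sketch, the following ordinary standard C++ code needs
  no source changes to use the parallel mode. The function names and
  the file name in the comment are hypothetical; whether the call
  actually runs in parallel depends on the heuristics mentioned above.
</para>

<programlisting><![CDATA[
```cpp
#include <algorithm>
#include <vector>

// Plain standard code: no parallel mode names appear in the source.
// Compiled as, for example,
//   g++ -fopenmp -D_GLIBCXX_PARALLEL example.cc
// the std::sort call is handed to the parallel mode; compiled without
// those flags it is the ordinary sequential std::sort.
std::vector<int> sorted_copy(std::vector<int> v)
{
  std::sort(v.begin(), v.end());
  return v;
}

// True if this translation unit was built with parallel mode enabled.
bool built_with_parallel_mode()
{
#ifdef _GLIBCXX_PARALLEL
  return true;
#else
  return false;
#endif
}
```
]]></programlisting>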

<para>Note that the <constant>_GLIBCXX_PARALLEL</constant> define may change the
  sizes and behavior of standard class templates, and the behavior of
  function templates such as <function>std::search</function>, and
  therefore one can only link code
  compiled with parallel mode and code compiled without parallel mode
  if no instantiation of a container is passed between the two
  translation units. Parallel mode functionality has distinct linkage,
  and cannot be confused with normal mode symbols.
</para>
</section>

<section xml:id="parallel_mode.using.specific"><info><title>Using Specific Parallel Components</title></info>


<para>When it is not feasible to recompile your entire application, or
  only specific algorithms need to be parallel-aware, individual
  parallel algorithms can be made available explicitly. These
  parallel algorithms are functionally equivalent to the standard
  drop-in algorithms used in parallel mode, but they are available in
  a separate namespace as GNU extensions and may be used in programs
  compiled with either release mode or with parallel mode.
</para>


<para>An example of using a parallel version
of <function>std::sort</function>, but no other parallel algorithms, is:
</para>

<programlisting>
#include &lt;vector&gt;
#include &lt;parallel/algorithm&gt;

int main()
{
  std::vector&lt;int&gt; v(100);

  // ...

  // Explicitly force a call to parallel sort.
  __gnu_parallel::sort(v.begin(), v.end());
  return 0;
}
</programlisting>

<para>
Then compile this code with the prerequisite compiler flags
(<literal>-fopenmp</literal> and any necessary architecture-specific
flags for atomic operations).
</para>

<para> The following table provides the names and headers of all the
  parallel algorithms that can be used in a similar manner:
</para>

<table frame="all" xml:id="table.parallel_algos">
<title>Parallel Algorithms</title>

<tgroup cols="4" align="left" colsep="1" rowsep="1">
<colspec colname="c1"/>
<colspec colname="c2"/>
<colspec colname="c3"/>
<colspec colname="c4"/>

<thead>
  <row>
    <entry>Algorithm</entry>
    <entry>Header</entry>
    <entry>Parallel algorithm</entry>
    <entry>Parallel header</entry>
  </row>
</thead>

<tbody>
  <row>
    <entry><function>std::accumulate</function></entry>
    <entry><filename class="headerfile">numeric</filename></entry>
    <entry><function>__gnu_parallel::accumulate</function></entry>
    <entry><filename class="headerfile">parallel/numeric</filename></entry>
  </row>
  <row>
    <entry><function>std::adjacent_difference</function></entry>
    <entry><filename class="headerfile">numeric</filename></entry>
    <entry><function>__gnu_parallel::adjacent_difference</function></entry>
    <entry><filename class="headerfile">parallel/numeric</filename></entry>
  </row>
  <row>
    <entry><function>std::inner_product</function></entry>
    <entry><filename class="headerfile">numeric</filename></entry>
    <entry><function>__gnu_parallel::inner_product</function></entry>
    <entry><filename class="headerfile">parallel/numeric</filename></entry>
  </row>
  <row>
    <entry><function>std::partial_sum</function></entry>
    <entry><filename class="headerfile">numeric</filename></entry>
    <entry><function>__gnu_parallel::partial_sum</function></entry>
    <entry><filename class="headerfile">parallel/numeric</filename></entry>
  </row>
  <row>
    <entry><function>std::adjacent_find</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::adjacent_find</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::count</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::count</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::count_if</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::count_if</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::equal</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::equal</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::find</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::find</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::find_if</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::find_if</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::find_first_of</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::find_first_of</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::for_each</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::for_each</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::generate</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::generate</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::generate_n</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::generate_n</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::lexicographical_compare</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::lexicographical_compare</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::mismatch</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::mismatch</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::search</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::search</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::search_n</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::search_n</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::transform</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::transform</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::replace</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::replace</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::replace_if</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::replace_if</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::max_element</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::max_element</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::merge</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::merge</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::min_element</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::min_element</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::nth_element</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::nth_element</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::partial_sort</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::partial_sort</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::partition</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::partition</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::random_shuffle</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::random_shuffle</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::set_union</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::set_union</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::set_intersection</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::set_intersection</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::set_symmetric_difference</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::set_symmetric_difference</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::set_difference</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::set_difference</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::sort</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::sort</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::stable_sort</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::stable_sort</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::unique_copy</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::unique_copy</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>
</tbody>
</tgroup>
</table>
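
<para>
The rows for <filename class="headerfile">parallel/numeric</filename>
are used in the same way as the <function>sort</function> example. The
sketch below calls <function>__gnu_parallel::accumulate</function> when
OpenMP is enabled and the header can be found, and otherwise falls back
to <function>std::accumulate</function> so that it still builds
elsewhere. The macro <code>SKETCH_HAVE_PARALLEL_NUMERIC</code> and the
function <code>sum</code> are hypothetical names, and the header check
assumes a compiler that provides <code>__has_include</code>.
</para>

<programlisting><![CDATA[
```cpp
#include <numeric>
#include <vector>

// Pull in the explicit parallel header only when OpenMP is enabled and
// the header is available; otherwise the sequential algorithm is used.
#if defined(_OPENMP) && defined(__has_include)
#  if __has_include(<parallel/numeric>)
#    include <parallel/numeric>
#    define SKETCH_HAVE_PARALLEL_NUMERIC 1
#  endif
#endif

// Sum the elements of v, requesting the parallel variant if possible.
long sum(const std::vector<long>& v)
{
#ifdef SKETCH_HAVE_PARALLEL_NUMERIC
  return __gnu_parallel::accumulate(v.begin(), v.end(), 0L);
#else
  return std::accumulate(v.begin(), v.end(), 0L);
#endif
}
```
]]></programlisting>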

</section>

</section>

<section xml:id="manual.ext.parallel_mode.design" xreflabel="Design"><info><title>Design</title></info>
<?dbhtml filename="parallel_mode_design.html"?>

<section xml:id="parallel_mode.design.intro" xreflabel="Intro"><info><title>Interface Basics</title></info>


<para>
All parallel algorithms are intended to have signatures that are
equivalent to the ISO C++ algorithms replaced. For instance, the
<function>std::adjacent_find</function> function is declared as:
</para>
<programlisting>
namespace std
{
  template&lt;typename _FIter&gt;
    _FIter
    adjacent_find(_FIter, _FIter);
}
</programlisting>

<para>
This means that there should be something equivalent for the parallel
version. Indeed, this is the case:
</para>

<programlisting>
namespace std
{
  namespace __parallel
  {
    template&lt;typename _FIter&gt;
      _FIter
      adjacent_find(_FIter, _FIter);

    ...
  }
}
</programlisting>

<para>But... why the ellipsis?
</para>

<para> The ellipsis in the example above represents additional overloads
required for the parallel version of the function. These additional
overloads are used to dispatch calls from the ISO C++ function
signature to the appropriate parallel function (or sequential
function, if no parallel functions are deemed worthy), based on either
compile-time or run-time conditions.
</para>

<para> The available signature options are specific to the individual
algorithms or algorithm classes.</para>

<para> The general view of overloads for the parallel algorithms looks like this:
</para>
<itemizedlist>
   <listitem><para>ISO C++ signature</para></listitem>
   <listitem><para>ISO C++ signature + <code>sequential_tag</code> argument</para></listitem>
   <listitem><para>ISO C++ signature + algorithm-specific tag type
    (several signatures)</para></listitem>
</itemizedlist>

<para> Please note that the implementation may use additional functions
(designated with the <code>_switch</code> suffix) to dispatch from the
ISO C++ signature to the correct parallel version. Also, some of the
algorithms do not have support for run-time conditions, so the last
overload is therefore missing.
</para>
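
<para>
The overload layout listed above can be sketched as follows. This is
not the libstdc++ source: the namespace <code>sketch</code>, the tag
types, and the threshold <code>minimal_n</code> are hypothetical, and
the "parallel" overload simply forwards to the sequential algorithm so
that the example stays self-contained.
</para>

<programlisting><![CDATA[
```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical names throughout; this only mirrors the overload layout.
namespace sketch
{
  struct sequential_tag { };
  struct parallel_tag { };

  // Assumed problem size from which the parallel overload is chosen.
  const std::ptrdiff_t minimal_n = 1000;

  // ISO C++ signature + sequential_tag: always the sequential version.
  template<typename FIter>
    FIter
    adjacent_find(FIter first, FIter last, sequential_tag)
    { return std::adjacent_find(first, last); }

  // ISO C++ signature + algorithm-specific tag: the parallel variant.
  // A real implementation would split the range among threads; this
  // stand-in just forwards to the sequential algorithm.
  template<typename FIter>
    FIter
    adjacent_find(FIter first, FIter last, parallel_tag)
    { return std::adjacent_find(first, last); }

  // ISO C++ signature: picks one of the overloads above at run time.
  template<typename FIter>
    FIter
    adjacent_find(FIter first, FIter last)
    {
      if (std::distance(first, last) >= minimal_n)
        return sketch::adjacent_find(first, last, parallel_tag());
      return sketch::adjacent_find(first, last, sequential_tag());
    }
}
```
]]></programlisting>

<para>
A caller that writes <code>sketch::adjacent_find(first, last)</code>
leaves the choice to the dispatcher, while adding a tag argument fixes
the choice at compile time.
</para>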


</section>

<section xml:id="parallel_mode.design.tuning" xreflabel="Tuning"><info><title>Configuration and Tuning</title></info>



<section xml:id="parallel_mode.design.tuning.omp" xreflabel="OpenMP Environment"><info><title>Setting up the OpenMP Environment</title></info>


<para>
Several aspects of the overall runtime environment can be manipulated
by standard OpenMP function calls.
</para>

<para>
To specify the number of threads to be used for the algorithms globally,
use the function <function>omp_set_num_threads</function>. An example:
</para>

<programlisting>
#include &lt;stdlib.h&gt;
#include &lt;omp.h&gt;

int main()
{
  // Explicitly set number of threads.
  const int threads_wanted = 20;
  omp_set_dynamic(false);
  omp_set_num_threads(threads_wanted);

  // Call parallel mode algorithms.

  return 0;
}
</programlisting>

<para>
 Some algorithms allow the number of threads to be set for a particular call,
 by augmenting the algorithm variant.
 See the next section for further information.
</para>

<para>
Other parts of the runtime environment that can be manipulated include
nested parallelism (<function>omp_set_nested</function>), schedule kind
(<function>omp_set_schedule</function>), and others. See the OpenMP
documentation for more information.
</para>
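
<para>
To confirm what the runtime will use after such calls, the current
setting can be read back with <function>omp_get_max_threads</function>.
The helper <code>request_threads</code> below is a hypothetical name,
and the sketch is guarded so that it also builds without OpenMP, in
which case it reports a single thread.
</para>

<programlisting><![CDATA[
```cpp
#ifdef _OPENMP
#  include <omp.h>
#endif

// Ask for `wanted` threads in subsequent parallel regions and return
// the number the runtime reports it may use. Without OpenMP support
// only the calling thread exists, so 1 is returned.
int request_threads(int wanted)
{
#ifdef _OPENMP
  omp_set_dynamic(0);            // keep the runtime from adjusting the count
  omp_set_num_threads(wanted);
  return omp_get_max_threads();
#else
  (void) wanted;                 // unused without OpenMP
  return 1;
#endif
}
```
]]></programlisting>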

</section>

<section xml:id="parallel_mode.design.tuning.compile" xreflabel="Compile Switches"><info><title>Compile Time Switches</title></info>


<para>
To force an algorithm to execute sequentially, even though parallelism
is switched on in general via the macro <constant>_GLIBCXX_PARALLEL</constant>,
add <classname>__gnu_parallel::sequential_tag()</classname> to the end
of the algorithm's argument list.
</para>

<para>
Like so:
</para>

<programlisting>
std::sort(v.begin(), v.end(), __gnu_parallel::sequential_tag());
</programlisting>

<para>
Some parallel algorithm variants can be excluded from compilation by
preprocessor defines. See the doxygen documentation on
<code>compiletime_settings.h</code> and <code>features.h</code> for details.
</para>

<para>
For some algorithms, the desired variant can be chosen at compile-time by
appending a tag object. The available options are specific to the particular
algorithm (class).
</para>

<para>
For the "embarrassingly parallel" algorithms, there is only one "tag object
type", the enum <code>_Parallelism</code>.
It takes one of the following values:
<code>__gnu_parallel::parallel_tag</code>,
<code>__gnu_parallel::balanced_tag</code>,
<code>__gnu_parallel::unbalanced_tag</code>,
<code>__gnu_parallel::omp_loop_tag</code>,
<code>__gnu_parallel::omp_loop_static_tag</code>.
This means that the actual parallelization strategy is chosen at run-time.
(Choosing the variants at compile-time will come soon.)
</para>

<para>
For the following algorithms, the general choices are
<code>__gnu_parallel::parallel_tag</code> and
<code>__gnu_parallel::default_parallel_tag</code>, in addition to
<code>__gnu_parallel::sequential_tag</code>.
<code>__gnu_parallel::default_parallel_tag</code> chooses the default
algorithm at compile-time, as does omitting the tag.
<code>__gnu_parallel::parallel_tag</code> postpones the decision to run-time
(see next section).
For all tags, the number of threads desired for this call can optionally be
passed to the respective tag's constructor.
</para>

<para>
The <code>multiway_merge</code> algorithm comes with two additional choices,
<code>__gnu_parallel::exact_tag</code> and
<code>__gnu_parallel::sampling_tag</code>.
Exact and sampling are the two available splitting strategies.
</para>

<para>
For the <code>sort</code> and <code>stable_sort</code> algorithms, there are
several additional choices, namely
<code>__gnu_parallel::multiway_mergesort_tag</code>,
<code>__gnu_parallel::multiway_mergesort_exact_tag</code>,
<code>__gnu_parallel::multiway_mergesort_sampling_tag</code>,
<code>__gnu_parallel::quicksort_tag</code>, and
<code>__gnu_parallel::balanced_quicksort_tag</code>.
Multiway mergesort comes with the two splitting strategies for multi-way
merging. The quicksort options cannot be used for <code>stable_sort</code>.
</para>
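
<para>
As a sketch of how such a tag might be passed, the following requests
multiway mergesort with two threads for a single call. The function
name <code>sort_with_two_threads</code> and the macro
<code>SKETCH_HAVE_PARALLEL_ALGORITHM</code> are hypothetical. The
parallel branch is only compiled when OpenMP is enabled and the header
can be found (assuming a compiler that provides
<code>__has_include</code>); otherwise the sketch falls back to
<function>std::sort</function>.
</para>

<programlisting><![CDATA[
```cpp
#include <algorithm>
#include <functional>
#include <vector>

#if defined(_OPENMP) && defined(__has_include)
#  if __has_include(<parallel/algorithm>)
#    include <parallel/algorithm>
#    define SKETCH_HAVE_PARALLEL_ALGORITHM 1
#  endif
#endif

// Sort v, choosing the algorithm variant explicitly where available.
void sort_with_two_threads(std::vector<int>& v)
{
#ifdef SKETCH_HAVE_PARALLEL_ALGORITHM
  // Comparator first, then the tag; the tag's constructor argument is
  // the number of threads desired for this call.
  __gnu_parallel::sort(v.begin(), v.end(), std::less<int>(),
                       __gnu_parallel::multiway_mergesort_tag(2));
#else
  std::sort(v.begin(), v.end(), std::less<int>());
#endif
}
```
]]></programlisting>

<para>
Even with an explicit tag, the run-time settings described in the next
section may still route small inputs to the sequential code.
</para>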

</section>

<section xml:id="parallel_mode.design.tuning.settings" xreflabel="_Settings"><info><title>Run Time Settings and Defaults</title></info>


<para>
The default parallelization strategy, the choice of specific algorithm
strategy, the minimum threshold limits for individual parallel
algorithms, and aspects of the underlying hardware can be specified as
desired via manipulation
of <classname>__gnu_parallel::_Settings</classname> member data.
</para>

<para>
First off, the choice of parallelization strategy: serial, parallel,
or heuristically deduced. This corresponds
to <code>__gnu_parallel::_Settings::algorithm_strategy</code> and is a
value of enum <type>__gnu_parallel::_AlgorithmStrategy</type>
type. Choices
include: <type>heuristic</type>, <type>force_sequential</type>,
and <type>force_parallel</type>. The default is <type>heuristic</type>.
</para>


<para>
Next, the sub-choices for algorithm variant, if not fixed at compile-time.
Specific algorithms like <function>find</function> or <function>sort</function>
can be implemented in multiple ways: when this is the case,
a <classname>__gnu_parallel::_Settings</classname> member exists to
pick the default strategy. For
example, <code>__gnu_parallel::_Settings::sort_algorithm</code> can
have any value of
enum <type>__gnu_parallel::_SortAlgorithm</type>: <type>MWMS</type>, <type>QS</type>,
or <type>QS_BALANCED</type>.
</para>

<para>
Likewise for setting the minimal threshold for algorithm
parallelization.  Parallelism always incurs some overhead. Thus, it is
not helpful to parallelize operations on very small sets of
data. Because of this, measures are taken to avoid parallelizing below
a certain, pre-determined threshold. For each algorithm, a minimum
problem size is encoded as a variable in the
active <classname>__gnu_parallel::_Settings</classname> object.  This
threshold variable follows the following naming scheme:
<code>__gnu_parallel::_Settings::[algorithm]_minimal_n</code>.  So,
for <function>fill</function>, the threshold variable
is <code>__gnu_parallel::_Settings::fill_minimal_n</code>.
</para>
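
<para>
The interplay between the strategy and the threshold can be modelled
roughly as below. This is a simplified, hypothetical sketch and not
library code: the namespace <code>sketch</code>, the struct
<code>settings</code>, and the function <code>use_parallel</code> are
invented for illustration, and the real decision may take further
conditions into account.
</para>

<programlisting><![CDATA[
```cpp
#include <cstddef>

// Hypothetical model of the run-time decision; not the library's code.
namespace sketch
{
  enum algorithm_strategy { heuristic, force_sequential, force_parallel };

  struct settings
  {
    algorithm_strategy strategy;
    std::size_t sort_minimal_n;   // assumed minimum problem size for sort
  };

  // Decide whether a problem of size n should be run in parallel,
  // given the per-algorithm threshold minimal_n.
  bool use_parallel(const settings& s, std::size_t n, std::size_t minimal_n)
  {
    switch (s.strategy)
      {
      case force_sequential:
        return false;             // never parallelize
      case force_parallel:
        return true;              // always parallelize
      default:
        return n >= minimal_n;    // heuristic: only for large enough inputs
      }
  }
}
```
]]></programlisting>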

<para>
Finally, hardware details like L1/L2 cache size can be hardwired
via <code>__gnu_parallel::_Settings::L1_cache_size</code> and friends.
</para>

<para>
All these configuration variables can be changed by the user, if
desired.
There exists one global instance of the class <classname>_Settings</classname>,
i.e. it is a singleton. It can be read and written by calling
<code>__gnu_parallel::_Settings::get</code> and
<code>__gnu_parallel::_Settings::set</code>, respectively.
Please note that the first call returns a const object, so direct manipulation
is forbidden.
See <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://gcc.gnu.org/onlinedocs/libstdc++/latest-doxygen/a01005.html">
  <filename class="headerfile">settings.h</filename></link>
for complete details.
</para>

<para>
A small example of tuning the default:
</para>

<programlisting>
#include &lt;parallel/algorithm&gt;
#include &lt;parallel/settings.h&gt;

int main()
{
  __gnu_parallel::_Settings s;
  s.algorithm_strategy = __gnu_parallel::force_parallel;
  __gnu_parallel::_Settings::set(s);

  // Do work... all algorithms will be parallelized, always.

  return 0;
}
</programlisting>

</section>

</section>

<section xml:id="parallel_mode.design.impl" xreflabel="Impl"><info><title>Implementation Namespaces</title></info>


<para> One namespace contains versions of code that are always
explicitly sequential:
<code>__gnu_serial</code>.
</para>

<para> Two namespaces contain the parallel mode:
<code>std::__parallel</code> and <code>__gnu_parallel</code>.
</para>

<para> Parallel implementations of standard components, including
template helpers to select parallelism, are defined in <code>namespace
std::__parallel</code>. For instance, <function>std::transform</function> from <filename class="headerfile">algorithm</filename> has a parallel counterpart in
<function>std::__parallel::transform</function> from <filename class="headerfile">parallel/algorithm</filename>. In addition, these parallel
implementations are injected into <code>namespace
__gnu_parallel</code> with using declarations.
</para>

<para> Support and general infrastructure are in <code>namespace
__gnu_parallel</code>.
</para>

<para> More information, and an organized index of types and functions
related to the parallel mode on a per-namespace basis, can be found in
the generated source documentation.
</para>

</section>

</section>

<section xml:id="manual.ext.parallel_mode.test" xreflabel="Testing"><info><title>Testing</title></info>
<?dbhtml filename="parallel_mode_test.html"?>


  <para>
    Both the normal conformance and regression tests and the
    supplemental performance tests work.
  </para>

  <para>
    To run the conformance and regression tests with the parallel mode
    active, use:
  </para>

  <screen>
  <userinput>make check-parallel</userinput>
  </screen>

  <para>
    The log and summary files for conformance testing are in the
    <filename class="directory">testsuite/parallel</filename> directory.
  </para>

  <para>
    To run the performance tests with the parallel mode active, use:
  </para>

  <screen>
  <userinput>make check-performance-parallel</userinput>
  </screen>

  <para>
    The results of performance testing are in the
    <filename class="directory">testsuite</filename> directory, in the file
    <filename>libstdc++_performance.sum</filename>. In addition, the
    policy-based containers have their own visualizations, which have
    more software dependencies than the usual bare-bones text
    file, and can be generated by using the <code>make
    doc-performance</code> rule in the testsuite's Makefile.
</para>
</section>

<bibliography xml:id="parallel_mode.biblio"><info><title>Bibliography</title></info>


  <biblioentry>
    <citetitle>
      Parallelization of Bulk Operations for STL Dictionaries
    </citetitle>

    <author><personname><firstname>Johannes</firstname><surname>Singler</surname></personname></author>
    <author><personname><firstname>Leonor</firstname><surname>Frias</surname></personname></author>

    <copyright>
      <year>2007</year>
      <holder/>
    </copyright>

    <publisher>
      <publishername>
        Workshop on Highly Parallel Processing on a Chip (HPPC) 2007. (LNCS)
      </publishername>
    </publisher>
  </biblioentry>

  <biblioentry>
    <citetitle>
      The Multi-Core Standard Template Library
    </citetitle>

    <author><personname><firstname>Johannes</firstname><surname>Singler</surname></personname></author>
    <author><personname><firstname>Peter</firstname><surname>Sanders</surname></personname></author>
    <author><personname><firstname>Felix</firstname><surname>Putze</surname></personname></author>

    <copyright>
      <year>2007</year>
      <holder/>
    </copyright>

    <publisher>
      <publishername>
        Euro-Par 2007: Parallel Processing. (LNCS 4641)
      </publishername>
    </publisher>
  </biblioentry>

</bibliography>

</chapter>

879