parallel_mode.xml revision 1.1.1.2
1<chapter xmlns="http://docbook.org/ns/docbook" version="5.0" 
2	 xml:id="manual.ext.parallel_mode" xreflabel="Parallel Mode">
3<?dbhtml filename="parallel_mode.html"?>
4
5<info><title>Parallel Mode</title>
6  <keywordset>
7    <keyword>C++</keyword>
8    <keyword>library</keyword>
9    <keyword>parallel</keyword>
10  </keywordset>
11</info>
12
13
14
<para> The libstdc++ parallel mode is an experimental parallel
implementation of many algorithms of the C++ Standard Library.
</para>
18
<para>
Several of the standard algorithms, for instance
<function>std::sort</function>, are made parallel using OpenMP
annotations. These parallel mode constructs can be invoked by
explicit source declaration or by compiling existing sources with a
specific compiler flag.
</para>
26
27
28<section xml:id="manual.ext.parallel_mode.intro" xreflabel="Intro"><info><title>Intro</title></info>
29  
30
<para>The following library components from the header
<filename class="headerfile">numeric</filename> are included in the parallel mode:</para>
33<itemizedlist>
34  <listitem><para><function>std::accumulate</function></para></listitem>
35  <listitem><para><function>std::adjacent_difference</function></para></listitem>
36  <listitem><para><function>std::inner_product</function></para></listitem>
37  <listitem><para><function>std::partial_sum</function></para></listitem>
38</itemizedlist>
39
<para>The following library components from the header
<filename class="headerfile">algorithm</filename> are included in the parallel mode:</para>
42<itemizedlist>
43  <listitem><para><function>std::adjacent_find</function></para></listitem>
44  <listitem><para><function>std::count</function></para></listitem>
45  <listitem><para><function>std::count_if</function></para></listitem>
46  <listitem><para><function>std::equal</function></para></listitem>
47  <listitem><para><function>std::find</function></para></listitem>
48  <listitem><para><function>std::find_if</function></para></listitem>
49  <listitem><para><function>std::find_first_of</function></para></listitem>
50  <listitem><para><function>std::for_each</function></para></listitem>
51  <listitem><para><function>std::generate</function></para></listitem>
52  <listitem><para><function>std::generate_n</function></para></listitem>
53  <listitem><para><function>std::lexicographical_compare</function></para></listitem>
54  <listitem><para><function>std::mismatch</function></para></listitem>
55  <listitem><para><function>std::search</function></para></listitem>
56  <listitem><para><function>std::search_n</function></para></listitem>
57  <listitem><para><function>std::transform</function></para></listitem>
58  <listitem><para><function>std::replace</function></para></listitem>
59  <listitem><para><function>std::replace_if</function></para></listitem>
60  <listitem><para><function>std::max_element</function></para></listitem>
61  <listitem><para><function>std::merge</function></para></listitem>
62  <listitem><para><function>std::min_element</function></para></listitem>
63  <listitem><para><function>std::nth_element</function></para></listitem>
64  <listitem><para><function>std::partial_sort</function></para></listitem>
65  <listitem><para><function>std::partition</function></para></listitem>
66  <listitem><para><function>std::random_shuffle</function></para></listitem>
67  <listitem><para><function>std::set_union</function></para></listitem>
68  <listitem><para><function>std::set_intersection</function></para></listitem>
69  <listitem><para><function>std::set_symmetric_difference</function></para></listitem>
70  <listitem><para><function>std::set_difference</function></para></listitem>
71  <listitem><para><function>std::sort</function></para></listitem>
72  <listitem><para><function>std::stable_sort</function></para></listitem>
73  <listitem><para><function>std::unique_copy</function></para></listitem>
74</itemizedlist>
75
76</section>
77
78<section xml:id="manual.ext.parallel_mode.semantics" xreflabel="Semantics"><info><title>Semantics</title></info>
79<?dbhtml filename="parallel_mode_semantics.html"?>
80  
81
<para> The parallel mode STL algorithms are currently not exception-safe,
i.e. user-defined functors must not throw exceptions.
In addition, some algorithms do not guarantee the order in which the
functor is applied to the elements. User-defined functors should
therefore not have side effects that could conflict when they are
invoked concurrently.
</para>
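
<para>
As an illustration of the last point, here is a minimal sketch that
contrasts a functor writing to shared state with a formulation that
lets the algorithm combine the results itself. The names
<code>is_even</code>, <code>counting_functor</code>
and <code>count_even</code> are hypothetical and not part of
libstdc++. Only standard algorithms are used, so the same source
compiles with or without parallel mode.
</para>

<programlisting>
#include &lt;algorithm&gt;
#include &lt;cstddef&gt;
#include &lt;vector&gt;

// Hypothetical predicate: reads its argument only, never throws.
struct is_even
{
  bool operator()(int x) const { return x % 2 == 0; }
};

// Problematic in parallel mode: every invocation increments the same
// counter, so concurrent invocations would race on it.
struct counting_functor
{
  std::size_t* count;
  explicit counting_functor(std::size_t* c) : count(c) { }
  void operator()(int x) const { if (x % 2 == 0) ++*count; }
};

// Preferred: the functor has no side effects and the algorithm
// combines the per-element results.
inline std::size_t count_even(const std::vector&lt;int&gt;&amp; v)
{
  return static_cast&lt;std::size_t&gt;(
    std::count_if(v.begin(), v.end(), is_even()));
}
</programlisting>

<para>
Passing <code>counting_functor</code>
to <function>std::for_each</function> is fine sequentially, but it is
the kind of functor this section advises against once the algorithm
may run in parallel.
</para>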
87
<para> Since the current GCC OpenMP implementation does not support
OpenMP parallel regions in concurrent threads,
it is not possible to call parallel STL algorithms from
concurrent threads, either.
It might work with other compilers, though.</para>
93
94</section>
95
96<section xml:id="manual.ext.parallel_mode.using" xreflabel="Using"><info><title>Using</title></info>
97<?dbhtml filename="parallel_mode_using.html"?>
98  
99
100<section xml:id="parallel_mode.using.prereq_flags"><info><title>Prerequisite Compiler Flags</title></info>
101  
102
103<para>
104  Any use of parallel functionality requires additional compiler
105  and runtime support, in particular support for OpenMP. Adding this support is
106  not difficult: just compile your application with the compiler
107  flag <literal>-fopenmp</literal>. This will link
108  in <code>libgomp</code>, the
109  OpenMP <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://gcc.gnu.org/onlinedocs/libgomp/">GNU implementation</link>,
110  whose presence is mandatory.
111</para>
112
113<para>
114In addition, hardware that supports atomic operations and a compiler
115  capable of producing atomic operations is mandatory: GCC defaults to no
116  support for atomic operations on some common hardware
117  architectures. Activating atomic operations may require explicit
118  compiler flags on some targets (like sparc and x86), such
119  as <literal>-march=i686</literal>,
120  <literal>-march=native</literal> or <literal>-mcpu=v9</literal>. See
121  the GCC manual for more information.
122</para>
123
124</section>
125
126<section xml:id="parallel_mode.using.parallel_mode"><info><title>Using Parallel Mode</title></info>
127  
128
129<para>
130  To use the libstdc++ parallel mode, compile your application with
131  the prerequisite flags as detailed above, and in addition
132  add <constant>-D_GLIBCXX_PARALLEL</constant>. This will convert all
133  use of the standard (sequential) algorithms to the appropriate parallel
134  equivalents. Please note that this doesn't necessarily mean that
135  everything will end up being executed in a parallel manner, but
136  rather that the heuristics and settings coded into the parallel
137  versions will be used to determine if all, some, or no algorithms
138  will be executed using parallel variants.
139</para>
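
<para>
For example, the following sketch contains no parallel-mode-specific
code at all. The helper names <code>sorted_copy</code>
and <code>compiled_in_parallel_mode</code>, and the file
name <filename>example.cc</filename> used below, are hypothetical.
</para>

<programlisting>
#include &lt;algorithm&gt;
#include &lt;vector&gt;

// Hypothetical helper: returns a sorted copy of its argument.
// With -D_GLIBCXX_PARALLEL the call to std::sort is routed to the
// parallel mode, which then decides whether to run in parallel.
// Without the define, the usual sequential std::sort is used.
std::vector&lt;int&gt; sorted_copy(std::vector&lt;int&gt; v)
{
  std::sort(v.begin(), v.end());
  return v;
}

// Reports whether this translation unit was compiled in parallel mode.
bool compiled_in_parallel_mode()
{
#ifdef _GLIBCXX_PARALLEL
  return true;
#else
  return false;
#endif
}
</programlisting>

<para>
Assuming the source is saved as <filename>example.cc</filename>, it
could be compiled in parallel mode with:
</para>

<screen>
<userinput>g++ -fopenmp -D_GLIBCXX_PARALLEL example.cc</userinput>
</screen>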
140
<para>Note that the <constant>_GLIBCXX_PARALLEL</constant> define may change the
  sizes and behavior of standard templates (for example, the behavior of
  <function>std::search</function>). Code compiled with parallel mode
  can therefore be linked with code compiled without parallel mode only
  if no instantiation of a container is passed between the two
  translation units. Parallel mode functionality has distinct linkage,
  and cannot be confused with normal mode symbols.
</para>
149</section>
150
151<section xml:id="parallel_mode.using.specific"><info><title>Using Specific Parallel Components</title></info>
152  
153
<para>When it is not feasible to recompile your entire application, or
  only specific algorithms need to be parallel-aware, individual
  parallel algorithms can be made available explicitly. These
  parallel algorithms are functionally equivalent to the standard
  drop-in algorithms used in parallel mode, but they are available in
  a separate namespace as GNU extensions and may be used in programs
  compiled in either release mode or parallel mode.
</para>
162
163
164<para>An example of using a parallel version
165of <function>std::sort</function>, but no other parallel algorithms, is:
166</para>
167
<programlisting>
#include &lt;vector&gt;
#include &lt;parallel/algorithm&gt;

int main()
{
  std::vector&lt;int&gt; v(100);

  // ...

  // Explicitly force a call to parallel sort.
  __gnu_parallel::sort(v.begin(), v.end());
  return 0;
}
</programlisting>
183
<para>
Then compile this code with the prerequisite compiler flags
(<literal>-fopenmp</literal> and any necessary architecture-specific
flags for atomic operations).
</para>
189
190<para> The following table provides the names and headers of all the
191  parallel algorithms that can be used in a similar manner:
192</para>
193
194<table frame="all">
195<title>Parallel Algorithms</title>
196
197<tgroup cols="4" align="left" colsep="1" rowsep="1">
198<colspec colname="c1"/>
199<colspec colname="c2"/>
200<colspec colname="c3"/>
201<colspec colname="c4"/>
202
203<thead>
204  <row>
205    <entry>Algorithm</entry>
206    <entry>Header</entry>
207    <entry>Parallel algorithm</entry>
208    <entry>Parallel header</entry>
209  </row>
210</thead>
211
212<tbody>
213  <row>
214    <entry><function>std::accumulate</function></entry>
215    <entry><filename class="headerfile">numeric</filename></entry>
216    <entry><function>__gnu_parallel::accumulate</function></entry>
217    <entry><filename class="headerfile">parallel/numeric</filename></entry>
218  </row>
219  <row>
220    <entry><function>std::adjacent_difference</function></entry>
221    <entry><filename class="headerfile">numeric</filename></entry>
222    <entry><function>__gnu_parallel::adjacent_difference</function></entry>
223    <entry><filename class="headerfile">parallel/numeric</filename></entry>
224  </row>
225  <row>
226    <entry><function>std::inner_product</function></entry>
227    <entry><filename class="headerfile">numeric</filename></entry>
228    <entry><function>__gnu_parallel::inner_product</function></entry>
229    <entry><filename class="headerfile">parallel/numeric</filename></entry>
230  </row>
231  <row>
232    <entry><function>std::partial_sum</function></entry>
233    <entry><filename class="headerfile">numeric</filename></entry>
234    <entry><function>__gnu_parallel::partial_sum</function></entry>
235    <entry><filename class="headerfile">parallel/numeric</filename></entry>
236  </row>
237  <row>
238    <entry><function>std::adjacent_find</function></entry>
239    <entry><filename class="headerfile">algorithm</filename></entry>
240    <entry><function>__gnu_parallel::adjacent_find</function></entry>
241    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
242  </row>
243
244  <row>
245    <entry><function>std::count</function></entry>
246    <entry><filename class="headerfile">algorithm</filename></entry>
247    <entry><function>__gnu_parallel::count</function></entry>
248    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
249  </row>
250
251  <row>
252    <entry><function>std::count_if</function></entry>
253    <entry><filename class="headerfile">algorithm</filename></entry>
254    <entry><function>__gnu_parallel::count_if</function></entry>
255    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
256  </row>
257
258  <row>
259    <entry><function>std::equal</function></entry>
260    <entry><filename class="headerfile">algorithm</filename></entry>
261    <entry><function>__gnu_parallel::equal</function></entry>
262    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
263  </row>
264
265  <row>
266    <entry><function>std::find</function></entry>
267    <entry><filename class="headerfile">algorithm</filename></entry>
268    <entry><function>__gnu_parallel::find</function></entry>
269    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
270  </row>
271
272  <row>
273    <entry><function>std::find_if</function></entry>
274    <entry><filename class="headerfile">algorithm</filename></entry>
275    <entry><function>__gnu_parallel::find_if</function></entry>
276    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
277  </row>
278
279  <row>
280    <entry><function>std::find_first_of</function></entry>
281    <entry><filename class="headerfile">algorithm</filename></entry>
282    <entry><function>__gnu_parallel::find_first_of</function></entry>
283    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
284  </row>
285
286  <row>
287    <entry><function>std::for_each</function></entry>
288    <entry><filename class="headerfile">algorithm</filename></entry>
289    <entry><function>__gnu_parallel::for_each</function></entry>
290    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
291  </row>
292
293  <row>
294    <entry><function>std::generate</function></entry>
295    <entry><filename class="headerfile">algorithm</filename></entry>
296    <entry><function>__gnu_parallel::generate</function></entry>
297    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
298  </row>
299
300  <row>
301    <entry><function>std::generate_n</function></entry>
302    <entry><filename class="headerfile">algorithm</filename></entry>
303    <entry><function>__gnu_parallel::generate_n</function></entry>
304    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
305  </row>
306
307  <row>
308    <entry><function>std::lexicographical_compare</function></entry>
309    <entry><filename class="headerfile">algorithm</filename></entry>
310    <entry><function>__gnu_parallel::lexicographical_compare</function></entry>
311    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
312  </row>
313
314  <row>
315    <entry><function>std::mismatch</function></entry>
316    <entry><filename class="headerfile">algorithm</filename></entry>
317    <entry><function>__gnu_parallel::mismatch</function></entry>
318    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
319  </row>
320
321  <row>
322    <entry><function>std::search</function></entry>
323    <entry><filename class="headerfile">algorithm</filename></entry>
324    <entry><function>__gnu_parallel::search</function></entry>
325    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
326  </row>
327
328  <row>
329    <entry><function>std::search_n</function></entry>
330    <entry><filename class="headerfile">algorithm</filename></entry>
331    <entry><function>__gnu_parallel::search_n</function></entry>
332    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
333  </row>
334
335  <row>
336    <entry><function>std::transform</function></entry>
337    <entry><filename class="headerfile">algorithm</filename></entry>
338    <entry><function>__gnu_parallel::transform</function></entry>
339    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
340  </row>
341
342  <row>
343    <entry><function>std::replace</function></entry>
344    <entry><filename class="headerfile">algorithm</filename></entry>
345    <entry><function>__gnu_parallel::replace</function></entry>
346    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
347  </row>
348
349  <row>
350    <entry><function>std::replace_if</function></entry>
351    <entry><filename class="headerfile">algorithm</filename></entry>
352    <entry><function>__gnu_parallel::replace_if</function></entry>
353    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
354  </row>
355
356  <row>
357    <entry><function>std::max_element</function></entry>
358    <entry><filename class="headerfile">algorithm</filename></entry>
359    <entry><function>__gnu_parallel::max_element</function></entry>
360    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
361  </row>
362
363  <row>
364    <entry><function>std::merge</function></entry>
365    <entry><filename class="headerfile">algorithm</filename></entry>
366    <entry><function>__gnu_parallel::merge</function></entry>
367    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
368  </row>
369
370  <row>
371    <entry><function>std::min_element</function></entry>
372    <entry><filename class="headerfile">algorithm</filename></entry>
373    <entry><function>__gnu_parallel::min_element</function></entry>
374    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
375  </row>
376
377  <row>
378    <entry><function>std::nth_element</function></entry>
379    <entry><filename class="headerfile">algorithm</filename></entry>
380    <entry><function>__gnu_parallel::nth_element</function></entry>
381    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
382  </row>
383
384  <row>
385    <entry><function>std::partial_sort</function></entry>
386    <entry><filename class="headerfile">algorithm</filename></entry>
387    <entry><function>__gnu_parallel::partial_sort</function></entry>
388    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
389  </row>
390
391  <row>
392    <entry><function>std::partition</function></entry>
393    <entry><filename class="headerfile">algorithm</filename></entry>
394    <entry><function>__gnu_parallel::partition</function></entry>
395    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
396  </row>
397
398  <row>
399    <entry><function>std::random_shuffle</function></entry>
400    <entry><filename class="headerfile">algorithm</filename></entry>
401    <entry><function>__gnu_parallel::random_shuffle</function></entry>
402    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
403  </row>
404
405  <row>
406    <entry><function>std::set_union</function></entry>
407    <entry><filename class="headerfile">algorithm</filename></entry>
408    <entry><function>__gnu_parallel::set_union</function></entry>
409    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
410  </row>
411
412  <row>
413    <entry><function>std::set_intersection</function></entry>
414    <entry><filename class="headerfile">algorithm</filename></entry>
415    <entry><function>__gnu_parallel::set_intersection</function></entry>
416    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
417  </row>
418
419  <row>
420    <entry><function>std::set_symmetric_difference</function></entry>
421    <entry><filename class="headerfile">algorithm</filename></entry>
422    <entry><function>__gnu_parallel::set_symmetric_difference</function></entry>
423    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
424  </row>
425
426  <row>
427    <entry><function>std::set_difference</function></entry>
428    <entry><filename class="headerfile">algorithm</filename></entry>
429    <entry><function>__gnu_parallel::set_difference</function></entry>
430    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
431  </row>
432
433  <row>
434    <entry><function>std::sort</function></entry>
435    <entry><filename class="headerfile">algorithm</filename></entry>
436    <entry><function>__gnu_parallel::sort</function></entry>
437    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
438  </row>
439
440  <row>
441    <entry><function>std::stable_sort</function></entry>
442    <entry><filename class="headerfile">algorithm</filename></entry>
443    <entry><function>__gnu_parallel::stable_sort</function></entry>
444    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
445  </row>
446
447  <row>
448    <entry><function>std::unique_copy</function></entry>
449    <entry><filename class="headerfile">algorithm</filename></entry>
450    <entry><function>__gnu_parallel::unique_copy</function></entry>
451    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
452  </row>
453</tbody>
454</tgroup>
455</table>
456
457</section>
458
459</section>
460
461<section xml:id="manual.ext.parallel_mode.design" xreflabel="Design"><info><title>Design</title></info>
462<?dbhtml filename="parallel_mode_design.html"?>
463  
466<section xml:id="parallel_mode.design.intro" xreflabel="Intro"><info><title>Interface Basics</title></info>
467  
468
469<para>
470All parallel algorithms are intended to have signatures that are
471equivalent to the ISO C++ algorithms replaced. For instance, the
472<function>std::adjacent_find</function> function is declared as:
473</para>
<programlisting>
namespace std
{
  template&lt;typename _FIter&gt;
    _FIter
    adjacent_find(_FIter, _FIter);
}
</programlisting>

<para>
There should therefore be an equivalent declaration for the parallel
version. Indeed, this is the case:
</para>

<programlisting>
namespace std
{
  namespace __parallel
  {
    template&lt;typename _FIter&gt;
      _FIter
      adjacent_find(_FIter, _FIter);

    ...
  }
}
</programlisting>
501
<para>Why the ellipsis?
</para>

<para> The ellipsis in the example above stands for the additional overloads
required for the parallel version of the function. These additional
overloads are used to dispatch calls from the ISO C++ function
signature to the appropriate parallel function (or to the sequential
function, if no parallel function is deemed worthwhile), based on either
compile-time or run-time conditions.
</para>
512
<para> The available signature options are specific to each
algorithm or class of algorithms.</para>

<para> In general, the overloads for the parallel algorithms look like this:
</para>
518<itemizedlist>
519   <listitem><para>ISO C++ signature</para></listitem>
520   <listitem><para>ISO C++ signature + sequential_tag argument</para></listitem>
521   <listitem><para>ISO C++ signature + algorithm-specific tag type
522    (several signatures)</para></listitem>
523</itemizedlist>
524
<para> Please note that the implementation may use additional functions
(designated with the <code>_switch</code> suffix) to dispatch from the
ISO C++ signature to the correct parallel version. Also, some of the
algorithms do not support run-time conditions, so the last
overload is missing for them.
</para>
531
532
533</section>
534
535<section xml:id="parallel_mode.design.tuning" xreflabel="Tuning"><info><title>Configuration and Tuning</title></info>
536  
537
538
539<section xml:id="parallel_mode.design.tuning.omp" xreflabel="OpenMP Environment"><info><title>Setting up the OpenMP Environment</title></info>
540  
541
542<para>
543Several aspects of the overall runtime environment can be manipulated
544by standard OpenMP function calls.
545</para>
546
547<para>
548To specify the number of threads to be used for the algorithms globally,
549use the function <function>omp_set_num_threads</function>. An example:
550</para>
551
<programlisting>
#include &lt;stdlib.h&gt;
#include &lt;omp.h&gt;

int main()
{
  // Explicitly set number of threads.
  const int threads_wanted = 20;
  omp_set_dynamic(false);
  omp_set_num_threads(threads_wanted);

  // Call parallel mode algorithms.

  return 0;
}
</programlisting>
568
<para>
 Some algorithms allow the number of threads to be set for a particular call,
 by augmenting the algorithm variant.
 See the next section for further information.
</para>

<para>
Other parts of the runtime environment that can be manipulated include
nested parallelism (<function>omp_set_nested</function>), schedule kind
(<function>omp_set_schedule</function>), and others. See the OpenMP
documentation for more information.
</para>
581
582</section>
583
584<section xml:id="parallel_mode.design.tuning.compile" xreflabel="Compile Switches"><info><title>Compile Time Switches</title></info>
585  
586
587<para>
588To force an algorithm to execute sequentially, even though parallelism
589is switched on in general via the macro <constant>_GLIBCXX_PARALLEL</constant>,
590add <classname>__gnu_parallel::sequential_tag()</classname> to the end
591of the algorithm's argument list.
592</para>
593
594<para>
595Like so:
596</para>
597
<programlisting>
std::sort(v.begin(), v.end(), __gnu_parallel::sequential_tag());
</programlisting>
601
602<para>
603Some parallel algorithm variants can be excluded from compilation by
604preprocessor defines. See the doxygen documentation on
605<code>compiletime_settings.h</code> and <code>features.h</code> for details.
606</para>
607
608<para>
609For some algorithms, the desired variant can be chosen at compile-time by
610appending a tag object. The available options are specific to the particular
611algorithm (class).
612</para>
613
<para>
For the "embarrassingly parallel" algorithms, there is only one "tag object
type", the enum <type>_Parallelism</type>.
It takes one of the following values:
<code>__gnu_parallel::parallel_tag</code>,
<code>__gnu_parallel::balanced_tag</code>,
<code>__gnu_parallel::unbalanced_tag</code>,
<code>__gnu_parallel::omp_loop_tag</code>,
<code>__gnu_parallel::omp_loop_static_tag</code>.
This means that the actual parallelization strategy is chosen at run-time.
(Choosing the variants at compile-time will come soon.)
</para>
626
<para>
For the following algorithms in general, we have
<code>__gnu_parallel::parallel_tag</code> and
<code>__gnu_parallel::default_parallel_tag</code>, in addition to
<code>__gnu_parallel::sequential_tag</code>.
<code>__gnu_parallel::default_parallel_tag</code> chooses the default
algorithm at compile-time, as does omitting the tag.
<code>__gnu_parallel::parallel_tag</code> postpones the decision to run-time
(see next section).
For all tags, the number of threads desired for this call can optionally be
passed to the respective tag's constructor.
</para>
639
640<para>
641The <code>multiway_merge</code> algorithm comes with the additional choices,
642<code>__gnu_parallel::exact_tag</code> and
643<code>__gnu_parallel::sampling_tag</code>.
644Exact and sampling are the two available splitting strategies.
645</para>
646
647<para>
648For the <code>sort</code> and <code>stable_sort</code> algorithms, there are
649several additional choices, namely
650<code>__gnu_parallel::multiway_mergesort_tag</code>,
651<code>__gnu_parallel::multiway_mergesort_exact_tag</code>,
652<code>__gnu_parallel::multiway_mergesort_sampling_tag</code>,
653<code>__gnu_parallel::quicksort_tag</code>, and
654<code>__gnu_parallel::balanced_quicksort_tag</code>.
655Multiway mergesort comes with the two splitting strategies for multi-way
656merging. The quicksort options cannot be used for <code>stable_sort</code>.
657</para>
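
<para>
As a sketch of how such tags might be used, the following code selects
multiway mergesort for one call and requests two threads through the
tag's constructor, as described above. A second helper forces the
sequential version. The function names
<code>sort_with_mergesort</code> and <code>sort_sequentially</code>
are hypothetical. The code is assumed to be compiled with the
prerequisite flags (<literal>-fopenmp</literal>).
</para>

<programlisting>
#include &lt;vector&gt;
#include &lt;parallel/algorithm&gt;

// Hypothetical helper: sort using the multiway mergesort variant,
// asking for two threads for this particular call.
void sort_with_mergesort(std::vector&lt;int&gt;&amp; v)
{
  __gnu_parallel::sort(v.begin(), v.end(),
                       __gnu_parallel::multiway_mergesort_tag(2));
}

// Hypothetical helper: force the sequential algorithm for this call.
void sort_sequentially(std::vector&lt;int&gt;&amp; v)
{
  __gnu_parallel::sort(v.begin(), v.end(),
                       __gnu_parallel::sequential_tag());
}
</programlisting>

<para>
Whether the first call actually runs in parallel still depends on the
run-time settings described in the next section, such as the minimum
problem size.
</para>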
658
659</section>
660
661<section xml:id="parallel_mode.design.tuning.settings" xreflabel="_Settings"><info><title>Run Time Settings and Defaults</title></info>
662  
663
664<para>
665The default parallelization strategy, the choice of specific algorithm
666strategy, the minimum threshold limits for individual parallel
667algorithms, and aspects of the underlying hardware can be specified as
668desired via manipulation
669of <classname>__gnu_parallel::_Settings</classname> member data.
670</para>
671
672<para>
673First off, the choice of parallelization strategy: serial, parallel,
674or heuristically deduced. This corresponds
675to <code>__gnu_parallel::_Settings::algorithm_strategy</code> and is a
676value of enum <type>__gnu_parallel::_AlgorithmStrategy</type>
677type. Choices
678include: <type>heuristic</type>, <type>force_sequential</type>,
679and <type>force_parallel</type>. The default is <type>heuristic</type>.
680</para>
681
682
<para>
Next, the sub-choices for algorithm variant, if not fixed at compile-time.
Specific algorithms like <function>find</function> or <function>sort</function>
can be implemented in multiple ways: when this is the case,
a <classname>__gnu_parallel::_Settings</classname> member exists to
pick the default strategy. For
example, <code>__gnu_parallel::_Settings::sort_algorithm</code> can
take any value of the
enum <type>__gnu_parallel::_SortAlgorithm</type>: <type>MWMS</type>, <type>QS</type>,
or <type>QS_BALANCED</type>.
</para>
694
<para>
The minimal threshold for algorithm parallelization is set in a
similar way. Parallelism always incurs some overhead, so it is
not helpful to parallelize operations on very small sets of
data. Because of this, measures are taken to avoid parallelizing below
a certain, pre-determined threshold. For each algorithm, a minimum
problem size is encoded as a variable in the
active <classname>__gnu_parallel::_Settings</classname> object.  This
threshold variable uses the following naming scheme:
<code>__gnu_parallel::_Settings::[algorithm]_minimal_n</code>.  So,
for <function>fill</function>, the threshold variable
is <code>__gnu_parallel::_Settings::fill_minimal_n</code>.
</para>
708
709<para>
710Finally, hardware details like L1/L2 cache size can be hardwired
711via <code>__gnu_parallel::_Settings::L1_cache_size</code> and friends.
712</para>
713
716
<para>
All these configuration variables can be changed by the user, if
desired.
There exists one global instance of the class <classname>_Settings</classname>,
i.e. it is a singleton. It can be read and written by calling
<code>__gnu_parallel::_Settings::get</code> and
<code>__gnu_parallel::_Settings::set</code>, respectively.
Please note that the first call returns a const object, so direct manipulation
is forbidden.
See <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://gcc.gnu.org/onlinedocs/libstdc++/latest-doxygen/a01005.html">
  <filename class="headerfile">settings.h</filename></link>
for complete details.
</para>
730
731<para>
732A small example of tuning the default:
733</para>
734
<programlisting>
#include &lt;parallel/algorithm&gt;
#include &lt;parallel/settings.h&gt;

int main()
{
  __gnu_parallel::_Settings s;
  s.algorithm_strategy = __gnu_parallel::force_parallel;
  __gnu_parallel::_Settings::set(s);

  // Do work... all algorithms will be parallelized, always.

  return 0;
}
</programlisting>
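
<para>
The per-algorithm thresholds can presumably be adjusted in the same
way. The sketch below copies the current settings, changes the
threshold for <function>sort</function>, and writes the copy back.
The member name <code>sort_minimal_n</code> is assumed from the
<code>[algorithm]_minimal_n</code> naming scheme described above, and
the function name <code>set_sort_threshold</code> is hypothetical.
</para>

<programlisting>
#include &lt;parallel/settings.h&gt;

// Hypothetical helper: set the problem size below which sort is
// not parallelized.
void set_sort_threshold(unsigned long n)
{
  // get() returns a const object, so work on a copy and write it back.
  __gnu_parallel::_Settings s = __gnu_parallel::_Settings::get();
  s.sort_minimal_n = n;  // assumed member, per the naming scheme
  __gnu_parallel::_Settings::set(s);
}
</programlisting>

<para>
Starting from a copy of the current settings, rather than from a
default-constructed object, keeps any values that were changed earlier.
</para>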
750
751</section>
752
753</section>
754
755<section xml:id="parallel_mode.design.impl" xreflabel="Impl"><info><title>Implementation Namespaces</title></info>
756  
757
<para> One namespace contains versions of code that are always
explicitly sequential:
<code>__gnu_serial</code>.
</para>
762
763<para> Two namespaces contain the parallel mode:
764<code>std::__parallel</code> and <code>__gnu_parallel</code>.
765</para>
766
767<para> Parallel implementations of standard components, including
768template helpers to select parallelism, are defined in <code>namespace
769std::__parallel</code>. For instance, <function>std::transform</function> from <filename class="headerfile">algorithm</filename> has a parallel counterpart in
770<function>std::__parallel::transform</function> from <filename class="headerfile">parallel/algorithm</filename>. In addition, these parallel
771implementations are injected into <code>namespace
772__gnu_parallel</code> with using declarations.
773</para>
774
<para> Support and general infrastructure are in <code>namespace
__gnu_parallel</code>.
</para>
778
779<para> More information, and an organized index of types and functions
780related to the parallel mode on a per-namespace basis, can be found in
781the generated source documentation.
782</para>
783
784</section>
785
786</section>
787
788<section xml:id="manual.ext.parallel_mode.test" xreflabel="Testing"><info><title>Testing</title></info>
789<?dbhtml filename="parallel_mode_test.html"?>
790  
791
792  <para>
793    Both the normal conformance and regression tests and the
794    supplemental performance tests work.
795  </para>
796
  <para>
    To run the conformance and regression tests with the parallel mode
    active, use:
  </para>
801
802  <screen>
803  <userinput>make check-parallel</userinput>
804  </screen>
805
806  <para>
807    The log and summary files for conformance testing are in the
808    <filename class="directory">testsuite/parallel</filename> directory.
809  </para>
810
  <para>
    To run the performance tests with the parallel mode active, use:
  </para>
814
815  <screen>
816  <userinput>make check-performance-parallel</userinput>
817  </screen>
818
  <para>
    The results of performance testing are in the
    <filename class="directory">testsuite</filename> directory, in the file
    <filename>libstdc++_performance.sum</filename>. In addition, the
    policy-based containers have their own visualizations, which have
    more software dependencies than the usual bare-bones text
    file, and can be generated by using the <code>make
    doc-performance</code> rule in the testsuite's Makefile.
</para>
828</section>
829
830<bibliography xml:id="parallel_mode.biblio"><info><title>Bibliography</title></info>
831
832
833  <biblioentry>
834    <citetitle>
835      Parallelization of Bulk Operations for STL Dictionaries
836    </citetitle>
837
838    <author><personname><firstname>Johannes</firstname><surname>Singler</surname></personname></author>
839    <author><personname><firstname>Leonor</firstname><surname>Frias</surname></personname></author>
840
841    <copyright>
842      <year>2007</year>
843      <holder/>
844    </copyright>
845
846    <publisher>
847      <publishername>
848	Workshop on Highly Parallel Processing on a Chip (HPPC) 2007. (LNCS)
849      </publishername>
850    </publisher>
851  </biblioentry>
852
853  <biblioentry>
854    <citetitle>
855      The Multi-Core Standard Template Library
856    </citetitle>
857
858    <author><personname><firstname>Johannes</firstname><surname>Singler</surname></personname></author>
859    <author><personname><firstname>Peter</firstname><surname>Sanders</surname></personname></author>
860    <author><personname><firstname>Felix</firstname><surname>Putze</surname></personname></author>
861
862    <copyright>
863      <year>2007</year>
864      <holder/>
865    </copyright>
866
867    <publisher>
868      <publishername>
869	 Euro-Par 2007: Parallel Processing. (LNCS 4641)
870      </publishername>
871    </publisher>
872  </biblioentry>
873
874</bibliography>
875
876</chapter>
877