<chapter xmlns="http://docbook.org/ns/docbook" version="5.0"
         xml:id="manual.ext.parallel_mode" xreflabel="Parallel Mode">
<?dbhtml filename="parallel_mode.html"?>

<info><title>Parallel Mode</title>
  <keywordset>
    <keyword>C++</keyword>
    <keyword>library</keyword>
    <keyword>parallel</keyword>
  </keywordset>
</info>



<para> The libstdc++ parallel mode is an experimental parallel
implementation of many algorithms of the C++ Standard Library.
</para>

<para>
Several of the standard algorithms, for instance
<function>std::sort</function>, are made parallel using OpenMP
annotations. These parallel mode constructs can be invoked by
explicit source declaration or by compiling existing sources with a
specific compiler flag.
</para>


<section xml:id="manual.ext.parallel_mode.intro" xreflabel="Intro"><info><title>Intro</title></info>


<para>The following library components in the header
<filename class="headerfile">numeric</filename> are included in the parallel mode:</para>
<itemizedlist>
  <listitem><para><function>std::accumulate</function></para></listitem>
  <listitem><para><function>std::adjacent_difference</function></para></listitem>
  <listitem><para><function>std::inner_product</function></para></listitem>
  <listitem><para><function>std::partial_sum</function></para></listitem>
</itemizedlist>

<para>The following library components in the header
<filename class="headerfile">algorithm</filename> are included in the parallel mode:</para>
<itemizedlist>
  <listitem><para><function>std::adjacent_find</function></para></listitem>
  <listitem><para><function>std::count</function></para></listitem>
  <listitem><para><function>std::count_if</function></para></listitem>
  <listitem><para><function>std::equal</function></para></listitem>
  <listitem><para><function>std::find</function></para></listitem>
  <listitem><para><function>std::find_if</function></para></listitem>
  <listitem><para><function>std::find_first_of</function></para></listitem>
  <listitem><para><function>std::for_each</function></para></listitem>
  <listitem><para><function>std::generate</function></para></listitem>
  <listitem><para><function>std::generate_n</function></para></listitem>
  <listitem><para><function>std::lexicographical_compare</function></para></listitem>
  <listitem><para><function>std::mismatch</function></para></listitem>
  <listitem><para><function>std::search</function></para></listitem>
  <listitem><para><function>std::search_n</function></para></listitem>
  <listitem><para><function>std::transform</function></para></listitem>
  <listitem><para><function>std::replace</function></para></listitem>
  <listitem><para><function>std::replace_if</function></para></listitem>
  <listitem><para><function>std::max_element</function></para></listitem>
  <listitem><para><function>std::merge</function></para></listitem>
  <listitem><para><function>std::min_element</function></para></listitem>
  <listitem><para><function>std::nth_element</function></para></listitem>
  <listitem><para><function>std::partial_sort</function></para></listitem>
  <listitem><para><function>std::partition</function></para></listitem>
  <listitem><para><function>std::random_shuffle</function></para></listitem>
  <listitem><para><function>std::set_union</function></para></listitem>
  <listitem><para><function>std::set_intersection</function></para></listitem>
  <listitem><para><function>std::set_symmetric_difference</function></para></listitem>
  <listitem><para><function>std::set_difference</function></para></listitem>
  <listitem><para><function>std::sort</function></para></listitem>
  <listitem><para><function>std::stable_sort</function></para></listitem>
  <listitem><para><function>std::unique_copy</function></para></listitem>
</itemizedlist>

</section>

<section xml:id="manual.ext.parallel_mode.semantics"
xreflabel="Semantics"><info><title>Semantics</title></info>
<?dbhtml filename="parallel_mode_semantics.html"?>


<para> The parallel mode STL algorithms are currently not exception-safe,
i.e. user-defined functors must not throw exceptions.
Also, the order of execution is not guaranteed for some functions.
Therefore, user-defined functors should not have any concurrent side effects.
</para>

<para> Since the current GCC OpenMP implementation does not support
OpenMP parallel regions in concurrent threads,
it is not possible to call parallel STL algorithms in
concurrent threads, either.
It might work with other compilers, though.</para>

</section>

<section xml:id="manual.ext.parallel_mode.using" xreflabel="Using"><info><title>Using</title></info>
<?dbhtml filename="parallel_mode_using.html"?>


<section xml:id="parallel_mode.using.prereq_flags"><info><title>Prerequisite Compiler Flags</title></info>


<para>
  Any use of parallel functionality requires additional compiler
  and runtime support, in particular support for OpenMP. Adding this support is
  not difficult: just compile your application with the compiler
  flag <literal>-fopenmp</literal>. This will link
  in <code>libgomp</code>, the
  <link xmlns:xlink="http://www.w3.org/1999/xlink"
  xlink:href="http://gcc.gnu.org/onlinedocs/libgomp/">GNU Offloading and
  Multi Processing Runtime Library</link>,
  whose presence is mandatory.
</para>

<para>
In addition, hardware that supports atomic operations and a compiler
  capable of producing atomic operations are mandatory: GCC defaults to no
  support for atomic operations on some common hardware
  architectures. Activating atomic operations may require explicit
  compiler flags on some targets (like sparc and x86), such
  as <literal>-march=i686</literal>,
  <literal>-march=native</literal> or <literal>-mcpu=v9</literal>.
See the GCC manual for more information.
</para>

</section>

<section xml:id="parallel_mode.using.parallel_mode"><info><title>Using Parallel Mode</title></info>


<para>
  To use the libstdc++ parallel mode, compile your application with
  the prerequisite flags as detailed above, and in addition
  add <constant>-D_GLIBCXX_PARALLEL</constant>. This will convert all
  uses of the standard (sequential) algorithms to the appropriate parallel
  equivalents. Please note that this doesn't necessarily mean that
  everything will end up being executed in a parallel manner, but
  rather that the heuristics and settings coded into the parallel
  versions will be used to determine if all, some, or no algorithms
  will be executed using parallel variants.
</para>

<para>Note that the <constant>_GLIBCXX_PARALLEL</constant> define may change the
  sizes and behavior of standard templates such as
  <function>std::search</function>, and therefore one can only link code
  compiled with parallel mode and code compiled without parallel mode
  if no instantiation of a container is passed between the two
  translation units. Parallel mode functionality has distinct linkage,
  and cannot be confused with normal mode symbols.
</para>
</section>

<section xml:id="parallel_mode.using.specific"><info><title>Using Specific Parallel Components</title></info>


<para>When it is not feasible to recompile your entire application, or
  only specific algorithms need to be parallel-aware, individual
  parallel algorithms can be made available explicitly. These
  parallel algorithms are functionally equivalent to the standard
  drop-in algorithms used in parallel mode, but they are available in
  a separate namespace as GNU extensions and may be used in programs
  compiled with either release mode or with parallel mode.
</para>


<para>An example of using a parallel version
of <function>std::sort</function>, but no other parallel algorithms, is:
</para>

<programlisting>
#include &lt;vector&gt;
#include &lt;parallel/algorithm&gt;

int main()
{
  std::vector&lt;int&gt; v(100);

  // ...

  // Explicitly force a call to parallel sort.
  __gnu_parallel::sort(v.begin(), v.end());
  return 0;
}
</programlisting>

<para>
Then compile this code with the prerequisite compiler flags
(<literal>-fopenmp</literal> and any necessary architecture-specific
flags for atomic operations).
</para>

<para> The following table provides the names and headers of all the
  parallel algorithms that can be used in a similar manner:
</para>

<table frame="all" xml:id="table.parallel_algos">
<title>Parallel Algorithms</title>

<tgroup cols="4" align="left" colsep="1" rowsep="1">
<colspec colname="c1"/>
<colspec colname="c2"/>
<colspec colname="c3"/>
<colspec colname="c4"/>

<thead>
  <row>
    <entry>Algorithm</entry>
    <entry>Header</entry>
    <entry>Parallel algorithm</entry>
    <entry>Parallel header</entry>
  </row>
</thead>

<tbody>
  <row>
    <entry><function>std::accumulate</function></entry>
    <entry><filename class="headerfile">numeric</filename></entry>
    <entry><function>__gnu_parallel::accumulate</function></entry>
    <entry><filename class="headerfile">parallel/numeric</filename></entry>
  </row>
  <row>
    <entry><function>std::adjacent_difference</function></entry>
    <entry><filename class="headerfile">numeric</filename></entry>
    <entry><function>__gnu_parallel::adjacent_difference</function></entry>
    <entry><filename class="headerfile">parallel/numeric</filename></entry>
  </row>
  <row>
    <entry><function>std::inner_product</function></entry>
    <entry><filename class="headerfile">numeric</filename></entry>
    <entry><function>__gnu_parallel::inner_product</function></entry>
    <entry><filename class="headerfile">parallel/numeric</filename></entry>
  </row>
  <row>
    <entry><function>std::partial_sum</function></entry>
    <entry><filename class="headerfile">numeric</filename></entry>
    <entry><function>__gnu_parallel::partial_sum</function></entry>
    <entry><filename class="headerfile">parallel/numeric</filename></entry>
  </row>
  <row>
    <entry><function>std::adjacent_find</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::adjacent_find</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::count</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::count</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::count_if</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::count_if</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::equal</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::equal</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::find</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::find</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::find_if</function></entry>
    <entry><filename
class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::find_if</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::find_first_of</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::find_first_of</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::for_each</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::for_each</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::generate</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::generate</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::generate_n</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::generate_n</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::lexicographical_compare</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::lexicographical_compare</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::mismatch</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::mismatch</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::search</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::search</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::search_n</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::search_n</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::transform</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::transform</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::replace</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::replace</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::replace_if</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::replace_if</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::max_element</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::max_element</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::merge</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::merge</function></entry>
    <entry><filename
class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::min_element</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::min_element</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::nth_element</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::nth_element</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::partial_sort</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::partial_sort</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::partition</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::partition</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::random_shuffle</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::random_shuffle</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::set_union</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::set_union</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::set_intersection</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::set_intersection</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::set_symmetric_difference</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::set_symmetric_difference</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::set_difference</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::set_difference</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::sort</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::sort</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::stable_sort</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::stable_sort</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::unique_copy</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::unique_copy</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>
</tbody>
</tgroup>
</table>

</section>

</section>

<section xml:id="manual.ext.parallel_mode.design" xreflabel="Design"><info><title>Design</title></info>
<?dbhtml filename="parallel_mode_design.html"?>

  <para>
  </para>
<section xml:id="parallel_mode.design.intro"
xreflabel="Intro"><info><title>Interface Basics</title></info>


<para>
All parallel algorithms are intended to have signatures that are
equivalent to the ISO C++ algorithms replaced. For instance, the
<function>std::adjacent_find</function> function is declared as:
</para>
<programlisting>
namespace std
{
  template&lt;typename _FIter&gt;
    _FIter
    adjacent_find(_FIter, _FIter);
}
</programlisting>

<para>
This means that there should be something equivalent for the parallel
version. Indeed, this is the case:
</para>

<programlisting>
namespace std
{
  namespace __parallel
  {
    template&lt;typename _FIter&gt;
      _FIter
      adjacent_find(_FIter, _FIter);

    ...
  }
}
</programlisting>

<para>But why the ellipsis?
</para>

<para> The ellipsis in the example above represents additional overloads
required for the parallel version of the function. These additional
overloads are used to dispatch calls from the ISO C++ function
signature to the appropriate parallel function (or sequential
function, if no parallel function is deemed worthwhile), based on either
compile-time or run-time conditions.
</para>

<para> The available signature options are specific to the different
algorithms/algorithm classes.</para>

<para> The general view of overloads for the parallel algorithms looks like this:
</para>
<itemizedlist>
   <listitem><para>ISO C++ signature</para></listitem>
   <listitem><para>ISO C++ signature + sequential_tag argument</para></listitem>
   <listitem><para>ISO C++ signature + algorithm-specific tag type
    (several signatures)</para></listitem>
</itemizedlist>

<para> Please note that the implementation may use additional functions
(designated with the <code>_switch</code> suffix) to dispatch from the
ISO C++ signature to the correct parallel version.
Also, some of the
algorithms do not have support for run-time conditions, so the last
overload is missing for them.
</para>


</section>

<section xml:id="parallel_mode.design.tuning" xreflabel="Tuning"><info><title>Configuration and Tuning</title></info>



<section xml:id="parallel_mode.design.tuning.omp" xreflabel="OpenMP Environment"><info><title>Setting up the OpenMP Environment</title></info>


<para>
Several aspects of the overall runtime environment can be manipulated
by standard OpenMP function calls.
</para>

<para>
To specify the number of threads to be used for the algorithms globally,
use the function <function>omp_set_num_threads</function>. An example:
</para>

<programlisting>
#include &lt;stdlib.h&gt;
#include &lt;omp.h&gt;

int main()
{
  // Explicitly set number of threads.
  const int threads_wanted = 20;
  omp_set_dynamic(false);
  omp_set_num_threads(threads_wanted);

  // Call parallel mode algorithms.

  return 0;
}
</programlisting>

<para>
 Some algorithms allow the number of threads to be set for a particular call,
 by augmenting the algorithm variant.
 See the next section for further information.
</para>

<para>
Other parts of the runtime environment that can be manipulated include
nested parallelism (<function>omp_set_nested</function>), schedule kind
(<function>omp_set_schedule</function>), and others. See the OpenMP
documentation for more information.
</para>

</section>

<section xml:id="parallel_mode.design.tuning.compile" xreflabel="Compile Switches"><info><title>Compile Time Switches</title></info>


<para>
To force an algorithm to execute sequentially, even though parallelism
is switched on in general via the macro <constant>_GLIBCXX_PARALLEL</constant>,
add <classname>__gnu_parallel::sequential_tag()</classname> to the end
of the algorithm's argument list.
</para>

<para>
Like so:
</para>

<programlisting>
std::sort(v.begin(), v.end(), __gnu_parallel::sequential_tag());
</programlisting>

<para>
Some parallel algorithm variants can be excluded from compilation by
preprocessor defines. See the doxygen documentation on
<code>compiletime_settings.h</code> and <code>features.h</code> for details.
</para>

<para>
For some algorithms, the desired variant can be chosen at compile-time by
appending a tag object. The available options are specific to the particular
algorithm (class).
</para>

<para>
For the "embarrassingly parallel" algorithms, there is only one "tag object
type", the enum <code>_Parallelism</code>.
It takes one of the following values:
<code>__gnu_parallel::parallel_tag</code>,
<code>__gnu_parallel::balanced_tag</code>,
<code>__gnu_parallel::unbalanced_tag</code>,
<code>__gnu_parallel::omp_loop_tag</code>,
<code>__gnu_parallel::omp_loop_static_tag</code>.
This means that the actual parallelization strategy is chosen at run-time.
(Choosing the variants at compile-time will come soon.)
</para>

<para>
For the following algorithms in general, we have
<code>__gnu_parallel::parallel_tag</code> and
<code>__gnu_parallel::default_parallel_tag</code>, in addition to
<code>__gnu_parallel::sequential_tag</code>.
<code>__gnu_parallel::default_parallel_tag</code> chooses the default
algorithm at compile-time, as does omitting the tag.
<code>__gnu_parallel::parallel_tag</code> postpones the decision to run-time
(see next section).
For all tags, the number of threads desired for this call can optionally be
passed to the respective tag's constructor.
</para>

<para>
The <code>multiway_merge</code> algorithm comes with two additional choices,
<code>__gnu_parallel::exact_tag</code> and
<code>__gnu_parallel::sampling_tag</code>.
Exact and sampling are the two available splitting strategies.
</para>

<para>
For the <code>sort</code> and <code>stable_sort</code> algorithms, there are
several additional choices, namely
<code>__gnu_parallel::multiway_mergesort_tag</code>,
<code>__gnu_parallel::multiway_mergesort_exact_tag</code>,
<code>__gnu_parallel::multiway_mergesort_sampling_tag</code>,
<code>__gnu_parallel::quicksort_tag</code>, and
<code>__gnu_parallel::balanced_quicksort_tag</code>.
Multiway mergesort comes with the two splitting strategies for multi-way
merging. The quicksort options cannot be used for <code>stable_sort</code>.
</para>

</section>

<section xml:id="parallel_mode.design.tuning.settings" xreflabel="_Settings"><info><title>Run Time Settings and Defaults</title></info>


<para>
The default parallelization strategy, the choice of specific algorithm
strategy, the minimum threshold limits for individual parallel
algorithms, and aspects of the underlying hardware can be specified as
desired via manipulation
of <classname>__gnu_parallel::_Settings</classname> member data.
</para>

<para>
First off, the choice of parallelization strategy: serial, parallel,
or heuristically deduced. This corresponds
to <code>__gnu_parallel::_Settings::algorithm_strategy</code> and is a
value of enum <type>__gnu_parallel::_AlgorithmStrategy</type>
type. Choices
include: <type>heuristic</type>, <type>force_sequential</type>,
and <type>force_parallel</type>. The default is <type>heuristic</type>.
</para>


<para>
Next, the sub-choices for algorithm variant, if not fixed at compile-time.
Specific algorithms like <function>find</function> or <function>sort</function>
can be implemented in multiple ways: when this is the case,
a <classname>__gnu_parallel::_Settings</classname> member exists to
pick the default strategy.
For
example, <code>__gnu_parallel::_Settings::sort_algorithm</code> can
have any value of
enum <type>__gnu_parallel::_SortAlgorithm</type>: <type>MWMS</type>, <type>QS</type>,
or <type>QS_BALANCED</type>.
</para>

<para>
The same applies to the minimal threshold for algorithm
parallelization. Parallelism always incurs some overhead. Thus, it is
not helpful to parallelize operations on very small sets of
data. Because of this, measures are taken to avoid parallelizing below
a certain, pre-determined threshold. For each algorithm, a minimum
problem size is encoded as a variable in the
active <classname>__gnu_parallel::_Settings</classname> object. This
threshold variable uses the following naming scheme:
<code>__gnu_parallel::_Settings::[algorithm]_minimal_n</code>. So,
for <function>fill</function>, the threshold variable
is <code>__gnu_parallel::_Settings::fill_minimal_n</code>.
</para>

<para>
Finally, hardware details like L1/L2 cache size can be hardwired
via <code>__gnu_parallel::_Settings::L1_cache_size</code> and friends.
</para>

<para>
</para>

<para>
All these configuration variables can be changed by the user, if
desired.
There exists one global instance of the class <classname>_Settings</classname>,
i.e. it is a singleton. It can be read and written by calling
<code>__gnu_parallel::_Settings::get</code> and
<code>__gnu_parallel::_Settings::set</code>, respectively.
Please note that the first call returns a const object, so direct manipulation
is forbidden.
See <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://gcc.gnu.org/onlinedocs/libstdc++/latest-doxygen/a01005.html">
  <filename class="headerfile">settings.h</filename></link>
for complete details.
</para>

<para>
A small example of tuning the default:
</para>

<programlisting>
#include &lt;parallel/algorithm&gt;
#include &lt;parallel/settings.h&gt;

int main()
{
  __gnu_parallel::_Settings s;
  s.algorithm_strategy = __gnu_parallel::force_parallel;
  __gnu_parallel::_Settings::set(s);

  // Do work... all algorithms will be parallelized, always.

  return 0;
}
</programlisting>

</section>

</section>

<section xml:id="parallel_mode.design.impl" xreflabel="Impl"><info><title>Implementation Namespaces</title></info>


<para> One namespace contains versions of code that are always
explicitly sequential:
<code>__gnu_serial</code>.
</para>

<para> Two namespaces contain the parallel mode:
<code>std::__parallel</code> and <code>__gnu_parallel</code>.
</para>

<para> Parallel implementations of standard components, including
template helpers to select parallelism, are defined in <code>namespace
std::__parallel</code>. For instance, <function>std::transform</function> from <filename class="headerfile">algorithm</filename> has a parallel counterpart in
<function>std::__parallel::transform</function> from <filename class="headerfile">parallel/algorithm</filename>. In addition, these parallel
implementations are injected into <code>namespace
__gnu_parallel</code> with using declarations.
</para>

<para> Support and general infrastructure are in <code>namespace
__gnu_parallel</code>.
</para>

<para> More information, and an organized index of types and functions
related to the parallel mode on a per-namespace basis, can be found in
the generated source documentation.
</para>

</section>

</section>

<section xml:id="manual.ext.parallel_mode.test" xreflabel="Testing"><info><title>Testing</title></info>
<?dbhtml filename="parallel_mode_test.html"?>


  <para>
    Both the normal conformance and regression tests and the
    supplemental performance tests work.
  </para>

  <para>
    To run the conformance and regression tests with the parallel mode
    active,
  </para>

  <screen>
  <userinput>make check-parallel</userinput>
  </screen>

  <para>
    The log and summary files for conformance testing are in the
    <filename class="directory">testsuite/parallel</filename> directory.
  </para>

  <para>
    To run the performance tests with the parallel mode active,
  </para>

  <screen>
  <userinput>make check-performance-parallel</userinput>
  </screen>

  <para>
    The results of performance testing are written to the
    <filename class="directory">testsuite</filename> directory, in the file
    <filename>libstdc++_performance.sum</filename>. In addition, the
    policy-based containers have their own visualizations, which have
    more software dependencies than the usual bare-bones text
    file, and can be generated by using the <code>make
    doc-performance</code> rule in the testsuite's Makefile.
</para>
</section>

<bibliography xml:id="parallel_mode.biblio"><info><title>Bibliography</title></info>


  <biblioentry>
    <citetitle>
      Parallelization of Bulk Operations for STL Dictionaries
    </citetitle>

    <author><personname><firstname>Johannes</firstname><surname>Singler</surname></personname></author>
    <author><personname><firstname>Leonor</firstname><surname>Frias</surname></personname></author>

    <copyright>
      <year>2007</year>
      <holder/>
    </copyright>

    <publisher>
      <publishername>
	Workshop on Highly Parallel Processing on a Chip (HPPC) 2007.
	(LNCS)
      </publishername>
    </publisher>
  </biblioentry>

  <biblioentry>
    <citetitle>
      The Multi-Core Standard Template Library
    </citetitle>

    <author><personname><firstname>Johannes</firstname><surname>Singler</surname></personname></author>
    <author><personname><firstname>Peter</firstname><surname>Sanders</surname></personname></author>
    <author><personname><firstname>Felix</firstname><surname>Putze</surname></personname></author>

    <copyright>
      <year>2007</year>
      <holder/>
    </copyright>

    <publisher>
      <publishername>
	 Euro-Par 2007: Parallel Processing. (LNCS 4641)
      </publishername>
    </publisher>
  </biblioentry>

</bibliography>

</chapter>