parallel_mode.xml revision 1.1.1.5
<chapter xmlns="http://docbook.org/ns/docbook" version="5.0"
         xml:id="manual.ext.parallel_mode" xreflabel="Parallel Mode">
<?dbhtml filename="parallel_mode.html"?>

<info><title>Parallel Mode</title>
  <keywordset>
    <keyword>C++</keyword>
    <keyword>library</keyword>
    <keyword>parallel</keyword>
  </keywordset>
</info>

<para> The libstdc++ parallel mode is an experimental parallel
implementation of many algorithms of the C++ Standard Library.
</para>

<para>
Several of the standard algorithms, for instance
<function>std::sort</function>, are made parallel using OpenMP
annotations. These parallel mode constructs can be invoked either
explicitly in the source code or by compiling existing sources with a
specific compiler flag.
</para>

<note>
  <para>
    The parallel mode has not been kept up to date with recent C++ standards,
    so it conforms only to the C++03 requirements.
    This means that move-only predicates may not work with parallel mode
    algorithms, and that for C++20 most of the algorithms cannot be used in
    <code>constexpr</code> functions.
  </para>
  <para>
    Since C++17, the standard library provides overloads of the standard
    algorithms that take an execution policy argument. You should consider
    using those instead of the non-standard parallel mode extensions.
  </para>
</note>

<section xml:id="manual.ext.parallel_mode.intro" xreflabel="Intro"><info><title>Intro</title></info>

<para>The following library components in the header
<filename class="headerfile">numeric</filename> are included in the parallel mode:</para>
<itemizedlist>
  <listitem><para><function>std::accumulate</function></para></listitem>
  <listitem><para><function>std::adjacent_difference</function></para></listitem>
  <listitem><para><function>std::inner_product</function></para></listitem>
  <listitem><para><function>std::partial_sum</function></para></listitem>
</itemizedlist>

<para>The following library components in the header
<filename class="headerfile">algorithm</filename> are included in the parallel mode:</para>
<itemizedlist>
  <listitem><para><function>std::adjacent_find</function></para></listitem>
  <listitem><para><function>std::count</function></para></listitem>
  <listitem><para><function>std::count_if</function></para></listitem>
  <listitem><para><function>std::equal</function></para></listitem>
  <listitem><para><function>std::find</function></para></listitem>
  <listitem><para><function>std::find_if</function></para></listitem>
  <listitem><para><function>std::find_first_of</function></para></listitem>
  <listitem><para><function>std::for_each</function></para></listitem>
  <listitem><para><function>std::generate</function></para></listitem>
  <listitem><para><function>std::generate_n</function></para></listitem>
  <listitem><para><function>std::lexicographical_compare</function></para></listitem>
  <listitem><para><function>std::mismatch</function></para></listitem>
  <listitem><para><function>std::search</function></para></listitem>
  <listitem><para><function>std::search_n</function></para></listitem>
  <listitem><para><function>std::transform</function></para></listitem>
  <listitem><para><function>std::replace</function></para></listitem>
  <listitem><para><function>std::replace_if</function></para></listitem>
  <listitem><para><function>std::max_element</function></para></listitem>
  <listitem><para><function>std::merge</function></para></listitem>
  <listitem><para><function>std::min_element</function></para></listitem>
  <listitem><para><function>std::nth_element</function></para></listitem>
  <listitem><para><function>std::partial_sort</function></para></listitem>
  <listitem><para><function>std::partition</function></para></listitem>
  <listitem><para><function>std::random_shuffle</function></para></listitem>
  <listitem><para><function>std::set_union</function></para></listitem>
  <listitem><para><function>std::set_intersection</function></para></listitem>
  <listitem><para><function>std::set_symmetric_difference</function></para></listitem>
  <listitem><para><function>std::set_difference</function></para></listitem>
  <listitem><para><function>std::sort</function></para></listitem>
  <listitem><para><function>std::stable_sort</function></para></listitem>
  <listitem><para><function>std::unique_copy</function></para></listitem>
</itemizedlist>

</section>

<section xml:id="manual.ext.parallel_mode.semantics" xreflabel="Semantics"><info><title>Semantics</title></info>
<?dbhtml filename="parallel_mode_semantics.html"?>

<para> The parallel mode STL algorithms are currently not exception-safe,
i.e. user-defined functors must not throw exceptions.
Also, the order of execution is not guaranteed for some functions.
Therefore, user-defined functors should not have any concurrent side effects.
</para>

<para> Since the current GCC OpenMP implementation does not support
OpenMP parallel regions in concurrent threads,
it is not possible to call parallel STL algorithms in
concurrent threads, either.
It might work with other compilers, though.</para>

</section>

<section xml:id="manual.ext.parallel_mode.using" xreflabel="Using"><info><title>Using</title></info>
<?dbhtml filename="parallel_mode_using.html"?>

<section xml:id="parallel_mode.using.prereq_flags"><info><title>Prerequisite Compiler Flags</title></info>

<para>
  Any use of parallel functionality requires additional compiler
  and runtime support, in particular support for OpenMP. Adding this support is
  not difficult: just compile your application with the compiler
  flag <literal>-fopenmp</literal>. This will link
  in <code>libgomp</code>, the
  <link xmlns:xlink="http://www.w3.org/1999/xlink"
  xlink:href="http://gcc.gnu.org/onlinedocs/libgomp/">GNU Offloading and
  Multi Processing Runtime Library</link>,
  whose presence is mandatory.
</para>

<para>
  In addition, hardware that supports atomic operations and a compiler
  capable of producing atomic operations are mandatory: GCC defaults to no
  support for atomic operations on some common hardware
  architectures. Activating atomic operations may require explicit
  compiler flags on some targets (like sparc and x86), such
  as <literal>-march=i686</literal>,
  <literal>-march=native</literal> or <literal>-mcpu=v9</literal>. See
  the GCC manual for more information.
</para>

</section>

<section xml:id="parallel_mode.using.parallel_mode"><info><title>Using Parallel Mode</title></info>

<para>
  To use the libstdc++ parallel mode, compile your application with
  the prerequisite flags as detailed above, and in addition
  add <constant>-D_GLIBCXX_PARALLEL</constant>. This converts all
  uses of the standard (sequential) algorithms to the appropriate parallel
  equivalents.
  Please note that this does not necessarily mean that
  everything will end up being executed in parallel, but
  rather that the heuristics and settings coded into the parallel
  versions will be used to determine whether all, some, or none of the
  algorithms are executed using parallel variants.
</para>

<para>Note that the <constant>_GLIBCXX_PARALLEL</constant> define may change the
  sizes and behavior of standard class templates such as
  <function>std::search</function>, and therefore one can only link code
  compiled with parallel mode and code compiled without parallel mode
  if no instantiation of a container is passed between the two
  translation units. Parallel mode functionality has distinct linkage,
  and cannot be confused with normal mode symbols.
</para>
</section>

<section xml:id="parallel_mode.using.specific"><info><title>Using Specific Parallel Components</title></info>

<para>When it is not feasible to recompile your entire application, or
  only specific algorithms need to be parallel-aware, individual
  parallel algorithms can be made available explicitly. These
  parallel algorithms are functionally equivalent to the standard
  drop-in algorithms used in parallel mode, but they are available in
  a separate namespace as GNU extensions and may be used in programs
  compiled either in release mode or in parallel mode.
</para>

<para>An example of using a parallel version
of <function>std::sort</function>, but no other parallel algorithms, is:
</para>

<programlisting>
#include &lt;vector&gt;
#include &lt;parallel/algorithm&gt;

int main()
{
  std::vector&lt;int&gt; v(100);

  // ...

  // Explicitly force a call to parallel sort.
  __gnu_parallel::sort(v.begin(), v.end());
  return 0;
}
</programlisting>

<para>
Then compile this code with the prerequisite compiler flags
(<literal>-fopenmp</literal> and any necessary architecture-specific
flags for atomic operations).
</para>

<para> The following table provides the names and headers of all the
  parallel algorithms that can be used in a similar manner:
</para>

<table frame="all" xml:id="table.parallel_algos">
<title>Parallel Algorithms</title>

<tgroup cols="4" align="left" colsep="1" rowsep="1">
<colspec colname="c1"/>
<colspec colname="c2"/>
<colspec colname="c3"/>
<colspec colname="c4"/>

<thead>
  <row>
    <entry>Algorithm</entry>
    <entry>Header</entry>
    <entry>Parallel algorithm</entry>
    <entry>Parallel header</entry>
  </row>
</thead>

<tbody>
  <row>
    <entry><function>std::accumulate</function></entry>
    <entry><filename class="headerfile">numeric</filename></entry>
    <entry><function>__gnu_parallel::accumulate</function></entry>
    <entry><filename class="headerfile">parallel/numeric</filename></entry>
  </row>
  <row>
    <entry><function>std::adjacent_difference</function></entry>
    <entry><filename class="headerfile">numeric</filename></entry>
    <entry><function>__gnu_parallel::adjacent_difference</function></entry>
    <entry><filename class="headerfile">parallel/numeric</filename></entry>
  </row>
  <row>
    <entry><function>std::inner_product</function></entry>
    <entry><filename class="headerfile">numeric</filename></entry>
    <entry><function>__gnu_parallel::inner_product</function></entry>
    <entry><filename class="headerfile">parallel/numeric</filename></entry>
  </row>
  <row>
    <entry><function>std::partial_sum</function></entry>
    <entry><filename class="headerfile">numeric</filename></entry>
    <entry><function>__gnu_parallel::partial_sum</function></entry>
    <entry><filename
class="headerfile">parallel/numeric</filename></entry>
  </row>
  <row>
    <entry><function>std::adjacent_find</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::adjacent_find</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>
  <row>
    <entry><function>std::count</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::count</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>
  <row>
    <entry><function>std::count_if</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::count_if</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>
  <row>
    <entry><function>std::equal</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::equal</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>
  <row>
    <entry><function>std::find</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::find</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>
  <row>
    <entry><function>std::find_if</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::find_if</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>
  <row>
    <entry><function>std::find_first_of</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::find_first_of</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>
  <row>
    <entry><function>std::for_each</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::for_each</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>
  <row>
    <entry><function>std::generate</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::generate</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>
  <row>
    <entry><function>std::generate_n</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::generate_n</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>
  <row>
    <entry><function>std::lexicographical_compare</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::lexicographical_compare</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>
  <row>
    <entry><function>std::mismatch</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::mismatch</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>
  <row>
    <entry><function>std::search</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::search</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>
  <row>
    <entry><function>std::search_n</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::search_n</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>
  <row>
    <entry><function>std::transform</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::transform</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>
  <row>
    <entry><function>std::replace</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::replace</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>
  <row>
    <entry><function>std::replace_if</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::replace_if</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>
  <row>
    <entry><function>std::max_element</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::max_element</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>
  <row>
    <entry><function>std::merge</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::merge</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>
  <row>
    <entry><function>std::min_element</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::min_element</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>
  <row>
    <entry><function>std::nth_element</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::nth_element</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>
  <row>
    <entry><function>std::partial_sort</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::partial_sort</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>
  <row>
    <entry><function>std::partition</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::partition</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>
  <row>
    <entry><function>std::random_shuffle</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::random_shuffle</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>
  <row>
    <entry><function>std::set_union</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::set_union</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>
  <row>
    <entry><function>std::set_intersection</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::set_intersection</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>
  <row>
    <entry><function>std::set_symmetric_difference</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::set_symmetric_difference</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>
  <row>
    <entry><function>std::set_difference</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::set_difference</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>
  <row>
    <entry><function>std::sort</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::sort</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>
  <row>
    <entry><function>std::stable_sort</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::stable_sort</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>
  <row>
    <entry><function>std::unique_copy</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::unique_copy</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>
</tbody>
</tgroup>
</table>

</section>

</section>

<section xml:id="manual.ext.parallel_mode.design" xreflabel="Design"><info><title>Design</title></info>
<?dbhtml filename="parallel_mode_design.html"?>

<section xml:id="parallel_mode.design.intro" xreflabel="Intro"><info><title>Interface Basics</title></info>

<para>
All parallel algorithms are intended to have signatures that are
equivalent to those of the ISO C++ algorithms they replace.
For instance, the
<function>std::adjacent_find</function> function is declared as:
</para>
<programlisting>
namespace std
{
  template&lt;typename _FIter&gt;
    _FIter
    adjacent_find(_FIter, _FIter);
}
</programlisting>

<para>
This means that there should be something equivalent for the parallel
version. Indeed, this is the case:
</para>

<programlisting>
namespace std
{
  namespace __parallel
  {
    template&lt;typename _FIter&gt;
      _FIter
      adjacent_find(_FIter, _FIter);

    ...
  }
}
</programlisting>

<para>But why the ellipses?
</para>

<para> The ellipses in the example above represent additional overloads
required for the parallel version of the function. These additional
overloads are used to dispatch calls from the ISO C++ function
signature to the appropriate parallel function (or sequential
function, if no parallel functions are deemed worthy), based on either
compile-time or run-time conditions.
</para>

<para> The available signature options are specific to the different
algorithms/algorithm classes.</para>

<para> In general, the overloads for the parallel algorithms look like this:
</para>
<itemizedlist>
  <listitem><para>ISO C++ signature</para></listitem>
  <listitem><para>ISO C++ signature + sequential_tag argument</para></listitem>
  <listitem><para>ISO C++ signature + algorithm-specific tag type
    (several signatures)</para></listitem>
</itemizedlist>

<para> Please note that the implementation may use additional functions
(designated with the <code>_switch</code> suffix) to dispatch from the
ISO C++ signature to the correct parallel version. Also, some of the
algorithms do not support run-time conditions, so the last overload
is missing for them.
</para>

</section>

<section xml:id="parallel_mode.design.tuning" xreflabel="Tuning"><info><title>Configuration and Tuning</title></info>

<section xml:id="parallel_mode.design.tuning.omp" xreflabel="OpenMP Environment"><info><title>Setting up the OpenMP Environment</title></info>

<para>
Several aspects of the overall runtime environment can be manipulated
by standard OpenMP function calls.
</para>

<para>
To specify the number of threads to be used for the algorithms globally,
use the function <function>omp_set_num_threads</function>. An example:
</para>

<programlisting>
#include &lt;omp.h&gt;

int main()
{
  // Explicitly set the number of threads.
  const int threads_wanted = 20;
  omp_set_dynamic(false);  // do not let the runtime adjust the thread count
  omp_set_num_threads(threads_wanted);

  // Call parallel mode algorithms.

  return 0;
}
</programlisting>

<para>
  Some algorithms allow the number of threads to be set for an individual call,
  by passing it to the constructor of the tag that selects the algorithm variant.
  See the next section for further information.
</para>

<para>
Other parts of the runtime environment that can be manipulated include
nested parallelism (<function>omp_set_nested</function>), schedule kind
(<function>omp_set_schedule</function>), and others. See the OpenMP
documentation for more information.
</para>

</section>

<section xml:id="parallel_mode.design.tuning.compile" xreflabel="Compile Switches"><info><title>Compile Time Switches</title></info>

<para>
To force an algorithm to execute sequentially, even though parallelism
is switched on in general via the macro <constant>_GLIBCXX_PARALLEL</constant>,
add <classname>__gnu_parallel::sequential_tag()</classname> to the end
of the algorithm's argument list.
</para>

<para>
Like so:
</para>

<programlisting>
std::sort(v.begin(), v.end(), __gnu_parallel::sequential_tag());
</programlisting>

<para>
Some parallel algorithm variants can be excluded from compilation by
preprocessor defines. See the doxygen documentation on
<code>compiletime_settings.h</code> and <code>features.h</code> for details.
</para>

<para>
For some algorithms, the desired variant can be chosen at compile time by
appending a tag object. The available options are specific to the particular
algorithm (class).
</para>

<para>
For the "embarrassingly parallel" algorithms, there is only one "tag object
type", the enum <code>_Parallelism</code>.
It takes one of the following values:
<code>__gnu_parallel::parallel_tag</code>,
<code>__gnu_parallel::balanced_tag</code>,
<code>__gnu_parallel::unbalanced_tag</code>,
<code>__gnu_parallel::omp_loop_tag</code>,
<code>__gnu_parallel::omp_loop_static_tag</code>.
This means that the actual parallelization strategy is chosen at run time.
(Choosing the variants at compile time will come soon.)
</para>

<para>
For the following algorithms in general, we have
<code>__gnu_parallel::parallel_tag</code> and
<code>__gnu_parallel::default_parallel_tag</code>, in addition to
<code>__gnu_parallel::sequential_tag</code>.
<code>__gnu_parallel::default_parallel_tag</code> chooses the default
algorithm at compile time, as does omitting the tag.
<code>__gnu_parallel::parallel_tag</code> postpones the decision to run time
(see the next section).
For all tags, the number of threads desired for this call can optionally be
passed to the respective tag's constructor.
</para>

<para>
The <code>multiway_merge</code> algorithm comes with two additional choices,
<code>__gnu_parallel::exact_tag</code> and
<code>__gnu_parallel::sampling_tag</code>.
Exact and sampling are the two available splitting strategies.
</para>

<para>
For the <code>sort</code> and <code>stable_sort</code> algorithms, there are
several additional choices, namely
<code>__gnu_parallel::multiway_mergesort_tag</code>,
<code>__gnu_parallel::multiway_mergesort_exact_tag</code>,
<code>__gnu_parallel::multiway_mergesort_sampling_tag</code>,
<code>__gnu_parallel::quicksort_tag</code>, and
<code>__gnu_parallel::balanced_quicksort_tag</code>.
Multiway mergesort comes with the two splitting strategies for multi-way
merging. The quicksort options cannot be used for <code>stable_sort</code>.
</para>

</section>

<section xml:id="parallel_mode.design.tuning.settings" xreflabel="_Settings"><info><title>Run Time Settings and Defaults</title></info>

<para>
The default parallelization strategy, the choice of specific algorithm
strategy, the minimum threshold limits for individual parallel
algorithms, and aspects of the underlying hardware can be specified as
desired via manipulation
of <classname>__gnu_parallel::_Settings</classname> member data.
</para>

<para>
First off, the choice of parallelization strategy: serial, parallel,
or heuristically deduced. This corresponds
to <code>__gnu_parallel::_Settings::algorithm_strategy</code> and is a
value of enum <type>__gnu_parallel::_AlgorithmStrategy</type>
type. Choices
include: <type>heuristic</type>, <type>force_sequential</type>,
and <type>force_parallel</type>. The default is <type>heuristic</type>.
</para>

<para>
Next, the sub-choices for algorithm variant, if not fixed at compile time.
Specific algorithms like <function>find</function> or <function>sort</function>
can be implemented in multiple ways: when this is the case,
a <classname>__gnu_parallel::_Settings</classname> member exists to
pick the default strategy.
For example, <code>__gnu_parallel::_Settings::sort_algorithm</code> can
take any value of
enum <type>__gnu_parallel::_SortAlgorithm</type>: <type>MWMS</type>, <type>QS</type>,
or <type>QS_BALANCED</type>.
</para>

<para>
Likewise for setting the minimal threshold for algorithm
parallelization. Parallelism always incurs some overhead. Thus, it is
not helpful to parallelize operations on very small sets of
data. Because of this, measures are taken to avoid parallelizing below
a certain, pre-determined threshold. For each algorithm, a minimum
problem size is encoded as a variable in the
active <classname>__gnu_parallel::_Settings</classname> object. These
threshold variables follow the naming scheme
<code>__gnu_parallel::_Settings::[algorithm]_minimal_n</code>. So,
for <function>fill</function>, the threshold variable
is <code>__gnu_parallel::_Settings::fill_minimal_n</code>.
</para>

<para>
Finally, hardware details like L1/L2 cache size can be hardwired
via <code>__gnu_parallel::_Settings::L1_cache_size</code> and friends.
</para>

<para>
All these configuration variables can be changed by the user, if
desired.
There exists one global instance of the class <classname>_Settings</classname>,
i.e. it is a singleton. It can be read and written by calling
<code>__gnu_parallel::_Settings::get</code> and
<code>__gnu_parallel::_Settings::set</code>, respectively.
Please note that <code>get</code> returns a const object, so direct manipulation
is forbidden: copy the settings, modify the copy, and pass it to <code>set</code>.
See <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://gcc.gnu.org/onlinedocs/libstdc++/latest-doxygen/index.html">
  <filename class="headerfile">&lt;parallel/settings.h&gt;</filename></link>
for complete details.
</para>

<para>
A small example of tuning the default:
</para>

<programlisting>
#include &lt;parallel/algorithm&gt;
#include &lt;parallel/settings.h&gt;

int main()
{
  // Start from the current settings; get() returns a const reference.
  __gnu_parallel::_Settings s = __gnu_parallel::_Settings::get();
  s.algorithm_strategy = __gnu_parallel::force_parallel;
  __gnu_parallel::_Settings::set(s);

  // Do work... all algorithms will be parallelized, always.

  return 0;
}
</programlisting>

</section>

</section>

<section xml:id="parallel_mode.design.impl" xreflabel="Impl"><info><title>Implementation Namespaces</title></info>

<para> One namespace contains versions of code that are always
explicitly sequential:
<code>__gnu_serial</code>.
</para>

<para> Two namespaces contain the parallel mode:
<code>std::__parallel</code> and <code>__gnu_parallel</code>.
</para>

<para> Parallel implementations of standard components, including
template helpers to select parallelism, are defined in <code>namespace
std::__parallel</code>. For instance, <function>std::transform</function> from <filename class="headerfile">algorithm</filename> has a parallel counterpart in
<function>std::__parallel::transform</function> from <filename class="headerfile">parallel/algorithm</filename>. In addition, these parallel
implementations are injected into <code>namespace
__gnu_parallel</code> with using declarations.
</para>

<para> Support and general infrastructure are in <code>namespace
__gnu_parallel</code>.
</para>

<para> More information, and an organized index of types and functions
related to the parallel mode on a per-namespace basis, can be found in
the generated source documentation.
</para>

</section>

</section>

<section xml:id="manual.ext.parallel_mode.test" xreflabel="Testing"><info><title>Testing</title></info>
<?dbhtml filename="parallel_mode_test.html"?>

  <para>
    Both the normal conformance and regression tests and the
    supplemental performance tests work with the parallel mode.
  </para>

  <para>
    To run the conformance and regression tests with the parallel mode
    active, use:
  </para>

  <screen>
  <userinput>make check-parallel</userinput>
  </screen>

  <para>
    The log and summary files for conformance testing are in the
    <filename class="directory">testsuite/parallel</filename> directory.
  </para>

  <para>
    To run the performance tests with the parallel mode active, use:
  </para>

  <screen>
  <userinput>make check-performance-parallel</userinput>
  </screen>

  <para>
    The result file for performance testing is in the
    <filename class="directory">testsuite</filename> directory, in the file
    <filename>libstdc++_performance.sum</filename>. In addition, the
    policy-based containers have their own visualizations, which have
    more software dependencies than the usual bare-bones text
    file, and can be generated by using the <code>make
    doc-performance</code> rule in the testsuite's Makefile.
</para>
</section>

<bibliography xml:id="parallel_mode.biblio"><info><title>Bibliography</title></info>

  <biblioentry>
    <citetitle>
      Parallelization of Bulk Operations for STL Dictionaries
    </citetitle>

    <author><personname><firstname>Johannes</firstname><surname>Singler</surname></personname></author>
    <author><personname><firstname>Leonor</firstname><surname>Frias</surname></personname></author>

    <copyright>
      <year>2007</year>
      <holder/>
    </copyright>

    <publisher>
      <publishername>
        Workshop on Highly Parallel Processing on a Chip (HPPC) 2007.
        (LNCS)
      </publishername>
    </publisher>
  </biblioentry>

  <biblioentry>
    <citetitle>
      The Multi-Core Standard Template Library
    </citetitle>

    <author><personname><firstname>Johannes</firstname><surname>Singler</surname></personname></author>
    <author><personname><firstname>Peter</firstname><surname>Sanders</surname></personname></author>
    <author><personname><firstname>Felix</firstname><surname>Putze</surname></personname></author>

    <copyright>
      <year>2007</year>
      <holder/>
    </copyright>

    <publisher>
      <publishername>
        Euro-Par 2007: Parallel Processing. (LNCS 4641)
      </publishername>
    </publisher>
  </biblioentry>

</bibliography>

</chapter>