parallel_mode.xml revision 1.1.1.2
<chapter xmlns="http://docbook.org/ns/docbook" version="5.0"
         xml:id="manual.ext.parallel_mode" xreflabel="Parallel Mode">
<?dbhtml filename="parallel_mode.html"?>

<info><title>Parallel Mode</title>
  <keywordset>
    <keyword>C++</keyword>
    <keyword>library</keyword>
    <keyword>parallel</keyword>
  </keywordset>
</info>

<para> The libstdc++ parallel mode is an experimental parallel
implementation of many algorithms of the C++ Standard Library.
</para>

<para>
Several of the standard algorithms, for instance
<function>std::sort</function>, are made parallel using OpenMP
annotations. These parallel mode constructs can be invoked by
explicit source declaration or by compiling existing sources with a
specific compiler flag.
</para>

<section xml:id="manual.ext.parallel_mode.intro" xreflabel="Intro"><info><title>Intro</title></info>

<para>The following library components in the header
<filename class="headerfile">numeric</filename> are included in the parallel mode:</para>
<itemizedlist>
  <listitem><para><function>std::accumulate</function></para></listitem>
  <listitem><para><function>std::adjacent_difference</function></para></listitem>
  <listitem><para><function>std::inner_product</function></para></listitem>
  <listitem><para><function>std::partial_sum</function></para></listitem>
</itemizedlist>

<para>The following library components in the header
<filename class="headerfile">algorithm</filename> are included in the parallel mode:</para>
<itemizedlist>
  <listitem><para><function>std::adjacent_find</function></para></listitem>
  <listitem><para><function>std::count</function></para></listitem>
  <listitem><para><function>std::count_if</function></para></listitem>
  <listitem><para><function>std::equal</function></para></listitem>
  <listitem><para><function>std::find</function></para></listitem>
  <listitem><para><function>std::find_if</function></para></listitem>
  <listitem><para><function>std::find_first_of</function></para></listitem>
  <listitem><para><function>std::for_each</function></para></listitem>
  <listitem><para><function>std::generate</function></para></listitem>
  <listitem><para><function>std::generate_n</function></para></listitem>
  <listitem><para><function>std::lexicographical_compare</function></para></listitem>
  <listitem><para><function>std::mismatch</function></para></listitem>
  <listitem><para><function>std::search</function></para></listitem>
  <listitem><para><function>std::search_n</function></para></listitem>
  <listitem><para><function>std::transform</function></para></listitem>
  <listitem><para><function>std::replace</function></para></listitem>
  <listitem><para><function>std::replace_if</function></para></listitem>
  <listitem><para><function>std::max_element</function></para></listitem>
  <listitem><para><function>std::merge</function></para></listitem>
  <listitem><para><function>std::min_element</function></para></listitem>
  <listitem><para><function>std::nth_element</function></para></listitem>
  <listitem><para><function>std::partial_sort</function></para></listitem>
  <listitem><para><function>std::partition</function></para></listitem>
  <listitem><para><function>std::random_shuffle</function></para></listitem>
  <listitem><para><function>std::set_union</function></para></listitem>
  <listitem><para><function>std::set_intersection</function></para></listitem>
  <listitem><para><function>std::set_symmetric_difference</function></para></listitem>
  <listitem><para><function>std::set_difference</function></para></listitem>
  <listitem><para><function>std::sort</function></para></listitem>
  <listitem><para><function>std::stable_sort</function></para></listitem>
  <listitem><para><function>std::unique_copy</function></para></listitem>
</itemizedlist>

</section>

<section xml:id="manual.ext.parallel_mode.semantics"
         xreflabel="Semantics"><info><title>Semantics</title></info>
<?dbhtml filename="parallel_mode_semantics.html"?>

<para> The parallel mode STL algorithms are currently not exception-safe,
i.e. user-defined functors must not throw exceptions.
Also, the order of execution is not guaranteed for some functions.
Therefore, user-defined functors should not have any concurrent side effects.
</para>

<para> Since the current GCC OpenMP implementation does not support
OpenMP parallel regions in concurrent threads,
it is not possible to call parallel STL algorithms in
concurrent threads, either.
It might work with other compilers, though.</para>

</section>

<section xml:id="manual.ext.parallel_mode.using" xreflabel="Using"><info><title>Using</title></info>
<?dbhtml filename="parallel_mode_using.html"?>

<section xml:id="parallel_mode.using.prereq_flags"><info><title>Prerequisite Compiler Flags</title></info>

<para>
  Any use of parallel functionality requires additional compiler
  and runtime support, in particular support for OpenMP. Adding this support is
  not difficult: just compile your application with the compiler
  flag <literal>-fopenmp</literal>. This will link
  in <code>libgomp</code>, the
  <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://gcc.gnu.org/onlinedocs/libgomp/">GNU implementation</link>
  of OpenMP, whose presence is mandatory.
</para>

<para>
  In addition, hardware that supports atomic operations and a compiler
  capable of producing atomic operations are mandatory: GCC defaults to no
  support for atomic operations on some common hardware
  architectures. Activating atomic operations may require explicit
  compiler flags on some targets (like sparc and x86), such
  as <literal>-march=i686</literal>,
  <literal>-march=native</literal> or <literal>-mcpu=v9</literal>. See
  the GCC manual for more information.
</para>

</section>

<section xml:id="parallel_mode.using.parallel_mode"><info><title>Using Parallel Mode</title></info>

<para>
  To use the libstdc++ parallel mode, compile your application with
  the prerequisite flags as detailed above, and in addition
  add <constant>-D_GLIBCXX_PARALLEL</constant>. This will convert all
  uses of the standard (sequential) algorithms to the appropriate parallel
  equivalents. Please note that this doesn't necessarily mean that
  everything will end up being executed in a parallel manner, but
  rather that the heuristics and settings coded into the parallel
  versions will be used to determine if all, some, or no algorithms
  will be executed using parallel variants.
</para>

<para>Note that the <constant>_GLIBCXX_PARALLEL</constant> define may change the
  sizes and behavior of standard class templates,
  and therefore one can only link code
  compiled with parallel mode and code compiled without parallel mode
  if no instantiation of a container is passed between the two
  translation units. Parallel mode functionality has distinct linkage,
  and cannot be confused with normal mode symbols.
</para>
</section>

<section xml:id="parallel_mode.using.specific"><info><title>Using Specific Parallel Components</title></info>

<para>When it is not feasible to recompile your entire application, or
  only specific algorithms need to be parallel-aware, individual
  parallel algorithms can be made available explicitly. These
  parallel algorithms are functionally equivalent to the standard
  drop-in algorithms used in parallel mode, but they are available in
  a separate namespace as GNU extensions and may be used in programs
  compiled in either release mode or parallel mode.
</para>

<para>An example of using a parallel version
of <function>std::sort</function>, but no other parallel algorithms, is:
</para>

<programlisting>
#include &lt;vector&gt;
#include &lt;parallel/algorithm&gt;

int main()
{
  std::vector&lt;int&gt; v(100);

  // ...

  // Explicitly force a call to parallel sort.
  __gnu_parallel::sort(v.begin(), v.end());
  return 0;
}
</programlisting>

<para>
Then compile this code with the prerequisite compiler flags
(<literal>-fopenmp</literal> and any necessary architecture-specific
flags for atomic operations).
</para>

<para> The following table provides the names and headers of all the
  parallel algorithms that can be used in a similar manner:
</para>

<table frame="all">
<title>Parallel Algorithms</title>

<tgroup cols="4" align="left" colsep="1" rowsep="1">
<colspec colname="c1"/>
<colspec colname="c2"/>
<colspec colname="c3"/>
<colspec colname="c4"/>

<thead>
  <row>
    <entry>Algorithm</entry>
    <entry>Header</entry>
    <entry>Parallel algorithm</entry>
    <entry>Parallel header</entry>
  </row>
</thead>

<tbody>
  <row>
    <entry><function>std::accumulate</function></entry>
    <entry><filename class="headerfile">numeric</filename></entry>
    <entry><function>__gnu_parallel::accumulate</function></entry>
    <entry><filename class="headerfile">parallel/numeric</filename></entry>
  </row>
  <row>
    <entry><function>std::adjacent_difference</function></entry>
    <entry><filename class="headerfile">numeric</filename></entry>
    <entry><function>__gnu_parallel::adjacent_difference</function></entry>
    <entry><filename class="headerfile">parallel/numeric</filename></entry>
  </row>
  <row>
    <entry><function>std::inner_product</function></entry>
    <entry><filename class="headerfile">numeric</filename></entry>
    <entry><function>__gnu_parallel::inner_product</function></entry>
    <entry><filename class="headerfile">parallel/numeric</filename></entry>
  </row>
  <row>
    <entry><function>std::partial_sum</function></entry>
    <entry><filename class="headerfile">numeric</filename></entry>
    <entry><function>__gnu_parallel::partial_sum</function></entry>
    <entry><filename class="headerfile">parallel/numeric</filename></entry>
  </row>
  <row>
    <entry><function>std::adjacent_find</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::adjacent_find</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::count</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::count</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::count_if</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::count_if</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::equal</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::equal</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::find</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::find</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::find_if</function></entry>
    <entry><filename
        class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::find_if</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::find_first_of</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::find_first_of</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::for_each</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::for_each</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::generate</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::generate</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::generate_n</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::generate_n</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::lexicographical_compare</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::lexicographical_compare</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::mismatch</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::mismatch</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::search</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::search</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::search_n</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::search_n</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::transform</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::transform</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::replace</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::replace</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::replace_if</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::replace_if</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::max_element</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::max_element</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::merge</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::merge</function></entry>
    <entry><filename
        class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::min_element</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::min_element</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::nth_element</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::nth_element</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::partial_sort</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::partial_sort</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::partition</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::partition</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::random_shuffle</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::random_shuffle</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::set_union</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::set_union</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::set_intersection</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::set_intersection</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::set_symmetric_difference</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::set_symmetric_difference</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::set_difference</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::set_difference</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::sort</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::sort</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::stable_sort</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::stable_sort</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::unique_copy</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::unique_copy</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>
</tbody>
</tgroup>
</table>

</section>

</section>

<section xml:id="manual.ext.parallel_mode.design" xreflabel="Design"><info><title>Design</title></info>
<?dbhtml filename="parallel_mode_design.html"?>

  <para>
  </para>
<section xml:id="parallel_mode.design.intro"
         xreflabel="Intro"><info><title>Interface Basics</title></info>

<para>
All parallel algorithms are intended to have signatures that are
equivalent to the ISO C++ algorithms replaced. For instance, the
<function>std::adjacent_find</function> function is declared as:
</para>
<programlisting>
namespace std
{
  template&lt;typename _FIter&gt;
    _FIter
    adjacent_find(_FIter, _FIter);
}
</programlisting>

<para>
This means that there should be something equivalent for the parallel
version. Indeed, this is the case:
</para>

<programlisting>
namespace std
{
  namespace __parallel
  {
    template&lt;typename _FIter&gt;
      _FIter
      adjacent_find(_FIter, _FIter);

    ...
  }
}
</programlisting>

<para>But why the ellipsis?
</para>

<para> The ellipsis in the example above represents additional overloads
required for the parallel version of the function. These additional
overloads are used to dispatch calls from the ISO C++ function
signature to the appropriate parallel function (or sequential
function, if no parallel functions are deemed worthy), based on either
compile-time or run-time conditions.
</para>

<para> The available signature options are specific to the different
algorithms/algorithm classes.</para>

<para> The general view of overloads for the parallel algorithms looks like this:
</para>
<itemizedlist>
  <listitem><para>ISO C++ signature</para></listitem>
  <listitem><para>ISO C++ signature + sequential_tag argument</para></listitem>
  <listitem><para>ISO C++ signature + algorithm-specific tag type
    (several signatures)</para></listitem>
</itemizedlist>

<para> Please note that the implementation may use additional functions
(designated with the <code>_switch</code> suffix) to dispatch from the
ISO C++ signature to the correct parallel version.
Also, some of the
algorithms do not support run-time conditions, so the last
overload is missing for them.
</para>

</section>

<section xml:id="parallel_mode.design.tuning" xreflabel="Tuning"><info><title>Configuration and Tuning</title></info>

<section xml:id="parallel_mode.design.tuning.omp" xreflabel="OpenMP Environment"><info><title>Setting up the OpenMP Environment</title></info>

<para>
Several aspects of the overall runtime environment can be manipulated
by standard OpenMP function calls.
</para>

<para>
To specify the number of threads to be used for the algorithms globally,
use the function <function>omp_set_num_threads</function>. An example:
</para>

<programlisting>
#include &lt;stdlib.h&gt;
#include &lt;omp.h&gt;

int main()
{
  // Explicitly set number of threads.
  const int threads_wanted = 20;
  omp_set_dynamic(false);
  omp_set_num_threads(threads_wanted);

  // Call parallel mode algorithms.

  return 0;
}
</programlisting>

<para>
 Some algorithms allow the number of threads to be set for a particular call
 by augmenting the algorithm variant.
 See the next section for further information.
</para>

<para>
Other parts of the runtime environment that can be manipulated include
nested parallelism (<function>omp_set_nested</function>), schedule kind
(<function>omp_set_schedule</function>), and others. See the OpenMP
documentation for more information.
</para>

</section>

<section xml:id="parallel_mode.design.tuning.compile" xreflabel="Compile Switches"><info><title>Compile Time Switches</title></info>

<para>
To force an algorithm to execute sequentially, even though parallelism
is switched on in general via the macro <constant>_GLIBCXX_PARALLEL</constant>,
add <classname>__gnu_parallel::sequential_tag()</classname> to the end
of the algorithm's argument list.
</para>

<para>
Like so:
</para>

<programlisting>
std::sort(v.begin(), v.end(), __gnu_parallel::sequential_tag());
</programlisting>

<para>
Some parallel algorithm variants can be excluded from compilation by
preprocessor defines. See the doxygen documentation on
<code>compiletime_settings.h</code> and <code>features.h</code> for details.
</para>

<para>
For some algorithms, the desired variant can be chosen at compile-time by
appending a tag object. The available options are specific to the particular
algorithm (class).
</para>

<para>
For the "embarrassingly parallel" algorithms, there is only one "tag object
type", the enum _Parallelism.
It takes one of the following values:
<code>__gnu_parallel::parallel_tag</code>,
<code>__gnu_parallel::balanced_tag</code>,
<code>__gnu_parallel::unbalanced_tag</code>,
<code>__gnu_parallel::omp_loop_tag</code>,
<code>__gnu_parallel::omp_loop_static_tag</code>.
This means that the actual parallelization strategy is chosen at run-time.
(Choosing the variants at compile-time will come soon.)
</para>

<para>
For the following algorithms in general, we have
<code>__gnu_parallel::parallel_tag</code> and
<code>__gnu_parallel::default_parallel_tag</code>, in addition to
<code>__gnu_parallel::sequential_tag</code>.
<code>__gnu_parallel::default_parallel_tag</code> chooses the default
algorithm at compile-time, as does omitting the tag.
<code>__gnu_parallel::parallel_tag</code> postpones the decision to run-time
(see next section).
For all tags, the number of threads desired for this call can optionally be
passed to the respective tag's constructor.
</para>

<para>
The <code>multiway_merge</code> algorithm comes with two additional choices,
<code>__gnu_parallel::exact_tag</code> and
<code>__gnu_parallel::sampling_tag</code>.
Exact and sampling are the two available splitting strategies.
</para>

<para>
For the <code>sort</code> and <code>stable_sort</code> algorithms, there are
several additional choices, namely
<code>__gnu_parallel::multiway_mergesort_tag</code>,
<code>__gnu_parallel::multiway_mergesort_exact_tag</code>,
<code>__gnu_parallel::multiway_mergesort_sampling_tag</code>,
<code>__gnu_parallel::quicksort_tag</code>, and
<code>__gnu_parallel::balanced_quicksort_tag</code>.
Multiway mergesort comes with the two splitting strategies for multi-way
merging. The quicksort options cannot be used for <code>stable_sort</code>.
</para>

</section>

<section xml:id="parallel_mode.design.tuning.settings" xreflabel="_Settings"><info><title>Run Time Settings and Defaults</title></info>

<para>
The default parallelization strategy, the choice of specific algorithm
strategy, the minimum threshold limits for individual parallel
algorithms, and aspects of the underlying hardware can be specified as
desired via manipulation
of <classname>__gnu_parallel::_Settings</classname> member data.
</para>

<para>
First off, the choice of parallelization strategy: serial, parallel,
or heuristically deduced. This corresponds
to <code>__gnu_parallel::_Settings::algorithm_strategy</code> and is a
value of enum <type>__gnu_parallel::_AlgorithmStrategy</type>
type. Choices
include: <type>heuristic</type>, <type>force_sequential</type>,
and <type>force_parallel</type>. The default is <type>heuristic</type>.
</para>

<para>
Next, the sub-choices for algorithm variant, if not fixed at compile-time.
Specific algorithms like <function>find</function> or <function>sort</function>
can be implemented in multiple ways: when this is the case,
a <classname>__gnu_parallel::_Settings</classname> member exists to
pick the default strategy.
For
example, <code>__gnu_parallel::_Settings::sort_algorithm</code> can
take any value of
enum <type>__gnu_parallel::_SortAlgorithm</type>: <type>MWMS</type>, <type>QS</type>,
or <type>QS_BALANCED</type>.
</para>

<para>
Likewise for setting the minimal threshold for algorithm
parallelization. Parallelism always incurs some overhead. Thus, it is
not helpful to parallelize operations on very small sets of
data. Because of this, measures are taken to avoid parallelizing below
a certain, pre-determined threshold. For each algorithm, a minimum
problem size is encoded as a variable in the
active <classname>__gnu_parallel::_Settings</classname> object. This
threshold variable follows the naming scheme
<code>__gnu_parallel::_Settings::[algorithm]_minimal_n</code>. So,
for <function>fill</function>, the threshold variable
is <code>__gnu_parallel::_Settings::fill_minimal_n</code>.
</para>

<para>
Finally, hardware details like L1/L2 cache size can be hardwired
via <code>__gnu_parallel::_Settings::L1_cache_size</code> and friends.
</para>

<para>
</para>

<para>
All these configuration variables can be changed by the user, if
desired.
There exists one global instance of the class <classname>_Settings</classname>,
i.e. it is a singleton. It can be read and written by calling
<code>__gnu_parallel::_Settings::get</code> and
<code>__gnu_parallel::_Settings::set</code>, respectively.
Please note that the first call returns a const object, so direct manipulation
is forbidden.
See <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://gcc.gnu.org/onlinedocs/libstdc++/latest-doxygen/a01005.html">
  <filename class="headerfile">settings.h</filename></link>
for complete details.
</para>

<para>
A small example of tuning the default:
</para>

<programlisting>
#include &lt;parallel/algorithm&gt;
#include &lt;parallel/settings.h&gt;

int main()
{
  __gnu_parallel::_Settings s;
  s.algorithm_strategy = __gnu_parallel::force_parallel;
  __gnu_parallel::_Settings::set(s);

  // Do work... all algorithms will be parallelized, always.

  return 0;
}
</programlisting>

</section>

</section>

<section xml:id="parallel_mode.design.impl" xreflabel="Impl"><info><title>Implementation Namespaces</title></info>

<para> One namespace contains versions of code that are always
explicitly sequential:
<code>__gnu_serial</code>.
</para>

<para> Two namespaces contain the parallel mode:
<code>std::__parallel</code> and <code>__gnu_parallel</code>.
</para>

<para> Parallel implementations of standard components, including
template helpers to select parallelism, are defined in <code>namespace
std::__parallel</code>. For instance, <function>std::transform</function> from <filename class="headerfile">algorithm</filename> has a parallel counterpart in
<function>std::__parallel::transform</function> from <filename class="headerfile">parallel/algorithm</filename>. In addition, these parallel
implementations are injected into <code>namespace
__gnu_parallel</code> with using declarations.
</para>

<para> Support and general infrastructure are in <code>namespace
__gnu_parallel</code>.
</para>

<para> More information, and an organized index of types and functions
related to the parallel mode on a per-namespace basis, can be found in
the generated source documentation.
</para>

</section>

</section>

<section xml:id="manual.ext.parallel_mode.test" xreflabel="Testing"><info><title>Testing</title></info>
<?dbhtml filename="parallel_mode_test.html"?>

  <para>
    Both the normal conformance and regression tests and the
    supplemental performance tests work.
  </para>

  <para>
    To run the conformance and regression tests with the parallel mode
    active:
  </para>

  <screen>
  <userinput>make check-parallel</userinput>
  </screen>

  <para>
    The log and summary files for conformance testing are in the
    <filename class="directory">testsuite/parallel</filename> directory.
  </para>

  <para>
    To run the performance tests with the parallel mode active:
  </para>

  <screen>
  <userinput>make check-performance-parallel</userinput>
  </screen>

  <para>
    The results of performance testing are in the
    <filename class="directory">testsuite</filename> directory, in the file
    <filename>libstdc++_performance.sum</filename>. In addition, the
    policy-based containers have their own visualizations, which have
    more software dependencies than the usual bare-bones text
    file, and can be generated by using the <code>make
    doc-performance</code> rule in the testsuite's Makefile.
</para>
</section>

<bibliography xml:id="parallel_mode.biblio"><info><title>Bibliography</title></info>

  <biblioentry>
    <citetitle>
      Parallelization of Bulk Operations for STL Dictionaries
    </citetitle>

    <author><personname><firstname>Johannes</firstname><surname>Singler</surname></personname></author>
    <author><personname><firstname>Leonor</firstname><surname>Frias</surname></personname></author>

    <copyright>
      <year>2007</year>
      <holder/>
    </copyright>

    <publisher>
      <publishername>
        Workshop on Highly Parallel Processing on a Chip (HPPC) 2007.
        (LNCS)
      </publishername>
    </publisher>
  </biblioentry>

  <biblioentry>
    <citetitle>
      The Multi-Core Standard Template Library
    </citetitle>

    <author><personname><firstname>Johannes</firstname><surname>Singler</surname></personname></author>
    <author><personname><firstname>Peter</firstname><surname>Sanders</surname></personname></author>
    <author><personname><firstname>Felix</firstname><surname>Putze</surname></personname></author>

    <copyright>
      <year>2007</year>
      <holder/>
    </copyright>

    <publisher>
      <publishername>
        Euro-Par 2007: Parallel Processing. (LNCS 4641)
      </publishername>
    </publisher>
  </biblioentry>

</bibliography>

</chapter>