parallel_mode.xml revision 1.1.1.1
1<?xml version='1.0'?> 2<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" 3 "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" 4[ ]> 5 6<chapter id="manual.ext.parallel_mode" xreflabel="Parallel Mode"> 7<?dbhtml filename="parallel_mode.html"?> 8 9<chapterinfo> 10 <keywordset> 11 <keyword> 12 C++ 13 </keyword> 14 <keyword> 15 library 16 </keyword> 17 <keyword> 18 parallel 19 </keyword> 20 </keywordset> 21</chapterinfo> 22 23<title>Parallel Mode</title> 24 25<para> The libstdc++ parallel mode is an experimental parallel 26implementation of many algorithms the C++ Standard Library. 27</para> 28 29<para> 30Several of the standard algorithms, for instance 31<function>std::sort</function>, are made parallel using OpenMP 32annotations. These parallel mode constructs and can be invoked by 33explicit source declaration or by compiling existing sources with a 34specific compiler flag. 35</para> 36 37 38<sect1 id="manual.ext.parallel_mode.intro" xreflabel="Intro"> 39 <title>Intro</title> 40 41<para>The following library components in the include 42<filename class="headerfile">numeric</filename> are included in the parallel mode:</para> 43<itemizedlist> 44 <listitem><para><function>std::accumulate</function></para></listitem> 45 <listitem><para><function>std::adjacent_difference</function></para></listitem> 46 <listitem><para><function>std::inner_product</function></para></listitem> 47 <listitem><para><function>std::partial_sum</function></para></listitem> 48</itemizedlist> 49 50<para>The following library components in the include 51<filename class="headerfile">algorithm</filename> are included in the parallel mode:</para> 52<itemizedlist> 53 <listitem><para><function>std::adjacent_find</function></para></listitem> 54 <listitem><para><function>std::count</function></para></listitem> 55 <listitem><para><function>std::count_if</function></para></listitem> 56 <listitem><para><function>std::equal</function></para></listitem> 57 <listitem><para><function>std::find</function></para></listitem> 58 <listitem><para><function>std::find_if</function></para></listitem> 59 <listitem><para><function>std::find_first_of</function></para></listitem> 60 <listitem><para><function>std::for_each</function></para></listitem> 61 <listitem><para><function>std::generate</function></para></listitem> 62 <listitem><para><function>std::generate_n</function></para></listitem> 63 <listitem><para><function>std::lexicographical_compare</function></para></listitem> 64 <listitem><para><function>std::mismatch</function></para></listitem> 65 <listitem><para><function>std::search</function></para></listitem> 66 <listitem><para><function>std::search_n</function></para></listitem> 67 <listitem><para><function>std::transform</function></para></listitem> 68 <listitem><para><function>std::replace</function></para></listitem> 69 <listitem><para><function>std::replace_if</function></para></listitem> 70 <listitem><para><function>std::max_element</function></para></listitem> 71 <listitem><para><function>std::merge</function></para></listitem> 72 <listitem><para><function>std::min_element</function></para></listitem> 73 <listitem><para><function>std::nth_element</function></para></listitem> 74 <listitem><para><function>std::partial_sort</function></para></listitem> 75 <listitem><para><function>std::partition</function></para></listitem> 76 <listitem><para><function>std::random_shuffle</function></para></listitem> 77 <listitem><para><function>std::set_union</function></para></listitem> 78 <listitem><para><function>std::set_intersection</function></para></listitem> 79 <listitem><para><function>std::set_symmetric_difference</function></para></listitem> 80 <listitem><para><function>std::set_difference</function></para></listitem> 81 <listitem><para><function>std::sort</function></para></listitem> 82 <listitem><para><function>std::stable_sort</function></para></listitem> 83 <listitem><para><function>std::unique_copy</function></para></listitem> 84</itemizedlist> 85 86</sect1> 87 88<sect1 id="manual.ext.parallel_mode.semantics" xreflabel="Semantics"> 89 <title>Semantics</title> 90 91<para> The parallel mode STL algorithms are currently not exception-safe, 92i.e. user-defined functors must not throw exceptions. 93Also, the order of execution is not guaranteed for some functions, of course. 94Therefore, user-defined functors should not have any concurrent side effects. 95</para> 96 97<para> Since the current GCC OpenMP implementation does not support 98OpenMP parallel regions in concurrent threads, 99it is not possible to call parallel STL algorithm in 100concurrent threads, either. 101It might work with other compilers, though.</para> 102 103</sect1> 104 105<sect1 id="manual.ext.parallel_mode.using" xreflabel="Using"> 106 <title>Using</title> 107 108<sect2 id="parallel_mode.using.prereq_flags"> 109 <title>Prerequisite Compiler Flags</title> 110 111<para> 112 Any use of parallel functionality requires additional compiler 113 and runtime support, in particular support for OpenMP. Adding this support is 114 not difficult: just compile your application with the compiler 115 flag <literal>-fopenmp</literal>. This will link 116 in <code>libgomp</code>, the GNU 117 OpenMP <ulink url="http://gcc.gnu.org/onlinedocs/libgomp">implementation</ulink>, 118 whose presence is mandatory. 119</para> 120 121<para> 122In addition, hardware that supports atomic operations and a compiler 123 capable of producing atomic operations is mandatory: GCC defaults to no 124 support for atomic operations on some common hardware 125 architectures. Activating atomic operations may require explicit 126 compiler flags on some targets (like sparc and x86), such 127 as <literal>-march=i686</literal>, 128 <literal>-march=native</literal> or <literal>-mcpu=v9</literal>. See 129 the GCC manual for more information. 130</para> 131 132</sect2> 133 134<sect2 id="parallel_mode.using.parallel_mode"> 135 <title>Using Parallel Mode</title> 136 137<para> 138 To use the libstdc++ parallel mode, compile your application with 139 the prerequisite flags as detailed above, and in addition 140 add <constant>-D_GLIBCXX_PARALLEL</constant>. This will convert all 141 use of the standard (sequential) algorithms to the appropriate parallel 142 equivalents. Please note that this doesn't necessarily mean that 143 everything will end up being executed in a parallel manner, but 144 rather that the heuristics and settings coded into the parallel 145 versions will be used to determine if all, some, or no algorithms 146 will be executed using parallel variants. 147</para> 148 149<para>Note that the <constant>_GLIBCXX_PARALLEL</constant> define may change the 150 sizes and behavior of standard class templates such as 151 <function>std::search</function>, and therefore one can only link code 152 compiled with parallel mode and code compiled without parallel mode 153 if no instantiation of a container is passed between the two 154 translation units. Parallel mode functionality has distinct linkage, 155 and cannot be confused with normal mode symbols. 156</para> 157</sect2> 158 159<sect2 id="parallel_mode.using.specific"> 160 <title>Using Specific Parallel Components</title> 161 162<para>When it is not feasible to recompile your entire application, or 163 only specific algorithms need to be parallel-aware, individual 164 parallel algorithms can be made available explicitly. These 165 parallel algorithms are functionally equivalent to the standard 166 drop-in algorithms used in parallel mode, but they are available in 167 a separate namespace as GNU extensions and may be used in programs 168 compiled with either release mode or with parallel mode. 169</para> 170 171 172<para>An example of using a parallel version 173of <function>std::sort</function>, but no other parallel algorithms, is: 174</para> 175 176<programlisting> 177#include <vector> 178#include <parallel/algorithm> 179 180int main() 181{ 182 std::vector<int> v(100); 183 184 // ... 185 186 // Explicitly force a call to parallel sort. 187 __gnu_parallel::sort(v.begin(), v.end()); 188 return 0; 189} 190</programlisting> 191 192<para> 193Then compile this code with the prerequisite compiler flags 194(<literal>-fopenmp</literal> and any necessary architecture-specific 195flags for atomic operations.) 196</para> 197 198<para> The following table provides the names and headers of all the 199 parallel algorithms that can be used in a similar manner: 200</para> 201 202<table frame='all'> 203<title>Parallel Algorithms</title> 204<tgroup cols='4' align='left' colsep='1' rowsep='1'> 205<colspec colname='c1'></colspec> 206<colspec colname='c2'></colspec> 207<colspec colname='c3'></colspec> 208<colspec colname='c4'></colspec> 209 210<thead> 211 <row> 212 <entry>Algorithm</entry> 213 <entry>Header</entry> 214 <entry>Parallel algorithm</entry> 215 <entry>Parallel header</entry> 216 </row> 217</thead> 218 219<tbody> 220 <row> 221 <entry><function>std::accumulate</function></entry> 222 <entry><filename class="headerfile">numeric</filename></entry> 223 <entry><function>__gnu_parallel::accumulate</function></entry> 224 <entry><filename class="headerfile">parallel/numeric</filename></entry> 225 </row> 226 <row> 227 <entry><function>std::adjacent_difference</function></entry> 228 <entry><filename class="headerfile">numeric</filename></entry> 229 <entry><function>__gnu_parallel::adjacent_difference</function></entry> 230 <entry><filename class="headerfile">parallel/numeric</filename></entry> 231 </row> 232 <row> 233 <entry><function>std::inner_product</function></entry> 234 <entry><filename class="headerfile">numeric</filename></entry> 235 <entry><function>__gnu_parallel::inner_product</function></entry> 236 <entry><filename class="headerfile">parallel/numeric</filename></entry> 237 </row> 238 <row> 239 <entry><function>std::partial_sum</function></entry> 240 <entry><filename class="headerfile">numeric</filename></entry> 241 <entry><function>__gnu_parallel::partial_sum</function></entry> 242 <entry><filename class="headerfile">parallel/numeric</filename></entry> 243 </row> 244 <row> 245 <entry><function>std::adjacent_find</function></entry> 246 <entry><filename class="headerfile">algorithm</filename></entry> 247 <entry><function>__gnu_parallel::adjacent_find</function></entry> 248 <entry><filename class="headerfile">parallel/algorithm</filename></entry> 249 </row> 250 251 <row> 252 <entry><function>std::count</function></entry> 253 <entry><filename class="headerfile">algorithm</filename></entry> 254 <entry><function>__gnu_parallel::count</function></entry> 255 <entry><filename class="headerfile">parallel/algorithm</filename></entry> 256 </row> 257 258 <row> 259 <entry><function>std::count_if</function></entry> 260 <entry><filename class="headerfile">algorithm</filename></entry> 261 <entry><function>__gnu_parallel::count_if</function></entry> 262 <entry><filename class="headerfile">parallel/algorithm</filename></entry> 263 </row> 264 265 <row> 266 <entry><function>std::equal</function></entry> 267 <entry><filename class="headerfile">algorithm</filename></entry> 268 <entry><function>__gnu_parallel::equal</function></entry> 269 <entry><filename class="headerfile">parallel/algorithm</filename></entry> 270 </row> 271 272 <row> 273 <entry><function>std::find</function></entry> 274 <entry><filename class="headerfile">algorithm</filename></entry> 275 <entry><function>__gnu_parallel::find</function></entry> 276 <entry><filename class="headerfile">parallel/algorithm</filename></entry> 277 </row> 278 279 <row> 280 <entry><function>std::find_if</function></entry> 281 <entry><filename class="headerfile">algorithm</filename></entry> 282 <entry><function>__gnu_parallel::find_if</function></entry> 283 <entry><filename class="headerfile">parallel/algorithm</filename></entry> 284 </row> 285 286 <row> 287 <entry><function>std::find_first_of</function></entry> 288 <entry><filename class="headerfile">algorithm</filename></entry> 289 <entry><function>__gnu_parallel::find_first_of</function></entry> 290 <entry><filename class="headerfile">parallel/algorithm</filename></entry> 291 </row> 292 293 <row> 294 <entry><function>std::for_each</function></entry> 295 <entry><filename class="headerfile">algorithm</filename></entry> 296 <entry><function>__gnu_parallel::for_each</function></entry> 297 <entry><filename class="headerfile">parallel/algorithm</filename></entry> 298 </row> 299 300 <row> 301 <entry><function>std::generate</function></entry> 302 <entry><filename class="headerfile">algorithm</filename></entry> 303 <entry><function>__gnu_parallel::generate</function></entry> 304 <entry><filename class="headerfile">parallel/algorithm</filename></entry> 305 </row> 306 307 <row> 308 <entry><function>std::generate_n</function></entry> 309 <entry><filename class="headerfile">algorithm</filename></entry> 310 <entry><function>__gnu_parallel::generate_n</function></entry> 311 <entry><filename class="headerfile">parallel/algorithm</filename></entry> 312 </row> 313 314 <row> 315 <entry><function>std::lexicographical_compare</function></entry> 316 <entry><filename class="headerfile">algorithm</filename></entry> 317 <entry><function>__gnu_parallel::lexicographical_compare</function></entry> 318 <entry><filename class="headerfile">parallel/algorithm</filename></entry> 319 </row> 320 321 <row> 322 <entry><function>std::mismatch</function></entry> 323 <entry><filename class="headerfile">algorithm</filename></entry> 324 <entry><function>__gnu_parallel::mismatch</function></entry> 325 <entry><filename class="headerfile">parallel/algorithm</filename></entry> 326 </row> 327 328 <row> 329 <entry><function>std::search</function></entry> 330 <entry><filename class="headerfile">algorithm</filename></entry> 331 <entry><function>__gnu_parallel::search</function></entry> 332 <entry><filename class="headerfile">parallel/algorithm</filename></entry> 333 </row> 334 335 <row> 336 <entry><function>std::search_n</function></entry> 337 <entry><filename class="headerfile">algorithm</filename></entry> 338 <entry><function>__gnu_parallel::search_n</function></entry> 339 <entry><filename class="headerfile">parallel/algorithm</filename></entry> 340 </row> 341 342 <row> 343 <entry><function>std::transform</function></entry> 344 <entry><filename class="headerfile">algorithm</filename></entry> 345 <entry><function>__gnu_parallel::transform</function></entry> 346 <entry><filename class="headerfile">parallel/algorithm</filename></entry> 347 </row> 348 349 <row> 350 <entry><function>std::replace</function></entry> 351 <entry><filename class="headerfile">algorithm</filename></entry> 352 <entry><function>__gnu_parallel::replace</function></entry> 353 <entry><filename class="headerfile">parallel/algorithm</filename></entry> 354 </row> 355 356 <row> 357 <entry><function>std::replace_if</function></entry> 358 <entry><filename class="headerfile">algorithm</filename></entry> 359 <entry><function>__gnu_parallel::replace_if</function></entry> 360 <entry><filename class="headerfile">parallel/algorithm</filename></entry> 361 </row> 362 363 <row> 364 <entry><function>std::max_element</function></entry> 365 <entry><filename class="headerfile">algorithm</filename></entry> 366 <entry><function>__gnu_parallel::max_element</function></entry> 367 <entry><filename class="headerfile">parallel/algorithm</filename></entry> 368 </row> 369 370 <row> 371 <entry><function>std::merge</function></entry> 372 <entry><filename class="headerfile">algorithm</filename></entry> 373 <entry><function>__gnu_parallel::merge</function></entry> 374 <entry><filename class="headerfile">parallel/algorithm</filename></entry> 375 </row> 376 377 <row> 378 <entry><function>std::min_element</function></entry> 379 <entry><filename class="headerfile">algorithm</filename></entry> 380 <entry><function>__gnu_parallel::min_element</function></entry> 381 <entry><filename class="headerfile">parallel/algorithm</filename></entry> 382 </row> 383 384 <row> 385 <entry><function>std::nth_element</function></entry> 386 <entry><filename class="headerfile">algorithm</filename></entry> 387 <entry><function>__gnu_parallel::nth_element</function></entry> 388 <entry><filename class="headerfile">parallel/algorithm</filename></entry> 389 </row> 390 391 <row> 392 <entry><function>std::partial_sort</function></entry> 393 <entry><filename class="headerfile">algorithm</filename></entry> 394 <entry><function>__gnu_parallel::partial_sort</function></entry> 395 <entry><filename class="headerfile">parallel/algorithm</filename></entry> 396 </row> 397 398 <row> 399 <entry><function>std::partition</function></entry> 400 <entry><filename class="headerfile">algorithm</filename></entry> 401 <entry><function>__gnu_parallel::partition</function></entry> 402 <entry><filename class="headerfile">parallel/algorithm</filename></entry> 403 </row> 404 405 <row> 406 <entry><function>std::random_shuffle</function></entry> 407 <entry><filename class="headerfile">algorithm</filename></entry> 408 <entry><function>__gnu_parallel::random_shuffle</function></entry> 409 <entry><filename class="headerfile">parallel/algorithm</filename></entry> 410 </row> 411 412 <row> 413 <entry><function>std::set_union</function></entry> 414 <entry><filename class="headerfile">algorithm</filename></entry> 415 <entry><function>__gnu_parallel::set_union</function></entry> 416 <entry><filename class="headerfile">parallel/algorithm</filename></entry> 417 </row> 418 419 <row> 420 <entry><function>std::set_intersection</function></entry> 421 <entry><filename class="headerfile">algorithm</filename></entry> 422 <entry><function>__gnu_parallel::set_intersection</function></entry> 423 <entry><filename class="headerfile">parallel/algorithm</filename></entry> 424 </row> 425 426 <row> 427 <entry><function>std::set_symmetric_difference</function></entry> 428 <entry><filename class="headerfile">algorithm</filename></entry> 429 <entry><function>__gnu_parallel::set_symmetric_difference</function></entry> 430 <entry><filename class="headerfile">parallel/algorithm</filename></entry> 431 </row> 432 433 <row> 434 <entry><function>std::set_difference</function></entry> 435 <entry><filename class="headerfile">algorithm</filename></entry> 436 <entry><function>__gnu_parallel::set_difference</function></entry> 437 <entry><filename class="headerfile">parallel/algorithm</filename></entry> 438 </row> 439 440 <row> 441 <entry><function>std::sort</function></entry> 442 <entry><filename class="headerfile">algorithm</filename></entry> 443 <entry><function>__gnu_parallel::sort</function></entry> 444 <entry><filename class="headerfile">parallel/algorithm</filename></entry> 445 </row> 446 447 <row> 448 <entry><function>std::stable_sort</function></entry> 449 <entry><filename class="headerfile">algorithm</filename></entry> 450 <entry><function>__gnu_parallel::stable_sort</function></entry> 451 <entry><filename class="headerfile">parallel/algorithm</filename></entry> 452 </row> 453 454 <row> 455 <entry><function>std::unique_copy</function></entry> 456 <entry><filename class="headerfile">algorithm</filename></entry> 457 <entry><function>__gnu_parallel::unique_copy</function></entry> 458 <entry><filename class="headerfile">parallel/algorithm</filename></entry> 459 </row> 460</tbody> 461</tgroup> 462</table> 463 464</sect2> 465 466</sect1> 467 468<sect1 id="manual.ext.parallel_mode.design" xreflabel="Design"> 469 <title>Design</title> 470 <para> 471 </para> 472<sect2 id="parallel_mode.design.intro" xreflabel="Intro"> 473 <title>Interface Basics</title> 474 475<para> 476All parallel algorithms are intended to have signatures that are 477equivalent to the ISO C++ algorithms replaced. For instance, the 478<function>std::adjacent_find</function> function is declared as: 479</para> 480<programlisting> 481namespace std 482{ 483 template<typename _FIter> 484 _FIter 485 adjacent_find(_FIter, _FIter); 486} 487</programlisting> 488 489<para> 490Which means that there should be something equivalent for the parallel 491version. Indeed, this is the case: 492</para> 493 494<programlisting> 495namespace std 496{ 497 namespace __parallel 498 { 499 template<typename _FIter> 500 _FIter 501 adjacent_find(_FIter, _FIter); 502 503 ... 504 } 505} 506</programlisting> 507 508<para>But.... why the ellipses? 509</para> 510 511<para> The ellipses in the example above represent additional overloads 512required for the parallel version of the function. These additional 513overloads are used to dispatch calls from the ISO C++ function 514signature to the appropriate parallel function (or sequential 515function, if no parallel functions are deemed worthy), based on either 516compile-time or run-time conditions. 517</para> 518 519<para> The available signature options are specific for the different 520algorithms/algorithm classes.</para> 521 522<para> The general view of overloads for the parallel algorithms look like this: 523</para> 524<itemizedlist> 525 <listitem><para>ISO C++ signature</para></listitem> 526 <listitem><para>ISO C++ signature + sequential_tag argument</para></listitem> 527 <listitem><para>ISO C++ signature + algorithm-specific tag type 528 (several signatures)</para></listitem> 529</itemizedlist> 530 531<para> Please note that the implementation may use additional functions 532(designated with the <code>_switch</code> suffix) to dispatch from the 533ISO C++ signature to the correct parallel version. Also, some of the 534algorithms do not have support for run-time conditions, so the last 535overload is therefore missing. 536</para> 537 538 539</sect2> 540 541<sect2 id="parallel_mode.design.tuning" xreflabel="Tuning"> 542 <title>Configuration and Tuning</title> 543 544 545<sect3 id="parallel_mode.design.tuning.omp" xreflabel="OpenMP Environment"> 546 <title>Setting up the OpenMP Environment</title> 547 548<para> 549Several aspects of the overall runtime environment can be manipulated 550by standard OpenMP function calls. 551</para> 552 553<para> 554To specify the number of threads to be used for the algorithms globally, 555use the function <function>omp_set_num_threads</function>. An example: 556</para> 557 558<programlisting> 559#include <stdlib.h> 560#include <omp.h> 561 562int main() 563{ 564 // Explicitly set number of threads. 565 const int threads_wanted = 20; 566 omp_set_dynamic(false); 567 omp_set_num_threads(threads_wanted); 568 569 // Call parallel mode algorithms. 570 571 return 0; 572} 573</programlisting> 574 575<para> 576 Some algorithms allow the number of threads being set for a particular call, 577 by augmenting the algorithm variant. 578 See the next section for further information. 579</para> 580 581<para> 582Other parts of the runtime environment able to be manipulated include 583nested parallelism (<function>omp_set_nested</function>), schedule kind 584(<function>omp_set_schedule</function>), and others. See the OpenMP 585documentation for more information. 586</para> 587 588</sect3> 589 590<sect3 id="parallel_mode.design.tuning.compile" xreflabel="Compile Switches"> 591 <title>Compile Time Switches</title> 592 593<para> 594To force an algorithm to execute sequentially, even though parallelism 595is switched on in general via the macro <constant>_GLIBCXX_PARALLEL</constant>, 596add <classname>__gnu_parallel::sequential_tag()</classname> to the end 597of the algorithm's argument list. 598</para> 599 600<para> 601Like so: 602</para> 603 604<programlisting> 605std::sort(v.begin(), v.end(), __gnu_parallel::sequential_tag()); 606</programlisting> 607 608<para> 609Some parallel algorithm variants can be excluded from compilation by 610preprocessor defines. See the doxygen documentation on 611<code>compiletime_settings.h</code> and <code>features.h</code> for details. 612</para> 613 614<para> 615For some algorithms, the desired variant can be chosen at compile-time by 616appending a tag object. The available options are specific to the particular 617algorithm (class). 618</para> 619 620<para> 621For the "embarrassingly parallel" algorithms, there is only one "tag object 622type", the enum _Parallelism. 623It takes one of the following values, 624<code>__gnu_parallel::parallel_tag</code>, 625<code>__gnu_parallel::balanced_tag</code>, 626<code>__gnu_parallel::unbalanced_tag</code>, 627<code>__gnu_parallel::omp_loop_tag</code>, 628<code>__gnu_parallel::omp_loop_static_tag</code>. 629This means that the actual parallelization strategy is chosen at run-time. 630(Choosing the variants at compile-time will come soon.) 631</para> 632 633<para> 634For the following algorithms in general, we have 635<code>__gnu_parallel::parallel_tag</code> and 636<code>__gnu_parallel::default_parallel_tag</code>, in addition to 637<code>__gnu_parallel::sequential_tag</code>. 638<code>__gnu_parallel::default_parallel_tag</code> chooses the default 639algorithm at compiletime, as does omitting the tag. 640<code>__gnu_parallel::parallel_tag</code> postpones the decision to runtime 641(see next section). 642For all tags, the number of threads desired for this call can optionally be 643passed to the respective tag's constructor. 644</para> 645 646<para> 647The <code>multiway_merge</code> algorithm comes with the additional choices, 648<code>__gnu_parallel::exact_tag</code> and 649<code>__gnu_parallel::sampling_tag</code>. 650Exact and sampling are the two available splitting strategies. 651</para> 652 653<para> 654For the <code>sort</code> and <code>stable_sort</code> algorithms, there are 655several additional choices, namely 656<code>__gnu_parallel::multiway_mergesort_tag</code>, 657<code>__gnu_parallel::multiway_mergesort_exact_tag</code>, 658<code>__gnu_parallel::multiway_mergesort_sampling_tag</code>, 659<code>__gnu_parallel::quicksort_tag</code>, and 660<code>__gnu_parallel::balanced_quicksort_tag</code>. 661Multiway mergesort comes with the two splitting strategies for multi-way 662merging. The quicksort options cannot be used for <code>stable_sort</code>. 663</para> 664 665</sect3> 666 667<sect3 id="parallel_mode.design.tuning.settings" xreflabel="_Settings"> 668 <title>Run Time Settings and Defaults</title> 669 670<para> 671The default parallelization strategy, the choice of specific algorithm 672strategy, the minimum threshold limits for individual parallel 673algorithms, and aspects of the underlying hardware can be specified as 674desired via manipulation 675of <classname>__gnu_parallel::_Settings</classname> member data. 676</para> 677 678<para> 679First off, the choice of parallelization strategy: serial, parallel, 680or heuristically deduced. This corresponds 681to <code>__gnu_parallel::_Settings::algorithm_strategy</code> and is a 682value of enum <type>__gnu_parallel::_AlgorithmStrategy</type> 683type. Choices 684include: <type>heuristic</type>, <type>force_sequential</type>, 685and <type>force_parallel</type>. The default is <type>heuristic</type>. 686</para> 687 688 689<para> 690Next, the sub-choices for algorithm variant, if not fixed at compile-time. 691Specific algorithms like <function>find</function> or <function>sort</function> 692can be implemented in multiple ways: when this is the case, 693a <classname>__gnu_parallel::_Settings</classname> member exists to 694pick the default strategy. For 695example, <code>__gnu_parallel::_Settings::sort_algorithm</code> can 696have any values of 697enum <type>__gnu_parallel::_SortAlgorithm</type>: <type>MWMS</type>, <type>QS</type>, 698or <type>QS_BALANCED</type>. 699</para> 700 701<para> 702Likewise for setting the minimal threshold for algorithm 703parallelization. Parallelism always incurs some overhead. Thus, it is 704not helpful to parallelize operations on very small sets of 705data. Because of this, measures are taken to avoid parallelizing below 706a certain, pre-determined threshold. For each algorithm, a minimum 707problem size is encoded as a variable in the 708active <classname>__gnu_parallel::_Settings</classname> object. This 709threshold variable follows the following naming scheme: 710<code>__gnu_parallel::_Settings::[algorithm]_minimal_n</code>. So, 711for <function>fill</function>, the threshold variable 712is <code>__gnu_parallel::_Settings::fill_minimal_n</code>, 713</para> 714 715<para> 716Finally, hardware details like L1/L2 cache size can be hardwired 717via <code>__gnu_parallel::_Settings::L1_cache_size</code> and friends. 718</para> 719 720<para> 721</para> 722 723<para> 724All these configuration variables can be changed by the user, if 725desired. 726There exists one global instance of the class <classname>_Settings</classname>, 727i. e. it is a singleton. It can be read and written by calling 728<code>__gnu_parallel::_Settings::get</code> and 729<code>__gnu_parallel::_Settings::set</code>, respectively. 730Please note that the first call return a const object, so direct manipulation 731is forbidden. 732See <ulink url="http://gcc.gnu.org/onlinedocs/libstdc++/latest-doxygen/a01005.html"> 733 <filename class="headerfile">settings.h</filename></ulink> 734for complete details. 735</para> 736 737<para> 738A small example of tuning the default: 739</para> 740 741<programlisting> 742#include <parallel/algorithm> 743#include <parallel/settings.h> 744 745int main() 746{ 747 __gnu_parallel::_Settings s; 748 s.algorithm_strategy = __gnu_parallel::force_parallel; 749 __gnu_parallel::_Settings::set(s); 750 751 // Do work... all algorithms will be parallelized, always. 752 753 return 0; 754} 755</programlisting> 756 757</sect3> 758 759</sect2> 760 761<sect2 id="parallel_mode.design.impl" xreflabel="Impl"> 762 <title>Implementation Namespaces</title> 763 764<para> One namespace contain versions of code that are always 765explicitly sequential: 766<code>__gnu_serial</code>. 767</para> 768 769<para> Two namespaces contain the parallel mode: 770<code>std::__parallel</code> and <code>__gnu_parallel</code>. 771</para> 772 773<para> Parallel implementations of standard components, including 774template helpers to select parallelism, are defined in <code>namespace 775std::__parallel</code>. For instance, <function>std::transform</function> from <filename class="headerfile">algorithm</filename> has a parallel counterpart in 776<function>std::__parallel::transform</function> from <filename class="headerfile">parallel/algorithm</filename>. In addition, these parallel 777implementations are injected into <code>namespace 778__gnu_parallel</code> with using declarations. 779</para> 780 781<para> Support and general infrastructure is in <code>namespace 782__gnu_parallel</code>. 783</para> 784 785<para> More information, and an organized index of types and functions 786related to the parallel mode on a per-namespace basis, can be found in 787the generated source documentation. 788</para> 789 790</sect2> 791 792</sect1> 793 794<sect1 id="manual.ext.parallel_mode.test" xreflabel="Testing"> 795 <title>Testing</title> 796 797 <para> 798 Both the normal conformance and regression tests and the 799 supplemental performance tests work. 800 </para> 801 802 <para> 803 To run the conformance and regression tests with the parallel mode 804 active, 805 </para> 806 807 <screen> 808 <userinput>make check-parallel</userinput> 809 </screen> 810 811 <para> 812 The log and summary files for conformance testing are in the 813 <filename class="directory">testsuite/parallel</filename> directory. 814 </para> 815 816 <para> 817 To run the performance tests with the parallel mode active, 818 </para> 819 820 <screen> 821 <userinput>make check-performance-parallel</userinput> 822 </screen> 823 824 <para> 825 The result file for performance testing are in the 826 <filename class="directory">testsuite</filename> directory, in the file 827 <filename>libstdc++_performance.sum</filename>. In addition, the 828 policy-based containers have their own visualizations, which have 829 additional software dependencies than the usual bare-boned text 830 file, and can be generated by using the <code>make 831 doc-performance</code> rule in the testsuite's Makefile. 832</para> 833</sect1> 834 835<bibliography id="parallel_mode.biblio"> 836<title>Bibliography</title> 837 838 <biblioentry> 839 <title> 840 Parallelization of Bulk Operations for STL Dictionaries 841 </title> 842 843 <author> 844 <firstname>Johannes</firstname> 845 <surname>Singler</surname> 846 </author> 847 <author> 848 <firstname>Leonor</firstname> 849 <surname>Frias</surname> 850 </author> 851 852 <copyright> 853 <year>2007</year> 854 <holder></holder> 855 </copyright> 856 857 <publisher> 858 <publishername> 859 Workshop on Highly Parallel Processing on a Chip (HPPC) 2007. (LNCS) 860 </publishername> 861 </publisher> 862 </biblioentry> 863 864 <biblioentry> 865 <title> 866 The Multi-Core Standard Template Library 867 </title> 868 869 <author> 870 <firstname>Johannes</firstname> 871 <surname>Singler</surname> 872 </author> 873 <author> 874 <firstname>Peter</firstname> 875 <surname>Sanders</surname> 876 </author> 877 <author> 878 <firstname>Felix</firstname> 879 <surname>Putze</surname> 880 </author> 881 882 <copyright> 883 <year>2007</year> 884 <holder></holder> 885 </copyright> 886 887 <publisher> 888 <publishername> 889 Euro-Par 2007: Parallel Processing. (LNCS 4641) 890 </publishername> 891 </publisher> 892 </biblioentry> 893 894</bibliography> 895 896</chapter> 897