Get Started with Parallel STL

ID 标签 751646
已更新 11/9/2018
版本 Latest
公共

author-image

作者

Parallel STL is an implementation of the C++ standard library algorithms with support for execution policies, as specified in ISO/IEC 14882:2017 standard, commonly called C++17. The implementation also supports the unsequenced execution policy specified in Parallelism TS version 2 and proposed for the next version of the C++ standard in the C++ working group paper P1001R1.

Parallel STL offers efficient support for both parallel and vectorized execution of algorithms for Intel® processors. For sequential execution, it relies on an available implementation of the C++ standard library.

Parallel STL is available as a part of Intel® Parallel Studio XE and Intel® System Studio.

Prerequisites

To use Parallel STL, you must have the following software installed:

  • C++ compiler with:
    • Support for C++11
    • Support for OpenMP* 4.0 SIMD constructs
  • Intel® Threading Building Blocks (Intel® TBB) 2019

The latest version of the Intel® C++ Compiler is recommended for better performance of Parallel STL algorithms, comparing to previous compiler versions.

To build an application that uses Parallel STL on the command line, you need to set the environment variables for compilation and linkage. You can do this by calling suite-level environment scripts such as compilervars.{sh|csh|bat}, or you can set just the Parallel STL environment variables by running pstlvars.{sh|csh|bat} in <install_dir>/{linux|mac|windows}/pstl/bin.

<install_dir> is the installation directory, by default, it is:

For Linux* and macOS*:

  • For super-users:      /opt/intel/compilers_and_libraries_<version>
  • For ordinary users:  $HOME/intel/compilers_and_libraries_<version>

For Windows*:

  • <Program Files>\IntelSWTools\compilers_and_libraries_<version>

Using Parallel STL

Follow these steps to add Parallel STL to your application:

  1. Add the <install_dir>/include folder to the compiler include paths. You can do this by calling the pstlvars script.

  2. Add #include "pstl/execution" to your code. Then add a subset of the following set of lines, depending on the algorithms you intend to use:

    • #include "pstl/algorithm"
    • #include "pstl/numeric"
    • #include "pstl/memory"
  3. When using algorithms and execution policies, specify the namespaces std::execution in case of there is no vendor implementation of C++17 standard library or pstl::execution otherwise. See the 'Examples' section below.
  4. For any of the implemented algorithms, pass one of the values seq, unseq, par or par_unseq as the first parameter in a call to the algorithm to specify the desired execution policy. The policies have the following meaning:

     

    Execution policy

    Meaning

    seq Sequential execution.
    unseq Unsequenced SIMD execution. This policy requires that all functions provided are SIMD-safe.
    par Parallel execution by multiple threads.
    par_unseq Combined effect of unseq and par.

     

  5. Compile the code as C++11 (or later) and using compiler options for vectorization:

    • For the Intel® C++ Compiler:
      • For Linux* and macOS*: -qopenmp-simd
      • For Windows*: /Qopenmp-simd
    • For other compilers, find a switch that enables OpenMP* 4.0 SIMD constructs.

    To get good performance, specify the target platform. For the Intel C++ Compiler, some of the relevant options are:

    • For Linux* and macOS*: -xHOST, -xSSE4.1, -xCORE-AVX2, -xMIC-AVX512.
    • For Windows*: /QxHOST, /QxSSE4.1, /QxCORE-AVX2, /QxMIC-AVX512.
    If using a different compiler, see its documentation.

     

  6. Link with the Intel TBB dynamic library for parallelism. For the Intel C++ Compiler, use the options:

    • For Linux* and macOS*: -tbb
    • For Windows*: /Qtbb (optional, this should be handled by #pragma comment(lib, <libname>))

Version Macros

Macros related to versioning, as described below. You should not redefine these macros.

PSTL_VERSION

Current Parallel STL version. The value is a decimal numeral of the form xyy where x is the major version number and yy is the minor version number.

PSTL_VERSION_MAJOR

PSTL_VERSION/100; that is, the major version number.

PSTL_VERSION_MINOR

PSTL_VERSION - PSTL_VERSION_MAJOR * 100; that is, the minor version number.

Macros

PSTL_USE_PARALLEL_POLICIES

 

This macro controls the use of parallel policies.

When set to 0, it disables the par and par_unseq policies, making their use a compilation error. It's recommended for code that only uses vectorization with unseq policy, to avoid dependency on the TBB runtime library.

When the macro is not defined (default) or evaluates to a non-zero value all execution policies are enabled.

 

PSTL_USE_NONTEMPORAL_STORES

 

This macro enables the use of #pragma vector nontemporal in the algorithms std::copy, std::copy_n, std::fill, std::fill_n, std::generate, std::generate_n, std::move, std::rotate, std::rotate_copy, std::swap_ranges with the unseq policy. For further details about the pragma, see the User and Reference Guide for the Intel® C++ Compiler at https://software.intel.com/en-us/node/524559.

If the macro evaluates to a non-zero value, the use of #pragma vector nontemporal is enabled.

When the macro is not defined (default) or set to 0, the macro does nothing.

 

PSTL_USAGE_WARNINGS

 

This macro enables Parallel STL to emit compile-time messages, such as warnings about an algorithm not supporting a certain execution policy.

When set to 1, the macro allows the implementation to emit usage warnings. When the macro is not defined (default) or evaluates to zero, usage warnings are disabled.

 

Examples

Example 1

The following code calls vectorized copy:

#include "pstl/execution"
#include "pstl/algorithm"
void foo(float* a, float* b, int n) {
    std::copy(pstl::execution::unseq, a, a+n, b);
}

Example 2

This example calls the parallelized version of fill_n:

#include <vector>
#include "pstl/execution"
#include "pstl/algorithm"

int main()
{
    std::vector<int> data(10000000);
    std::fill_n(pstl::execution::par_unseq, data.begin(), data.size(), -1);  // Fill the vector with -1

    return 0;
}

Algorithms

The following table specifies whether parallel and unsequenced execution are supported for each of the C++17 algorithms accepting execution policies. Using an unsupported combination of algorithm and execution policy will result in sequential execution.

 

Algorithm

Algorithm page at cppreference.com

Implementation

adjacent_difference http://en.cppreference.com/w/cpp/algorithm/adjacent_difference parallel, unsequenced
adjacent_find http://en.cppreference.com/w/cpp/algorithm/adjacent_find parallel, unsequenced
all_of http://en.cppreference.com/w/cpp/algorithm/all_any_none_of parallel, unsequenced
any_of http://en.cppreference.com/w/cpp/algorithm/all_any_none_of parallel, unsequenced
copy http://en.cppreference.com/w/cpp/algorithm/copy parallel, unsequenced
copy_if http://en.cppreference.com/w/cpp/algorithm/copy parallel, unsequenced
copy_n http://en.cppreference.com/w/cpp/algorithm/copy_n parallel, unsequenced
count http://en.cppreference.com/w/cpp/algorithm/count parallel, unsequenced
count_if http://en.cppreference.com/w/cpp/algorithm/count parallel, unsequenced
destroy http://en.cppreference.com/w/cpp/memory/destroy parallel, unsequenced
destroy_n http://en.cppreference.com/w/cpp/memory/destroy_n parallel, unsequenced
equal http://en.cppreference.com/w/cpp/algorithm/equal parallel, unsequenced
exclusive_scan http://en.cppreference.com/w/cpp/algorithm/exclusive_scan parallel, unsequenced
fill http://en.cppreference.com/w/cpp/algorithm/fill parallel, unsequenced
fill_n http://en.cppreference.com/w/cpp/algorithm/fill_n parallel, unsequenced
find http://en.cppreference.com/w/cpp/algorithm/find parallel, unsequenced
find_end http://en.cppreference.com/w/cpp/algorithm/find_end parallel, unsequenced
find_first_of http://en.cppreference.com/w/cpp/algorithm/find_first_of parallel, unsequenced
find_if http://en.cppreference.com/w/cpp/algorithm/find parallel, unsequenced
find_if_not http://en.cppreference.com/w/cpp/algorithm/find parallel, unsequenced
for_each http://en.cppreference.com/w/cpp/algorithm/for_each parallel, unsequenced
for_each_n http://en.cppreference.com/w/cpp/algorithm/for_each_n parallel, unsequenced
generate http://en.cppreference.com/w/cpp/algorithm/generate parallel, unsequenced
generate_n http://en.cppreference.com/w/cpp/algorithm/generate_n parallel, unsequenced
includes http://en.cppreference.com/w/cpp/algorithm/includes parallel
inclusive_scan http://en.cppreference.com/w/cpp/algorithm/inclusive_scan parallel, unsequenced
inplace_merge http://en.cppreference.com/w/cpp/algorithm/inplace_merge parallel
is_heap http://en.cppreference.com/w/cpp/algorithm/is_heap parallel, unsequenced
is_heap_until http://en.cppreference.com/w/cpp/algorithm/is_heap_until parallel, unsequenced
is_partitioned http://en.cppreference.com/w/cpp/algorithm/is_partitioned parallel, unsequenced
is_sorted http://en.cppreference.com/w/cpp/algorithm/is_sorted parallel, unsequenced
is_sorted_until http://en.cppreference.com/w/cpp/algorithm/is_sorted_until parallel, unsequenced
lexicographical_compare http://en.cppreference.com/w/cpp/algorithm/lexicographical_compare parallel, unsequenced
max_element http://en.cppreference.com/w/cpp/algorithm/max_element parallel, unsequenced
merge http://en.cppreference.com/w/cpp/algorithm/merge parallel
min_element http://en.cppreference.com/w/cpp/algorithm/min_element parallel, unsequenced
minmax_element http://en.cppreference.com/w/cpp/algorithm/minmax_element parallel, unsequenced
mismatch http://en.cppreference.com/w/cpp/algorithm/mismatch parallel, unsequenced
move http://en.cppreference.com/w/cpp/algorithm/move parallel, unsequenced
none_of http://en.cppreference.com/w/cpp/algorithm/all_any_none_of parallel, unsequenced
nth_element http://en.cppreference.com/w/cpp/algorithm/nth_element parallel
partial_sort http://en.cppreference.com/w/cpp/algorithm/partial_sort parallel
partial_sort_copy http://en.cppreference.com/w/cpp/algorithm/partial_sort_copy parallel
partition http://en.cppreference.com/w/cpp/algorithm/partition parallel
partition_copy http://en.cppreference.com/w/cpp/algorithm/partition_copy parallel, unsequenced
reduce http://en.cppreference.com/w/cpp/algorithm/reduce parallel, unsequenced
remove http://en.cppreference.com/w/cpp/algorithm/remove parallel, unsequenced
remove_copy http://en.cppreference.com/w/cpp/algorithm/remove_copy parallel, unsequenced
remove_copy_if http://en.cppreference.com/w/cpp/algorithm/remove_copy parallel, unsequenced
remove_if http://en.cppreference.com/w/cpp/algorithm/remove parallel, unsequenced
replace http://en.cppreference.com/w/cpp/algorithm/replace parallel, unsequenced
replace_copy http://en.cppreference.com/w/cpp/algorithm/replace_copy parallel, unsequenced
replace_copy_if http://en.cppreference.com/w/cpp/algorithm/replace_copy parallel, unsequenced
replace_if http://en.cppreference.com/w/cpp/algorithm/replace parallel, unsequenced
reverse http://en.cppreference.com/w/cpp/algorithm/reverse parallel, unsequenced
reverse_copy http://en.cppreference.com/w/cpp/algorithm/reverse_copy parallel, unsequenced
rotate http://en.cppreference.com/w/cpp/algorithm/rotate parallel, unsequenced
rotate_copy http://en.cppreference.com/w/cpp/algorithm/rotate_copy parallel, unsequenced
search http://en.cppreference.com/w/cpp/algorithm/search parallel, unsequenced
search_n http://en.cppreference.com/w/cpp/algorithm/search_n parallel, unsequenced
set_difference http://en.cppreference.com/w/cpp/algorithm/set_difference  
set_intersection http://en.cppreference.com/w/cpp/algorithm/set_intersection  
set_symmetric_difference http://en.cppreference.com/w/cpp/algorithm/set_symmetric_difference  
set_union http://en.cppreference.com/w/cpp/algorithm/set_union  
sort http://en.cppreference.com/w/cpp/algorithm/sort parallel
stable_sort http://en.cppreference.com/w/cpp/algorithm/stable_sort parallel
stable_partition http://en.cppreference.com/w/cpp/algorithm/stable_partition parallel
swap_ranges http://en.cppreference.com/w/cpp/algorithm/swap_ranges parallel, unsequenced
transform http://en.cppreference.com/w/cpp/algorithm/transform parallel, unsequenced
transform_exclusive_scan http://en.cppreference.com/w/cpp/algorithm/transform_exclusive_scan parallel, unsequenced
transform_inclusive_scan http://en.cppreference.com/w/cpp/algorithm/transform_inclusive_scan parallel, unsequenced
transform_reduce http://en.cppreference.com/w/cpp/algorithm/transform_reduce parallel, unsequenced
uninitialized_copy http://en.cppreference.com/w/cpp/memory/uninitialized_copy parallel, unsequenced
uninitialized_copy_n http://en.cppreference.com/w/cpp/memory/uninitialized_copy_n parallel, unsequenced
uninitialized_default_construct http://en.cppreference.com/w/cpp/memory/uninitialized_default_construct parallel, unsequenced
uninitialized_default_construct_n http://en.cppreference.com/w/cpp/memory/uninitialized_default_construct_n parallel, unsequenced
uninitialized_fill http://en.cppreference.com/w/cpp/memory/uninitialized_fill parallel, unsequenced
uninitialized_fill_n http://en.cppreference.com/w/cpp/memory/uninitialized_fill_n parallel, unsequenced
uninitialized_move http://en.cppreference.com/w/cpp/memory/uninitialized_move parallel, unsequenced
uninitialized_move_n http://en.cppreference.com/w/cpp/memory/uninitialized_move_n parallel, unsequenced
uninitialized_value_construct http://en.cppreference.com/w/cpp/memory/uninitialized_value_construct parallel, unsequenced
uninitialized_value_construct_n http://en.cppreference.com/w/cpp/memory/uninitialized_value_construct_n parallel, unsequenced
unique http://en.cppreference.com/w/cpp/algorithm/unique parallel
unique_copy http://en.cppreference.com/w/cpp/algorithm/unique_copy parallel, unsequenced

Known limitations

unseq and par_unseq policies only have effect with compilers that support #pragma omp simd or #pragma simd.

Parallel and vector execution is only supported for the algorithms if random access iterators are provided, while for other iterator types the execution will remain serial.

Semantics of the following algorithms does not allow unsequenced execution: includes, inplace_merge, merge, set_difference, set_intersection, set_symmetric_difference, set_union, stable_partition, unique

The initial value type for exclusive_scan, inclusive_scan, transform_exclusive_scan, transform_inclusive_scan shall satisfy the DefaultConstructible requirements. A default-constructed instance of the initial value type shall be the identity element for binary_op.

For max_element, min_element, minmax_element, partial_sort, partial_sort_copy, sort, stable_sort the dereferenced value type of the provided iterators shall be DefaultConstructible.

For remove, remove_if, unique the dereferenced value type of the provided iterators shall be MoveConstructible.

The following algorithms require additional O(n) memory space for parallel execution: copy_if, inplace_merge, partial_sort, partial_sort_copy, partition_copy, remove, remove_if, rotate, sort, stable_sort, unique, unique_copy.

Legal Information

Optimization Notice
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804

Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.
* Other names and brands may be claimed as the property of others.
Copyright 2017-2018 Intel Corporation.

This software and the related documents are Intel copyrighted materials, and your use of them is governed by the express license under which they were provided to you (License). Unless the License provides otherwise, you may not use, modify, copy, publish, distribute, disclose or transmit this software or the related documents without Intel's prior written permission.

This software and the related documents are provided as is, with no express or implied warranties, other than those that are expressly stated in the License.

 

"