Intel® oneAPI DPC++/C++ Compiler Release Notes

Version: 2022.1   Last Updated: 04/13/2022

This document summarizes new and changed product features and includes notes about features and problems not described in the product documentation.

Where to Find the Release

Download the toolkit from the Base Toolkit Download page and follow the installation instructions to install it.

2022.1 Release

New Features and Improvements

  • Added support for the default context extension on Linux*, which returns the current default context for this platform.
  • Added support for the SYCL* sub-group mask extension, which can be used to efficiently represent subsets of work-items in a sub-group.
  • Implemented the discard_events extension, which adds the ext::oneapi::property::queue::discard_events property for sycl::queue. With this property, the application informs the SYCL implementation that it will not use the events returned by any queue member functions. When the application creates a queue with this property, the implementation may be able to optimize some operations on the queue.

  • Implemented SYCL 2020 property traits. 

  • Implemented the MAX_WORK_GROUP_QUERY extension, which adds two new device information descriptors for querying a device for the maximum number of work-groups that can be submitted in each dimension and globally (across all dimensions).
  • Added experimental support for SYCL group sorting algorithms:
    • joint_sort uses the work-items in a group to sort a range of elements in parallel.
    • sort_over_group sorts the values held directly by the work-items in a group; the result returned to work-item i is the value at position i of the sorted range.
  • Added support for debugging on the Intel® FPGA Emulation Platform for OpenCL™ software.
  • Added support for the FPGA simulation flow on Windows* systems.
  • Added support for the aocl binedit utility, which extracts useful information about a compiled FPGA binary.
  • Added support for latency controls to specify an exact, minimum or maximum latency between read and write accesses on memories and pipes in FPGA.
  • Added support for -Xsdsp-mode=<option> to control FPGA hardware implementation of the supported data types and math functions.
  • Added support for the sycl::ext::intel::fpga_loop_fuse<v>(f) and sycl::ext::intel::fpga_loop_fuse_independent<v>(f) functions, which allow fusing adjacent loops in an FPGA code block, overriding the compiler's profitability analysis of the fusion.

Bug Fixes

  • Fixed a SYCL driver issue where the device binary image was corrupted when using a two-step AOT build with at least one call to a devicelib function from within the kernel.

Known Issues and Limitations

  • If a Mesa OpenCL implementation that provides no devices is installed on a system, device discovery may be incorrect. As a workaround, disable that OpenCL implementation by removing /etc/OpenCL/vendor/mesa.icd.
  • Compilation of a SYCL* program may fail on Windows in debug mode if a kernel uses std::array. This is a limitation that we do not plan to resolve. One workaround is to create a sycl::buffer from the std::array's data() and access the data in the SYCL kernel through a sycl::accessor. Alternatively, you can do the following:
    1. Dump the compiler pipeline execution strings by passing the -### option to the compiler. The compiler prints the internal execution strings of the compilation tools; no actual compilation takes place.
    2. Modify the execution string that contains the -fsycl-is-device option (usually the first one) by appending the -D_CONTAINER_DEBUG_LEVEL=0 -D_ITERATOR_DEBUG_LEVEL=0 options to the end of the string. Then execute all the strings one by one.
  • -fsycl-dead-args-optimization cannot eliminate the offset of an accessor, even when the accessor is created without an offset.
  • Using the sycl::queue::prefetch API on Windows might fail due to issues with cuMemPrefetchAsync.
  • The default context does not bind to sub-devices created from the root device that the context is bound to. As a workaround, create a context explicitly using all the required sub-devices.
  • Using forward references within a class member of array type in SYCL device code may result in a segmentation fault. As a workaround, add static in front of the array to prevent the crash.
  • The DPC++ Compiler does not work with the Windows SDK for Windows 11. The latest supported Windows SDK is Windows 10 SDK version 2104 (10.0.20348.0).
  • To run sys_check with the Diagnostics Utility for Intel® oneAPI Toolkits, do one of the following:
    • Add the path to the sys_check file to the DIAGUTIL_PATH environment variable manually:
      export DIAGUTIL_PATH=/opt/intel/oneapi/compiler/latest/sys_check/sys_check.sh:$DIAGUTIL_PATH; diagnostics.py --filter sys_check 
    • Use the -p option of the Diagnostics Utility for Intel® oneAPI Toolkits to add the path to the sys_check file to the DIAGUTIL_PATH environment variable:
      diagnostics.py -p /opt/intel/oneapi/compiler/latest/sys_check/sys_check.sh --filter sys_check
  • Developers who use Microsoft Visual Studio* 2019 should install CMake 3.21.5 or CMake 3.22.2 to use icx correctly with CMake.
  • Because vectorization is now enabled or disabled through #pragma vector [no]vecremainder and internal switches, you must add #pragma omp simd to the code to achieve the expected performance.
  • The compile times can be significant when compiling for FPGA and using a read-only accessor for a very wide struct. Using a read-write accessor instead is a workaround to address long compile times. 
  • When compiling for FPGA, you cannot use a system with an Intel® FPGA PAC D5005 installed to compile a SYCL application that targets the Intel® PAC with Intel® Arria® 10 GX FPGA. Compilation may succeed, but the compiled binary might fail at runtime. There is currently no workaround for this issue.
  • When you perform FPGA compile and link stages with a single dpcpp command (for example, dpcpp -fintelfpga <other arguments> -Xshardware src/kernel.cpp), if the source code is not located in the current directory, you might observe that the source code browser is missing in the generated FPGA optimization reports. To work around this issue, compile and link the executable in separate stages, as follows: 

    dpcpp -fintelfpga <other arguments> -Xshardware -c src/kernel.cpp -o kernel.o
    dpcpp -fintelfpga <other arguments> -Xshardware kernel.o

  • When compiling for FPGA, the debug support on Windows is not available when using device-side libraries. To avoid this issue, do not run a debugger on the emulator platform on Windows.

  • In the FPGA optimization report, the Loop Viewer (Alpha) can currently handle only loops with 100 iterations or fewer. For designs with loops of more than 100 iterations, the optimization reports hang. There is no known workaround for this issue.

  • The modulefiles-setup.sh script is not supported for FPGA in this release. As a workaround, use the setvars.sh script.

  • FPGA optimization reports are not displayed correctly within Microsoft Visual Studio on Windows. Open the report.html file generated in the project directory to view the reports. 

  • On Windows, compiling FPGA designs in a directory with a long path name might fail, and you might see the following error: 
    dpcpp: error: fpga compiler command failed with exit code 1 (use -v to see invocation)
    NMAKE : fatal error U1077: '…\oneAPI\compiler\latest\windows\bin\dpcpp.EXE' : return code '0x1'

    As a workaround, either compile the design in a directory with a short path name or reset TMP and TEMP environment variables to point to a shorter path (for example, C:\temp). 

  • When compiling for FPGA, the Windows emulator flow using -c to create object files, linking through to an archive file, and then generating an executable from that archive might result in an executable that fails to launch device kernels. As a workaround for this issue, add the -fsycl-device-code-split=none flag to the archive step as shown in the following:

    # generate .obj files 
    dpcpp /EHsc -fintelfpga -c host.cpp device.cpp device_adder.cpp -DFPGA_EMULATOR 
    
    # generate host.a 
    dpcpp -fintelfpga -fsycl-link=image -fsycl-device-code-split=none host.obj device.obj device_adder.obj 
    
    # generate .exe 
    dpcpp -fintelfpga host.a /link /wholearchive 
    
    # emulator executable 
    host.exe
  • When using the atomic_fence function for FPGA, the memory_scope::system constraint is not supported. The broadest scope supported is the memory_scope::device constraint. There is no workaround available for this currently. 

  • When compiling for FPGA on a Linux system, you might see the error message Unable to open zlib library! when the compiler cannot detect the zlib library, which comes standard on most Linux distributions. As a workaround, install the development version of the library so that the compiler can detect it, by running one of the following OS-specific commands:

    • Ubuntu 18: sudo apt install zlib1g-dev

    • RHEL 7/CentOS 7: sudo yum install zlib-devel

  • When launching FPGA optimization reports, the compiler might fail to render certain text characters included in the source file. If the reports are crashing, verify whether the source file has any string literals that end in an escaped backslash in the fileJSON object’s content section within the report_data.js file under the reports/lib/ directory. As a workaround for this issue, modify the report_data.js file to escape the unescaped character. For example, change "hello\\" to "hello\\\". 

  • When compiling for FPGA, the integer modulo operation is unsupported for ac_int and ac_fixed types of 4 bits or fewer.

  • When compiling for FPGA, passing the global-scope DSP control option -Xsdsp-mode=<> after another -Xs command option can result in a compiler failure. As a workaround, always pass -Xsdsp-mode=<> as the first -Xs command option.
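As an illustration of the #pragma omp simd note in the list above, here is a minimal C++ sketch of a loop with an explicit OpenMP SIMD request. The function name saxpy and its signature are hypothetical. The pragma only takes effect when OpenMP SIMD is enabled (for example with -qopenmp-simd or -qopenmp, as listed in the 2022.0 notes); otherwise it is ignored and the loop runs as ordinary scalar code with the same result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example: computes y[i] = a * x[i] + y[i] over the common
// prefix of x and y. The explicit "#pragma omp simd" asks the compiler to
// vectorize the loop when OpenMP SIMD support is enabled; without such a
// flag the pragma is ignored.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    const std::size_t n = x.size() < y.size() ? x.size() : y.size();
    const float* xp = x.data();
    float* yp = y.data();
#pragma omp simd
    for (std::size_t i = 0; i < n; ++i) {
        yp[i] = a * xp[i] + yp[i];
    }
}
```

For example, with x = {1, 2, 3}, y = {10, 20, 30} and a = 2, y becomes {12, 24, 36}.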

2022.0 Release

New Features and Improvements

  • Vectorization for OpenMP SIMD was previously supported at O2 or above when OpenMP language features are enabled. It is now supported at O0 and above if OpenMP language features are enabled (e.g., -qopenmp, -qopenmp-simd).
  • Added the -fopenmp-target-simd option to enable OpenMP SIMD support for GPU.
  • Added the -fopenmp-target-simdlen=n option to specify the GPU vector length for OpenMP SIMD loops.
  • Added support for the target in_reduction clause from the OpenMP 5.0 standard.
  • Added support for the masked construct and the tile construct from the OpenMP 5.1 standard.
  • Added support for nowait for asynchronous offloading.

  • Added support for the new SYCL 2020 features sycl::logical_and and sycl::logical_or, and completed support for host tasks. A complete list of supported SYCL 2020 features can be found here.
  • Added the following DPC++ Extensions:
  • Removed support for deprecated SYCL 1.2.1 APIs as listed here.
  • Support for the SYCL half type in the global namespace has been removed to avoid potential conflicts with user-defined types. ::half was previously an alias for sycl::half. To resolve compilation failures caused by the missing ::half type, use sycl::half directly.
  • Added an experimental feature to speed up incremental build time of DPC++ applications which can be enabled using the compiler option -fsycl-max-parallel-link-jobs=<N>. This option tells the compiler that it can simultaneously spawn up to the specified number of processes to perform actions required to link DPC++ applications.
  • Previous compiler releases included all LLVM tools in the bin directory. When that directory was added to PATH, some of these binaries unexpectedly conflicted with other LLVM installations on the system, so they have been moved to a sibling bin-llvm directory. The compiler drivers (dpcpp/icx/icpx/ifx) are adjusted to find these internal tools as necessary, typically transparently to users. However, some application Makefiles (or CMake configurations) may invoke these tools directly and may require adjustment now that the tools are no longer in PATH. Refer to <…/bin/>../bin-llvm/README for more details.
  • The compiler now uses the Windows registry as the default mechanism to discover the backend OpenCL ICDs on Windows. The OCL_ICD_FILENAMES environment variable is intended for debugging only and does not work with administrative privileges on Windows.
  • Added support for Microsoft Visual Studio* 2022.
  • The Intel-specific header aligned_new is no longer included, because its functionality has been superseded by the C++17 aligned operator new feature. The functionality previously provided by aligned_new is now available in <new> and should be usable without any change other than updating the #include directive.
  • Added support for the -Xssfcexit-fifo-type=<value> flag to globally control exit FIFO latency of stall-free clusters in FPGA.
  • Added support for the nofusion loop attribute to prevent a loop from fusing with an adjacent loop in FPGA.
  • Added support for the -Xsread-only-cache-size=<N> flag to enable the read-only cache for read-only accessors in FPGA.
  • Deprecated the support for the hls_float data type and replaced it with ap_float data type for FPGA.
  • Added support for open source runtime environment for FPGA.
  • Added support for fast BSP customization flow for FPGA.
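The aligned_new note above points to standard C++17 aligned allocation. The following is a minimal sketch of what that looks like using only <new> and the standard library, with no Intel-specific header. The type CacheLine and both helper functions are hypothetical names for illustration, and the sketch assumes a C++17 (or later) language mode.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>

// Hypothetical over-aligned type: alignas(64) exceeds the default new
// alignment on typical platforms, so in C++17 a plain `new CacheLine`
// calls the std::align_val_t overload of operator new automatically.
struct alignas(64) CacheLine {
    float data[16];
};

// Returns true if a heap-allocated CacheLine honors its 64-byte alignment.
bool heap_allocation_is_aligned() {
    std::unique_ptr<CacheLine> p(new CacheLine{});
    return reinterpret_cast<std::uintptr_t>(p.get()) % alignof(CacheLine) == 0;
}

// The aligned allocation functions from <new> can also be called directly.
// `alignment` must be a power of two.
bool explicit_aligned_new_works(std::size_t bytes, std::size_t alignment) {
    const auto al = static_cast<std::align_val_t>(alignment);
    void* raw = ::operator new(bytes, al);
    const bool ok = reinterpret_cast<std::uintptr_t>(raw) % alignment == 0;
    ::operator delete(raw, al);  // must use the matching aligned delete
    return ok;
}
```

The design point is that no special header or allocator is needed for over-aligned types in C++17: alignas on the type is enough for new and delete to pick the aligned overloads.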

Bug Fixes

  • Fixed an issue where the dpcpp compiler generated a temporary source file for host compilation that appeared as a source dependency, potentially breaking build environments that closely track files generated during compilation.
  • Fixed an issue where the sycl::link API could fail to JIT-compile user code if the input kernel bundles contained more than one device image and specialization constants were used.
  • When compiling for FPGA, locally declared kernel names are now correctly demangled in FPGA optimization reports.
  • Fixed an FPGA emulator issue where the compiler would fail if a oneAPI-specific GPU platform was also installed.

Known Issues and Limitations

  • The latest GPU driver available at https://dgpu-docs.intel.com/ introduces an Ahead-Of-Time (AOT) build issue for OpenMP offload applications running on Gen9 iGPU when using oneAPI compilers. A fix for this issue will be available in the upcoming driver release. 
    For assistance with downgrading to a version of the driver which does not have this issue, contact us via Graphics - Intel Communities.
  • GPU offload applications that use extensive multi-threading (more than 2 threads) may experience hangs or timeouts that can be recovered from only by a hard reset or power cycle of the system on the following Linux distributions. The issue occurs when reading or writing data to the Intel GPU while making extensive use of multi-threading, and is caused by a defect in older Linux kernels.
    Kernel/distribution       Problem occurs                          Problem does not occur
    Red Hat Enterprise Linux  RHEL 8.4 (kernel 4.18.0-305) and older  RHEL 8.5 (kernel 4.18.0-348)
    SUSE Linux                SLES15 SP3 and older                    SLES15 SP4 beta
    Ubuntu Linux              Ubuntu releases older than 20.04.03     Ubuntu 20.04.03 (kernel 5.11.0-40-generic #44~20.04.2-ubuntu)*

    Preferred Workaround: Upgrade to a Linux distribution where the defect has been fixed.
    GPU software for Ubuntu 20.04.03 is available now via https://dgpu-docs.intel.com. Note that the software will run, but a warning message will appear in kernel logs.
    GPU software for RHEL 8.5 will be available in Q1 2022 at the same location.
    GPU software for SLES15 SP4 will be available shortly after the general availability of SLES15 SP4.
    Alternative Workaround: Do not use extensive multi-threading in GPU-enabled applications, i.e. keep the number of threads no more than 2. For example, for applications using the oneAPI MPI library, use the single-threaded version of the MPI run-time library, rather than the multi-threaded version. Set the environment variable I_MPI_THREAD_SPLIT=0 to use the single-threaded version of MPI.
  • The OpenMP default loop schedule modifier for work-sharing loop constructs was changed to nonmonotonic when the schedule kind is dynamic or guided to conform to the OpenMP 5.0 standard. User code that assumes monotonic behavior may not work correctly with this change. Users can add the monotonic schedule modifier in the schedule clause to keep the previous code behavior.
  • Performance degradation is expected with SYCL 2020 barriers compared to barriers in SYCL 1.2.1. The issue is currently under investigation and is expected to be fixed in a future release.
  • When using a two-step Ahead-of-Time (AOT) compilation with at least one call to a devicelib function from within the kernel, the device binary image may get corrupted.
  • The alignment of allocation requests is limited to 64 KB due to limited support in the Level Zero runtime.
  • The SYCL 2020 specialization constants feature has the following limitations:
    • Building a program that uses specialization constants for both JIT and AOT targets at the same time could result in an exception with the following message: Native API failed. Native API returns: -49 (CL_INVALID_ARG_INDEX) -49 (CL_INVALID_ARG_INDEX).
    • Setting a specialization constant value to zero is ignored by the DPC++ runtime in the non-AOT scenario, i.e., when the -fsycl-targets command-line option is not passed or when spir64 is the target. The following example code demonstrates the issue. There is currently no workaround.

      constexpr specialization_id<int> spec_id(42);
      // ...
      queue q;
      q.submit([&](handler &cgh) {
        cgh.set_specialization_constant<spec_id>(0);  // spec_id will still have value 42
        cgh.set_specialization_constant<spec_id>(41); // spec_id value will be changed to 41
        cgh.set_specialization_constant<spec_id>(0);  // spec_id will still have value 41
      });

    • In AOT mode, setting default values on padded objects can cause misalignment in other default values. This may cause specialization constants to have the wrong default values. For example:

      struct PaddedStruct {
        uint32_t a;
        char b;
        constexpr PaddedStruct() : a(0), b('a') {}
        constexpr PaddedStruct(uint32_t _a, char _b) : a(_a), b(_b) {}
      };
      constexpr specialization_id<PaddedStruct> padded_struct_spec_id{20, 'c'};
      constexpr specialization_id<bool> bool_spec_id{true};

      In this example, PaddedStruct has a size of 8 bytes, 3 of which are padding. This can cause the specialization constant identified by bool_spec_id not to have its default value of true. A known workaround is to remove the padding by adding __attribute__((packed)) to the class or struct; that is, PaddedStruct becomes:

      struct __attribute__((packed)) PaddedStruct {
        uint32_t a;
        char b;
        constexpr PaddedStruct() : a(0), b('a') {}
        constexpr PaddedStruct(uint32_t _a, char _b) : a(_a), b(_b) {}
      };

  • Using the -Qlong-double compiler option on Windows* has limitations with the latest Microsoft Visual Studio* releases; detailed information is available here.
  • An undefined reference error for the sinpif and cospif functions, such as Compilation from IR - skipping loading of FCL error: undefined reference to `sinpif', can occur even when the application code does not use these functions. It is caused by a compiler optimization phase. As a workaround, use the compiler flags -mllvm -enable-transform-sin-cos=0, which disable the faulty optimization.
  • Using #pragma omp declare simd on a member template is currently not supported and can lead to the error "error: function declaration is expected after 'declare simd' directive". Non-template member functions and template functions that are not class members are not affected.
  • Using Microsoft Visual Studio* as a host compiler for DPC++ with C++17 enabled causes the error C:\Program Files (x86)\Intel\oneAPI\compiler\latest\windows\include\sycl\CL/sycl/ONEAPI/accessor_property_list.hpp(199): error C2686: cannot overload static and non-static member functions with the same parameter types. Refer to the article here on how to work around this issue.
  • USM support for implicit migration of shared allocations between device and host is currently implemented in software using access-violation mechanisms (e.g., SIGSEGV) to detect accesses from the host. Undefined behavior may occur if applications rely on similar access-violation mechanisms, or if they use system calls to access shared-memory allocations before the GPU driver has migrated them to the host.
  • The icx compiler does not support linking library archives with the -l option when the libraries contain target offload code. More details and a workaround for this issue can be found at Known Issue: Static Libraries and Target Offload.
  • Attempting to use Link Time Optimization (LTO) causes a linker failure. To link successfully, make sure you have the recommended version of binutils for your OS, as listed in Intel® oneAPI DPC++/C++ Compiler and Intel® oneAPI DPC++ Library System Requirements.
  • User-defined functions with the same name and signature (exact match of arguments, return type does not matter) as an OpenCL C built-in function, can lead to Undefined Behavior. More details about this issue can be found at Known Issue: User-defined Functions with the Same Signature as OpenCL C built-in functions.
  • #pragma float_control at file scope does not correctly take effect for statement blocks nested within class definitions. The same issue exists for #pragma clang fp.
  • When debugging FPGA emulator code in Microsoft Visual Studio* on a Windows* system, the debugger does not stop at breakpoints set in kernel code. There is no workaround available for this issue currently. 
  • The compile times can be significant when compiling for FPGA and using a read-only accessor for a very wide struct. Using a read-write accessor instead is a workaround to address long compile times. 
  • When compiling for FPGA, you cannot use a system with an Intel® FPGA PAC D5005 installed to compile a SYCL application that targets the Intel® PAC with Intel® Arria® 10 GX FPGA. Compilation may succeed, but the compiled binary might fail at runtime. There is currently no workaround for this issue.
  • When you perform FPGA compile and link stages with a single dpcpp command (for example, dpcpp -fintelfpga <other arguments> -Xshardware src/kernel.cpp), if the source code is not located in the current directory, you might observe that the source code browser is missing in the generated FPGA optimization reports. To work around this issue, compile and link the executable in separate stages, as follows: 

    dpcpp -fintelfpga <other arguments> -Xshardware -c src/kernel.cpp -o kernel.o
    dpcpp -fintelfpga <other arguments> -Xshardware kernel.o

  • When compiling for FPGA, the debug support on Windows is not available when using device-side libraries. To avoid this issue, do not run a debugger on the emulator platform on Windows.

  • In the FPGA optimization report, the Loop Viewer (Alpha) can currently handle only loops with 100 iterations or fewer. For designs with loops of more than 100 iterations, the optimization reports hang. There is no known workaround for this issue.

  • The modulefiles-setup.sh script is not supported for FPGA in this release. As a workaround, use the setvars.sh script.

  • FPGA optimization reports are not displayed correctly within Microsoft Visual Studio on Windows. Open the report.html file generated in the project directory to view the reports. 

  • On Windows, compiling FPGA designs in a directory with a long path name might fail, and you might see the following error: 
    dpcpp: error: fpga compiler command failed with exit code 1 (use -v to see invocation)
    NMAKE : fatal error U1077: '…\oneAPI\compiler\latest\windows\bin\dpcpp.EXE' : return code '0x1'

    As a workaround, either compile the design in a directory with a short path name or reset TMP and TEMP environment variables to point to a shorter path (for example, C:\temp). 

  • On Windows systems, when compiling for the FPGA emulator flow, using -c to create object files, linking them into an archive file, and then generating an executable from that archive might result in an executable that fails to launch device kernels. As a workaround, add the -fsycl-device-code-split=none flag to the archive step, as follows:

    # generate .obj files
    dpcpp /EHsc -fintelfpga -c host.cpp device.cpp device_adder.cpp -DFPGA_EMULATOR

    # generate host.a
    dpcpp -fintelfpga -fsycl-link=image -fsycl-device-code-split=none host.obj device.obj device_adder.obj

    # generate .exe
    dpcpp -fintelfpga host.a /link /wholearchive

    # emulator executable
    host.exe

  • When using the atomic_fence function for FPGA, the memory_scope::system constraint is not supported. The broadest scope supported is the memory_scope::device constraint. There is no workaround available for this currently. 

  • When compiling for FPGA on a Linux system, you might see the error message Unable to open zlib library! when the compiler cannot detect the zlib library, which comes standard on most Linux distributions. As a workaround, install the development version of the library so that the compiler can detect it, by running one of the following OS-specific commands:

    • Ubuntu 18: sudo apt install zlib1g-dev

    • RHEL 7/CentOS 7: sudo yum install zlib-devel

  • When launching FPGA optimization reports, the compiler might fail to render certain text characters included in the source file. If the reports are crashing, verify whether the source file has any string literals that end in an escaped backslash in the fileJSON object’s content section within the report_data.js file under the reports/lib/ directory. As a workaround for this issue, modify the report_data.js file to escape the unescaped character. For example, change "hello\\" to "hello\\\".
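The OpenMP schedule note in the 2022.0 known issues says dynamic and guided loops now default to the nonmonotonic modifier, and that code which assumes monotonic behavior should add the monotonic modifier explicitly. The following is a minimal sketch of what that guarantee means; the function name is hypothetical. Each thread records the last iteration it ran and checks that every later iteration it receives is larger, which schedule(monotonic: dynamic, ...) guarantees. Build with -qopenmp (or another OpenMP flag) to run it in parallel; without OpenMP the pragmas are ignored and the loop runs serially.

```cpp
#include <cassert>

// Hypothetical check: returns true if every thread executed its share of the
// loop in increasing iteration order. The explicit "monotonic" modifier
// restores that guarantee for a dynamic schedule; under the OpenMP 5.0
// default (nonmonotonic), a thread may receive a lower-numbered chunk after
// a higher-numbered one.
bool iterations_monotonic_per_thread(int n) {
    bool ok = true;
#pragma omp parallel reduction(&& : ok)
    {
        int last = -1;  // last iteration executed by this thread
#pragma omp for schedule(monotonic : dynamic, 4)
        for (int i = 0; i < n; ++i) {
            if (i <= last) {
                ok = false;  // a lower iteration arrived after a higher one
            }
            last = i;
        }
    }
    return ok;
}
```

Code that keeps per-thread state across iterations in this way is the kind of code that may need the explicit modifier after the default changed.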
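The padding arithmetic in the specialization-constant known issue (8 bytes, 3 of which are padding) can be checked in plain C++, without SYCL. This sketch assumes GCC or Clang, which accept __attribute__((packed)), and a typical ABI where uint32_t is 4-byte aligned. PackedStruct is a hypothetical name used only so that both layouts can coexist in one file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Layout from the known issue: a 4-byte uint32_t plus a 1-byte char, rounded
// up to the 4-byte alignment of uint32_t on typical ABIs.
struct PaddedStruct {
    uint32_t a;
    char b;
    constexpr PaddedStruct() : a(0), b('a') {}
    constexpr PaddedStruct(uint32_t _a, char _b) : a(_a), b(_b) {}
};

// Same members with the packed attribute: no trailing padding is added.
struct __attribute__((packed)) PackedStruct {
    uint32_t a;
    char b;
    constexpr PackedStruct() : a(0), b('a') {}
    constexpr PackedStruct(uint32_t _a, char _b) : a(_a), b(_b) {}
};

// Padding bytes = total object size minus the sum of the member sizes.
constexpr std::size_t member_bytes = sizeof(uint32_t) + sizeof(char);
constexpr std::size_t padded_bytes = sizeof(PaddedStruct) - member_bytes;
constexpr std::size_t packed_bytes = sizeof(PackedStruct) - member_bytes;
```

On such an ABI, sizeof(PaddedStruct) is 8 with 3 padding bytes, while sizeof(PackedStruct) is 5 with none, which is the layout difference the packed workaround relies on.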

System Requirements

Additional Documentation

Previous oneAPI Releases

Notices and Disclaimers

Intel technologies may require enabled hardware, software, or service activation.

No product or component can be absolutely secure.

Your costs and results may vary.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from a course of performance, course of dealing, or usage in trade.

Product and Performance Information


Performance varies by use, configuration and other factors. Learn more at www.Intel.cn/PerformanceIndex.