1. About This Document
2. Introduction
3. Setting Up the Host Machine
4. Running Diagnostics
5. OpenCL* Support for Multi-Card Systems
6. Running Samples
7. Compiling OpenCL Kernels
8. Running an OpenCL Design Example
9. OpenCL* on the Intel® PAC with Intel® Arria® 10 GX FPGA Quick Start User Guide Archives
10. Document Revision History for the OpenCL* on Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA Quick Start User Guide
11. Disabling Non-Uniform Memory Access (NUMA) and DMA Worker Threads to Optimize PCIe* Bandwidth
4. Running Diagnostics
Before running diagnostics, load an OpenCL* kernel onto the board. The following instructions use the hello_world kernel; alternatively, you can use your own kernel.
- Load the hello_world OpenCL* kernel:
$ aocl program acl0 $OPAE_PLATFORM_ROOT/opencl/hello_world.aocx
Sample program output:
aocl program: Running program from $OPAE_PLATFORM_ROOT/opencl/opencl_bsp/linux64/libexec
Program succeed.
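The load step can also be scripted. The following is a minimal Python sketch and is not part of the Intel tooling: the helper names build_program_command, load_succeeded, and load_kernel are hypothetical, and the sketch assumes that aocl is on the PATH, that OPAE_PLATFORM_ROOT is set, and that the success message shown in the sample output is written to standard output.

```python
import os
import subprocess


def build_program_command(device="acl0", kernel="hello_world.aocx", env=None):
    """Build the `aocl program` argument list used in this section.

    `env` defaults to the process environment; it is a parameter only so the
    function can be exercised without a real installation.
    """
    env = os.environ if env is None else env
    root = env.get("OPAE_PLATFORM_ROOT")
    if not root:
        raise RuntimeError("OPAE_PLATFORM_ROOT is not set")
    return ["aocl", "program", device, f"{root}/opencl/{kernel}"]


def load_succeeded(output):
    """Return True if the output contains the success line from the sample."""
    return "Program succeed" in output


def load_kernel(device="acl0", kernel="hello_world.aocx"):
    """Run `aocl program` and report whether the success line was printed.

    Assumption: the message goes to stdout. Adjust if your installation
    reports it elsewhere.
    """
    cmd = build_program_command(device, kernel)
    result = subprocess.run(cmd, capture_output=True, text=True)
    return load_succeeded(result.stdout)
```

Calling load_kernel() on a configured host mirrors the manual command above; the other two helpers can be checked without hardware.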
- Run the simple diagnostic utility:
$ aocl diagnose
Sample diagnostic output:
------------------------------------------------
Device Name:
acl0

Package Pat:
$OPAE_PLATFORM_ROOT/opencl/opencl_bsp

Vendor: Intel Corp

Phys Dev Name    Status   Information

pac_a10_f200000  Passed   PAC Arria 10 Platform (pac_a10_f200000)
                          PCIe 04:00.0
                          FPGA temperature = 46 degrees C.

DIAGNOSTIC_PASSED
---------------------------------------------------------
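When the simple diagnostic is run from a script, the fields of interest can be pulled out of the captured text. The following Python sketch is illustrative only: the function name parse_simple_diagnose is hypothetical, and the patterns assume the output layout shown in the sample above, which may differ between software releases.

```python
import re

# Text copied from the sample output in this section (whitespace condensed).
SAMPLE_OUTPUT = """\
Device Name:
acl0
Vendor: Intel Corp
Phys Dev Name    Status   Information
pac_a10_f200000  Passed   PAC Arria 10 Platform (pac_a10_f200000)
                          PCIe 04:00.0
                          FPGA temperature = 46 degrees C.
DIAGNOSTIC_PASSED
"""


def parse_simple_diagnose(text):
    """Extract pass/fail, FPGA temperature, and PCIe address from the output.

    Fields that cannot be found are returned as None.
    """
    temp = re.search(r"FPGA temperature\s*=\s*(-?\d+(?:\.\d+)?)\s*degrees C", text)
    pcie = re.search(r"PCIe\s+([0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])", text)
    return {
        "passed": "DIAGNOSTIC_PASSED" in text,
        "temperature_c": float(temp.group(1)) if temp else None,
        "pcie_address": pcie.group(1) if pcie else None,
    }


if __name__ == "__main__":
    print(parse_simple_diagnose(SAMPLE_OUTPUT))
```

On a real system, the text would come from capturing the output of aocl diagnose instead of the embedded sample.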
- Run the advanced diagnostics:
$ aocl diagnose acl0
Sample advanced diagnostic output:
aocl diagnose: Running diagnose from $OPAE_PLATFORM_ROOT/opencl/opencl_bsp/linux64/libexec
Using platform: Intel(R) FPGA SDK for OpenCL(TM)
Using Device with name: pac_a10 : PAC Arria 10 Platform (pac_a10_f200000)
Using Device from vendor: Intel Corp
clGetDeviceInfo CL_DEVICE_GLOBAL_MEM_SIZE = 8589934592
clGetDeviceInfo CL_DEVICE_MAX_MEM_ALLOC_SIZE = 8588886016
Memory consumed for internal use = 1048576
Actual maximum buffer size 8588886016 bytes
Writing 8191 MB to global memory...
Allocated 1073741824 Bytes host buffer for large transfers
Write speed: 5447.76 MB/s [5100.38 -> 5710.86]
Reading and verifying 8191 MB from global memory ...
Read speed: 6319.11 MB/s [5829.62 -> 6815.82]
Successfully wrote and readback 8191 MB buffer

Transferring 262144 KBs in 512 512 KB blocks ... 3295.09 MB/s
Transferring 262144 KBs in 256 1024 KB blocks ... 3465.62 MB/s
Transferring 262144 KBs in 128 2048 KB blocks ... 4173.86 MB/s
Transferring 262144 KBs in 64 4096 KB blocks ... 5069.94 MB/s
Transferring 262144 KBs in 32 8192 KB blocks ... 5084.80 MB/s
Transferring 262144 KBs in 16 16384 KB blocks ... 5538.76 MB/s
Transferring 262144 KBs in 8 32768 KB blocks ... 6165.23 MB/s
Transferring 262144 KBs in 4 65536 KB blocks ... 6536.86 MB/s
Transferring 262144 KBs in 2 131072 KB blocks ... 6320.60 MB/s
Transferring 262144 KBs in 1 262144 KB blocks ... 6619.78 MB/s

As a reference:
PCIe Gen1 peak speed: 250MB/s/lane
PCIe Gen2 peak speed: 500MB/s/lane
PCIe Gen3 peak speed: 985MB/s/lane

Writing 262144 KBs with block size (in bytes) below:

Block_Size   Avg       Max       Min       End-End (MB/s)
524288       2509.11   3295.09   1693.93   2018.67
1048576      2543.70   3087.25   1656.82   2279.26
2097152      3634.87   4173.86   2265.05   3410.79
4194304      4548.67   5069.94   3939.32   4362.32
8388608      4813.88   5084.80   4089.09   4722.04
16777216     5266.92   5446.97   4821.61   5206.11
33554432     4818.27   5226.23   3681.99   4792.34
67108864     4964.35   5662.74   4123.11   4952.34
134217728    4367.72   4640.88   4124.93   4366.66
268435456    4546.45   4546.45   4546.45   4546.45

Reading 262144 KBs with block size (in bytes) below:

Block_Size   Avg       Max       Min       End-End (MB/s)
524288       2487.06   3038.19   1757.40   2015.28
1048576      2934.13   3465.62   2241.64   2613.45
2097152      3485.74   3673.13   2820.99   3296.42
4194304      3406.50   3629.74   3040.80   3300.23
8388608      4474.60   4589.06   4241.70   4378.74
16777216     5289.71   5538.76   5081.67   5219.55
33554432     6014.68   6165.23   5686.37   5976.21
67108864     6440.31   6536.86   6365.68   6421.60
134217728    6106.75   6320.60   5906.89   6098.65
268435456    6691.78   6691.78   6691.78   6691.78

Write top speed = 5662.74 MB/s
Read top speed = 6691.78 MB/s
Throughput = 6177.26 MB/s

DIAGNOSTIC_PASSED
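To put the summary figures in context, the top speeds can be compared against the per-lane PCIe reference value that the diagnostic prints. The following Python sketch is illustrative only: the function name summarize_bandwidth is hypothetical, and the lane count of 8 is an assumption that you should replace with the link width of your own system. In the sample output, the reported Throughput value coincides with the mean of the write and read top speeds; the sketch computes that mean for comparison but does not claim this is how the tool derives it.

```python
import re

# Summary lines copied from the sample advanced diagnostic output.
SAMPLE_TAIL = """\
Write top speed = 5662.74 MB/s
Read top speed = 6691.78 MB/s
Throughput = 6177.26 MB/s
DIAGNOSTIC_PASSED
"""


def summarize_bandwidth(text, lanes=8, gen3_mb_per_lane=985.0):
    """Parse the summary lines and relate them to a reference link peak.

    `lanes` is an assumed link width; `gen3_mb_per_lane` is the Gen3
    reference figure printed by the diagnostic (985 MB/s per lane).
    """
    def grab(label):
        m = re.search(rf"{re.escape(label)}\s*=\s*([0-9]+(?:\.[0-9]+)?)\s*MB/s", text)
        if m is None:
            raise ValueError(f"'{label}' not found in diagnostic output")
        return float(m.group(1))

    write_top = grab("Write top speed")
    read_top = grab("Read top speed")
    link_peak = lanes * gen3_mb_per_lane  # reference peak for the whole link
    return {
        "write_top": write_top,
        "read_top": read_top,
        "reported_throughput": grab("Throughput"),
        "mean_of_tops": (write_top + read_top) / 2,
        "link_peak": link_peak,
        "write_fraction_of_peak": write_top / link_peak,
        "read_fraction_of_peak": read_top / link_peak,
    }


if __name__ == "__main__":
    for key, value in summarize_bandwidth(SAMPLE_TAIL).items():
        print(f"{key}: {value:.2f}")
```

A fraction well below what you usually observe on the same host can be a prompt to review the PCIe bandwidth tuning described elsewhere in this guide.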