Intel® Architecture Code Analyzer

ID 689548
Updated 4/22/2019
Version Latest
Public

April 2019: Intel® Architecture Code Analyzer has reached its End Of Life. Users may want to try LLVM-MCA. This is NOT a recommendation to use LLVM-MCA nor a comment on its accuracy or usefulness. Thanks for being faithful users of Intel Architecture Code Analyzer throughout the years. We hope it was useful for you.

Download

Version 3.0 is out! This is a rewrite of the tool with some improved features.

Product Overview

Intel® Architecture Code Analyzer helps you statically analyze the data dependency, throughput and latency of code snippets on Intel® microarchitectures. The term kernel is used throughout the rest of this document instead of code snippet.

Features and Benefits

For a given binary, Intel Architecture Code Analyzer:

  • Performs static analysis of kernel throughput and latency under ideal front-end, out-of-order engine and memory hierarchy conditions.
  • Identifies the binding of the kernel instructions to the processor ports.
  • Identifies kernel critical path.

Intel Architecture Code Analyzer gives you a first-order estimate of relative kernel performance on different microarchitectures. It does not provide absolute performance numbers.

Intel Architecture Code Analyzer is a command-line tool with ASCII output. It handles one or more kernels that are marked for analysis within an executable, a shared library, or an object file.

Throughput Analysis

The Throughput Analysis treats the kernel as a body of an infinite loop. It computes the kernel throughput and highlights its bottlenecks.

The Throughput Analysis report contains the following whole kernel information:

  • Throughput of the analyzed kernel, counted in cycles.
  • The kernel bottleneck: front-end, port #, divider unit or inter-iteration dependency.
  • Total number of cycles each processor port was bound with micro-ops.
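
To make the relationship between per-port cycle totals and a port bottleneck concrete, here is a minimal sketch in C. It assumes a simplified model in which only port pressure matters: if the kernel repeats forever, the busiest port sets a lower bound on cycles per iteration. This is not the analyzer's actual algorithm, which also considers the front-end, the divider unit and dependency chains. The names `port_bound` and `busiest_port` are hypothetical, as are any input numbers you feed it.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model (an assumption of this sketch, not the analyzer's
   algorithm): when the kernel repeats forever, the busiest port bounds
   the cycles per iteration from below. */
typedef struct {
    size_t port;    /* index of the busiest port */
    double cycles;  /* cycles per iteration that port is bound with micro-ops */
} port_bound;

/* Returns the port with the largest cycle total; on a tie the lowest
   port index is reported. */
port_bound busiest_port(const double *port_cycles, size_t nports)
{
    assert(port_cycles != NULL && nports > 0);
    port_bound b = { 0, port_cycles[0] };
    for (size_t p = 1; p < nports; ++p) {
        if (port_cycles[p] > b.cycles) {
            b.port = p;
            b.cycles = port_cycles[p];
        }
    }
    return b;
}
```

Calling `busiest_port` with one cycle total per port returns the port that would limit throughput under this model. A real report may name a different bottleneck when the front-end or a dependency chain dominates.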

The Throughput Analysis also provides the following information per instruction:

  • Number of instruction micro-ops.
  • Average number of cycles the instruction was bound to each processor port, per loop iteration.
  • An indication whether the instruction is on the critical path of the analyzed kernel.
  • Instruction disassembly in Intel® Software Developer’s Manual (MASM) style.

Technical Requirements

Intel Architecture Code Analyzer is a command-line utility that can analyze a kernel, contained in a binary file, that is delimited with special markers. The tool is capable of analyzing Intel® 64 code, including Intel® AVX, AVX2 and AVX-512 instructions.

Intel Architecture Code Analyzer is available on Windows*, Linux*, and MacOS* operating systems. Only Intel® 64 operating systems are supported.

Release Notes for 3.0

  • Version 3.0 is a rewrite of Intel Architecture Code Analyzer. No new microarchitectures are added, but the UI changed. For example, port columns in the output are now large enough to accommodate up to 99.9 cycles per port. This may affect tools that automatically process Intel Architecture Code Analyzer output.
  • The tool now accepts the -trace <file> switch which generates an Intel Architecture Code Analyzer trace directly to a file without the need for post-processing. A separate switch (-trace-cycle-count) can be used to control how many cycles to trace.
  • Various switches were deprecated; see the user guide.

Release Notes for 2.3

  • Added support for Intel® microarchitecture code name Skylake (client and server).
  • Added support for Intel® Advanced Vector Extensions 512 (Intel® AVX-512).
  • Added support for tracing the execution (see user guide).
  • Dropped the -no_interiteration flag.

Release Notes for 2.2

  • Added support for Intel® microarchitecture code name Broadwell.
  • Better support for Intel® Advanced Vector Extensions (Intel® AVX) Gather operations.
  • Replaced the "InterIteration" throughput bottleneck indication with a more general "long dependency chains" indication.
  • Added an indication when front end bubbles occur (see user guide).
  • Numerous improvements in modelling supported processors.
  • Unsupported instructions are now marked with 'X' instead of '!' for better readability.
  • NHM, WSM microarchitectures are not actively supported any more.
  • Removed support for running Intel Architecture Code Analyzer on 32 bit operating systems and for analyzing 32 bit programs.
  • Dropped latency analysis support.
  • Added support for Windows* OS.

Release Notes for 2.1

  • Added support for Intel® microarchitecture codenamed Haswell.
  • Added support for Microsoft Visual Studio* 64 compiler.
  • Added 64-bit binaries.

Release Notes for 2.0.1

  • Fixed a bug where the -graph option failed to produce a graph file.

Release Notes for 2.0

  • Added support for Intel® microarchitecture codenamed Sandy Bridge. This replaces the Intel® AVX microarchitecture previously in Intel Architecture Code Analyzer.
  • Added support for Intel® microarchitecture codenamed Ivy Bridge.
  • Added support for MacOS*.
  • Improved analyzer algorithm for throughput analysis (new analysis output, see more details in the User Manual).
  • Improved analyzer algorithm for latency analysis. The output also includes microarchitecture events that will affect the latency (new analysis output, see more details in the User Manual).
  • Added support for graphic output of the dependency graph.

Release Notes for 1.1.3

  • Fixed a bug where using the -o option produced truncated output.
  • Fixed the IACA_UD_BYTES definition in iacaMarks.h to include {}.

Release Notes for 1.1.2

  • Intel Architecture Code Analyzer now supports adding START and END marks in code compiled with Microsoft Visual C++ Compiler* (64-bit). See iacaMarks.h.
  • Intel Architecture Code Analyzer now supports multiple block analysis. You can direct the tool to analyze the nth block that is delimited with analyzer marks. When used with n=0, all surrounded blocks in the file are analyzed and the output contains separate reports per block.

Release Notes for 1.1.1

  • Fixed incorrect identification of Intel AVX zero idiom instructions.
  • Fixed a crash when analyzing empty code blocks (blocks containing only zero idiom instructions or unsupported instructions).
  • Fixed the analyzer's arch nehalem option so that it treats AES and PCLMUL instructions as illegal. These are not supported on Intel® microarchitecture codename Nehalem.
  • Changed analyzer marks to abort if the binary is executed. To deactivate the marks when building for execution, #define IACA_MARKS_OFF or use the -DIACA_MARKS_OFF option on the compiler command line. Binaries with active marks should be used for analysis only.
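
As an illustration of how the marks and IACA_MARKS_OFF fit together, here is a minimal sketch in C. It assumes that iacaMarks.h is on the include path and that the start mark goes at the top of the loop body with the end mark right after the loop, which is a common placement rather than something this page specifies. The `dot` function is a hypothetical kernel. The fallback macros exist only so the sketch still builds when the header is unavailable; they are not part of the tool.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: iacaMarks.h from the analyzer package is on the include path.
   If it cannot be found, fall back to empty marks so this sketch still builds. */
#if defined(__has_include)
#  if __has_include("iacaMarks.h")
#    include "iacaMarks.h"
#  endif
#endif
#ifndef IACA_START
#  define IACA_START
#  define IACA_END
#endif

/* Hypothetical kernel: dot product of two float arrays. */
float dot(const float *a, const float *b, size_t n)
{
    assert(a != NULL && b != NULL);
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        IACA_START              /* start mark: first thing in the loop body */
        sum += a[i] * b[i];
    }
    IACA_END                    /* end mark: right after the loop */
    return sum;
}
```

When the real header is in use, build once without IACA_MARKS_OFF and give that binary to the analyzer only. Build again with -DIACA_MARKS_OFF for the binary you intend to run.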

Release Notes for 1.1

  • Intel Architecture Code Analyzer is now hosted on Linux* operating systems, in addition to Windows* operating systems. Both IA-32 and Intel® 64 operating systems are supported.
  • Intel Architecture Code Analyzer now supports two existing Intel® microarchitectures, codenamed Nehalem and Westmere.
  • Two critical path types are detected:
    • DATA_DEPENDENCY critical path (similar to previous releases - reflects instruction data dependencies only)
    • PERFORMANCE critical path (new - reflects port conflicts and front-end pressure, as well)

Release Notes for 1.0.2

  • The analyzer now ignores the pop ebx / push ebx instructions that Intel Architecture Code Analyzer markers add to IA-32 code.
  • Fixed misclassification of rcp / rsqrt as divider operations.

Release Notes for 1.0.1

  • Graceful handling of unsupported instructions: they are quietly ignored in the analysis of the block and do not impact the throughput and latency calculations.
  • A few previously unsupported instructions are now supported, e.g. the CMOV instruction family.
  • Intel AVX to Intel® SSE code switch detection. The performance penalty associated with such code switch is noted but not accounted for.
     
