Code Migration: Before & After

Source CUDA Code

The Intel DPC++ Compatibility Tool migrates software programs implemented with current and previous versions of CUDA. For details, see the release notes.

#include <cuda.h>
#include <stdio.h>

const int vector_size = 256;

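// Each thread writes its own index plus the offset into A.
__global__ void SimpleAddKernel(float *A, int offset) {
  A[threadIdx.x] = threadIdx.x + offset;
}

int main() {
  float *d_A;
  int offset = 10000;

  // Allocate device memory and launch one block of vector_size threads.
  cudaMalloc( &d_A, vector_size * sizeof( float ) );
  SimpleAddKernel<<<1, vector_size>>>(d_A, offset);

  // Copy the results back to the host; cudaMemcpy blocks until the copy is done.
  float result[vector_size] = { };
  cudaMemcpy(result, d_A, vector_size*sizeof(float), cudaMemcpyDeviceToHost);

  cudaFree( d_A );

  // Print the results, eight values per line.
  for (int i = 0; i < vector_size; ++i) {
    if (i % 8 == 0) printf( "\n" );
    printf( "%.1f ", result[i] );
  }

  return 0;
}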

Migrated Code

The migrated code is typical of what you can expect to see after code is ported. In most cases, additional code edits and optimizations are required to complete the migration.
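
Because further edits are usually needed after porting, it can help to confirm that the edited program still produces the same values as the CUDA original. The following is a minimal host-only sketch, assuming the kernel semantics shown in the source above (element i holds i + offset). The helper names reference_simple_add and first_mismatch are hypothetical and are not part of the sample or of the Intel DPC++ Compatibility Tool.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (an assumption, not part of the original sample):
// computes on the host what SimpleAddKernel writes on the device,
// that is, expected[i] = i + offset for each of the `size` threads.
std::vector<float> reference_simple_add(int size, int offset) {
  std::vector<float> expected(size);
  for (int i = 0; i < size; ++i) {
    expected[i] = static_cast<float>(i + offset);
  }
  return expected;
}

// Hypothetical helper: returns the index of the first element of `result`
// that differs from `expected`, or -1 if every element matches.
// `result` must point to at least expected.size() floats.
int first_mismatch(const float *result, const std::vector<float> &expected) {
  for (std::size_t i = 0; i < expected.size(); ++i) {
    if (result[i] != expected[i]) {
      return static_cast<int>(i);
    }
  }
  return -1;
}
```

In the sample's main function, this could be used after the copy back to the host by calling first_mismatch(result, reference_simple_add(vector_size, offset)) and treating any return value other than -1 as a failure. Exact comparison is adequate here only because the values involved are small integers that float represents exactly; kernels that do real floating-point arithmetic would need a tolerance.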

1 Intel estimate as of September 2021, based on measurements from a set of 70 HPC benchmarks and samples, including Rodinia, SHOC, and Pennant. Results may vary.