CUDA kernel answer and explanation

CUDA kernel Assignment Answers

Question:

Consider the following CUDA kernel and the corresponding host function that

b[i] = a[i] + 1;

}

b[i] += j;

}

a. What is the number of warps per block?

iii. What is the SIMD efficiency (in %) of warp 0 of block 0?

iv. What is the SIMD efficiency (in %) of warp 1 of block 0?

iii. What is the SIMD efficiency (in %) of warp 0 of block 0?

e. For the loop on line 09:

Kernel Analysis

Given kernel code (formatted for clarity):

b[i] = a[i] + 1;

}

for (unsigned int j = b[i]; j < i; ++j) {

Host function calling the kernel:

void foo(int *a_d, int *b_d) {

a. Number of warps per block:

- Each block has `blockDim.x = 256` threads.
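The warp count follows from dividing the block size by the warp size. A minimal host-side sketch of that arithmetic, assuming the 32-thread warp size of NVIDIA GPUs; the names `kWarpSize` and `warps_per_block` are hypothetical and not part of the assignment code:

```cpp
// Host-side arithmetic only; compiles as plain C++ or as CUDA host code.
constexpr unsigned int kWarpSize = 32;  // assumption: 32 threads per warp

// Number of warps needed to cover one block, rounding up so that a
// partially filled last warp is still counted.
constexpr unsigned int warps_per_block(unsigned int block_dim_x) {
    return (block_dim_x + kWarpSize - 1) / kWarpSize;
}

// With the block size stated above, 256 threads form 8 full warps.
static_assert(warps_per_block(256) == 8, "256 / 32 == 8");
```

The rounding-up form matters only when the block size is not a multiple of 32; for 256 threads it reduces to a plain division.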

c. For the statement on line 4:

i. Active warps in the grid:

- Divergent warps = `(8192 / 256) * 7` = 224, i.e. 32 blocks with 7 divergent warps each.
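The figure of 224 can be checked with a short calculation. This is only a sketch: it reads 8192 as the total number of threads in the grid and 7 as the number of divergent warps per block, both of which are assumptions about the intended setup, and the function names are hypothetical:

```cpp
// Host-side arithmetic only; compiles as plain C++ or as CUDA host code.

// Number of blocks when the thread count is an exact multiple of the block size.
constexpr unsigned int blocks_in_grid(unsigned int total_threads,
                                      unsigned int threads_per_block) {
    return total_threads / threads_per_block;
}

// Grid-wide divergent warps, given a per-block count (assumed identical
// for every block because the condition depends on the thread's position
// within its block).
constexpr unsigned int divergent_warps(unsigned int total_threads,
                                       unsigned int threads_per_block,
                                       unsigned int divergent_per_block) {
    return blocks_in_grid(total_threads, threads_per_block) * divergent_per_block;
}

static_assert(blocks_in_grid(8192, 256) == 32, "8192 / 256 == 32 blocks");
static_assert(divergent_warps(8192, 256, 7) == 224, "32 * 7 == 224");
```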

iii. SIMD efficiency of warp 0 of block 0:

v. SIMD efficiency of warp 3 of block 0:

- Assuming warp 3 has no divergent threads, efficiency = 100%.
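SIMD efficiency for a statement is the fraction of a warp's 32 lanes that actually execute it. The sketch below counts active lanes for one warp on the host. The branch condition `takes_branch` is a hypothetical stand-in chosen for illustration, not the condition from the assignment kernel, and the 32-lane warp size is assumed:

```cpp
// Host-side arithmetic only; compiles as plain C++ or as CUDA host code.
const unsigned int kLanesPerWarp = 32;  // assumption: 32 threads per warp

// Hypothetical branch condition on threadIdx.x, used only for illustration.
bool takes_branch(unsigned int thread_idx_x) {
    return thread_idx_x < 40;
}

// Percentage of lanes in warp `warp_in_block` that take the branch.
// A result of 0 means the warp skips the statement entirely, so no
// SIMD lanes are wasted on it.
unsigned int simd_efficiency_percent(unsigned int warp_in_block) {
    unsigned int active = 0;
    for (unsigned int lane = 0; lane < kLanesPerWarp; ++lane) {
        unsigned int thread_idx_x = warp_in_block * kLanesPerWarp + lane;
        if (takes_branch(thread_idx_x)) {
            ++active;
        }
    }
    return 100 * active / kLanesPerWarp;
}
```

Under this stand-in condition, warp 0 (threads 0 to 31) is fully active at 100%, while warp 1 (threads 32 to 63) has only 8 active lanes, giving 25%. A warp with no divergent threads, as assumed for warp 3 above, reaches 100% whenever it executes the statement at all.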

- Divergence occurs if `a[i] != b[i]`.

- Divergent warps calculation is similar to before, accounting for the condition.

- Iterations where the condition `a[i] == b[i]` holds true.

ii. Iterations with divergence: