CUDA Kernel Answer and Explanation
CUDA kernel Assignment Answers
Question:
Consider the following CUDA kernel and the corresponding host function that
b[i] = a[i] + 1;
}
b[i] += j;
}
}
a. What is the number of warps per block?
iii. What is the SIMD efficiency (in %) of warp 0 of block 0 ?
iv. What is the SIMD efficiency (in %) of warp 1 of block 0 ?
iii. What is the SIMD efficiency (in %) of warp 0 of block 0 ?
e. For the loop on line 09:
CUDA kernel Answer and Explanation
Kernel Analysis
Given kernel code (formatted for clarity):
b[i] = a[i] + 1;
}
for (unsigned int j = b[i]; j < i; ++j) {
Host function calling the kernel:
void foo(int *a_d, int *b_d) {
a. Number of warps per block:
- Each block has `blockDim.x = 256` threads, and a warp holds 32 threads, so each block contains `256 / 32 = 8` warps.
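The warp count follows from dividing the block size by the warp size. A minimal C++ sketch of that arithmetic, assuming the usual warp size of 32 threads on NVIDIA GPUs; `kWarpSize` and `warpsPerBlock` are illustrative names and are not part of the assignment code:

```cpp
// Illustrative helper, not part of the assignment kernel.
// Assumes a warp size of 32 threads.
constexpr unsigned int kWarpSize = 32;

// Number of warps needed to hold threadsPerBlock threads. The division
// rounds up so that block sizes that are not a multiple of 32 still
// count their partially filled last warp.
constexpr unsigned int warpsPerBlock(unsigned int threadsPerBlock) {
    return (threadsPerBlock + kWarpSize - 1) / kWarpSize;
}

// Compile-time check for the block size used in this question.
static_assert(warpsPerBlock(256) == 8, "256 threads per block form 8 warps");
```

The rounding-up form is a design choice for generality; for `blockDim.x = 256` it reduces to the plain division `256 / 32`.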
c. For the statement on line 4:
i. Active warps in the grid:
- Divergent warps = `(8192 / 256) * 7` = `32 * 7` = 224 divergent warps (32 blocks, with 7 divergent warps in each).
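The figure of 224 is the number of blocks multiplied by the number of divergent warps in each block. A short C++ sketch of that product, taking the 8192 total threads, the 256 threads per block, and the 7 divergent warps per block from the answer as given, and assuming every block diverges in the same way; the function name is hypothetical:

```cpp
// Illustrative arithmetic only; the inputs are the figures quoted in the answer.
// Assumes every block has the same number of divergent warps.
constexpr unsigned int divergentWarpsInGrid(unsigned int totalThreads,
                                            unsigned int threadsPerBlock,
                                            unsigned int divergentWarpsPerBlock) {
    // Blocks in the grid (rounded up), times divergent warps per block.
    return ((totalThreads + threadsPerBlock - 1) / threadsPerBlock)
           * divergentWarpsPerBlock;
}

// 8192 / 256 = 32 blocks, and 32 * 7 = 224.
static_assert(divergentWarpsInGrid(8192, 256, 7) == 224, "32 blocks * 7 warps");
```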
iii. SIMD efficiency of warp 0 of block 0:
v. SIMD efficiency of warp 3 of block 0:
- Assuming all 32 threads of warp 3 are active (no divergence), SIMD efficiency = 32/32 = 100%.
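SIMD efficiency in these answers is the fraction of a warp's lanes that are active for the statement, expressed as a percentage. A minimal C++ sketch under that definition; the function name and the 32-lane warp size are assumptions made for illustration:

```cpp
// Illustrative helper, not part of the assignment kernel.
// Assumes a warp size of 32 threads.
constexpr unsigned int kWarpSize = 32;

// SIMD efficiency in percent: active lanes divided by total lanes in the warp.
constexpr double simdEfficiencyPercent(unsigned int activeLanes) {
    return 100.0 * activeLanes / kWarpSize;
}

// A warp with no divergence keeps all 32 lanes active.
static_assert(simdEfficiencyPercent(32) == 100.0, "fully active warp");
// A warp where only 8 lanes execute the statement.
static_assert(simdEfficiencyPercent(8) == 25.0, "8 of 32 lanes active");
```

To apply it to a particular warp, count how many of its 32 thread indices satisfy the branch condition and pass that count as `activeLanes`.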
- Divergence occurs in a warp when some of its threads satisfy `a[i] != b[i]` and others do not.
- Divergent warps calculation is similar to before, accounting for the condition.
- Iterations where the condition `a[i] == b[i]` holds true.
ii. Iterations with divergence: