Data movement: moving data between the GPU and CPU incurs overhead.

Compute Unified Device Architecture Assignment Answers

Question:

a) What are CUDA’s limitations on inter-thread (between-thread) communication?

Compute Unified Device Architecture Answer and Explanation

3. Limited Communication Across Blocks: CUDA threads in different blocks cannot directly communicate or synchronize with each other. Inter-block communication typically goes through global memory, which is slower than shared memory because of its higher latency.

4. Global Memory Access: While global memory allows communication between threads in different blocks, it's significantly slower than shared memory due to its higher latency and lower bandwidth.
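
The two points above can be illustrated with a small sum reduction. This is a minimal sketch, not a tuned implementation: threads inside a block cooperate through shared memory and `__syncthreads()`, and because that barrier does not span blocks, each block hands its partial result to the others through global memory with `atomicAdd`. The names `blockSumKernel`, `sumOnDevice`, and `kThreadsPerBlock` are hypothetical and chosen only for this example, the block size of 256 is an assumption, and CUDA error checking is left out to keep the sketch short.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Assumed block size for this sketch; the tree reduction below needs a power of two.
constexpr int kThreadsPerBlock = 256;

// Each block sums its own slice of the input in shared memory, then one
// thread per block adds that partial sum to a single value in global memory.
__global__ void blockSumKernel(const int* in, int n, int* total) {
    __shared__ int partial[kThreadsPerBlock];

    int tid = threadIdx.x;
    int idx = blockIdx.x * blockDim.x + tid;

    // Out-of-range threads contribute 0 so every thread reaches the barriers.
    partial[tid] = (idx < n) ? in[idx] : 0;
    __syncthreads();  // barrier covers only the threads of this block

    // Tree reduction within the block, using fast shared memory.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride) {
            partial[tid] += partial[tid + stride];
        }
        __syncthreads();
    }

    // __syncthreads() does not reach other blocks, so the blocks combine
    // their results through (slower) global memory with an atomic update.
    if (tid == 0) {
        atomicAdd(total, partial[0]);
    }
}

// Hypothetical host helper: copies the data to the GPU, runs the kernel,
// and copies the single result back. Error checking omitted for brevity.
int sumOnDevice(const std::vector<int>& host) {
    int n = static_cast<int>(host.size());
    if (n == 0) {
        return 0;  // avoid launching a grid with zero blocks
    }

    int* dIn = nullptr;
    int* dTotal = nullptr;
    int result = 0;

    cudaMalloc(&dIn, n * sizeof(int));
    cudaMalloc(&dTotal, sizeof(int));

    // Host-to-device transfer: this is the CPU/GPU data movement cost.
    cudaMemcpy(dIn, host.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(dTotal, 0, sizeof(int));

    int blocks = (n + kThreadsPerBlock - 1) / kThreadsPerBlock;
    blockSumKernel<<<blocks, kThreadsPerBlock>>>(dIn, n, dTotal);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(&result, dTotal, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(dIn);
    cudaFree(dTotal);
    return result;
}

int main() {
    std::vector<int> data(1000, 1);  // 1000 elements, spread over 4 blocks
    std::printf("sum = %d\n", sumOnDevice(data));
    return 0;
}
```

Assuming a CUDA-capable device and the `nvcc` compiler, this can be built with something like `nvcc sum.cu -o sum`. The design choice to note is that only one thread per block touches global memory, so the slow cross-block path is used once per block rather than once per element. The `cudaMemcpy` calls in `sumOnDevice` are where the GPU/CPU data movement overhead shows up.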