Introduction to CUDA Programming

INTRODUCTION TO CUDA
Prepared for Geek Camp Singapore 2011
Raymond Tay

THE FREE LUNCH IS OVER – HERB SUTTER

WE NEED TO THINK BEYOND MULTI-CORE CPUS … WE NEED TO THINK MANY-CORE GPUS

…

NVIDIA GPUS FLOPS
•  FLOPS: floating-point operations per second. A measure of how many floating-point operations a GPU can perform each second. More is better.

GPUs beat CPUs

NVIDIA GPUS MEMORY BANDWIDTH
•  NVIDIA's GPUs are massively parallel processors, so high memory bandwidth is needed to keep their many cores supplied with data. It plays a big role in high-performance computing.

GPUs beat CPUs

GPU VS CPU

CPU
•  Optimised for low-latency access to cached data sets
•  Control logic for out-of-order and speculative execution

GPU
•  Optimised for data-parallel, throughput computation
•  Architecture tolerant of memory latency
•  More transistors dedicated to computation

I DON’T KNOW C/C++, SHOULD I LEAVE?
•  Relax, no worries. Not to fret.

Your Brain Asks:
Wait a minute, why should I learn the C/C++ SDK?

CUDA Answers:
Efficiency!!!

WHAT DO I NEED TO BEGIN WITH CUDA?
•  An NVIDIA CUDA-enabled graphics card, e.g. one based on the Fermi architecture

HOW DOES CUDA WORK

PCI Bus

1.  Copy input data from CPU memory to GPU memory
2.  Load GPU program and execute, caching data on chip for performance
3.  Copy results from GPU memory to CPU memory
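
The three steps above can be sketched in CUDA C roughly as follows. This is a minimal illustration, not code from the talk: the kernel double_elements and the helper run_three_steps are hypothetical names invented here, and on-chip caching (step 2) is left to the hardware rather than shown explicitly.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: doubles each element in place.
__global__ void double_elements(float *data, unsigned int n)
{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                       // guard threads past the end of the array
        data[i] *= 2.0f;
}

// Hypothetical helper: returns 0 on success, 1 on a CUDA error or wrong result.
int run_three_steps(void)
{
    const unsigned int n = 1024;
    const size_t bytes = n * sizeof(float);
    std::vector<float> host(n);
    for (unsigned int i = 0; i < n; i++)
        host[i] = (float)i;

    float *device = NULL;
    if (cudaMalloc((void **)&device, bytes) != cudaSuccess)
        return 1;

    // Step 1: copy input data from CPU memory to GPU memory.
    cudaError_t err = cudaMemcpy(device, host.data(), bytes, cudaMemcpyHostToDevice);

    // Step 2: load the GPU program (the kernel) and execute it.
    if (err == cudaSuccess) {
        const unsigned int threads = 256;
        const unsigned int blocks = (n + threads - 1) / threads;
        double_elements<<<blocks, threads>>>(device, n);
        err = cudaGetLastError();    // reports launch failures
    }

    // Step 3: copy results from GPU memory back to CPU memory.
    // cudaMemcpy waits for the kernel to finish before copying.
    if (err == cudaSuccess)
        err = cudaMemcpy(host.data(), device, bytes, cudaMemcpyDeviceToHost);

    cudaFree(device);
    if (err != cudaSuccess)
        return 1;

    for (unsigned int i = 0; i < n; i++)
        if (host[i] != 2.0f * (float)i)
            return 1;
    return 0;
}

int main(void)
{
    printf("%s\n", run_three_steps() == 0 ? "ok" : "failed");
    return 0;
}
```

Built with nvcc on a machine with a CUDA-enabled card, each numbered step maps to one cudaMemcpy call or one kernel launch.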

EXAMPLE: BLOCK CYPHER

CPU Program

void host_shift_cypher(unsigned int *input_array, unsigned int *output_array,
                       unsigned int shift_amount, unsigned int alphabet_max,
                       unsigned int array_length)
{
    // One loop iteration per element.
    for (unsigned int i = 0; i < array_length; i++)
    {
        unsigned int element = input_array[i];
        unsigned int shifted = element + shift_amount;
        if (shifted > alphabet_max)
        {
            shifted = shifted % (alphabet_max + 1);
        }
        output_array[i] = shifted;
    }
}

int main() {
    // input_array and output_array are ordinary CPU arrays set up elsewhere.
    host_shift_cypher(input_array, output_array,
                      shift_amount, alphabet_max, array_length);
}

GPU Program

__global__ void shift_cypher(unsigned int *input_array, unsigned int *output_array,
                             unsigned int shift_amount, unsigned int alphabet_max,
                             unsigned int array_length)
{
    // One thread per element; the thread index replaces the loop counter.
    unsigned int tid = threadIdx.x + blockIdx.x * blockDim.x;
    if (tid < array_length)   // the last block may have spare threads
    {
        unsigned int shifted = input_array[tid] + shift_amount;
        if (shifted > alphabet_max)
            shifted = shifted % (alphabet_max + 1);
        output_array[tid] = shifted;
    }
}

int main() {
    // input_array and output_array must point to GPU memory here.
    // Round the block count up so every element gets a thread.
    dim3 dimGrid((array_length + block_size - 1) / block_size);
    dim3 dimBlock(block_size);
    shift_cypher<<<dimGrid,dimBlock>>>(input_array, output_array,
                                       shift_amount, alphabet_max, array_length);
}
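
To check that the CPU and GPU programs agree, a small test harness along these lines could be used. It is a sketch under assumptions: the array size, shift amount, alphabet size and the helper name compare_cpu_gpu are made up for illustration. The grid size is rounded up and the kernel guards tid, because 1000 elements do not fill a whole number of 256-thread blocks.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// CPU reference, following the slide's host_shift_cypher.
void host_shift_cypher(const unsigned int *in, unsigned int *out,
                       unsigned int shift_amount, unsigned int alphabet_max,
                       unsigned int n)
{
    for (unsigned int i = 0; i < n; i++) {
        unsigned int shifted = in[i] + shift_amount;
        if (shifted > alphabet_max)
            shifted = shifted % (alphabet_max + 1);
        out[i] = shifted;
    }
}

// GPU version: one thread per element, with a bounds guard.
__global__ void shift_cypher(const unsigned int *in, unsigned int *out,
                             unsigned int shift_amount, unsigned int alphabet_max,
                             unsigned int n)
{
    unsigned int tid = threadIdx.x + blockIdx.x * blockDim.x;
    if (tid < n) {
        unsigned int shifted = in[tid] + shift_amount;
        if (shifted > alphabet_max)
            shifted = shifted % (alphabet_max + 1);
        out[tid] = shifted;
    }
}

// Hypothetical helper: returns the number of elements where CPU and GPU
// disagree, or -1 if a CUDA call fails.
int compare_cpu_gpu(void)
{
    const unsigned int n = 1000;            // assumed size, not a multiple of 256
    const unsigned int block_size = 256;
    const unsigned int shift_amount = 3;    // assumed values for illustration
    const unsigned int alphabet_max = 25;   // letters encoded as 0..25
    const size_t bytes = n * sizeof(unsigned int);

    std::vector<unsigned int> input(n), cpu_out(n), gpu_out(n);
    for (unsigned int i = 0; i < n; i++)
        input[i] = i % (alphabet_max + 1);

    host_shift_cypher(input.data(), cpu_out.data(), shift_amount, alphabet_max, n);

    unsigned int *d_in = NULL, *d_out = NULL;
    bool ok = cudaMalloc((void **)&d_in, bytes) == cudaSuccess &&
              cudaMalloc((void **)&d_out, bytes) == cudaSuccess &&
              cudaMemcpy(d_in, input.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        const unsigned int grid = (n + block_size - 1) / block_size;  // round up
        shift_cypher<<<grid, block_size>>>(d_in, d_out, shift_amount, alphabet_max, n);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(gpu_out.data(), d_out, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_in);
    cudaFree(d_out);
    if (!ok)
        return -1;

    int mismatches = 0;
    for (unsigned int i = 0; i < n; i++)
        if (cpu_out[i] != gpu_out[i])
            mismatches++;
    return mismatches;
}

int main(void)
{
    printf("mismatches: %d\n", compare_cpu_gpu());
    return 0;
}
```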

EXAMPLE: VECTOR ADDITION
// CUDA CODE
__global__ void VecAdd(const float* A, const float* B, float* C,
                       unsigned int N)
{
    // Each thread computes one element of C.
    unsigned int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < N)
        C[i] = A[i] + B[i];
}

// C CODE
void VecAdd(const float* A, const float* B, float* C, unsigned int N)
{
    // A single thread loops over all N elements.
    for (unsigned int i = 0; i < N; ++i)
        C[i] = A[i] + B[i];
}
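
The if (i < N) test in the CUDA version matters whenever N is not a multiple of the block size. The sketch below launches VecAdd for N = 300 with 256 threads per block, so 512 threads start but only 300 do any work. The helper run_vec_add and the chosen sizes are assumptions made for illustration.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

__global__ void VecAdd(const float* A, const float* B, float* C, unsigned int N)
{
    unsigned int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < N)                        // threads with i >= N do nothing
        C[i] = A[i] + B[i];
}

// Hypothetical helper: returns 0 if all N sums are correct, 1 otherwise.
int run_vec_add(unsigned int N, unsigned int threads_per_block)
{
    const size_t bytes = N * sizeof(float);
    std::vector<float> hA(N), hB(N), hC(N, 0.0f);
    for (unsigned int i = 0; i < N; i++) {
        hA[i] = (float)i;
        hB[i] = 2.0f * (float)i;
    }

    float *dA = NULL, *dB = NULL, *dC = NULL;
    bool ok = cudaMalloc((void **)&dA, bytes) == cudaSuccess &&
              cudaMalloc((void **)&dB, bytes) == cudaSuccess &&
              cudaMalloc((void **)&dC, bytes) == cudaSuccess &&
              cudaMemcpy(dA, hA.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
              cudaMemcpy(dB, hB.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        // Round up so the last, partly filled block is still launched.
        const unsigned int blocks = (N + threads_per_block - 1) / threads_per_block;
        printf("N = %u, threads launched = %u\n", N, blocks * threads_per_block);
        VecAdd<<<blocks, threads_per_block>>>(dA, dB, dC, N);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(hC.data(), dC, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    if (!ok)
        return 1;

    for (unsigned int i = 0; i < N; i++)
        if (hC[i] != 3.0f * (float)i)   // i + 2i, exact for these small values
            return 1;
    return 0;
}

int main(void)
{
    printf("%s\n", run_vec_add(300, 256) == 0 ? "ok" : "failed");
    return 0;
}
```

Without the guard, the 212 surplus threads would read and write past the end of the arrays.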

DEBUGGER
CUDA-GDB
• Based on GDB
• Linux
• Mac OS X

Parallel Nsight
• Plugin inside Visual Studio

VISUAL PROFILER & MEMCHECK
Profiler
•  Microsoft Windows
•  Linux
•  Mac OS X

•  Analyze performance

CUDA-MEMCHECK
•  Microsoft Windows
•  Linux
•  Mac OS X

•  Detect memory access errors

WHERE’S CUDA AT IN 2011?
•  60,000 researchers use it to aid drug discovery
•  470 universities teach CUDA

WHERE’S CUDA AT IN 2011? (PART 2..)
•  NVIDIA Show Case (1000+ applications)

ADDITIONAL RESOURCES
•  CUDA FAQ (http://tegradeveloper.nvidia.com/cuda-faq)
•  CUDA Tools & Ecosystem (http://tegradeveloper.nvidia.com/cuda-tools-ecosystem)
•  CUDA Downloads (http://tegradeveloper.nvidia.com/cuda-downloads)
•  NVIDIA Forums (http://forums.nvidia.com/index.php?showforum=62)
•  GPGPU (http://gpgpu.org)
•  CUDA By Example (http://tegradeveloper.nvidia.com/content/cuda-example-introduction-general-purpose-gpu-programming-0)
    •  Jason Sanders & Edward Kandrot
•  GPU Computing Gems Emerald Edition (http://www.amazon.com/GPU-Computing-Gems-Emerald-Applications/dp/0123849888/)
    •  Editor in Chief: Prof Hwu Wen-Mei

CUDA LIBRARIES
•  Visit this site: http://developer.nvidia.com/cuda-tools-ecosystem#Libraries
•  Thrust, CUFFT, CUBLAS, CUSP, NPP, OpenCV, GPU AI-Tree Search, GPU AI-Path Finding
•  Many of the libraries are hosted on Google Code. Many more gems in there too!
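
As a taste of what these libraries offer, the vector addition from earlier can be written with Thrust without a hand-written kernel or explicit cudaMemcpy calls. This is a minimal sketch, not taken from the talk. The helper name thrust_vec_add and the input values are assumptions, and it relies on the Thrust headers that ship with the CUDA toolkit.

```cuda
#include <cstdio>
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/sequence.h>
#include <thrust/transform.h>
#include <thrust/functional.h>

// Hypothetical helper: returns 0 if c[i] == a[i] + b[i] for all i, 1 otherwise.
int thrust_vec_add(void)
{
    const int n = 1000;

    // device_vector allocates GPU memory and frees it automatically.
    thrust::device_vector<float> a(n);
    thrust::device_vector<float> b(n, 1.0f);   // every element is 1
    thrust::device_vector<float> c(n);

    thrust::sequence(a.begin(), a.end());      // a = 0, 1, 2, ...

    // Element-wise c = a + b, executed on the GPU.
    thrust::transform(a.begin(), a.end(), b.begin(), c.begin(),
                      thrust::plus<float>());

    // Assigning to a host_vector copies the results back to CPU memory.
    thrust::host_vector<float> result = c;

    for (int i = 0; i < n; i++)
        if (result[i] != (float)i + 1.0f)
            return 1;
    return 0;
}

int main(void)
{
    printf("%s\n", thrust_vec_add() == 0 ? "ok" : "failed");
    return 0;
}
```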
