SlideShare une entreprise Scribd logo
1  sur  18
Télécharger pour lire hors ligne
INTRODUCTION TO CUDA
Prepared for Geek Camp Singapore 2011
                                  Raymond Tay
THE FREE LUNCH IS OVER – HERB
SUTTER
WE NEED TO THINK BEYOND MULTI-CORE
CPUS … WE NEED TO THINK MANY-CORE
GPUS




…
NVIDIA GPUS FPS
    FPS – Floating-point per second aka flops. A measure of how
     many flops can a GPU do. More is Better 


                                                   GPUs beat CPUs
NVIDIA GPUS MEMORY BANDWIDTH
    With massively parallel processors in Nvidia’s GPUs, providing
     high memory bandwidth plays a big role in high performance
     computing.

                                                    GPUs beat CPUs
GPU VS CPU




CPU                                  GPU
"   Optimised for low-latency        "   Optimised for data-parallel,
    access to cached data sets           throughput computation
"   Control logic for out-of-order   "   Architecture tolerant of
    and speculative execution            memory latency
                                     "   More transistors dedicated to
                                         computation
I DON’T KNOW C/C++, SHOULD I LEAVE?
                           Your Brain Asks:
                                       Wait a minute, why
  Relax,   no worries. Not to fret.   should I learn the C/
                                       C++ SDK?

                                       CUDA Answers:
                                       Efficiency!!!
WHAT DO I NEED TO BEGIN WITH CUDA?
  A   Nvidia CUDA enabled graphics card e.g. Fermi
HOW DOES CUDA WORK



                                  PCI Bus




1.  Copy input data from CPU memory to
    GPU memory
2.  Load GPU program and execute,
    caching data on chip for performance
3.  Copy results from GPU memory to CPU
    memory
EXAMPLE: BLOCK CYPHER
void host_shift_cypher(unsigned int *input_array,    __global__ void shift_cypher(unsigned int
   unsigned int *output_array, unsigned int              *input_array, unsigned int *output_array,
   shift_amount, unsigned int alphabet_max,              unsigned int shift_amount, unsigned int
   unsigned int array_length)	
                          alphabet_max, unsigned int array_length)	
{	
                                                  {	
  for(unsigned int i=0;i<array_length;i++)	
           unsigned int tid = threadIdx.x + blockIdx.x *
 {	
                                                      blockDim.x;	

       int element = input_array[i];	
                 int shifted = input_array[tid] + shift_amount;	
       int shifted = element + shift_amount;	
         if ( shifted > alphabet_max )	
       if(shifted > alphabet_max)	
                        	
shifted = shifted % (alphabet_max + 1);	
       {	
         shifted = shifted % (alphabet_max + 1);	
     output_array[tid] = shifted;	
       }	
                                           }	
       output_array[i] = shifted;	
  }	
                                                Int main() {	
}	
                                                  dim3 dimGrid(ceil(array_length)/block_size);	
Int main() {	
                                                     dim3 dimBlock(block_size);	
host_shift_cypher(input_array, output_array,
                                                     shift_cypher<<<dimGrid,dimBlock>>>(input_array,
   shift_amount, alphabet_max, array_length);	
                                                          output_array, shift_amount, alphabet_max,
}	
                                                       array_length);	
                                                     }	
                    CPU                                               GPU
                    Program                                           Program
EXAMPLE: VECTOR ADDITION
 // CUDA CODE
__global__ void VecAdd(const float* A, const float* B, float* C,
    unsigned int N)
{
  int i = blockDim.x * blockIdx.x + threadIdx.x;
  if (i < N)
   C[i] = A[i] + B[i];
}

// C CODE
void VecAdd(const float* A, const float* B, float* C,unsigned int N)
{
 for( int i = 0; i < N; ++i)
  C[i] = A[i] + B[i];
}
DEBUGGER
              CUDA-GDB	
           • Based on GDB
           • Linux
           • Mac OS X



                             Parallel Nsight	
                            • Plugin inside
                            Visual Studio
VISUAL PROFILER & MEMCHECK
                                 Profiler	
                           •  Microsoft Windows
                           •  Linux
                           •  Mac OS X

                           •  Analyze
                           Performance




     CUDA-MEMCHECK	
    •  Microsoft Windows
    •  Linux
    •  Mac OS X

    •  Detect memory
    access errors
WHERE’S CUDA AT IN 2011?
  60,000 researchers use it to aid drug discovery
  470 universities teach CUDA
WHERE’S CUDA AT IN 2011? (PART 2..)
  NVIDIA   Show Case (1000+ applications)
ADDITIONAL RESOURCES
    CUDA FAQ (http://tegradeveloper.nvidia.com/cuda-faq)
    CUDA Tools & Ecosystem (
     http://tegradeveloper.nvidia.com/cuda-tools-ecosystem)
    CUDA Downloads (http://tegradeveloper.nvidia.com/cuda-downloads)
    NVIDIA Forums (http://forums.nvidia.com/index.php?showforum=62)
    GPGPU (http://gpgpu.org )
    CUDA By Example (
     http://tegradeveloper.nvidia.com/content/cuda-example-introduction-
     general-purpose-gpu-programming-0)
         Jason Sanders & Edward Kandrot
    GPU Computing Gems Emerald Edition (
     http://www.amazon.com/GPU-Computing-Gems-Emerald-Applications/dp/
     0123849888/ )
         Editor in Chief: Prof Hwu Wen-Mei
CUDA LIBRARIES
  Visit this site
   http://developer.nvidia.com/cuda-tools-
   ecosystem#Libraries
  Thrust, CUFFT, CUBLAS, CUSP, NPP, OpenCV,
   GPU AI-Tree Search, GPU AI-Path Finding
  A lot of the libraries are hosted in Google Code.
   Many more gems in there too!
THANK YOU
  @RaymondTayBL

Contenu connexe

Tendances

Introduction to parallel computing using CUDA
Introduction to parallel computing using CUDAIntroduction to parallel computing using CUDA
Introduction to parallel computing using CUDAMartin Peniak
 
Intro to GPGPU Programming with Cuda
Intro to GPGPU Programming with CudaIntro to GPGPU Programming with Cuda
Intro to GPGPU Programming with CudaRob Gillen
 
NVidia CUDA Tutorial - June 15, 2009
NVidia CUDA Tutorial - June 15, 2009NVidia CUDA Tutorial - June 15, 2009
NVidia CUDA Tutorial - June 15, 2009Randall Hand
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 
Accelerating HPC Applications on NVIDIA GPUs with OpenACC
Accelerating HPC Applications on NVIDIA GPUs with OpenACCAccelerating HPC Applications on NVIDIA GPUs with OpenACC
Accelerating HPC Applications on NVIDIA GPUs with OpenACCinside-BigData.com
 
C++ amp on linux
C++ amp on linuxC++ amp on linux
C++ amp on linuxMiller Lee
 
Computing using GPUs
Computing using GPUsComputing using GPUs
Computing using GPUsShree Kumar
 
Vc4c development of opencl compiler for videocore4
Vc4c  development of opencl compiler for videocore4Vc4c  development of opencl compiler for videocore4
Vc4c development of opencl compiler for videocore4nomaddo
 
Development of hardware-based Elements for GStreamer 1.0: A case study (GStre...
Development of hardware-based Elements for GStreamer 1.0: A case study (GStre...Development of hardware-based Elements for GStreamer 1.0: A case study (GStre...
Development of hardware-based Elements for GStreamer 1.0: A case study (GStre...Igalia
 
Engineering fast indexes (Deepdive)
Engineering fast indexes (Deepdive)Engineering fast indexes (Deepdive)
Engineering fast indexes (Deepdive)Daniel Lemire
 
Gpu workshop cluster universe: scripting cuda
Gpu workshop cluster universe: scripting cudaGpu workshop cluster universe: scripting cuda
Gpu workshop cluster universe: scripting cudaFerdinand Jamitzky
 
UDPSRC GStreamer Plugin Session VIII
UDPSRC GStreamer Plugin Session VIIIUDPSRC GStreamer Plugin Session VIII
UDPSRC GStreamer Plugin Session VIIINEEVEE Technologies
 
Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...
Productive OpenCL Programming An Introduction to OpenCL Libraries  with Array...Productive OpenCL Programming An Introduction to OpenCL Libraries  with Array...
Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...AMD Developer Central
 
Advanced Scenegraph Rendering Pipeline
Advanced Scenegraph Rendering PipelineAdvanced Scenegraph Rendering Pipeline
Advanced Scenegraph Rendering PipelineNarann29
 
Applying of the NVIDIA CUDA to the video processing in the task of the roundw...
Applying of the NVIDIA CUDA to the video processing in the task of the roundw...Applying of the NVIDIA CUDA to the video processing in the task of the roundw...
Applying of the NVIDIA CUDA to the video processing in the task of the roundw...Ural-PDC
 

Tendances (18)

Introduction to parallel computing using CUDA
Introduction to parallel computing using CUDAIntroduction to parallel computing using CUDA
Introduction to parallel computing using CUDA
 
Cuda
CudaCuda
Cuda
 
Intro to GPGPU Programming with Cuda
Intro to GPGPU Programming with CudaIntro to GPGPU Programming with Cuda
Intro to GPGPU Programming with Cuda
 
NVidia CUDA Tutorial - June 15, 2009
NVidia CUDA Tutorial - June 15, 2009NVidia CUDA Tutorial - June 15, 2009
NVidia CUDA Tutorial - June 15, 2009
 
Cuda tutorial
Cuda tutorialCuda tutorial
Cuda tutorial
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
Accelerating HPC Applications on NVIDIA GPUs with OpenACC
Accelerating HPC Applications on NVIDIA GPUs with OpenACCAccelerating HPC Applications on NVIDIA GPUs with OpenACC
Accelerating HPC Applications on NVIDIA GPUs with OpenACC
 
C++ amp on linux
C++ amp on linuxC++ amp on linux
C++ amp on linux
 
Computing using GPUs
Computing using GPUsComputing using GPUs
Computing using GPUs
 
Vc4c development of opencl compiler for videocore4
Vc4c  development of opencl compiler for videocore4Vc4c  development of opencl compiler for videocore4
Vc4c development of opencl compiler for videocore4
 
Development of hardware-based Elements for GStreamer 1.0: A case study (GStre...
Development of hardware-based Elements for GStreamer 1.0: A case study (GStre...Development of hardware-based Elements for GStreamer 1.0: A case study (GStre...
Development of hardware-based Elements for GStreamer 1.0: A case study (GStre...
 
Engineering fast indexes (Deepdive)
Engineering fast indexes (Deepdive)Engineering fast indexes (Deepdive)
Engineering fast indexes (Deepdive)
 
Gpu workshop cluster universe: scripting cuda
Gpu workshop cluster universe: scripting cudaGpu workshop cluster universe: scripting cuda
Gpu workshop cluster universe: scripting cuda
 
Lecture 04
Lecture 04Lecture 04
Lecture 04
 
UDPSRC GStreamer Plugin Session VIII
UDPSRC GStreamer Plugin Session VIIIUDPSRC GStreamer Plugin Session VIII
UDPSRC GStreamer Plugin Session VIII
 
Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...
Productive OpenCL Programming An Introduction to OpenCL Libraries  with Array...Productive OpenCL Programming An Introduction to OpenCL Libraries  with Array...
Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...
 
Advanced Scenegraph Rendering Pipeline
Advanced Scenegraph Rendering PipelineAdvanced Scenegraph Rendering Pipeline
Advanced Scenegraph Rendering Pipeline
 
Applying of the NVIDIA CUDA to the video processing in the task of the roundw...
Applying of the NVIDIA CUDA to the video processing in the task of the roundw...Applying of the NVIDIA CUDA to the video processing in the task of the roundw...
Applying of the NVIDIA CUDA to the video processing in the task of the roundw...
 

En vedette

Toying with spark
Toying with sparkToying with spark
Toying with sparkRaymond Tay
 
Distributed computing for new bloods
Distributed computing for new bloodsDistributed computing for new bloods
Distributed computing for new bloodsRaymond Tay
 
Network Security
Network SecurityNetwork Security
Network SecurityMAJU
 
Network Security Threats and Solutions
Network Security Threats and SolutionsNetwork Security Threats and Solutions
Network Security Threats and SolutionsColin058
 

En vedette (7)

Toying with spark
Toying with sparkToying with spark
Toying with spark
 
Distributed computing for new bloods
Distributed computing for new bloodsDistributed computing for new bloods
Distributed computing for new bloods
 
Modern Cryptography
Modern CryptographyModern Cryptography
Modern Cryptography
 
Network Security
Network SecurityNetwork Security
Network Security
 
Network Security
Network SecurityNetwork Security
Network Security
 
Network security
Network securityNetwork security
Network security
 
Network Security Threats and Solutions
Network Security Threats and SolutionsNetwork Security Threats and Solutions
Network Security Threats and Solutions
 

Similaire à Introduction to CUDA Programming

Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 
Using GPUs to handle Big Data with Java by Adam Roberts.
Using GPUs to handle Big Data with Java by Adam Roberts.Using GPUs to handle Big Data with Java by Adam Roberts.
Using GPUs to handle Big Data with Java by Adam Roberts.J On The Beach
 
Etude éducatif sur les GPUs & CPUs et les architectures paralleles -Programmi...
Etude éducatif sur les GPUs & CPUs et les architectures paralleles -Programmi...Etude éducatif sur les GPUs & CPUs et les architectures paralleles -Programmi...
Etude éducatif sur les GPUs & CPUs et les architectures paralleles -Programmi...mouhouioui
 
Introduction to Accelerators
Introduction to AcceleratorsIntroduction to Accelerators
Introduction to AcceleratorsDilum Bandara
 
Nvidia cuda tutorial_no_nda_apr08
Nvidia cuda tutorial_no_nda_apr08Nvidia cuda tutorial_no_nda_apr08
Nvidia cuda tutorial_no_nda_apr08Angela Mendoza M.
 
Linux kernel debugging
Linux kernel debuggingLinux kernel debugging
Linux kernel debugginglibfetion
 
lecture11_GPUArchCUDA01.pptx
lecture11_GPUArchCUDA01.pptxlecture11_GPUArchCUDA01.pptx
lecture11_GPUArchCUDA01.pptxssuser413a98
 
Open CL For Haifa Linux Club
Open CL For Haifa Linux ClubOpen CL For Haifa Linux Club
Open CL For Haifa Linux ClubOfer Rosenberg
 
Intro2 Cuda Moayad
Intro2 Cuda MoayadIntro2 Cuda Moayad
Intro2 Cuda MoayadMoayadhn
 
lecture_GPUArchCUDA02-CUDAMem.pdf
lecture_GPUArchCUDA02-CUDAMem.pdflecture_GPUArchCUDA02-CUDAMem.pdf
lecture_GPUArchCUDA02-CUDAMem.pdfTigabu Yaya
 
Anatomy of ROCgdb presentation at gcc cauldron 2022
Anatomy of ROCgdb presentation at gcc cauldron 2022Anatomy of ROCgdb presentation at gcc cauldron 2022
Anatomy of ROCgdb presentation at gcc cauldron 2022ssuser866937
 
Exploiting GPU's for Columnar DataFrrames by Kiran Lonikar
Exploiting GPU's for Columnar DataFrrames by Kiran LonikarExploiting GPU's for Columnar DataFrrames by Kiran Lonikar
Exploiting GPU's for Columnar DataFrrames by Kiran LonikarSpark Summit
 
[01][gpu 컴퓨팅을 위한 언어, 도구 및 api] miller languages tools
[01][gpu 컴퓨팅을 위한 언어, 도구 및 api] miller languages tools[01][gpu 컴퓨팅을 위한 언어, 도구 및 api] miller languages tools
[01][gpu 컴퓨팅을 위한 언어, 도구 및 api] miller languages toolslaparuma
 
C++ AMP 실천 및 적용 전략
C++ AMP 실천 및 적용 전략 C++ AMP 실천 및 적용 전략
C++ AMP 실천 및 적용 전략 명신 김
 

Similaire à Introduction to CUDA Programming (20)

Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
Using GPUs to handle Big Data with Java by Adam Roberts.
Using GPUs to handle Big Data with Java by Adam Roberts.Using GPUs to handle Big Data with Java by Adam Roberts.
Using GPUs to handle Big Data with Java by Adam Roberts.
 
Etude éducatif sur les GPUs & CPUs et les architectures paralleles -Programmi...
Etude éducatif sur les GPUs & CPUs et les architectures paralleles -Programmi...Etude éducatif sur les GPUs & CPUs et les architectures paralleles -Programmi...
Etude éducatif sur les GPUs & CPUs et les architectures paralleles -Programmi...
 
Introduction to Accelerators
Introduction to AcceleratorsIntroduction to Accelerators
Introduction to Accelerators
 
Programar para GPUs
Programar para GPUsProgramar para GPUs
Programar para GPUs
 
GPU: Understanding CUDA
GPU: Understanding CUDAGPU: Understanding CUDA
GPU: Understanding CUDA
 
Nvidia cuda tutorial_no_nda_apr08
Nvidia cuda tutorial_no_nda_apr08Nvidia cuda tutorial_no_nda_apr08
Nvidia cuda tutorial_no_nda_apr08
 
Linux kernel debugging
Linux kernel debuggingLinux kernel debugging
Linux kernel debugging
 
lecture11_GPUArchCUDA01.pptx
lecture11_GPUArchCUDA01.pptxlecture11_GPUArchCUDA01.pptx
lecture11_GPUArchCUDA01.pptx
 
Open CL For Haifa Linux Club
Open CL For Haifa Linux ClubOpen CL For Haifa Linux Club
Open CL For Haifa Linux Club
 
Intro2 Cuda Moayad
Intro2 Cuda MoayadIntro2 Cuda Moayad
Intro2 Cuda Moayad
 
Cuda intro
Cuda introCuda intro
Cuda intro
 
lecture_GPUArchCUDA02-CUDAMem.pdf
lecture_GPUArchCUDA02-CUDAMem.pdflecture_GPUArchCUDA02-CUDAMem.pdf
lecture_GPUArchCUDA02-CUDAMem.pdf
 
Anatomy of ROCgdb presentation at gcc cauldron 2022
Anatomy of ROCgdb presentation at gcc cauldron 2022Anatomy of ROCgdb presentation at gcc cauldron 2022
Anatomy of ROCgdb presentation at gcc cauldron 2022
 
Exploiting GPU's for Columnar DataFrrames by Kiran Lonikar
Exploiting GPU's for Columnar DataFrrames by Kiran LonikarExploiting GPU's for Columnar DataFrrames by Kiran Lonikar
Exploiting GPU's for Columnar DataFrrames by Kiran Lonikar
 
[01][gpu 컴퓨팅을 위한 언어, 도구 및 api] miller languages tools
[01][gpu 컴퓨팅을 위한 언어, 도구 및 api] miller languages tools[01][gpu 컴퓨팅을 위한 언어, 도구 및 api] miller languages tools
[01][gpu 컴퓨팅을 위한 언어, 도구 및 api] miller languages tools
 
C++ AMP 실천 및 적용 전략
C++ AMP 실천 및 적용 전략 C++ AMP 실천 및 적용 전략
C++ AMP 실천 및 적용 전략
 
Cuda materials
Cuda materialsCuda materials
Cuda materials
 

Dernier

WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 

Dernier (20)

WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 

Introduction to CUDA Programming

  • 1. INTRODUCTION TO CUDA Prepared for Geek Camp Singapore 2011 Raymond Tay
  • 2. THE FREE LUNCH IS OVER – HERB SUTTER
  • 3. WE NEED TO THINK BEYOND MULTI-CORE CPUS … WE NEED TO THINK MANY-CORE GPUS …
  • 4. NVIDIA GPUS FPS   FPS – Floating-point per second aka flops. A measure of how many flops can a GPU do. More is Better  GPUs beat CPUs
  • 5. NVIDIA GPUS MEMORY BANDWIDTH   With massively parallel processors in Nvidia’s GPUs, providing high memory bandwidth plays a big role in high performance computing. GPUs beat CPUs
  • 6. GPU VS CPU CPU GPU "   Optimised for low-latency "   Optimised for data-parallel, access to cached data sets throughput computation "   Control logic for out-of-order "   Architecture tolerant of and speculative execution memory latency "   More transistors dedicated to computation
  • 7. I DON’T KNOW C/C++, SHOULD I LEAVE? Your Brain Asks: Wait a minute, why   Relax, no worries. Not to fret. should I learn the C/ C++ SDK? CUDA Answers: Efficiency!!!
  • 8. WHAT DO I NEED TO BEGIN WITH CUDA?   A Nvidia CUDA enabled graphics card e.g. Fermi
  • 9. HOW DOES CUDA WORK PCI Bus 1.  Copy input data from CPU memory to GPU memory 2.  Load GPU program and execute, caching data on chip for performance 3.  Copy results from GPU memory to CPU memory
  • 10. EXAMPLE: BLOCK CYPHER void host_shift_cypher(unsigned int *input_array, __global__ void shift_cypher(unsigned int unsigned int *output_array, unsigned int *input_array, unsigned int *output_array, shift_amount, unsigned int alphabet_max, unsigned int shift_amount, unsigned int unsigned int array_length) alphabet_max, unsigned int array_length) { { for(unsigned int i=0;i<array_length;i++) unsigned int tid = threadIdx.x + blockIdx.x * { blockDim.x; int element = input_array[i]; int shifted = input_array[tid] + shift_amount; int shifted = element + shift_amount; if ( shifted > alphabet_max ) if(shifted > alphabet_max) shifted = shifted % (alphabet_max + 1); { shifted = shifted % (alphabet_max + 1); output_array[tid] = shifted; } } output_array[i] = shifted; } Int main() { } dim3 dimGrid(ceil(array_length)/block_size); Int main() { dim3 dimBlock(block_size); host_shift_cypher(input_array, output_array, shift_cypher<<<dimGrid,dimBlock>>>(input_array, shift_amount, alphabet_max, array_length); output_array, shift_amount, alphabet_max, } array_length); } CPU GPU Program Program
  • 11. EXAMPLE: VECTOR ADDITION // CUDA CODE __global__ void VecAdd(const float* A, const float* B, float* C, unsigned int N) { int i = blockDim.x * blockIdx.x + threadIdx.x; if (i < N) C[i] = A[i] + B[i]; } // C CODE void VecAdd(const float* A, const float* B, float* C,unsigned int N) { for( int i = 0; i < N; ++i) C[i] = A[i] + B[i]; }
  • 12. DEBUGGER CUDA-GDB • Based on GDB • Linux • Mac OS X Parallel Nsight • Plugin inside Visual Studio
  • 13. VISUAL PROFILER & MEMCHECK Profiler •  Microsoft Windows •  Linux •  Mac OS X •  Analyze Performance CUDA-MEMCHECK •  Microsoft Windows •  Linux •  Mac OS X •  Detect memory access errors
  • 14. WHERE’S CUDA AT IN 2011?   60,000 researchers use it to aid drug discovery   470 universities teach CUDA
  • 15. WHERE’S CUDA AT IN 2011? (PART 2..)   NVIDIA Show Case (1000+ applications)
  • 16. ADDITIONAL RESOURCES   CUDA FAQ (http://tegradeveloper.nvidia.com/cuda-faq)   CUDA Tools & Ecosystem ( http://tegradeveloper.nvidia.com/cuda-tools-ecosystem)   CUDA Downloads (http://tegradeveloper.nvidia.com/cuda-downloads)   NVIDIA Forums (http://forums.nvidia.com/index.php?showforum=62)   GPGPU (http://gpgpu.org )   CUDA By Example ( http://tegradeveloper.nvidia.com/content/cuda-example-introduction- general-purpose-gpu-programming-0)   Jason Sanders & Edward Kandrot   GPU Computing Gems Emerald Edition ( http://www.amazon.com/GPU-Computing-Gems-Emerald-Applications/dp/ 0123849888/ )   Editor in Chief: Prof Hwu Wen-Mei
  • 17. CUDA LIBRARIES   Visit this site http://developer.nvidia.com/cuda-tools- ecosystem#Libraries   Thrust, CUFFT, CUBLAS, CUSP, NPP, OpenCV, GPU AI-Tree Search, GPU AI-Path Finding   A lot of the libraries are hosted in Google Code. Many more gems in there too!
  • 18. THANK YOU @RaymondTayBL